Linux Documentation
* Re: [PATCH] syscalls: define and explain goal to not call syscalls in the kernel
From: Jonathan Corbet @ 2018-03-30 15:35 UTC
  To: Dominik Brodowski
  Cc: linux-kernel, linux-doc, viro, x86, torvalds, mingo, tglx, luto
In-Reply-To: <20180325162527.GA17492@light.dominikbrodowski.net>

On Sun, 25 Mar 2018 18:25:27 +0200
Dominik Brodowski <linux@dominikbrodowski.net> wrote:

> As there have been multiple inquiries on the rationale of my patchsets
> removing in-kernel calls to sys_xyzzy(), here is an updated patch 01/NN
> which I will push upstream for v4.17-rc1. I will also include a reference
> to this mail (and therefore to the explanation below) in all related
> patches of the series. Any improvements, hints, suggestions, spelling
> fixes, and/or objections?

I have no objections to the text, but I do wonder about the placement.
The "adding syscalls" document isn't about *invoking* them; I suspect that
few people will see it there.  The coding-style document isn't quite right
either, but I wonder if it might not be a better place in the short term?

What we may really need is an "assorted rules" document that sits near
coding style; we can put stuff like this text, "volatile considered
harmful", and so on there.

Thanks,

jon
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* [PATCH v6 1/5] dt-bindings: soc: qcom: Add device tree binding for GENI SE
From: Karthikeyan Ramasubramanian @ 2018-03-30 17:08 UTC
  To: corbet, andy.gross, david.brown, robh+dt, mark.rutland, wsa
  Cc: Karthikeyan Ramasubramanian, linux-doc, linux-arm-msm, devicetree,
	linux-i2c, evgreen, acourbot, swboyd, dianders, bjorn.andersson,
	Sagar Dharia, Girish Mahadevan
In-Reply-To: <1522429700-13083-1-git-send-email-kramasub@codeaurora.org>

Add device tree binding support for the QCOM GENI SE driver.

Signed-off-by: Karthikeyan Ramasubramanian <kramasub@codeaurora.org>
Signed-off-by: Sagar Dharia <sdharia@codeaurora.org>
Signed-off-by: Girish Mahadevan <girishm@codeaurora.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
---
 .../devicetree/bindings/soc/qcom/qcom,geni-se.txt  | 119 +++++++++++++++++++++
 1 file changed, 119 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt

diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt b/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt
new file mode 100644
index 0000000..d330c73
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt
@@ -0,0 +1,119 @@
+Qualcomm Technologies, Inc. GENI Serial Engine QUP Wrapper Controller
+
+Generic Interface (GENI) based Qualcomm Universal Peripheral (QUP) wrapper
+is a programmable module for supporting a wide range of serial interfaces
+like UART, SPI, I2C, I3C, etc. A single QUP module can provide up to 8 Serial
+Interfaces, using its internal Serial Engines. The GENI Serial Engine QUP
+Wrapper controller is modeled as a node with zero or more child nodes each
+representing a serial engine.
+
+Required properties:
+- compatible:		Must be "qcom,geni-se-qup".
+- reg:			Must contain QUP register address and length.
+- clock-names:		Must contain "m-ahb" and "s-ahb".
+- clocks:		AHB clocks needed by the device.
+
+Required properties if child node exists:
+- #address-cells: 	Must be <1> for Serial Engine Address
+- #size-cells: 		Must be <1> for Serial Engine Address Size
+- ranges: 		Must be present
+
+Properties for children:
+
+A GENI based QUP wrapper controller node can contain 0 or more child nodes
+representing serial devices.  These serial devices can be a QCOM UART, I2C
+controller, SPI controller, or some combination of the aforementioned
+devices. The child node definitions for the supported serial interface
+protocols are described below.
+
+Qualcomm Technologies Inc. GENI Serial Engine based I2C Controller
+
+Required properties:
+- compatible:		Must be "qcom,geni-i2c".
+- reg: 			Must contain QUP register address and length.
+- interrupts: 		Must contain I2C interrupt.
+- clock-names: 		Must contain "se".
+- clocks: 		Serial engine core clock needed by the device.
+- #address-cells:	Must be <1> for I2C device address.
+- #size-cells:		Must be <0> as I2C addresses have no size component.
+
+Optional property:
+- clock-frequency:	Desired I2C bus clock frequency in Hz.
+			When missing, defaults to 100000Hz.
+
+Child nodes should conform to I2C bus binding as described in i2c.txt.
+
+Qualcomm Technologies Inc. GENI Serial Engine based UART Controller
+
+Required properties:
+- compatible:		Must be "qcom,geni-debug-uart".
+- reg: 			Must contain UART register location and length.
+- interrupts: 		Must contain UART core interrupts.
+- clock-names:		Must contain "se".
+- clocks:		Serial engine core clock needed by the device.
+
+Qualcomm Technologies Inc. GENI Serial Engine based SPI Controller
+
+Required properties:
+- compatible:		Must contain "qcom,geni-spi".
+- reg:			Must contain SPI register location and length.
+- interrupts:		Must contain SPI controller interrupts.
+- clock-names:		Must contain "se".
+- clocks:		Serial engine core clock needed by the device.
+- spi-max-frequency:	Maximum SPI clock frequency in Hz.
+- #address-cells:	Must be <1> to define a chip select address on
+			the SPI bus.
+- #size-cells:		Must be <0>.
+
+SPI slave nodes must be children of the SPI master node and conform to SPI bus
+binding as described in Documentation/devicetree/bindings/spi/spi-bus.txt.
+
+Example:
+	geniqup@8c0000 {
+		compatible = "qcom,geni-se-qup";
+		reg = <0x8c0000 0x6000>;
+		clock-names = "m-ahb", "s-ahb";
+		clocks = <&clock_gcc GCC_QUPV3_WRAP_0_M_AHB_CLK>,
+			<&clock_gcc GCC_QUPV3_WRAP_0_S_AHB_CLK>;
+		#address-cells = <1>;
+		#size-cells = <1>;
+		ranges;
+
+		i2c0: i2c@a94000 {
+			compatible = "qcom,geni-i2c";
+			reg = <0xa94000 0x4000>;
+			interrupts = <GIC_SPI 358 IRQ_TYPE_LEVEL_HIGH>;
+			clock-names = "se";
+			clocks = <&clock_gcc GCC_QUPV3_WRAP0_S5_CLK>;
+			pinctrl-names = "default", "sleep";
+			pinctrl-0 = <&qup_1_i2c_5_active>;
+			pinctrl-1 = <&qup_1_i2c_5_sleep>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+		};
+
+		uart0: serial@a88000 {
+			compatible = "qcom,geni-debug-uart";
+			reg = <0xa88000 0x7000>;
+			interrupts = <GIC_SPI 355 IRQ_TYPE_LEVEL_HIGH>;
+			clock-names = "se";
+			clocks = <&clock_gcc GCC_QUPV3_WRAP0_S0_CLK>;
+			pinctrl-names = "default", "sleep";
+			pinctrl-0 = <&qup_1_uart_3_active>;
+			pinctrl-1 = <&qup_1_uart_3_sleep>;
+		};
+
+		spi0: spi@a84000 {
+			compatible = "qcom,geni-spi";
+			reg = <0xa84000 0x4000>;
+			interrupts = <GIC_SPI 354 IRQ_TYPE_LEVEL_HIGH>;
+			clock-names = "se";
+			clocks = <&clock_gcc GCC_QUPV3_WRAP0_S0_CLK>;
+			pinctrl-names = "default", "sleep";
+			pinctrl-0 = <&qup_1_spi_2_active>;
+			pinctrl-1 = <&qup_1_spi_2_sleep>;
+			spi-max-frequency = <19200000>;
+			#address-cells = <1>;
+			#size-cells = <0>;
+		};
+	};
-- 
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

* [PATCH v6 0/5] Introduce GENI SE Controller Driver
From: Karthikeyan Ramasubramanian @ 2018-03-30 17:08 UTC
  To: corbet, andy.gross, david.brown, robh+dt, mark.rutland, wsa
  Cc: Karthikeyan Ramasubramanian, linux-doc, linux-arm-msm, devicetree,
	linux-i2c, evgreen, acourbot, swboyd, dianders, bjorn.andersson

Generic Interface (GENI) firmware based Qualcomm Universal Peripheral (QUP)
Wrapper is a next generation programmable module for supporting a wide
range of serial interfaces like UART, SPI, I2C, I3C, etc. A single QUP
module can provide up to 8 Serial Interfaces using its internal Serial
Engines (SE). The protocol supported by each interface is determined by
the firmware loaded to the Serial Engine.

This patch series introduces the GENI SE driver to manage the GENI based QUP
Wrapper and the common aspects of all SEs inside the QUP Wrapper. It also
introduces the I2C controller driver to drive the SEs that are programmed
with the I2C protocol; the UART driver is no longer part of this series
because it has already been merged.

[v6]
 * Move the I2C clock-frequency configuration to the SDM845 board file
 * Remove a redundant comment in the I2C driver

[v5]
 * Remove Linux specific property from the device tree binding
 * Clarify I2C SCL time period documentation
 * Remove redundant checks in I2C controller driver during timeout
 * Use 100kHz as the default clock frequency in the I2C controller driver
 * Disable Wrapper controller by default in the SDM845 device tree and
   enable it explicitly for SDM845 MTP
 * Specify I2C clock frequency in the SDM845 device tree
 * Remove bias configuration for I2C pins under sleep state in device tree
 * Drop the serial driver from the patch series since it is merged
 * Specify the UART port options in the SDM845 device tree

[v4]
 * Add SPI controller information in device tree binding
 * Add support for debug UART & I2C controllers in SDM845 device tree
 * Remove any unnecessary parentheses & casting
 * Identify break character in UART line and pass it to the framework
 * Transmit data from fault handler reliably in debug UART
 * Map the register block when the UART port is requested
 * Move short exported functions into the public header as macros or inlines
 * Move the clock performance table from the wrapper to serial engines
 * Add a lock to synchronize between IRQ & error handling in I2C controller
 * Remove any compiler optimization hints like likely/unlikely
 * Update documentation to clarify tables and hardware blocks

[v3]
 * Update the driver dependencies
 * Use the SPDX License Expression
 * Squash all the controller device tree bindings together
 * Use kernel doc format for documentation
 * Add additional documentation for packing configuration
 * Use clk_bulk_* API for related clocks
 * Remove driver references to pinctrl and their states
 * Replace magic numbers with appropriate macros
 * Update memory barrier usage and associated comments
 * Reduce interlacing of register reads/writes
 * Fix poll_get_char() operation in console UART driver under polling mode
 * Address other comments from Bjorn Andersson to improve code readability

[v2]
 * Updated device tree bindings to describe the hardware
 * Updated SE DT node as child node of QUP Wrapper DT node
 * Moved common AHB clocks to QUP Wrapper DT node
 * Use the standard "clock-frequency" I2C property
 * Update compatible field in UART Controller to reflect hardware manual
 * Addressed other device tree binding specific comments from Rob Herring

Karthikeyan Ramasubramanian (4):
  dt-bindings: soc: qcom: Add device tree binding for GENI SE
  soc: qcom: Add GENI based QUP Wrapper driver
  i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C
    controller
  arm64: dts: sdm845: Add support for an instance of I2C controller

Rajendra Nayak (1):
  arm64: dts: sdm845: Add serial console support

 .../devicetree/bindings/soc/qcom/qcom,geni-se.txt  | 119 ++++
 arch/arm64/boot/dts/qcom/sdm845-mtp.dts            |  60 ++
 arch/arm64/boot/dts/qcom/sdm845.dtsi               |  67 ++
 drivers/i2c/busses/Kconfig                         |  13 +
 drivers/i2c/busses/Makefile                        |   1 +
 drivers/i2c/busses/i2c-qcom-geni.c                 | 649 ++++++++++++++++++
 drivers/soc/qcom/Kconfig                           |   9 +
 drivers/soc/qcom/Makefile                          |   1 +
 drivers/soc/qcom/qcom-geni-se.c                    | 748 +++++++++++++++++++++
 include/linux/qcom-geni-se.h                       | 425 ++++++++++++
 10 files changed, 2092 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,geni-se.txt
 create mode 100644 drivers/i2c/busses/i2c-qcom-geni.c
 create mode 100644 drivers/soc/qcom/qcom-geni-se.c
 create mode 100644 include/linux/qcom-geni-se.h

* [PATCH v6 4/5] arm64: dts: sdm845: Add serial console support
From: Karthikeyan Ramasubramanian @ 2018-03-30 17:08 UTC
  To: corbet, andy.gross, david.brown, robh+dt, mark.rutland, wsa
  Cc: Rajendra Nayak, linux-doc, linux-arm-msm, devicetree, linux-i2c,
	evgreen, acourbot, swboyd, dianders, bjorn.andersson,
	Karthikeyan Ramasubramanian
In-Reply-To: <1522429700-13083-1-git-send-email-kramasub@codeaurora.org>

From: Rajendra Nayak <rnayak@codeaurora.org>

Add the QUP UART node and GENI SE instance needed to
support the serial console on the MTP.

Signed-off-by: Rajendra Nayak <rnayak@codeaurora.org>
Signed-off-by: Karthikeyan Ramasubramanian <kramasub@codeaurora.org>
---
 arch/arm64/boot/dts/qcom/sdm845-mtp.dts | 41 +++++++++++++++++++++++++++++++++
 arch/arm64/boot/dts/qcom/sdm845.dtsi    | 39 +++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
index 979ab49..17b2fb0 100644
--- a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
+++ b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
@@ -12,4 +12,45 @@
 / {
 	model = "Qualcomm Technologies, Inc. SDM845 MTP";
 	compatible = "qcom,sdm845-mtp";
+
+	aliases {
+		serial0 = &uart2;
+	};
+
+	chosen {
+		stdout-path = "serial0:115200n8";
+	};
+};
+
+&soc {
+	geniqup@ac0000 {
+		status = "okay";
+
+		serial@a84000 {
+			status = "okay";
+		};
+	};
+
+	pinctrl@3400000 {
+		qup-uart2-default {
+			pinconf_tx {
+				pins = "gpio4";
+				drive-strength = <2>;
+				bias-disable;
+			};
+
+			pinconf_rx {
+				pins = "gpio5";
+				drive-strength = <2>;
+				bias-pull-up;
+			};
+		};
+
+		qup-uart2-sleep {
+			pinconf {
+				pins = "gpio4", "gpio5";
+				bias-pull-down;
+			};
+		};
+	};
 };
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 32f8561..71801b9 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -6,6 +6,7 @@
  */
 
 #include <dt-bindings/interrupt-controller/arm-gic.h>
+#include <dt-bindings/clock/qcom,gcc-sdm845.h>
 
 / {
 	interrupt-parent = <&intc>;
@@ -194,6 +195,20 @@
 			#gpio-cells = <2>;
 			interrupt-controller;
 			#interrupt-cells = <2>;
+
+			qup_uart2_default: qup-uart2-default {
+				pinmux {
+					function = "qup9";
+					pins = "gpio4", "gpio5";
+				};
+			};
+
+			qup_uart2_sleep: qup-uart2-sleep {
+				pinmux {
+					function = "gpio";
+					pins = "gpio4", "gpio5";
+				};
+			};
 		};
 
 		timer@17c90000 {
@@ -272,5 +287,29 @@
 			#interrupt-cells = <4>;
 			cell-index = <0>;
 		};
+
+		geniqup@ac0000 {
+			compatible = "qcom,geni-se-qup";
+			reg = <0xac0000 0x6000>;
+			clock-names = "m-ahb", "s-ahb";
+			clocks = <&gcc GCC_QUPV3_WRAP_1_M_AHB_CLK>,
+				 <&gcc GCC_QUPV3_WRAP_1_S_AHB_CLK>;
+			#address-cells = <1>;
+			#size-cells = <1>;
+			ranges;
+			status = "disabled";
+
+			uart2: serial@a84000 {
+				compatible = "qcom,geni-debug-uart";
+				reg = <0xa84000 0x4000>;
+				clock-names = "se";
+				clocks = <&gcc GCC_QUPV3_WRAP1_S1_CLK>;
+				pinctrl-names = "default", "sleep";
+				pinctrl-0 = <&qup_uart2_default>;
+				pinctrl-1 = <&qup_uart2_sleep>;
+				interrupts = <GIC_SPI 354 IRQ_TYPE_LEVEL_HIGH>;
+				status = "disabled";
+			};
+		};
 	};
 };

* [PATCH v6 3/5] i2c: i2c-qcom-geni: Add bus driver for the Qualcomm GENI I2C controller
From: Karthikeyan Ramasubramanian @ 2018-03-30 17:08 UTC
  To: corbet, andy.gross, david.brown, robh+dt, mark.rutland, wsa
  Cc: Karthikeyan Ramasubramanian, linux-doc, linux-arm-msm, devicetree,
	linux-i2c, evgreen, acourbot, swboyd, dianders, bjorn.andersson,
	Sagar Dharia, Girish Mahadevan
In-Reply-To: <1522429700-13083-1-git-send-email-kramasub@codeaurora.org>

This bus driver supports the GENI based I2C hardware controller in the
Qualcomm SoCs. The Qualcomm Generic Interface (GENI) is a programmable
module supporting a wide range of serial interfaces including I2C. The
driver supports FIFO mode and DMA mode of transfer and switches modes
dynamically depending on the size of the transfer.

Signed-off-by: Karthikeyan Ramasubramanian <kramasub@codeaurora.org>
Signed-off-by: Sagar Dharia <sdharia@codeaurora.org>
Signed-off-by: Girish Mahadevan <girishm@codeaurora.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
---
 drivers/i2c/busses/Kconfig         |  13 +
 drivers/i2c/busses/Makefile        |   1 +
 drivers/i2c/busses/i2c-qcom-geni.c | 649 +++++++++++++++++++++++++++++++++++++
 3 files changed, 663 insertions(+)
 create mode 100644 drivers/i2c/busses/i2c-qcom-geni.c

diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
index e2954fb..89e642a 100644
--- a/drivers/i2c/busses/Kconfig
+++ b/drivers/i2c/busses/Kconfig
@@ -848,6 +848,19 @@ config I2C_PXA_SLAVE
 	  is necessary for systems where the PXA may be a target on the
 	  I2C bus.
 
+config I2C_QCOM_GENI
+	tristate "Qualcomm Technologies Inc.'s GENI based I2C controller"
+	depends on ARCH_QCOM || COMPILE_TEST
+	depends on QCOM_GENI_SE
+	help
+	  This driver supports the GENI serial engine based I2C controller in
+	  master mode on the Qualcomm Technologies Inc.'s SoCs. If you say
+	  yes to this option, support will be included for the built-in I2C
+	  interface on the Qualcomm Technologies Inc.'s SoCs.
+
+	  This driver can also be built as a module.  If so, the module
+	  will be called i2c-qcom-geni.
+
 config I2C_QUP
 	tristate "Qualcomm QUP based I2C controller"
 	depends on ARCH_QCOM
diff --git a/drivers/i2c/busses/Makefile b/drivers/i2c/busses/Makefile
index 2ce8576..201fce1 100644
--- a/drivers/i2c/busses/Makefile
+++ b/drivers/i2c/busses/Makefile
@@ -84,6 +84,7 @@ obj-$(CONFIG_I2C_PNX)		+= i2c-pnx.o
 obj-$(CONFIG_I2C_PUV3)		+= i2c-puv3.o
 obj-$(CONFIG_I2C_PXA)		+= i2c-pxa.o
 obj-$(CONFIG_I2C_PXA_PCI)	+= i2c-pxa-pci.o
+obj-$(CONFIG_I2C_QCOM_GENI)	+= i2c-qcom-geni.o
 obj-$(CONFIG_I2C_QUP)		+= i2c-qup.o
 obj-$(CONFIG_I2C_RIIC)		+= i2c-riic.o
 obj-$(CONFIG_I2C_RK3X)		+= i2c-rk3x.o
diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
new file mode 100644
index 0000000..3220374
--- /dev/null
+++ b/drivers/i2c/busses/i2c-qcom-geni.c
@@ -0,0 +1,649 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2017-2018, The Linux Foundation. All rights reserved.
+
+#include <linux/clk.h>
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/i2c.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/qcom-geni-se.h>
+#include <linux/spinlock.h>
+
+#define SE_I2C_TX_TRANS_LEN		0x26c
+#define SE_I2C_RX_TRANS_LEN		0x270
+#define SE_I2C_SCL_COUNTERS		0x278
+
+#define SE_I2C_ERR  (M_CMD_OVERRUN_EN | M_ILLEGAL_CMD_EN | M_CMD_FAILURE_EN |\
+			M_GP_IRQ_1_EN | M_GP_IRQ_3_EN | M_GP_IRQ_4_EN)
+#define SE_I2C_ABORT		BIT(1)
+
+/* M_CMD OP codes for I2C */
+#define I2C_WRITE		0x1
+#define I2C_READ		0x2
+#define I2C_WRITE_READ		0x3
+#define I2C_ADDR_ONLY		0x4
+#define I2C_BUS_CLEAR		0x6
+#define I2C_STOP_ON_BUS		0x7
+/* M_CMD params for I2C */
+#define PRE_CMD_DELAY		BIT(0)
+#define TIMESTAMP_BEFORE	BIT(1)
+#define STOP_STRETCH		BIT(2)
+#define TIMESTAMP_AFTER		BIT(3)
+#define POST_COMMAND_DELAY	BIT(4)
+#define IGNORE_ADD_NACK		BIT(6)
+#define READ_FINISHED_WITH_ACK	BIT(7)
+#define BYPASS_ADDR_PHASE	BIT(8)
+#define SLV_ADDR_MSK		GENMASK(15, 9)
+#define SLV_ADDR_SHFT		9
+/* I2C SCL COUNTER fields */
+#define HIGH_COUNTER_MSK	GENMASK(29, 20)
+#define HIGH_COUNTER_SHFT	20
+#define LOW_COUNTER_MSK		GENMASK(19, 10)
+#define LOW_COUNTER_SHFT	10
+#define CYCLE_COUNTER_MSK	GENMASK(9, 0)
+
+enum geni_i2c_err_code {
+	GP_IRQ0,
+	NACK,
+	GP_IRQ2,
+	BUS_PROTO,
+	ARB_LOST,
+	GP_IRQ5,
+	GENI_OVERRUN,
+	GENI_ILLEGAL_CMD,
+	GENI_ABORT_DONE,
+	GENI_TIMEOUT,
+};
+
+#define DM_I2C_CB_ERR		((BIT(NACK) | BIT(BUS_PROTO) | BIT(ARB_LOST)) \
+									<< 5)
+
+#define I2C_AUTO_SUSPEND_DELAY	250
+#define KHZ(freq)		(1000 * (freq))
+#define PACKING_BYTES_PW	4
+
+#define ABORT_TIMEOUT		HZ
+#define XFER_TIMEOUT		HZ
+#define RST_TIMEOUT		HZ
+
+struct geni_i2c_dev {
+	struct geni_se se;
+	u32 tx_wm;
+	int irq;
+	int err;
+	struct i2c_adapter adap;
+	struct completion done;
+	struct i2c_msg *cur;
+	int cur_wr;
+	int cur_rd;
+	spinlock_t lock;
+	u32 clk_freq_out;
+	const struct geni_i2c_clk_fld *clk_fld;
+};
+
+struct geni_i2c_err_log {
+	int err;
+	const char *msg;
+};
+
+static const struct geni_i2c_err_log gi2c_log[] = {
+	[GP_IRQ0] = {-EINVAL, "Unknown I2C err GP_IRQ0"},
+	[NACK] = {-ENOTCONN, "NACK: slv unresponsive, check its power/reset-ln"},
+	[GP_IRQ2] = {-EINVAL, "Unknown I2C err GP IRQ2"},
+	[BUS_PROTO] = {-EPROTO, "Bus proto err, noisy/unexpected start/stop"},
+	[ARB_LOST] = {-EBUSY, "Bus arbitration lost, clock line undriveable"},
+	[GP_IRQ5] = {-EINVAL, "Unknown I2C err GP IRQ5"},
+	[GENI_OVERRUN] = {-EIO, "Cmd overrun, check GENI cmd-state machine"},
+	[GENI_ILLEGAL_CMD] = {-EILSEQ, "Illegal cmd, check GENI cmd-state machine"},
+	[GENI_ABORT_DONE] = {-ETIMEDOUT, "Abort after timeout successful"},
+	[GENI_TIMEOUT] = {-ETIMEDOUT, "I2C TXN timed out"},
+};
+
+struct geni_i2c_clk_fld {
+	u32	clk_freq_out;
+	u8	clk_div;
+	u8	t_high_cnt;
+	u8	t_low_cnt;
+	u8	t_cycle_cnt;
+};
+
+/*
+ * Hardware uses the following formula to calculate the time periods of
+ * the SCL clock cycle. Firmware uses some additional cycles that are not
+ * included in the formula below, and it is confirmed that the time periods
+ * are within specification limits.
+ *
+ * time of high period of SCL: t_high = (t_high_cnt * clk_div) / source_clock
+ * time of low period of SCL: t_low = (t_low_cnt * clk_div) / source_clock
+ * time of full period of SCL: t_cycle = (t_cycle_cnt * clk_div) / source_clock
+ * clk_freq_out = 1 / t_cycle
+ * source_clock = 19.2 MHz
+ */
+static const struct geni_i2c_clk_fld geni_i2c_clk_map[] = {
+	{KHZ(100), 7, 10, 11, 26},
+	{KHZ(400), 2,  5, 12, 24},
+	{KHZ(1000), 1, 3,  9, 18},
+};
+
+static int geni_i2c_clk_map_idx(struct geni_i2c_dev *gi2c)
+{
+	int i;
+	const struct geni_i2c_clk_fld *itr = geni_i2c_clk_map;
+
+	for (i = 0; i < ARRAY_SIZE(geni_i2c_clk_map); i++, itr++) {
+		if (itr->clk_freq_out == gi2c->clk_freq_out) {
+			gi2c->clk_fld = itr;
+			return 0;
+		}
+	}
+	return -EINVAL;
+}
+
+static void qcom_geni_i2c_conf(struct geni_i2c_dev *gi2c)
+{
+	const struct geni_i2c_clk_fld *itr = gi2c->clk_fld;
+	u32 val;
+
+	writel_relaxed(0, gi2c->se.base + SE_GENI_CLK_SEL);
+
+	val = (itr->clk_div << CLK_DIV_SHFT) | SER_CLK_EN;
+	writel_relaxed(val, gi2c->se.base + GENI_SER_M_CLK_CFG);
+
+	val = itr->t_high_cnt << HIGH_COUNTER_SHFT;
+	val |= itr->t_low_cnt << LOW_COUNTER_SHFT;
+	val |= itr->t_cycle_cnt;
+	writel_relaxed(val, gi2c->se.base + SE_I2C_SCL_COUNTERS);
+}
+
+static void geni_i2c_err_misc(struct geni_i2c_dev *gi2c)
+{
+	u32 m_cmd = readl_relaxed(gi2c->se.base + SE_GENI_M_CMD0);
+	u32 m_stat = readl_relaxed(gi2c->se.base + SE_GENI_M_IRQ_STATUS);
+	u32 geni_s = readl_relaxed(gi2c->se.base + SE_GENI_STATUS);
+	u32 geni_ios = readl_relaxed(gi2c->se.base + SE_GENI_IOS);
+	u32 dma = readl_relaxed(gi2c->se.base + SE_GENI_DMA_MODE_EN);
+	u32 rx_st, tx_st;
+
+	if (dma) {
+		rx_st = readl_relaxed(gi2c->se.base + SE_DMA_RX_IRQ_STAT);
+		tx_st = readl_relaxed(gi2c->se.base + SE_DMA_TX_IRQ_STAT);
+	} else {
+		rx_st = readl_relaxed(gi2c->se.base + SE_GENI_RX_FIFO_STATUS);
+		tx_st = readl_relaxed(gi2c->se.base + SE_GENI_TX_FIFO_STATUS);
+	}
+	dev_dbg(gi2c->se.dev, "DMA:%d tx_stat:0x%x, rx_stat:0x%x, irq-stat:0x%x\n",
+		dma, tx_st, rx_st, m_stat);
+	dev_dbg(gi2c->se.dev, "m_cmd:0x%x, geni_status:0x%x, geni_ios:0x%x\n",
+		m_cmd, geni_s, geni_ios);
+}
+
+static void geni_i2c_err(struct geni_i2c_dev *gi2c, int err)
+{
+	if (!gi2c->err)
+		gi2c->err = gi2c_log[err].err;
+	if (gi2c->cur)
+		dev_dbg(gi2c->se.dev, "len:%d, slv-addr:0x%x, RD/WR:%d\n",
+			gi2c->cur->len, gi2c->cur->addr, gi2c->cur->flags);
+
+	if (err != NACK && err != GENI_ABORT_DONE) {
+		dev_err(gi2c->se.dev, "%s\n", gi2c_log[err].msg);
+		geni_i2c_err_misc(gi2c);
+	}
+}
+
+static irqreturn_t geni_i2c_irq(int irq, void *dev)
+{
+	struct geni_i2c_dev *gi2c = dev;
+	int j;
+	u32 m_stat;
+	u32 rx_st;
+	u32 dm_tx_st;
+	u32 dm_rx_st;
+	u32 dma;
+	struct i2c_msg *cur;
+	unsigned long flags;
+
+	spin_lock_irqsave(&gi2c->lock, flags);
+	m_stat = readl_relaxed(gi2c->se.base + SE_GENI_M_IRQ_STATUS);
+	rx_st = readl_relaxed(gi2c->se.base + SE_GENI_RX_FIFO_STATUS);
+	dm_tx_st = readl_relaxed(gi2c->se.base + SE_DMA_TX_IRQ_STAT);
+	dm_rx_st = readl_relaxed(gi2c->se.base + SE_DMA_RX_IRQ_STAT);
+	dma = readl_relaxed(gi2c->se.base + SE_GENI_DMA_MODE_EN);
+	cur = gi2c->cur;
+
+	if (!cur ||
+	    m_stat & (M_CMD_FAILURE_EN | M_CMD_ABORT_EN) ||
+	    dm_rx_st & (DM_I2C_CB_ERR)) {
+		if (m_stat & M_GP_IRQ_1_EN)
+			geni_i2c_err(gi2c, NACK);
+		if (m_stat & M_GP_IRQ_3_EN)
+			geni_i2c_err(gi2c, BUS_PROTO);
+		if (m_stat & M_GP_IRQ_4_EN)
+			geni_i2c_err(gi2c, ARB_LOST);
+		if (m_stat & M_CMD_OVERRUN_EN)
+			geni_i2c_err(gi2c, GENI_OVERRUN);
+		if (m_stat & M_ILLEGAL_CMD_EN)
+			geni_i2c_err(gi2c, GENI_ILLEGAL_CMD);
+		if (m_stat & M_CMD_ABORT_EN)
+			geni_i2c_err(gi2c, GENI_ABORT_DONE);
+		if (m_stat & M_GP_IRQ_0_EN)
+			geni_i2c_err(gi2c, GP_IRQ0);
+
+		/* Disable the TX Watermark interrupt to stop TX */
+		if (!dma)
+			writel_relaxed(0, gi2c->se.base +
+					   SE_GENI_TX_WATERMARK_REG);
+		goto irqret;
+	}
+
+	if (dma) {
+		dev_dbg(gi2c->se.dev, "i2c dma tx:0x%x, dma rx:0x%x\n",
+			dm_tx_st, dm_rx_st);
+		goto irqret;
+	}
+
+	if (cur->flags & I2C_M_RD &&
+	    m_stat & (M_RX_FIFO_WATERMARK_EN | M_RX_FIFO_LAST_EN)) {
+		u32 rxcnt = rx_st & RX_FIFO_WC_MSK;
+
+		for (j = 0; j < rxcnt; j++) {
+			u32 val;
+			int p = 0;
+
+			val = readl_relaxed(gi2c->se.base + SE_GENI_RX_FIFOn);
+			while (gi2c->cur_rd < cur->len && p < sizeof(val)) {
+				cur->buf[gi2c->cur_rd++] = val & 0xff;
+				val >>= 8;
+				p++;
+			}
+			if (gi2c->cur_rd == cur->len)
+				break;
+		}
+	} else if (!(cur->flags & I2C_M_RD) &&
+		   m_stat & M_TX_FIFO_WATERMARK_EN) {
+		for (j = 0; j < gi2c->tx_wm; j++) {
+			u32 temp;
+			u32 val = 0;
+			int p = 0;
+
+			while (gi2c->cur_wr < cur->len && p < sizeof(val)) {
+				temp = cur->buf[gi2c->cur_wr++];
+				val |= temp << (p * 8);
+				p++;
+			}
+			writel_relaxed(val, gi2c->se.base + SE_GENI_TX_FIFOn);
+			/* TX Complete, Disable the TX Watermark interrupt */
+			if (gi2c->cur_wr == cur->len) {
+				writel_relaxed(0, gi2c->se.base +
+						SE_GENI_TX_WATERMARK_REG);
+				break;
+			}
+		}
+	}
+irqret:
+	if (m_stat)
+		writel_relaxed(m_stat, gi2c->se.base + SE_GENI_M_IRQ_CLEAR);
+
+	if (dma) {
+		if (dm_tx_st)
+			writel_relaxed(dm_tx_st, gi2c->se.base +
+						SE_DMA_TX_IRQ_CLR);
+		if (dm_rx_st)
+			writel_relaxed(dm_rx_st, gi2c->se.base +
+						SE_DMA_RX_IRQ_CLR);
+	}
+	/* if this is err with done-bit not set, handle that through timeout. */
+	if (m_stat & M_CMD_DONE_EN || m_stat & M_CMD_ABORT_EN)
+		complete(&gi2c->done);
+	else if (dm_tx_st & TX_DMA_DONE || dm_tx_st & TX_RESET_DONE)
+		complete(&gi2c->done);
+	else if (dm_rx_st & RX_DMA_DONE || dm_rx_st & RX_RESET_DONE)
+		complete(&gi2c->done);
+
+	spin_unlock_irqrestore(&gi2c->lock, flags);
+	return IRQ_HANDLED;
+}
+
+static void geni_i2c_abort_xfer(struct geni_i2c_dev *gi2c)
+{
+	u32 val;
+	unsigned long time_left = ABORT_TIMEOUT;
+	unsigned long flags;
+
+	spin_lock_irqsave(&gi2c->lock, flags);
+	geni_i2c_err(gi2c, GENI_TIMEOUT);
+	gi2c->cur = NULL;
+	geni_se_abort_m_cmd(&gi2c->se);
+	spin_unlock_irqrestore(&gi2c->lock, flags);
+	do {
+		time_left = wait_for_completion_timeout(&gi2c->done, time_left);
+		val = readl_relaxed(gi2c->se.base + SE_GENI_M_IRQ_STATUS);
+	} while (!(val & M_CMD_ABORT_EN) && time_left);
+
+	if (!(val & M_CMD_ABORT_EN))
+		dev_err(gi2c->se.dev, "Timeout abort_m_cmd\n");
+}
+
+static void geni_i2c_rx_fsm_rst(struct geni_i2c_dev *gi2c)
+{
+	u32 val;
+	unsigned long time_left = RST_TIMEOUT;
+
+	writel_relaxed(1, gi2c->se.base + SE_DMA_RX_FSM_RST);
+	do {
+		time_left = wait_for_completion_timeout(&gi2c->done, time_left);
+		val = readl_relaxed(gi2c->se.base + SE_DMA_RX_IRQ_STAT);
+	} while (!(val & RX_RESET_DONE) && time_left);
+
+	if (!(val & RX_RESET_DONE))
+		dev_err(gi2c->se.dev, "Timeout resetting RX_FSM\n");
+}
+
+static void geni_i2c_tx_fsm_rst(struct geni_i2c_dev *gi2c)
+{
+	u32 val;
+	unsigned long time_left = RST_TIMEOUT;
+
+	writel_relaxed(1, gi2c->se.base + SE_DMA_TX_FSM_RST);
+	do {
+		time_left = wait_for_completion_timeout(&gi2c->done, time_left);
+		val = readl_relaxed(gi2c->se.base + SE_DMA_TX_IRQ_STAT);
+	} while (!(val & TX_RESET_DONE) && time_left);
+
+	if (!(val & TX_RESET_DONE))
+		dev_err(gi2c->se.dev, "Timeout resetting TX_FSM\n");
+}
+
+static int geni_i2c_rx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
+				u32 m_param)
+{
+	dma_addr_t rx_dma;
+	enum geni_se_xfer_mode mode;
+	unsigned long time_left = XFER_TIMEOUT;
+
+	gi2c->cur = msg;
+	mode = msg->len > 32 ? GENI_SE_DMA : GENI_SE_FIFO;
+	geni_se_select_mode(&gi2c->se, mode);
+	writel_relaxed(msg->len, gi2c->se.base + SE_I2C_RX_TRANS_LEN);
+	geni_se_setup_m_cmd(&gi2c->se, I2C_READ, m_param);
+	if (mode == GENI_SE_DMA) {
+		int ret;
+
+		ret = geni_se_rx_dma_prep(&gi2c->se, msg->buf, msg->len,
+								&rx_dma);
+		if (ret) {
+			mode = GENI_SE_FIFO;
+			geni_se_select_mode(&gi2c->se, mode);
+		}
+	}
+
+	time_left = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT);
+	if (!time_left)
+		geni_i2c_abort_xfer(gi2c);
+
+	gi2c->cur_rd = 0;
+	if (mode == GENI_SE_DMA) {
+		if (gi2c->err)
+			geni_i2c_rx_fsm_rst(gi2c);
+		geni_se_rx_dma_unprep(&gi2c->se, rx_dma, msg->len);
+	}
+	return gi2c->err;
+}
+
+static int geni_i2c_tx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
+				u32 m_param)
+{
+	dma_addr_t tx_dma;
+	enum geni_se_xfer_mode mode;
+	unsigned long time_left;
+
+	gi2c->cur = msg;
+	mode = msg->len > 32 ? GENI_SE_DMA : GENI_SE_FIFO;
+	geni_se_select_mode(&gi2c->se, mode);
+	writel_relaxed(msg->len, gi2c->se.base + SE_I2C_TX_TRANS_LEN);
+	geni_se_setup_m_cmd(&gi2c->se, I2C_WRITE, m_param);
+	if (mode == GENI_SE_DMA) {
+		int ret;
+
+		ret = geni_se_tx_dma_prep(&gi2c->se, msg->buf, msg->len,
+								&tx_dma);
+		if (ret) {
+			mode = GENI_SE_FIFO;
+			geni_se_select_mode(&gi2c->se, mode);
+		}
+	}
+
+	if (mode == GENI_SE_FIFO) /* Get FIFO IRQ */
+		writel_relaxed(1, gi2c->se.base + SE_GENI_TX_WATERMARK_REG);
+
+	time_left = wait_for_completion_timeout(&gi2c->done, XFER_TIMEOUT);
+	if (!time_left)
+		geni_i2c_abort_xfer(gi2c);
+
+	gi2c->cur_wr = 0;
+	if (mode == GENI_SE_DMA) {
+		if (gi2c->err)
+			geni_i2c_tx_fsm_rst(gi2c);
+		geni_se_tx_dma_unprep(&gi2c->se, tx_dma, msg->len);
+	}
+	return gi2c->err;
+}
+
+static int geni_i2c_xfer(struct i2c_adapter *adap,
+			 struct i2c_msg msgs[],
+			 int num)
+{
+	struct geni_i2c_dev *gi2c = i2c_get_adapdata(adap);
+	int i, ret;
+
+	gi2c->err = 0;
+	reinit_completion(&gi2c->done);
+	ret = pm_runtime_get_sync(gi2c->se.dev);
+	if (ret < 0) {
+		dev_err(gi2c->se.dev, "error turning on SE resources: %d\n", ret);
+		pm_runtime_put_noidle(gi2c->se.dev);
+		/* Mark the device as suspended since resume failed */
+		pm_runtime_set_suspended(gi2c->se.dev);
+		return ret;
+	}
+
+	qcom_geni_i2c_conf(gi2c);
+	for (i = 0; i < num; i++) {
+		u32 m_param = i < (num - 1) ? STOP_STRETCH : 0;
+
+		m_param |= ((msgs[i].addr << SLV_ADDR_SHFT) & SLV_ADDR_MSK);
+
+		if (msgs[i].flags & I2C_M_RD)
+			ret = geni_i2c_rx_one_msg(gi2c, &msgs[i], m_param);
+		else
+			ret = geni_i2c_tx_one_msg(gi2c, &msgs[i], m_param);
+
+		if (ret)
+			break;
+	}
+	if (ret == 0)
+		ret = num;
+
+	pm_runtime_mark_last_busy(gi2c->se.dev);
+	pm_runtime_put_autosuspend(gi2c->se.dev);
+	gi2c->cur = NULL;
+	gi2c->err = 0;
+	return ret;
+}
+
+static u32 geni_i2c_func(struct i2c_adapter *adap)
+{
+	return I2C_FUNC_I2C | (I2C_FUNC_SMBUS_EMUL & ~I2C_FUNC_SMBUS_QUICK);
+}
+
+static const struct i2c_algorithm geni_i2c_algo = {
+	.master_xfer	= geni_i2c_xfer,
+	.functionality	= geni_i2c_func,
+};
+
+static int geni_i2c_probe(struct platform_device *pdev)
+{
+	struct geni_i2c_dev *gi2c;
+	struct resource *res;
+	u32 proto, tx_depth;
+	int ret;
+
+	gi2c = devm_kzalloc(&pdev->dev, sizeof(*gi2c), GFP_KERNEL);
+	if (!gi2c)
+		return -ENOMEM;
+
+	gi2c->se.dev = &pdev->dev;
+	gi2c->se.wrapper = dev_get_drvdata(pdev->dev.parent);
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	gi2c->se.base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(gi2c->se.base))
+		return PTR_ERR(gi2c->se.base);
+
+	gi2c->se.clk = devm_clk_get(&pdev->dev, "se");
+	if (IS_ERR(gi2c->se.clk)) {
+		ret = PTR_ERR(gi2c->se.clk);
+		dev_err(&pdev->dev, "Err getting SE Core clk %d\n", ret);
+		return ret;
+	}
+
+	ret = device_property_read_u32(&pdev->dev, "clock-frequency",
+							&gi2c->clk_freq_out);
+	if (ret) {
+		dev_info(&pdev->dev,
+			"Bus frequency not specified, defaulting to 100kHz.\n");
+		gi2c->clk_freq_out = KHZ(100);
+	}
+
+	gi2c->irq = platform_get_irq(pdev, 0);
+	if (gi2c->irq < 0) {
+		dev_err(&pdev->dev, "IRQ error for i2c-geni\n");
+		return gi2c->irq;
+	}
+
+	ret = geni_i2c_clk_map_idx(gi2c);
+	if (ret) {
+		dev_err(&pdev->dev, "Invalid clk frequency %d Hz: %d\n",
+			gi2c->clk_freq_out, ret);
+		return ret;
+	}
+
+	gi2c->adap.algo = &geni_i2c_algo;
+	init_completion(&gi2c->done);
+	spin_lock_init(&gi2c->lock);
+	platform_set_drvdata(pdev, gi2c);
+	ret = devm_request_irq(&pdev->dev, gi2c->irq, geni_i2c_irq,
+			       IRQF_TRIGGER_HIGH, "i2c_geni", gi2c);
+	if (ret) {
+		dev_err(&pdev->dev, "Request_irq failed:%d: err:%d\n",
+			gi2c->irq, ret);
+		return ret;
+	}
+	/* Disable the interrupt so that the system can enter low-power mode */
+	disable_irq(gi2c->irq);
+	i2c_set_adapdata(&gi2c->adap, gi2c);
+	gi2c->adap.dev.parent = &pdev->dev;
+	gi2c->adap.dev.of_node = pdev->dev.of_node;
+	strlcpy(gi2c->adap.name, "Geni-I2C", sizeof(gi2c->adap.name));
+
+	ret = geni_se_resources_on(&gi2c->se);
+	if (ret) {
+		dev_err(&pdev->dev, "Error turning on resources %d\n", ret);
+		return ret;
+	}
+	proto = geni_se_read_proto(&gi2c->se);
+	tx_depth = geni_se_get_tx_fifo_depth(&gi2c->se);
+	if (proto != GENI_SE_I2C) {
+		dev_err(&pdev->dev, "Invalid proto %d\n", proto);
+		geni_se_resources_off(&gi2c->se);
+		return -ENXIO;
+	}
+	gi2c->tx_wm = tx_depth - 1;
+	geni_se_init(&gi2c->se, gi2c->tx_wm, tx_depth);
+	geni_se_config_packing(&gi2c->se, BITS_PER_BYTE, PACKING_BYTES_PW,
+							true, true, true);
+	geni_se_resources_off(&gi2c->se);
+	dev_dbg(&pdev->dev, "i2c fifo/se-dma mode. fifo depth:%d\n", tx_depth);
+
+	pm_runtime_set_suspended(gi2c->se.dev);
+	pm_runtime_set_autosuspend_delay(gi2c->se.dev, I2C_AUTO_SUSPEND_DELAY);
+	pm_runtime_use_autosuspend(gi2c->se.dev);
+	pm_runtime_enable(gi2c->se.dev);
+	i2c_add_adapter(&gi2c->adap);
+
+	return 0;
+}
+
+static int geni_i2c_remove(struct platform_device *pdev)
+{
+	struct geni_i2c_dev *gi2c = platform_get_drvdata(pdev);
+
+	i2c_del_adapter(&gi2c->adap);
+	pm_runtime_disable(gi2c->se.dev);
+	return 0;
+}
+
+static int __maybe_unused geni_i2c_runtime_suspend(struct device *dev)
+{
+	struct geni_i2c_dev *gi2c = dev_get_drvdata(dev);
+
+	disable_irq(gi2c->irq);
+	geni_se_resources_off(&gi2c->se);
+	return 0;
+}
+
+static int __maybe_unused geni_i2c_runtime_resume(struct device *dev)
+{
+	int ret;
+	struct geni_i2c_dev *gi2c = dev_get_drvdata(dev);
+
+	ret = geni_se_resources_on(&gi2c->se);
+	if (ret)
+		return ret;
+
+	enable_irq(gi2c->irq);
+	return 0;
+}
+
+static int __maybe_unused geni_i2c_suspend_noirq(struct device *dev)
+{
+	if (!pm_runtime_suspended(dev)) {
+		geni_i2c_runtime_suspend(dev);
+		pm_runtime_disable(dev);
+		pm_runtime_set_suspended(dev);
+		pm_runtime_enable(dev);
+	}
+	return 0;
+}
+
+static const struct dev_pm_ops geni_i2c_pm_ops = {
+	SET_NOIRQ_SYSTEM_SLEEP_PM_OPS(geni_i2c_suspend_noirq, NULL)
+	SET_RUNTIME_PM_OPS(geni_i2c_runtime_suspend, geni_i2c_runtime_resume,
+									NULL)
+};
+
+static const struct of_device_id geni_i2c_dt_match[] = {
+	{ .compatible = "qcom,geni-i2c" },
+	{}
+};
+MODULE_DEVICE_TABLE(of, geni_i2c_dt_match);
+
+static struct platform_driver geni_i2c_driver = {
+	.probe  = geni_i2c_probe,
+	.remove = geni_i2c_remove,
+	.driver = {
+		.name = "geni_i2c",
+		.pm = &geni_i2c_pm_ops,
+		.of_match_table = geni_i2c_dt_match,
+	},
+};
+
+module_platform_driver(geni_i2c_driver);
+
+MODULE_DESCRIPTION("I2C Controller Driver for GENI based QUP cores");
+MODULE_LICENSE("GPL v2");
-- 
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
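The per-message decisions in geni_i2c_xfer() and geni_i2c_{rx,tx}_one_msg() above can be approximated outside the kernel. What follows is a minimal user-space sketch, not driver code. All model_* names are hypothetical. The STOP_STRETCH, SLV_ADDR_SHFT and SLV_ADDR_MSK values are assumptions, because their definitions are not part of this excerpt.

The sketch models the intended rules:
- Every message except the last is sent with STOP_STRETCH.
- The slave address is shifted into m_param.
- Messages longer than 32 bytes use DMA, unless DMA preparation returns an error, in which case the transfer falls back to FIFO mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed values: the real STOP_STRETCH, SLV_ADDR_SHFT and SLV_ADDR_MSK
 * are defined earlier in the driver and may differ from these copies.
 */
#define MODEL_STOP_STRETCH	(1u << 2)
#define MODEL_SLV_ADDR_SHFT	9
#define MODEL_SLV_ADDR_MSK	(0x7fu << MODEL_SLV_ADDR_SHFT)
#define MODEL_DMA_MIN_LEN	33	/* "msg->len > 32" selects DMA */

enum model_mode { MODEL_FIFO, MODEL_DMA };

/*
 * m_param for message i of num: all but the last message stretch the
 * clock instead of ending with a STOP; the address goes in the
 * (assumed) SLV_ADDR field.
 */
static uint32_t model_m_param(int i, int num, uint16_t addr)
{
	uint32_t m_param;

	assert(num > 0 && i >= 0 && i < num);
	m_param = i < (num - 1) ? MODEL_STOP_STRETCH : 0;
	m_param |= ((uint32_t)addr << MODEL_SLV_ADDR_SHFT) & MODEL_SLV_ADDR_MSK;
	return m_param;
}

/*
 * Mode actually used for a message: DMA for long messages, FIFO when
 * the DMA prep step reports an error (dma_prep_ret != 0).
 */
static enum model_mode model_pick_mode(size_t len, int dma_prep_ret)
{
	if (len < MODEL_DMA_MIN_LEN)
		return MODEL_FIFO;
	return dma_prep_ret ? MODEL_FIFO : MODEL_DMA;
}
```

Consider a typical register read: a one-byte write followed by a read, both to address 0x50. Only the first message carries STOP_STRETCH, so presumably no STOP is issued between the two messages.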


* [PATCH v6 5/5] arm64: dts: sdm845: Add support for an instance of I2C controller
From: Karthikeyan Ramasubramanian @ 2018-03-30 17:08 UTC (permalink / raw)
  To: corbet, andy.gross, david.brown, robh+dt, mark.rutland, wsa
  Cc: Karthikeyan Ramasubramanian, linux-doc, linux-arm-msm, devicetree,
	linux-i2c, evgreen, acourbot, swboyd, dianders, bjorn.andersson
In-Reply-To: <1522429700-13083-1-git-send-email-kramasub@codeaurora.org>

Add one instance of the GENI-based I2C master controller to enable
testing the I2C driver with an EEPROM slave.

Signed-off-by: Karthikeyan Ramasubramanian <kramasub@codeaurora.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
---
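The "default" and "sleep" pinctrl states wired up here are the ones that geni_se_resources_on() and geni_se_resources_off() in patch 2/5 select. They do so through pinctrl_pm_select_default_state() and pinctrl_pm_select_sleep_state().

As a rough illustration of the ordering those helpers use, here is a user-space sketch. The clock and pinctrl calls are replaced by hypothetical logging stubs, and all model_* names are invented for this example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical event log standing in for the real clk/pinctrl calls. */
static char model_log[128];
static int model_clk_fail, model_pin_fail;	/* failure injection */

static void model_record(const char *ev)
{
	size_t used = strlen(model_log);

	/* Bounded appends so the log cannot overflow. */
	strncat(model_log, ev, sizeof(model_log) - used - 1);
	used = strlen(model_log);
	strncat(model_log, ";", sizeof(model_log) - used - 1);
}

static int model_clks_on(void)
{
	if (model_clk_fail)
		return -1;
	model_record("clks_on");
	return 0;
}

static void model_clks_off(void)
{
	model_record("clks_off");
}

/* Stand-in for pinctrl_pm_select_{default,sleep}_state(). */
static int model_select_state(const char *state)
{
	assert(state != NULL);
	if (model_pin_fail)
		return -1;
	model_record(state);
	return 0;
}

/* Same order as geni_se_resources_on(): clocks, then "default" pins. */
static int model_resources_on(void)
{
	int ret = model_clks_on();

	if (ret)
		return ret;
	ret = model_select_state("default");
	if (ret)
		model_clks_off();	/* undo the clocks on failure */
	return ret;
}

/* Same order as geni_se_resources_off(): "sleep" pins, then clocks. */
static int model_resources_off(void)
{
	int ret = model_select_state("sleep");

	if (ret)
		return ret;
	model_clks_off();
	return 0;
}
```

A successful on/off cycle logs "clks_on;default;sleep;clks_off;". If selecting the default state fails, the clocks that were just enabled are turned off again.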
 arch/arm64/boot/dts/qcom/sdm845-mtp.dts | 19 +++++++++++++++++++
 arch/arm64/boot/dts/qcom/sdm845.dtsi    | 28 ++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
index 17b2fb0..dbe3a36 100644
--- a/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
+++ b/arch/arm64/boot/dts/qcom/sdm845-mtp.dts
@@ -29,9 +29,28 @@
 		serial@a84000 {
 			status = "okay";
 		};
+
+		i2c@a88000 {
+			clock-frequency = <400000>;
+			status = "okay";
+		};
 	};
 
 	pinctrl@3400000 {
+		qup-i2c10-default {
+			pinconf {
+				pins = "gpio55", "gpio56";
+				drive-strength = <2>;
+				bias-disable;
+			};
+		};
+
+		qup-i2c10-sleep {
+			pinconf {
+				pins = "gpio55", "gpio56";
+			};
+		};
+
 		qup-uart2-default {
 			pinconf_tx {
 				pins = "gpio4";
diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 71801b9..d367020 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -196,6 +196,20 @@
 			interrupt-controller;
 			#interrupt-cells = <2>;
 
+			qup_i2c10_default: qup-i2c10-default {
+				pinmux {
+					function = "qup10";
+					pins = "gpio55", "gpio56";
+				};
+			};
+
+			qup_i2c10_sleep: qup-i2c10-sleep {
+				pinmux {
+					function = "gpio";
+					pins = "gpio55", "gpio56";
+				};
+			};
+
 			qup_uart2_default: qup-uart2-default {
 				pinmux {
 					function = "qup9";
@@ -310,6 +324,20 @@
 				interrupts = <GIC_SPI 354 IRQ_TYPE_LEVEL_HIGH>;
 				status = "disabled";
 			};
+
+			i2c10: i2c@a88000 {
+				compatible = "qcom,geni-i2c";
+				reg = <0xa88000 0x4000>;
+				clock-names = "se";
+				clocks = <&gcc GCC_QUPV3_WRAP1_S2_CLK>;
+				pinctrl-names = "default", "sleep";
+				pinctrl-0 = <&qup_i2c10_default>;
+				pinctrl-1 = <&qup_i2c10_sleep>;
+				interrupts = <GIC_SPI 355 IRQ_TYPE_LEVEL_HIGH>;
+				#address-cells = <1>;
+				#size-cells = <0>;
+				status = "disabled";
+			};
 		};
 	};
 };

* [PATCH v6 2/5] soc: qcom: Add GENI based QUP Wrapper driver
From: Karthikeyan Ramasubramanian @ 2018-03-30 17:08 UTC (permalink / raw)
  To: corbet, andy.gross, david.brown, robh+dt, mark.rutland, wsa
  Cc: Karthikeyan Ramasubramanian, linux-doc, linux-arm-msm, devicetree,
	linux-i2c, evgreen, acourbot, swboyd, dianders, bjorn.andersson,
	Sagar Dharia, Girish Mahadevan
In-Reply-To: <1522429700-13083-1-git-send-email-kramasub@codeaurora.org>

This driver manages the Generic Interface (GENI) firmware-based Qualcomm
Universal Peripheral (QUP) Wrapper. The GENI-based QUP is a next-generation
programmable module composed of multiple Serial Engines (SE) that supports
a wide range of serial interfaces like UART, SPI, I2C, I3C, etc. This
driver also manages the aspects of the Serial Engines that are
independent of the serial interface.

Signed-off-by: Karthikeyan Ramasubramanian <kramasub@codeaurora.org>
Signed-off-by: Sagar Dharia <sdharia@codeaurora.org>
Signed-off-by: Girish Mahadevan <girishm@codeaurora.org>
---
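The packing tables documented in qcom-geni-se.c below can be cross-checked in user space. The following sketch copies the vector arithmetic of geni_se_config_packing() into plain C, with a few assumptions:
- model_vec() and model_packing() are hypothetical helpers.
- MMIO writes are replaced by output parameters.
- The SE_GENI_BYTE_GRAN update is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants copied from the patch; BITS_PER_BYTE normally comes from the kernel. */
#define BITS_PER_BYTE		8
#define NUM_PACKING_VECTORS	4
#define PACKING_START_SHIFT	5
#define PACKING_DIR_SHIFT	4
#define PACKING_LEN_SHIFT	1
#define PACKING_STOP_BIT	1u
#define PACKING_VECTOR_SHIFT	10

/* Encode one 10-bit vector from the fields shown in the DOC tables. */
static uint32_t model_vec(uint32_t start, uint32_t dir, uint32_t len,
			  uint32_t stop)
{
	return (start << PACKING_START_SHIFT) | (dir << PACKING_DIR_SHIFT) |
	       (len << PACKING_LEN_SHIFT) | (stop ? PACKING_STOP_BIT : 0);
}

/*
 * Hypothetical user-space copy of the loop in geni_se_config_packing():
 * fills cfg[] and the CFG0/CFG1 words instead of writing registers.
 * Returns -1 for combinations the driver silently ignores.
 */
static int model_packing(int bpw, int pack_words, bool msb_to_lsb,
			 uint32_t cfg[NUM_PACKING_VECTORS],
			 uint32_t *cfg0, uint32_t *cfg1)
{
	int idx_start = msb_to_lsb ? bpw - 1 : 0;
	int idx = idx_start;
	int idx_delta = msb_to_lsb ? -BITS_PER_BYTE : BITS_PER_BYTE;
	/* ALIGN(bpw, BITS_PER_BYTE) */
	int ceil_bpw = (bpw + BITS_PER_BYTE - 1) / BITS_PER_BYTE * BITS_PER_BYTE;
	int iter = (ceil_bpw * pack_words) / BITS_PER_BYTE;
	int temp_bpw = bpw;
	int i;

	assert(cfg && cfg0 && cfg1);
	if (iter <= 0 || iter > NUM_PACKING_VECTORS)
		return -1;

	for (i = 0; i < NUM_PACKING_VECTORS; i++)
		cfg[i] = 0;

	for (i = 0; i < iter; i++) {
		int len = (temp_bpw < BITS_PER_BYTE ? temp_bpw : BITS_PER_BYTE) - 1;

		cfg[i] = model_vec(idx, msb_to_lsb, len, 0);
		if (temp_bpw <= BITS_PER_BYTE) {
			/* Word done: restart at the next unused byte. */
			idx = ((i + 1) * BITS_PER_BYTE) + idx_start;
			temp_bpw = bpw;
		} else {
			/* Same word continues one byte further on. */
			idx += idx_delta;
			temp_bpw -= BITS_PER_BYTE;
		}
	}
	cfg[iter - 1] |= PACKING_STOP_BIT;
	*cfg0 = cfg[0] | (cfg[1] << PACKING_VECTOR_SHIFT);
	*cfg1 = cfg[2] | (cfg[3] << PACKING_VECTOR_SHIFT);
	return 0;
}
```

Calling model_packing(7, 4, true, ...) reproduces the vectors of Example 1: starts 0x6, 0xe, 0x16 and 0x1e, length 6, and the stop bit only on vec_3. The 15-bit and 23-bit cases reproduce Examples 2 and 3.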
 drivers/soc/qcom/Kconfig        |   9 +
 drivers/soc/qcom/Makefile       |   1 +
 drivers/soc/qcom/qcom-geni-se.c | 748 ++++++++++++++++++++++++++++++++++++++++
 include/linux/qcom-geni-se.h    | 425 +++++++++++++++++++++++
 4 files changed, 1183 insertions(+)
 create mode 100644 drivers/soc/qcom/qcom-geni-se.c
 create mode 100644 include/linux/qcom-geni-se.h
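geni_se_clk_tbl_get() builds the table of supported SE core clock rates by repeatedly calling clk_round_rate(). geni_se_clk_freq_match() then searches that table. The sketch below mirrors both loops in user space, under these assumptions:
- The rate list is invented.
- model_round_rate() is a hypothetical stand-in that clamps to the highest rate.
- The duplicate check only looks at the previous entry once one exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CLK_PERF_LEVEL 32

/* Invented SE core clock rates in Hz; real rates come from the clock driver. */
static const unsigned long model_rates[] = {
	19200000, 32000000, 48000000, 64000000, 96000000, 100000000,
};

/* Hypothetical clk_round_rate() stand-in: smallest rate >= req, clamped to max. */
static unsigned long model_round_rate(unsigned long req)
{
	size_t n = sizeof(model_rates) / sizeof(model_rates[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (model_rates[i] >= req)
			return model_rates[i];
	return model_rates[n - 1];
}

/* Walk the rates like geni_se_clk_tbl_get(), stopping on 0 or on a repeat. */
static int model_clk_tbl_get(unsigned long tbl[MODEL_MAX_CLK_PERF_LEVEL])
{
	unsigned long freq = 0;
	int i;

	for (i = 0; i < MODEL_MAX_CLK_PERF_LEVEL; i++) {
		freq = model_round_rate(freq + 1);
		/* Compare with the previous entry only once one exists. */
		if (!freq || (i > 0 && freq == tbl[i - 1]))
			break;
		tbl[i] = freq;
	}
	return i;
}

/*
 * Same search as geni_se_clk_freq_match(): the first exact multiple wins;
 * otherwise keep the largest rate below req (or the first rate if none is
 * below). Returns -1 on error or when @exact is set and nothing matched.
 */
static int model_clk_freq_match(const unsigned long *tbl, int num,
				unsigned long req, unsigned int *index,
				unsigned long *res, bool exact)
{
	int i;

	assert(req != 0);
	if (num <= 0)
		return -1;

	*res = 0;
	for (i = 0; i < num; i++) {
		if (!(tbl[i] % req)) {
			*index = i;
			*res = tbl[i];
			return 0;
		}
		if (!*res || (tbl[i] > *res && tbl[i] < req)) {
			*index = i;
			*res = tbl[i];
		}
	}
	return exact ? -1 : 0;
}
```

With these rates, a 7 MHz request shows a quirk shared with the original loop. When every level is above the request, the lowest level is returned rather than a true floor.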

diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index e050eb8..98ca9f5 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -3,6 +3,15 @@
 #
 menu "Qualcomm SoC drivers"
 
+config QCOM_GENI_SE
+	tristate "QCOM GENI Serial Engine Driver"
+	depends on ARCH_QCOM || COMPILE_TEST
+	help
+	  This driver is used to manage the Generic Interface (GENI) firmware
+	  based Qualcomm Technologies, Inc. Universal Peripheral (QUP) Wrapper.
+	  It is also used to manage the common aspects of the multiple Serial
+	  Engines present in the QUP.
+
 config QCOM_GLINK_SSR
 	tristate "Qualcomm Glink SSR driver"
 	depends on RPMSG
diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
index dcebf28..959aa74 100644
--- a/drivers/soc/qcom/Makefile
+++ b/drivers/soc/qcom/Makefile
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_QCOM_GENI_SE) +=	qcom-geni-se.o
 obj-$(CONFIG_QCOM_GLINK_SSR) +=	glink_ssr.o
 obj-$(CONFIG_QCOM_GSBI)	+=	qcom_gsbi.o
 obj-$(CONFIG_QCOM_MDT_LOADER)	+= mdt_loader.o
diff --git a/drivers/soc/qcom/qcom-geni-se.c b/drivers/soc/qcom/qcom-geni-se.c
new file mode 100644
index 0000000..feed3db2
--- /dev/null
+++ b/drivers/soc/qcom/qcom-geni-se.c
@@ -0,0 +1,748 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2017-2018, The Linux Foundation. All rights reserved.
+
+#include <linux/clk.h>
+#include <linux/slab.h>
+#include <linux/dma-mapping.h>
+#include <linux/io.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/pinctrl/consumer.h>
+#include <linux/platform_device.h>
+#include <linux/qcom-geni-se.h>
+
+/**
+ * DOC: Overview
+ *
+ * The Generic Interface (GENI) Serial Engine (SE) Wrapper driver manages
+ * the GENI firmware-based Qualcomm Universal Peripheral (QUP) Wrapper
+ * controller. The QUP Wrapper is designed to support various serial bus
+ * protocols like UART, SPI, I2C, I3C, etc.
+ */
+
+/**
+ * DOC: Hardware description
+ *
+ * GENI based QUP is a highly flexible and programmable module for supporting
+ * a wide range of serial interfaces like UART, SPI, I2C, I3C, etc. A single
+ * QUP module can provide up to 8 serial interfaces, using its internal
+ * serial engines. The actual configuration is determined by the target
+ * platform configuration. The protocol supported by each interface is
+ * determined by the firmware loaded into the serial engine. Each SE consists
+ * of a DMA Engine and GENI sub modules which enable serial engines to
+ * support FIFO and DMA modes of operation.
+ *
+ *
+ *                      +-----------------------------------------+
+ *                      |QUP Wrapper                              |
+ *                      |         +----------------------------+  |
+ *   --QUP & SE Clocks-->         | Serial Engine N            |  +-IO------>
+ *                      |         | ...                        |  | Interface
+ *   <---Clock Perf.----+    +----+-----------------------+    |  |
+ *     State Interface  |    | Serial Engine 1            |    |  |
+ *                      |    |                            |    |  |
+ *                      |    |                            |    |  |
+ *   <--------AHB------->    |                            |    |  |
+ *                      |    |                            +----+  |
+ *                      |    |                            |       |
+ *                      |    |                            |       |
+ *   <------SE IRQ------+    +----------------------------+       |
+ *                      |                                         |
+ *                      +-----------------------------------------+
+ *
+ *                         Figure 1: GENI based QUP Wrapper
+ *
+ * The GENI submodules include primary and secondary sequencers which are
+ * used to drive TX & RX operations. On serial interfaces that operate using
+ * the master-slave model, the primary sequencer drives both TX & RX. On
+ * serial interfaces that operate using the peer-to-peer model, the primary
+ * sequencer drives TX and the secondary sequencer drives RX.
+ */
+
+/**
+ * DOC: Software description
+ *
+ * GENI SE Wrapper driver is structured into 2 parts:
+ *
+ * geni_wrapper represents the QUP Wrapper controller. This part of the
+ * driver manages QUP Wrapper information such as the hardware version and
+ * the clock performance table that are common to all internal serial engines.
+ *
+ * geni_se represents a serial engine. This part of the driver manages serial
+ * engine information such as clocks, the containing QUP Wrapper, etc. It
+ * also supports operations (e.g. initializing the concerned serial engine,
+ * selecting between FIFO and DMA modes of operation) that are common to
+ * all the serial engines and are independent of serial interfaces.
+ */
+
+#define MAX_CLK_PERF_LEVEL 32
+#define NUM_AHB_CLKS 2
+
+/**
+ * struct geni_wrapper - Data structure to represent the QUP Wrapper Core
+ * @dev:		Device pointer of the QUP wrapper core
+ * @base:		Base address of this instance of QUP wrapper core
+ * @ahb_clks:		Handle to the primary & secondary AHB clocks
+ */
+struct geni_wrapper {
+	struct device *dev;
+	void __iomem *base;
+	struct clk_bulk_data ahb_clks[NUM_AHB_CLKS];
+};
+
+#define QUP_HW_VER_REG			0x4
+
+/* Common SE registers */
+#define GENI_INIT_CFG_REVISION		0x0
+#define GENI_S_INIT_CFG_REVISION	0x4
+#define GENI_OUTPUT_CTRL		0x24
+#define GENI_CGC_CTRL			0x28
+#define GENI_CLK_CTRL_RO		0x60
+#define GENI_IF_DISABLE_RO		0x64
+#define GENI_FW_S_REVISION_RO		0x6c
+#define SE_GENI_BYTE_GRAN		0x254
+#define SE_GENI_TX_PACKING_CFG0		0x260
+#define SE_GENI_TX_PACKING_CFG1		0x264
+#define SE_GENI_RX_PACKING_CFG0		0x284
+#define SE_GENI_RX_PACKING_CFG1		0x288
+#define SE_GENI_M_GP_LENGTH		0x910
+#define SE_GENI_S_GP_LENGTH		0x914
+#define SE_DMA_TX_PTR_L			0xc30
+#define SE_DMA_TX_PTR_H			0xc34
+#define SE_DMA_TX_ATTR			0xc38
+#define SE_DMA_TX_LEN			0xc3c
+#define SE_DMA_TX_IRQ_EN		0xc48
+#define SE_DMA_TX_IRQ_EN_SET		0xc4c
+#define SE_DMA_TX_IRQ_EN_CLR		0xc50
+#define SE_DMA_TX_LEN_IN		0xc54
+#define SE_DMA_TX_MAX_BURST		0xc5c
+#define SE_DMA_RX_PTR_L			0xd30
+#define SE_DMA_RX_PTR_H			0xd34
+#define SE_DMA_RX_ATTR			0xd38
+#define SE_DMA_RX_LEN			0xd3c
+#define SE_DMA_RX_IRQ_EN		0xd48
+#define SE_DMA_RX_IRQ_EN_SET		0xd4c
+#define SE_DMA_RX_IRQ_EN_CLR		0xd50
+#define SE_DMA_RX_LEN_IN		0xd54
+#define SE_DMA_RX_MAX_BURST		0xd5c
+#define SE_DMA_RX_FLUSH			0xd60
+#define SE_GSI_EVENT_EN			0xe18
+#define SE_IRQ_EN			0xe1c
+#define SE_DMA_GENERAL_CFG		0xe30
+
+/* GENI_OUTPUT_CTRL fields */
+#define DEFAULT_IO_OUTPUT_CTRL_MSK	GENMASK(6, 0)
+
+/* GENI_CGC_CTRL fields */
+#define CFG_AHB_CLK_CGC_ON		BIT(0)
+#define CFG_AHB_WR_ACLK_CGC_ON		BIT(1)
+#define DATA_AHB_CLK_CGC_ON		BIT(2)
+#define SCLK_CGC_ON			BIT(3)
+#define TX_CLK_CGC_ON			BIT(4)
+#define RX_CLK_CGC_ON			BIT(5)
+#define EXT_CLK_CGC_ON			BIT(6)
+#define PROG_RAM_HCLK_OFF		BIT(8)
+#define PROG_RAM_SCLK_OFF		BIT(9)
+#define DEFAULT_CGC_EN			GENMASK(6, 0)
+
+/* SE_GSI_EVENT_EN fields */
+#define DMA_RX_EVENT_EN			BIT(0)
+#define DMA_TX_EVENT_EN			BIT(1)
+#define GENI_M_EVENT_EN			BIT(2)
+#define GENI_S_EVENT_EN			BIT(3)
+
+/* SE_IRQ_EN fields */
+#define DMA_RX_IRQ_EN			BIT(0)
+#define DMA_TX_IRQ_EN			BIT(1)
+#define GENI_M_IRQ_EN			BIT(2)
+#define GENI_S_IRQ_EN			BIT(3)
+
+/* SE_DMA_GENERAL_CFG */
+#define DMA_RX_CLK_CGC_ON		BIT(0)
+#define DMA_TX_CLK_CGC_ON		BIT(1)
+#define DMA_AHB_SLV_CFG_ON		BIT(2)
+#define AHB_SEC_SLV_CLK_CGC_ON		BIT(3)
+#define DUMMY_RX_NON_BUFFERABLE		BIT(4)
+#define RX_DMA_ZERO_PADDING_EN		BIT(5)
+#define RX_DMA_IRQ_DELAY_MSK		GENMASK(8, 6)
+#define RX_DMA_IRQ_DELAY_SHFT		6
+
+/**
+ * geni_se_get_qup_hw_version() - Read the QUP wrapper Hardware version
+ * @se:	Pointer to the corresponding serial engine.
+ *
+ * Return: Hardware Version of the wrapper.
+ */
+u32 geni_se_get_qup_hw_version(struct geni_se *se)
+{
+	struct geni_wrapper *wrapper = se->wrapper;
+
+	return readl_relaxed(wrapper->base + QUP_HW_VER_REG);
+}
+EXPORT_SYMBOL(geni_se_get_qup_hw_version);
+
+static void geni_se_io_set_mode(void __iomem *base)
+{
+	u32 val;
+
+	val = readl_relaxed(base + SE_IRQ_EN);
+	val |= GENI_M_IRQ_EN | GENI_S_IRQ_EN;
+	val |= DMA_TX_IRQ_EN | DMA_RX_IRQ_EN;
+	writel_relaxed(val, base + SE_IRQ_EN);
+
+	val = readl_relaxed(base + SE_GENI_DMA_MODE_EN);
+	val &= ~GENI_DMA_MODE_EN;
+	writel_relaxed(val, base + SE_GENI_DMA_MODE_EN);
+
+	writel_relaxed(0, base + SE_GSI_EVENT_EN);
+}
+
+static void geni_se_io_init(void __iomem *base)
+{
+	u32 val;
+
+	val = readl_relaxed(base + GENI_CGC_CTRL);
+	val |= DEFAULT_CGC_EN;
+	writel_relaxed(val, base + GENI_CGC_CTRL);
+
+	val = readl_relaxed(base + SE_DMA_GENERAL_CFG);
+	val |= AHB_SEC_SLV_CLK_CGC_ON | DMA_AHB_SLV_CFG_ON;
+	val |= DMA_TX_CLK_CGC_ON | DMA_RX_CLK_CGC_ON;
+	writel_relaxed(val, base + SE_DMA_GENERAL_CFG);
+
+	writel_relaxed(DEFAULT_IO_OUTPUT_CTRL_MSK, base + GENI_OUTPUT_CTRL);
+	writel_relaxed(FORCE_DEFAULT, base + GENI_FORCE_DEFAULT_REG);
+}
+
+/**
+ * geni_se_init() - Initialize the GENI serial engine
+ * @se:		Pointer to the concerned serial engine.
+ * @rx_wm:	Receive watermark, in units of FIFO words.
+ * @rx_rfr:	Ready-for-receive watermark, in units of FIFO words.
+ *
+ * This function is used to initialize the GENI serial engine and configure
+ * the receive and ready-for-receive watermarks.
+ */
+void geni_se_init(struct geni_se *se, u32 rx_wm, u32 rx_rfr)
+{
+	u32 val;
+
+	geni_se_io_init(se->base);
+	geni_se_io_set_mode(se->base);
+
+	writel_relaxed(rx_wm, se->base + SE_GENI_RX_WATERMARK_REG);
+	writel_relaxed(rx_rfr, se->base + SE_GENI_RX_RFR_WATERMARK_REG);
+
+	val = readl_relaxed(se->base + SE_GENI_M_IRQ_EN);
+	val |= M_COMMON_GENI_M_IRQ_EN;
+	writel_relaxed(val, se->base + SE_GENI_M_IRQ_EN);
+
+	val = readl_relaxed(se->base + SE_GENI_S_IRQ_EN);
+	val |= S_COMMON_GENI_S_IRQ_EN;
+	writel_relaxed(val, se->base + SE_GENI_S_IRQ_EN);
+}
+EXPORT_SYMBOL(geni_se_init);
+
+static void geni_se_select_fifo_mode(struct geni_se *se)
+{
+	u32 proto = geni_se_read_proto(se);
+	u32 val;
+
+	writel_relaxed(0, se->base + SE_GSI_EVENT_EN);
+	writel_relaxed(0xffffffff, se->base + SE_GENI_M_IRQ_CLEAR);
+	writel_relaxed(0xffffffff, se->base + SE_GENI_S_IRQ_CLEAR);
+	writel_relaxed(0xffffffff, se->base + SE_DMA_TX_IRQ_CLR);
+	writel_relaxed(0xffffffff, se->base + SE_DMA_RX_IRQ_CLR);
+	writel_relaxed(0xffffffff, se->base + SE_IRQ_EN);
+
+	val = readl_relaxed(se->base + SE_GENI_M_IRQ_EN);
+	if (proto != GENI_SE_UART) {
+		val |= M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN;
+		val |= M_RX_FIFO_WATERMARK_EN | M_RX_FIFO_LAST_EN;
+	}
+	writel_relaxed(val, se->base + SE_GENI_M_IRQ_EN);
+
+	val = readl_relaxed(se->base + SE_GENI_S_IRQ_EN);
+	if (proto != GENI_SE_UART)
+		val |= S_CMD_DONE_EN;
+	writel_relaxed(val, se->base + SE_GENI_S_IRQ_EN);
+
+	val = readl_relaxed(se->base + SE_GENI_DMA_MODE_EN);
+	val &= ~GENI_DMA_MODE_EN;
+	writel_relaxed(val, se->base + SE_GENI_DMA_MODE_EN);
+}
+
+static void geni_se_select_dma_mode(struct geni_se *se)
+{
+	u32 val;
+
+	writel_relaxed(0, se->base + SE_GSI_EVENT_EN);
+	writel_relaxed(0xffffffff, se->base + SE_GENI_M_IRQ_CLEAR);
+	writel_relaxed(0xffffffff, se->base + SE_GENI_S_IRQ_CLEAR);
+	writel_relaxed(0xffffffff, se->base + SE_DMA_TX_IRQ_CLR);
+	writel_relaxed(0xffffffff, se->base + SE_DMA_RX_IRQ_CLR);
+	writel_relaxed(0xffffffff, se->base + SE_IRQ_EN);
+
+	val = readl_relaxed(se->base + SE_GENI_DMA_MODE_EN);
+	val |= GENI_DMA_MODE_EN;
+	writel_relaxed(val, se->base + SE_GENI_DMA_MODE_EN);
+}
+
+/**
+ * geni_se_select_mode() - Select the serial engine transfer mode
+ * @se:		Pointer to the concerned serial engine.
+ * @mode:	Transfer mode to be selected.
+ */
+void geni_se_select_mode(struct geni_se *se, enum geni_se_xfer_mode mode)
+{
+	WARN_ON(mode != GENI_SE_FIFO && mode != GENI_SE_DMA);
+
+	switch (mode) {
+	case GENI_SE_FIFO:
+		geni_se_select_fifo_mode(se);
+		break;
+	case GENI_SE_DMA:
+		geni_se_select_dma_mode(se);
+		break;
+	case GENI_SE_INVALID:
+	default:
+		break;
+	}
+}
+EXPORT_SYMBOL(geni_se_select_mode);
+
+/**
+ * DOC: Overview
+ *
+ * GENI FIFO packing is highly configurable. TX/RX packing/unpacking consists
+ * of up to 4 operations, each represented by one 10-bit configuration vector
+ * (start, direction, length, stop). The vectors are programmed in
+ * GENI_TX_PACKING_CFG0/1 for the TX FIFO and GENI_RX_PACKING_CFG0/1 for the
+ * RX FIFO. Refer to the examples below for a detailed bit-field description.
+ *
+ * Example 1: word_size = 7, packing_mode = 4 x 8, msb_to_lsb = 1
+ *
+ *        +-----------+-------+-------+-------+-------+
+ *        |           | vec_0 | vec_1 | vec_2 | vec_3 |
+ *        +-----------+-------+-------+-------+-------+
+ *        | start     | 0x6   | 0xe   | 0x16  | 0x1e  |
+ *        | direction | 1     | 1     | 1     | 1     |
+ *        | length    | 6     | 6     | 6     | 6     |
+ *        | stop      | 0     | 0     | 0     | 1     |
+ *        +-----------+-------+-------+-------+-------+
+ *
+ * Example 2: word_size = 15, packing_mode = 2 x 16, msb_to_lsb = 0
+ *
+ *        +-----------+-------+-------+-------+-------+
+ *        |           | vec_0 | vec_1 | vec_2 | vec_3 |
+ *        +-----------+-------+-------+-------+-------+
+ *        | start     | 0x0   | 0x8   | 0x10  | 0x18  |
+ *        | direction | 0     | 0     | 0     | 0     |
+ *        | length    | 7     | 6     | 7     | 6     |
+ *        | stop      | 0     | 0     | 0     | 1     |
+ *        +-----------+-------+-------+-------+-------+
+ *
+ * Example 3: word_size = 23, packing_mode = 1 x 32, msb_to_lsb = 1
+ *
+ *        +-----------+-------+-------+-------+-------+
+ *        |           | vec_0 | vec_1 | vec_2 | vec_3 |
+ *        +-----------+-------+-------+-------+-------+
+ *        | start     | 0x16  | 0xe   | 0x6   | 0x0   |
+ *        | direction | 1     | 1     | 1     | 1     |
+ *        | length    | 7     | 7     | 6     | 0     |
+ *        | stop      | 0     | 0     | 1     | 0     |
+ *        +-----------+-------+-------+-------+-------+
+ *
+ */
+
+#define NUM_PACKING_VECTORS 4
+#define PACKING_START_SHIFT 5
+#define PACKING_DIR_SHIFT 4
+#define PACKING_LEN_SHIFT 1
+#define PACKING_STOP_BIT BIT(0)
+#define PACKING_VECTOR_SHIFT 10
+/**
+ * geni_se_config_packing() - Packing configuration of the serial engine
+ * @se:		Pointer to the concerned serial engine
+ * @bpw:	Bits of data per transfer word.
+ * @pack_words:	Number of words per fifo element.
+ * @msb_to_lsb:	Transfer from MSB to LSB or vice-versa.
+ * @tx_cfg:	Flag to configure the TX Packing.
+ * @rx_cfg:	Flag to configure the RX Packing.
+ *
+ * This function is used to configure the packing rules for the current
+ * transfer.
+ */
+void geni_se_config_packing(struct geni_se *se, int bpw, int pack_words,
+			    bool msb_to_lsb, bool tx_cfg, bool rx_cfg)
+{
+	u32 cfg0, cfg1, cfg[NUM_PACKING_VECTORS] = {0};
+	int len;
+	int temp_bpw = bpw;
+	int idx_start = msb_to_lsb ? bpw - 1 : 0;
+	int idx = idx_start;
+	int idx_delta = msb_to_lsb ? -BITS_PER_BYTE : BITS_PER_BYTE;
+	int ceil_bpw = ALIGN(bpw, BITS_PER_BYTE);
+	int iter = (ceil_bpw * pack_words) / BITS_PER_BYTE;
+	int i;
+
+	if (iter <= 0 || iter > NUM_PACKING_VECTORS)
+		return;
+
+	for (i = 0; i < iter; i++) {
+		len = min_t(int, temp_bpw, BITS_PER_BYTE) - 1;
+		cfg[i] = idx << PACKING_START_SHIFT;
+		cfg[i] |= msb_to_lsb << PACKING_DIR_SHIFT;
+		cfg[i] |= len << PACKING_LEN_SHIFT;
+
+		if (temp_bpw <= BITS_PER_BYTE) {
+			idx = ((i + 1) * BITS_PER_BYTE) + idx_start;
+			temp_bpw = bpw;
+		} else {
+			idx = idx + idx_delta;
+			temp_bpw = temp_bpw - BITS_PER_BYTE;
+		}
+	}
+	cfg[iter - 1] |= PACKING_STOP_BIT;
+	cfg0 = cfg[0] | (cfg[1] << PACKING_VECTOR_SHIFT);
+	cfg1 = cfg[2] | (cfg[3] << PACKING_VECTOR_SHIFT);
+
+	if (tx_cfg) {
+		writel_relaxed(cfg0, se->base + SE_GENI_TX_PACKING_CFG0);
+		writel_relaxed(cfg1, se->base + SE_GENI_TX_PACKING_CFG1);
+	}
+	if (rx_cfg) {
+		writel_relaxed(cfg0, se->base + SE_GENI_RX_PACKING_CFG0);
+		writel_relaxed(cfg1, se->base + SE_GENI_RX_PACKING_CFG1);
+	}
+
+	/*
+	 * Number of protocol words in each FIFO entry
+	 * 0 - 4x8, four words in each entry, max word size of 8 bits
+	 * 1 - 2x16, two words in each entry, max word size of 16 bits
+	 * 2 - 1x32, one word in each entry, max word size of 32 bits
+	 * 3 - undefined
+	 */
+	if (pack_words || bpw == 32)
+		writel_relaxed(bpw / 16, se->base + SE_GENI_BYTE_GRAN);
+}
+EXPORT_SYMBOL(geni_se_config_packing);
+
+static void geni_se_clks_off(struct geni_se *se)
+{
+	struct geni_wrapper *wrapper = se->wrapper;
+
+	clk_disable_unprepare(se->clk);
+	clk_bulk_disable_unprepare(ARRAY_SIZE(wrapper->ahb_clks),
+						wrapper->ahb_clks);
+}
+
+/**
+ * geni_se_resources_off() - Turn off resources associated with the serial
+ *                           engine
+ * @se:	Pointer to the concerned serial engine.
+ *
+ * Return: 0 on success, standard Linux error codes on failure/error.
+ */
+int geni_se_resources_off(struct geni_se *se)
+{
+	int ret;
+
+	ret = pinctrl_pm_select_sleep_state(se->dev);
+	if (ret)
+		return ret;
+
+	geni_se_clks_off(se);
+	return 0;
+}
+EXPORT_SYMBOL(geni_se_resources_off);
+
+static int geni_se_clks_on(struct geni_se *se)
+{
+	int ret;
+	struct geni_wrapper *wrapper = se->wrapper;
+
+	ret = clk_bulk_prepare_enable(ARRAY_SIZE(wrapper->ahb_clks),
+						wrapper->ahb_clks);
+	if (ret)
+		return ret;
+
+	ret = clk_prepare_enable(se->clk);
+	if (ret)
+		clk_bulk_disable_unprepare(ARRAY_SIZE(wrapper->ahb_clks),
+							wrapper->ahb_clks);
+	return ret;
+}
+
+/**
+ * geni_se_resources_on() - Turn on resources associated with the serial
+ *                          engine
+ * @se:	Pointer to the concerned serial engine.
+ *
+ * Return: 0 on success, standard Linux error codes on failure/error.
+ */
+int geni_se_resources_on(struct geni_se *se)
+{
+	int ret;
+
+	ret = geni_se_clks_on(se);
+	if (ret)
+		return ret;
+
+	ret = pinctrl_pm_select_default_state(se->dev);
+	if (ret)
+		geni_se_clks_off(se);
+
+	return ret;
+}
+EXPORT_SYMBOL(geni_se_resources_on);
+
+/**
+ * geni_se_clk_tbl_get() - Get the clock table to program DFS
+ * @se:		Pointer to the concerned serial engine.
+ * @tbl:	Table in which the output is returned.
+ *
+ * This function is called by the protocol drivers to determine the different
+ * clock frequencies supported by serial engine core clock. The protocol
+ * drivers use the output to determine the clock frequency index to be
+ * programmed into DFS.
+ *
+ * Return: number of valid performance levels in the table on success,
+ *	   standard Linux error codes on failure.
+ */
+int geni_se_clk_tbl_get(struct geni_se *se, unsigned long **tbl)
+{
+	unsigned long freq = 0;
+	int i;
+
+	if (se->clk_perf_tbl) {
+		*tbl = se->clk_perf_tbl;
+		return se->num_clk_levels;
+	}
+
+	se->clk_perf_tbl = devm_kcalloc(se->dev, MAX_CLK_PERF_LEVEL,
+					sizeof(*se->clk_perf_tbl),
+					GFP_KERNEL);
+	if (!se->clk_perf_tbl)
+		return -ENOMEM;
+
+	for (i = 0; i < MAX_CLK_PERF_LEVEL; i++) {
+		freq = clk_round_rate(se->clk, freq + 1);
+		if (!freq || (i > 0 && freq == se->clk_perf_tbl[i - 1]))
+			break;
+		se->clk_perf_tbl[i] = freq;
+	}
+	se->num_clk_levels = i;
+	*tbl = se->clk_perf_tbl;
+	return se->num_clk_levels;
+}
+EXPORT_SYMBOL(geni_se_clk_tbl_get);
+
+/**
+ * geni_se_clk_freq_match() - Get the matching or closest SE clock frequency
+ * @se:		Pointer to the concerned serial engine.
+ * @req_freq:	Requested clock frequency.
+ * @index:	Index of the resultant frequency in the table.
+ * @res_freq:	Resultant frequency which matches or is closest to the
+ *		requested frequency.
+ * @exact:	Flag to indicate exact multiple requirement of the requested
+ *		frequency.
+ *
+ * This function is called by the protocol drivers to find the serial engine
+ * clock frequency that matches, or is an exact multiple of, the requested
+ * frequency in order to meet the performance requirements. If no such
+ * frequency is found and @exact is not set, it selects the closest floor
+ * frequency instead.
+ *
+ * Return: 0 on success, standard Linux error codes on failure.
+ */
+int geni_se_clk_freq_match(struct geni_se *se, unsigned long req_freq,
+			   unsigned int *index, unsigned long *res_freq,
+			   bool exact)
+{
+	unsigned long *tbl;
+	int num_clk_levels;
+	int i;
+
+	num_clk_levels = geni_se_clk_tbl_get(se, &tbl);
+	if (num_clk_levels < 0)
+		return num_clk_levels;
+
+	if (num_clk_levels == 0)
+		return -EINVAL;
+
+	*res_freq = 0;
+	for (i = 0; i < num_clk_levels; i++) {
+		if (!(tbl[i] % req_freq)) {
+			*index = i;
+			*res_freq = tbl[i];
+			return 0;
+		}
+
+		if (!(*res_freq) || ((tbl[i] > *res_freq) &&
+				     (tbl[i] < req_freq))) {
+			*index = i;
+			*res_freq = tbl[i];
+		}
+	}
+
+	if (exact)
+		return -EINVAL;
+
+	return 0;
+}
+EXPORT_SYMBOL(geni_se_clk_freq_match);
+
+#define GENI_SE_DMA_DONE_EN BIT(0)
+#define GENI_SE_DMA_EOT_EN BIT(1)
+#define GENI_SE_DMA_AHB_ERR_EN BIT(2)
+#define GENI_SE_DMA_EOT_BUF BIT(0)
+/**
+ * geni_se_tx_dma_prep() - Prepare the serial engine for TX DMA transfer
+ * @se:			Pointer to the concerned serial engine.
+ * @buf:		Pointer to the TX buffer.
+ * @len:		Length of the TX buffer.
+ * @iova:		Pointer to store the mapped DMA address.
+ *
+ * This function is used to prepare the buffers for DMA TX.
+ *
+ * Return: 0 on success, standard Linux error codes on failure.
+ */
+int geni_se_tx_dma_prep(struct geni_se *se, void *buf, size_t len,
+			dma_addr_t *iova)
+{
+	struct geni_wrapper *wrapper = se->wrapper;
+	u32 val;
+
+	*iova = dma_map_single(wrapper->dev, buf, len, DMA_TO_DEVICE);
+	if (dma_mapping_error(wrapper->dev, *iova))
+		return -EIO;
+
+	val = GENI_SE_DMA_DONE_EN;
+	val |= GENI_SE_DMA_EOT_EN;
+	val |= GENI_SE_DMA_AHB_ERR_EN;
+	writel_relaxed(val, se->base + SE_DMA_TX_IRQ_EN_SET);
+	writel_relaxed(lower_32_bits(*iova), se->base + SE_DMA_TX_PTR_L);
+	writel_relaxed(upper_32_bits(*iova), se->base + SE_DMA_TX_PTR_H);
+	writel_relaxed(GENI_SE_DMA_EOT_BUF, se->base + SE_DMA_TX_ATTR);
+	writel_relaxed(len, se->base + SE_DMA_TX_LEN);
+	return 0;
+}
+EXPORT_SYMBOL(geni_se_tx_dma_prep);
+
+/**
+ * geni_se_rx_dma_prep() - Prepare the serial engine for RX DMA transfer
+ * @se:			Pointer to the concerned serial engine.
+ * @buf:		Pointer to the RX buffer.
+ * @len:		Length of the RX buffer.
+ * @iova:		Pointer to store the mapped DMA address.
+ *
+ * This function maps @buf for DMA and programs the RX DMA registers.
+ *
+ * Return: 0 on success, standard Linux error codes on failure.
+ */
+int geni_se_rx_dma_prep(struct geni_se *se, void *buf, size_t len,
+			dma_addr_t *iova)
+{
+	struct geni_wrapper *wrapper = se->wrapper;
+	u32 val;
+
+	*iova = dma_map_single(wrapper->dev, buf, len, DMA_FROM_DEVICE);
+	if (dma_mapping_error(wrapper->dev, *iova))
+		return -EIO;
+
+	val = GENI_SE_DMA_DONE_EN;
+	val |= GENI_SE_DMA_EOT_EN;
+	val |= GENI_SE_DMA_AHB_ERR_EN;
+	writel_relaxed(val, se->base + SE_DMA_RX_IRQ_EN_SET);
+	writel_relaxed(lower_32_bits(*iova), se->base + SE_DMA_RX_PTR_L);
+	writel_relaxed(upper_32_bits(*iova), se->base + SE_DMA_RX_PTR_H);
+	/* RX does not have EOT buffer type bit. So just reset RX_ATTR */
+	writel_relaxed(0, se->base + SE_DMA_RX_ATTR);
+	writel_relaxed(len, se->base + SE_DMA_RX_LEN);
+	return 0;
+}
+EXPORT_SYMBOL(geni_se_rx_dma_prep);
+
+/**
+ * geni_se_tx_dma_unprep() - Unprepare the serial engine after TX DMA transfer
+ * @se:			Pointer to the concerned serial engine.
+ * @iova:		DMA address of the TX buffer.
+ * @len:		Length of the TX buffer.
+ *
+ * This function unmaps the TX buffer mapped by geni_se_tx_dma_prep().
+ */
+void geni_se_tx_dma_unprep(struct geni_se *se, dma_addr_t iova, size_t len)
+{
+	struct geni_wrapper *wrapper = se->wrapper;
+
+	if (iova && !dma_mapping_error(wrapper->dev, iova))
+		dma_unmap_single(wrapper->dev, iova, len, DMA_TO_DEVICE);
+}
+EXPORT_SYMBOL(geni_se_tx_dma_unprep);
+
+/**
+ * geni_se_rx_dma_unprep() - Unprepare the serial engine after RX DMA transfer
+ * @se:			Pointer to the concerned serial engine.
+ * @iova:		DMA address of the RX buffer.
+ * @len:		Length of the RX buffer.
+ *
+ * This function unmaps the RX buffer mapped by geni_se_rx_dma_prep().
+ */
+void geni_se_rx_dma_unprep(struct geni_se *se, dma_addr_t iova, size_t len)
+{
+	struct geni_wrapper *wrapper = se->wrapper;
+
+	if (iova && !dma_mapping_error(wrapper->dev, iova))
+		dma_unmap_single(wrapper->dev, iova, len, DMA_FROM_DEVICE);
+}
+EXPORT_SYMBOL(geni_se_rx_dma_unprep);
+
+static int geni_se_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct resource *res;
+	struct geni_wrapper *wrapper;
+	int ret;
+
+	wrapper = devm_kzalloc(dev, sizeof(*wrapper), GFP_KERNEL);
+	if (!wrapper)
+		return -ENOMEM;
+
+	wrapper->dev = dev;
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	wrapper->base = devm_ioremap_resource(dev, res);
+	if (IS_ERR(wrapper->base))
+		return PTR_ERR(wrapper->base);
+
+	wrapper->ahb_clks[0].id = "m-ahb";
+	wrapper->ahb_clks[1].id = "s-ahb";
+	ret = devm_clk_bulk_get(dev, NUM_AHB_CLKS, wrapper->ahb_clks);
+	if (ret) {
+		dev_err(dev, "Err getting AHB clks %d\n", ret);
+		return ret;
+	}
+
+	dev_set_drvdata(dev, wrapper);
+	dev_dbg(dev, "GENI SE Driver probed\n");
+	return devm_of_platform_populate(dev);
+}
+
+static const struct of_device_id geni_se_dt_match[] = {
+	{ .compatible = "qcom,geni-se-qup", },
+	{}
+};
+MODULE_DEVICE_TABLE(of, geni_se_dt_match);
+
+static struct platform_driver geni_se_driver = {
+	.driver = {
+		.name = "geni_se_qup",
+		.of_match_table = geni_se_dt_match,
+	},
+	.probe = geni_se_probe,
+};
+module_platform_driver(geni_se_driver);
+
+MODULE_DESCRIPTION("GENI Serial Engine Driver");
+MODULE_LICENSE("GPL v2");
diff --git a/include/linux/qcom-geni-se.h b/include/linux/qcom-geni-se.h
new file mode 100644
index 0000000..5d61449
--- /dev/null
+++ b/include/linux/qcom-geni-se.h
@@ -0,0 +1,425 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (c) 2017-2018, The Linux Foundation. All rights reserved.
+ */
+
+#ifndef _LINUX_QCOM_GENI_SE
+#define _LINUX_QCOM_GENI_SE
+
+/* Transfer mode supported by GENI Serial Engines */
+enum geni_se_xfer_mode {
+	GENI_SE_INVALID,
+	GENI_SE_FIFO,
+	GENI_SE_DMA,
+};
+
+/* Protocols supported by GENI Serial Engines */
+enum geni_se_protocol_type {
+	GENI_SE_NONE,
+	GENI_SE_SPI,
+	GENI_SE_UART,
+	GENI_SE_I2C,
+	GENI_SE_I3C,
+};
+
+struct geni_wrapper;
+struct clk;
+
+/**
+ * struct geni_se - GENI Serial Engine
+ * @base:		Base Address of the Serial Engine's register block
+ * @dev:		Pointer to the Serial Engine device
+ * @wrapper:		Pointer to the parent QUP Wrapper core
+ * @clk:		Handle to the core serial engine clock
+ * @num_clk_levels:	Number of valid clock levels in clk_perf_tbl
+ * @clk_perf_tbl:	Table of clock frequency input to serial engine clock
+ */
+struct geni_se {
+	void __iomem *base;
+	struct device *dev;
+	struct geni_wrapper *wrapper;
+	struct clk *clk;
+	unsigned int num_clk_levels;
+	unsigned long *clk_perf_tbl;
+};
+
+/* Common SE registers */
+#define GENI_FORCE_DEFAULT_REG		0x20
+#define SE_GENI_STATUS			0x40
+#define GENI_SER_M_CLK_CFG		0x48
+#define GENI_SER_S_CLK_CFG		0x4c
+#define GENI_FW_REVISION_RO		0x68
+#define SE_GENI_CLK_SEL			0x7c
+#define SE_GENI_DMA_MODE_EN		0x258
+#define SE_GENI_M_CMD0			0x600
+#define SE_GENI_M_CMD_CTRL_REG		0x604
+#define SE_GENI_M_IRQ_STATUS		0x610
+#define SE_GENI_M_IRQ_EN		0x614
+#define SE_GENI_M_IRQ_CLEAR		0x618
+#define SE_GENI_S_CMD0			0x630
+#define SE_GENI_S_CMD_CTRL_REG		0x634
+#define SE_GENI_S_IRQ_STATUS		0x640
+#define SE_GENI_S_IRQ_EN		0x644
+#define SE_GENI_S_IRQ_CLEAR		0x648
+#define SE_GENI_TX_FIFOn		0x700
+#define SE_GENI_RX_FIFOn		0x780
+#define SE_GENI_TX_FIFO_STATUS		0x800
+#define SE_GENI_RX_FIFO_STATUS		0x804
+#define SE_GENI_TX_WATERMARK_REG	0x80c
+#define SE_GENI_RX_WATERMARK_REG	0x810
+#define SE_GENI_RX_RFR_WATERMARK_REG	0x814
+#define SE_GENI_IOS			0x908
+#define SE_DMA_TX_IRQ_STAT		0xc40
+#define SE_DMA_TX_IRQ_CLR		0xc44
+#define SE_DMA_TX_FSM_RST		0xc58
+#define SE_DMA_RX_IRQ_STAT		0xd40
+#define SE_DMA_RX_IRQ_CLR		0xd44
+#define SE_DMA_RX_FSM_RST		0xd58
+#define SE_HW_PARAM_0			0xe24
+#define SE_HW_PARAM_1			0xe28
+
+/* GENI_FORCE_DEFAULT_REG fields */
+#define FORCE_DEFAULT	BIT(0)
+
+/* GENI_STATUS fields */
+#define M_GENI_CMD_ACTIVE		BIT(0)
+#define S_GENI_CMD_ACTIVE		BIT(12)
+
+/* GENI_SER_M_CLK_CFG/GENI_SER_S_CLK_CFG */
+#define SER_CLK_EN			BIT(0)
+#define CLK_DIV_MSK			GENMASK(15, 4)
+#define CLK_DIV_SHFT			4
+
+/* GENI_FW_REVISION_RO fields */
+#define FW_REV_PROTOCOL_MSK		GENMASK(15, 8)
+#define FW_REV_PROTOCOL_SHFT		8
+
+/* GENI_CLK_SEL fields */
+#define CLK_SEL_MSK			GENMASK(2, 0)
+
+/* SE_GENI_DMA_MODE_EN */
+#define GENI_DMA_MODE_EN		BIT(0)
+
+/* GENI_M_CMD0 fields */
+#define M_OPCODE_MSK			GENMASK(31, 27)
+#define M_OPCODE_SHFT			27
+#define M_PARAMS_MSK			GENMASK(26, 0)
+
+/* GENI_M_CMD_CTRL_REG */
+#define M_GENI_CMD_CANCEL		BIT(2)
+#define M_GENI_CMD_ABORT		BIT(1)
+#define M_GENI_DISABLE			BIT(0)
+
+/* GENI_S_CMD0 fields */
+#define S_OPCODE_MSK			GENMASK(31, 27)
+#define S_OPCODE_SHFT			27
+#define S_PARAMS_MSK			GENMASK(26, 0)
+
+/* GENI_S_CMD_CTRL_REG */
+#define S_GENI_CMD_CANCEL		BIT(2)
+#define S_GENI_CMD_ABORT		BIT(1)
+#define S_GENI_DISABLE			BIT(0)
+
+/* GENI_M_IRQ_EN fields */
+#define M_CMD_DONE_EN			BIT(0)
+#define M_CMD_OVERRUN_EN		BIT(1)
+#define M_ILLEGAL_CMD_EN		BIT(2)
+#define M_CMD_FAILURE_EN		BIT(3)
+#define M_CMD_CANCEL_EN			BIT(4)
+#define M_CMD_ABORT_EN			BIT(5)
+#define M_TIMESTAMP_EN			BIT(6)
+#define M_RX_IRQ_EN			BIT(7)
+#define M_GP_SYNC_IRQ_0_EN		BIT(8)
+#define M_GP_IRQ_0_EN			BIT(9)
+#define M_GP_IRQ_1_EN			BIT(10)
+#define M_GP_IRQ_2_EN			BIT(11)
+#define M_GP_IRQ_3_EN			BIT(12)
+#define M_GP_IRQ_4_EN			BIT(13)
+#define M_GP_IRQ_5_EN			BIT(14)
+#define M_IO_DATA_DEASSERT_EN		BIT(22)
+#define M_IO_DATA_ASSERT_EN		BIT(23)
+#define M_RX_FIFO_RD_ERR_EN		BIT(24)
+#define M_RX_FIFO_WR_ERR_EN		BIT(25)
+#define M_RX_FIFO_WATERMARK_EN		BIT(26)
+#define M_RX_FIFO_LAST_EN		BIT(27)
+#define M_TX_FIFO_RD_ERR_EN		BIT(28)
+#define M_TX_FIFO_WR_ERR_EN		BIT(29)
+#define M_TX_FIFO_WATERMARK_EN		BIT(30)
+#define M_SEC_IRQ_EN			BIT(31)
+#define M_COMMON_GENI_M_IRQ_EN	(GENMASK(6, 1) | \
+				M_IO_DATA_DEASSERT_EN | \
+				M_IO_DATA_ASSERT_EN | M_RX_FIFO_RD_ERR_EN | \
+				M_RX_FIFO_WR_ERR_EN | M_TX_FIFO_RD_ERR_EN | \
+				M_TX_FIFO_WR_ERR_EN)
+
+/* GENI_S_IRQ_EN fields */
+#define S_CMD_DONE_EN			BIT(0)
+#define S_CMD_OVERRUN_EN		BIT(1)
+#define S_ILLEGAL_CMD_EN		BIT(2)
+#define S_CMD_FAILURE_EN		BIT(3)
+#define S_CMD_CANCEL_EN			BIT(4)
+#define S_CMD_ABORT_EN			BIT(5)
+#define S_GP_SYNC_IRQ_0_EN		BIT(8)
+#define S_GP_IRQ_0_EN			BIT(9)
+#define S_GP_IRQ_1_EN			BIT(10)
+#define S_GP_IRQ_2_EN			BIT(11)
+#define S_GP_IRQ_3_EN			BIT(12)
+#define S_GP_IRQ_4_EN			BIT(13)
+#define S_GP_IRQ_5_EN			BIT(14)
+#define S_IO_DATA_DEASSERT_EN		BIT(22)
+#define S_IO_DATA_ASSERT_EN		BIT(23)
+#define S_RX_FIFO_RD_ERR_EN		BIT(24)
+#define S_RX_FIFO_WR_ERR_EN		BIT(25)
+#define S_RX_FIFO_WATERMARK_EN		BIT(26)
+#define S_RX_FIFO_LAST_EN		BIT(27)
+#define S_COMMON_GENI_S_IRQ_EN	(GENMASK(5, 1) | GENMASK(13, 9) | \
+				 S_RX_FIFO_RD_ERR_EN | S_RX_FIFO_WR_ERR_EN)
+
+/* GENI_TX/RX/RX_RFR_WATERMARK_REG fields */
+#define WATERMARK_MSK			GENMASK(5, 0)
+
+/* GENI_TX_FIFO_STATUS fields */
+#define TX_FIFO_WC			GENMASK(27, 0)
+
+/* GENI_RX_FIFO_STATUS fields */
+#define RX_LAST				BIT(31)
+#define RX_LAST_BYTE_VALID_MSK		GENMASK(30, 28)
+#define RX_LAST_BYTE_VALID_SHFT		28
+#define RX_FIFO_WC_MSK			GENMASK(24, 0)
+
+/* SE_GENI_IOS fields */
+#define IO2_DATA_IN			BIT(1)
+#define RX_DATA_IN			BIT(0)
+
+/* SE_DMA_TX_IRQ_STAT Register fields */
+#define TX_DMA_DONE			BIT(0)
+#define TX_EOT				BIT(1)
+#define TX_SBE				BIT(2)
+#define TX_RESET_DONE			BIT(3)
+
+/* SE_DMA_RX_IRQ_STAT Register fields */
+#define RX_DMA_DONE			BIT(0)
+#define RX_EOT				BIT(1)
+#define RX_SBE				BIT(2)
+#define RX_RESET_DONE			BIT(3)
+#define RX_FLUSH_DONE			BIT(4)
+#define RX_GENI_GP_IRQ			GENMASK(10, 5)
+#define RX_GENI_CANCEL_IRQ		BIT(11)
+#define RX_GENI_GP_IRQ_EXT		GENMASK(13, 12)
+
+/* SE_HW_PARAM_0 fields */
+#define TX_FIFO_WIDTH_MSK		GENMASK(29, 24)
+#define TX_FIFO_WIDTH_SHFT		24
+#define TX_FIFO_DEPTH_MSK		GENMASK(21, 16)
+#define TX_FIFO_DEPTH_SHFT		16
+
+/* SE_HW_PARAM_1 fields */
+#define RX_FIFO_WIDTH_MSK		GENMASK(29, 24)
+#define RX_FIFO_WIDTH_SHFT		24
+#define RX_FIFO_DEPTH_MSK		GENMASK(21, 16)
+#define RX_FIFO_DEPTH_SHFT		16
+
+#define HW_VER_MAJOR_MASK		GENMASK(31, 28)
+#define HW_VER_MAJOR_SHFT		28
+#define HW_VER_MINOR_MASK		GENMASK(27, 16)
+#define HW_VER_MINOR_SHFT		16
+#define HW_VER_STEP_MASK		GENMASK(15, 0)
+
+#if IS_ENABLED(CONFIG_QCOM_GENI_SE)
+
+u32 geni_se_get_qup_hw_version(struct geni_se *se);
+
+#define geni_se_get_wrapper_version(se, major, minor, step) do { \
+	u32 ver; \
+\
+	ver = geni_se_get_qup_hw_version(se); \
+	major = (ver & HW_VER_MAJOR_MASK) >> HW_VER_MAJOR_SHFT; \
+	minor = (ver & HW_VER_MINOR_MASK) >> HW_VER_MINOR_SHFT; \
+	step = ver & HW_VER_STEP_MASK; \
+} while (0)
+
+/**
+ * geni_se_read_proto() - Read the protocol configured for a serial engine
+ * @se:		Pointer to the concerned serial engine.
+ *
+ * Return: Protocol value as configured in the serial engine.
+ */
+static inline u32 geni_se_read_proto(struct geni_se *se)
+{
+	u32 val;
+
+	val = readl_relaxed(se->base + GENI_FW_REVISION_RO);
+
+	return (val & FW_REV_PROTOCOL_MSK) >> FW_REV_PROTOCOL_SHFT;
+}
+
+/**
+ * geni_se_setup_m_cmd() - Setup the primary sequencer
+ * @se:		Pointer to the concerned serial engine.
+ * @cmd:	Command/Operation to setup in the primary sequencer.
+ * @params:	Parameter for the sequencer command.
+ *
+ * This function is used to configure the primary sequencer with the
+ * command and its associated parameters.
+ */
+static inline void geni_se_setup_m_cmd(struct geni_se *se, u32 cmd, u32 params)
+{
+	u32 m_cmd;
+
+	m_cmd = (cmd << M_OPCODE_SHFT) | (params & M_PARAMS_MSK);
+	writel_relaxed(m_cmd, se->base + SE_GENI_M_CMD0);
+}
+
+/**
+ * geni_se_setup_s_cmd() - Setup the secondary sequencer
+ * @se:		Pointer to the concerned serial engine.
+ * @cmd:	Command/Operation to setup in the secondary sequencer.
+ * @params:	Parameter for the sequencer command.
+ *
+ * This function is used to configure the secondary sequencer with the
+ * command and its associated parameters.
+ */
+static inline void geni_se_setup_s_cmd(struct geni_se *se, u32 cmd, u32 params)
+{
+	u32 s_cmd;
+
+	s_cmd = readl_relaxed(se->base + SE_GENI_S_CMD0);
+	s_cmd &= ~(S_OPCODE_MSK | S_PARAMS_MSK);
+	s_cmd |= (cmd << S_OPCODE_SHFT);
+	s_cmd |= (params & S_PARAMS_MSK);
+	writel_relaxed(s_cmd, se->base + SE_GENI_S_CMD0);
+}
+
+/**
+ * geni_se_cancel_m_cmd() - Cancel the command configured in the primary
+ *                          sequencer
+ * @se:	Pointer to the concerned serial engine.
+ *
+ * This function is used to cancel the currently configured command in the
+ * primary sequencer.
+ */
+static inline void geni_se_cancel_m_cmd(struct geni_se *se)
+{
+	writel_relaxed(M_GENI_CMD_CANCEL, se->base + SE_GENI_M_CMD_CTRL_REG);
+}
+
+/**
+ * geni_se_cancel_s_cmd() - Cancel the command configured in the secondary
+ *                          sequencer
+ * @se:	Pointer to the concerned serial engine.
+ *
+ * This function is used to cancel the currently configured command in the
+ * secondary sequencer.
+ */
+static inline void geni_se_cancel_s_cmd(struct geni_se *se)
+{
+	writel_relaxed(S_GENI_CMD_CANCEL, se->base + SE_GENI_S_CMD_CTRL_REG);
+}
+
+/**
+ * geni_se_abort_m_cmd() - Abort the command configured in the primary sequencer
+ * @se:	Pointer to the concerned serial engine.
+ *
+ * This function is used to force abort the currently configured command in the
+ * primary sequencer.
+ */
+static inline void geni_se_abort_m_cmd(struct geni_se *se)
+{
+	writel_relaxed(M_GENI_CMD_ABORT, se->base + SE_GENI_M_CMD_CTRL_REG);
+}
+
+/**
+ * geni_se_abort_s_cmd() - Abort the command configured in the secondary
+ *                         sequencer
+ * @se:	Pointer to the concerned serial engine.
+ *
+ * This function is used to force abort the currently configured command in the
+ * secondary sequencer.
+ */
+static inline void geni_se_abort_s_cmd(struct geni_se *se)
+{
+	writel_relaxed(S_GENI_CMD_ABORT, se->base + SE_GENI_S_CMD_CTRL_REG);
+}
+
+/**
+ * geni_se_get_tx_fifo_depth() - Get the TX fifo depth of the serial engine
+ * @se:	Pointer to the concerned serial engine.
+ *
+ * This function is used to get the depth, i.e. the number of elements, of
+ * the TX FIFO of the serial engine.
+ *
+ * Return: TX FIFO depth in units of FIFO words.
+ */
+static inline u32 geni_se_get_tx_fifo_depth(struct geni_se *se)
+{
+	u32 val;
+
+	val = readl_relaxed(se->base + SE_HW_PARAM_0);
+
+	return (val & TX_FIFO_DEPTH_MSK) >> TX_FIFO_DEPTH_SHFT;
+}
+
+/**
+ * geni_se_get_tx_fifo_width() - Get the TX fifo width of the serial engine
+ * @se:	Pointer to the concerned serial engine.
+ *
+ * This function is used to get the width, i.e. the word size per element,
+ * of the TX FIFO of the serial engine.
+ *
+ * Return: TX FIFO width in bits.
+ */
+static inline u32 geni_se_get_tx_fifo_width(struct geni_se *se)
+{
+	u32 val;
+
+	val = readl_relaxed(se->base + SE_HW_PARAM_0);
+
+	return (val & TX_FIFO_WIDTH_MSK) >> TX_FIFO_WIDTH_SHFT;
+}
+
+/**
+ * geni_se_get_rx_fifo_depth() - Get the RX fifo depth of the serial engine
+ * @se:	Pointer to the concerned serial engine.
+ *
+ * This function is used to get the depth, i.e. the number of elements, of
+ * the RX FIFO of the serial engine.
+ *
+ * Return: RX FIFO depth in units of FIFO words.
+ */
+static inline u32 geni_se_get_rx_fifo_depth(struct geni_se *se)
+{
+	u32 val;
+
+	val = readl_relaxed(se->base + SE_HW_PARAM_1);
+
+	return (val & RX_FIFO_DEPTH_MSK) >> RX_FIFO_DEPTH_SHFT;
+}
+
+void geni_se_init(struct geni_se *se, u32 rx_wm, u32 rx_rfr);
+
+void geni_se_select_mode(struct geni_se *se, enum geni_se_xfer_mode mode);
+
+void geni_se_config_packing(struct geni_se *se, int bpw, int pack_words,
+			    bool msb_to_lsb, bool tx_cfg, bool rx_cfg);
+
+int geni_se_resources_off(struct geni_se *se);
+
+int geni_se_resources_on(struct geni_se *se);
+
+int geni_se_clk_tbl_get(struct geni_se *se, unsigned long **tbl);
+
+int geni_se_clk_freq_match(struct geni_se *se, unsigned long req_freq,
+			   unsigned int *index, unsigned long *res_freq,
+			   bool exact);
+
+int geni_se_tx_dma_prep(struct geni_se *se, void *buf, size_t len,
+			dma_addr_t *iova);
+
+int geni_se_rx_dma_prep(struct geni_se *se, void *buf, size_t len,
+			dma_addr_t *iova);
+
+void geni_se_tx_dma_unprep(struct geni_se *se, dma_addr_t iova, size_t len);
+
+void geni_se_rx_dma_unprep(struct geni_se *se, dma_addr_t iova, size_t len);
+#endif
+#endif
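
Outside the kernel, the bit layouts in this header can be checked with a small userspace C sketch. It re-declares the relevant masks with a local GENMASK32() (a stand-in for the kernel's GENMASK(), since <linux/bits.h> is not available in userspace) and mirrors the packing done by geni_se_setup_m_cmd() and the field extraction done by geni_se_get_tx_fifo_width()/geni_se_get_tx_fifo_depth() and geni_se_get_wrapper_version(). The function names are hypothetical, and the opcode-range assert is an addition of this sketch, not something the header does.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's GENMASK(), 32-bit values only. */
#define GENMASK32(h, l) \
	((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l)))

/* Field layouts copied from include/linux/qcom-geni-se.h. */
#define M_OPCODE_MSK		GENMASK32(31, 27)
#define M_OPCODE_SHFT		27
#define M_PARAMS_MSK		GENMASK32(26, 0)
#define TX_FIFO_WIDTH_MSK	GENMASK32(29, 24)
#define TX_FIFO_WIDTH_SHFT	24
#define TX_FIFO_DEPTH_MSK	GENMASK32(21, 16)
#define TX_FIFO_DEPTH_SHFT	16
#define HW_VER_MAJOR_MASK	GENMASK32(31, 28)
#define HW_VER_MAJOR_SHFT	28
#define HW_VER_MINOR_MASK	GENMASK32(27, 16)
#define HW_VER_MINOR_SHFT	16
#define HW_VER_STEP_MASK	GENMASK32(15, 0)

/* Same packing as geni_se_setup_m_cmd(): opcode in bits 31:27,
 * parameters truncated to bits 26:0. */
static uint32_t m_cmd_encode(uint32_t cmd, uint32_t params)
{
	/* Sketch-only check that the opcode fits in 5 bits. */
	assert(cmd <= (M_OPCODE_MSK >> M_OPCODE_SHFT));
	return (cmd << M_OPCODE_SHFT) | (params & M_PARAMS_MSK);
}

/* Field extraction as in geni_se_get_tx_fifo_width()/_depth(). */
static uint32_t hw_param_tx_width(uint32_t param0)
{
	return (param0 & TX_FIFO_WIDTH_MSK) >> TX_FIFO_WIDTH_SHFT;
}

static uint32_t hw_param_tx_depth(uint32_t param0)
{
	return (param0 & TX_FIFO_DEPTH_MSK) >> TX_FIFO_DEPTH_SHFT;
}

/* Decoding done by geni_se_get_wrapper_version(), as a function. */
struct hw_ver {
	uint32_t major, minor, step;
};

static struct hw_ver hw_ver_decode(uint32_t ver)
{
	struct hw_ver v;

	v.major = (ver & HW_VER_MAJOR_MASK) >> HW_VER_MAJOR_SHFT;
	v.minor = (ver & HW_VER_MINOR_MASK) >> HW_VER_MINOR_SHFT;
	v.step = ver & HW_VER_STEP_MASK;
	return v;
}
```

For a made-up SE_HW_PARAM_0 value of 0x20100000, hw_param_tx_width() returns 32 and hw_param_tx_depth() returns 16; hw_ver_decode(0x10020003) yields major 1, minor 2, step 3.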
-- 
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
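
The selection rule in geni_se_clk_freq_match() can be exercised outside the kernel. The sketch below is a hypothetical userspace port: clk_freq_match() takes the clock table directly instead of calling geni_se_clk_tbl_get(), and it adds an assert on req_freq that the kernel code does not have (the modulo would otherwise divide by zero). The loop body mirrors the patch.

```c
#include <assert.h>
#include <errno.h>

/* Userspace model of geni_se_clk_freq_match(); the table is passed in. */
static int clk_freq_match(const unsigned long *tbl, int num_clk_levels,
			  unsigned long req_freq, unsigned int *index,
			  unsigned long *res_freq, int exact)
{
	int i;

	assert(req_freq != 0);	/* tbl[i] % 0 would be undefined */
	if (num_clk_levels <= 0)
		return -EINVAL;

	*res_freq = 0;
	for (i = 0; i < num_clk_levels; i++) {
		/* An exact match or exact multiple wins immediately. */
		if (!(tbl[i] % req_freq)) {
			*index = i;
			*res_freq = tbl[i];
			return 0;
		}
		/* Fallback: the first entry, replaced by any larger entry
		 * that is still below req_freq. */
		if (!*res_freq || (tbl[i] > *res_freq && tbl[i] < req_freq)) {
			*index = i;
			*res_freq = tbl[i];
		}
	}

	return exact ? -EINVAL : 0;
}
```

With the table {10, 20, 30}, a request of 15 returns the exact multiple 30 (index 2), and a request of 25 falls back to 20 (index 1), or -EINVAL if exact is set. With the unordered table {30, 10, 20} and a request of 25 the model returns 30, because the fallback starts from the first entry and is only replaced by larger entries still below the request. As written, the "closest floor" behaviour therefore holds when the first entry is below the request, as with an ascending table.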

* Re: [PATCH v2] crypto: doc - clarify hash callbacks state machine
From: Herbert Xu @ 2018-03-30 17:41 UTC (permalink / raw)
  To: Horia Geantă
  Cc: David S. Miller, Jonathan Corbet, linux-crypto, linux-kernel,
	linux-doc
In-Reply-To: <20180320075612.22719-1-horia.geanta@nxp.com>

On Tue, Mar 20, 2018 at 09:56:12AM +0200, Horia Geantă wrote:
> Add a note that it is perfectly legal to "abandon" a request object:
> - call .init() and then (as many times) .update()
> - _not_ call any of .final(), .finup() or .export() at any point in
>   future
> 
> Link: https://lkml.kernel.org/r/20180222114741.GA27631@gondor.apana.org.au
> Signed-off-by: Horia Geantă <horia.geanta@nxp.com>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH] syscalls: define and explain goal to not call syscalls in the kernel
From: Dominik Brodowski @ 2018-03-30 18:31 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: linux-kernel, linux-doc, viro, x86, torvalds, mingo, tglx, luto
In-Reply-To: <20180330093518.3d8a92f3@lwn.net>

Jon,

On Fri, Mar 30, 2018 at 09:35:18AM -0600, Jonathan Corbet wrote:
> On Sun, 25 Mar 2018 18:25:27 +0200
> Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> 
> > As there have been multiple inquiries on the rationale of my patchsets
> > removing in-kernel calls to sys_xyzzy(), here is an updated patch 01/NN
> > which I will push upstream for v4.17-rc1. I will also include a reference
> > to this mail (and therefore to the explanation below) in all related
> > patches of the series. Any improvements, hints, suggestions, spelling
> > fixes, and/or objections?
> 
> I have no objections to the text, but I do wonder about the placement.
> The "adding syscalls" document isn't about *invoking* them; I suspect that
> few people will see it there.  The coding-style document isn't quite right
> either, but I wonder if it might not be a better place in the short term?

Well, most of the existing instances where syscalls were called in the
kernel were common codepaths for old and new syscalls or native and compat
syscalls, and syscall multiplexers like sys_ipc() which got replaced or
superseded by many new syscalls. That's what led me to
Documentation/process/adding-syscalls.rst . I'm happy to move this text to
Documentation/process/coding-style.rst (as new section 21?), or even to
Documentation/process/do-not-call-syscalls.rst . Just let me know what you
prefer me to push upstream.

Thanks,
	Dominik

* Re: [PATCH v4 7/8] clocksource: Add a new timer-ingenic driver
From: Daniel Lezcano @ 2018-03-31  8:10 UTC (permalink / raw)
  To: Paul Cercueil
  Cc: Thomas Gleixner, Jason Cooper, Marc Zyngier, Lee Jones,
	Ralf Baechle, Rob Herring, Jonathan Corbet, Mark Rutland,
	James Hogan, Maarten ter Huurne, linux-clk, devicetree,
	linux-kernel, linux-mips, linux-doc
In-Reply-To: <1522335149.1792.0@smtp.crapouillou.net>

On 29/03/2018 16:52, Paul Cercueil wrote:
> 
> 
> On Wed, 28 Mar 2018 at 18:25, Daniel Lezcano <daniel.lezcano@linaro.org>
> wrote:
>> On 28/03/2018 17:15, Paul Cercueil wrote:
>>>  On 2018-03-24 07:26, Daniel Lezcano wrote:
>>>>  On 18/03/2018 00:29, Paul Cercueil wrote:
>>>>>  This driver will use the TCU (Timer Counter Unit) present on the
>>>>>  Ingenic JZ47xx SoCs to provide the kernel with a clocksource and timers.
>>>>
>>>>  Please provide a more detailed description about the timer.
>>>
>>>  There's a doc file for that :)
>>
>> Usually, when there is a new driver I ask for a description in the
>> changelog for reference.
>>
>>>>  Where is the clocksource ?
>>>
>>>  Right, there is no clocksource, just timers.
>>>
>>>>  I don't see the point of using channel idx and pwm checking here.
>>>>
>>>>  There is one clockevent, why create multiple channels ? Can't you
>>>>  stick to the usual init routine for a timer.
>>>
>>>  So the idea is that we use all the TCU channels that won't be used
>>>  for PWM as timers. Hence the PWM checking. Why is this bad?
>>
>> It is not bad but arguable. By checking the channels used by the pwm in
>> the code, you introduce an adherence between two subsystems even if it
>> is just related to the DT parsing part.
>>
>> As it is not needed to have more than one timer in the time framework
>> (at least with the same characteristics), the pwm channels check is
>> pointless. We can assume the author of the DT file is smart enough to
>> prevent conflicts and define a pwm and a timer properly instead of
>> adding more code complexity.
>>
>> In addition, simplifying the code will allow you to use the timer-of
>> code and reduce very significantly the init function.
> 
> That's what I had in my V1 and V2, my DT node for the timer-ingenic driver
> had a "timers" property (e.g. "timers = <4 5>;") to select the channels
> that should be used as timers. Then Rob told me I shouldn't do that, and
> instead detect the channels that will be used for PWM.
> 

[ ... ]

How do you specify the channels used for PWM ?

>>>>>
>>>>>  +config INGENIC_TIMER
>>>>>  +    bool "Clocksource/timer using the TCU in Ingenic JZ SoCs"
>>>>>  +    depends on MACH_INGENIC || COMPILE_TEST
>>>>
>>>>  bool "Clocksource/timer using the TCU in Ingenic JZ SoCs" if COMPILE_TEST
>>>>
>>>>  Remove the depends MACH_INGENIC.
>>>
>>>  This driver is not useful on anything else than Ingenic SoCs, why
>>>  should I remove MACH_INGENIC then?
>>
>> For COMPILE_TEST on x86.
> 
> Well that's a logical OR right here, so it will work...

Right, I missed the second part of the condition. For consistency
reasons, we don't add a dependency on the platform; the platform will
select the option instead. Look at the other timer options and you will
see there are no MACH dependencies. I'm trying to consolidate all these
options to have the same format and hopefully factor them out.

-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

* Re: [PATCH v4 7/8] clocksource: Add a new timer-ingenic driver, 
From: Paul Cercueil @ 2018-03-31 17:46 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Thomas Gleixner, Jason Cooper, Marc Zyngier, Lee Jones,
	Ralf Baechle, Rob Herring, Jonathan Corbet, Mark Rutland,
	James Hogan, Maarten ter Huurne, linux-clk, devicetree,
	linux-kernel, linux-mips, linux-doc
In-Reply-To: <2234006b-30ff-d5a8-b14a-d6e307c06145@linaro.org>

On 2018-03-31 10:10, Daniel Lezcano wrote:
> On 29/03/2018 16:52, Paul Cercueil wrote:
>> 
>> 
>> On Wed, 28 Mar 2018 at 18:25, Daniel Lezcano <daniel.lezcano@linaro.org>
>> wrote:
>>> On 28/03/2018 17:15, Paul Cercueil wrote:
>>>>  On 2018-03-24 07:26, Daniel Lezcano wrote:
>>>>>  On 18/03/2018 00:29, Paul Cercueil wrote:
>>>>>>  This driver will use the TCU (Timer Counter Unit) present on the
>>>>>>  Ingenic JZ47xx SoCs to provide the kernel with a clocksource and timers.
>>>>> 
>>>>>  Please provide a more detailed description about the timer.
>>>> 
>>>>  There's a doc file for that :)
>>> 
>>> Usually, when there is a new driver I ask for a description in the
>>> changelog for reference.
>>> 
>>>>>  Where is the clocksource ?
>>>> 
>>>>  Right, there is no clocksource, just timers.
>>>> 
>>>>>  I don't see the point of using channel idx and pwm checking here.
>>>>> 
>>>>>  There is one clockevent, why create multiple channels ? Can't you
>>>>>  stick to the usual init routine for a timer.
>>>> 
>>>>  So the idea is that we use all the TCU channels that won't be used
>>>>  for PWM as timers. Hence the PWM checking. Why is this bad?
>>> 
>>> It is not bad but arguable. By checking the channels used by the pwm in
>>> the code, you introduce an adherence between two subsystems even if it
>>> is just related to the DT parsing part.
>>> 
>>> As it is not needed to have more than one timer in the time framework
>>> (at least with the same characteristics), the pwm channels check is
>>> pointless. We can assume the author of the DT file is smart enough to
>>> prevent conflicts and define a pwm and a timer properly instead of
>>> adding more code complexity.
>>> 
>>> In addition, simplifying the code will allow you to use the timer-of
>>> code and reduce very significantly the init function.
>> 
>> That's what I had in my V1 and V2, my DT node for the timer-ingenic driver
>> had a "timers" property (e.g. "timers = <4 5>;") to select the channels
>> that should be used as timers. Then Rob told me I shouldn't do that, and
>> instead detect the channels that will be used for PWM.
>> 
> 
> [ ... ]
> 
> How do you specify the channels used for PWM ?

To detect the channels that will be used as PWM, I parse the whole
devicetree searching for "pwms" properties, check that the PWM handle is
for our TCU PWM driver, then read the PWM number from there.

Of course it's hackish, and it only works for devicetree. I preferred the
method with the "timers" property.
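
Stripped of the devicetree walk, the approach described above reduces to building a mask of PWM-claimed channels and handing every other TCU channel to the timer code. The userspace sketch below shows only that bookkeeping. It assumes the channel numbers have already been extracted from the "pwms" specifiers; TCU_NUM_CHANNELS = 8 and timer_channel_mask() are hypothetical, not taken from the driver.

```c
#include <stddef.h>
#include <stdint.h>

#define TCU_NUM_CHANNELS 8	/* hypothetical channel count */

/*
 * Given the channel numbers found in "pwms" specifiers that point at the
 * TCU PWM controller, return a bitmask of the channels left over for use
 * as timers. Channel numbers outside the TCU range are ignored.
 */
static uint32_t timer_channel_mask(const unsigned int *pwm_chans, size_t n)
{
	uint32_t all = (UINT32_C(1) << TCU_NUM_CHANNELS) - 1;
	uint32_t pwm = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (pwm_chans[i] < TCU_NUM_CHANNELS)
			pwm |= UINT32_C(1) << pwm_chans[i];

	return all & ~pwm;
}
```

With PWM on channels 2 and 3 the mask is 0xf3, leaving channels 0, 1 and 4 to 7 for timers. The explicit "timers = <4 5>;" property from V1/V2 would instead supply such a mask directly, without scanning other nodes.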

>>>>>> 
>>>>>>  +config INGENIC_TIMER
>>>>>>  +    bool "Clocksource/timer using the TCU in Ingenic JZ SoCs"
>>>>>>  +    depends on MACH_INGENIC || COMPILE_TEST
>>>>> 
>>>>>  bool "Clocksource/timer using the TCU in Ingenic JZ SoCs" if COMPILE_TEST
>>>>> 
>>>>>  Remove the depends MACH_INGENIC.
>>>> 
>>>>  This driver is not useful on anything else than Ingenic SoCs, why
>>>>  should I remove MACH_INGENIC then?
>>> 
>>> For COMPILE_TEST on x86.
>> 
>> Well that's a logical OR right here, so it will work...
> 
> Right, I missed the second part of the condition. For consistency
> reasons, we don't add a dependency on the platform; the platform will
> select the option instead. Look at the other timer options and you will
> see there are no MACH dependencies. I'm trying to consolidate all these
> options to have the same format and hopefully factor them out.

I'm all for factorisation, but what I dislike about not depending on
MACH_INGENIC is that the driver now appears in the menuconfig for
every arch, even if it only applies to one MIPS SoC.

Regards,
-Paul

* Re: [PATCH 1/2] perf: riscv: preliminary RISC-V support
From: Alex Solomatnikov @ 2018-03-31 22:47 UTC (permalink / raw)
  To: Alan Kao
  Cc: Palmer Dabbelt, Albert Ou, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Jonathan Corbet, linux-riscv, linux-doc,
	linux-kernel, Nick Hu, Greentime Hu
In-Reply-To: <20180329023024.GA32659@andestech.com>

You can add a skew between cores in qemu, something like this:

case CSR_INSTRET:
    return core_id() * cpu_get_host_ticks() / 10;
case CSR_CYCLE:
    return cpu_get_host_ticks();

Alex
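
With a per-core skew like the one above, a task that migrates can read a counter value smaller than the one it read before. Alan's analysis in the quoted reply (9223372034948899840 is 0x7fffffff8e66a400, a wraparound under a 63-bit mask) can be reproduced with a small userspace C sketch. The delta formula is an assumption based on that description, and the two sample readings are made-up values, not taken from the trace.

```c
#include <stdint.h>

/* Assumption: deltas are computed as (now - prev) masked to 63 bits,
 * the mask width mentioned in the quoted reply. */
#define COUNTER_MASK ((UINT64_C(1) << 63) - 1)

/* Masked difference between two raw counter readings. When "now" was
 * read on a core whose counter lags the core that produced "prev", the
 * unsigned subtraction wraps and the mask leaves a value just below 2^63. */
static uint64_t counter_delta(uint64_t prev, uint64_t now)
{
	return (now - prev) & COUNTER_MASK;
}
```

A reading 1905875968 ticks behind the previous one (2000000000 followed by 94124032) gives exactly 9223372034948899840, the instructions count in the perf stat output, which is consistent with the wraparound explanation. Resynchronizing hwc->prev_count on migration, as Alan suggests, would avoid feeding such a pair into the subtraction.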

On Wed, Mar 28, 2018 at 7:30 PM, Alan Kao <alankao@andestech.com> wrote:
> Hi Alex,
>
> I appreciate your reply and tests.
>
> On Wed, Mar 28, 2018 at 03:58:41PM -0700, Alex Solomatnikov wrote:
>> Did you test this code?
>
> I did test this patch on QEMU's virt model with multi-hart, which is the only
> RISC-V machine I have for now.  But as I mentioned in
> https://github.com/riscv/riscv-qemu/pull/115 , the hardware counter support
> in QEMU does not fully conform to the 1.10 Priv-Spec, so I had to slightly
> tweak the code to make reading work.
>
> Specifically, the read to cycle and instret in QEMU looks like this:
> ...
> case CSR_INSTRET:
> case CSR_CYCLE:
> //  if (ctr_ok) {
>         return cpu_get_host_ticks();
> //  }
>     break;
> ...
> and the two lines of comment was the tweak.
>
> In such an environment, I did not get anything unexpected.  No matter which of them
> is requested, QEMU returns the host's tick.
>
>>
>> I got funny numbers when I tried to run it on HiFive Unleashed:
>>
>> perf stat mem-latency
>> ...
>>
>>  Performance counter stats for 'mem-latency':
>>
>>         157.907000      task-clock (msec)         #    0.940 CPUs utilized
>>                  1      context-switches          #    0.006 K/sec
>>                  1      cpu-migrations            #    0.006 K/sec
>>               4102      page-faults               #    0.026 M/sec
>>          157923752      cycles                    #    1.000 GHz
>> 9223372034948899840      instructions              # 58403957087.78  insn per cycle
>>    <not supported>      branches
>>    <not supported>      branch-misses
>>
>>        0.168046000 seconds time elapsed
>>
>>
>> Tracing read_counter(), I see this:
>>
>> Jan  1 00:41:50 buildroot user.info kernel: [ 2510.058809] CPU 3: read_counter  idx=0 val=2528358954912
>> Jan  1 00:41:50 buildroot user.info kernel: [ 2510.063339] CPU 3: read_counter  idx=1 val=53892244920
>> Jan  1 00:41:50 buildroot user.info kernel: [ 2510.118160] CPU 3: read_counter  idx=0 val=2528418303035
>> Jan  1 00:41:50 buildroot user.info kernel: [ 2510.122694] CPU 3: read_counter  idx=1 val=53906699665
>> Jan  1 00:41:50 buildroot user.info kernel: [ 2510.216736] CPU 1: read_counter  idx=0 val=2528516878664
>> Jan  1 00:41:50 buildroot user.info kernel: [ 2510.221270] CPU 1: read_counter  idx=1 val=51986369142
>>
>> It looks like the counter values from different cores are subtracted and
>> wraparound occurs.
>>
>
> Thanks for the hint.  It makes sense.  9223372034948899840 is 7fffffff8e66a400,
> which should be a wraparound with the mask I set (63-bit) in the code.
>
> I will try this direction.  Ideally, we can solve it by explicitly syncing the
> hwc->prev_count when a cpu migration event happens.
>
>>
>> Also, core IDs and socket IDs are wrong in perf report:
>>
>
> As Palmer has replied to this, I have no comment here.
>
>> perf report --header -I
>> Error:
>> The perf.data file has no samples!
>> # ========
>> # captured on: Thu Jan  1 02:52:07 1970
>> # hostname : buildroot
>> # os release : 4.15.0-00045-g0d7c030-dirty
>> # perf version : 4.15.0
>> # arch : riscv64
>> # nrcpus online : 4
>> # nrcpus avail : 5
>> # total memory : 8188340 kB
>> # cmdline : /usr/bin/perf record -F 1000 lat_mem_rd -P 1 -W 1 -N 1 -t 10
>> # event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } = 1000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
>> # sibling cores   : 1
>> # sibling cores   : 2
>> # sibling cores   : 3
>> # sibling cores   : 4
>> # sibling threads : 1
>> # sibling threads : 2
>> # sibling threads : 3
>> # sibling threads : 4
>> # CPU 0: Core ID -1, Socket ID -1
>> # CPU 1: Core ID 0, Socket ID -1
>> # CPU 2: Core ID 0, Socket ID -1
>> # CPU 3: Core ID 0, Socket ID -1
>> # CPU 4: Core ID 0, Socket ID -1
>> # pmu mappings: cpu = 4, software = 1
>> # CPU cache info:
>> #  L1 Instruction          32K [1]
>> #  L1 Data                 32K [1]
>> #  L1 Instruction          32K [2]
>> #  L1 Data                 32K [2]
>> #  L1 Instruction          32K [3]
>> #  L1 Data                 32K [3]
>> # missing features: TRACING_DATA BUILD_ID CPUDESC CPUID NUMA_TOPOLOGY BRANCH_STACK GROUP_DESC AUXTRACE STAT
>> # ========
>>
>>
>> Alex
>>
>
> Many thanks,
> Alan
>
>> On Mon, Mar 26, 2018 at 12:57 AM, Alan Kao <alankao@andestech.com> wrote:
>>
>> > This patch provides a basic PMU, riscv_base_pmu, which supports two
>> > general hardware events, instructions and cycles.  Furthermore, this
>> > PMU serves as a reference implementation to ease future ports.
>> >
>> > riscv_base_pmu should be able to run on any RISC-V machine that
>> > conforms to the Priv-Spec.  Note that the latest qemu model doesn't
>> > fully support the behavior required by Priv-Spec 1.10 yet, but a
>> > workaround should be easy with very small fixes.  Please check
>> > https://github.com/riscv/riscv-qemu/pull/115 for future updates.
>> >
>> > Cc: Nick Hu <nickhu@andestech.com>
>> > Cc: Greentime Hu <greentime@andestech.com>
>> > Signed-off-by: Alan Kao <alankao@andestech.com>
>> > ---
>> >  arch/riscv/Kconfig                  |  12 +
>> >  arch/riscv/include/asm/perf_event.h |  76 +++++-
>> >  arch/riscv/kernel/Makefile          |   1 +
>> >  arch/riscv/kernel/perf_event.c      | 469 ++++++++++++++++++++++++++++++++++++
>> >  4 files changed, 554 insertions(+), 4 deletions(-)
>> >  create mode 100644 arch/riscv/kernel/perf_event.c
>> >
>> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> > index 310b9a5d6737..dd4aecfb5265 100644
>> > --- a/arch/riscv/Kconfig
>> > +++ b/arch/riscv/Kconfig
>> > @@ -195,6 +195,18 @@ config RISCV_ISA_C
>> >  config RISCV_ISA_A
>> >         def_bool y
>> >
>> > +menu "PMU type"
>> > +       depends on PERF_EVENTS
>> > +
>> > +config RISCV_BASE_PMU
>> > +       bool "Base Performance Monitoring Unit"
>> > +       def_bool y
>> > +       help
>> > +         A base PMU that serves as a reference implementation and has
>> > +         limited perf features.
>> > +
>> > +endmenu
>> > +
>> >  endmenu
>> >
>> >  menu "Kernel type"
>> > diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h
>> > index e13d2ff29e83..98e2efb02d25 100644
>> > --- a/arch/riscv/include/asm/perf_event.h
>> > +++ b/arch/riscv/include/asm/perf_event.h
>> > @@ -1,13 +1,81 @@
>> > +/* SPDX-License-Identifier: GPL-2.0 */
>> >  /*
>> >   * Copyright (C) 2018 SiFive
>> > + * Copyright (C) 2018 Andes Technology Corporation
>> >   *
>> > - * This program is free software; you can redistribute it and/or
>> > - * modify it under the terms of the GNU General Public Licence
>> > - * as published by the Free Software Foundation; either version
>> > - * 2 of the Licence, or (at your option) any later version.
>> >   */
>> >
>> >  #ifndef _ASM_RISCV_PERF_EVENT_H
>> >  #define _ASM_RISCV_PERF_EVENT_H
>> >
>> > +#include <linux/perf_event.h>
>> > +#include <linux/ptrace.h>
>> > +
>> > +#define RISCV_BASE_COUNTERS    2
>> > +
>> > +/*
>> > + * The RISCV_MAX_COUNTERS parameter should be specified.
>> > + */
>> > +
>> > +#ifdef CONFIG_RISCV_BASE_PMU
>> > +#define RISCV_MAX_COUNTERS     2
>> > +#endif
>> > +
>> > +#ifndef RISCV_MAX_COUNTERS
>> > +#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU."
>> > +#endif
>> > +
>> > +/*
>> > + * These are the indexes of bits in counteren register *minus* 1,
>> > + * except for cycle.  It would be coherent if they could be directly
>> > + * mapped to the counteren bit definitions, but there is a *time*
>> > + * register at counteren[1].  Per-cpu structures are a scarce resource here.
>> > + *
>> > + * According to the spec, an implementation can support counters up to
>> > + * mhpmcounter31, but many high-end processors have at most 6 general
>> > + * PMCs, so we give definitions only up to MHPMCOUNTER8 here.
>> > + */
>> > +#define RISCV_PMU_CYCLE                0
>> > +#define RISCV_PMU_INSTRET      1
>> > +#define RISCV_PMU_MHPMCOUNTER3 2
>> > +#define RISCV_PMU_MHPMCOUNTER4 3
>> > +#define RISCV_PMU_MHPMCOUNTER5 4
>> > +#define RISCV_PMU_MHPMCOUNTER6 5
>> > +#define RISCV_PMU_MHPMCOUNTER7 6
>> > +#define RISCV_PMU_MHPMCOUNTER8 7
>> > +
>> > +#define RISCV_OP_UNSUPP                (-EOPNOTSUPP)
>> > +
>> > +struct cpu_hw_events {
>> > +       /* # currently enabled events*/
>> > +       int                     n_events;
>> > +       /* currently enabled events */
>> > +       struct perf_event       *events[RISCV_MAX_COUNTERS];
>> > +       /* vendor-defined PMU data */
>> > +       void                    *platform;
>> > +};
>> > +
>> > +struct riscv_pmu {
>> > +       struct pmu      *pmu;
>> > +
>> > +       /* generic hw/cache events table */
>> > +       const int       *hw_events;
>> > +       const int       (*cache_events)[PERF_COUNT_HW_CACHE_MAX]
>> > +                                      [PERF_COUNT_HW_CACHE_OP_MAX]
>> > +                                      [PERF_COUNT_HW_CACHE_RESULT_MAX];
>> > +       /* method used to map hw/cache events */
>> > +       int             (*map_hw_event)(u64 config);
>> > +       int             (*map_cache_event)(u64 config);
>> > +
>> > +       /* max generic hw events in map */
>> > +       int             max_events;
>> > +       /* total number of counters, 2 (base) + x (general) */
>> > +       int             num_counters;
>> > +       /* the width of the counter */
>> > +       int             counter_width;
>> > +
>> > +       /* vendor-defined PMU features */
>> > +       void            *platform;
>> > +};
>> > +
>> >  #endif /* _ASM_RISCV_PERF_EVENT_H */
>> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>> > index 196f62ffc428..849c38d9105f 100644
>> > --- a/arch/riscv/kernel/Makefile
>> > +++ b/arch/riscv/kernel/Makefile
>> > @@ -36,5 +36,6 @@ obj-$(CONFIG_SMP)             += smp.o
>> >  obj-$(CONFIG_MODULES)          += module.o
>> >  obj-$(CONFIG_FUNCTION_TRACER)  += mcount.o
>> >  obj-$(CONFIG_FUNCTION_GRAPH_TRACER)    += ftrace.o
>> > +obj-$(CONFIG_PERF_EVENTS)      += perf_event.o
>> >
>> >  clean:
>> > diff --git a/arch/riscv/kernel/perf_event.c b/arch/riscv/kernel/perf_event.c
>> > new file mode 100644
>> > index 000000000000..b78cb486683b
>> > --- /dev/null
>> > +++ b/arch/riscv/kernel/perf_event.c
>> > @@ -0,0 +1,469 @@
>> > +/* SPDX-License-Identifier: GPL-2.0 */
>> > +/*
>> > + * Copyright (C) 2008 Thomas Gleixner <tglx@linutronix.de>
>> > + * Copyright (C) 2008-2009 Red Hat, Inc., Ingo Molnar
>> > + * Copyright (C) 2009 Jaswinder Singh Rajput
>> > + * Copyright (C) 2009 Advanced Micro Devices, Inc., Robert Richter
>> > + * Copyright (C) 2008-2009 Red Hat, Inc., Peter Zijlstra
>> > + * Copyright (C) 2009 Intel Corporation, <markus.t.metzger@intel.com>
>> > + * Copyright (C) 2009 Google, Inc., Stephane Eranian
>> > + * Copyright 2014 Tilera Corporation. All Rights Reserved.
>> > + * Copyright (C) 2018 Andes Technology Corporation
>> > + *
>> > + * Perf_events support for RISC-V platforms.
>> > + *
>> > + * Since the spec. (as of now, Priv-Spec 1.10) does not provide enough
>> > + * functionality for perf event to fully work, this file provides
>> > + * the very basic framework only.
>> > + *
>> > + * For platform ports, please check Documentation/riscv/pmu.txt.
>> > + *
>> > + * The copyright lines include those of the x86 and tile implementations.
>> > + */
>> > +
>> > +#include <linux/kprobes.h>
>> > +#include <linux/kernel.h>
>> > +#include <linux/kdebug.h>
>> > +#include <linux/mutex.h>
>> > +#include <linux/bitmap.h>
>> > +#include <linux/irq.h>
>> > +#include <linux/interrupt.h>
>> > +#include <linux/perf_event.h>
>> > +#include <linux/atomic.h>
>> > +#include <asm/perf_event.h>
>> > +
>> > +static const struct riscv_pmu *riscv_pmu __read_mostly;
>> > +static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
>> > +
>> > +/*
>> > + * Hardware & cache maps and their methods
>> > + */
>> > +
>> > +static const int riscv_hw_event_map[] = {
>> > +       [PERF_COUNT_HW_CPU_CYCLES]              = RISCV_PMU_CYCLE,
>> > +       [PERF_COUNT_HW_INSTRUCTIONS]            = RISCV_PMU_INSTRET,
>> > +       [PERF_COUNT_HW_CACHE_REFERENCES]        = RISCV_OP_UNSUPP,
>> > +       [PERF_COUNT_HW_CACHE_MISSES]            = RISCV_OP_UNSUPP,
>> > +       [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]     = RISCV_OP_UNSUPP,
>> > +       [PERF_COUNT_HW_BRANCH_MISSES]           = RISCV_OP_UNSUPP,
>> > +       [PERF_COUNT_HW_BUS_CYCLES]              = RISCV_OP_UNSUPP,
>> > +};
>> > +
>> > +#define C(x) PERF_COUNT_HW_CACHE_##x
>> > +static const int riscv_cache_event_map[PERF_COUNT_HW_CACHE_MAX]
>> > +[PERF_COUNT_HW_CACHE_OP_MAX]
>> > +[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
>> > +       [C(L1D)] = {
>> > +               [C(OP_READ)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +               [C(OP_WRITE)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +               [C(OP_PREFETCH)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +       },
>> > +       [C(L1I)] = {
>> > +               [C(OP_READ)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +               [C(OP_WRITE)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +               [C(OP_PREFETCH)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +       },
>> > +       [C(LL)] = {
>> > +               [C(OP_READ)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +               [C(OP_WRITE)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +               [C(OP_PREFETCH)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +       },
>> > +       [C(DTLB)] = {
>> > +               [C(OP_READ)] = {
>> > +                       [C(RESULT_ACCESS)] =  RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] =  RISCV_OP_UNSUPP,
>> > +               },
>> > +               [C(OP_WRITE)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +               [C(OP_PREFETCH)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +       },
>> > +       [C(ITLB)] = {
>> > +               [C(OP_READ)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +               [C(OP_WRITE)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +               [C(OP_PREFETCH)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +       },
>> > +       [C(BPU)] = {
>> > +               [C(OP_READ)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +               [C(OP_WRITE)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +               [C(OP_PREFETCH)] = {
>> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
>> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
>> > +               },
>> > +       },
>> > +};
>> > +
>> > +static int riscv_map_hw_event(u64 config)
>> > +{
>> > +       if (config >= riscv_pmu->max_events)
>> > +               return -EINVAL;
>> > +
>> > +       return riscv_pmu->hw_events[config];
>> > +}
>> > +
>> > +int riscv_map_cache_decode(u64 config, unsigned int *type,
>> > +                          unsigned int *op, unsigned int *result)
>> > +{
>> > +       return -ENOENT;
>> > +}
>> > +
>> > +static int riscv_map_cache_event(u64 config)
>> > +{
>> > +       unsigned int type, op, result;
>> > +       int err = -ENOENT;
>> > +       int code;
>> > +
>> > +       err = riscv_map_cache_decode(config, &type, &op, &result);
>> > +       if (!riscv_pmu->cache_events || err)
>> > +               return err;
>> > +
>> > +       if (type >= PERF_COUNT_HW_CACHE_MAX ||
>> > +           op >= PERF_COUNT_HW_CACHE_OP_MAX ||
>> > +           result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
>> > +               return -EINVAL;
>> > +
>> > +       code = (*riscv_pmu->cache_events)[type][op][result];
>> > +       if (code == RISCV_OP_UNSUPP)
>> > +               return -EINVAL;
>> > +
>> > +       return code;
>> > +}
>> > +
>> > +/*
>> > + * Low-level functions: reading/writing counters
>> > + */
>> > +
>> > +static inline u64 read_counter(int idx)
>> > +{
>> > +       u64 val = 0;
>> > +
>> > +       switch (idx) {
>> > +       case RISCV_PMU_CYCLE:
>> > +               val = csr_read(cycle);
>> > +               break;
>> > +       case RISCV_PMU_INSTRET:
>> > +               val = csr_read(instret);
>> > +               break;
>> > +       default:
>> > +               WARN_ON_ONCE(idx < 0 || idx > RISCV_MAX_COUNTERS);
>> > +               return -EINVAL;
>> > +       }
>> > +
>> > +       return val;
>> > +}
>> > +
>> > +static inline void write_counter(int idx, u64 value)
>> > +{
>> > +       /* currently not supported */
>> > +}
>> > +
>> > +/*
>> > + * pmu->read: read and update the counter
>> > + *
>> > + * Other architectures' implementations often have an xxx_perf_event_update
>> > + * routine, which can return counter values when called in the IRQ, but
>> > + * return void when called by the pmu->read method.
>> > + */
>> > +static void riscv_pmu_read(struct perf_event *event)
>> > +{
>> > +       struct hw_perf_event *hwc = &event->hw;
>> > +       u64 prev_raw_count, new_raw_count;
>> > +       u64 oldval;
>> > +       int idx = hwc->idx;
>> > +       u64 delta;
>> > +
>> > +       do {
>> > +               prev_raw_count = local64_read(&hwc->prev_count);
>> > +               new_raw_count = read_counter(idx);
>> > +
>> > +               oldval = local64_cmpxchg(&hwc->prev_count, prev_raw_count,
>> > +                                        new_raw_count);
>> > +       } while (oldval != prev_raw_count);
>> > +
>> > +       /*
>> > +        * delta is the value to update the counter we maintain in the kernel.
>> > +        */
>> > +       delta = (new_raw_count - prev_raw_count) &
>> > +               ((1ULL << riscv_pmu->counter_width) - 1);
>> > +       local64_add(delta, &event->count);
>> > +       /*
>> > +        * Something like local64_sub(delta, &hwc->period_left) here is
>> > +        * needed if there is an interrupt for perf.
>> > +        */
>> > +}
>> > +
>> > +/*
>> > + * State transition functions:
>> > + *
>> > + * stop()/start() & add()/del()
>> > + */
>> > +
>> > +/*
>> > + * pmu->stop: stop the counter
>> > + */
>> > +static void riscv_pmu_stop(struct perf_event *event, int flags)
>> > +{
>> > +       struct hw_perf_event *hwc = &event->hw;
>> > +
>> > +       WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
>> > +       hwc->state |= PERF_HES_STOPPED;
>> > +
>> > +       if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
>> > +               riscv_pmu_read(event);
>> > +               hwc->state |= PERF_HES_UPTODATE;
>> > +       }
>> > +}
>> > +
>> > +/*
>> > + * pmu->start: start the event.
>> > + */
>> > +static void riscv_pmu_start(struct perf_event *event, int flags)
>> > +{
>> > +       struct hw_perf_event *hwc = &event->hw;
>> > +
>> > +       if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
>> > +               return;
>> > +
>> > +       if (flags & PERF_EF_RELOAD) {
>> > +               WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
>> > +
>> > +               /*
>> > +                * Set the counter to the period to the next interrupt here,
>> > +                * if you have any.
>> > +                */
>> > +       }
>> > +
>> > +       hwc->state = 0;
>> > +       perf_event_update_userpage(event);
>> > +
>> > +       /*
>> > +        * Since we cannot write to counters, this serves as an initialization
>> > +        * to the delta-mechanism in pmu->read(); otherwise, the delta would be
>> > +        * wrong when pmu->read is called for the first time.
>> > +        */
>> > +       if (local64_read(&hwc->prev_count) == 0)
>> > +               local64_set(&hwc->prev_count, read_counter(hwc->idx));
>> > +}
>> > +
>> > +/*
>> > + * pmu->add: add the event to PMU.
>> > + */
>> > +static int riscv_pmu_add(struct perf_event *event, int flags)
>> > +{
>> > +       struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> > +       struct hw_perf_event *hwc = &event->hw;
>> > +
>> > +       if (cpuc->n_events == riscv_pmu->num_counters)
>> > +               return -ENOSPC;
>> > +
>> > +       /*
>> > +        * We don't have general counters, so no binding-event-to-counter
>> > +        * process here.
>> > +        *
>> > +        * Indexing using hwc->config generally does not work, since config may
>> > +        * contain extra information, but here the only info we have in
>> > +        * hwc->config is the event index.
>> > +        */
>> > +       hwc->idx = hwc->config;
>> > +       cpuc->events[hwc->idx] = event;
>> > +       cpuc->n_events++;
>> > +
>> > +       hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
>> > +
>> > +       if (flags & PERF_EF_START)
>> > +               riscv_pmu_start(event, PERF_EF_RELOAD);
>> > +
>> > +       return 0;
>> > +}
>> > +
>> > +/*
>> > + * pmu->del: delete the event from PMU.
>> > + */
>> > +static void riscv_pmu_del(struct perf_event *event, int flags)
>> > +{
>> > +       struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> > +       struct hw_perf_event *hwc = &event->hw;
>> > +
>> > +       cpuc->events[hwc->idx] = NULL;
>> > +       cpuc->n_events--;
>> > +       riscv_pmu_stop(event, PERF_EF_UPDATE);
>> > +       perf_event_update_userpage(event);
>> > +}
>> > +
>> > +/*
>> > + * Interrupt
>> > + */
>> > +
>> > +static DEFINE_MUTEX(pmc_reserve_mutex);
>> > +typedef void (*perf_irq_t)(void *riscv_perf_irq);
>> > +perf_irq_t perf_irq;
>> > +
>> > +void riscv_pmu_handle_irq(void *riscv_perf_irq)
>> > +{
>> > +}
>> > +
>> > +static perf_irq_t reserve_pmc_hardware(void)
>> > +{
>> > +       perf_irq_t old;
>> > +
>> > +       mutex_lock(&pmc_reserve_mutex);
>> > +       old = perf_irq;
>> > +       perf_irq = &riscv_pmu_handle_irq;
>> > +       mutex_unlock(&pmc_reserve_mutex);
>> > +
>> > +       return old;
>> > +}
>> > +
>> > +void release_pmc_hardware(void)
>> > +{
>> > +       mutex_lock(&pmc_reserve_mutex);
>> > +       perf_irq = NULL;
>> > +       mutex_unlock(&pmc_reserve_mutex);
>> > +}
>> > +
>> > +/*
>> > + * Event Initialization
>> > + */
>> > +
>> > +static atomic_t riscv_active_events;
>> > +
>> > +static void riscv_event_destroy(struct perf_event *event)
>> > +{
>> > +       if (atomic_dec_return(&riscv_active_events) == 0)
>> > +               release_pmc_hardware();
>> > +}
>> > +
>> > +static int riscv_event_init(struct perf_event *event)
>> > +{
>> > +       struct perf_event_attr *attr = &event->attr;
>> > +       struct hw_perf_event *hwc = &event->hw;
>> > +       perf_irq_t old_irq_handler = NULL;
>> > +       int code;
>> > +
>> > +       if (atomic_inc_return(&riscv_active_events) == 1)
>> > +               old_irq_handler = reserve_pmc_hardware();
>> > +
>> > +       if (old_irq_handler) {
>> > +               pr_warn("PMC hardware busy (reserved by oprofile)\n");
>> > +               atomic_dec(&riscv_active_events);
>> > +               return -EBUSY;
>> > +       }
>> > +
>> > +       switch (event->attr.type) {
>> > +       case PERF_TYPE_HARDWARE:
>> > +               code = riscv_pmu->map_hw_event(attr->config);
>> > +               break;
>> > +       case PERF_TYPE_HW_CACHE:
>> > +               code = riscv_pmu->map_cache_event(attr->config);
>> > +               break;
>> > +       case PERF_TYPE_RAW:
>> > +               return -EOPNOTSUPP;
>> > +       default:
>> > +               return -ENOENT;
>> > +       }
>> > +
>> > +       event->destroy = riscv_event_destroy;
>> > +       if (code < 0) {
>> > +               event->destroy(event);
>> > +               return code;
>> > +       }
>> > +
>> > +       /*
>> > +        * idx is set to -1 because the index of a general event should not be
>> > +        * decided until binding to some counter in pmu->add().
>> > +        *
>> > +        * But since we don't have such support, later in pmu->add(), we just
>> > +        * use hwc->config as the index instead.
>> > +        */
>> > +       hwc->config = code;
>> > +       hwc->idx = -1;
>> > +
>> > +       return 0;
>> > +}
>> > +
>> > +/*
>> > + * Initialization
>> > + */
>> > +
>> > +static struct pmu min_pmu = {
>> > +       .name           = "riscv-base",
>> > +       .event_init     = riscv_event_init,
>> > +       .add            = riscv_pmu_add,
>> > +       .del            = riscv_pmu_del,
>> > +       .start          = riscv_pmu_start,
>> > +       .stop           = riscv_pmu_stop,
>> > +       .read           = riscv_pmu_read,
>> > +};
>> > +
>> > +static const struct riscv_pmu riscv_base_pmu = {
>> > +       .pmu = &min_pmu,
>> > +       .max_events = ARRAY_SIZE(riscv_hw_event_map),
>> > +       .map_hw_event = riscv_map_hw_event,
>> > +       .hw_events = riscv_hw_event_map,
>> > +       .map_cache_event = riscv_map_cache_event,
>> > +       .cache_events = &riscv_cache_event_map,
>> > +       .counter_width = 63,
>> > +       .num_counters = RISCV_BASE_COUNTERS + 0,
>> > +};
>> > +
>> > +struct pmu * __weak __init riscv_init_platform_pmu(void)
>> > +{
>> > +       riscv_pmu = &riscv_base_pmu;
>> > +       return riscv_pmu->pmu;
>> > +}
>> > +
>> > +int __init init_hw_perf_events(void)
>> > +{
>> > +       struct pmu *pmu = riscv_init_platform_pmu();
>> > +
>> > +       perf_irq = NULL;
>> > +       perf_pmu_register(pmu, "cpu", PERF_TYPE_RAW);
>> > +       return 0;
>> > +}
>> > +arch_initcall(init_hw_perf_events);
>> > --
>> > 2.16.2
>> >
>> >
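For readers following the read path: riscv_pmu_read() publishes the new raw value into hwc->prev_count with a compare-and-exchange loop and adds the masked difference to event->count. riscv_pmu_start() seeds prev_count so that the first read does not count from reset. The sketch below is a rough user-space model of that protocol, using C11 <stdatomic.h> in place of local64_*(). struct model_event, model_event_start(), model_event_read() and the fake counter are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for hwc->prev_count, event->count, read_counter(). */
struct model_event {
	_Atomic uint64_t prev_count;	/* last raw counter value seen */
	_Atomic uint64_t count;		/* accumulated event count */
	unsigned int width;		/* counter width in bits, 0 < width < 64 */
	uint64_t (*read_hw)(void);	/* reads the raw hardware counter */
};

/* Fake free-running counter for the example: advances by 100 per read. */
static uint64_t fake_hw_value;
static uint64_t fake_read_hw(void)
{
	fake_hw_value += 100;
	return fake_hw_value;
}

/* Mirrors riscv_pmu_start(): seed prev_count if it was never set. */
static void model_event_start(struct model_event *ev)
{
	if (atomic_load(&ev->prev_count) == 0)
		atomic_store(&ev->prev_count, ev->read_hw());
}

/* Mirrors riscv_pmu_read(): swap in the new raw value, then add the delta. */
static void model_event_read(struct model_event *ev)
{
	uint64_t prev, now;

	assert(ev->width > 0 && ev->width < 64);
	/* Retry if another reader updated prev_count in between. */
	do {
		prev = atomic_load(&ev->prev_count);
		now = ev->read_hw();
	} while (!atomic_compare_exchange_strong(&ev->prev_count, &prev, now));

	atomic_fetch_add(&ev->count, (now - prev) & ((1ULL << ev->width) - 1));
}
```

Like the patch, the model assumes that every read_hw() call observes the same counter. If consecutive reads came from different harts, the masked subtraction would wrap as discussed in this thread.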

* Re: [PATCH 00/32] docs/vm: convert to ReST format
From: Mike Rapoport @ 2018-04-01  6:38 UTC (permalink / raw)
  To: Jonathan Corbet, Andrew Morton
  Cc: Andrey Ryabinin, Richard Henderson, Ivan Kokshaysky, Matt Turner,
	Tony Luck, Fenghua Yu, Ralf Baechle, James Hogan,
	Michael Ellerman, Alexander Viro, linux-kernel, linux-doc,
	kasan-dev, linux-alpha, linux-ia64, linux-mips, linuxppc-dev,
	linux-fsdevel, linux-mm
In-Reply-To: <20180329154607.3d8bda75@lwn.net>

(added akpm)

On Thu, Mar 29, 2018 at 03:46:07PM -0600, Jonathan Corbet wrote:
> On Wed, 21 Mar 2018 21:22:16 +0200
> Mike Rapoport <rppt@linux.vnet.ibm.com> wrote:
> 
> > These patches convert files in Documentation/vm to ReST format, add an
> > initial index and link it to the top level documentation.
> > 
> > There are no content changes in the documentation, except for a few
> > spelling fixes. The relatively large diffstat stems from the indentation
> > and paragraph wrapping changes.
> > 
> > I've tried to keep the formatting as consistent as possible, but I may
> > have missed some places that needed markup and added markup where it was
> > not necessary.
> 
> So I've been pondering on these for a bit.  It looks like a reasonable and
> straightforward RST conversion, no real complaints there.  But I do have a
> couple of concerns...
> 
> One is that, as we move documentation into RST, I'm really trying to
> organize it a bit so that it is better tuned to the various audiences we
> have.  For example, ksm.txt is going to be of interest to sysadmin types,
> who might want to tune it.  mmu_notifier.txt is of interest to ...
> somebody, but probably nobody who is thinking in user space.  And so on.
> 
> So I would really like to see this material split up and put into the
> appropriate places in the RST hierarchy - admin-guide for administrative
> stuff, core-api for kernel development topics, etc.  That, of course,
> could be done separately from the RST conversion, but I suspect I know
> what will (or will not) happen if we agree to defer that for now :)

Well, I was actually planning on doing that ;-)

My thinking was to start with a mechanical RST conversion and then to start
working on the contents and ordering of the documentation. Some of the
existing files, e.g. ksm.txt, can be moved as-is into the appropriate
places; others, like transhuge.txt, should at least be split into admin/user
and developer guides.

Another problem with many of the existing mm docs is that they are more like
developer notes, and it wouldn't be straightforward to assign them to a
particular topic.

I believe that keeping the mm docs together will give better visibility of
what (little) mm documentation we have and will make the updates easier.
The documents that fit well into a certain topic could be linked there. For
instance:

-------------------------
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 5bb9161..8f6c6e6 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -63,6 +63,7 @@ configure specific aspects of kernel behavior to your liking.
    pm/index
    thunderbolt
    LSM/index
+   vm/index
 
 .. only::  subproject and html
 
diff --git a/Documentation/admin-guide/vm/index.rst b/Documentation/admin-guide/vm/index.rst
new file mode 100644
index 0000000..d86f1c8
--- /dev/null
+++ b/Documentation/admin-guide/vm/index.rst
@@ -0,0 +1,5 @@
+==============================================
+Knobs and Buttons for Memory Management Tuning
+==============================================
+
+* :ref:`ksm <ksm>`
-------------------------

> The other is the inevitable merge conflicts that changing that many doc
> files will create.  Sending the patches through Andrew could minimize
> that, I guess, or at least make it his problem.  Alternatively, we could
> try to do it as an end-of-merge-window sort of thing.  I can try to manage
> that, but an ack or two from the mm crowd would be nice to have.

I can rebase on top of Andrew's tree if that would help to minimize the
merge conflicts.
 
> Thanks,
> 
> jon
> 

-- 
Sincerely yours,
Mike.

* [PATCH] Documentation: devices.rst: minor plural grammar fix
From: Martin Kepplinger @ 2018-04-01 21:59 UTC (permalink / raw)
  To: corbet; +Cc: linux-doc, linux-kernel, Martin Kepplinger

It's the authors (plural) who request something here, so the sentence
should read "The authors request", not "The authors requests". Let's fix this.

Signed-off-by: Martin Kepplinger <martink@posteo.de>
---
 Documentation/admin-guide/devices.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/devices.rst b/Documentation/admin-guide/devices.rst
index 7fadc05330dd..f781e80f342d 100644
--- a/Documentation/admin-guide/devices.rst
+++ b/Documentation/admin-guide/devices.rst
@@ -22,7 +22,7 @@ Allocations marked (68k/Amiga) apply to Linux/68k on the Amiga
 platform only.	Allocations marked (68k/Atari) apply to Linux/68k on
 the Atari platform only.
 
-This document is in the public domain.	The authors requests, however,
+This document is in the public domain.	The authors request, however,
 that semantically altered versions are not distributed without
 permission of the authors, assuming the authors can be contacted without
 an unreasonable effort.
-- 
2.16.2

* Re: [PATCH] Documentation/thermal: Check links and convert to https
From: Viresh Kumar @ 2018-04-02  7:12 UTC (permalink / raw)
  To: Sanjeev Gupta; +Cc: corbet, linux-pm, linux-doc, linux-kernel
In-Reply-To: <20180328145913.14026-1-ghane0@gmail.com>

On 28-03-18, 22:59, Sanjeev Gupta wrote:
> All links working.

And why is it important to convert them to https?

> Signed-off-by: Sanjeev Gupta <ghane0@gmail.com>
> ---
>  Documentation/thermal/cpu-cooling-api.txt | 2 +-
>  Documentation/thermal/nouveau_thermal     | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/thermal/cpu-cooling-api.txt b/Documentation/thermal/cpu-cooling-api.txt
> index 7df567eaea1a..32917d178c51 100644
> --- a/Documentation/thermal/cpu-cooling-api.txt
> +++ b/Documentation/thermal/cpu-cooling-api.txt
> @@ -5,7 +5,7 @@ Written by Amit Daniel Kachhap <amit.kachhap@linaro.org>
>  
>  Updated: 6 Jan 2015
>  
> -Copyright (c)  2012 Samsung Electronics Co., Ltd(http://www.samsung.com)
> +Copyright (c)  2012 Samsung Electronics Co., Ltd (https://www.samsung.com)
>  
>  0. Introduction
>  
> diff --git a/Documentation/thermal/nouveau_thermal b/Documentation/thermal/nouveau_thermal
> index 6e17a11efcb0..502b0b95c2e2 100644
> --- a/Documentation/thermal/nouveau_thermal
> +++ b/Documentation/thermal/nouveau_thermal
> @@ -79,4 +79,4 @@ Thermal management on Nouveau is new and may not work on all cards. If you have
>  inquiries, please ping mupuf on IRC (#nouveau, freenode).
>  
>  Bug reports should be filled on Freedesktop's bug tracker. Please follow
> -http://nouveau.freedesktop.org/wiki/Bugs
> +https://nouveau.freedesktop.org/wiki/Bugs
> -- 
> 2.15.1

-- 
viresh

* Re: [PATCH 1/2] perf: riscv: preliminary RISC-V support
From: Alan Kao @ 2018-04-02  7:36 UTC (permalink / raw)
  To: Alex Solomatnikov
  Cc: Palmer Dabbelt, Albert Ou, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Jonathan Corbet, linux-riscv, linux-doc,
	linux-kernel, Nick Hu, Greentime Hu
In-Reply-To: <CAJ2AOiPNquo1hGrYjsDcCM4st5Exa7x_-xUV=QW_MCBVCCCkBw@mail.gmail.com>

On Sat, Mar 31, 2018 at 03:47:10PM -0700, Alex Solomatnikov wrote:

My original guess was that a counter value read on one hart is used as
the minuend, while an older but numerically larger counter value recorded
on another hart is used as the subtrahend.  The subtraction then
overflows.  Let me call this guess "cross-hart subtraction."

> You can add a skew between cores in qemu, something like this:
> 
> case CSR_INSTRET:
>         return core_id() * cpu_get_host_ticks() / 10;
>     break;
> case CSR_CYCLE:
>         return cpu_get_host_ticks();
>     break;
> 

However, I tried something similar to reproduce the phenomenon, in vain.
It seems that cross-hart subtraction does not happen at all, because the
generic code handles it.  While I am still looking for proof of this, I
would like to have more information first:

* How often does the "funny number" appear?  Is it frequent?

* If you monitor only one hart, does it disappear?

* What happens if you change counter_width to match the U54's counter width?

* Is the test program you used open source?

> Alex
> 

Many thanks,
Alan

> On Wed, Mar 28, 2018 at 7:30 PM, Alan Kao <alankao@andestech.com> wrote:
> > Hi Alex,
> >
> > I appreciate your reply and tests.
> >
> > On Wed, Mar 28, 2018 at 03:58:41PM -0700, Alex Solomatnikov wrote:
> >> Did you test this code?
> >
> > I did test this patch on QEMU's virt model with multiple harts, which is the
> > only RISC-V machine I have for now.  But as I mentioned in
> > https://github.com/riscv/riscv-qemu/pull/115 , the hardware counter support
> > in QEMU does not fully conform to the 1.10 Priv-Spec, so I had to slightly
> > tweak the code to make reading work.
> >
> > Specifically, the read to cycle and instret in QEMU looks like this:
> > ...
> > case CSR_INSTRET:
> > case CSR_CYCLE:
> > //  if (ctr_ok) {
> >         return cpu_get_host_ticks();
> > //  }
> >     break;
> > ...
> > and commenting out those two lines was the tweak.
> >
> > In such an environment, I did not see anything unexpected.  Whichever of them
> > is requested, QEMU returns the host's tick count.
> >
> >>
> >> I got funny numbers when I tried to run it on HiFive Unleashed:
> >>
> >> perf stat mem-latency
> >> ...
> >>
> >>  Performance counter stats for 'mem-latency':
> >>
> >>         157.907000      task-clock (msec)         #    0.940 CPUs utilized
> >>                  1      context-switches          #    0.006 K/sec
> >>                  1      cpu-migrations            #    0.006 K/sec
> >>               4102      page-faults               #    0.026 M/sec
> >>          157923752      cycles                    #    1.000 GHz
> >> 9223372034948899840      instructions              # 58403957087.78  insn per cycle
> >>    <not supported>      branches
> >>    <not supported>      branch-misses
> >>
> >>        0.168046000 seconds time elapsed
> >>
> >>
> >> Tracing read_counter(), I see this:
> >>
> >> Jan  1 00:41:50 buildroot user.info kernel: [ 2510.058809] CPU 3: read_counter  idx=0 val=2528358954912
> >> Jan  1 00:41:50 buildroot user.info kernel: [ 2510.063339] CPU 3: read_counter  idx=1 val=53892244920
> >> Jan  1 00:41:50 buildroot user.info kernel: [ 2510.118160] CPU 3: read_counter  idx=0 val=2528418303035
> >> Jan  1 00:41:50 buildroot user.info kernel: [ 2510.122694] CPU 3: read_counter  idx=1 val=53906699665
> >> Jan  1 00:41:50 buildroot user.info kernel: [ 2510.216736] CPU 1: read_counter  idx=0 val=2528516878664
> >> Jan  1 00:41:50 buildroot user.info kernel: [ 2510.221270] CPU 1: read_counter  idx=1 val=51986369142
> >>
> >> It looks like the counter values from different cores are subtracted and
> >> wraparound occurs.
> >>
> >
> > Thanks for the hint, that makes sense.  9223372034948899840 is 0x7fffffff8e66a400,
> > which is what a negative delta looks like after wrapping around the 63-bit
> > mask I set in the code.
> >
> > I will look into this.  Ideally, we can solve it by explicitly resyncing
> > hwc->prev_count when a CPU migration happens.
> >
> >>
> >> Also, core IDs and socket IDs are wrong in perf report:
> >>
> >
> > As Palmer has replied to this, I have no comment here.
> >
> >> perf report --header -I
> >> Error:
> >> The perf.data file has no samples!
> >> # ========
> >> # captured on: Thu Jan  1 02:52:07 1970
> >> # hostname : buildroot
> >> # os release : 4.15.0-00045-g0d7c030-dirty
> >> # perf version : 4.15.0
> >> # arch : riscv64
> >> # nrcpus online : 4
> >> # nrcpus avail : 5
> >> # total memory : 8188340 kB
> >> # cmdline : /usr/bin/perf record -F 1000 lat_mem_rd -P 1 -W 1 -N 1 -t 10
> >> # event : name = cycles:ppp, , size = 112, { sample_period, sample_freq } = 1000, sample_type = IP|TID|TIME|PERIOD, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1
> >> # sibling cores   : 1
> >> # sibling cores   : 2
> >> # sibling cores   : 3
> >> # sibling cores   : 4
> >> # sibling threads : 1
> >> # sibling threads : 2
> >> # sibling threads : 3
> >> # sibling threads : 4
> >> # CPU 0: Core ID -1, Socket ID -1
> >> # CPU 1: Core ID 0, Socket ID -1
> >> # CPU 2: Core ID 0, Socket ID -1
> >> # CPU 3: Core ID 0, Socket ID -1
> >> # CPU 4: Core ID 0, Socket ID -1
> >> # pmu mappings: cpu = 4, software = 1
> >> # CPU cache info:
> >> #  L1 Instruction          32K [1]
> >> #  L1 Data                 32K [1]
> >> #  L1 Instruction          32K [2]
> >> #  L1 Data                 32K [2]
> >> #  L1 Instruction          32K [3]
> >> #  L1 Data                 32K [3]
> >> # missing features: TRACING_DATA BUILD_ID CPUDESC CPUID NUMA_TOPOLOGY BRANCH_STACK GROUP_DESC AUXTRACE STAT
> >> # ========
> >>
> >>
> >> Alex
> >>
> >
> > Many thanks,
> > Alan
> >
> >> On Mon, Mar 26, 2018 at 12:57 AM, Alan Kao <alankao@andestech.com> wrote:
> >>
> >> > This patch provides a basic PMU, riscv_base_pmu, which supports two
> >> > general hardware events, instructions and cycles.  Furthermore, this
> >> > PMU serves as a reference implementation to ease future ports.
> >> >
> >> > riscv_base_pmu should be able to run on any RISC-V machine that
> >> > conforms to the Priv-Spec.  Note that the latest qemu model does not
> >> > yet fully support the behavior required by Priv-Spec 1.10, but working
> >> > around this should be easy with very small fixes.  Please check
> >> > https://github.com/riscv/riscv-qemu/pull/115 for future updates.
> >> >
> >> > Cc: Nick Hu <nickhu@andestech.com>
> >> > Cc: Greentime Hu <greentime@andestech.com>
> >> > Signed-off-by: Alan Kao <alankao@andestech.com>
> >> > ---
> >> >  arch/riscv/Kconfig                  |  12 +
> >> >  arch/riscv/include/asm/perf_event.h |  76 +++++-
> >> >  arch/riscv/kernel/Makefile          |   1 +
> >> >  arch/riscv/kernel/perf_event.c      | 469 ++++++++++++++++++++++++++++++++++++
> >> >  4 files changed, 554 insertions(+), 4 deletions(-)
> >> >  create mode 100644 arch/riscv/kernel/perf_event.c
> >> >
> >> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> >> > index 310b9a5d6737..dd4aecfb5265 100644
> >> > --- a/arch/riscv/Kconfig
> >> > +++ b/arch/riscv/Kconfig
> >> > @@ -195,6 +195,18 @@ config RISCV_ISA_C
> >> >  config RISCV_ISA_A
> >> >         def_bool y
> >> >
> >> > +menu "PMU type"
> >> > +       depends on PERF_EVENTS
> >> > +
> >> > +config RISCV_BASE_PMU
> >> > +       bool "Base Performance Monitoring Unit"
> >> > +       def_bool y
> >> > +       help
> >> > +         A base PMU that serves as a reference implementation and has limited
> >> > +         perf features.
> >> > +
> >> > +endmenu
> >> > +
> >> >  endmenu
> >> >
> >> >  menu "Kernel type"
> >> > diff --git a/arch/riscv/include/asm/perf_event.h b/arch/riscv/include/asm/perf_event.h
> >> > index e13d2ff29e83..98e2efb02d25 100644
> >> > --- a/arch/riscv/include/asm/perf_event.h
> >> > +++ b/arch/riscv/include/asm/perf_event.h
> >> > @@ -1,13 +1,81 @@
> >> > +/* SPDX-License-Identifier: GPL-2.0 */
> >> >  /*
> >> >   * Copyright (C) 2018 SiFive
> >> > + * Copyright (C) 2018 Andes Technology Corporation
> >> >   *
> >> > - * This program is free software; you can redistribute it and/or
> >> > - * modify it under the terms of the GNU General Public Licence
> >> > - * as published by the Free Software Foundation; either version
> >> > - * 2 of the Licence, or (at your option) any later version.
> >> >   */
> >> >
> >> >  #ifndef _ASM_RISCV_PERF_EVENT_H
> >> >  #define _ASM_RISCV_PERF_EVENT_H
> >> >
> >> > +#include <linux/perf_event.h>
> >> > +#include <linux/ptrace.h>
> >> > +
> >> > +#define RISCV_BASE_COUNTERS    2
> >> > +
> >> > +/*
> >> > + * The RISCV_MAX_COUNTERS parameter should be specified.
> >> > + */
> >> > +
> >> > +#ifdef CONFIG_RISCV_BASE_PMU
> >> > +#define RISCV_MAX_COUNTERS     2
> >> > +#endif
> >> > +
> >> > +#ifndef RISCV_MAX_COUNTERS
> >> > +#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU."
> >> > +#endif
> >> > +
> >> > +/*
> >> > + * These are the indexes of bits in the counteren register *minus* 1,
> >> > + * except for cycle.  It would be more coherent if they mapped directly
> >> > + * to the counteren bit definitions, but there is a *time* register at
> >> > + * counteren[1], and per-cpu structures are a scarce resource here.
> >> > + *
> >> > + * According to the spec, an implementation can support counters up to
> >> > + * mhpmcounter31, but since many high-end processors have at most 6
> >> > + * general PMCs, we only give definitions up to MHPMCOUNTER8 here.
> >> > + */
> >> > +#define RISCV_PMU_CYCLE                0
> >> > +#define RISCV_PMU_INSTRET      1
> >> > +#define RISCV_PMU_MHPMCOUNTER3 2
> >> > +#define RISCV_PMU_MHPMCOUNTER4 3
> >> > +#define RISCV_PMU_MHPMCOUNTER5 4
> >> > +#define RISCV_PMU_MHPMCOUNTER6 5
> >> > +#define RISCV_PMU_MHPMCOUNTER7 6
> >> > +#define RISCV_PMU_MHPMCOUNTER8 7
> >> > +
> >> > +#define RISCV_OP_UNSUPP                (-EOPNOTSUPP)
> >> > +
> >> > +struct cpu_hw_events {
> >> > +       /* # currently enabled events*/
> >> > +       int                     n_events;
> >> > +       /* currently enabled events */
> >> > +       struct perf_event       *events[RISCV_MAX_COUNTERS];
> >> > +       /* vendor-defined PMU data */
> >> > +       void                    *platform;
> >> > +};
> >> > +
> >> > +struct riscv_pmu {
> >> > +       struct pmu      *pmu;
> >> > +
> >> > +       /* generic hw/cache events table */
> >> > +       const int       *hw_events;
> >> > +       const int       (*cache_events)[PERF_COUNT_HW_CACHE_MAX]
> >> > +                                      [PERF_COUNT_HW_CACHE_OP_MAX]
> >> > +                                      [PERF_COUNT_HW_CACHE_RESULT_MAX];
> >> > +       /* method used to map hw/cache events */
> >> > +       int             (*map_hw_event)(u64 config);
> >> > +       int             (*map_cache_event)(u64 config);
> >> > +
> >> > +       /* max generic hw events in map */
> >> > +       int             max_events;
> >> > +       /* number total counters, 2(base) + x(general) */
> >> > +       int             num_counters;
> >> > +       /* the width of the counter */
> >> > +       int             counter_width;
> >> > +
> >> > +       /* vendor-defined PMU features */
> >> > +       void            *platform;
> >> > +};
> >> > +
> >> >  #endif /* _ASM_RISCV_PERF_EVENT_H */
> >> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> >> > index 196f62ffc428..849c38d9105f 100644
> >> > --- a/arch/riscv/kernel/Makefile
> >> > +++ b/arch/riscv/kernel/Makefile
> >> > @@ -36,5 +36,6 @@ obj-$(CONFIG_SMP)             += smp.o
> >> >  obj-$(CONFIG_MODULES)          += module.o
> >> >  obj-$(CONFIG_FUNCTION_TRACER)  += mcount.o
> >> >  obj-$(CONFIG_FUNCTION_GRAPH_TRACER)    += ftrace.o
> >> > +obj-$(CONFIG_PERF_EVENTS)      += perf_event.o
> >> >
> >> >  clean:
> >> > diff --git a/arch/riscv/kernel/perf_event.c b/arch/riscv/kernel/perf_event.c
> >> > new file mode 100644
> >> > index 000000000000..b78cb486683b
> >> > --- /dev/null
> >> > +++ b/arch/riscv/kernel/perf_event.c
> >> > @@ -0,0 +1,469 @@
> >> > +/* SPDX-License-Identifier: GPL-2.0 */
> >> > +/*
> >> > + * Copyright (C) 2008 Thomas Gleixner <tglx@linutronix.de>
> >> > + * Copyright (C) 2008-2009 Red Hat, Inc., Ingo Molnar
> >> > + * Copyright (C) 2009 Jaswinder Singh Rajput
> >> > + * Copyright (C) 2009 Advanced Micro Devices, Inc., Robert Richter
> >> > + * Copyright (C) 2008-2009 Red Hat, Inc., Peter Zijlstra
> >> > + * Copyright (C) 2009 Intel Corporation, <markus.t.metzger@intel.com>
> >> > + * Copyright (C) 2009 Google, Inc., Stephane Eranian
> >> > + * Copyright 2014 Tilera Corporation. All Rights Reserved.
> >> > + * Copyright (C) 2018 Andes Technology Corporation
> >> > + *
> >> > + * Perf_events support for RISC-V platforms.
> >> > + *
> >> > + * Since the spec. (as of now, Priv-Spec 1.10) does not provide enough
> >> > + * functionality for perf event to fully work, this file provides
> >> > + * the very basic framework only.
> >> > + *
> >> > + * For platform ports, please check Documentation/riscv/pmu.txt.
> >> > + *
> >> > + * The Copyright line includes x86 and tile ones.
> >> > + */
> >> > +
> >> > +#include <linux/kprobes.h>
> >> > +#include <linux/kernel.h>
> >> > +#include <linux/kdebug.h>
> >> > +#include <linux/mutex.h>
> >> > +#include <linux/bitmap.h>
> >> > +#include <linux/irq.h>
> >> > +#include <linux/interrupt.h>
> >> > +#include <linux/perf_event.h>
> >> > +#include <linux/atomic.h>
> >> > +#include <asm/perf_event.h>
> >> > +
> >> > +static const struct riscv_pmu *riscv_pmu __read_mostly;
> >> > +static DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events);
> >> > +
> >> > +/*
> >> > + * Hardware & cache maps and their methods
> >> > + */
> >> > +
> >> > +static const int riscv_hw_event_map[] = {
> >> > +       [PERF_COUNT_HW_CPU_CYCLES]              = RISCV_PMU_CYCLE,
> >> > +       [PERF_COUNT_HW_INSTRUCTIONS]            = RISCV_PMU_INSTRET,
> >> > +       [PERF_COUNT_HW_CACHE_REFERENCES]        = RISCV_OP_UNSUPP,
> >> > +       [PERF_COUNT_HW_CACHE_MISSES]            = RISCV_OP_UNSUPP,
> >> > +       [PERF_COUNT_HW_BRANCH_INSTRUCTIONS]     = RISCV_OP_UNSUPP,
> >> > +       [PERF_COUNT_HW_BRANCH_MISSES]           = RISCV_OP_UNSUPP,
> >> > +       [PERF_COUNT_HW_BUS_CYCLES]              = RISCV_OP_UNSUPP,
> >> > +};
> >> > +
> >> > +#define C(x) PERF_COUNT_HW_CACHE_##x
> >> > +static const int riscv_cache_event_map[PERF_COUNT_HW_CACHE_MAX]
> >> > +[PERF_COUNT_HW_CACHE_OP_MAX]
> >> > +[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
> >> > +       [C(L1D)] = {
> >> > +               [C(OP_READ)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +               [C(OP_WRITE)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +               [C(OP_PREFETCH)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +       },
> >> > +       [C(L1I)] = {
> >> > +               [C(OP_READ)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +               [C(OP_WRITE)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +               [C(OP_PREFETCH)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +       },
> >> > +       [C(LL)] = {
> >> > +               [C(OP_READ)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +               [C(OP_WRITE)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +               [C(OP_PREFETCH)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +       },
> >> > +       [C(DTLB)] = {
> >> > +               [C(OP_READ)] = {
> >> > +                       [C(RESULT_ACCESS)] =  RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] =  RISCV_OP_UNSUPP,
> >> > +               },
> >> > +               [C(OP_WRITE)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +               [C(OP_PREFETCH)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +       },
> >> > +       [C(ITLB)] = {
> >> > +               [C(OP_READ)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +               [C(OP_WRITE)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +               [C(OP_PREFETCH)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +       },
> >> > +       [C(BPU)] = {
> >> > +               [C(OP_READ)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +               [C(OP_WRITE)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +               [C(OP_PREFETCH)] = {
> >> > +                       [C(RESULT_ACCESS)] = RISCV_OP_UNSUPP,
> >> > +                       [C(RESULT_MISS)] = RISCV_OP_UNSUPP,
> >> > +               },
> >> > +       },
> >> > +};
> >> > +
> >> > +static int riscv_map_hw_event(u64 config)
> >> > +{
> >> > +       if (config >= riscv_pmu->max_events)
> >> > +               return -EINVAL;
> >> > +
> >> > +       return riscv_pmu->hw_events[config];
> >> > +}
> >> > +
> >> > +int riscv_map_cache_decode(u64 config, unsigned int *type,
> >> > +                          unsigned int *op, unsigned int *result)
> >> > +{
> >> > +       return -ENOENT;
> >> > +}
> >> > +
> >> > +static int riscv_map_cache_event(u64 config)
> >> > +{
> >> > +       unsigned int type, op, result;
> >> > +       int err = -ENOENT;
> >> > +       int code;
> >> > +
> >> > +       err = riscv_map_cache_decode(config, &type, &op, &result);
> >> > +       if (!riscv_pmu->cache_events || err)
> >> > +               return err;
> >> > +
> >> > +       if (type >= PERF_COUNT_HW_CACHE_MAX ||
> >> > +           op >= PERF_COUNT_HW_CACHE_OP_MAX ||
> >> > +           result >= PERF_COUNT_HW_CACHE_RESULT_MAX)
> >> > +               return -EINVAL;
> >> > +
> >> > +       code = (*riscv_pmu->cache_events)[type][op][result];
> >> > +       if (code == RISCV_OP_UNSUPP)
> >> > +               return -EINVAL;
> >> > +
> >> > +       return code;
> >> > +}
> >> > +
> >> > +/*
> >> > + * Low-level functions: reading/writing counters
> >> > + */
> >> > +
> >> > +static inline u64 read_counter(int idx)
> >> > +{
> >> > +       u64 val = 0;
> >> > +
> >> > +       switch (idx) {
> >> > +       case RISCV_PMU_CYCLE:
> >> > +               val = csr_read(cycle);
> >> > +               break;
> >> > +       case RISCV_PMU_INSTRET:
> >> > +               val = csr_read(instret);
> >> > +               break;
> >> > +       default:
> >> > +               WARN_ON_ONCE(idx < 0 || idx > RISCV_MAX_COUNTERS);
> >> > +               return -EINVAL;
> >> > +       }
> >> > +
> >> > +       return val;
> >> > +}
> >> > +
> >> > +static inline void write_counter(int idx, u64 value)
> >> > +{
> >> > +       /* currently not supported */
> >> > +}
> >> > +
> >> > +/*
> >> > + * pmu->read: read and update the counter
> >> > + *
> >> > + * Other architectures' implementations often have an xxx_perf_event_update
> >> > + * routine, which can return the counter value when called from the IRQ
> >> > + * handler, but whose return value is unused when called via pmu->read.
> >> > + */
> >> > +static void riscv_pmu_read(struct perf_event *event)
> >> > +{
> >> > +       struct hw_perf_event *hwc = &event->hw;
> >> > +       u64 prev_raw_count, new_raw_count;
> >> > +       u64 oldval;
> >> > +       int idx = hwc->idx;
> >> > +       u64 delta;
> >> > +
> >> > +       do {
> >> > +               prev_raw_count = local64_read(&hwc->prev_count);
> >> > +               new_raw_count = read_counter(idx);
> >> > +
> >> > +               oldval = local64_cmpxchg(&hwc->prev_count, prev_raw_count,
> >> > +                                        new_raw_count);
> >> > +       } while (oldval != prev_raw_count);
> >> > +
> >> > +       /*
> >> > +        * delta is the value to update the counter we maintain in the kernel.
> >> > +        */
> >> > +       delta = (new_raw_count - prev_raw_count) &
> >> > +               ((1ULL << riscv_pmu->counter_width) - 1);
> >> > +       local64_add(delta, &event->count);
> >> > +       /*
> >> > +        * Something like local64_sub(delta, &hwc->period_left) here is
> >> > +        * needed if there is an interrupt for perf.
> >> > +        */
> >> > +}
> >> > +
> >> > +/*
> >> > + * State transition functions:
> >> > + *
> >> > + * stop()/start() & add()/del()
> >> > + */
> >> > +
> >> > +/*
> >> > + * pmu->stop: stop the counter
> >> > + */
> >> > +static void riscv_pmu_stop(struct perf_event *event, int flags)
> >> > +{
> >> > +       struct hw_perf_event *hwc = &event->hw;
> >> > +
> >> > +       WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
> >> > +       hwc->state |= PERF_HES_STOPPED;
> >> > +
> >> > +       if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
> >> > +               riscv_pmu_read(event);
> >> > +               hwc->state |= PERF_HES_UPTODATE;
> >> > +       }
> >> > +}
> >> > +
> >> > +/*
> >> > + * pmu->start: start the event.
> >> > + */
> >> > +static void riscv_pmu_start(struct perf_event *event, int flags)
> >> > +{
> >> > +       struct hw_perf_event *hwc = &event->hw;
> >> > +
> >> > +       if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
> >> > +               return;
> >> > +
> >> > +       if (flags & PERF_EF_RELOAD) {
> >> > +               WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
> >> > +
> >> > +               /*
> >> > +                * Set the counter to the period to the next interrupt here,
> >> > +                * if you have any.
> >> > +                */
> >> > +       }
> >> > +
> >> > +       hwc->state = 0;
> >> > +       perf_event_update_userpage(event);
> >> > +
> >> > +       /*
> >> > +        * Since we cannot write to counters, this serves as an initialization
> >> > +        * to the delta-mechanism in pmu->read(); otherwise, the delta would be
> >> > +        * wrong when pmu->read is called for the first time.
> >> > +        */
> >> > +       if (local64_read(&hwc->prev_count) == 0)
> >> > +               local64_set(&hwc->prev_count, read_counter(hwc->idx));
> >> > +}
> >> > +
> >> > +/*
> >> > + * pmu->add: add the event to PMU.
> >> > + */
> >> > +static int riscv_pmu_add(struct perf_event *event, int flags)
> >> > +{
> >> > +       struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> >> > +       struct hw_perf_event *hwc = &event->hw;
> >> > +
> >> > +       if (cpuc->n_events == riscv_pmu->num_counters)
> >> > +               return -ENOSPC;
> >> > +
> >> > +       /*
> >> > +        * We don't have general counters, so there is no event-to-counter
> >> > +        * binding step here.
> >> > +        *
> >> > +        * Indexing with hwc->config does not work in general, since config
> >> > +        * may contain extra information, but here the only info we have in
> >> > +        * hwc->config is the event index.
> >> > +        */
> >> > +       hwc->idx = hwc->config;
> >> > +       cpuc->events[hwc->idx] = event;
> >> > +       cpuc->n_events++;
> >> > +
> >> > +       hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
> >> > +
> >> > +       if (flags & PERF_EF_START)
> >> > +               riscv_pmu_start(event, PERF_EF_RELOAD);
> >> > +
> >> > +       return 0;
> >> > +}
> >> > +
> >> > +/*
> >> > + * pmu->del: delete the event from PMU.
> >> > + */
> >> > +static void riscv_pmu_del(struct perf_event *event, int flags)
> >> > +{
> >> > +       struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> >> > +       struct hw_perf_event *hwc = &event->hw;
> >> > +
> >> > +       cpuc->events[hwc->idx] = NULL;
> >> > +       cpuc->n_events--;
> >> > +       riscv_pmu_stop(event, PERF_EF_UPDATE);
> >> > +       perf_event_update_userpage(event);
> >> > +}
> >> > +
> >> > +/*
> >> > + * Interrupt
> >> > + */
> >> > +
> >> > +static DEFINE_MUTEX(pmc_reserve_mutex);
> >> > +typedef void (*perf_irq_t)(void *riscv_perf_irq);
> >> > +perf_irq_t perf_irq;
> >> > +
> >> > +void riscv_pmu_handle_irq(void *riscv_perf_irq)
> >> > +{
> >> > +}
> >> > +
> >> > +static perf_irq_t reserve_pmc_hardware(void)
> >> > +{
> >> > +       perf_irq_t old;
> >> > +
> >> > +       mutex_lock(&pmc_reserve_mutex);
> >> > +       old = perf_irq;
> >> > +       perf_irq = &riscv_pmu_handle_irq;
> >> > +       mutex_unlock(&pmc_reserve_mutex);
> >> > +
> >> > +       return old;
> >> > +}
> >> > +
> >> > +void release_pmc_hardware(void)
> >> > +{
> >> > +       mutex_lock(&pmc_reserve_mutex);
> >> > +       perf_irq = NULL;
> >> > +       mutex_unlock(&pmc_reserve_mutex);
> >> > +}
> >> > +
> >> > +/*
> >> > + * Event Initialization
> >> > + */
> >> > +
> >> > +static atomic_t riscv_active_events;
> >> > +
> >> > +static void riscv_event_destroy(struct perf_event *event)
> >> > +{
> >> > +       if (atomic_dec_return(&riscv_active_events) == 0)
> >> > +               release_pmc_hardware();
> >> > +}
> >> > +
> >> > +static int riscv_event_init(struct perf_event *event)
> >> > +{
> >> > +       struct perf_event_attr *attr = &event->attr;
> >> > +       struct hw_perf_event *hwc = &event->hw;
> >> > +       perf_irq_t old_irq_handler = NULL;
> >> > +       int code;
> >> > +
> >> > +       if (atomic_inc_return(&riscv_active_events) == 1)
> >> > +               old_irq_handler = reserve_pmc_hardware();
> >> > +
> >> > +       if (old_irq_handler) {
> >> > +               pr_warn("PMC hardware busy (reserved by oprofile)\n");
> >> > +               atomic_dec(&riscv_active_events);
> >> > +               return -EBUSY;
> >> > +       }
> >> > +
> >> > +       switch (event->attr.type) {
> >> > +       case PERF_TYPE_HARDWARE:
> >> > +               code = riscv_pmu->map_hw_event(attr->config);
> >> > +               break;
> >> > +       case PERF_TYPE_HW_CACHE:
> >> > +               code = riscv_pmu->map_cache_event(attr->config);
> >> > +               break;
> >> > +       case PERF_TYPE_RAW:
> >> > +               return -EOPNOTSUPP;
> >> > +       default:
> >> > +               return -ENOENT;
> >> > +       }
> >> > +
> >> > +       event->destroy = riscv_event_destroy;
> >> > +       if (code < 0) {
> >> > +               event->destroy(event);
> >> > +               return code;
> >> > +       }
> >> > +
> >> > +       /*
> >> > +        * idx is set to -1 because the index of a general event should not be
> >> > +        * decided until binding to some counter in pmu->add().
> >> > +        *
> >> > +        * But since we don't have such support, later in pmu->add(), we just
> >> > +        * use hwc->config as the index instead.
> >> > +        */
> >> > +       hwc->config = code;
> >> > +       hwc->idx = -1;
> >> > +
> >> > +       return 0;
> >> > +}
> >> > +
> >> > +/*
> >> > + * Initialization
> >> > + */
> >> > +
> >> > +static struct pmu min_pmu = {
> >> > +       .name           = "riscv-base",
> >> > +       .event_init     = riscv_event_init,
> >> > +       .add            = riscv_pmu_add,
> >> > +       .del            = riscv_pmu_del,
> >> > +       .start          = riscv_pmu_start,
> >> > +       .stop           = riscv_pmu_stop,
> >> > +       .read           = riscv_pmu_read,
> >> > +};
> >> > +
> >> > +static const struct riscv_pmu riscv_base_pmu = {
> >> > +       .pmu = &min_pmu,
> >> > +       .max_events = ARRAY_SIZE(riscv_hw_event_map),
> >> > +       .map_hw_event = riscv_map_hw_event,
> >> > +       .hw_events = riscv_hw_event_map,
> >> > +       .map_cache_event = riscv_map_cache_event,
> >> > +       .cache_events = &riscv_cache_event_map,
> >> > +       .counter_width = 63,
> >> > +       .num_counters = RISCV_BASE_COUNTERS + 0,
> >> > +};
> >> > +
> >> > +struct pmu * __weak __init riscv_init_platform_pmu(void)
> >> > +{
> >> > +       riscv_pmu = &riscv_base_pmu;
> >> > +       return riscv_pmu->pmu;
> >> > +}
> >> > +
> >> > +int __init init_hw_perf_events(void)
> >> > +{
> >> > +       struct pmu *pmu = riscv_init_platform_pmu();
> >> > +
> >> > +       perf_irq = NULL;
> >> > +       perf_pmu_register(pmu, "cpu", PERF_TYPE_RAW);
> >> > +       return 0;
> >> > +}
> >> > +arch_initcall(init_hw_perf_events);
> >> > --
> >> > 2.16.2
> >> >
> >> >

* Re: [PATCH 1/2] perf: riscv: preliminary RISC-V support
From: Alan Kao @ 2018-04-02  8:18 UTC (permalink / raw)
  To: Alex Solomatnikov
  Cc: Nick Hu, Jonathan Corbet, Peter Zijlstra, Palmer Dabbelt,
	linux-doc, linux-kernel, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Ingo Molnar, Albert Ou, Namhyung Kim,
	linux-riscv, Jiri Olsa, Greentime Hu
In-Reply-To: <20180402073611.GA7694@andestech.com>


Hi Alex,
 
On Mon, Apr 02, 2018 at 03:36:12PM +0800, Alan Kao wrote:
> On Sat, Mar 31, 2018 at 03:47:10PM -0700, Alex Solomatnikov wrote:
> 
> My original guess was that a counter value read on one hart is used as
> the minuend, while an older but numerically larger counter value recorded
> on another hart is used as the subtrahend.  The subtraction then
> overflows.  Let me call this guess "cross-hart subtraction."
>
> > You can add a skew between cores in qemu, something like this:
> > 
> > case CSR_INSTRET:
> >         return core_id() * cpu_get_host_ticks() / 10;
> >     break;
> > case CSR_CYCLE:
> >         return cpu_get_host_ticks();
> >     break;
> > 
> 
> However, I tried something similar to reproduce the phenomenon, in vain.
> It seems that ***cross-hart subtraction does not happen at all, because the
> generic code handles it.  ...

I am sorry; this observation is wrong.  With an appropriate tweak, we
successfully reproduced the behavior and located the bug.

This will be fixed in v2.


Thanks for the help.
Alan

* [PATCH V5] thermal: Add cooling device's statistics in sysfs
From: Viresh Kumar @ 2018-04-02 10:56 UTC (permalink / raw)
  To: Zhang Rui, Eduardo Valentin
  Cc: Viresh Kumar, Vincent Guittot, linux-doc, linux-kernel, linux-pm

This extends the sysfs interface for thermal cooling devices and exposes
some useful statistics. These statistics have proven to be quite useful,
especially while running benchmarks related to the task scheduler, where
we want to make sure that nothing has disrupted the test, especially the
cooling device, which may have put constraints on the CPUs. The
information exposed here tells us to what extent the CPUs were
constrained by the thermal framework.

The write-only "reset" file is used to reset the statistics.

The read-only "time_in_state_ms" file shows the time (in msec) spent by the
device in the respective cooling states, and it prints one line per
cooling state.

The read-only "total_trans" file shows a single positive integer value:
the total number of cooling state transitions the device has gone
through since the cooling device was registered or since the statistics
were last reset.

The read-only "trans_table" file shows a two-dimensional matrix, where
an entry <i,j> (row i, column j) represents the number of transitions
from State_i to State_j.

This is what the directory structure looks like for a single cooling
device:

$ ls -R /sys/class/thermal/cooling_device0/
/sys/class/thermal/cooling_device0/:
cur_state  max_state  power  stats  subsystem  type  uevent

/sys/class/thermal/cooling_device0/power:
autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
control               runtime_status

/sys/class/thermal/cooling_device0/stats:
reset  time_in_state_ms  total_trans  trans_table

This was tested on an ARM 64-bit Hisilicon hikey620 board running Ubuntu
and an ARM 64-bit Hisilicon hikey960 board running Android.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
V4->V5:
- time_in_state's unit is msec now instead of clock_t.
- Remove double setting of ->stats pointer.

V3->V4:
- Added CONFIG_THERMAL_STATISTICS
- Added transition table file in sysfs
- Updated documentation for new sysfs files
- The unit of time in time_in_state is clock_t now
- Separate routines for cooling device stat setup/destroy

V2->V3:
- Total number of states is max_level + 1. The earlier version didn't
  take that into account and so the stats for the highest state were
  missing.

V1->V2:
- Move to sysfs from debugfs

 Documentation/thermal/sysfs-api.txt |  31 +++++
 drivers/thermal/Kconfig             |   7 ++
 drivers/thermal/thermal_core.c      |   3 +-
 drivers/thermal/thermal_core.h      |  10 ++
 drivers/thermal/thermal_helpers.c   |   5 +-
 drivers/thermal/thermal_sysfs.c     | 225 ++++++++++++++++++++++++++++++++++++
 include/linux/thermal.h             |   1 +
 7 files changed, 280 insertions(+), 2 deletions(-)

diff --git a/Documentation/thermal/sysfs-api.txt b/Documentation/thermal/sysfs-api.txt
index bb9a0a53e76b..911399730c1c 100644
--- a/Documentation/thermal/sysfs-api.txt
+++ b/Documentation/thermal/sysfs-api.txt
@@ -255,6 +255,7 @@ temperature) and throttle appropriate devices.
 2. sysfs attributes structure
 
 RO	read only value
+WO	write only value
 RW	read/write value
 
 Thermal sysfs attributes will be represented under /sys/class/thermal.
@@ -286,6 +287,11 @@ if hwmon is compiled in or built as a module.
     |---type:			Type of the cooling device(processor/fan/...)
     |---max_state:		Maximum cooling state of the cooling device
     |---cur_state:		Current cooling state of the cooling device
+    |---stats:			Directory containing cooling device's statistics
+    |---stats/reset:		Writing any value resets the statistics
+    |---stats/time_in_state_ms:	Time (msec) spent in various cooling states
+    |---stats/total_trans:	Total number of times cooling state is changed
+    |---stats/trans_table:	Cooling state transition table
 
 
 Then next two dynamic attributes are created/removed in pairs. They represent
@@ -490,6 +496,31 @@ cur_state
 	- cur_state == max_state means the maximum cooling.
 	RW, Required
 
+stats/reset
+	Writing any value resets the cooling device's statistics.
+	WO, Required
+
+stats/time_in_state_ms:
+	The amount of time spent by the cooling device in various cooling
+	states. The output will have "<state> <time>" pair in each line, which
+	will mean this cooling device spent <time> msec of time at <state>.
+	Output will have one line for each of the supported states. The time
+	is reported in milliseconds.
+	RO, Required
+
+stats/total_trans:
+	A single positive value showing the total number of times the state of a
+	cooling device is changed.
+	RO, Required
+
+stats/trans_table:
+	This gives fine grained information about all the cooling state
+	transitions. The cat output here is a two dimensional matrix, where an
+	entry <i,j> (row i, column j) represents the number of transitions from
+	State_i to State_j. If the transition table is bigger than PAGE_SIZE,
+	reading this will return an -EFBIG error.
+	RO, Required
+
 3. A simple implementation
 
 ACPI thermal zone may support multiple trip points like critical, hot,
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index b6adc54b96f1..82979880f985 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -15,6 +15,13 @@ menuconfig THERMAL
 
 if THERMAL
 
+config THERMAL_STATISTICS
+	bool "Thermal state transition statistics"
+	help
+	  Export thermal state transition statistics information through sysfs.
+
+	  If in doubt, say N.
+
 config THERMAL_EMERGENCY_POWEROFF_DELAY_MS
 	int "Emergency poweroff delay in milli-seconds"
 	depends on THERMAL
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 2b1b0ba393a4..d64325e078db 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -972,8 +972,8 @@ __thermal_cooling_device_register(struct device_node *np,
 	cdev->ops = ops;
 	cdev->updated = false;
 	cdev->device.class = &thermal_class;
-	thermal_cooling_device_setup_sysfs(cdev);
 	cdev->devdata = devdata;
+	thermal_cooling_device_setup_sysfs(cdev);
 	dev_set_name(&cdev->device, "cooling_device%d", cdev->id);
 	result = device_register(&cdev->device);
 	if (result) {
@@ -1106,6 +1106,7 @@ void thermal_cooling_device_unregister(struct thermal_cooling_device *cdev)
 
 	ida_simple_remove(&thermal_cdev_ida, cdev->id);
 	device_unregister(&cdev->device);
+	thermal_cooling_device_destroy_sysfs(cdev);
 }
 EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);
 
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index 27e3b1df7360..5e4150261500 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -73,6 +73,7 @@ int thermal_build_list_of_policies(char *buf);
 int thermal_zone_create_device_groups(struct thermal_zone_device *, int);
 void thermal_zone_destroy_device_groups(struct thermal_zone_device *);
 void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *);
+void thermal_cooling_device_destroy_sysfs(struct thermal_cooling_device *cdev);
 /* used only at binding time */
 ssize_t
 thermal_cooling_device_trip_point_show(struct device *,
@@ -84,6 +85,15 @@ ssize_t thermal_cooling_device_weight_store(struct device *,
 					    struct device_attribute *,
 					    const char *, size_t);
 
+#ifdef CONFIG_THERMAL_STATISTICS
+void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
+					 unsigned long new_state);
+#else
+static inline void
+thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
+				    unsigned long new_state) {}
+#endif /* CONFIG_THERMAL_STATISTICS */
+
 #ifdef CONFIG_THERMAL_GOV_STEP_WISE
 int thermal_gov_step_wise_register(void);
 void thermal_gov_step_wise_unregister(void);
diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c
index 8cdf75adcce1..eb03d7e099bb 100644
--- a/drivers/thermal/thermal_helpers.c
+++ b/drivers/thermal/thermal_helpers.c
@@ -187,7 +187,10 @@ void thermal_cdev_update(struct thermal_cooling_device *cdev)
 		if (instance->target > target)
 			target = instance->target;
 	}
-	cdev->ops->set_cur_state(cdev, target);
+
+	if (!cdev->ops->set_cur_state(cdev, target))
+		thermal_cooling_device_stats_update(cdev, target);
+
 	cdev->updated = true;
 	mutex_unlock(&cdev->lock);
 	trace_cdev_update(cdev, target);
diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c
index ba81c9080f6e..23b5e0a709b0 100644
--- a/drivers/thermal/thermal_sysfs.c
+++ b/drivers/thermal/thermal_sysfs.c
@@ -20,6 +20,7 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/jiffies.h>
 
 #include "thermal_core.h"
 
@@ -721,6 +722,7 @@ thermal_cooling_device_cur_state_store(struct device *dev,
 	result = cdev->ops->set_cur_state(cdev, state);
 	if (result)
 		return result;
+	thermal_cooling_device_stats_update(cdev, state);
 	return count;
 }
 
@@ -745,14 +747,237 @@ static const struct attribute_group cooling_device_attr_group = {
 
 static const struct attribute_group *cooling_device_attr_groups[] = {
 	&cooling_device_attr_group,
+	NULL, /* Space allocated for cooling_device_stats_attr_group */
 	NULL,
 };
 
+#ifdef CONFIG_THERMAL_STATISTICS
+struct cooling_dev_stats {
+	spinlock_t lock;
+	unsigned int total_trans;
+	unsigned long state;
+	unsigned long max_states;
+	ktime_t last_time;
+	ktime_t *time_in_state;
+	unsigned int *trans_table;
+};
+
+static void update_time_in_state(struct cooling_dev_stats *stats)
+{
+	ktime_t now = ktime_get(), delta;
+
+	delta = ktime_sub(now, stats->last_time);
+	stats->time_in_state[stats->state] =
+		ktime_add(stats->time_in_state[stats->state], delta);
+	stats->last_time = now;
+}
+
+void thermal_cooling_device_stats_update(struct thermal_cooling_device *cdev,
+					 unsigned long new_state)
+{
+	struct cooling_dev_stats *stats = cdev->stats;
+
+	spin_lock(&stats->lock);
+
+	if (stats->state == new_state)
+		goto unlock;
+
+	update_time_in_state(stats);
+	stats->trans_table[stats->state * stats->max_states + new_state]++;
+	stats->state = new_state;
+	stats->total_trans++;
+
+unlock:
+	spin_unlock(&stats->lock);
+}
+
+static ssize_t
+thermal_cooling_device_total_trans_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct thermal_cooling_device *cdev = to_cooling_device(dev);
+	struct cooling_dev_stats *stats = cdev->stats;
+	int ret;
+
+	spin_lock(&stats->lock);
+	ret = sprintf(buf, "%u\n", stats->total_trans);
+	spin_unlock(&stats->lock);
+
+	return ret;
+}
+
+static ssize_t
+thermal_cooling_device_time_in_state_show(struct device *dev,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct thermal_cooling_device *cdev = to_cooling_device(dev);
+	struct cooling_dev_stats *stats = cdev->stats;
+	ssize_t len = 0;
+	int i;
+
+	spin_lock(&stats->lock);
+	update_time_in_state(stats);
+
+	for (i = 0; i < stats->max_states; i++) {
+		len += sprintf(buf + len, "state%u\t%llu\n", i,
+			       ktime_to_ms(stats->time_in_state[i]));
+	}
+	spin_unlock(&stats->lock);
+
+	return len;
+}
+
+static ssize_t
+thermal_cooling_device_reset_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t count)
+{
+	struct thermal_cooling_device *cdev = to_cooling_device(dev);
+	struct cooling_dev_stats *stats = cdev->stats;
+	int i, states = stats->max_states;
+
+	spin_lock(&stats->lock);
+
+	stats->total_trans = 0;
+	stats->last_time = ktime_get();
+	memset(stats->trans_table, 0,
+	       states * states * sizeof(*stats->trans_table));
+
+	for (i = 0; i < stats->max_states; i++)
+		stats->time_in_state[i] = ktime_set(0, 0);
+
+	spin_unlock(&stats->lock);
+
+	return count;
+}
+
+static ssize_t
+thermal_cooling_device_trans_table_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct thermal_cooling_device *cdev = to_cooling_device(dev);
+	struct cooling_dev_stats *stats = cdev->stats;
+	ssize_t len = 0;
+	int i, j;
+
+	len += snprintf(buf + len, PAGE_SIZE - len, " From  :    To\n");
+	len += snprintf(buf + len, PAGE_SIZE - len, "       : ");
+	for (i = 0; i < stats->max_states; i++) {
+		if (len >= PAGE_SIZE)
+			break;
+		len += snprintf(buf + len, PAGE_SIZE - len, "state%2u  ", i);
+	}
+	if (len >= PAGE_SIZE)
+		return PAGE_SIZE;
+
+	len += snprintf(buf + len, PAGE_SIZE - len, "\n");
+
+	for (i = 0; i < stats->max_states; i++) {
+		if (len >= PAGE_SIZE)
+			break;
+
+		len += snprintf(buf + len, PAGE_SIZE - len, "state%2u:", i);
+
+		for (j = 0; j < stats->max_states; j++) {
+			if (len >= PAGE_SIZE)
+				break;
+			len += snprintf(buf + len, PAGE_SIZE - len, "%8u ",
+				stats->trans_table[i * stats->max_states + j]);
+		}
+		if (len >= PAGE_SIZE)
+			break;
+		len += snprintf(buf + len, PAGE_SIZE - len, "\n");
+	}
+
+	if (len >= PAGE_SIZE) {
+		pr_warn_once("Thermal transition table exceeds PAGE_SIZE. Disabling\n");
+		return -EFBIG;
+	}
+	return len;
+}
+
+static DEVICE_ATTR(total_trans, 0444, thermal_cooling_device_total_trans_show,
+		   NULL);
+static DEVICE_ATTR(time_in_state_ms, 0444,
+		   thermal_cooling_device_time_in_state_show, NULL);
+static DEVICE_ATTR(reset, 0200, NULL, thermal_cooling_device_reset_store);
+static DEVICE_ATTR(trans_table, 0444,
+		   thermal_cooling_device_trans_table_show, NULL);
+
+static struct attribute *cooling_device_stats_attrs[] = {
+	&dev_attr_total_trans.attr,
+	&dev_attr_time_in_state_ms.attr,
+	&dev_attr_reset.attr,
+	&dev_attr_trans_table.attr,
+	NULL
+};
+
+static const struct attribute_group cooling_device_stats_attr_group = {
+	.attrs = cooling_device_stats_attrs,
+	.name = "stats"
+};
+
+static void cooling_device_stats_setup(struct thermal_cooling_device *cdev)
+{
+	struct cooling_dev_stats *stats;
+	unsigned long states;
+	int var;
+
+	if (cdev->ops->get_max_state(cdev, &states))
+		return;
+
+	states++; /* Total number of states is highest state + 1 */
+
+	var = sizeof(*stats);
+	var += sizeof(*stats->time_in_state) * states;
+	var += sizeof(*stats->trans_table) * states * states;
+
+	stats = kzalloc(var, GFP_KERNEL);
+	if (!stats)
+		return;
+
+	stats->time_in_state = (ktime_t *)(stats + 1);
+	stats->trans_table = (unsigned int *)(stats->time_in_state + states);
+	cdev->stats = stats;
+	stats->last_time = ktime_get();
+	stats->max_states = states;
+
+	spin_lock_init(&stats->lock);
+
+	/* Fill the empty slot left in cooling_device_attr_groups */
+	var = ARRAY_SIZE(cooling_device_attr_groups) - 2;
+	cooling_device_attr_groups[var] = &cooling_device_stats_attr_group;
+}
+
+static void cooling_device_stats_destroy(struct thermal_cooling_device *cdev)
+{
+	kfree(cdev->stats);
+	cdev->stats = NULL;
+}
+
+#else
+
+static inline void
+cooling_device_stats_setup(struct thermal_cooling_device *cdev) {}
+static inline void
+cooling_device_stats_destroy(struct thermal_cooling_device *cdev) {}
+
+#endif /* CONFIG_THERMAL_STATISTICS */
+
 void thermal_cooling_device_setup_sysfs(struct thermal_cooling_device *cdev)
 {
+	cooling_device_stats_setup(cdev);
 	cdev->device.groups = cooling_device_attr_groups;
 }
 
+void thermal_cooling_device_destroy_sysfs(struct thermal_cooling_device *cdev)
+{
+	cooling_device_stats_destroy(cdev);
+}
+
 /* these helper will be used only at the time of bindig */
 ssize_t
 thermal_cooling_device_trip_point_show(struct device *dev,
diff --git a/include/linux/thermal.h b/include/linux/thermal.h
index 8c5302374eaa..7834be668d80 100644
--- a/include/linux/thermal.h
+++ b/include/linux/thermal.h
@@ -148,6 +148,7 @@ struct thermal_cooling_device {
 	struct device device;
 	struct device_node *np;
 	void *devdata;
+	void *stats;
 	const struct thermal_cooling_device_ops *ops;
 	bool updated; /* true if the cooling device does not need update */
 	struct mutex lock; /* protect thermal_instances list */
-- 
2.15.0.194.g9af6a3dea062


* [PATCH v3 0/6] KASan for arm
From: Abbott Liu @ 2018-04-02 12:04 UTC (permalink / raw)
  To: aryabinin, dvyukov, corbet, linux, christoffer.dall, marc.zyngier,
	kstewart, gregkh, f.fainelli, liuwenliang, akpm, linux, mawilcox,
	pombredanne, ard.biesheuvel, vladimir.murzin, alexander.levin,
	nicolas.pitre, tglx, thgarnie, dhowells, keescook, arnd, geert,
	tixy, julien.thierry, mark.rutland, james.morse, zhichao.huang,
	jinb.park7, labbott, philip, grygorii.strashko, catalin.marinas,
	opendmb, kirill.shutemov, kasan-dev, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-mm

From: Andrey Ryabinin <a.ryabinin@samsung.com>

Changelog:
v3 - v2
- Removed the patch "2 1-byte checks more safer for memory_is_poisoned_16",
  because an unaligned load/store of 16 bytes is rare on arm, and this
  patch is very likely to affect the performance of modern CPUs.
  ---Acked by: Russell King - ARM Linux <linux@armlinux.org.uk>
- Fixed a link error: kasan_pmd_populate, kasan_pte_populate and
  kasan_pud_populate were in section .meminit.text, but the function
  kasan_alloc_block, which they call, is in section .init.text. So
  kasan_pmd_populate, kasan_pte_populate and kasan_pud_populate are now
  placed in section .init.text as well.
  ---Reported by: Florian Fainelli <f.fainelli@gmail.com>
- Fixed a compile error caused by a wrong access instruction in
  arch/arm/kernel/entry-common.S.
  ---Reported by: kbuild test robot <lkp@intel.com>
- Disable instrumentation for arch/arm/kvm/hyp/*.
  ---Acked by: Marc Zyngier <marc.zyngier@arm.com>
- Update the set of supported architectures in
  Documentation/dev-tools/kasan.rst.
  ---Acked by: Dmitry Vyukov <dvyukov@google.com>
- Version 2 was tested by:
  Florian Fainelli <f.fainelli@gmail.com> (compile test)
  kbuild test robot <lkp@intel.com>       (compile test)
  Joel Stanley <joel@jms.id.au>           (on ASPEED ast2500(ARMv5))
  
v2 - v1
- Fixed compile errors that occurred when changing the kernel compression
  mode to lzma/xz/lzo/lz4.
  ---Reported by: Florian Fainelli <f.fainelli@gmail.com>,
             Russell King - ARM Linux <linux@armlinux.org.uk>
- Fixed a compile error, reported by kbuild, caused by older arm
  instruction sets (armv4t) that don't support movw/movt.
- Changed the pte flag from _L_PTE_DEFAULT | L_PTE_DIRTY | L_PTE_XN to
  pgprot_val(PAGE_KERNEL).
  ---Reported by: Russell King - ARM Linux <linux@armlinux.org.uk>
- Moved the "Enable KASan" patch to the end of the series.
  ---Reported by: Florian Fainelli <f.fainelli@gmail.com>,
     Russell King - ARM Linux <linux@armlinux.org.uk>
- Moved the definitions of cp15 registers from
  arch/arm/include/asm/kvm_hyp.h to arch/arm/include/asm/cp15.h.
  ---Asked by: Mark Rutland <mark.rutland@arm.com>
- Merged the following commits into the commit
  Define the virtual space of KASan's shadow region:
  1) Define the virtual space of KASan's shadow region;
  2) Avoid cleaning the KASan shadow area's mapping table;
  3) Add KASan layout;
- Merged the following commits into the commit
  Initialize the mapping of KASan shadow memory:
  1) Initialize the mapping of KASan shadow memory;
  2) Add support arm LPAE;
  3) Don't need to map the shadow of KASan's shadow memory;
     ---Reported by: Russell King - ARM Linux <linux@armlinux.org.uk>
  4) Change mapping of kasan_zero_page into readonly.
- Version 1 was tested by Florian Fainelli <f.fainelli@gmail.com>
  on a Cortex-A5 (no LPAE).

Hi all,
   These patches add arch-specific code for the kernel address sanitizer
(see Documentation/dev-tools/kasan.rst).

   An area 1/8 the size of the kernel address range is reserved for
shadow memory. There was no hole big enough for this, so the virtual
addresses for the shadow were stolen from user space.

   At the early boot stage, the whole shadow region is populated with
just one physical page (kasan_zero_page). Later, this page is reused as
a read-only zero shadow for memory that KASan currently doesn't track
(vmalloc).

  After mapping the physical memory, pages for shadow memory are
allocated and mapped.

  KASan's stack instrumentation significantly increases stack
consumption, so CONFIG_KASAN doubles THREAD_SIZE.

  Functions like memset/memmove/memcpy do a lot of memory accesses.
If a bad pointer is passed to one of these functions, it is important
to catch this. The compiler's instrumentation cannot do this, since
these functions are written in assembly.

  KASan replaces memory functions with manually instrumented variants.
The original functions are declared as weak symbols so that strong
definitions in mm/kasan/kasan.c can replace them. The original functions
also have aliases with a '__' prefix in their names, so the
non-instrumented variants can be called if needed.

  Some files are built without KASan instrumentation (e.g. mm/slub.c).
For such files, the original mem* functions are replaced (via #define)
with the prefixed variants to disable memory access checks.

  On arm LPAE, the mapping table of the KASan shadow memory (if
PAGE_OFFSET is 0xc0000000, the KASan shadow memory's virtual space is
0xb6e00000~0xbf000000) can't be filled in by the do_translation_fault
function, because KASan instrumentation may cause do_translation_fault
itself to access KASan shadow memory, which could lead to an endless
loop of faults. So the mapping table of KASan shadow memory needs to be
copied in the pgd_alloc function.


Most of the code comes from:
https://github.com/aryabinin/linux/commit/0b54f17e70ff50a902c4af05bb92716eb95acefe

These patches were tested on vexpress-ca15 and vexpress-ca9.



Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Tested-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>

Abbott Liu (2):
  Add TTBR operator for kasan_init
  Define the virtual space of KASan's shadow region

Andrey Ryabinin (4):
  Disable instrumentation for some code
  Replace memory function for kasan
  Initialize the mapping of KASan shadow memory
  Enable KASan for arm

 Documentation/dev-tools/kasan.rst     |   2 +-
 arch/arm/Kconfig                      |   1 +
 arch/arm/boot/compressed/Makefile     |   1 +
 arch/arm/boot/compressed/decompress.c |   2 +
 arch/arm/boot/compressed/libfdt_env.h |   2 +
 arch/arm/include/asm/cp15.h           | 104 ++++++++++++
 arch/arm/include/asm/kasan.h          |  35 ++++
 arch/arm/include/asm/kasan_def.h      |  64 +++++++
 arch/arm/include/asm/kvm_hyp.h        |  52 ------
 arch/arm/include/asm/memory.h         |   5 +
 arch/arm/include/asm/pgalloc.h        |   7 +-
 arch/arm/include/asm/string.h         |  17 ++
 arch/arm/include/asm/thread_info.h    |   4 +
 arch/arm/kernel/entry-armv.S          |   5 +-
 arch/arm/kernel/entry-common.S        |   9 +-
 arch/arm/kernel/head-common.S         |   7 +-
 arch/arm/kernel/setup.c               |   2 +
 arch/arm/kernel/unwind.c              |   3 +-
 arch/arm/kvm/hyp/Makefile             |   4 +
 arch/arm/kvm/hyp/cp15-sr.c            |  12 +-
 arch/arm/kvm/hyp/switch.c             |   6 +-
 arch/arm/lib/memcpy.S                 |   3 +
 arch/arm/lib/memmove.S                |   5 +-
 arch/arm/lib/memset.S                 |   3 +
 arch/arm/mm/Makefile                  |   3 +
 arch/arm/mm/init.c                    |   6 +
 arch/arm/mm/kasan_init.c              | 302 ++++++++++++++++++++++++++++++++++
 arch/arm/mm/mmu.c                     |   7 +-
 arch/arm/mm/pgd.c                     |  14 ++
 arch/arm/vdso/Makefile                |   2 +
 mm/kasan/kasan.c                      |   5 +-
 31 files changed, 618 insertions(+), 76 deletions(-)
 create mode 100644 arch/arm/include/asm/kasan.h
 create mode 100644 arch/arm/include/asm/kasan_def.h
 create mode 100644 arch/arm/mm/kasan_init.c

-- 
2.9.0


* [PATCH v3 4/6] Define the virtual space of KASan's shadow region
From: Abbott Liu @ 2018-04-02 12:04 UTC (permalink / raw)
  To: aryabinin, dvyukov, corbet, linux, christoffer.dall, marc.zyngier,
	kstewart, gregkh, f.fainelli, liuwenliang, akpm, linux, mawilcox,
	pombredanne, ard.biesheuvel, vladimir.murzin, alexander.levin,
	nicolas.pitre, tglx, thgarnie, dhowells, keescook, arnd, geert,
	tixy, julien.thierry, mark.rutland, james.morse, zhichao.huang,
	jinb.park7, labbott, philip, grygorii.strashko, catalin.marinas,
	opendmb, kirill.shutemov, kasan-dev, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-mm
In-Reply-To: <20180402120440.31900-1-liuwenliang@huawei.com>

Define KASAN_SHADOW_OFFSET, KASAN_SHADOW_START and KASAN_SHADOW_END for the
arm kernel address sanitizer.

     +----+ 0xffffffff
     |    |
     |    |
     |    |
     +----+ CONFIG_PAGE_OFFSET
     |    |\
     |    | |->  module virtual address space area.
     |    |/
     +----+ MODULE_VADDR = KASAN_SHADOW_END
     |    |\
     |    | |-> the shadow area of kernel virtual address.
     |    |/
     +----+ TASK_SIZE(start of kernel space) = KASAN_SHADOW_START  the
     |    |\  shadow address of MODULE_VADDR
     |    | ---------------------+
     |    |                      |
     +    + KASAN_SHADOW_OFFSET  |-> the user space area. Kernel address
     |    |                      |    sanitizer does not use this space.
     |    | ---------------------+
     |    |/
     ------ 0

1) KASAN_SHADOW_OFFSET:
  This value is used to map an address to the corresponding shadow
address by the following formula:
shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET;

2) KASAN_SHADOW_START:
  This value is the shadow address of MODULE_VADDR. It is the start
of the kernel virtual space.

3) KASAN_SHADOW_END:
  This value is the shadow address of 0x100000000. It is the end of
the kernel address sanitizer's shadow area. It is also the start of
the module area.

When KASan is enabled, TASK_SIZE is no longer an 8-bit rotated constant
(it cannot be encoded as an ARM immediate operand), so the code that
accesses TASK_SIZE in the *.S files has to be modified to load it with
"ldr rX, =TASK_SIZE" instead.

Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Russell King - ARM Linux <linux@armlinux.org.uk>
Tested-by: Joel Stanley <joel@jms.id.au>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
---
 arch/arm/include/asm/kasan_def.h | 64 ++++++++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/memory.h    |  5 ++++
 arch/arm/kernel/entry-armv.S     |  5 ++--
 arch/arm/kernel/entry-common.S   |  9 ++++--
 arch/arm/mm/init.c               |  6 ++++
 arch/arm/mm/mmu.c                |  7 ++++-
 6 files changed, 90 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm/include/asm/kasan_def.h

diff --git a/arch/arm/include/asm/kasan_def.h b/arch/arm/include/asm/kasan_def.h
new file mode 100644
index 0000000..7b7f424
--- /dev/null
+++ b/arch/arm/include/asm/kasan_def.h
@@ -0,0 +1,64 @@
+/*
+ *  arch/arm/include/asm/kasan_def.h
+ *
+ *  Copyright (c) 2018 Huawei Technologies Co., Ltd.
+ *
+ *  Author: Abbott Liu <liuwenliang@huawei.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __ASM_KASAN_DEF_H
+#define __ASM_KASAN_DEF_H
+
+#ifdef CONFIG_KASAN
+
+/*
+ *    +----+ 0xffffffff
+ *    |    |
+ *    |    |
+ *    |    |
+ *    +----+ CONFIG_PAGE_OFFSET
+ *    |    |\
+ *    |    | |->  module virtual address space area.
+ *    |    |/
+ *    +----+ MODULE_VADDR = KASAN_SHADOW_END
+ *    |    |\
+ *    |    | |-> the shadow area of kernel virtual address.
+ *    |    |/
+ *    +----+ TASK_SIZE(start of kernel space) = KASAN_SHADOW_START  the
+ *    |    |\  shadow address of MODULE_VADDR
+ *    |    | ---------------------+
+ *    |    |                      |
+ *    +    + KASAN_SHADOW_OFFSET  |-> the user space area. Kernel address
+ *    |    |                      |    sanitizer do not use this space.
+ *    |    | ---------------------+
+ *    |    |/
+ *    ------ 0
+ *
+ *1)KASAN_SHADOW_OFFSET:
+ *    This value is used to map an address to the corresponding shadow
+ * address by the following formula:
+ * shadow_addr = (address >> 3) + KASAN_SHADOW_OFFSET;
+ *
+ * 2)KASAN_SHADOW_START
+ *     This value is the MODULE_VADDR's shadow address. It is the start
+ * of kernel virtual space.
+ *
+ * 3) KASAN_SHADOW_END
+ *   This value is the 0x100000000's shadow address. It is the end of
+ * kernel address sanitizer's shadow area. It is also the start of the
+ * module area.
+ *
+ */
+
+#define KASAN_SHADOW_OFFSET     (KASAN_SHADOW_END - (1<<29))
+
+#define KASAN_SHADOW_START      ((KASAN_SHADOW_END >> 3) + KASAN_SHADOW_OFFSET)
+
+#define KASAN_SHADOW_END        (UL(CONFIG_PAGE_OFFSET) - UL(SZ_16M))
+
+#endif
+#endif
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 4966677..3ce1a9a 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -21,6 +21,7 @@
 #ifdef CONFIG_NEED_MACH_MEMORY_H
 #include <mach/memory.h>
 #endif
+#include <asm/kasan_def.h>
 
 /*
  * Allow for constants defined here to be used from assembly code
@@ -37,7 +38,11 @@
  * TASK_SIZE - the maximum size of a user space task.
  * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area
  */
+#ifndef CONFIG_KASAN
 #define TASK_SIZE		(UL(CONFIG_PAGE_OFFSET) - UL(SZ_16M))
+#else
+#define TASK_SIZE		(KASAN_SHADOW_START)
+#endif
 #define TASK_UNMAPPED_BASE	ALIGN(TASK_SIZE / 3, SZ_16M)
 
 /*
diff --git a/arch/arm/kernel/entry-armv.S b/arch/arm/kernel/entry-armv.S
index 1752033..b4de9e4 100644
--- a/arch/arm/kernel/entry-armv.S
+++ b/arch/arm/kernel/entry-armv.S
@@ -183,7 +183,7 @@ ENDPROC(__und_invalid)
 
 	get_thread_info tsk
 	ldr	r0, [tsk, #TI_ADDR_LIMIT]
-	mov	r1, #TASK_SIZE
+	ldr	r1, =TASK_SIZE
 	str	r1, [tsk, #TI_ADDR_LIMIT]
 	str	r0, [sp, #SVC_ADDR_LIMIT]
 
@@ -437,7 +437,8 @@ ENDPROC(__fiq_abt)
 	@ if it was interrupted in a critical region.  Here we
 	@ perform a quick test inline since it should be false
 	@ 99.9999% of the time.  The rest is done out of line.
-	cmp	r4, #TASK_SIZE
+	ldr	r0, =TASK_SIZE
+	cmp	r4, r0
 	blhs	kuser_cmpxchg64_fixup
 #endif
 #endif
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 3c4f887..78046de 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -51,7 +51,8 @@ ret_fast_syscall:
  UNWIND(.cantunwind	)
 	disable_irq_notrace			@ disable interrupts
 	ldr	r2, [tsk, #TI_ADDR_LIMIT]
-	cmp	r2, #TASK_SIZE
+	ldr	r1, =TASK_SIZE
+	cmp	r2, r1
 	blne	addr_limit_check_failed
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
 	tst	r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
@@ -81,7 +82,8 @@ ret_fast_syscall:
 	str	r0, [sp, #S_R0 + S_OFF]!	@ save returned r0
 	disable_irq_notrace			@ disable interrupts
 	ldr	r2, [tsk, #TI_ADDR_LIMIT]
-	cmp	r2, #TASK_SIZE
+	ldr     r1, =TASK_SIZE
+	cmp     r2, r1
 	blne	addr_limit_check_failed
 	ldr	r1, [tsk, #TI_FLAGS]		@ re-check for syscall tracing
 	tst	r1, #_TIF_SYSCALL_WORK | _TIF_WORK_MASK
@@ -116,7 +118,8 @@ ret_slow_syscall:
 	disable_irq_notrace			@ disable interrupts
 ENTRY(ret_to_user_from_irq)
 	ldr	r2, [tsk, #TI_ADDR_LIMIT]
-	cmp	r2, #TASK_SIZE
+	ldr     r1, =TASK_SIZE
+	cmp	r2, r1
 	blne	addr_limit_check_failed
 	ldr	r1, [tsk, #TI_FLAGS]
 	tst	r1, #_TIF_WORK_MASK
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index bd6f451..da11f61 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -538,6 +538,9 @@ void __init mem_init(void)
 #ifdef CONFIG_MODULES
 			"    modules : 0x%08lx - 0x%08lx   (%4ld MB)\n"
 #endif
+#ifdef CONFIG_KASAN
+			"    kasan   : 0x%08lx - 0x%08lx   (%4ld MB)\n"
+#endif
 			"      .text : 0x%p" " - 0x%p" "   (%4td kB)\n"
 			"      .init : 0x%p" " - 0x%p" "   (%4td kB)\n"
 			"      .data : 0x%p" " - 0x%p" "   (%4td kB)\n"
@@ -558,6 +561,9 @@ void __init mem_init(void)
 #ifdef CONFIG_MODULES
 			MLM(MODULES_VADDR, MODULES_END),
 #endif
+#ifdef CONFIG_KASAN
+			MLM(KASAN_SHADOW_START, KASAN_SHADOW_END),
+#endif
 
 			MLK_ROUNDUP(_text, _etext),
 			MLK_ROUNDUP(__init_begin, __init_end),
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index e46a6a4..f5aa1de 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1251,9 +1251,14 @@ static inline void prepare_page_table(void)
 	/*
 	 * Clear out all the mappings below the kernel image.
 	 */
-	for (addr = 0; addr < MODULES_VADDR; addr += PMD_SIZE)
+	for (addr = 0; addr < TASK_SIZE; addr += PMD_SIZE)
 		pmd_clear(pmd_off_k(addr));
 
+#ifdef CONFIG_KASAN
+	/* TASK_SIZE ~ MODULES_VADDR is the KASAN shadow area -- skip over it */
+	addr = MODULES_VADDR;
+#endif
+
 #ifdef CONFIG_XIP_KERNEL
 	/* The XIP kernel is mapped in the module area -- skip over it */
 	addr = ((unsigned long)_exiprom + PMD_SIZE - 1) & PMD_MASK;
-- 
2.9.0

* [PATCH v3 6/6] Enable KASan for arm
From: Abbott Liu @ 2018-04-02 12:04 UTC (permalink / raw)
  To: aryabinin, dvyukov, corbet, linux, christoffer.dall, marc.zyngier,
	kstewart, gregkh, f.fainelli, liuwenliang, akpm, linux, mawilcox,
	pombredanne, ard.biesheuvel, vladimir.murzin, alexander.levin,
	nicolas.pitre, tglx, thgarnie, dhowells, keescook, arnd, geert,
	tixy, julien.thierry, mark.rutland, james.morse, zhichao.huang,
	jinb.park7, labbott, philip, grygorii.strashko, catalin.marinas,
	opendmb, kirill.shutemov, kasan-dev, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-mm
In-Reply-To: <20180402120440.31900-1-liuwenliang@huawei.com>

From: Andrey Ryabinin <a.ryabinin@samsung.com>

This patch enables the Kernel Address Sanitizer (KASan) for ARM.

Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
---
 Documentation/dev-tools/kasan.rst | 2 +-
 arch/arm/Kconfig                  | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst
index f7a18f2..d92120d 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -12,7 +12,7 @@ KASAN uses compile-time instrumentation for checking every memory access,
 therefore you will need a GCC version 4.9.2 or later. GCC 5.0 or later is
 required for detection of out-of-bounds accesses to stack or global variables.
 
-Currently KASAN is supported only for the x86_64 and arm64 architectures.
+Currently KASAN is supported only for the x86_64, arm64 and arm architectures.
 
 Usage
 -----
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 7e3d535..ac2287b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -49,6 +49,7 @@ config ARM
 	select HAVE_ARCH_BITREVERSE if (CPU_32v7M || CPU_32v7) && !CPU_32v6
 	select HAVE_ARCH_JUMP_LABEL if !XIP_KERNEL && !CPU_ENDIAN_BE32 && MMU
 	select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
+	select HAVE_ARCH_KASAN if MMU
 	select HAVE_ARCH_MMAP_RND_BITS if MMU
 	select HAVE_ARCH_SECCOMP_FILTER if (AEABI && !OABI_COMPAT)
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
-- 
2.9.0

* [PATCH v3 1/6] Add TTBR operator for kasan_init
From: Abbott Liu @ 2018-04-02 12:04 UTC (permalink / raw)
  To: aryabinin, dvyukov, corbet, linux, christoffer.dall, marc.zyngier,
	kstewart, gregkh, f.fainelli, liuwenliang, akpm, linux, mawilcox,
	pombredanne, ard.biesheuvel, vladimir.murzin, alexander.levin,
	nicolas.pitre, tglx, thgarnie, dhowells, keescook, arnd, geert,
	tixy, julien.thierry, mark.rutland, james.morse, zhichao.huang,
	jinb.park7, labbott, philip, grygorii.strashko, catalin.marinas,
	opendmb, kirill.shutemov, kasan-dev, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-mm
In-Reply-To: <20180402120440.31900-1-liuwenliang@huawei.com>

This patch provides set_ttbr0()/get_ttbr0() for use by kasan_init().
The cp15 register definitions belong in arch/arm/include/asm/cp15.h
rather than arch/arm/include/asm/kvm_hyp.h, so move them there. TTBR0,
TTBR1 and PAR are split into _32 and _64 variants, and the KVM hyp code
is updated to use the _64 variants.
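As a rough user-space sketch (not the kernel code) of the accessor
pattern that set_ttbr0()/get_ttbr0() follow: one helper uses the 64-bit
TTBR0 encoding when LPAE is configured and the 32-bit one otherwise, so
kasan_init() can save, switch and restore TTBR0 without caring which is
in use. SKETCH_ARM_LPAE, the two variables standing in for the register
and all sketch_* names are hypothetical; the real helpers use
IS_ENABLED(CONFIG_ARM_LPAE) with read_sysreg()/write_sysreg() on
TTBR0_64/TTBR0_32.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for IS_ENABLED(CONFIG_ARM_LPAE).
 * Build with -DSKETCH_ARM_LPAE=1 to model an LPAE kernel.
 */
#ifndef SKETCH_ARM_LPAE
#define SKETCH_ARM_LPAE 0
#endif

/* Plain variables standing in for the two access forms of TTBR0. */
static uint64_t ttbr0_64_reg;	/* LPAE: 64-bit TTBR0 (mcrr/mrrc) */
static uint32_t ttbr0_32_reg;	/* non-LPAE: 32-bit TTBR0 (mcr/mrc) */

static void sketch_set_ttbr0(uint64_t val)
{
	if (SKETCH_ARM_LPAE)
		ttbr0_64_reg = val;
	else
		ttbr0_32_reg = (uint32_t)val;	/* upper 32 bits are dropped */
}

static uint64_t sketch_get_ttbr0(void)
{
	if (SKETCH_ARM_LPAE)
		return ttbr0_64_reg;
	return ttbr0_32_reg;
}

/*
 * Mirrors the save/switch/restore pattern in kasan_init(): remember the
 * live table, install a temporary one, then put the original back.
 * Returns the value that was live while the temporary table was installed.
 */
static uint64_t sketch_run_with_tmp_table(uint64_t tmp_table_pa)
{
	uint64_t orig = sketch_get_ttbr0();
	uint64_t seen;

	sketch_set_ttbr0(tmp_table_pa);
	seen = sketch_get_ttbr0();
	/* ... real code would rebuild the shadow mappings here ... */
	sketch_set_ttbr0(orig);
	assert(sketch_get_ttbr0() == orig);
	return seen;
}
```

On a non-LPAE build only the low 32 bits survive a write, which matches
the u32 cast that write_sysreg() applies to a 32-bit __ACCESS_CP15()
register.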

Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Russell King - ARM Linux <linux@armlinux.org.uk>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Tested-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
---
 arch/arm/include/asm/cp15.h    | 104 +++++++++++++++++++++++++++++++++++++++++
 arch/arm/include/asm/kvm_hyp.h |  52 ---------------------
 arch/arm/kvm/hyp/cp15-sr.c     |  12 ++---
 arch/arm/kvm/hyp/switch.c      |   6 +--
 4 files changed, 113 insertions(+), 61 deletions(-)

diff --git a/arch/arm/include/asm/cp15.h b/arch/arm/include/asm/cp15.h
index 4c9fa72..99ebb31 100644
--- a/arch/arm/include/asm/cp15.h
+++ b/arch/arm/include/asm/cp15.h
@@ -3,6 +3,7 @@
 #define __ASM_ARM_CP15_H
 
 #include <asm/barrier.h>
+#include <linux/stringify.h>
 
 /*
  * CR1 bits (CP#15 CR1)
@@ -65,8 +66,111 @@
 #define __write_sysreg(v, r, w, c, t)	asm volatile(w " " c : : "r" ((t)(v)))
 #define write_sysreg(v, ...)		__write_sysreg(v, __VA_ARGS__)
 
+#define TTBR0_32	__ACCESS_CP15(c2, 0, c0, 0)
+#define TTBR1_32	__ACCESS_CP15(c2, 0, c0, 1)
+#define PAR_32		__ACCESS_CP15(c7, 0, c4, 0)
+#define TTBR0_64	__ACCESS_CP15_64(0, c2)
+#define TTBR1_64	__ACCESS_CP15_64(1, c2)
+#define PAR_64		__ACCESS_CP15_64(0, c7)
+#define VTTBR		__ACCESS_CP15_64(6, c2)
+#define CNTV_CVAL	__ACCESS_CP15_64(3, c14)
+#define CNTVOFF		__ACCESS_CP15_64(4, c14)
+
+#define MIDR		__ACCESS_CP15(c0, 0, c0, 0)
+#define CSSELR		__ACCESS_CP15(c0, 2, c0, 0)
+#define VPIDR		__ACCESS_CP15(c0, 4, c0, 0)
+#define VMPIDR		__ACCESS_CP15(c0, 4, c0, 5)
+#define SCTLR		__ACCESS_CP15(c1, 0, c0, 0)
+#define CPACR		__ACCESS_CP15(c1, 0, c0, 2)
+#define HCR		__ACCESS_CP15(c1, 4, c1, 0)
+#define HDCR		__ACCESS_CP15(c1, 4, c1, 1)
+#define HCPTR		__ACCESS_CP15(c1, 4, c1, 2)
+#define HSTR		__ACCESS_CP15(c1, 4, c1, 3)
+#define TTBCR		__ACCESS_CP15(c2, 0, c0, 2)
+#define HTCR		__ACCESS_CP15(c2, 4, c0, 2)
+#define VTCR		__ACCESS_CP15(c2, 4, c1, 2)
+#define DACR		__ACCESS_CP15(c3, 0, c0, 0)
+#define DFSR		__ACCESS_CP15(c5, 0, c0, 0)
+#define IFSR		__ACCESS_CP15(c5, 0, c0, 1)
+#define ADFSR		__ACCESS_CP15(c5, 0, c1, 0)
+#define AIFSR		__ACCESS_CP15(c5, 0, c1, 1)
+#define HSR		__ACCESS_CP15(c5, 4, c2, 0)
+#define DFAR		__ACCESS_CP15(c6, 0, c0, 0)
+#define IFAR		__ACCESS_CP15(c6, 0, c0, 2)
+#define HDFAR		__ACCESS_CP15(c6, 4, c0, 0)
+#define HIFAR		__ACCESS_CP15(c6, 4, c0, 2)
+#define HPFAR		__ACCESS_CP15(c6, 4, c0, 4)
+#define ICIALLUIS	__ACCESS_CP15(c7, 0, c1, 0)
+#define BPIALLIS	__ACCESS_CP15(c7, 0, c1, 6)
+#define ICIMVAU		__ACCESS_CP15(c7, 0, c5, 1)
+#define ATS1CPR		__ACCESS_CP15(c7, 0, c8, 0)
+#define TLBIALLIS	__ACCESS_CP15(c8, 0, c3, 0)
+#define TLBIALL		__ACCESS_CP15(c8, 0, c7, 0)
+#define TLBIALLNSNHIS	__ACCESS_CP15(c8, 4, c3, 4)
+#define PRRR		__ACCESS_CP15(c10, 0, c2, 0)
+#define NMRR		__ACCESS_CP15(c10, 0, c2, 1)
+#define AMAIR0		__ACCESS_CP15(c10, 0, c3, 0)
+#define AMAIR1		__ACCESS_CP15(c10, 0, c3, 1)
+#define VBAR		__ACCESS_CP15(c12, 0, c0, 0)
+#define CID		__ACCESS_CP15(c13, 0, c0, 1)
+#define TID_URW		__ACCESS_CP15(c13, 0, c0, 2)
+#define TID_URO		__ACCESS_CP15(c13, 0, c0, 3)
+#define TID_PRIV	__ACCESS_CP15(c13, 0, c0, 4)
+#define HTPIDR		__ACCESS_CP15(c13, 4, c0, 2)
+#define CNTKCTL		__ACCESS_CP15(c14, 0, c1, 0)
+#define CNTV_CTL	__ACCESS_CP15(c14, 0, c3, 1)
+#define CNTHCTL		__ACCESS_CP15(c14, 4, c1, 0)
+
 extern unsigned long cr_alignment;	/* defined in entry-armv.S */
 
+static inline void set_par(u64 val)
+{
+	if (IS_ENABLED(CONFIG_ARM_LPAE))
+		write_sysreg(val, PAR_64);
+	else
+		write_sysreg(val, PAR_32);
+}
+
+static inline u64 get_par(void)
+{
+	if (IS_ENABLED(CONFIG_ARM_LPAE))
+		return read_sysreg(PAR_64);
+	else
+		return read_sysreg(PAR_32);
+}
+
+static inline void set_ttbr0(u64 val)
+{
+	if (IS_ENABLED(CONFIG_ARM_LPAE))
+		write_sysreg(val, TTBR0_64);
+	else
+		write_sysreg(val, TTBR0_32);
+}
+
+static inline u64 get_ttbr0(void)
+{
+	if (IS_ENABLED(CONFIG_ARM_LPAE))
+		return read_sysreg(TTBR0_64);
+	else
+		return read_sysreg(TTBR0_32);
+}
+
+static inline void set_ttbr1(u64 val)
+{
+	if (IS_ENABLED(CONFIG_ARM_LPAE))
+		write_sysreg(val, TTBR1_64);
+	else
+		write_sysreg(val, TTBR1_32);
+}
+
+static inline u64 get_ttbr1(void)
+{
+	if (IS_ENABLED(CONFIG_ARM_LPAE))
+		return read_sysreg(TTBR1_64);
+	else
+		return read_sysreg(TTBR1_32);
+}
+
 static inline unsigned long get_cr(void)
 {
 	unsigned long val;
diff --git a/arch/arm/include/asm/kvm_hyp.h b/arch/arm/include/asm/kvm_hyp.h
index 1ab8329..8e8592e 100644
--- a/arch/arm/include/asm/kvm_hyp.h
+++ b/arch/arm/include/asm/kvm_hyp.h
@@ -36,58 +36,6 @@
 	__val;							\
 })
 
-#define TTBR0		__ACCESS_CP15_64(0, c2)
-#define TTBR1		__ACCESS_CP15_64(1, c2)
-#define VTTBR		__ACCESS_CP15_64(6, c2)
-#define PAR		__ACCESS_CP15_64(0, c7)
-#define CNTV_CVAL	__ACCESS_CP15_64(3, c14)
-#define CNTVOFF		__ACCESS_CP15_64(4, c14)
-
-#define MIDR		__ACCESS_CP15(c0, 0, c0, 0)
-#define CSSELR		__ACCESS_CP15(c0, 2, c0, 0)
-#define VPIDR		__ACCESS_CP15(c0, 4, c0, 0)
-#define VMPIDR		__ACCESS_CP15(c0, 4, c0, 5)
-#define SCTLR		__ACCESS_CP15(c1, 0, c0, 0)
-#define CPACR		__ACCESS_CP15(c1, 0, c0, 2)
-#define HCR		__ACCESS_CP15(c1, 4, c1, 0)
-#define HDCR		__ACCESS_CP15(c1, 4, c1, 1)
-#define HCPTR		__ACCESS_CP15(c1, 4, c1, 2)
-#define HSTR		__ACCESS_CP15(c1, 4, c1, 3)
-#define TTBCR		__ACCESS_CP15(c2, 0, c0, 2)
-#define HTCR		__ACCESS_CP15(c2, 4, c0, 2)
-#define VTCR		__ACCESS_CP15(c2, 4, c1, 2)
-#define DACR		__ACCESS_CP15(c3, 0, c0, 0)
-#define DFSR		__ACCESS_CP15(c5, 0, c0, 0)
-#define IFSR		__ACCESS_CP15(c5, 0, c0, 1)
-#define ADFSR		__ACCESS_CP15(c5, 0, c1, 0)
-#define AIFSR		__ACCESS_CP15(c5, 0, c1, 1)
-#define HSR		__ACCESS_CP15(c5, 4, c2, 0)
-#define DFAR		__ACCESS_CP15(c6, 0, c0, 0)
-#define IFAR		__ACCESS_CP15(c6, 0, c0, 2)
-#define HDFAR		__ACCESS_CP15(c6, 4, c0, 0)
-#define HIFAR		__ACCESS_CP15(c6, 4, c0, 2)
-#define HPFAR		__ACCESS_CP15(c6, 4, c0, 4)
-#define ICIALLUIS	__ACCESS_CP15(c7, 0, c1, 0)
-#define BPIALLIS	__ACCESS_CP15(c7, 0, c1, 6)
-#define ICIMVAU		__ACCESS_CP15(c7, 0, c5, 1)
-#define ATS1CPR		__ACCESS_CP15(c7, 0, c8, 0)
-#define TLBIALLIS	__ACCESS_CP15(c8, 0, c3, 0)
-#define TLBIALL		__ACCESS_CP15(c8, 0, c7, 0)
-#define TLBIALLNSNHIS	__ACCESS_CP15(c8, 4, c3, 4)
-#define PRRR		__ACCESS_CP15(c10, 0, c2, 0)
-#define NMRR		__ACCESS_CP15(c10, 0, c2, 1)
-#define AMAIR0		__ACCESS_CP15(c10, 0, c3, 0)
-#define AMAIR1		__ACCESS_CP15(c10, 0, c3, 1)
-#define VBAR		__ACCESS_CP15(c12, 0, c0, 0)
-#define CID		__ACCESS_CP15(c13, 0, c0, 1)
-#define TID_URW		__ACCESS_CP15(c13, 0, c0, 2)
-#define TID_URO		__ACCESS_CP15(c13, 0, c0, 3)
-#define TID_PRIV	__ACCESS_CP15(c13, 0, c0, 4)
-#define HTPIDR		__ACCESS_CP15(c13, 4, c0, 2)
-#define CNTKCTL		__ACCESS_CP15(c14, 0, c1, 0)
-#define CNTV_CTL	__ACCESS_CP15(c14, 0, c3, 1)
-#define CNTHCTL		__ACCESS_CP15(c14, 4, c1, 0)
-
 #define VFP_FPEXC	__ACCESS_VFP(FPEXC)
 
 /* AArch64 compatibility macros, only for the timer so far */
diff --git a/arch/arm/kvm/hyp/cp15-sr.c b/arch/arm/kvm/hyp/cp15-sr.c
index c478281..d365e3c 100644
--- a/arch/arm/kvm/hyp/cp15-sr.c
+++ b/arch/arm/kvm/hyp/cp15-sr.c
@@ -31,8 +31,8 @@ void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
 	ctxt->cp15[c0_CSSELR]		= read_sysreg(CSSELR);
 	ctxt->cp15[c1_SCTLR]		= read_sysreg(SCTLR);
 	ctxt->cp15[c1_CPACR]		= read_sysreg(CPACR);
-	*cp15_64(ctxt, c2_TTBR0)	= read_sysreg(TTBR0);
-	*cp15_64(ctxt, c2_TTBR1)	= read_sysreg(TTBR1);
+	*cp15_64(ctxt, c2_TTBR0)	= read_sysreg(TTBR0_64);
+	*cp15_64(ctxt, c2_TTBR1)	= read_sysreg(TTBR1_64);
 	ctxt->cp15[c2_TTBCR]		= read_sysreg(TTBCR);
 	ctxt->cp15[c3_DACR]		= read_sysreg(DACR);
 	ctxt->cp15[c5_DFSR]		= read_sysreg(DFSR);
@@ -41,7 +41,7 @@ void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
 	ctxt->cp15[c5_AIFSR]		= read_sysreg(AIFSR);
 	ctxt->cp15[c6_DFAR]		= read_sysreg(DFAR);
 	ctxt->cp15[c6_IFAR]		= read_sysreg(IFAR);
-	*cp15_64(ctxt, c7_PAR)		= read_sysreg(PAR);
+	*cp15_64(ctxt, c7_PAR)		= read_sysreg(PAR_64);
 	ctxt->cp15[c10_PRRR]		= read_sysreg(PRRR);
 	ctxt->cp15[c10_NMRR]		= read_sysreg(NMRR);
 	ctxt->cp15[c10_AMAIR0]		= read_sysreg(AMAIR0);
@@ -60,8 +60,8 @@ void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
 	write_sysreg(ctxt->cp15[c0_CSSELR],	CSSELR);
 	write_sysreg(ctxt->cp15[c1_SCTLR],	SCTLR);
 	write_sysreg(ctxt->cp15[c1_CPACR],	CPACR);
-	write_sysreg(*cp15_64(ctxt, c2_TTBR0),	TTBR0);
-	write_sysreg(*cp15_64(ctxt, c2_TTBR1),	TTBR1);
+	write_sysreg(*cp15_64(ctxt, c2_TTBR0),	TTBR0_64);
+	write_sysreg(*cp15_64(ctxt, c2_TTBR1),	TTBR1_64);
 	write_sysreg(ctxt->cp15[c2_TTBCR],	TTBCR);
 	write_sysreg(ctxt->cp15[c3_DACR],	DACR);
 	write_sysreg(ctxt->cp15[c5_DFSR],	DFSR);
@@ -70,7 +70,7 @@ void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
 	write_sysreg(ctxt->cp15[c5_AIFSR],	AIFSR);
 	write_sysreg(ctxt->cp15[c6_DFAR],	DFAR);
 	write_sysreg(ctxt->cp15[c6_IFAR],	IFAR);
-	write_sysreg(*cp15_64(ctxt, c7_PAR),	PAR);
+	write_sysreg(*cp15_64(ctxt, c7_PAR),	PAR_64);
 	write_sysreg(ctxt->cp15[c10_PRRR],	PRRR);
 	write_sysreg(ctxt->cp15[c10_NMRR],	NMRR);
 	write_sysreg(ctxt->cp15[c10_AMAIR0],	AMAIR0);
diff --git a/arch/arm/kvm/hyp/switch.c b/arch/arm/kvm/hyp/switch.c
index ae45ae9..94d5bb9 100644
--- a/arch/arm/kvm/hyp/switch.c
+++ b/arch/arm/kvm/hyp/switch.c
@@ -134,12 +134,12 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
 	if (!(hsr & HSR_DABT_S1PTW) && (hsr & HSR_FSC_TYPE) == FSC_PERM) {
 		u64 par, tmp;
 
-		par = read_sysreg(PAR);
+		par = read_sysreg(PAR_64);
 		write_sysreg(far, ATS1CPR);
 		isb();
 
-		tmp = read_sysreg(PAR);
-		write_sysreg(par, PAR);
+		tmp = read_sysreg(PAR_64);
+		write_sysreg(par, PAR_64);
 
 		if (unlikely(tmp & 1))
 			return false; /* Translation failed, back to guest */
-- 
2.9.0

* [PATCH v3 2/6] Disable instrumentation for some code
From: Abbott Liu @ 2018-04-02 12:04 UTC (permalink / raw)
  To: aryabinin, dvyukov, corbet, linux, christoffer.dall, marc.zyngier,
	kstewart, gregkh, f.fainelli, liuwenliang, akpm, linux, mawilcox,
	pombredanne, ard.biesheuvel, vladimir.murzin, alexander.levin,
	nicolas.pitre, tglx, thgarnie, dhowells, keescook, arnd, geert,
	tixy, julien.thierry, mark.rutland, james.morse, zhichao.huang,
	jinb.park7, labbott, philip, grygorii.strashko, catalin.marinas,
	opendmb, kirill.shutemov, kasan-dev, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-mm
In-Reply-To: <20180402120440.31900-1-liuwenliang@huawei.com>

From: Andrey Ryabinin <a.ryabinin@samsung.com>

Disable instrumentation for arch/arm/boot/compressed/*,
arch/arm/kvm/hyp/* and arch/arm/vdso/*, because that code is not
linked with the kernel image.

Disable the KASan check in unwind_pop_register(), because a failed
KASan check does not matter when unwind_pop_register() reads a task's
stack memory.
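The unwind.c hunk only separates the load from the pointer increment,
so that the load can go through READ_ONCE_NOCHECK(), which KASan does
not instrument. A user-space sketch, assuming sketch_read_nocheck() as a
plain volatile load in place of READ_ONCE_NOCHECK() (the sketch_* and
pop_* helpers are hypothetical), showing that both forms return the
same value and advance the stack pointer by one slot:

```c
#include <assert.h>

/* Hypothetical stand-in for READ_ONCE_NOCHECK(): a single volatile load. */
static unsigned long sketch_read_nocheck(const unsigned long *p)
{
	return *(const volatile unsigned long *)p;
}

/* Original form: load and post-increment in one expression. */
static unsigned long pop_old(unsigned long **vsp)
{
	return *(*vsp)++;
}

/* Patched form: uninstrumented load first, then advance the pointer. */
static unsigned long pop_new(unsigned long **vsp)
{
	unsigned long val = sketch_read_nocheck(*vsp);

	(*vsp)++;
	return val;
}
```

For example, popping twice from {0x10, 0x20, 0x30} with either helper
yields 0x10 and then 0x20, and leaves the pointer at the third slot.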

Reviewed-by: Russell King - ARM Linux <linux@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
---
 arch/arm/boot/compressed/Makefile | 1 +
 arch/arm/kernel/unwind.c          | 3 ++-
 arch/arm/kvm/hyp/Makefile         | 4 ++++
 arch/arm/vdso/Makefile            | 2 ++
 4 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 45a6b9b..966103e 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -24,6 +24,7 @@ OBJS		+= hyp-stub.o
 endif
 
 GCOV_PROFILE		:= n
+KASAN_SANITIZE		:= n
 
 #
 # Architecture dependencies
diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index 0bee233..2e55c7d 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -249,7 +249,8 @@ static int unwind_pop_register(struct unwind_ctrl_block *ctrl,
 		if (*vsp >= (unsigned long *)ctrl->sp_high)
 			return -URC_FAILURE;
 
-	ctrl->vrs[reg] = *(*vsp)++;
+	ctrl->vrs[reg] = READ_ONCE_NOCHECK(*(*vsp));
+	(*vsp)++;
 	return URC_OK;
 }
 
diff --git a/arch/arm/kvm/hyp/Makefile b/arch/arm/kvm/hyp/Makefile
index 63d6b40..0a8b500 100644
--- a/arch/arm/kvm/hyp/Makefile
+++ b/arch/arm/kvm/hyp/Makefile
@@ -24,3 +24,7 @@ obj-$(CONFIG_KVM_ARM_HOST) += hyp-entry.o
 obj-$(CONFIG_KVM_ARM_HOST) += switch.o
 CFLAGS_switch.o		   += $(CFLAGS_ARMV7VE)
 obj-$(CONFIG_KVM_ARM_HOST) += s2-setup.o
+
+GCOV_PROFILE	:= n
+KASAN_SANITIZE	:= n
+UBSAN_SANITIZE	:= n
diff --git a/arch/arm/vdso/Makefile b/arch/arm/vdso/Makefile
index bb411821..87abbb7 100644
--- a/arch/arm/vdso/Makefile
+++ b/arch/arm/vdso/Makefile
@@ -30,6 +30,8 @@ CFLAGS_vgettimeofday.o = -O2
 # Disable gcov profiling for VDSO code
 GCOV_PROFILE := n
 
+KASAN_SANITIZE := n
+
 # Force dependency
 $(obj)/vdso.o : $(obj)/vdso.so
 
-- 
2.9.0

* [PATCH v3 5/6] Initialize the mapping of KASan shadow memory
From: Abbott Liu @ 2018-04-02 12:04 UTC (permalink / raw)
  To: aryabinin, dvyukov, corbet, linux, christoffer.dall, marc.zyngier,
	kstewart, gregkh, f.fainelli, liuwenliang, akpm, linux, mawilcox,
	pombredanne, ard.biesheuvel, vladimir.murzin, alexander.levin,
	nicolas.pitre, tglx, thgarnie, dhowells, keescook, arnd, geert,
	tixy, julien.thierry, mark.rutland, james.morse, zhichao.huang,
	jinb.park7, labbott, philip, grygorii.strashko, catalin.marinas,
	opendmb, kirill.shutemov, kasan-dev, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-mm
In-Reply-To: <20180402120440.31900-1-liuwenliang@huawei.com>

From: Andrey Ryabinin <a.ryabinin@samsung.com>

This patch initializes the page tables and memory for the KASan shadow
region. KASan is initialized in two stages (items 1 and 2 below); items
3 to 5 list further changes made on top of that:
1. At the early boot stage, the whole shadow region is mapped to a single
   physical page (kasan_zero_page). This is done by kasan_early_init(),
   which is called from __mmap_switched (arch/arm/kernel/head-common.S).
             ---Andrey Ryabinin <a.ryabinin@samsung.com>

2. After paging_init() has been called, kasan_zero_page is used as the
   zero shadow for memory that KASan does not need to track, and new
   shadow space is allocated for the memory that KASan does need to
   track. This is done by kasan_init(), which is called from
   setup_arch().
            ---Andrey Ryabinin <a.ryabinin@samsung.com>

3. Add support for ARM LPAE.
   If LPAE is enabled, the KASan shadow region's mapping table needs to
   be copied in pgd_alloc().
            ---Abbott Liu <liuwenliang@huawei.com>

4. On 64-bit machines size_t is unsigned long, but on 32-bit machines
   size_t is unsigned int, so a type conversion is needed in
   kasan_cache_create().
            ---Abbott Liu <liuwenliang@huawei.com>

5. Move kasan_pte_populate, kasan_pmd_populate, kasan_pud_populate and
   kasan_pgd_populate from the .meminit.text section to the .init.text
   section.
           ---Reported by: Florian Fainelli <f.fainelli@gmail.com>
           ---Signed off by: Abbott Liu <liuwenliang@huawei.com>
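As a rough illustration of what the shadow region covers: generic KASan
maps an address to its shadow byte as
(addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET, and with
KASAN_SHADOW_SCALE_SHIFT = 3 (as in asm/kasan.h) each shadow byte
describes 8 bytes, so the full 4 GiB 32-bit space needs 512 MiB
(1 << 29) of shadow. That is the relation checked by the BUILD_BUG_ON()
in kasan_early_init(). The sketch models this in 32-bit arithmetic; the
offset 0x9f000000 and all SKETCH_*/sketch_* names are assumptions chosen
for the example, not the values from asm/kasan_def.h.

```c
#include <assert.h>
#include <stdint.h>

/* From asm/kasan.h: one shadow byte covers 8 bytes of memory. */
#define SKETCH_SHADOW_SCALE_SHIFT	3

/* Hypothetical shadow offset, for illustration only. */
#define SKETCH_SHADOW_OFFSET		0x9f000000u

/* The shadow spans the 4 GiB address space scaled down by 8: 1 << 29. */
#define SKETCH_SHADOW_END	(SKETCH_SHADOW_OFFSET + (UINT32_C(1) << 29))

/* Same relation as the BUILD_BUG_ON() in kasan_early_init(). */
_Static_assert(SKETCH_SHADOW_END - (UINT32_C(1) << 29) == SKETCH_SHADOW_OFFSET,
	       "shadow must cover the whole 32-bit address space");

/* 32-bit model of kasan_mem_to_shadow(). */
static uint32_t sketch_mem_to_shadow(uint32_t addr)
{
	return (addr >> SKETCH_SHADOW_SCALE_SHIFT) + SKETCH_SHADOW_OFFSET;
}
```

With this assumed offset, the lowmem address 0xc0000000 shadows to
0xb7000000, eight consecutive bytes share one shadow byte, and the
shadow of 0xffffffff plus one lands exactly on the end of the shadow
region, which is the upper bound kasan_init() passes to
kasan_populate_zero_shadow() as kasan_mem_to_shadow((void *)-1UL) + 1.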

Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Co-Developed-by: Abbott Liu <liuwenliang@huawei.com>
Reviewed-by: Russell King - ARM Linux <linux@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reported-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Tested-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
---
 arch/arm/include/asm/kasan.h       |  35 +++++
 arch/arm/include/asm/pgalloc.h     |   7 +-
 arch/arm/include/asm/thread_info.h |   4 +
 arch/arm/kernel/head-common.S      |   3 +
 arch/arm/kernel/setup.c            |   2 +
 arch/arm/mm/Makefile               |   3 +
 arch/arm/mm/kasan_init.c           | 302 +++++++++++++++++++++++++++++++++++++
 arch/arm/mm/pgd.c                  |  14 ++
 mm/kasan/kasan.c                   |   5 +-
 9 files changed, 371 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/include/asm/kasan.h
 create mode 100644 arch/arm/mm/kasan_init.c

diff --git a/arch/arm/include/asm/kasan.h b/arch/arm/include/asm/kasan.h
new file mode 100644
index 0000000..1801f4d
--- /dev/null
+++ b/arch/arm/include/asm/kasan.h
@@ -0,0 +1,35 @@
+/*
+ * arch/arm/include/asm/kasan.h
+ *
+ * Copyright (c) 2015 Samsung Electronics Co., Ltd.
+ * Author: Andrey Ryabinin <ryabinin.a.a@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#ifndef __ASM_KASAN_H
+#define __ASM_KASAN_H
+
+#ifdef CONFIG_KASAN
+
+#include <asm/kasan_def.h>
+
+#define KASAN_SHADOW_SCALE_SHIFT 3
+
+/*
+ * Compiler uses shadow offset assuming that addresses start
+ * from 0. Kernel addresses don't start from 0, so shadow
+ * for kernel really starts from 'compiler's shadow offset' +
+ * ('kernel address space start' >> KASAN_SHADOW_SCALE_SHIFT)
+ */
+
+extern void kasan_init(void);
+
+#else
+static inline void kasan_init(void) { }
+#endif
+
+#endif
diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
index 2d7344f..f170659 100644
--- a/arch/arm/include/asm/pgalloc.h
+++ b/arch/arm/include/asm/pgalloc.h
@@ -50,8 +50,11 @@ static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
  */
 #define pmd_alloc_one(mm,addr)		({ BUG(); ((pmd_t *)2); })
 #define pmd_free(mm, pmd)		do { } while (0)
-#define pud_populate(mm,pmd,pte)	BUG()
-
+#ifndef CONFIG_KASAN
+#define pud_populate(mm, pmd, pte)	BUG()
+#else
+#define pud_populate(mm, pmd, pte)	do { } while (0)
+#endif
 #endif	/* CONFIG_ARM_LPAE */
 
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
diff --git a/arch/arm/include/asm/thread_info.h b/arch/arm/include/asm/thread_info.h
index e71cc35..bc681a0 100644
--- a/arch/arm/include/asm/thread_info.h
+++ b/arch/arm/include/asm/thread_info.h
@@ -16,7 +16,11 @@
 #include <asm/fpstate.h>
 #include <asm/page.h>
 
+#ifdef CONFIG_KASAN
+#define THREAD_SIZE_ORDER	2
+#else
 #define THREAD_SIZE_ORDER	1
+#endif
 #define THREAD_SIZE		(PAGE_SIZE << THREAD_SIZE_ORDER)
 #define THREAD_START_SP		(THREAD_SIZE - 8)
 
diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S
index c79b829..20161e2 100644
--- a/arch/arm/kernel/head-common.S
+++ b/arch/arm/kernel/head-common.S
@@ -115,6 +115,9 @@ __mmap_switched:
 	str	r8, [r2]			@ Save atags pointer
 	cmp	r3, #0
 	strne	r10, [r3]			@ Save control register values
+#ifdef CONFIG_KASAN
+	bl	kasan_early_init
+#endif
 	mov	lr, #0
 	b	start_kernel
 ENDPROC(__mmap_switched)
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index fc40a2b..81c3e9df 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -62,6 +62,7 @@
 #include <asm/unwind.h>
 #include <asm/memblock.h>
 #include <asm/virt.h>
+#include <asm/kasan.h>
 
 #include "atags.h"
 
@@ -1118,6 +1119,7 @@ void __init setup_arch(char **cmdline_p)
 	early_ioremap_reset();
 
 	paging_init(mdesc);
+	kasan_init();
 	request_standard_resources(mdesc);
 
 	if (mdesc->restart)
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 9dbb849..573203e 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -111,3 +111,6 @@ obj-$(CONFIG_CACHE_L2X0_PMU)	+= cache-l2x0-pmu.o
 obj-$(CONFIG_CACHE_XSC3L2)	+= cache-xsc3l2.o
 obj-$(CONFIG_CACHE_TAUROS2)	+= cache-tauros2.o
 obj-$(CONFIG_CACHE_UNIPHIER)	+= cache-uniphier.o
+
+KASAN_SANITIZE_kasan_init.o    := n
+obj-$(CONFIG_KASAN)            += kasan_init.o
diff --git a/arch/arm/mm/kasan_init.c b/arch/arm/mm/kasan_init.c
new file mode 100644
index 0000000..461cc85
--- /dev/null
+++ b/arch/arm/mm/kasan_init.c
@@ -0,0 +1,302 @@
+/*
+ * This file contains kasan initialization code for ARM.
+ *
+ * Copyright (c) 2018 Samsung Electronics Co., Ltd.
+ * Author: Andrey Ryabinin <ryabinin.a.a@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/bootmem.h>
+#include <linux/kasan.h>
+#include <linux/kernel.h>
+#include <linux/memblock.h>
+#include <linux/start_kernel.h>
+#include <asm/cputype.h>
+#include <asm/highmem.h>
+#include <asm/mach/map.h>
+#include <asm/memory.h>
+#include <asm/page.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+#include <asm/procinfo.h>
+#include <asm/proc-fns.h>
+#include <asm/tlbflush.h>
+#include <asm/cp15.h>
+#include <linux/sched/task.h>
+
+#include "mm.h"
+
+static pgd_t tmp_pgd_table[PTRS_PER_PGD] __initdata __aligned(1ULL << 14);
+
+pmd_t tmp_pmd_table[PTRS_PER_PMD] __page_aligned_bss;
+
+static __init void *kasan_alloc_block(size_t size, int node)
+{
+	return memblock_virt_alloc_try_nid(size, size, __pa(MAX_DMA_ADDRESS),
+					BOOTMEM_ALLOC_ACCESSIBLE, node);
+}
+
+static void __init kasan_early_pmd_populate(unsigned long start,
+					unsigned long end, pud_t *pud)
+{
+	unsigned long addr;
+	unsigned long next;
+	pmd_t *pmd;
+
+	pmd = pmd_offset(pud, start);
+	for (addr = start; addr < end;) {
+		pmd_populate_kernel(&init_mm, pmd, kasan_zero_pte);
+		next = pmd_addr_end(addr, end);
+		addr = next;
+		flush_pmd_entry(pmd);
+		pmd++;
+	}
+}
+
+static void __init kasan_early_pud_populate(unsigned long start,
+				unsigned long end, pgd_t *pgd)
+{
+	unsigned long addr;
+	unsigned long next;
+	pud_t *pud;
+
+	pud = pud_offset(pgd, start);
+	for (addr = start; addr < end;) {
+		next = pud_addr_end(addr, end);
+		kasan_early_pmd_populate(addr, next, pud);
+		addr = next;
+		pud++;
+	}
+}
+
+void __init kasan_map_early_shadow(pgd_t *pgdp)
+{
+	int i;
+	unsigned long start = KASAN_SHADOW_START;
+	unsigned long end = KASAN_SHADOW_END;
+	unsigned long addr;
+	unsigned long next;
+	pgd_t *pgd;
+
+	for (i = 0; i < PTRS_PER_PTE; i++)
+		set_pte_at(&init_mm, KASAN_SHADOW_START + i*PAGE_SIZE,
+			&kasan_zero_pte[i], pfn_pte(
+				virt_to_pfn(kasan_zero_page),
+				__pgprot(_L_PTE_DEFAULT | L_PTE_DIRTY
+					| L_PTE_XN)));
+
+	pgd = pgd_offset_k(start);
+	for (addr = start; addr < end;) {
+		next = pgd_addr_end(addr, end);
+		kasan_early_pud_populate(addr, next, pgd);
+		addr = next;
+		pgd++;
+	}
+}
+
+extern struct proc_info_list *lookup_processor_type(unsigned int);
+
+void __init kasan_early_init(void)
+{
+	struct proc_info_list *list;
+
+	/*
+	 * locate processor in the list of supported processor
+	 * types.  The linker builds this table for us from the
+	 * entries in arch/arm/mm/proc-*.S
+	 */
+	list = lookup_processor_type(read_cpuid_id());
+	if (list) {
+#ifdef MULTI_CPU
+		processor = *list->proc;
+#endif
+	}
+
+	BUILD_BUG_ON((KASAN_SHADOW_END - (1UL << 29)) != KASAN_SHADOW_OFFSET);
+	kasan_map_early_shadow(swapper_pg_dir);
+}
+
+static void __init clear_pgds(unsigned long start,
+			unsigned long end)
+{
+	for (; start && start < end; start += PMD_SIZE)
+		pmd_clear(pmd_off_k(start));
+}
+
+pte_t * __init kasan_pte_populate(pmd_t *pmd, unsigned long addr, int node)
+{
+	pte_t *pte = pte_offset_kernel(pmd, addr);
+
+	if (pte_none(*pte)) {
+		pte_t entry;
+		void *p = kasan_alloc_block(PAGE_SIZE, node);
+
+		if (!p)
+			return NULL;
+		entry = pfn_pte(virt_to_pfn(p),
+			__pgprot(pgprot_val(PAGE_KERNEL)));
+		set_pte_at(&init_mm, addr, pte, entry);
+	}
+	return pte;
+}
+
+pmd_t * __init kasan_pmd_populate(pud_t *pud, unsigned long addr, int node)
+{
+	pmd_t *pmd = pmd_offset(pud, addr);
+
+	if (pmd_none(*pmd)) {
+		void *p = kasan_alloc_block(PAGE_SIZE, node);
+
+		if (!p)
+			return NULL;
+		pmd_populate_kernel(&init_mm, pmd, p);
+	}
+	return pmd;
+}
+
+pud_t * __init kasan_pud_populate(pgd_t *pgd, unsigned long addr, int node)
+{
+	pud_t *pud = pud_offset(pgd, addr);
+
+	if (pud_none(*pud)) {
+		void *p = kasan_alloc_block(PAGE_SIZE, node);
+
+		if (!p)
+			return NULL;
+		pr_err("populating pud addr %lx\n", addr);
+		pud_populate(&init_mm, pud, p);
+	}
+	return pud;
+}
+
+pgd_t * __init kasan_pgd_populate(unsigned long addr, int node)
+{
+	pgd_t *pgd = pgd_offset_k(addr);
+
+	if (pgd_none(*pgd)) {
+		void *p = kasan_alloc_block(PAGE_SIZE, node);
+
+		if (!p)
+			return NULL;
+		pgd_populate(&init_mm, pgd, p);
+	}
+	return pgd;
+}
+
+static int __init create_mapping(unsigned long start, unsigned long end,
+				int node)
+{
+	unsigned long addr = start;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	pr_info("populating shadow for %lx, %lx\n", start, end);
+
+	for (; addr < end; addr += PAGE_SIZE) {
+		pgd = kasan_pgd_populate(addr, node);
+		if (!pgd)
+			return -ENOMEM;
+
+		pud = kasan_pud_populate(pgd, addr, node);
+		if (!pud)
+			return -ENOMEM;
+
+		pmd = kasan_pmd_populate(pud, addr, node);
+		if (!pmd)
+			return -ENOMEM;
+
+		pte = kasan_pte_populate(pmd, addr, node);
+		if (!pte)
+			return -ENOMEM;
+	}
+	return 0;
+}
+
+
+void __init kasan_init(void)
+{
+	struct memblock_region *reg;
+	u64 orig_ttbr0;
+	int i;
+
+	/*
+	 * We are going to perform proper setup of shadow memory.
+	 * First we must unmap the early shadow (clear_pgds() call below).
+	 * However, instrumented code cannot execute without shadow memory.
+	 * tmp_pgd_table and tmp_pmd_table are used to keep the early shadow
+	 * mapped until the full shadow setup is finished.
+	 */
+	orig_ttbr0 = get_ttbr0();
+
+#ifdef CONFIG_ARM_LPAE
+	memcpy(tmp_pmd_table,
+		pgd_page_vaddr(*pgd_offset_k(KASAN_SHADOW_START)),
+		sizeof(tmp_pmd_table));
+	memcpy(tmp_pgd_table, swapper_pg_dir, sizeof(tmp_pgd_table));
+	set_pgd(&tmp_pgd_table[pgd_index(KASAN_SHADOW_START)],
+		__pgd(__pa(tmp_pmd_table) | PMD_TYPE_TABLE | L_PGD_SWAPPER));
+	set_ttbr0(__pa(tmp_pgd_table));
+#else
+	memcpy(tmp_pgd_table, swapper_pg_dir, sizeof(tmp_pgd_table));
+	set_ttbr0((u64)__pa(tmp_pgd_table));
+#endif
+	flush_cache_all();
+	local_flush_bp_all();
+	local_flush_tlb_all();
+
+	clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
+
+	kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)VMALLOC_START),
+				kasan_mem_to_shadow((void *)-1UL) + 1);
+
+	for_each_memblock(memory, reg) {
+		void *start = __va(reg->base);
+		void *end = __va(reg->base + reg->size);
+
+		if (reg->base + reg->size > arm_lowmem_limit)
+			end = __va(arm_lowmem_limit);
+		if (start >= end)
+			break;
+
+		create_mapping((unsigned long)kasan_mem_to_shadow(start),
+			(unsigned long)kasan_mem_to_shadow(end),
+			NUMA_NO_NODE);
+	}
+
+	/* 1. Module global variables live in MODULES_VADDR ~ MODULES_END,
+	 *    so their shadow needs a real mapping.
+	 * 2. The shadow of PKMAP_BASE ~ PKMAP_BASE + PMD_SIZE and the shadow
+	 *    of MODULES_VADDR ~ MODULES_END lie in the same PMD, so we can't
+	 *    use kasan_populate_zero_shadow() there.
+	 */
+	create_mapping(
+		(unsigned long)kasan_mem_to_shadow((void *)MODULES_VADDR),
+
+		(unsigned long)kasan_mem_to_shadow((void *)(PKMAP_BASE +
+							PMD_SIZE)),
+		NUMA_NO_NODE);
+
+	/*
+	 * KASan may reuse the contents of kasan_zero_pte directly, so we
+	 * should make sure that it maps the zero page read-only.
+	 */
+	for (i = 0; i < PTRS_PER_PTE; i++)
+		set_pte_at(&init_mm, KASAN_SHADOW_START + i*PAGE_SIZE,
+			&kasan_zero_pte[i],
+			pfn_pte(virt_to_pfn(kasan_zero_page),
+				__pgprot(pgprot_val(PAGE_KERNEL)
+					| L_PTE_RDONLY)));
+	memset(kasan_zero_page, 0, PAGE_SIZE);
+	set_ttbr0(orig_ttbr0);
+	flush_cache_all();
+	local_flush_bp_all();
+	local_flush_tlb_all();
+	pr_info("Kernel address sanitizer initialized\n");
+	init_task.kasan_depth = 0;
+}
diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c
index 61e281c..4644a21 100644
--- a/arch/arm/mm/pgd.c
+++ b/arch/arm/mm/pgd.c
@@ -64,6 +64,20 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 	new_pmd = pmd_alloc(mm, new_pud, 0);
 	if (!new_pmd)
 		goto no_pmd;
+#ifdef CONFIG_KASAN
+	/*
+	 * Copy the PMD entries for the KASAN shadow mappings.
+	 */
+	init_pgd = pgd_offset_k(TASK_SIZE);
+	init_pud = pud_offset(init_pgd, TASK_SIZE);
+	init_pmd = pmd_offset(init_pud, TASK_SIZE);
+	new_pmd = pmd_offset(new_pud, TASK_SIZE);
+	memcpy(new_pmd, init_pmd,
+		(pmd_index(MODULES_VADDR)-pmd_index(TASK_SIZE))
+		* sizeof(pmd_t));
+	clean_dcache_area(new_pmd, PTRS_PER_PMD*sizeof(pmd_t));
+#endif
+
 #endif
 
 	if (!vectors_high()) {
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index e13d911..6d32623 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -358,8 +358,9 @@ void kasan_cache_create(struct kmem_cache *cache, size_t *size,
 	if (redzone_adjust > 0)
 		*size += redzone_adjust;
 
-	*size = min(KMALLOC_MAX_SIZE, max(*size, cache->object_size +
-					optimal_redzone(cache->object_size)));
+	*size = min_t(unsigned long, KMALLOC_MAX_SIZE,
+			max(*size, cache->object_size +
+				optimal_redzone(cache->object_size)));
 
 	/*
 	 * If the metadata doesn't fit, don't enable KASAN at all.
-- 
2.9.0
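
To see why item 2 of the comment above the MODULES_VADDR create_mapping() call holds, here is a small worked sketch of the shadow address translation. It assumes the generic KASan formula shadow = (addr >> 3) + KASAN_SHADOW_OFFSET, a 2 MiB PMD, MODULES_VADDR = PAGE_OFFSET - 16 MiB and PKMAP_BASE = PAGE_OFFSET - PMD_SIZE. All HYP_* constants, including the shadow offset 0x9f000000, are illustrative assumptions, not values taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Generic KASan: one shadow byte covers 8 bytes of memory. */
#define KASAN_SHADOW_SCALE_SHIFT 3

/* Hypothetical layout for a 3G/1G split; NOT taken from the patch. */
#define HYP_KASAN_SHADOW_OFFSET 0x9f000000u
#define HYP_PAGE_OFFSET         0xc0000000u
#define HYP_PMD_SHIFT           21	/* 2 MiB per PMD entry */
#define HYP_PMD_SIZE            (1u << HYP_PMD_SHIFT)
#define HYP_MODULES_VADDR       (HYP_PAGE_OFFSET - 0x01000000u)	/* -16 MiB */
#define HYP_PKMAP_BASE          (HYP_PAGE_OFFSET - HYP_PMD_SIZE)

/* Same shape as kasan_mem_to_shadow(): scale down, then add the offset. */
static uint32_t mem_to_shadow(uint32_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + HYP_KASAN_SHADOW_OFFSET;
}

/* Two addresses share a PMD entry if they agree above the PMD shift. */
static bool same_pmd(uint32_t a, uint32_t b)
{
	return (a >> HYP_PMD_SHIFT) == (b >> HYP_PMD_SHIFT);
}
```

Under these assumptions the 16 MiB from MODULES_VADDR up to PKMAP_BASE + PMD_SIZE shadows into exactly one 2 MiB region (0xb6e00000 to 0xb6ffffff), so the PKMAP shadow cannot be given the shared zero mapping independently of the module shadow; one create_mapping() call covers both.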

* [PATCH v3 3/6] Replace memory function for kasan
From: Abbott Liu @ 2018-04-02 12:04 UTC (permalink / raw)
  To: aryabinin, dvyukov, corbet, linux, christoffer.dall, marc.zyngier,
	kstewart, gregkh, f.fainelli, liuwenliang, akpm, linux, mawilcox,
	pombredanne, ard.biesheuvel, vladimir.murzin, alexander.levin,
	nicolas.pitre, tglx, thgarnie, dhowells, keescook, arnd, geert,
	tixy, julien.thierry, mark.rutland, james.morse, zhichao.huang,
	jinb.park7, labbott, philip, grygorii.strashko, catalin.marinas,
	opendmb, kirill.shutemov, kasan-dev, linux-doc, linux-kernel,
	linux-arm-kernel, kvmarm, linux-mm
In-Reply-To: <20180402120440.31900-1-liuwenliang@huawei.com>

From: Andrey Ryabinin <a.ryabinin@samsung.com>

Functions like memset/memmove/memcpy perform a lot of memory accesses.
If a bad pointer is passed to one of these functions, it is important
to catch it. The compiler's instrumentation cannot do this because
these functions are written in assembly.

KASan replaces the memory functions with manually instrumented
variants. The original functions are declared as weak symbols so that
the strong definitions in mm/kasan/kasan.c can replace them. The
original functions also have aliases with a '__' prefix in their names,
so the non-instrumented variants can still be called when needed.
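
A minimal userspace sketch of the "check, then delegate to the uninstrumented alias" pattern follows. Every name in it (arena, accessible, set_accessible(), raw_memcpy(), checked_memcpy()) is hypothetical: raw_memcpy() stands in for the assembly __memcpy, and the one-flag-per-byte shadow is a simplification of KASan's real encoding (one shadow byte per 8 bytes). We assume the strong memcpy() in mm/kasan/kasan.c has the same overall shape: check the source and destination ranges, then call __memcpy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE 64

/* Toy "memory" and its toy shadow: one accessibility flag per byte. */
static unsigned char arena[ARENA_SIZE];
static bool accessible[ARENA_SIZE];
static int reports;	/* number of bad accesses detected so far */

/* Mark [off, off + len) of the arena as accessible or poisoned. */
void set_accessible(size_t off, size_t len, bool ok)
{
	for (size_t i = 0; i < len && off + i < ARENA_SIZE; i++)
		accessible[off + i] = ok;
}

/* True if [p, p + n) lies inside the arena and is fully accessible. */
static bool region_ok(const void *p, size_t n)
{
	uintptr_t base = (uintptr_t)arena;
	uintptr_t addr = (uintptr_t)p;
	size_t off;

	if (addr < base || addr - base > ARENA_SIZE)
		return false;
	off = (size_t)(addr - base);
	if (n > ARENA_SIZE - off)
		return false;
	for (size_t i = 0; i < n; i++)
		if (!accessible[off + i])
			return false;
	return true;
}

/* Stand-in for the uninstrumented (assembly) __memcpy. */
static void *raw_memcpy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (n--)
		*d++ = *s++;
	return dst;
}

/* Instrumented wrapper: validate both ranges, then delegate. */
void *checked_memcpy(void *dst, const void *src, size_t n)
{
	if (!region_ok(src, n))
		reports++;	/* KASan would print a report here */
	if (!region_ok(dst, n))
		reports++;
	/* Like KASan, report and still perform the copy. */
	return raw_memcpy(dst, src, n);
}
```

Code that must bypass the checks, such as the early .data/.bss setup in head-common.S, would call raw_memcpy() directly, just as the patch switches those call sites to __memcpy/__memset.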

We must use __memcpy/__memset instead of memcpy/memset when copying
.data to RAM and clearing .bss, because kasan_early_init() can't be
called before .data and .bss are initialized.

Reviewed-by: Russell King - ARM Linux <linux@armlinux.org.uk>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Joel Stanley <joel@jms.id.au>
Tested-by: Abbott Liu <liuwenliang@huawei.com>
Signed-off-by: Abbott Liu <liuwenliang@huawei.com>
---
 arch/arm/boot/compressed/decompress.c |  2 ++
 arch/arm/boot/compressed/libfdt_env.h |  2 ++
 arch/arm/include/asm/string.h         | 17 +++++++++++++++++
 arch/arm/kernel/head-common.S         |  4 ++--
 arch/arm/lib/memcpy.S                 |  3 +++
 arch/arm/lib/memmove.S                |  5 ++++-
 arch/arm/lib/memset.S                 |  3 +++
 7 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/arm/boot/compressed/decompress.c b/arch/arm/boot/compressed/decompress.c
index a2ac3fe..0596077 100644
--- a/arch/arm/boot/compressed/decompress.c
+++ b/arch/arm/boot/compressed/decompress.c
@@ -49,8 +49,10 @@ extern int memcmp(const void *cs, const void *ct, size_t count);
 #endif
 
 #ifdef CONFIG_KERNEL_XZ
+#ifndef CONFIG_KASAN
 #define memmove memmove
 #define memcpy memcpy
+#endif
 #include "../../../../lib/decompress_unxz.c"
 #endif
 
diff --git a/arch/arm/boot/compressed/libfdt_env.h b/arch/arm/boot/compressed/libfdt_env.h
index 0743781..736ed36 100644
--- a/arch/arm/boot/compressed/libfdt_env.h
+++ b/arch/arm/boot/compressed/libfdt_env.h
@@ -17,4 +17,6 @@ typedef __be64 fdt64_t;
 #define fdt64_to_cpu(x)		be64_to_cpu(x)
 #define cpu_to_fdt64(x)		cpu_to_be64(x)
 
+#undef memset
+
 #endif
diff --git a/arch/arm/include/asm/string.h b/arch/arm/include/asm/string.h
index 111a1d8..1f9016b 100644
--- a/arch/arm/include/asm/string.h
+++ b/arch/arm/include/asm/string.h
@@ -15,15 +15,18 @@ extern char * strchr(const char * s, int c);
 
 #define __HAVE_ARCH_MEMCPY
 extern void * memcpy(void *, const void *, __kernel_size_t);
+extern void *__memcpy(void *dest, const void *src, __kernel_size_t n);
 
 #define __HAVE_ARCH_MEMMOVE
 extern void * memmove(void *, const void *, __kernel_size_t);
+extern void *__memmove(void *dest, const void *src, __kernel_size_t n);
 
 #define __HAVE_ARCH_MEMCHR
 extern void * memchr(const void *, int, __kernel_size_t);
 
 #define __HAVE_ARCH_MEMSET
 extern void * memset(void *, int, __kernel_size_t);
+extern void *__memset(void *s, int c, __kernel_size_t n);
 
 #define __HAVE_ARCH_MEMSET32
 extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
@@ -39,4 +42,18 @@ static inline void *memset64(uint64_t *p, uint64_t v, __kernel_size_t n)
 	return __memset64(p, v, n * 8, v >> 32);
 }
 
+
+
+#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
+
+/*
+ * For files that are not instrumented (e.g. mm/slub.c), use the
+ * non-instrumented versions of the mem* functions.
+ */
+
+#define memcpy(dst, src, len) __memcpy(dst, src, len)
+#define memmove(dst, src, len) __memmove(dst, src, len)
+#define memset(s, c, n) __memset(s, c, n)
+#endif
+
 #endif
diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S
index 6e0375e..c79b829 100644
--- a/arch/arm/kernel/head-common.S
+++ b/arch/arm/kernel/head-common.S
@@ -99,7 +99,7 @@ __mmap_switched:
  THUMB(	ldmia	r4!, {r0, r1, r2, r3} )
  THUMB(	mov	sp, r3 )
 	sub	r2, r2, r1
-	bl	memcpy				@ copy .data to RAM
+	bl	__memcpy			@ copy .data to RAM
 #endif
 
    ARM(	ldmia	r4!, {r0, r1, sp} )
@@ -107,7 +107,7 @@ __mmap_switched:
  THUMB(	mov	sp, r3 )
 	sub	r2, r1, r0
 	mov	r1, #0
-	bl	memset				@ clear .bss
+	bl	__memset			@ clear .bss
 
 	ldmia	r4, {r0, r1, r2, r3}
 	str	r9, [r0]			@ Save processor ID
diff --git a/arch/arm/lib/memcpy.S b/arch/arm/lib/memcpy.S
index 64111bd..79a83f8 100644
--- a/arch/arm/lib/memcpy.S
+++ b/arch/arm/lib/memcpy.S
@@ -61,6 +61,8 @@
 
 /* Prototype: void *memcpy(void *dest, const void *src, size_t n); */
 
+.weak memcpy
+ENTRY(__memcpy)
 ENTRY(mmiocpy)
 ENTRY(memcpy)
 
@@ -68,3 +70,4 @@ ENTRY(memcpy)
 
 ENDPROC(memcpy)
 ENDPROC(mmiocpy)
+ENDPROC(__memcpy)
diff --git a/arch/arm/lib/memmove.S b/arch/arm/lib/memmove.S
index 69a9d47..313db6c 100644
--- a/arch/arm/lib/memmove.S
+++ b/arch/arm/lib/memmove.S
@@ -27,12 +27,14 @@
  * occurring in the opposite direction.
  */
 
+.weak memmove
+ENTRY(__memmove)
 ENTRY(memmove)
 	UNWIND(	.fnstart			)
 
 		subs	ip, r0, r1
 		cmphi	r2, ip
-		bls	memcpy
+		bls	__memcpy
 
 		stmfd	sp!, {r0, r4, lr}
 	UNWIND(	.fnend				)
@@ -225,3 +227,4 @@ ENTRY(memmove)
 18:		backward_copy_shift	push=24	pull=8
 
 ENDPROC(memmove)
+ENDPROC(__memmove)
diff --git a/arch/arm/lib/memset.S b/arch/arm/lib/memset.S
index ed6d35d..64aa06a 100644
--- a/arch/arm/lib/memset.S
+++ b/arch/arm/lib/memset.S
@@ -16,6 +16,8 @@
 	.text
 	.align	5
 
+.weak memset
+ENTRY(__memset)
 ENTRY(mmioset)
 ENTRY(memset)
 UNWIND( .fnstart         )
@@ -135,6 +137,7 @@ UNWIND( .fnstart            )
 UNWIND( .fnend   )
 ENDPROC(memset)
 ENDPROC(mmioset)
+ENDPROC(__memset)
 
 ENTRY(__memset32)
 UNWIND( .fnstart         )
-- 
2.9.0