* [PATCH v2 5/5] riscv: dts: renesas: rzfive-smarc-som: Drop deleting interrupt properties from ETH0/1 nodes
From: Prabhakar @ 2024-04-03 20:35 UTC
To: Geert Uytterhoeven, Thomas Gleixner, Rob Herring,
Krzysztof Kozlowski, Conor Dooley, Magnus Damm, Paul Walmsley,
Palmer Dabbelt, Albert Ou
Cc: linux-kernel, devicetree, linux-renesas-soc, linux-riscv,
Prabhakar, Lad Prabhakar
In-Reply-To: <20240403203503.634465-1-prabhakar.mahadev-lad.rj@bp.renesas.com>
From: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Now that IRQC support is enabled for the RZ/Five SoC, switch the
ethernet0/1 PHYs to interrupt mode instead of polling mode.
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
v1->v2
- Included RB tag from Geert
---
.../riscv/boot/dts/renesas/rzfive-smarc-som.dtsi | 16 ----------------
1 file changed, 16 deletions(-)
diff --git a/arch/riscv/boot/dts/renesas/rzfive-smarc-som.dtsi b/arch/riscv/boot/dts/renesas/rzfive-smarc-som.dtsi
index 72d9b6fba526..86b2f15375ec 100644
--- a/arch/riscv/boot/dts/renesas/rzfive-smarc-som.dtsi
+++ b/arch/riscv/boot/dts/renesas/rzfive-smarc-som.dtsi
@@ -7,22 +7,6 @@
#include <arm64/renesas/rzg2ul-smarc-som.dtsi>
-#if (!SW_ET0_EN_N)
-&eth0 {
- phy0: ethernet-phy@7 {
- /delete-property/ interrupt-parent;
- /delete-property/ interrupts;
- };
-};
-#endif
-
-&eth1 {
- phy1: ethernet-phy@7 {
- /delete-property/ interrupt-parent;
- /delete-property/ interrupts;
- };
-};
-
&sbc {
status = "disabled";
};
--
2.34.1
* Re: [PATCH v3 0/6] Add Synopsys DesignWare HDMI RX Controller
From: Deborah Brouwer @ 2024-04-03 21:13 UTC
To: Krzysztof Kozlowski
Cc: Shreeya Patel, mchehab, hverkuil, hverkuil-cisco, heiko, robh,
krzysztof.kozlowski+dt, conor+dt, mturquette, sboyd, p.zabel,
shawn.wen, kernel, linux-kernel, linux-media, devicetree,
linux-arm-kernel, linux-rockchip, linux-clk, linux-arm
In-Reply-To: <86150c89-11d5-4d52-987e-974b1a03018f@linaro.org>
On Wed, Apr 03, 2024 at 01:24:05PM +0200, Krzysztof Kozlowski wrote:
> On 03/04/2024 13:20, Shreeya Patel wrote:
> > On Wednesday, April 03, 2024 15:51 IST, Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote:
> >
> >> On 03/04/2024 11:24, Shreeya Patel wrote:
> >>> On Thursday, March 28, 2024 04:20 IST, Shreeya Patel <shreeya.patel@collabora.com> wrote:
> >>>
> >>>> This series implements support for the Synopsys DesignWare
> >>>> HDMI RX Controller, being compliant with standard HDMI 1.4b
> >>>> and HDMI 2.0.
> >>>>
> >>>
> >>> Hi Mauro and Hans,
> >>>
> >>> I haven't received any reviews so far. Hence, this is just a gentle reminder to review this patch series.
> >>
> >> Why did you put clk changes here? These go via different subsystem. That
> >> might be one of obstacles for your patchset.
> >>
> >
> > I added clock changes in this patch series because HDMIRX driver depends on it.
> > I thought it is wrong to send the driver patches which don't even compile?
>
> Hm, why HDMIRX driver depends on clock? How? This sounds really wrong.
> Please get it reviewed internally first.
>
> >
> > Since you are a more experienced developer, can you help me understand what would
> > be the right way to send patches in such scenarios?
>
> I am not the substitute for your Collabora engineers and peers. You do
> not get free work from the community. First, do the work and review
> internally, to solve all trivial things, like how to submit patches
> upstream or how to make your driver buildable, and then ask community
> for the review.
I don't think Shreeya was asking for "free" work from the community.
Her question wasn't trivial or obvious since reasonable people seem to sometimes
disagree about where to send a patch especially if it's needed to make a series compile.
I heard the issue was already resolved but had to say something since this accusation
seemed so unfair.
>
> Best regards,
> Krzysztof
>
>
* [PATCH v9 0/4] PCI: brcmstb: Configure appropriate HW CLKREQ# mode
From: Jim Quinlan @ 2024-04-03 21:38 UTC
To: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas,
Lorenzo Pieralisi, Cyril Brulebois, Phil Elwell,
bcm-kernel-feedback-list, james.quinlan
Cc: Conor Dooley,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
Florian Fainelli, Jim Quinlan, Krzysztof Kozlowski,
Krzysztof Wilczyński,
moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
open list,
moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
Lorenzo Pieralisi, Rob Herring
v9 -- v8 was setting an internal bus timeout to accommodate large L1 exit
latencies. After meeting with the PCIe HW team, we learned that the
HW default timeout value was set low for the purposes of HW debugging
convenience; for nominal operation it needs to be set to a higher
value independent of this submission's purpose. This is now a
separate commit.
-- With v8, Bjorn asked what was preventing a device from exceeding the
time required for the above internal bus timeout. The answer to this
is for us to set the endpoints' max latency {no-,}snoop value to
something below this internal timeout value. If the endpoint
respects this value as it should, it will not send an LTR request
with a larger latency value and not put itself in a situation
that requires more latency than is possible for the platform.
Typically, ACPI or FW sets these max latency values. In most of our
systems this does not happen, so it is up to the RC driver to
set these values in the endpoint devices. If the endpoints already
have non-zero values that are lower than what we are setting, we let
them be, as it is possible ACPI or FW set them and knows something
that we do not.
-- The "clkreq" commit has only been changed to remove the code that was
setting the timeout value, as this code is now its own commit.
v8 -- Un-advertise L1SS capability when in "no-l1ss" mode (Bjorn)
-- Squashed last two commits of v7 (Bjorn)
-- Fix DT binding description text wrapping (Bjorn)
-- Fix incorrect Spec reference (Bjorn)
s/PCIe Spec/PCIe Express Mini CEM 2.1 specification/
-- Text substitutions (Bjorn)
s/WRT/With respect to/
s/Tclron/T_CLRon/
v7 -- Manivannan Sadhasivam suggested (a) making the property look like a
network phy-mode and (b) keeping the code simple (not counting clkreq
signal appearances, un-advertising capabilities, etc). This is
what I have done. The property is now "brcm,clkreq-mode" and
the values may be one of "safe", "default", and "no-l1ss". The
default setting is to employ the most capable power savings mode.
v6 -- No code has been changed.
-- Changed commit subject and comment in "#PERST" commit (Bjorn, Cyril)
-- Changed sign-off and author email address for all commits.
This was due to a change in Broadcom's upstreaming policy.
v5 -- Remove DT property "brcm,completion-timeout-us" from
"DT bindings" commit. Although this error may be reported
as a completion timeout, its cause was traced to an
internal bus timeout which may occur even when there is
no PCIe access being processed. We set a timeout of four
seconds only if we are operating in "L1SS CLKREQ#" mode.
-- Correct CEM 2.0 reference provided by HW engineer,
s/3.2.5.2.5/3.2.5.2.2/ (Bjorn)
-- Add newline to dev_info() string (Stefan)
-- Change variable rval to unsigned (Stefan)
-- s/implementaion/implementation/ (Bjorn)
-- s/superpowersave/powersupersave/ (Bjorn)
-- Slightly modify message on "PERST#" commit.
-- Rebase to torvalds master
v4 -- New commit that asserts PERST# for 2711/RPi SOCs at PCIe RC
driver probe() time. This is done in Raspbian Linux and its
absence may be the cause of a failing test case.
-- New commit that removes stale comment.
v3 -- Rewrote commit msgs and comments referring to panics if L1SS
is enabled/disabled; the code snippet that unadvertises L1SS
eliminates the panic scenario. (Bjorn)
-- Add reference for "400ns of CLKREQ# assertion" blurb (Bjorn)
-- Put binding names in DT commit Subject (Bjorn)
-- Add a verb to a commit's subject line (Bjorn)
-- s/accomodat(\w+)/accommodat$1/g (Bjorn)
v2 -- Changed binding property 'brcm,completion-timeout-msec' to
'brcm,completion-timeout-us'. (StefanW for standard suffix).
-- Warn when clamping timeout value, and include clamped
region in message. Also add min and max in YAML. (StefanW)
-- Qualify description of "brcm,completion-timeout-us" so that
it refers to PCIe transactions. (StefanW)
-- Remove mention of Linux specifics in binding description. (StefanW)
-- s/clkreq#/CLKREQ#/g (Bjorn)
-- Refactor completion-timeout-us code to compare max and min to
value given by the property (as opposed to the computed value).
v1 -- The current driver assumes the downstream devices can
provide CLKREQ# for ASPM. These commits accommodate devices
w/ or w/o clkreq# and also handle L1SS-capable devices.
-- The Raspbian Linux folks have already been using a PCIe RC
property "brcm,enable-l1ss". These commits use the same
property, in a backward-compatible manner, and the implementation
adds more detail and also automatically identifies devices w/o
a clkreq# signal, i.e. most devices plugged into an RPi CM4
IO board.
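To make the v9 LTR policy concrete, here is a minimal userspace sketch; it is not the pcie-brcmstb.c implementation, and all names are hypothetical. It assumes the standard PCIe LTR max-latency encoding: a 10-bit value in bits 9:0 and a 3-bit scale in bits 12:10, where scale n multiplies the value by 32^n ns. It keeps an endpoint's existing non-zero value when that value is already lower than the one the RC driver wants to program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not taken from pcie-brcmstb.c.
 * Assumed LTR Max (No-)Snoop Latency field layout:
 *   bits  9:0  latency value
 *   bits 12:10 latency scale, multiplier 32^scale ns (0..5 valid)
 */
#define LTR_VALUE_MASK  0x3ffu
#define LTR_SCALE_SHIFT 10
#define LTR_SCALE_MASK  0x7u
#define LTR_MAX_SCALE   5u

/* Decode a raw max-latency field into nanoseconds. */
static uint64_t ltr_to_ns(uint16_t raw)
{
	uint64_t value = raw & LTR_VALUE_MASK;
	unsigned int scale = (raw >> LTR_SCALE_SHIFT) & LTR_SCALE_MASK;

	assert(scale <= LTR_MAX_SCALE);
	return value << (5 * scale);	/* value * 32^scale */
}

/* Encode ns with the smallest scale whose value fits in 10 bits.
 * The shift truncates, so the encoded latency never exceeds ns. */
static uint16_t ns_to_ltr(uint64_t ns)
{
	unsigned int scale = 0;

	while ((ns >> (5 * scale)) > LTR_VALUE_MASK) {
		scale++;
		assert(scale <= LTR_MAX_SCALE);
	}
	return (uint16_t)((scale << LTR_SCALE_SHIFT) | (ns >> (5 * scale)));
}

/* Policy from the cover letter: leave an existing non-zero value alone
 * if it is already lower than ours (ACPI or FW may know better). */
static bool should_program_ltr(uint16_t current_raw, uint16_t wanted_raw)
{
	if (current_raw == 0)
		return true;
	return ltr_to_ns(current_raw) >= ltr_to_ns(wanted_raw);
}
```

For example, ns_to_ltr(3000000) yields scale 3 and value 91, i.e. 2,981,888 ns; rounding down keeps the advertised limit below the requested one, which is the point of staying under the internal bus timeout.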
Jim Quinlan (4):
dt-bindings: PCI: brcmstb: Add property "brcm,clkreq-mode"
PCI: brcmstb: Set reasonable value for internal bus timeout
PCI: brcmstb: Set downstream maximum {no-}snoop LTR values
PCI: brcmstb: Configure HW CLKREQ# mode appropriate for downstream
device
.../bindings/pci/brcm,stb-pcie.yaml | 18 ++
drivers/pci/controller/pcie-brcmstb.c | 161 +++++++++++++++++-
2 files changed, 170 insertions(+), 9 deletions(-)
base-commit: 9f8413c4a66f2fb776d3dc3c9ed20bf435eb305e
--
2.17.1
* [PATCH v9 1/4] dt-bindings: PCI: brcmstb: Add property "brcm,clkreq-mode"
From: Jim Quinlan @ 2024-04-03 21:38 UTC
To: linux-pci, Nicolas Saenz Julienne, Bjorn Helgaas,
Lorenzo Pieralisi, Cyril Brulebois, Phil Elwell,
bcm-kernel-feedback-list, james.quinlan
Cc: Jim Quinlan, Florian Fainelli, Lorenzo Pieralisi,
Krzysztof Wilczyński, Rob Herring, Krzysztof Kozlowski,
Conor Dooley, moderated list:BROADCOM BCM7XXX ARM ARCHITECTURE,
moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
open list
In-Reply-To: <20240403213902.26391-1-james.quinlan@broadcom.com>
The Broadcom STB/CM PCIe HW -- a core that is also used by RPi SOCs --
requires the driver to deliberately place the RC HW in one of three CLKREQ#
modes. The "brcm,clkreq-mode" property allows the user to override the
default setting. If this property is omitted, the default mode shall be
"default".
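A minimal sketch of how a driver might map the three "brcm,clkreq-mode" strings to an internal mode, falling back to "default" when the property is absent. The enum and function names are hypothetical; in the kernel the string would typically be obtained with of_property_read_string(), and here a NULL pointer stands in for a missing property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; not the actual pcie-brcmstb.c code. */
enum clkreq_mode {
	CLKREQ_MODE_SAFE,	/* refclk always driven, no power savings */
	CLKREQ_MODE_NO_L1SS,	/* CLKREQ# PM, L0s and L1, but no L1SS */
	CLKREQ_MODE_DEFAULT,	/* L0s, L1 and L1SS */
	CLKREQ_MODE_INVALID,	/* unrecognised string */
};

/* prop == NULL models an absent property, which selects "default". */
static enum clkreq_mode parse_clkreq_mode(const char *prop)
{
	static const struct {
		const char *name;
		enum clkreq_mode mode;
	} table[] = {
		{ "safe", CLKREQ_MODE_SAFE },
		{ "no-l1ss", CLKREQ_MODE_NO_L1SS },
		{ "default", CLKREQ_MODE_DEFAULT },
	};
	size_t i;

	if (prop == NULL)
		return CLKREQ_MODE_DEFAULT;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(prop, table[i].name) == 0)
			return table[i].mode;

	return CLKREQ_MODE_INVALID;
}

/* Only "default" provides L1 substates; callers reject INVALID first. */
static bool clkreq_mode_allows_l1ss(enum clkreq_mode mode)
{
	assert(mode != CLKREQ_MODE_INVALID);
	return mode == CLKREQ_MODE_DEFAULT;
}
```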
Signed-off-by: Jim Quinlan <james.quinlan@broadcom.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Rob Herring <robh@kernel.org>
---
.../devicetree/bindings/pci/brcm,stb-pcie.yaml | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml b/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
index 7e15aae7d69e..22491f7f8852 100644
--- a/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
+++ b/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
@@ -64,6 +64,24 @@ properties:
aspm-no-l0s: true
+ brcm,clkreq-mode:
+ description: A string that determines the operating
+ clkreq mode of the PCIe RC HW with respect to controlling the refclk
+ signal. There are three different modes -- "safe", which drives the
+ refclk signal unconditionally and will work for all devices but does
+ not provide any power savings; "no-l1ss" -- which provides Clock
+ Power Management, L0s, and L1, but cannot provide L1 substate (L1SS)
+ power savings. If the downstream device connected to the RC is L1SS
+ capable AND the OS enables L1SS, all PCIe traffic may abruptly halt,
+ potentially hanging the system; "default" -- which provides L0s, L1,
+ and L1SS, but is not compliant with Clock Power Management;
+ specifically, it may not be able to meet the T_CLRon max timing of 400ns
+ as specified in "Dynamic Clock Control", section 3.2.5.2.2 of the PCI
+ Express Mini CEM 2.1 specification. This situation is atypical and
+ should happen only with older devices.
+ $ref: /schemas/types.yaml#/definitions/string
+ enum: [ safe, no-l1ss, default ]
+
brcm,scb-sizes:
description: u64 giving the 64bit PCIe memory
viewport size of a memory controller. There may be up to
--
2.17.1
* Re: [PATCH net-next v2 0/9] Add support for OPEN Alliance 10BASE-T1x MACPHY Serial Interface
From: Benjamin Bigler @ 2024-04-03 21:40 UTC
To: Parthiban.Veerasooran
Cc: netdev, devicetree, linux-kernel, linux-doc, Horatiu.Vultur,
Woojung.Huh, Nicolas.Ferre, UNGLinuxDriver, Thorsten.Kummermehr,
davem, edumazet, kuba, pabeni, robh+dt, krzysztof.kozlowski+dt,
conor+dt, corbet, Steen.Hegelund, rdunlap, horms, casper.casan,
andrew
In-Reply-To: <0596fce8-223b-494e-907e-f13d75f211cd@microchip.com>
Hi Parthiban,
Sorry for the late answer, I was quite busy the last few days.
On Mon, 2024-03-25 at 13:24 +0000, Parthiban.Veerasooran@microchip.com wrote:
> Hi Benjamin Bigler,
>
> Thank you for your testing and feedback. It would be really helpful to
> bring the driver to a good shape. We really appreciate your efforts on this.
>
> On 24/03/24 5:25 pm, Benjamin Bigler wrote:
> >
> > Hi Parthiban
> >
> > I hope I send this in the right context as it is not related to just one patch or
> > some specific code.
> >
> > I conducted UDP load testing using three i.MX8MM boards in conjunction with the
> > LAN8651. The setup involved one board functioning as a server, which is just
> > echoing back received data, while the remaining two boards acted as clients,
> > sending UDP packets of different sizes in various bursts to the server.
> > Due to hardware constraints, the SPI bus speed was limited to 15 MHz, which might
> > have influenced the results.
> >
> > During the tests I experienced some issues:
> >
> > - The boards only start receiving after first sending something (ping another board).
> > Some measurements showed that the irq stays asserted after init. This makes sense
> > as far as I understand chapter 7.7 of the specification: the irq is deasserted
> > on reception of the first data header following CSn being asserted. As a workaround
> > I trigger the thread at the end of oa_tc6_init.
> It looks like the IRQ is asserted on RESET completion and expects a data
> chunk from host to deassert the IRQ. I used to test the driver in RPI 4
> using iperf3. For some reason I never faced this issue, may be when the
> network device is being registered there might be some packet
> transmission which leads to deliver a data chunk so that the IRQ is
> deasserted. Thanks for the workaround. I think that would be the
> solution to solve this issue. Adding the below lines in the end of the
> function oa_tc6_init() will trigger the oa_tc6_spi_thread_handler() to
> perform an empty data chunk transfer which will deassert the IRQ before
> starting the actual data transfer.
I have IPv6 disabled and use static IPv4 addresses. That could be the reason why
no packet is sent on my side.
>
> /* oa_tc6_sw_reset_macphy() function resets and clears the MAC-PHY reset
> * complete status. IRQ is also asserted on reset completion and it is
> * remain asserted until MAC-PHY receives a data chunk. So performing an
> * empty data chunk transmission will deassert the IRQ. Refer section
> * 7.7 and 9.2.8.8 in the OPEN Alliance specification for more details.
> */
> tc6->int_flag = true;
> wake_up_interruptible(&tc6->spi_wq);
Perfect, that's the same thing I added, and it also works on my side.
> >
> > - If there is a lot of traffic, the receive buffer overflow error spams the log.
> >
> > - If there is a lot of traffic, I got various kernel panics in oa_tc6_update_rx_skb.
> > Mostly because more data is added to rx_skb than was allocated, and sometimes because
> > rx_skb is NULL in oa_tc6_update_rx_skb or oa_tc6_prcs_rx_frame_end. Some debugging
> > with a logic analyzer showed that the chip does not behave correctly. There are more
> > bytes between start_valid and end_valid than there should be. Also there
> > seem to be 2 end_valid without a start_valid between them. What is common is that the incorrect
> > frame starts in a chunk where both end_valid and start_valid are set.
> > In my opinion it's a problem in the chip (maybe related to the errata in the next point),
> > but the driver should be resilient and just drop the packet and not cause a kernel panic.
> Usually I run into this issue "receive buffer overflow" when I run RPI 4
> with default cpu governor setting which is "ondemand". In this case,
> even if I set the SPI clock speed to 15 MHz, the RPI 4 core clock is
> clocked down when it is idle, which results in roughly half of the
> configured SPI clock speed (around 5.9 MHz). So systems like the RPI 4
> need performance mode enabled to get the proper clock speed for SPI.
> Refer below link for more details.
>
> https://github.com/raspberrypi/linux/issues/3381#issuecomment-1144723750
>
> I used to enable performance mode using the below command.
>
> echo performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor > /dev/null
>
> So please ensure the SPI clock speed using a logic analyzer to get the
> maximum throughput without receive buffer overflow.
>
> Of course, I agree that the driver should not crash in case of receive
> buffer overflow. Based on your investigation, I understand that the
> buffers in the MAC-PHY are being continuously overwritten again and again
> as the host is very slow to read the data from the MAC-PHY buffers
> through SPI which alters the descriptors. There might be two reasons why
> we run into this situation.
> 1. The host is busy doing something else and delays to initiate SPI even
> though SPI clock speed is 15 MHz.
> 2. The SPI clock speed is less than 15 MHz.
Sorry, there is a misunderstanding between us. The receive buffer overflow is not
causing any harm except filling the log. In my setup I get about 35000 entries
per day. I am not sure if it's appropriate to log these errors.
The SPI frequency is 14.8 MHz. If I only have 2 boards connected, I am not able
to reproduce this. Only with 3 boards, when 2 boards send multiple big ethernet
frames (1512 bytes per frame) to one, do I get these log entries.
The latency seems to be quite low: from the IRQ to the start of reading the first
frame it always takes less than 500us. Also, the boards are only running the UDP test.
>
> I use the below iperf3 setup for my testing and never faced the driver
> crash issue even though faced "receive buffer overflow" error when I run
> RPI 4 with "ondemand" default mode.
>
> Node 0 - Raspberry Pi 4 with LAN8650 MAC-PHY
> $ iperf3 -s
> Node 1 - Raspberry Pi 4 with EVB-LAN8670-USB USB Stick
> $ iperf3 -c 192.168.5.100 -u -b 10M -i 1 -t 0
>
> and vice versa.
>
> I never faced the "receive buffer overflow" error when running the RPI 4 with
> "performance" mode enabled, even when all the cores are stressed
> using the below command:
>
> $ yes >/dev/null & yes >/dev/null & yes >/dev/null & yes >/dev/null &
>
> Can you share more details about your testing setup and applications you
> use, so that I will try to reproduce the issue in my setup to debug the
> driver?
I use an internal tool which does some stress tests using UDP. Unfortunately,
I am not allowed to publish it, but a colleague is working on a Rust implementation
which we can publish; it's not fully ready yet, though.
On one board the tool is running in server mode. It just echoes back the received
data. On the 2 other boards the tool is running in client mode. It sends various
sized udp-packets in different bursts and then checks if it receives the same
data in the same order.
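The internal tool is not public, so the following is only a sketch, under assumed details, of the check described above: each datagram carries a sequence number and a payload pattern derived from it, and the client verifies that echoed datagrams come back intact and in order. The packet format and all names are hypothetical, and sockets are left out so that only the verification logic is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packet format: 4-byte big-endian sequence number,
 * then a payload pattern derived from that sequence number. */
#define HDR_LEN 4

static void build_packet(uint8_t *buf, size_t len, uint32_t seq)
{
	size_t i;

	assert(len >= HDR_LEN);
	buf[0] = (uint8_t)(seq >> 24);
	buf[1] = (uint8_t)(seq >> 16);
	buf[2] = (uint8_t)(seq >> 8);
	buf[3] = (uint8_t)seq;
	for (i = HDR_LEN; i < len; i++)
		buf[i] = (uint8_t)(seq + i);
}

static uint32_t read_seq(const uint8_t *buf)
{
	return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	       ((uint32_t)buf[2] << 8) | buf[3];
}

struct echo_checker {
	uint32_t next_seq;	/* sequence number expected next */
	unsigned int errors;	/* corrupted or out-of-order echoes */
};

/* Verify one echoed datagram; returns 0 if intact and in order.
 * (A truncated echo is not detected by this sketch.) */
static int check_echo(struct echo_checker *c, const uint8_t *buf, size_t len)
{
	uint8_t expected[2048];
	uint32_t seq;
	int in_order;

	if (len < HDR_LEN || len > sizeof(expected)) {
		c->errors++;		/* runt or oversized datagram */
		return -1;
	}
	seq = read_seq(buf);
	build_packet(expected, len, seq);
	if (memcmp(buf, expected, len) != 0) {
		c->errors++;		/* payload corrupted */
		return -1;
	}
	in_order = (seq == c->next_seq);
	if (!in_order)
		c->errors++;		/* lost, duplicated or reordered */
	c->next_seq = seq + 1;		/* resynchronise on what arrived */
	return in_order ? 0 : -1;
}
```

Resynchronising on the received sequence number means a single lost datagram is counted once, instead of making every later datagram fail.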
The crashes only happen when ZARFE is not set (with Rev. B0). When the crash
happens, I see on the logic analyzer that there are more bytes than mtu + headers
between the frame where start_valid is set and the frame where end_valid is set.
Then this happens:
[ 437.155673] skbuff: skb_over_panic: text:ffff80007a8c2bd8 len:1600 put:64 head:ffff00000de28080 data:ffff00000de280c0 tail:0x680 end:0x640 dev:eth1
[ 437.168987] kernel BUG at net/core/skbuff.c:192!
[ 437.173612] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
[ 437.180407] Modules linked in: ppp_async crc_ccitt ppp_generic slhc lan865x oa_tc6 bec_infoo(O) tpm_tis_spi tpm_tis_core spi_imx imx_sdma
[ 437.196016] CPU: 1 PID: 455 Comm: oa-tc6-spi-thre Tainted: G O 6.6.11-gce336e2c2bc3-dirty #1
[ 437.205853] Hardware name: Toradex Verdin iMX8M Mini on FUMU (DT)
[ 437.212820] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 437.219790] pc : skb_panic+0x58/0x5c
[ 437.223376] lr : skb_panic+0x58/0x5c
[ 437.226959] sp : ffff80008362bd90
[ 437.230278] x29: ffff80008362bda0 x28: 0000000000000000 x27: ffff000001066878
[ 437.237426] x26: 000000000000001e x25: 00000000000007f8 x24: ffff0000010cea80
[ 437.244571] x23: 00000000f0f0f0f1 x22: 000000000000001f x21: 0000000000000000
[ 437.251720] x20: ffff0000010ceaa8 x19: 000000003f20003f x18: ffffffffffffffff
[ 437.258867] x17: ffff7ffffded9000 x16: ffff800080008000 x15: 073a0764076e0765
[ 437.266015] x14: 0720073007380736 x13: ffff8000823d1f58 x12: 0000000000000534
[ 437.273162] x11: 00000000000001bc x10: ffff800082429f58 x9 : ffff8000823d1f58
[ 437.280310] x8 : 00000000ffffefff x7 : ffff800082429f58 x6 : 0000000000000000
[ 437.287455] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
[ 437.294606] x2 : 0000000000000000 x1 : ffff000001223b00 x0 : 0000000000000087
[ 437.301753] Call trace:
[ 437.304203] skb_panic+0x58/0x5c
[ 437.307436] skb_find_text+0x0/0xf0
[ 437.310933] oa_tc6_spi_thread_handler+0x438/0x880 [oa_tc6]
[ 437.316523] kthread+0x118/0x11c
[ 437.319758] ret_from_fork+0x10/0x20
[ 437.323343] Code: f90007e9 b940b908 f90003e8 97ca3c34 (d4210000)
[ 437.329446] ---[ end trace 0000000000000000 ]---
Sometimes there are 2 end_valid in a row without a start_valid in between.
Then this happens:
[ 469.737297] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000074
[ 469.746137] Mem abort info:
[ 469.748950] ESR = 0x0000000096000004
[ 469.752709] EC = 0x25: DABT (current EL), IL = 32 bits
[ 469.758036] SET = 0, FnV = 0
[ 469.761098] EA = 0, S1PTW = 0
[ 469.764252] FSC = 0x04: level 0 translation fault
[ 469.769144] Data abort info:
[ 469.772033] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 469.777529] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 469.782594] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 469.787921] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000043c32000
[ 469.794377] [0000000000000074] pgd=0000000000000000, p4d=0000000000000000
[ 469.801184] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 469.807459] Modules linked in: ppp_async crc_ccitt ppp_generic slhc lan865x oa_tc6 bec_infoo(O) tpm_tis_spi tpm_tis_core spi_imx imx_sdma
[ 469.823064] CPU: 2 PID: 456 Comm: oa-tc6-spi-thre Tainted: G O 6.6.11-g350ed394a6ca-dirty #1
[ 469.832903] Hardware name: Toradex Verdin iMX8M Mini on FUMU (DT)
[ 469.839871] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 469.846841] pc : skb_put+0xc/0x6c
[ 469.850169] lr : oa_tc6_spi_thread_handler+0x438/0x880 [oa_tc6]
[ 469.856106] sp : ffff80008376bdb0
[ 469.859424] x29: ffff80008376bdb0 x28: 0000000000000000 x27: ffff00000194c080
[ 469.866573] x26: 0000000000000000 x25: 0000000000000000 x24: ffff000001095c80
[ 469.873720] x23: 00000000f0f0f0f1 x22: 000000000000001f x21: 0000000000000000
[ 469.880870] x20: ffff000001095ca8 x19: 000000003f20003f x18: 0000000000000000
[ 469.888023] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 469.895174] x14: 0000031acf8b86d8 x13: 0000000000000000 x12: 0000000000000000
[ 469.902321] x11: 0000000000000002 x10: 0000000000000a60 x9 : ffff80008376b970
[ 469.909467] x8 : ffff00007fb6e580 x7 : 000000000194b080 x6 : 0000000000000000
[ 469.916616] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 000000000000fc80
[ 469.923765] x2 : 0000000000000001 x1 : 0000000000000040 x0 : 0000000000000000
[ 469.930915] Call trace:
[ 469.933365] skb_put+0xc/0x6c
[ 469.936342] oa_tc6_spi_thread_handler+0x438/0x880 [oa_tc6]
[ 469.941929] kthread+0x118/0x11c
[ 469.945166] ret_from_fork+0x10/0x20
[ 469.948752] Code: d65f03c0 d503233f a9bf7bfd 910003fd (b9407406)
[ 469.954854] ---[ end trace 0000000000000000 ]---
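Both oopses match the causes described earlier in the thread: the first appends more data to the frame buffer than was allocated (skb_over_panic), the second handles an end_valid with no frame in progress (NULL rx_skb). Below is a hedged sketch, in plain C rather than against the real oa_tc6 code or the sk_buff API, of reassembly that drops malformed frames instead of crashing. It assumes a fixed per-frame limit (RX_FRAME_MAX, hypothetical) and that the caller splits a chunk carrying both an end and a new start into two segments; all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-frame limit standing in for the rx_skb allocation. */
#define RX_FRAME_MAX 1518

enum rx_phase { RX_IDLE, RX_IN_FRAME, RX_DISCARD };

struct rx_state {
	unsigned char buf[RX_FRAME_MAX];
	size_t len;
	enum rx_phase phase;
	unsigned long dropped;	/* malformed frames discarded */
};

/*
 * Feed one segment. Assumption: each call carries at most one start and
 * one end. Returns true when a complete frame is in rx->buf[0..len).
 */
static bool rx_segment(struct rx_state *rx, const unsigned char *data,
		       size_t n, bool start, bool end)
{
	if (start) {
		if (rx->phase == RX_IN_FRAME)
			rx->dropped++;	/* start before end: drop partial */
		rx->phase = RX_IN_FRAME;
		rx->len = 0;
	} else if (rx->phase != RX_IN_FRAME) {
		/* No frame in progress (the NULL rx_skb case): ignore data. */
		if (end) {
			if (rx->phase == RX_IDLE)
				rx->dropped++;	/* stray end_valid */
			rx->phase = RX_IDLE;
		}
		return false;
	}

	if (n > sizeof(rx->buf) - rx->len) {
		/* Too many bytes for one frame (the skb_over_panic case):
		 * drop it and swallow the rest until end_valid or start_valid. */
		rx->dropped++;
		rx->phase = end ? RX_IDLE : RX_DISCARD;
		rx->len = 0;
		return false;
	}
	memcpy(rx->buf + rx->len, data, n);
	rx->len += n;
	assert(rx->len <= sizeof(rx->buf));

	if (end) {
		rx->phase = RX_IDLE;
		return true;
	}
	return false;
}
```

The RX_DISCARD phase ensures an oversized frame is counted once and its trailing segments are not mistaken for stray data.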
If you are interested, I can try to get a recording with the logic analyzer and send it to you.
By the way, in the other answer you attached a screenshot of the logic analyzer, and you
have a very nice HLA for oa_tc6. Are they open source, or are there any plans to publish them?
> >
> > - Sometimes the chip stops working. It always asserts the irq but there is no data (rca=0)
> > and also exst is not active. I found out that there is an errata (DS80001075) point s3
> > that explains this. I set the ZARFE bit in CONFIG0. This also fixes the point above.
> > The driver has now been working for about 2.5 weeks under various loads, with just one loss-of-frame
> > error where I had to reboot the system after about 4 days.
> It is good to hear that the driver works fine with the above changes. As
> mentioned in the errata, this continuous interrupt issue is a known
> issue with LAN8651 Rev.B0. Switching to LAN8651 Rev.B1 will solve this
> issue and no need of any workaround. Setting ZARFE bit in the CONFIG0
> will solve the continuous interrupt issue, but I don't know how it also
> solved the "receive buffer overflow" issue above. I think it is a good idea
> to test with LAN8651 Rev.B1 without setting ZARFE bit once. It would be
> interesting to see the result. I am always using LAN8651 Rev.B1 for my
> testing.
Unfortunately, I only have LAN8651 Rev. B0 chips. Are you sure that Rev. B1 has this
issue fixed? The errata here says that B1 is affected too:
https://ww1.microchip.com/downloads/aemDocuments/documents/AIS/ProductDocuments/Errata/LAN8650-1-Errata-80001075.pdf
>
> I should be able to reproduce the "receive buffer overflow" issue and
> consequently kernel crash in my setup with LAN8651 Rev.B1 so that I can
> investigate the issue further. As I am not able to reproduce in my RPI
> 4, I need your support for the tests and applications you used in your
> setup.
> >
> > Is there a reason why you removed the netdev watchdog which was active in v2?
> When the timeout occurs, there is no further action except increasing
> tx_errors. Not seeing this except USB-to-Ethernet which can be removed
> unexpectedly. But this is SPI interface which will not be removed
> unexpectedly as it is a platform device. That's why we removed this.
>
> Best regards,
> Parthiban V
> >
> > Thanks,
> > Benjamin Bigler
> >
>
Thanks,
Benjamin Bigler
* Re: [PATCH v3 0/6] Add Synopsys DesignWare HDMI RX Controller
From: Heiko Stübner @ 2024-04-03 22:48 UTC
To: Shreeya Patel, Krzysztof Kozlowski
Cc: mchehab, hverkuil, hverkuil-cisco, robh, krzysztof.kozlowski+dt,
conor+dt, mturquette, sboyd, p.zabel, shawn.wen, kernel,
linux-kernel, linux-media, devicetree, linux-arm-kernel,
linux-rockchip, linux-clk, linux-arm
In-Reply-To: <86150c89-11d5-4d52-987e-974b1a03018f@linaro.org>
Am Mittwoch, 3. April 2024, 13:24:05 CEST schrieb Krzysztof Kozlowski:
> On 03/04/2024 13:20, Shreeya Patel wrote:
> > On Wednesday, April 03, 2024 15:51 IST, Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> wrote:
> >
> >> On 03/04/2024 11:24, Shreeya Patel wrote:
> >>> On Thursday, March 28, 2024 04:20 IST, Shreeya Patel <shreeya.patel@collabora.com> wrote:
> >>>
> >>>> This series implements support for the Synopsys DesignWare
> >>>> HDMI RX Controller, being compliant with standard HDMI 1.4b
> >>>> and HDMI 2.0.
> >>>>
> >>>
> >>> Hi Mauro and Hans,
> >>>
> >>> I haven't received any reviews so far. Hence, this is just a gentle reminder to review this patch series.
> >>
> >> Why did you put clk changes here? These go via different subsystem. That
> >> might be one of obstacles for your patchset.
> >>
> >
> > I added clock changes in this patch series because HDMIRX driver depends on it.
> > I thought it is wrong to send the driver patches which don't even compile?
>
> Hm, why HDMIRX driver depends on clock? How? This sounds really wrong.
> Please get it reviewed internally first.
For the change in question, the clock controller on the SoC also handles
the reset controls (hence its name CRU, clock-and-reset-unit).
There are at least 660 reset lines in the unit and it seems the hdmi-rx one
was overlooked on the initial submission, hence patches 1+2 add the
reset-line.
Of course, here only the "arm64: dts:" patch depends on the clock
change, as it references the new reset-id.
Am Mittwoch, 3. April 2024, 12:22:57 CEST schrieb Krzysztof Kozlowski:
> Please do not engage multiple subsystems in one patchset, if not
> necessary. Especially do not mix DTS into media or USB subsystems. And
> do not put DTS in the middle!
picking up your reply from patch 4/6, there seem to be different "schools
of thought" for this. Some maintainers might want to really only see
patches that are explicitly for their subsystem - I guess networking
might be a prime example of that, as they will essentially apply whole series
if nobody protests in time (including dts patches).
On the other hand I also remember seeing requests for "the full picture"
and individual maintainers then just picking and applying the patches
meant for their subsystem.
The series as it stands right now is nice in that it allows (random)
developers to just pick it up, apply it to a tree and test the actual driver
without needing to hunt for multiple dependent series.
Of course you're right, the "arm64: dts:" patch should be the last in the
series and not be in the middle of it.
Regards
Heiko
* [PATCH v3 00/29] riscv control-flow integrity for usermode
From: Deepak Gupta @ 2024-04-03 23:34 UTC
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
Sending out v3 of cpu assisted riscv usermode control-flow integrity.
v2 [9] of this riscv usermode control-flow integrity enabling was sent a
week ago. The RFC patchset (v1) was sent early this year (January) [7].
changes in v3
--------------
envcfg:
logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been
picked up on a per-task basis, even though the CPU didn't implement it. Fixed in
this series.
dt-bindings:
As suggested, split into a separate commit. Fixed the messaging to state that
the spec is in public review.
arch_is_shadow_stack change:
arch_is_shadow_stack changed to vma_is_shadow_stack
hwprobe:
zicfiss / zicfilp if present will get enumerated in hwprobe
selftests:
As suggested, added object and binary filenames to .gitignore
Selftest binary anyways need to be compiled with cfi enabled compiler which
will make sure that landing pad and shadow stack are enabled. Thus removed
separate enable/disable tests. Cleaned up tests a bit.
changes in v2
---------------
As part of the testing effort, I compiled a rootfs with shadow stack and
landing pads enabled (libraries and binaries) and booted it to a shell. As
long-running tests, I have been able to run some SPEC 2006 benchmarks [8]
(the link is provided only for the list of benchmarks that were run as
long-running tests; the spreadsheet itself contains static stats such as
code-size growth of the SPEC binaries). Thus converting from RFC to a
regular patchset.
Securing control-flow integrity for usermode requires the following:
- Securing forward control flow: all callsites must reach the target they
actually intend to reach.
- Securing backward control flow: all function returns must return to the
location they were called from.
This patch series uses the riscv CPU extension `zicfilp` [2] to secure
forward control flow and `zicfiss` [2] to secure backward control flow.
`zicfilp` enforces that all indirect calls and jumps land on a landing pad
instruction, and that the label embedded in the landing pad instruction
matches a value programmed into the `x7` register (at the callsite, by the
compiler). `zicfiss` introduces a shadow stack which is writeable only via
shadow stack instructions (sspush and ssamoswap) and thus can't be tampered
with via inadvertent stores. More details about the extensions can be found
in [2], and there are details in the documentation as well (in this patch
series).
Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control-flow
integrity for user mode programs can be compiled into the kernel.
Enabling control-flow integrity for user programs is left to the user
runtime (specifically, the dynamic loader is expected to do it). There has
been a lot of earlier discussion on enabling around x86 shadow stack
[3, 4, 5], and the overall consensus has been to let the dynamic loader (or
usermode) decide whether to enable the feature.
This patch series introduces arch-agnostic `prctls` to enable shadow stack
and indirect branch tracking, and implements them on riscv. arm64 is expected
to implement the shadow stack part of these arch-agnostic `prctls` [6].
Changes since last time
***********************
Spec changes
------------
- The forward cfi spec has become much simpler. `lpad` is a pseudo-instruction
for `auipc x0, <20bit_imm>`. `lpad` checks x7 against the 20-bit label
embedded in the instruction. Thus the label width is 20 bits.
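To show where the 20-bit label lives in the instruction word, here is a small
encoder sketch. It assumes the standard U-type `auipc` layout (opcode 0x17,
`rd` in bits [11:7], immediate in bits [31:12]) and assumes `lpad` uses
`rd` = `x0`; `encode_auipc()`, `encode_lpad()` and `lpad_label()` are
hypothetical helpers, not toolchain or kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define OPC_AUIPC 0x17u	/* AUIPC major opcode (U-type) */

/* Hypothetical helper: encode "auipc rd, imm20". */
static uint32_t encode_auipc(unsigned int rd, uint32_t imm20)
{
	return ((imm20 & 0xFFFFFu) << 12) | ((rd & 0x1Fu) << 7) | OPC_AUIPC;
}

/* Assumed encoding: "lpad <label>" == "auipc x0, <label>". */
static uint32_t encode_lpad(uint32_t label)
{
	return encode_auipc(0, label);
}

/* Extract the label from an lpad word; -1 if not auipc with rd == x0. */
static int64_t lpad_label(uint32_t insn)
{
	if ((insn & 0x7Fu) != OPC_AUIPC || ((insn >> 7) & 0x1Fu) != 0)
		return -1;
	return insn >> 12;
}
```

Because `rd` is `x0`, the instruction has no architectural side effect on
hardware without Zicfilp, which is why it can be placed at every indirect
branch target.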
- Shadow stack management instructions are reduced to:
sspush - pushes x1/x5 onto the shadow stack
sspopchk - pops from the shadow stack and compares with x1/x5
ssamoswap - atomically swaps a value on the shadow stack
rdssp - reads the current shadow stack pointer
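The push/pop-and-check pair can be sketched as a tiny software model. This is
only an illustration under stated assumptions: the real shadow stack lives in
memory that ordinary stores cannot write and grows downward, while this model
uses an array index; `struct shadow_stack`, `ss_push()` and `ss_popchk()` are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SS_DEPTH 64	/* arbitrary depth for this model only */

/* Hypothetical software model of a shadow stack. */
struct shadow_stack {
	uint64_t slot[SS_DEPTH];
	size_t ssp;	/* number of live entries */
};

/* Model of sspush: push the return address held in x1/x5. */
static bool ss_push(struct shadow_stack *ss, uint64_t ra)
{
	if (ss->ssp == SS_DEPTH)
		return false;	/* model-only limit */
	ss->slot[ss->ssp++] = ra;
	return true;
}

/*
 * Model of sspopchk: pop and compare with x1/x5. Returning false stands
 * in for a software-check exception.
 */
static bool ss_popchk(struct shadow_stack *ss, uint64_t ra)
{
	if (ss->ssp == 0)
		return false;
	return ss->slot[--ss->ssp] == ra;
}
```

A corrupted return address on the regular stack no longer matches the copy
pushed at call time, so the `ss_popchk()` comparison fails.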
- Shadow stack accesses to read-only memory always raise an AMO/store page
fault. `sspopchk` is a load, but if the underlying page is read-only it
raises a store page fault. This simplifies COW handling of shadow stack
pages in both hardware and the kernel.
- riscv defines a new exception type, the `software check exception`, and
control-flow violations raise a software check exception.
- Enabling controls for shadow stack and landing pads are in the xenvcfg CSR,
which controls enabling for lower privilege modes. For example, senvcfg
controls enabling for U mode and menvcfg controls enabling for S mode.
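As an illustration, a per-task senvcfg value that enables both features for
U mode could be composed as follows. The bit positions mirror the
`ENVCFG_LPE`, `ENVCFG_SSE` and `ENVCFG_CBZE` definitions in
arch/riscv/include/asm/csr.h from this series; `task_envcfg()` is a
hypothetical helper, not code from the patches.

```c
#include <assert.h>
#include <stdbool.h>

#define ENVCFG_LPE	(1UL << 2)	/* landing pads (zicfilp) */
#define ENVCFG_SSE	(1UL << 3)	/* shadow stack (zicfiss) */
#define ENVCFG_CBZE	(1UL << 7)	/* cbo.zero allowed */

/* Hypothetical: compute the senvcfg value for a U-mode task. */
static unsigned long task_envcfg(unsigned long base, bool shstk, bool lp)
{
	unsigned long v = base;

	if (shstk)
		v |= ENVCFG_SSE;
	if (lp)
		v |= ENVCFG_LPE;
	return v;
}
```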
core mm shadow stack enabling
-----------------------------
Shadow stack support for x86 usermode is now in mainline, and this patch
series builds on top of it for the arch-agnostic mm-related changes. Big
thanks and shout out to Rick Edgecombe for that.
selftests
---------
Created some minimal selftests to test the patch series.
[1] - https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
[2] - https://github.com/riscv/riscv-cfi
[3] - https://lore.kernel.org/lkml/ZWHcBq0bJ+15eeKs@finisterre.sirena.org.uk/T/#mb121cd8b33d564e64234595a0ec52211479cf474
[4] - https://lore.kernel.org/all/20220130211838.8382-1-rick.p.edgecombe@intel.com/
[5] - https://lore.kernel.org/lkml/CAHk-=wgP5mk3poVeejw16Asbid0ghDt4okHnWaWKLBkRhQntRA@mail.gmail.com/
[6] - https://lore.kernel.org/linux-mm/20231122-arm64-gcs-v7-2-201c483bd775@kernel.org/
[7] - https://lore.kernel.org/lkml/20240125062739.1339782-1-debug@rivosinc.com/
[8] - https://docs.google.com/spreadsheets/d/1_cHGH4ctNVvFRiS7hW9dEGKtXLAJ3aX4Z_iTSa3Tw2U/edit#gid=0
[9] - https://lore.kernel.org/lkml/20240329044459.3990638-1-debug@rivosinc.com/
* [PATCH v3 01/29] riscv: envcfg save and restore on task switching
From: Deepak Gupta @ 2024-04-03 23:34 UTC
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
The envcfg CSR defines enabling bits for cache management instructions and
will soon control enabling for control-flow integrity and pointer masking
features.
Forward and backward cfi enabling are controlled via envcfg and thus need
to be enabled on a per-thread basis.
This patch creates a placeholder for the envcfg CSR in `thread_info` and
adds logic to save and restore it on task switching.
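The switch-time behaviour can be modelled in userspace to show the idea: the
per-task copy in `thread_info` is treated as the source of truth, so on a
switch the next task's value is simply written to the CSR and nothing is read
back from it. `fake_envcfg_csr`, `struct fake_thread_info` and
`switch_to_envcfg()` are hypothetical stand-ins for the CSR, `thread_info`
and `__switch_to_envcfg()`.

```c
#include <assert.h>

/* Stand-in for the envcfg CSR in this userspace model. */
static unsigned long fake_envcfg_csr;

/* Stand-in for the envcfg field added to struct thread_info. */
struct fake_thread_info {
	unsigned long envcfg;
};

/* Model of __switch_to_envcfg(): install the next task's value. */
static void switch_to_envcfg(const struct fake_thread_info *next)
{
	fake_envcfg_csr = next->envcfg;
}
```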
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
arch/riscv/include/asm/switch_to.h | 10 ++++++++++
arch/riscv/include/asm/thread_info.h | 1 +
2 files changed, 11 insertions(+)
diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 7efdb0584d47..2d9a00a30394 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -69,6 +69,15 @@ static __always_inline bool has_fpu(void) { return false; }
#define __switch_to_fpu(__prev, __next) do { } while (0)
#endif
+static inline void __switch_to_envcfg(struct task_struct *next)
+{
+ register unsigned long envcfg = next->thread_info.envcfg;
+
+ asm volatile (ALTERNATIVE("nop", "csrw " __stringify(CSR_ENVCFG) ", %0", 0,
+ RISCV_ISA_EXT_XLINUXENVCFG, 1)
+ :: "r" (envcfg) : "memory");
+}
+
extern struct task_struct *__switch_to(struct task_struct *,
struct task_struct *);
@@ -80,6 +89,7 @@ do { \
__switch_to_fpu(__prev, __next); \
if (has_vector()) \
__switch_to_vector(__prev, __next); \
+ __switch_to_envcfg(__next); \
((last) = __switch_to(__prev, __next)); \
} while (0)
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index 5d473343634b..a503bdc2f6dd 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -56,6 +56,7 @@ struct thread_info {
long user_sp; /* User stack pointer */
int cpu;
unsigned long syscall_work; /* SYSCALL_WORK_ flags */
+ unsigned long envcfg;
#ifdef CONFIG_SHADOW_CALL_STACK
void *scs_base;
void *scs_sp;
--
2.43.2
* [PATCH v3 02/29] riscv: define default value for envcfg for task
From: Deepak Gupta @ 2024-04-03 23:34 UTC
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
Define a base default value for envcfg per task. By default, all tasks
should have the cache zeroing capability. Any future base capabilities that
apply to all tasks can be turned on the same way.
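The masking in `start_thread()` means a task only inherits base bits that
are already set in the CSR. Assuming CPU bring-up only sets `ENVCFG_CBZE`
when Zicboz is present, a CPU without it reads back 0 and the task does not
get the bit, which is the v3 fix described in the cover letter.
`initial_task_envcfg()` is a hypothetical name for that one-line expression.

```c
#include <assert.h>

#define ENVCFG_CBZE	(1UL << 7)
#define ENVCFG_BASE	ENVCFG_CBZE

/*
 * Model of: current->thread_info.envcfg = csr_read(CSR_ENVCFG) & ENVCFG_BASE;
 * Keeps only base bits that bring-up actually enabled on this CPU.
 */
static unsigned long initial_task_envcfg(unsigned long csr_value)
{
	return csr_value & ENVCFG_BASE;
}
```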
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
arch/riscv/include/asm/csr.h | 2 ++
arch/riscv/kernel/process.c | 6 ++++++
2 files changed, 8 insertions(+)
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index 2468c55933cd..bbd2207adb39 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -202,6 +202,8 @@
#define ENVCFG_CBIE_FLUSH _AC(0x1, UL)
#define ENVCFG_CBIE_INV _AC(0x3, UL)
#define ENVCFG_FIOM _AC(0x1, UL)
+/* by default all threads should be able to zero cache */
+#define ENVCFG_BASE ENVCFG_CBZE
/* Smstateen bits */
#define SMSTATEEN0_AIA_IMSIC_SHIFT 58
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 92922dbd5b5c..d3109557f951 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -152,6 +152,12 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
else
regs->status |= SR_UXL_64;
#endif
+ /*
+ * read current envcfg settings, AND it with base settings applicable
+ * for all the tasks. Base settings should've been set up during CPU
+ * bring up.
+ */
+ current->thread_info.envcfg = csr_read(CSR_ENVCFG) & ENVCFG_BASE;
}
void flush_thread(void)
--
2.43.2
* [PATCH v3 03/29] riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv
From: Deepak Gupta @ 2024-04-03 23:34 UTC
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
riscv will need an implementation of exit_thread to clean up the shadow
stack when a thread exits. If the current thread had shadow stack enabled,
a shadow stack is allocated by default for any new thread.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
arch/riscv/Kconfig | 1 +
arch/riscv/kernel/process.c | 5 +++++
2 files changed, 6 insertions(+)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e3142ce531a0..7e0b2bcc388f 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -149,6 +149,7 @@ config RISCV
select HAVE_SAMPLE_FTRACE_DIRECT_MULTI
select HAVE_STACKPROTECTOR
select HAVE_SYSCALL_TRACEPOINTS
+ select HAVE_EXIT_THREAD
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index d3109557f951..ce577cdc2af3 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -200,6 +200,11 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
return 0;
}
+void exit_thread(struct task_struct *tsk)
+{
+
+}
+
int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
{
unsigned long clone_flags = args->flags;
--
2.43.2
* [PATCH v3 04/29] riscv: zicfilp / zicfiss in dt-bindings (extensions.yaml)
From: Deepak Gupta @ 2024-04-03 23:34 UTC
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
Make an entry for cfi extensions in extensions.yaml.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
.../devicetree/bindings/riscv/extensions.yaml | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/Documentation/devicetree/bindings/riscv/extensions.yaml b/Documentation/devicetree/bindings/riscv/extensions.yaml
index 63d81dc895e5..45b87ad6cc1c 100644
--- a/Documentation/devicetree/bindings/riscv/extensions.yaml
+++ b/Documentation/devicetree/bindings/riscv/extensions.yaml
@@ -317,6 +317,16 @@ properties:
The standard Zicboz extension for cache-block zeroing as ratified
in commit 3dd606f ("Create cmobase-v1.0.pdf") of riscv-CMOs.
+ - const: zicfilp
+ description:
+ The standard Zicfilp extension for enforcing forward edge control-flow
+ integrity in commit 3a20dc9 of riscv-cfi and is in public review.
+
+ - const: zicfiss
+ description:
+ The standard Zicfiss extension for enforcing backward edge control-flow
+ integrity in commit 3a20dc9 of riscv-cfi and is in public review.
+
- const: zicntr
description:
The standard Zicntr extension for base counters and timers, as
--
2.43.2
* [PATCH v3 05/29] riscv: zicfiss / zicfilp enumeration
From: Deepak Gupta @ 2024-04-03 23:34 UTC
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
This patch adds support for detecting zicfiss and zicfilp. zicfiss and
zicfilp stand for the unprivileged integer spec extensions for shadow stack
and branch tracking on indirect branches, respectively.
This patch looks for zicfiss and zicfilp in the device tree and accordingly
lights up the bits in the cpu feature bitmap. Furthermore, this patch adds
utility functions that return whether shadow stack or landing pads are
supported by the cpu.
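The detection helpers combine a build-time gate with a runtime bit test.
Below is a userspace sketch of that pattern: a 128-bit ISA bitmap held in two
64-bit words, using the `RISCV_ISA_EXT_ZICFILP`/`RISCV_ISA_EXT_ZICFISS` ids
from hwcap.h in this patch. `isa_bitmap`, `isa_set()`, `isa_has()` and
`supports_shadow_stack()` are hypothetical; the kernel uses its own
per-CPU bitmap and `riscv_cpu_has_extension_unlikely()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RISCV_ISA_EXT_ZICFILP	74
#define RISCV_ISA_EXT_ZICFISS	75
#define ISA_EXT_MAX		128

/* Hypothetical ISA bitmap: two 64-bit words cover 128 extension ids. */
static uint64_t isa_bitmap[ISA_EXT_MAX / 64];

static void isa_set(unsigned int ext)
{
	isa_bitmap[ext / 64] |= UINT64_C(1) << (ext % 64);
}

static bool isa_has(unsigned int ext)
{
	return (isa_bitmap[ext / 64] >> (ext % 64)) & 1;
}

/* Mirrors cpu_supports_shadow_stack(): config gate AND runtime bit. */
static bool supports_shadow_stack(bool cfi_configured)
{
	return cfi_configured && isa_has(RISCV_ISA_EXT_ZICFISS);
}
```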
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
arch/riscv/include/asm/cpufeature.h | 13 +++++++++++++
arch/riscv/include/asm/hwcap.h | 2 ++
arch/riscv/include/asm/processor.h | 1 +
arch/riscv/kernel/cpufeature.c | 2 ++
4 files changed, 18 insertions(+)
diff --git a/arch/riscv/include/asm/cpufeature.h b/arch/riscv/include/asm/cpufeature.h
index 0bd11862b760..f0fb8d8ae273 100644
--- a/arch/riscv/include/asm/cpufeature.h
+++ b/arch/riscv/include/asm/cpufeature.h
@@ -8,6 +8,7 @@
#include <linux/bitmap.h>
#include <linux/jump_label.h>
+#include <linux/smp.h>
#include <asm/hwcap.h>
#include <asm/alternative-macros.h>
#include <asm/errno.h>
@@ -137,4 +138,16 @@ static __always_inline bool riscv_cpu_has_extension_unlikely(int cpu, const unsi
DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key);
+static inline bool cpu_supports_shadow_stack(void)
+{
+ return (IS_ENABLED(CONFIG_RISCV_USER_CFI) &&
+ riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICFISS));
+}
+
+static inline bool cpu_supports_indirect_br_lp_instr(void)
+{
+ return (IS_ENABLED(CONFIG_RISCV_USER_CFI) &&
+ riscv_cpu_has_extension_unlikely(smp_processor_id(), RISCV_ISA_EXT_ZICFILP));
+}
+
#endif
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 1f2d2599c655..74b6c727f545 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -80,6 +80,8 @@
#define RISCV_ISA_EXT_ZFA 71
#define RISCV_ISA_EXT_ZTSO 72
#define RISCV_ISA_EXT_ZACAS 73
+#define RISCV_ISA_EXT_ZICFILP 74
+#define RISCV_ISA_EXT_ZICFISS 75
#define RISCV_ISA_EXT_XLINUXENVCFG 127
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index a8509cc31ab2..6c5b3d928b12 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -13,6 +13,7 @@
#include <vdso/processor.h>
#include <asm/ptrace.h>
+#include <asm/hwcap.h>
#ifdef CONFIG_64BIT
#define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1))
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 79a5a35fab96..d052cad5b82f 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -263,6 +263,8 @@ const struct riscv_isa_ext_data riscv_isa_ext[] = {
__RISCV_ISA_EXT_DATA(h, RISCV_ISA_EXT_h),
__RISCV_ISA_EXT_SUPERSET(zicbom, RISCV_ISA_EXT_ZICBOM, riscv_xlinuxenvcfg_exts),
__RISCV_ISA_EXT_SUPERSET(zicboz, RISCV_ISA_EXT_ZICBOZ, riscv_xlinuxenvcfg_exts),
+ __RISCV_ISA_EXT_SUPERSET(zicfilp, RISCV_ISA_EXT_ZICFILP, riscv_xlinuxenvcfg_exts),
+ __RISCV_ISA_EXT_SUPERSET(zicfiss, RISCV_ISA_EXT_ZICFISS, riscv_xlinuxenvcfg_exts),
__RISCV_ISA_EXT_DATA(zicntr, RISCV_ISA_EXT_ZICNTR),
__RISCV_ISA_EXT_DATA(zicond, RISCV_ISA_EXT_ZICOND),
__RISCV_ISA_EXT_DATA(zicsr, RISCV_ISA_EXT_ZICSR),
--
2.43.2
* [PATCH v3 06/29] riscv: zicfiss / zicfilp extension csr and bit definitions
From: Deepak Gupta @ 2024-04-03 23:34 UTC
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
The zicfiss and zicfilp extensions are enabled via bit 3 and bit 2 of the
*envcfg CSR. menvcfg controls enabling for S/HS mode, henvcfg controls
enabling for VS mode, and senvcfg controls enabling for U/VU mode.
The zicfilp extension extends the *status CSR to hold the `expected landing
pad` (ELP) bit. A trap or interrupt can occur between an indirect jmp/call
and the target instruction, so the CPU's `expected landing pad` state is
recorded in the xstatus CSR; when the supervisor performs xret, the CPU's
`expected landing pad` state can then be restored.
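A minimal model of that save/restore, using the `SR_SPELP` (bit 23) and
`SR_MPELP` (bit 41) values defined in this patch. The helpers are
hypothetical, and the behaviour of clearing SPELP on `sret` is an assumption
based on our reading of the Zicfilp draft, not something this patch
implements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SR_SPELP	UINT64_C(0x00800000)		/* bit 23 */
#define SR_MPELP	UINT64_C(0x020000000000)	/* bit 41 */

/* Hypothetical trap-entry model: record the CPU's ELP state in sstatus. */
static uint64_t trap_record_elp(uint64_t sstatus, bool elp)
{
	return elp ? (sstatus | SR_SPELP) : (sstatus & ~SR_SPELP);
}

/*
 * Hypothetical sret model: recover ELP from sstatus.
 * Assumption: SPELP is cleared as part of the return.
 */
static bool sret_restore_elp(uint64_t *sstatus)
{
	bool elp = (*sstatus & SR_SPELP) != 0;

	*sstatus &= ~SR_SPELP;
	return elp;
}
```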
zicfiss adds one new CSR
- CSR_SSP: CSR_SSP contains current shadow stack pointer.
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
arch/riscv/include/asm/csr.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/arch/riscv/include/asm/csr.h b/arch/riscv/include/asm/csr.h
index bbd2207adb39..3bb126d1c5ff 100644
--- a/arch/riscv/include/asm/csr.h
+++ b/arch/riscv/include/asm/csr.h
@@ -18,6 +18,15 @@
#define SR_MPP _AC(0x00001800, UL) /* Previously Machine */
#define SR_SUM _AC(0x00040000, UL) /* Supervisor User Memory Access */
+/* zicfilp landing pad status bit */
+#define SR_SPELP _AC(0x00800000, UL)
+#define SR_MPELP _AC(0x020000000000, UL)
+#ifdef CONFIG_RISCV_M_MODE
+#define SR_ELP SR_MPELP
+#else
+#define SR_ELP SR_SPELP
+#endif
+
#define SR_FS _AC(0x00006000, UL) /* Floating-point Status */
#define SR_FS_OFF _AC(0x00000000, UL)
#define SR_FS_INITIAL _AC(0x00002000, UL)
@@ -196,6 +205,8 @@
#define ENVCFG_PBMTE (_AC(1, ULL) << 62)
#define ENVCFG_CBZE (_AC(1, UL) << 7)
#define ENVCFG_CBCFE (_AC(1, UL) << 6)
+#define ENVCFG_LPE (_AC(1, UL) << 2)
+#define ENVCFG_SSE (_AC(1, UL) << 3)
#define ENVCFG_CBIE_SHIFT 4
#define ENVCFG_CBIE (_AC(0x3, UL) << ENVCFG_CBIE_SHIFT)
#define ENVCFG_CBIE_ILL _AC(0x0, UL)
@@ -216,6 +227,11 @@
#define SMSTATEEN0_HSENVCFG (_ULL(1) << SMSTATEEN0_HSENVCFG_SHIFT)
#define SMSTATEEN0_SSTATEEN0_SHIFT 63
#define SMSTATEEN0_SSTATEEN0 (_ULL(1) << SMSTATEEN0_SSTATEEN0_SHIFT)
+/*
+ * zicfiss user mode csr
+ * CSR_SSP holds current shadow stack pointer.
+ */
+#define CSR_SSP 0x011
/* symbolic CSR names: */
#define CSR_CYCLE 0xc00
--
2.43.2
* [PATCH v3 07/29] riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit
From: Deepak Gupta @ 2024-04-03 23:34 UTC
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
Carve out space in the arch-specific thread struct for cfi status and the
usermode shadow stack on riscv.
This patch does the following:
- defines a new structure, cfi_status, with a status bit for the cfi feature
- defines the shadow stack pointer, base and size in the cfi_status structure
- defines offsets to the new member fields of thread in asm-offsets.c
- saves and restores the shadow stack pointer on trap entry (U --> S) and
exit (S --> U)
Shadow stack save/restore is gated on feature availability and implemented
using alternatives. The CSR could be context switched in `switch_to` as
well, but as soon as kernel shadow stack support is rolled in, the shadow
stack pointer will need to be switched at the trap entry/exit points (much
like `sp`). It can be argued that kernel shadow stack deployments may not be
as prevalent as usermode ones, but even a minimal deployment of kernel
shadow stack means it needs to be supported. Hence the save/restore of the
shadow stack pointer lives in entry.S instead of `switch_to.h`.
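The entry/exit hunks can be read as the following C model. It is a sketch
only: `fake_csr_ssp` and `saved_user_ssp` stand in for `CSR_SSP` and
`thread_info.user_cfi_state.user_shdw_stk`, and `trap_entry_ssp()` /
`trap_exit_to_user_ssp()` are hypothetical names. `SR_SPP` is assumed to be
0x100, as in arch/riscv/include/asm/csr.h.

```c
#include <assert.h>
#include <stdint.h>

#define SR_SPP 0x100UL	/* sstatus.SPP: previous privilege was S */

/* Userspace stand-ins for CSR_SSP and the saved user shadow stack ptr. */
static uint64_t fake_csr_ssp;
static uint64_t saved_user_ssp;

/*
 * Mirrors the handle_exception hunk: only when the trap came from U mode,
 * swap CSR_SSP with zero (csrrw s2, CSR_SSP, x0) and stash the old value.
 */
static void trap_entry_ssp(unsigned long status)
{
	if (!(status & SR_SPP)) {
		saved_user_ssp = fake_csr_ssp;
		fake_csr_ssp = 0;
	}
}

/* Mirrors the ret_from_exception hunk on the path back to U mode. */
static void trap_exit_to_user_ssp(void)
{
	fake_csr_ssp = saved_user_ssp;
}
```

A trap taken from S mode (SPP set) leaves both the CSR and the saved value
untouched, matching the `bnez s2, skip_ssp_save` branch.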
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
arch/riscv/include/asm/processor.h | 1 +
arch/riscv/include/asm/thread_info.h | 3 +++
arch/riscv/include/asm/usercfi.h | 24 ++++++++++++++++++++++++
arch/riscv/kernel/asm-offsets.c | 4 ++++
arch/riscv/kernel/entry.S | 26 ++++++++++++++++++++++++++
5 files changed, 58 insertions(+)
create mode 100644 arch/riscv/include/asm/usercfi.h
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 6c5b3d928b12..f8decf357804 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -14,6 +14,7 @@
#include <asm/ptrace.h>
#include <asm/hwcap.h>
+#include <asm/usercfi.h>
#ifdef CONFIG_64BIT
#define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1))
diff --git a/arch/riscv/include/asm/thread_info.h b/arch/riscv/include/asm/thread_info.h
index a503bdc2f6dd..f1dee307806e 100644
--- a/arch/riscv/include/asm/thread_info.h
+++ b/arch/riscv/include/asm/thread_info.h
@@ -57,6 +57,9 @@ struct thread_info {
int cpu;
unsigned long syscall_work; /* SYSCALL_WORK_ flags */
unsigned long envcfg;
+#ifdef CONFIG_RISCV_USER_CFI
+ struct cfi_status user_cfi_state;
+#endif
#ifdef CONFIG_SHADOW_CALL_STACK
void *scs_base;
void *scs_sp;
diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h
new file mode 100644
index 000000000000..4fa201b4fc4e
--- /dev/null
+++ b/arch/riscv/include/asm/usercfi.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (C) 2024 Rivos, Inc.
+ * Deepak Gupta <debug@rivosinc.com>
+ */
+#ifndef _ASM_RISCV_USERCFI_H
+#define _ASM_RISCV_USERCFI_H
+
+#ifndef __ASSEMBLY__
+#include <linux/types.h>
+
+#ifdef CONFIG_RISCV_USER_CFI
+struct cfi_status {
+ unsigned long ubcfi_en : 1; /* Enable for backward cfi. */
+ unsigned long rsvd : ((sizeof(unsigned long)*8) - 1);
+ unsigned long user_shdw_stk; /* Current user shadow stack pointer */
+ unsigned long shdw_stk_base; /* Base address of shadow stack */
+ unsigned long shdw_stk_size; /* size of shadow stack */
+};
+
+#endif /* CONFIG_RISCV_USER_CFI */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ASM_RISCV_USERCFI_H */
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index a03129f40c46..5c5ea015c776 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -44,6 +44,10 @@ void asm_offsets(void)
#endif
OFFSET(TASK_TI_CPU_NUM, task_struct, thread_info.cpu);
+#ifdef CONFIG_RISCV_USER_CFI
+ OFFSET(TASK_TI_CFI_STATUS, task_struct, thread_info.user_cfi_state);
+ OFFSET(TASK_TI_USER_SSP, task_struct, thread_info.user_cfi_state.user_shdw_stk);
+#endif
OFFSET(TASK_THREAD_F0, task_struct, thread.fstate.f[0]);
OFFSET(TASK_THREAD_F1, task_struct, thread.fstate.f[1]);
OFFSET(TASK_THREAD_F2, task_struct, thread.fstate.f[2]);
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 9d1a305d5508..7245a0ea25c1 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -60,6 +60,20 @@ SYM_CODE_START(handle_exception)
REG_L s0, TASK_TI_USER_SP(tp)
csrrc s1, CSR_STATUS, t0
+ /*
+ * If previous mode was U, capture shadow stack pointer and save it away
+ * Zero CSR_SSP at the same time for sanitization.
+ */
+ ALTERNATIVE("nop; nop; nop; nop",
+ __stringify( \
+ andi s2, s1, SR_SPP; \
+ bnez s2, skip_ssp_save; \
+ csrrw s2, CSR_SSP, x0; \
+ REG_S s2, TASK_TI_USER_SSP(tp); \
+ skip_ssp_save:),
+ 0,
+ RISCV_ISA_EXT_ZICFISS,
+ CONFIG_RISCV_USER_CFI)
csrr s2, CSR_EPC
csrr s3, CSR_TVAL
csrr s4, CSR_CAUSE
@@ -141,6 +155,18 @@ SYM_CODE_START_NOALIGN(ret_from_exception)
* structures again.
*/
csrw CSR_SCRATCH, tp
+
+ /*
+ * Going back to U mode, restore shadow stack pointer
+ */
+ ALTERNATIVE("nop; nop",
+ __stringify( \
+ REG_L s3, TASK_TI_USER_SSP(tp); \
+ csrw CSR_SSP, s3),
+ 0,
+ RISCV_ISA_EXT_ZICFISS,
+ CONFIG_RISCV_USER_CFI)
+
1:
#ifdef CONFIG_RISCV_ISA_V_PREEMPTIVE
move a0, sp
--
2.43.2
* [PATCH v3 08/29] mm: Define VM_SHADOW_STACK for RISC-V
From: Deepak Gupta @ 2024-04-03 23:34 UTC
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
VM_SHADOW_STACK is defined by x86 as a vm flag to mark a shadow stack vma.
x86 uses the VM_HIGH_ARCH_5 bit, which limits shadow stack vmas to 64-bit
only. arm64 follows the same path (see links).
To keep things simple, RISC-V does the same.
This patch adds `ss` for shadow stack in process maps.
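Why this is 64-bit only can be shown with a little arithmetic. Assuming
`VM_HIGH_ARCH_BIT_5` is 37, as in include/linux/mm.h, the flag cannot be
represented in a 32-bit `vm_flags`; the `[ilog2(VM_SHADOW_STACK)]` index used
for the `ss` mnemonic is that same bit number. `flag_bit()` and
`fits_in_32bit_flags()` are hypothetical userspace helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed from include/linux/mm.h. */
#define VM_HIGH_ARCH_BIT_5	37
#define VM_HIGH_ARCH_5		(UINT64_C(1) << VM_HIGH_ARCH_BIT_5)
#define VM_SHADOW_STACK		VM_HIGH_ARCH_5

/* Userspace stand-in for ilog2() on a single-bit flag. */
static int flag_bit(uint64_t flag)
{
	int bit = 0;

	while (flag > 1) {
		flag >>= 1;
		bit++;
	}
	return bit;
}

/* Would this flag survive in a 32-bit vm_flags word? */
static bool fits_in_32bit_flags(uint64_t flag)
{
	return flag <= UINT32_MAX;
}
```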
Links:
https://lore.kernel.org/lkml/20231009-arm64-gcs-v6-12-78e55deaa4dd@kernel.org/#r
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
fs/proc/task_mmu.c | 3 +++
include/linux/mm.h | 11 ++++++++++-
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3f78ebbb795f..d9d63eb74f0d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -702,6 +702,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
#ifdef CONFIG_X86_USER_SHADOW_STACK
[ilog2(VM_SHADOW_STACK)] = "ss",
+#endif
+#ifdef CONFIG_RISCV_USER_CFI
+ [ilog2(VM_SHADOW_STACK)] = "ss",
#endif
};
size_t i;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f5a97dec5169..64109f6c70f5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -352,7 +352,16 @@ extern unsigned int kobjsize(const void *objp);
* for more details on the guard size.
*/
# define VM_SHADOW_STACK VM_HIGH_ARCH_5
-#else
+#endif
+
+#ifdef CONFIG_RISCV_USER_CFI
+/*
+ * RISC-V is going along with using VM_HIGH_ARCH_5 bit position for shadow stack
+ */
+#define VM_SHADOW_STACK VM_HIGH_ARCH_5
+#endif
+
+#ifndef VM_SHADOW_STACK
# define VM_SHADOW_STACK VM_NONE
#endif
--
2.43.2
* [PATCH v3 09/29] mm: abstract shadow stack vma behind `vma_is_shadow_stack`
From: Deepak Gupta @ 2024-04-03 23:34 UTC
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard, Mike Rapoport
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
VM_SHADOW_STACK (an alias for VM_HIGH_ARCH_5) encodes a shadow stack VMA.
This patch changes checks of the VM_SHADOW_STACK flag in generic code to
call a function, `vma_is_shadow_stack`, which returns true if it's a shadow
stack vma; the default stub (when support doesn't exist) returns false.
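The stub-versus-real selection follows the usual compile-time pattern. The
sketch below reproduces it in userspace; `HAVE_SHADOW_STACK` is a
hypothetical macro standing in for the architecture having defined
VM_SHADOW_STACK, and `vm_flags_t` is redefined locally as a stand-in.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned long long vm_flags_t;	/* stand-in for the kernel type */

/* Hypothetical switch: leave undefined to model the stub case. */
#define HAVE_SHADOW_STACK 1

#ifdef HAVE_SHADOW_STACK
#define VM_SHADOW_STACK (1ULL << 37)	/* VM_HIGH_ARCH_5 */
static inline bool vma_is_shadow_stack(vm_flags_t vm_flags)
{
	return (vm_flags & VM_SHADOW_STACK) != 0;
}
#else
#define VM_SHADOW_STACK 0ULL		/* VM_NONE */
static inline bool vma_is_shadow_stack(vm_flags_t vm_flags)
{
	(void)vm_flags;
	return false;
}
#endif
```

Generic callers then need no `#ifdef` of their own: with the stub, the call
folds to `false` and the compiler can drop the dependent branch.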
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
Suggested-by: Mike Rapoport <rppt@kernel.org>
---
include/linux/mm.h | 13 ++++++++++++-
mm/gup.c | 5 +++--
mm/internal.h | 2 +-
3 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 64109f6c70f5..9952937be659 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -363,8 +363,19 @@ extern unsigned int kobjsize(const void *objp);
#ifndef VM_SHADOW_STACK
# define VM_SHADOW_STACK VM_NONE
+
+static inline bool vma_is_shadow_stack(vm_flags_t vm_flags)
+{
+ return false;
+}
+#else
+static inline bool vma_is_shadow_stack(vm_flags_t vm_flags)
+{
+ return (vm_flags & VM_SHADOW_STACK);
+}
#endif
+
#if defined(CONFIG_X86)
# define VM_PAT VM_ARCH_1 /* PAT reserves whole VMA at once (x86) */
#elif defined(CONFIG_PPC)
@@ -3473,7 +3484,7 @@ static inline unsigned long stack_guard_start_gap(struct vm_area_struct *vma)
return stack_guard_gap;
/* See reasoning around the VM_SHADOW_STACK definition */
- if (vma->vm_flags & VM_SHADOW_STACK)
+ if (vma_is_shadow_stack(vma->vm_flags))
return PAGE_SIZE;
return 0;
diff --git a/mm/gup.c b/mm/gup.c
index df83182ec72d..a7a02eb0a6b3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1053,7 +1053,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
!writable_file_mapping_allowed(vma, gup_flags))
return -EFAULT;
- if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
+ if (!(vm_flags & VM_WRITE) || vma_is_shadow_stack(vm_flags)) {
if (!(gup_flags & FOLL_FORCE))
return -EFAULT;
/* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
@@ -1071,7 +1071,8 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
if (!is_cow_mapping(vm_flags))
return -EFAULT;
}
- } else if (!(vm_flags & VM_READ)) {
+ } else if (!(vm_flags & VM_READ) && !vma_is_shadow_stack(vm_flags)) {
+ /* reads are allowed if it's a shadow stack vma */
if (!(gup_flags & FOLL_FORCE))
return -EFAULT;
/*
diff --git a/mm/internal.h b/mm/internal.h
index f309a010d50f..5035b5a58df0 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -572,7 +572,7 @@ static inline bool is_exec_mapping(vm_flags_t flags)
*/
static inline bool is_stack_mapping(vm_flags_t flags)
{
- return ((flags & VM_STACK) == VM_STACK) || (flags & VM_SHADOW_STACK);
+ return ((flags & VM_STACK) == VM_STACK) || vma_is_shadow_stack(flags);
}
/*
--
2.43.2
^ permalink raw reply related
* [PATCH v3 10/29] riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE
From: Deepak Gupta @ 2024-04-03 23:34 UTC (permalink / raw)
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
`arch_calc_vm_prot_bits` is implemented on risc-v to return VM_READ |
VM_WRITE if PROT_WRITE is specified. Similarly, `riscv_sys_mmap` is
updated to convert all incoming PROT_WRITE to (PROT_WRITE | PROT_READ).
This is to make sure that any existing apps using PROT_WRITE still work.
Earlier, `protection_map[VM_WRITE]` used to pick read-write PTE encodings.
Now `protection_map[VM_WRITE]` will always pick PAGE_SHADOWSTACK PTE
encodings for shadow stack. The above changes ensure that existing apps
continue to work, because underneath the kernel will be picking
`protection_map[VM_WRITE|VM_READ]` PTE encodings.
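The effect on PTE selection can be sketched in userspace as below. This assumes illustrative PROT_*/VM_* values (the PROT_* values match common Linux values), covers only private read/write combinations (exec and shared entries are omitted), and stores encoding names as strings instead of real `pgprot_t` values. The generic translation mirrors the shape of `calc_vm_prot_bits()`, which ORs in the arch hook.

```c
#include <assert.h>
#include <string.h>

#ifndef PROT_READ
#define PROT_READ  0x1
#endif
#ifndef PROT_WRITE
#define PROT_WRITE 0x2
#endif
#ifndef PROT_EXEC
#define PROT_EXEC  0x4
#endif

/* Illustrative VM_* bits. */
#define VM_READ  0x1UL
#define VM_WRITE 0x2UL
#define VM_EXEC  0x4UL

/* Generic translation: each PROT_* bit maps to the matching VM_* bit. */
static unsigned long generic_vm_prot_bits(unsigned long prot)
{
	return ((prot & PROT_READ) ? VM_READ : 0) |
	       ((prot & PROT_WRITE) ? VM_WRITE : 0) |
	       ((prot & PROT_EXEC) ? VM_EXEC : 0);
}

/* riscv hook from this patch: PROT_WRITE always implies VM_READ | VM_WRITE. */
static unsigned long arch_vm_prot_bits(unsigned long prot)
{
	return (prot & PROT_WRITE) ? (VM_READ | VM_WRITE) : 0;
}

/* Names of the encodings for private read/write combinations after this patch. */
static const char *const protection_map[4] = {
	[0]                  = "PAGE_NONE",
	[VM_READ]            = "PAGE_READ",
	[VM_WRITE]           = "PAGE_SHADOWSTACK",
	[VM_WRITE | VM_READ] = "PAGE_COPY",
};

static const char *pte_encoding_for_prot(unsigned long prot)
{
	unsigned long vm = generic_vm_prot_bits(prot) | arch_vm_prot_bits(prot);

	return protection_map[vm & (VM_READ | VM_WRITE)]; /* exec bit ignored in this sketch */
}
```

A plain `PROT_WRITE` request therefore still lands on the `PAGE_COPY` slot, and the `[VM_WRITE]` slot is never reached through this path.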
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
arch/riscv/include/asm/mman.h | 24 ++++++++++++++++++++++++
arch/riscv/include/asm/pgtable.h | 1 +
arch/riscv/kernel/sys_riscv.c | 11 +++++++++++
arch/riscv/mm/init.c | 2 +-
mm/mmap.c | 1 +
5 files changed, 38 insertions(+), 1 deletion(-)
create mode 100644 arch/riscv/include/asm/mman.h
diff --git a/arch/riscv/include/asm/mman.h b/arch/riscv/include/asm/mman.h
new file mode 100644
index 000000000000..ef9fedf32546
--- /dev/null
+++ b/arch/riscv/include/asm/mman.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_MMAN_H__
+#define __ASM_MMAN_H__
+
+#include <linux/compiler.h>
+#include <linux/types.h>
+#include <uapi/asm/mman.h>
+
+static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot,
+ unsigned long pkey __always_unused)
+{
+ unsigned long ret = 0;
+
+ /*
+ * If PROT_WRITE was specified, force it to VM_READ | VM_WRITE.
+ * Only VM_WRITE means shadow stack.
+ */
+ if (prot & PROT_WRITE)
+ ret = (VM_READ | VM_WRITE);
+ return ret;
+}
+#define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey)
+
+#endif /* ! __ASM_MMAN_H__ */
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 6066822e7396..4d5983bc6766 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -184,6 +184,7 @@ extern struct pt_alloc_ops pt_ops __initdata;
#define PAGE_READ_EXEC __pgprot(_PAGE_BASE | _PAGE_READ | _PAGE_EXEC)
#define PAGE_WRITE_EXEC __pgprot(_PAGE_BASE | _PAGE_READ | \
_PAGE_EXEC | _PAGE_WRITE)
+#define PAGE_SHADOWSTACK __pgprot(_PAGE_BASE | _PAGE_WRITE)
#define PAGE_COPY PAGE_READ
#define PAGE_COPY_EXEC PAGE_READ_EXEC
diff --git a/arch/riscv/kernel/sys_riscv.c b/arch/riscv/kernel/sys_riscv.c
index f1c1416a9f1e..846c36b1b3d5 100644
--- a/arch/riscv/kernel/sys_riscv.c
+++ b/arch/riscv/kernel/sys_riscv.c
@@ -8,6 +8,8 @@
#include <linux/syscalls.h>
#include <asm/cacheflush.h>
#include <asm-generic/mman-common.h>
+#include <vdso/vsyscall.h>
+#include <asm/mman.h>
static long riscv_sys_mmap(unsigned long addr, unsigned long len,
unsigned long prot, unsigned long flags,
@@ -17,6 +19,15 @@ static long riscv_sys_mmap(unsigned long addr, unsigned long len,
if (unlikely(offset & (~PAGE_MASK >> page_shift_offset)))
return -EINVAL;
+ /*
+ * If only PROT_WRITE is specified, extend it to PROT_READ.
+ * protection_map[VM_WRITE] now selects shadow stack encodings, so
+ * specifying PROT_WRITE should actually select protection_map[VM_WRITE | VM_READ].
+ * Users who want to create a shadow stack should use the `map_shadow_stack` syscall.
+ */
+ if (unlikely((prot & PROT_WRITE) && !(prot & PROT_READ)))
+ prot |= PROT_READ;
+
return ksys_mmap_pgoff(addr, len, prot, flags, fd,
offset >> (PAGE_SHIFT - page_shift_offset));
}
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index fa34cf55037b..98e5ece4052a 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -299,7 +299,7 @@ pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
static const pgprot_t protection_map[16] = {
[VM_NONE] = PAGE_NONE,
[VM_READ] = PAGE_READ,
- [VM_WRITE] = PAGE_COPY,
+ [VM_WRITE] = PAGE_SHADOWSTACK,
[VM_WRITE | VM_READ] = PAGE_COPY,
[VM_EXEC] = PAGE_EXEC,
[VM_EXEC | VM_READ] = PAGE_READ_EXEC,
diff --git a/mm/mmap.c b/mm/mmap.c
index d89770eaab6b..57a974f49b00 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -47,6 +47,7 @@
#include <linux/oom.h>
#include <linux/sched/mm.h>
#include <linux/ksm.h>
+#include <linux/processor.h>
#include <linux/uaccess.h>
#include <asm/cacheflush.h>
--
2.43.2
^ permalink raw reply related
* [PATCH v3 11/29] riscv mm: manufacture shadow stack pte
From: Deepak Gupta @ 2024-04-03 23:34 UTC (permalink / raw)
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
This patch implements creating a shadow stack PTE (on riscv). Creating a
shadow stack PTE on riscv means clearing RWX and then setting W=1.
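The bit manipulation can be checked with a small userspace sketch. It assumes the RISC-V Sv39/Sv48 PTE bit positions (V=0, R=1, W=2, X=3, U=4, A=6, D=7); the `PTE_*` names are hypothetical stand-ins for the kernel's `_PAGE_*` macros, and `PTE_LEAF` mirrors `_PAGE_LEAF` (R | W | X).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the RISC-V PTE bits (positions from the privileged spec). */
#define PTE_V (1ULL << 0)
#define PTE_R (1ULL << 1)
#define PTE_W (1ULL << 2)
#define PTE_X (1ULL << 3)
#define PTE_U (1ULL << 4)
#define PTE_A (1ULL << 6)
#define PTE_D (1ULL << 7)
#define PTE_LEAF (PTE_R | PTE_W | PTE_X)

/* Shadow stack encoding: clear R, W and X, then set only W (XWR = 010). */
static uint64_t mkwrite_shstk(uint64_t pte)
{
	return (pte & ~PTE_LEAF) | PTE_W;
}

/* Extract the XWR field as a 3-bit value (X is the high bit, R the low bit). */
static unsigned xwr(uint64_t pte)
{
	return (unsigned)((pte >> 1) & 0x7);
}
```

All non-permission bits (V, U, A, D and the PFN) pass through unchanged; only the XWR field is rewritten.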
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
arch/riscv/include/asm/pgtable.h | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 4d5983bc6766..6362407f1e83 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -408,6 +408,12 @@ static inline pte_t pte_mkwrite_novma(pte_t pte)
return __pte(pte_val(pte) | _PAGE_WRITE);
}
+static inline pte_t pte_mkwrite_shstk(pte_t pte)
+{
+ /* shadow stack on risc-v is XWR = 010. Clear R, W and X, and only set _PAGE_WRITE */
+ return __pte((pte_val(pte) & ~(_PAGE_LEAF)) | _PAGE_WRITE);
+}
+
/* static inline pte_t pte_mkexec(pte_t pte) */
static inline pte_t pte_mkdirty(pte_t pte)
@@ -693,6 +699,12 @@ static inline pmd_t pmd_mkwrite_novma(pmd_t pmd)
return pte_pmd(pte_mkwrite_novma(pmd_pte(pmd)));
}
+static inline pmd_t pmd_mkwrite_shstk(pmd_t pmd)
+{
+ /* shadow stack on risc-v is XWR = 010. Clear R, W and X, and only set _PAGE_WRITE */
+ return __pmd((pmd_val(pmd) & ~(_PAGE_LEAF)) | _PAGE_WRITE);
+}
+
static inline pmd_t pmd_wrprotect(pmd_t pmd)
{
return pte_pmd(pte_wrprotect(pmd_pte(pmd)));
--
2.43.2
^ permalink raw reply related
* [PATCH v3 12/29] riscv mmu: teach pte_mkwrite to manufacture shadow stack PTEs
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
pte_mkwrite creates PTEs with write encodings for the underlying arch.
The underlying arch can have two types of writable mappings: one that can
be written using regular store instructions, and another that can only be
written using specialized store instructions (like shadow stack stores).
pte_mkwrite can select the write PTE encoding based on the VMA's flags
(i.e. VM_SHADOW_STACK).
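A userspace sketch of the VMA-aware dispatch, under the same assumptions as before: RISC-V PTE bit positions, an illustrative VM_SHADOW_STACK value (bit 37), and a hypothetical `struct vma_sketch` standing in for `vm_area_struct` (only `vm_flags` is modelled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_V (1ULL << 0)
#define PTE_R (1ULL << 1)
#define PTE_W (1ULL << 2)
#define PTE_X (1ULL << 3)
#define PTE_LEAF (PTE_R | PTE_W | PTE_X)

#define VM_WRITE        0x2ULL
#define VM_SHADOW_STACK (1ULL << 37) /* illustrative: VM_HIGH_ARCH_5 on 64-bit */

struct vma_sketch { uint64_t vm_flags; }; /* hypothetical stand-in for vm_area_struct */

static bool vma_is_shadow_stack(uint64_t vm_flags)
{
	return vm_flags & VM_SHADOW_STACK;
}

/* Regular writable PTE: just add W. */
static uint64_t pte_mkwrite_novma(uint64_t pte)
{
	return pte | PTE_W;
}

/* Shadow stack PTE: XWR = 010. */
static uint64_t pte_mkwrite_shstk(uint64_t pte)
{
	return (pte & ~PTE_LEAF) | PTE_W;
}

/* The VMA's flags decide which write encoding is produced. */
static uint64_t pte_mkwrite(uint64_t pte, const struct vma_sketch *vma)
{
	if (vma_is_shadow_stack(vma->vm_flags))
		return pte_mkwrite_shstk(pte);
	return pte_mkwrite_novma(pte);
}
```

The same read-only PTE becomes R|W in an ordinary VMA but W-only in a shadow stack VMA.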
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
arch/riscv/include/asm/pgtable.h | 7 +++++++
arch/riscv/mm/pgtable.c | 21 +++++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 6362407f1e83..9b837239d3e8 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -403,6 +403,10 @@ static inline pte_t pte_wrprotect(pte_t pte)
/* static inline pte_t pte_mkread(pte_t pte) */
+struct vm_area_struct;
+pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma);
+#define pte_mkwrite pte_mkwrite
+
static inline pte_t pte_mkwrite_novma(pte_t pte)
{
return __pte(pte_val(pte) | _PAGE_WRITE);
@@ -694,6 +698,9 @@ static inline pmd_t pmd_mkyoung(pmd_t pmd)
return pte_pmd(pte_mkyoung(pmd_pte(pmd)));
}
+pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
+#define pmd_mkwrite pmd_mkwrite
+
static inline pmd_t pmd_mkwrite_novma(pmd_t pmd)
{
return pte_pmd(pte_mkwrite_novma(pmd_pte(pmd)));
diff --git a/arch/riscv/mm/pgtable.c b/arch/riscv/mm/pgtable.c
index ef887efcb679..c84ae2e0424d 100644
--- a/arch/riscv/mm/pgtable.c
+++ b/arch/riscv/mm/pgtable.c
@@ -142,3 +142,24 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma,
return pmd;
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+pte_t pte_mkwrite(pte_t pte, struct vm_area_struct *vma)
+{
+ if (vma_is_shadow_stack(vma->vm_flags))
+ return pte_mkwrite_shstk(pte);
+
+ pte = pte_mkwrite_novma(pte);
+
+ return pte;
+}
+
+pmd_t pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
+{
+ if (vma_is_shadow_stack(vma->vm_flags))
+ return pmd_mkwrite_shstk(pmd);
+
+ pmd = pmd_mkwrite_novma(pmd);
+
+ return pmd;
+}
+
--
2.43.2
^ permalink raw reply related
* [PATCH v3 13/29] riscv mmu: write protect and shadow stack
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
`fork` implements copy-on-write (COW) by making pages read-only in both
the child and the parent.
ptep_set_wrprotect and pte_wrprotect clear _PAGE_WRITE in the PTE. The
assumption is that the page is readable and that COW happens on a fault.
For shadow stack pages (XWR = 010), clearing the W bit makes them XWR = 000.
This results in a wrong PTE setting which says no permissions, but with V=1
and the PFN field pointing to the final page. Instead, the desired behavior
is to turn it into a readable page, take an access (load/store) fault on
sspush/sspop (shadow stack), and then perform COW on such pages. This way
regular reads are still allowed and do not lead to COW, maintaining the
current COW behavior of writable, non-shadow-stack memory.
On the other hand, this doesn't interfere with existing COW for read-write
memory. The assumption is always that _PAGE_READ must have been set, and
thus setting _PAGE_READ is harmless.
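The difference between the old and new write-protect transforms can be sketched in userspace as below (RISC-V PTE bit positions assumed, hypothetical `PTE_*` names). The sketch also shows one way to apply the new transform as a single atomic read-modify-write, as the previous `atomic_long_and()` was, using a C11 compare-and-swap loop. That CAS loop is an illustrative alternative written with C11 atomics, not the code in this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define PTE_V (1ULL << 0)
#define PTE_R (1ULL << 1)
#define PTE_W (1ULL << 2)
#define PTE_X (1ULL << 3)

/* Old behavior: only clear W. A shadow stack PTE (XWR = 010) becomes XWR = 000. */
static uint64_t wrprotect_old(uint64_t pte)
{
	return pte & ~PTE_W;
}

/* New behavior: clear W and set R, so XWR = 010 becomes XWR = 001 (read-only). */
static uint64_t wrprotect_new(uint64_t pte)
{
	return (pte & ~PTE_W) | PTE_R;
}

/* Apply wrprotect_new() atomically so a concurrent update to other bits is not lost. */
static void set_wrprotect_atomic(_Atomic uint64_t *ptep)
{
	uint64_t old = atomic_load(ptep);

	/* On failure, atomic_compare_exchange_weak() reloads 'old' with the current value. */
	while (!atomic_compare_exchange_weak(ptep, &old, wrprotect_new(old)))
		;
}
```

For an ordinary read-write PTE both transforms give the same result, which is why setting R is harmless there.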
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
arch/riscv/include/asm/pgtable.h | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 9b837239d3e8..7a1c2a98d272 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -398,7 +398,7 @@ static inline int pte_special(pte_t pte)
static inline pte_t pte_wrprotect(pte_t pte)
{
- return __pte(pte_val(pte) & ~(_PAGE_WRITE));
+ return __pte((pte_val(pte) & ~(_PAGE_WRITE)) | (_PAGE_READ));
}
/* static inline pte_t pte_mkread(pte_t pte) */
@@ -581,7 +581,15 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
static inline void ptep_set_wrprotect(struct mm_struct *mm,
unsigned long address, pte_t *ptep)
{
- atomic_long_and(~(unsigned long)_PAGE_WRITE, (atomic_long_t *)ptep);
+ volatile pte_t read_pte = *ptep;
+ /*
+ * ptep_set_wrprotect can be called for shadow stack ranges too.
+ * Shadow stack memory is XWR = 010, so clearing _PAGE_WRITE would lead to
+ * encoding 000b, which is a wrong encoding with V = 1. This would lead to a page fault,
+ * but we don't want this wrong configuration to be set in page tables.
+ */
+ atomic_long_set((atomic_long_t *)ptep,
+ ((pte_val(read_pte) & ~(unsigned long)_PAGE_WRITE) | _PAGE_READ));
}
#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
--
2.43.2
^ permalink raw reply related
* [PATCH v3 14/29] riscv/mm: Implement map_shadow_stack() syscall
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
As discussed extensively in the changelog for the addition of this
syscall on x86 ("x86/shstk: Introduce map_shadow_stack syscall"), the
existing mmap() and madvise() syscalls do not map entirely well onto the
security requirements for shadow stack memory, since they lead to windows
where memory is allocated but not yet protected, or to stacks which are not
properly and safely initialised. Instead, a new syscall map_shadow_stack()
has been defined which allocates and initialises a shadow stack page.
This patch implements this syscall for riscv. riscv doesn't require a token
to be set up by the kernel because user mode can do that by itself. However,
to provide compatibility and portability with other architectures, user mode
can specify the token set flag (SHADOW_STACK_SET_TOKEN).
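The argument checks and restore-token placement in the implementation below can be modelled as pure functions. This is a userspace sketch assuming 4 KiB pages and 8-byte shadow stack entries (RV64); `check_map_shadow_stack_args()` and `place_token()` are hypothetical names, and no memory is actually mapped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096ULL /* assumption: 4 KiB pages */
#define SHSTK_ENTRY_SIZE 8ULL    /* sizeof(void *) on RV64 */
#define SHADOW_STACK_SET_TOKEN (1ULL << 0)

static uint64_t page_align(uint64_t x)
{
	return (x + PAGE_SIZE_SKETCH - 1) & ~(PAGE_SIZE_SKETCH - 1);
}

/* Same order of checks as the syscall; returns 0 or a negative errno. */
static int check_map_shadow_stack_args(uint64_t addr, uint64_t size, unsigned int flags)
{
	bool set_tok = flags & SHADOW_STACK_SET_TOKEN;

	if (flags & ~SHADOW_STACK_SET_TOKEN)
		return -EINVAL;   /* unknown flag bits */
	if (set_tok && size < SHSTK_ENTRY_SIZE)
		return -ENOSPC;   /* no room for the token */
	if (addr && (addr % PAGE_SIZE_SKETCH))
		return -EINVAL;   /* requested address must be page aligned */
	if (page_align(size) < size)
		return -EOVERFLOW; /* rounding up wrapped around */
	return 0;
}

/*
 * Token placement as in create_rstor_token(): with ssp = base + size, the
 * token is stored at ssp - SHSTK_ENTRY_SIZE and its value is ssp itself.
 */
static int place_token(uint64_t base, uint64_t size, uint64_t *tok_addr, uint64_t *tok_val)
{
	uint64_t ssp = base + size;

	if (ssp % SHSTK_ENTRY_SIZE)
		return -EINVAL;
	*tok_addr = ssp - SHSTK_ENTRY_SIZE;
	*tok_val = ssp;
	return 0;
}
```

Note that the syscall passes the caller's unaligned `size` as the token offset, so with SHADOW_STACK_SET_TOKEN a size that is not a multiple of 8 fails the alignment check in create_rstor_token() after the mapping has been created, and the mapping is then unmapped.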
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/usercfi.c | 149 ++++++++++++++++++++++++++++++++
include/uapi/asm-generic/mman.h | 1 +
3 files changed, 152 insertions(+)
create mode 100644 arch/riscv/kernel/usercfi.c
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 604d6bf7e476..3bec82f4e94c 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -107,3 +107,5 @@ obj-$(CONFIG_COMPAT) += compat_vdso/
obj-$(CONFIG_64BIT) += pi/
obj-$(CONFIG_ACPI) += acpi.o
+
+obj-$(CONFIG_RISCV_USER_CFI) += usercfi.o
diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
new file mode 100644
index 000000000000..c4ed0d4e33d6
--- /dev/null
+++ b/arch/riscv/kernel/usercfi.c
@@ -0,0 +1,149 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2024 Rivos, Inc.
+ * Deepak Gupta <debug@rivosinc.com>
+ */
+
+#include <linux/sched.h>
+#include <linux/bitops.h>
+#include <linux/types.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/uaccess.h>
+#include <linux/sizes.h>
+#include <linux/user.h>
+#include <linux/syscalls.h>
+#include <linux/prctl.h>
+#include <asm/csr.h>
+#include <asm/usercfi.h>
+
+#define SHSTK_ENTRY_SIZE sizeof(void *)
+
+/*
+ * Writes on shadow stack can either be `sspush` or `ssamoswap`. `sspush` can happen
+ * implicitly on current shadow stack pointed to by CSR_SSP. `ssamoswap` takes pointer to
+ * shadow stack. To keep it simple, we plan to use `ssamoswap` to perform writes on shadow
+ * stack.
+ */
+static noinline unsigned long amo_user_shstk(unsigned long *addr, unsigned long val)
+{
+ /*
+ * Since shadow stack is supported only in 64-bit configurations,
+ * ssamoswap.d is used below. CONFIG_RISCV_USER_CFI depends on 64BIT,
+ * and this file is only compiled when CONFIG_RISCV_USER_CFI is set.
+ * If ssamoswap faults, return -1. A value of -1 is never expected on
+ * the shadow stack, which holds only return addresses and zeros.
+ */
+ unsigned long swap = -1;
+
+ __enable_user_access();
+ asm goto(
+ ".option push\n"
+ ".option arch, +zicfiss\n"
+ "1: ssamoswap.d %[swap], %[val], %[addr]\n"
+ _ASM_EXTABLE(1b, %l[fault])
+ RISCV_ACQUIRE_BARRIER
+ ".option pop\n"
+ : [swap] "=r" (swap), [addr] "+A" (*addr)
+ : [val] "r" (val)
+ : "memory"
+ : fault
+ );
+ __disable_user_access();
+ return swap;
+fault:
+ __disable_user_access();
+ return -1;
+}
+
+/*
+ * Create a restore token on the shadow stack. A token is always XLEN wide
+ * and aligned to XLEN.
+ */
+static int create_rstor_token(unsigned long ssp, unsigned long *token_addr)
+{
+ unsigned long addr;
+
+ /* Token must be aligned */
+ if (!IS_ALIGNED(ssp, SHSTK_ENTRY_SIZE))
+ return -EINVAL;
+
+ /* On RISC-V we're constructing token to be function of address itself */
+ addr = ssp - SHSTK_ENTRY_SIZE;
+
+ if (amo_user_shstk((unsigned long __user *)addr, (unsigned long) ssp) == -1)
+ return -EFAULT;
+
+ if (token_addr)
+ *token_addr = addr;
+
+ return 0;
+}
+
+static unsigned long allocate_shadow_stack(unsigned long addr, unsigned long size,
+ unsigned long token_offset,
+ bool set_tok)
+{
+ int flags = MAP_ANONYMOUS | MAP_PRIVATE;
+ struct mm_struct *mm = current->mm;
+ unsigned long populate, tok_loc = 0;
+
+ if (addr)
+ flags |= MAP_FIXED_NOREPLACE;
+
+ mmap_write_lock(mm);
+ addr = do_mmap(NULL, addr, size, PROT_READ, flags,
+ VM_SHADOW_STACK | VM_WRITE, 0, &populate, NULL);
+ mmap_write_unlock(mm);
+
+ if (!set_tok || IS_ERR_VALUE(addr))
+ goto out;
+
+ if (create_rstor_token(addr + token_offset, &tok_loc)) {
+ vm_munmap(addr, size);
+ return -EINVAL;
+ }
+
+ addr = tok_loc;
+
+out:
+ return addr;
+}
+
+SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsigned int, flags)
+{
+ bool set_tok = flags & SHADOW_STACK_SET_TOKEN;
+ unsigned long aligned_size = 0;
+
+ if (!cpu_supports_shadow_stack())
+ return -EOPNOTSUPP;
+
+ /* Anything other than set token should result in invalid param */
+ if (flags & ~SHADOW_STACK_SET_TOKEN)
+ return -EINVAL;
+
+ /*
+ * Unlike other architectures, on RISC-V the SSP is held in CSR_SSP, which is an available
+ * CSR in all modes. CSR accesses use a 12-bit index encoded in the instruction itself.
+ * This gives register programming a static property: writes to a CSR can't be
+ * unintentional from the programmer's perspective. As long as the programmer properly
+ * guards the code that writes CSR_SSP, shadow stack pivoting is not possible. Since
+ * CSR_SSP is writable by user mode, user mode can itself set up a shadow stack token after
+ * allocation. However, to provide portability with other architectures (because
+ * `map_shadow_stack` is an arch-agnostic syscall), RISC-V follows the expectation of a token
+ * flag in flags and, if it is provided, sets up a token at the base.
+ */
+
+ /* If there isn't space for a token */
+ if (set_tok && size < SHSTK_ENTRY_SIZE)
+ return -ENOSPC;
+
+ if (addr && (addr % PAGE_SIZE))
+ return -EINVAL;
+
+ aligned_size = PAGE_ALIGN(size);
+ if (aligned_size < size)
+ return -EOVERFLOW;
+
+ return allocate_shadow_stack(addr, aligned_size, size, set_tok);
+}
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index 57e8195d0b53..0c0ac6214de6 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -19,4 +19,5 @@
#define MCL_FUTURE 2 /* lock all future mappings */
#define MCL_ONFAULT 4 /* lock all pages that are faulted in */
+#define SHADOW_STACK_SET_TOKEN (1ULL << 0) /* Set up a restore token in the shadow stack */
#endif /* __ASM_GENERIC_MMAN_H */
--
2.43.2
^ permalink raw reply related
* [PATCH v3 15/29] riscv/shstk: If needed allocate a new shadow stack on clone
From: Deepak Gupta @ 2024-04-03 23:35 UTC (permalink / raw)
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
Userspace specifies CLONE_VM to share the address space and spawn a new thread.
`clone` allows userspace to specify a new stack for the new thread. However,
there is no way to specify a new shadow stack base address without changing
the API. This patch allocates a new shadow stack whenever CLONE_VM is given.
In the case of CLONE_VFORK, the parent is suspended until the child exits or
calls execve(), and thus the child can use the parent's shadow stack. In the
case of !CLONE_VM, COW kicks in because the entire address space is copied
from parent to child.
`clone3` is extensible and could provide a mechanism for passing a shadow
stack as an input parameter. This is not settled yet and is being
extensively discussed on the mailing list. Once that is settled, this commit
will adapt to it.
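The clone-time decision and the default size rule can be modelled as pure functions. This userspace sketch follows the intent stated in the in-code comments of shstk_alloc_thread_stack() (skip tasks that do not have shadow stack enabled) and calc_shstk_size(); `enum shstk_action`, `shstk_clone_action()` and `shstk_size()` are hypothetical names, and 4 KiB pages are assumed. CLONE_VM and CLONE_VFORK use their Linux UAPI values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifndef CLONE_VM
#define CLONE_VM    0x00000100ULL
#endif
#ifndef CLONE_VFORK
#define CLONE_VFORK 0x00004000ULL
#endif

#define PAGE_SIZE_SKETCH 4096ULL /* assumption: 4 KiB pages */
#define SZ_4G (4ULL << 30)

enum shstk_action {
	SHSTK_NONE,     /* not supported or not enabled: nothing to do */
	SHSTK_SHARE,    /* CLONE_VFORK: child uses the parent's shadow stack */
	SHSTK_COPY,     /* !CLONE_VM: child gets a COW copy with the address space */
	SHSTK_ALLOCATE, /* CLONE_VM: a new shadow stack must be allocated */
};

static enum shstk_action shstk_clone_action(bool supported, bool enabled, uint64_t clone_flags)
{
	if (!supported || !enabled)
		return SHSTK_NONE;
	if (clone_flags & CLONE_VFORK) /* checked first: vfork also sets CLONE_VM */
		return SHSTK_SHARE;
	if (!(clone_flags & CLONE_VM))
		return SHSTK_COPY;
	return SHSTK_ALLOCATE;
}

/* Explicit size is page aligned; otherwise use min(RLIMIT_STACK, 4 GiB), page aligned. */
static uint64_t shstk_size(uint64_t requested, uint64_t stack_rlimit)
{
	uint64_t size = requested ? requested : (stack_rlimit < SZ_4G ? stack_rlimit : SZ_4G);

	return (size + PAGE_SIZE_SKETCH - 1) & ~(PAGE_SIZE_SKETCH - 1);
}
```

Checking CLONE_VFORK before CLONE_VM matters because vfork() passes both flags, and only the plain CLONE_VM case needs a new allocation.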
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
arch/riscv/include/asm/usercfi.h | 39 ++++++++++
arch/riscv/kernel/process.c | 12 ++-
arch/riscv/kernel/usercfi.c | 121 +++++++++++++++++++++++++++++++
3 files changed, 171 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h
index 4fa201b4fc4e..b47574a7a8c9 100644
--- a/arch/riscv/include/asm/usercfi.h
+++ b/arch/riscv/include/asm/usercfi.h
@@ -8,6 +8,9 @@
#ifndef __ASSEMBLY__
#include <linux/types.h>
+struct task_struct;
+struct kernel_clone_args;
+
#ifdef CONFIG_RISCV_USER_CFI
struct cfi_status {
unsigned long ubcfi_en : 1; /* Enable for backward cfi. */
@@ -17,6 +20,42 @@ struct cfi_status {
unsigned long shdw_stk_size; /* size of shadow stack */
};
+unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
+ const struct kernel_clone_args *args);
+void shstk_release(struct task_struct *tsk);
+void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size);
+void set_active_shstk(struct task_struct *task, unsigned long shstk_addr);
+bool is_shstk_enabled(struct task_struct *task);
+
+#else
+
+static inline unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
+ const struct kernel_clone_args *args)
+{
+ return 0;
+}
+
+static inline void shstk_release(struct task_struct *tsk)
+{
+
+}
+
+static inline void set_shstk_base(struct task_struct *task, unsigned long shstk_addr,
+ unsigned long size)
+{
+
+}
+
+static inline void set_active_shstk(struct task_struct *task, unsigned long shstk_addr)
+{
+
+}
+
+static inline bool is_shstk_enabled(struct task_struct *task)
+{
+ return false;
+}
+
#endif /* CONFIG_RISCV_USER_CFI */
#endif /* __ASSEMBLY__ */
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index ce577cdc2af3..ef48a25b0eff 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -26,6 +26,7 @@
#include <asm/cpuidle.h>
#include <asm/vector.h>
#include <asm/cpufeature.h>
+#include <asm/usercfi.h>
register unsigned long gp_in_global __asm__("gp");
@@ -202,7 +203,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src)
void exit_thread(struct task_struct *tsk)
{
-
+ if (IS_ENABLED(CONFIG_RISCV_USER_CFI))
+ shstk_release(tsk);
}
int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
@@ -210,6 +212,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
unsigned long clone_flags = args->flags;
unsigned long usp = args->stack;
unsigned long tls = args->tls;
+ unsigned long ssp = 0;
struct pt_regs *childregs = task_pt_regs(p);
memset(&p->thread.s, 0, sizeof(p->thread.s));
@@ -225,11 +228,18 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
p->thread.s[0] = (unsigned long)args->fn;
p->thread.s[1] = (unsigned long)args->fn_arg;
} else {
+ /* Allocate a new shadow stack if needed; with CLONE_VM we have to. */
+ ssp = shstk_alloc_thread_stack(p, args);
+ if (IS_ERR_VALUE(ssp))
+ return PTR_ERR((void *)ssp);
+
*childregs = *(current_pt_regs());
/* Turn off status.VS */
riscv_v_vstate_off(childregs);
if (usp) /* User fork */
childregs->sp = usp;
+ if (ssp) /* if needed, set new ssp */
+ set_active_shstk(p, ssp);
if (clone_flags & CLONE_SETTLS)
childregs->tp = tls;
childregs->a0 = 0; /* Return value of fork() */
diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
index c4ed0d4e33d6..11ef7ab925c9 100644
--- a/arch/riscv/kernel/usercfi.c
+++ b/arch/riscv/kernel/usercfi.c
@@ -19,6 +19,41 @@
#define SHSTK_ENTRY_SIZE sizeof(void *)
+bool is_shstk_enabled(struct task_struct *task)
+{
+ return task->thread_info.user_cfi_state.ubcfi_en ? true : false;
+}
+
+void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size)
+{
+ task->thread_info.user_cfi_state.shdw_stk_base = shstk_addr;
+ task->thread_info.user_cfi_state.shdw_stk_size = size;
+}
+
+unsigned long get_shstk_base(struct task_struct *task, unsigned long *size)
+{
+ if (size)
+ *size = task->thread_info.user_cfi_state.shdw_stk_size;
+ return task->thread_info.user_cfi_state.shdw_stk_base;
+}
+
+void set_active_shstk(struct task_struct *task, unsigned long shstk_addr)
+{
+ task->thread_info.user_cfi_state.user_shdw_stk = shstk_addr;
+}
+
+/*
+ * If size is 0, then to be compatible with the regular stack we want it to be as big as
+ * the regular stack. Otherwise, PAGE_ALIGN it and return it.
+ */
+static unsigned long calc_shstk_size(unsigned long size)
+{
+ if (size)
+ return PAGE_ALIGN(size);
+
+ return PAGE_ALIGN(min_t(unsigned long long, rlimit(RLIMIT_STACK), SZ_4G));
+}
+
/*
* Writes on shadow stack can either be `sspush` or `ssamoswap`. `sspush` can happen
* implicitly on current shadow stack pointed to by CSR_SSP. `ssamoswap` takes pointer to
@@ -147,3 +182,89 @@ SYSCALL_DEFINE3(map_shadow_stack, unsigned long, addr, unsigned long, size, unsi
return allocate_shadow_stack(addr, aligned_size, size, set_tok);
}
+
+/*
+ * This gets called during clone/clone3/fork, and is needed to allocate a shadow stack for
+ * cases where CLONE_VM is specified and thus a different stack is specified by the user. We
+ * thus need a separate shadow stack too. How a separate shadow stack is specified by the
+ * user is still being debated. Once that's settled, remove this part of the comment.
+ * This function simply returns 0 if shadow stacks are not supported or if separate shadow
+ * stack allocation is not needed (like in the case of !CLONE_VM).
+ */
+unsigned long shstk_alloc_thread_stack(struct task_struct *tsk,
+ const struct kernel_clone_args *args)
+{
+ unsigned long addr, size;
+
+ /* If shadow stack is not supported, return 0 */
+ if (!cpu_supports_shadow_stack())
+ return 0;
+
+ /*
+ * If shadow stack is not enabled on the new thread, skip any
+ * switch to a new shadow stack.
+ */
+ if (!is_shstk_enabled(tsk))
+ return 0;
+
+ /*
+ * For CLONE_VFORK the child will share the parent's shadow stack.
+ * Set base = 0 and size = 0; this is a special means to track this state
+ * so the freeing logic run for the child knows to leave it alone.
+ */
+ if (args->flags & CLONE_VFORK) {
+ set_shstk_base(tsk, 0, 0);
+ return 0;
+ }
+
+ /*
+ * For !CLONE_VM the child will use a copy of the parent's shadow
+ * stack.
+ */
+ if (!(args->flags & CLONE_VM))
+ return 0;
+
+ /*
+ * Reaching here means CLONE_VM was specified and thus a separate shadow
+ * stack is needed for the new cloned thread. Note: the allocation below happens
+ * using the current mm.
+ */
+ size = calc_shstk_size(args->stack_size);
+ addr = allocate_shadow_stack(0, size, 0, false);
+ if (IS_ERR_VALUE(addr))
+ return addr;
+
+ set_shstk_base(tsk, addr, size);
+
+ return addr + size;
+}
+
+void shstk_release(struct task_struct *tsk)
+{
+ unsigned long base = 0, size = 0;
+ /* If shadow stack is not supported or not enabled, nothing to release */
+ if (!cpu_supports_shadow_stack() ||
+ !is_shstk_enabled(tsk))
+ return;
+
+ /*
+ * When fork() with CLONE_VM fails, the child (tsk) already has a
+ * shadow stack allocated, and exit_thread() calls this function to
+ * free it. In this case the parent (current) and the child share
+ * the same mm struct. Proceed only when they are the same.
+ */
+ if (!tsk->mm || tsk->mm != current->mm)
+ return;
+
+ /*
+ * We know shadow stack is enabled, but if base is 0, then this
+ * task is not managing its own shadow stack (CLONE_VFORK), so
+ * skip freeing it.
+ */
+ base = get_shstk_base(tsk, &size);
+ if (!base)
+ return;
+
+ vm_munmap(base, size);
+ set_shstk_base(tsk, 0, 0);
+}
--
2.43.2
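The branches in shstk_alloc_thread_stack() reduce to a small decision table. The sketch below is a hypothetical, kernel-free model of that table, following the branches as their comments describe them (threads whose shadow stack is not enabled are skipped); the `MODEL_` macros, `enum shstk_action` and `model_shstk_clone_action()` are illustrative names that do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as CLONE_VM / CLONE_VFORK in <linux/sched.h>; the MODEL_
 * prefix avoids clashing with the real definitions. */
#define MODEL_CLONE_VM    0x00000100UL
#define MODEL_CLONE_VFORK 0x00004000UL

/* What the child ends up with, per the branches described above. */
enum shstk_action {
	SHSTK_NONE,     /* CPU lacks support or shadow stack disabled */
	SHSTK_SHARE,    /* CLONE_VFORK: borrows the parent's, base = size = 0 */
	SHSTK_INHERIT,  /* !CLONE_VM: the copied mm already holds a copy */
	SHSTK_ALLOCATE, /* CLONE_VM: a separate shadow stack is allocated */
};

/* Hypothetical model of the clone-time decision. */
static enum shstk_action model_shstk_clone_action(bool cpu_supported,
						  bool shstk_enabled,
						  unsigned long clone_flags)
{
	if (!cpu_supported || !shstk_enabled)
		return SHSTK_NONE;
	/* vfork() passes CLONE_VM | CLONE_VFORK, so test CLONE_VFORK first. */
	if (clone_flags & MODEL_CLONE_VFORK)
		return SHSTK_SHARE;
	if (!(clone_flags & MODEL_CLONE_VM))
		return SHSTK_INHERIT;
	return SHSTK_ALLOCATE;
}
```

For example, vfork() passes `CLONE_VM | CLONE_VFORK`, which the model maps to `SHSTK_SHARE` because the `CLONE_VFORK` test runs before the `CLONE_VM` test; only a plain `CLONE_VM` thread reaches the allocation.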
* [PATCH v3 16/29] prctl: arch-agnostic prctl for shadow stack
From: Deepak Gupta @ 2024-04-03 23:35 UTC
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
From: Mark Brown <broonie@kernel.org>
Three architectures (x86, aarch64, riscv) have announced support for
shadow stacks with fairly similar functionality. While x86 uses
arch_prctl() to control the functionality, neither arm64 nor riscv uses
that interface, so this patch adds arch-agnostic prctl() support to
get and set the status of shadow stacks and to lock the current
configuration against further changes, with support for turning
individual subfeatures on and off so applications can limit their
exposure to features that they do not need. The features are:
- PR_SHADOW_STACK_ENABLE: Tracking and enforcement of shadow stacks,
including allocation of a shadow stack if one is not already
allocated.
- PR_SHADOW_STACK_WRITE: Writes to specific addresses in the shadow
stack.
- PR_SHADOW_STACK_PUSH: Pushing additional values onto the shadow stack.
- PR_SHADOW_STACK_DISABLE: Allows the shadow stack to be disabled.
Note that once the configuration is locked, disabling must fail.
These features are expected to be inherited by new threads and cleared
on exec(). Unknown features should be rejected for enable but accepted
for locking (to allow for future-proofing).
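A minimal userspace sketch of the query and lock calls, assuming a kernel with this series applied. The `#ifndef` fallbacks use the numbers proposed in this patch; an installed `<linux/prctl.h>` may define different values if the interface was renumbered when merged. The helper names are hypothetical. Enabling is deliberately omitted: switching to a brand-new shadow stack inside a function that later returns can make that return fail the shadow stack check, which is one reason the RISC-V start_thread() comment in this series expects libc to set it up via prctl.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/prctl.h>

#ifndef PR_GET_SHADOW_STACK_STATUS
# define PR_GET_SHADOW_STACK_STATUS 71  /* value proposed in this patch */
#endif
#ifndef PR_LOCK_SHADOW_STACK_STATUS
# define PR_LOCK_SHADOW_STACK_STATUS 73 /* value proposed in this patch */
#endif
#ifndef PR_SHADOW_STACK_ENABLE
# define PR_SHADOW_STACK_ENABLE (1UL << 0)
#endif

/* Read the calling thread's shadow stack status into *status.
 * Returns 0 on success or -errno; -EINVAL means the kernel or the
 * architecture does not implement the generic interface. */
static int get_shadow_stack_status(unsigned long *status)
{
	/* arg2 is a user pointer; arg3..arg5 must be zero. Pass them as
	 * unsigned long so no stray upper register bits reach the kernel. */
	if (prctl(PR_GET_SHADOW_STACK_STATUS, status, 0UL, 0UL, 0UL) != 0)
		return -errno;
	return 0;
}

static bool shadow_stack_enabled(unsigned long status)
{
	return (status & PR_SHADOW_STACK_ENABLE) != 0;
}

/* Freeze the current configuration. Unknown bits are accepted for
 * locking, so ~0UL also covers bits that may be defined later. */
static int lock_shadow_stack_status(void)
{
	if (prctl(PR_LOCK_SHADOW_STACK_STATUS, ~0UL, 0UL, 0UL, 0UL) != 0)
		return -errno;
	return 0;
}
```

A program would typically call get_shadow_stack_status() early, and call lock_shadow_stack_status() only once shadow_stack_enabled() reports the state it wants to keep.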
This is based on a patch originally written by Deepak Gupta but later
modified by Mark Brown for arm's GCS patch series.
Signed-off-by: Mark Brown <broonie@kernel.org>
Co-developed-by: Deepak Gupta <debug@rivosinc.com>
---
include/linux/mm.h | 3 +++
include/uapi/linux/prctl.h | 22 ++++++++++++++++++++++
kernel/sys.c | 30 ++++++++++++++++++++++++++++++
3 files changed, 55 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9952937be659..1d08e1fd2f6a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4201,5 +4201,8 @@ static inline bool pfn_is_unaccepted_memory(unsigned long pfn)
return range_contains_unaccepted_memory(paddr, paddr + PAGE_SIZE);
}
+int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status);
+int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status);
+int arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status);
#endif /* _LINUX_MM_H */
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 370ed14b1ae0..3c66ed8f46d8 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -306,4 +306,26 @@ struct prctl_mm_map {
# define PR_RISCV_V_VSTATE_CTRL_NEXT_MASK 0xc
# define PR_RISCV_V_VSTATE_CTRL_MASK 0x1f
+/*
+ * Get the current shadow stack configuration for the current thread;
+ * this will be the value configured via PR_SET_SHADOW_STACK_STATUS.
+ */
+#define PR_GET_SHADOW_STACK_STATUS 71
+
+/*
+ * Set the current shadow stack configuration. Enabling the shadow
+ * stack will cause a shadow stack to be allocated for the thread.
+ */
+#define PR_SET_SHADOW_STACK_STATUS 72
+# define PR_SHADOW_STACK_ENABLE (1UL << 0)
+# define PR_SHADOW_STACK_WRITE (1UL << 1)
+# define PR_SHADOW_STACK_PUSH (1UL << 2)
+
+/*
+ * Prevent further changes to the specified shadow stack
+ * configuration. All bits may be locked via this call, including
+ * undefined bits.
+ */
+#define PR_LOCK_SHADOW_STACK_STATUS 73
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index f8e543f1e38a..242e9f147791 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2315,6 +2315,21 @@ int __weak arch_prctl_spec_ctrl_set(struct task_struct *t, unsigned long which,
return -EINVAL;
}
+int __weak arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status)
+{
+ return -EINVAL;
+}
+
+int __weak arch_set_shadow_stack_status(struct task_struct *t, unsigned long status)
+{
+ return -EINVAL;
+}
+
+int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long status)
+{
+ return -EINVAL;
+}
+
#define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
#ifdef CONFIG_ANON_VMA_NAME
@@ -2757,6 +2772,21 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
case PR_RISCV_V_GET_CONTROL:
error = RISCV_V_GET_CONTROL();
break;
+ case PR_GET_SHADOW_STACK_STATUS:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_get_shadow_stack_status(me, (unsigned long __user *) arg2);
+ break;
+ case PR_SET_SHADOW_STACK_STATUS:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_set_shadow_stack_status(me, arg2);
+ break;
+ case PR_LOCK_SHADOW_STACK_STATUS:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_lock_shadow_stack_status(me, arg2);
+ break;
default:
error = -EINVAL;
break;
--
2.43.2
* [PATCH v3 17/29] prctl: arch-agnostic prctl for indirect branch tracking
From: Deepak Gupta @ 2024-04-03 23:35 UTC
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
Three architectures (x86, aarch64, riscv) support indirect branch
tracking in a very similar fashion. At a high level, indirect branch
tracking is a CPU feature in which the CPU tracks branches that take
their target from a register or memory operand rather than from the
instruction itself. On such an indirect branch, the CPU enters a state
in which it expects a landing pad instruction at the target; if none is
found, the CPU raises an architecture-dependent fault.
x86 landing pad instr - `ENDBRANCH`
aarch64 landing pad instr - `BTI`
riscv landing pad instr - `lpad`
Given that three major arches support indirect branch tracking, this
patch makes the `prctl` interface for indirect branch tracking arch
agnostic. To allow userspace to enable this feature for itself, the
following prctls are defined:
- PR_GET_INDIR_BR_LP_STATUS: Gets the currently configured status for
indirect branch tracking.
- PR_SET_INDIR_BR_LP_STATUS: Sets a configuration for indirect branch
tracking.
- PR_LOCK_INDIR_BR_LP_STATUS: Locks the configured status for indirect
branch tracking for the user thread.
The following status options are allowed:
- PR_INDIR_BR_LP_ENABLE: Enables indirect branch tracking on the user
thread.
- PR_INDIR_BR_LP_DISABLE: Disables indirect branch tracking on the user
thread.
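The tracking described above behaves like a one-bit state machine: an indirect branch arms an "expect landing pad" flag, and the instruction at the target must be a landing pad or the CPU faults. The toy model below assumes a made-up instruction set (`enum insn_kind`, `struct insn` and `run_with_lp_tracking()` are hypothetical) and ignores architecture-specific details such as x86's NOTRACK prefix or aarch64 enforcing BTI only on guarded pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction kinds; not real encodings. */
enum insn_kind {
	INSN_OTHER,         /* any ordinary instruction */
	INSN_LANDING_PAD,   /* stands in for ENDBRANCH / BTI / lpad */
	INSN_DIRECT_JUMP,   /* target encoded in the instruction: not tracked */
	INSN_INDIRECT_JUMP, /* target from a register or memory: tracked */
};

struct insn {
	enum insn_kind kind;
	size_t target;      /* index of the jump target, for jumps only */
};

/* Run up to max_steps instructions from index 0. Returns the index of the
 * instruction that raised the landing-pad fault, or -1 if none faulted. */
static long run_with_lp_tracking(const struct insn *prog, size_t len,
				 bool tracking_enabled, size_t max_steps)
{
	bool expect_lp = false; /* armed by an indirect branch */
	size_t pc = 0;

	for (size_t step = 0; step < max_steps && pc < len; step++) {
		const struct insn *in = &prog[pc];

		if (expect_lp) {
			if (in->kind != INSN_LANDING_PAD)
				return (long)pc; /* architecture-specific fault */
			expect_lp = false;
		}

		switch (in->kind) {
		case INSN_INDIRECT_JUMP:
			expect_lp = tracking_enabled;
			pc = in->target;
			break;
		case INSN_DIRECT_JUMP:
			pc = in->target;
			break;
		default:
			pc++;
			break;
		}
	}
	return -1;
}
```

With tracking enabled, an indirect jump straight to an ordinary instruction is reported as a fault at the target; with tracking disabled, the same program runs to completion.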
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
include/uapi/linux/prctl.h | 27 +++++++++++++++++++++++++++
kernel/sys.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 57 insertions(+)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 3c66ed8f46d8..b7a8212a068e 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -328,4 +328,31 @@ struct prctl_mm_map {
*/
#define PR_LOCK_SHADOW_STACK_STATUS 73
+/*
+ * Get the current indirect branch tracking configuration for the current
+ * thread; this will be the value configured via PR_SET_INDIR_BR_LP_STATUS.
+ */
+#define PR_GET_INDIR_BR_LP_STATUS 74
+
+/*
+ * Set the indirect branch tracking configuration. PR_INDIR_BR_LP_ENABLE will
+ * enable the CPU feature for the user thread, to track all indirect branches
+ * and ensure they land on an arch-defined landing pad instruction.
+ * x86 - If enabled, an indirect branch must land on an `ENDBRANCH` instruction.
+ * aarch64 - If enabled, an indirect branch must land on a `BTI` instruction.
+ * riscv - If enabled, an indirect branch must land on an `lpad` instruction.
+ * PR_INDIR_BR_LP_DISABLE will disable the feature for the user thread, and
+ * indirect branches will no longer be tracked by the CPU to land on an
+ * arch-defined landing pad instruction.
+ */
+#define PR_SET_INDIR_BR_LP_STATUS 75
+# define PR_INDIR_BR_LP_ENABLE (1UL << 0)
+
+/*
+ * Prevent further changes to the specified indirect branch tracking
+ * configuration. All bits may be locked via this call, including
+ * undefined bits.
+ */
+#define PR_LOCK_INDIR_BR_LP_STATUS 76
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 242e9f147791..c770060c3f06 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2330,6 +2330,21 @@ int __weak arch_lock_shadow_stack_status(struct task_struct *t, unsigned long st
return -EINVAL;
}
+int __weak arch_get_indir_br_lp_status(struct task_struct *t, unsigned long __user *status)
+{
+ return -EINVAL;
+}
+
+int __weak arch_set_indir_br_lp_status(struct task_struct *t, unsigned long status)
+{
+ return -EINVAL;
+}
+
+int __weak arch_lock_indir_br_lp_status(struct task_struct *t, unsigned long status)
+{
+ return -EINVAL;
+}
+
#define PR_IO_FLUSHER (PF_MEMALLOC_NOIO | PF_LOCAL_THROTTLE)
#ifdef CONFIG_ANON_VMA_NAME
@@ -2787,6 +2802,21 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
return -EINVAL;
error = arch_lock_shadow_stack_status(me, arg2);
break;
+ case PR_GET_INDIR_BR_LP_STATUS:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_get_indir_br_lp_status(me, (unsigned long __user *) arg2);
+ break;
+ case PR_SET_INDIR_BR_LP_STATUS:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_set_indir_br_lp_status(me, arg2);
+ break;
+ case PR_LOCK_INDIR_BR_LP_STATUS:
+ if (arg3 || arg4 || arg5)
+ return -EINVAL;
+ error = arch_lock_indir_br_lp_status(me, arg2);
+ break;
+ break;
default:
error = -EINVAL;
break;
--
2.43.2
* [PATCH v3 18/29] riscv: Implements arch agnostic shadow stack prctls
From: Deepak Gupta @ 2024-04-03 23:35 UTC
To: paul.walmsley, rick.p.edgecombe, broonie, Szabolcs.Nagy,
kito.cheng, keescook, ajones, conor.dooley, cleger, atishp, alex,
bjorn, alexghiti, samuel.holland, conor
Cc: linux-doc, linux-riscv, linux-kernel, devicetree, linux-mm,
linux-arch, linux-kselftest, corbet, palmer, aou, robh+dt,
krzysztof.kozlowski+dt, oleg, akpm, arnd, ebiederm, Liam.Howlett,
vbabka, lstoakes, shuah, brauner, debug, andy.chiu, jerry.shih,
hankuan.chen, greentime.hu, evan, xiao.w.wang, charlie, apatel,
mchitale, dbarboza, sameo, shikemeng, willy, vincent.chen, guoren,
samitolvanen, songshuaishuai, gerg, heiko, bhe, jeeheng.sia, cyy,
maskray, ancientmodern4, mathis.salmen, cuiyunhui, bgray, mpe,
baruch, alx, david, catalin.marinas, revest, josh, shr, deller,
omosnace, ojeda, jhubbard
In-Reply-To: <20240403234054.2020347-1-debug@rivosinc.com>
Implement the architecture-agnostic prctl() interface for setting and
getting the shadow stack status.
The prctls implemented are PR_GET_SHADOW_STACK_STATUS,
PR_SET_SHADOW_STACK_STATUS and PR_LOCK_SHADOW_STACK_STATUS.
As part of PR_SET_SHADOW_STACK_STATUS/PR_GET_SHADOW_STACK_STATUS, only
PR_SHADOW_STACK_ENABLE is implemented, because RISCV allows each mode to
write to its own shadow stack using `sspush` or `ssamoswap`, so
PR_SHADOW_STACK_WRITE and PR_SHADOW_STACK_PUSH have no separate switch.
PR_LOCK_SHADOW_STACK_STATUS locks the current shadow stack enable
configuration.
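The enable/disable/lock rules implemented in arch_set_shadow_stack_status() and arch_lock_shadow_stack_status() can be modelled without any kernel state. This sketch assumes a CPU that supports shadow stacks, replaces allocate_shadow_stack() and shstk_release() with a fake base address, and uses hypothetical `model_` names; it is not the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Mirrors PR_SHADOW_STACK_ENABLE and the RISC-V supported-status mask. */
#define MODEL_PR_SHADOW_STACK_ENABLE (1UL << 0)
#define MODEL_SUPPORTED_STATUS_MASK  MODEL_PR_SHADOW_STACK_ENABLE

/* Hypothetical per-task state, loosely following struct cfi_status. */
struct model_cfi {
	bool enabled;       /* ubcfi_en */
	bool locked;        /* ubcfi_locked */
	unsigned long base; /* shdw_stk_base; 0 = no shadow stack owned */
};

static int model_set_status(struct model_cfi *t, unsigned long status)
{
	bool enable = (status & MODEL_PR_SHADOW_STACK_ENABLE) != 0;

	if (status & ~MODEL_SUPPORTED_STATUS_MASK)
		return -EINVAL;         /* reject unknown flags */
	if (t->locked)
		return -EINVAL;         /* configuration is frozen */
	if (enable && !t->enabled) {
		if (t->base)
			return -EINVAL; /* allocated earlier: re-enable refused */
		t->base = 0x1000;       /* stand-in for allocate_shadow_stack() */
	}
	/* shstk_release() only frees while enabled; a vforked child has
	 * base == 0, so clearing it changes nothing for that child. */
	if (!enable && t->enabled)
		t->base = 0;
	t->enabled = enable;
	return 0;
}

static int model_lock_status(struct model_cfi *t)
{
	/* RISC-V refuses to lock when the shadow stack is not enabled. */
	if (!t->enabled)
		return -EINVAL;
	t->locked = true;
	return 0;
}
```

Note that on RISC-V locking requires the shadow stack to be enabled first, whereas the generic prctl description accepts any bits, including unknown ones, for locking.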
Signed-off-by: Deepak Gupta <debug@rivosinc.com>
---
arch/riscv/include/asm/usercfi.h | 18 +++++-
arch/riscv/kernel/process.c | 8 +++
arch/riscv/kernel/usercfi.c | 107 +++++++++++++++++++++++++++++++
3 files changed, 132 insertions(+), 1 deletion(-)
diff --git a/arch/riscv/include/asm/usercfi.h b/arch/riscv/include/asm/usercfi.h
index b47574a7a8c9..a168ae0fa5d8 100644
--- a/arch/riscv/include/asm/usercfi.h
+++ b/arch/riscv/include/asm/usercfi.h
@@ -7,6 +7,7 @@
#ifndef __ASSEMBLY__
#include <linux/types.h>
+#include <linux/prctl.h>
struct task_struct;
struct kernel_clone_args;
@@ -14,7 +15,8 @@ struct kernel_clone_args;
#ifdef CONFIG_RISCV_USER_CFI
struct cfi_status {
unsigned long ubcfi_en : 1; /* Enable for backward cfi. */
- unsigned long rsvd : ((sizeof(unsigned long)*8) - 1);
+ unsigned long ubcfi_locked : 1;
+ unsigned long rsvd : ((sizeof(unsigned long)*8) - 2);
unsigned long user_shdw_stk; /* Current user shadow stack pointer */
unsigned long shdw_stk_base; /* Base address of shadow stack */
unsigned long shdw_stk_size; /* size of shadow stack */
@@ -26,6 +28,10 @@ void shstk_release(struct task_struct *tsk);
void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size);
void set_active_shstk(struct task_struct *task, unsigned long shstk_addr);
bool is_shstk_enabled(struct task_struct *task);
+bool is_shstk_locked(struct task_struct *task);
+void set_shstk_status(struct task_struct *task, bool enable);
+
+#define PR_SHADOW_STACK_SUPPORTED_STATUS_MASK (PR_SHADOW_STACK_ENABLE)
#else
@@ -56,6 +62,16 @@ static inline bool is_shstk_enabled(struct task_struct *task)
return false;
}
+static inline bool is_shstk_locked(struct task_struct *task)
+{
+ return false;
+}
+
+static inline void set_shstk_status(struct task_struct *task, bool enable)
+{
+
+}
+
#endif /* CONFIG_RISCV_USER_CFI */
#endif /* __ASSEMBLY__ */
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index ef48a25b0eff..3fb8b23f629b 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -145,6 +145,14 @@ void start_thread(struct pt_regs *regs, unsigned long pc,
regs->epc = pc;
regs->sp = sp;
+ /*
+ * clear shadow stack state on exec.
+ * libc will set it later via prctl.
+ */
+ set_shstk_status(current, false);
+ set_shstk_base(current, 0, 0);
+ set_active_shstk(current, 0);
+
#ifdef CONFIG_64BIT
regs->status &= ~SR_UXL;
diff --git a/arch/riscv/kernel/usercfi.c b/arch/riscv/kernel/usercfi.c
index 11ef7ab925c9..cdedf1f78b3e 100644
--- a/arch/riscv/kernel/usercfi.c
+++ b/arch/riscv/kernel/usercfi.c
@@ -24,6 +24,16 @@ bool is_shstk_enabled(struct task_struct *task)
return task->thread_info.user_cfi_state.ubcfi_en ? true : false;
}
+bool is_shstk_allocated(struct task_struct *task)
+{
+ return task->thread_info.user_cfi_state.shdw_stk_base ? true : false;
+}
+
+bool is_shstk_locked(struct task_struct *task)
+{
+ return task->thread_info.user_cfi_state.ubcfi_locked ? true : false;
+}
+
void set_shstk_base(struct task_struct *task, unsigned long shstk_addr, unsigned long size)
{
task->thread_info.user_cfi_state.shdw_stk_base = shstk_addr;
@@ -42,6 +52,23 @@ void set_active_shstk(struct task_struct *task, unsigned long shstk_addr)
task->thread_info.user_cfi_state.user_shdw_stk = shstk_addr;
}
+void set_shstk_status(struct task_struct *task, bool enable)
+{
+ task->thread_info.user_cfi_state.ubcfi_en = enable ? 1 : 0;
+
+ if (enable)
+ task->thread_info.envcfg |= ENVCFG_SSE;
+ else
+ task->thread_info.envcfg &= ~ENVCFG_SSE;
+
+ csr_write(CSR_ENVCFG, task->thread_info.envcfg);
+}
+
+void set_shstk_lock(struct task_struct *task)
+{
+ task->thread_info.user_cfi_state.ubcfi_locked = 1;
+}
+
/*
* If size is 0, then to be compatible with regular stack we want it to be as big as
* regular stack. Else PAGE_ALIGN it and return back
@@ -268,3 +295,83 @@ void shstk_release(struct task_struct *tsk)
vm_munmap(base, size);
set_shstk_base(tsk, 0, 0);
}
+
+int arch_get_shadow_stack_status(struct task_struct *t, unsigned long __user *status)
+{
+ unsigned long bcfi_status = 0;
+
+ if (!cpu_supports_shadow_stack())
+ return -EINVAL;
+
+ /* this means shadow stack is enabled on the task */
+ bcfi_status |= (is_shstk_enabled(t) ? PR_SHADOW_STACK_ENABLE : 0);
+
+ return copy_to_user(status, &bcfi_status, sizeof(bcfi_status)) ? -EFAULT : 0;
+}
+
+int arch_set_shadow_stack_status(struct task_struct *t, unsigned long status)
+{
+ unsigned long size = 0, addr = 0;
+ bool enable_shstk = false;
+
+ if (!cpu_supports_shadow_stack())
+ return -EINVAL;
+
+ /* Reject unknown flags */
+ if (status & ~PR_SHADOW_STACK_SUPPORTED_STATUS_MASK)
+ return -EINVAL;
+
+ /* bcfi status is locked and can't be modified further by the user */
+ if (is_shstk_locked(t))
+ return -EINVAL;
+
+ enable_shstk = status & PR_SHADOW_STACK_ENABLE;
+ /* Request is to enable shadow stack and shadow stack is not enabled already */
+ if (enable_shstk && !is_shstk_enabled(t)) {
+ /*
+ * The shadow stack was allocated earlier and enabling is requested
+ * again; there is no need to support such a use case, so return EINVAL.
+ */
+ if (is_shstk_allocated(t))
+ return -EINVAL;
+
+ size = calc_shstk_size(0);
+ addr = allocate_shadow_stack(0, size, 0, false);
+ if (IS_ERR_VALUE(addr))
+ return -ENOMEM;
+ set_shstk_base(t, addr, size);
+ set_active_shstk(t, addr + size);
+ }
+
+ /*
+ * If a request to disable the shadow stack arrives, go ahead and release it.
+ * However, if a CLONE_VFORK child makes this request, the shadow stack is not
+ * released (the parent may still need it), though it is disabled for the
+ * vforked child. If the vforked child then tries to enable it again, it gets
+ * an entirely new shadow stack, because both of the following are true:
+ * - the shadow stack was not enabled for the vforked child
+ * - the shadow stack base was already 0
+ * This shouldn't be a big issue: we want the parent to keep its shadow stack
+ * available whenever the vforked child releases resources via exit or exec,
+ * while still allowing the vforked child to break away and establish a new
+ * shadow stack if it desires.
+ */
+ if (!enable_shstk)
+ shstk_release(t);
+
+ set_shstk_status(t, enable_shstk);
+ return 0;
+}
+
+int arch_lock_shadow_stack_status(struct task_struct *task,
+ unsigned long arg)
+{
+ /* If shstk is not supported or not enabled on the task, there is nothing to lock */
+ if (!cpu_supports_shadow_stack() ||
+ !is_shstk_enabled(task))
+ return -EINVAL;
+
+ set_shstk_lock(task);
+
+ return 0;
+}
--
2.43.2