* Re: [PATCH v4 12/63] Documentation: ACPI: move i2c-muxes.txt to firmware-guide/acpi and convert to reST
From: Mauro Carvalho Chehab @ 2019-04-23 21:09 UTC (permalink / raw)
To: Changbin Du
Cc: fenghua.yu, x86, linux-doc, linux-pci, linux-gpio,
Jonathan Corbet, rjw, linux-kernel, linux-acpi, mingo,
Bjorn Helgaas, tglx, linuxppc-dev
In-Reply-To: <20190423162932.21428-13-changbin.du@gmail.com>
Em Wed, 24 Apr 2019 00:28:41 +0800
Changbin Du <changbin.du@gmail.com> escreveu:
> This converts the plain text documentation to reStructuredText format and
> add it to Sphinx TOC tree. No essential content change.
>
> Signed-off-by: Changbin Du <changbin.du@gmail.com>
For the conversion itself:
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
> ---
> Documentation/acpi/i2c-muxes.txt | 58 ------------------
> .../firmware-guide/acpi/i2c-muxes.rst | 61 +++++++++++++++++++
> Documentation/firmware-guide/acpi/index.rst | 3 +-
> 3 files changed, 63 insertions(+), 59 deletions(-)
> delete mode 100644 Documentation/acpi/i2c-muxes.txt
> create mode 100644 Documentation/firmware-guide/acpi/i2c-muxes.rst
>
> diff --git a/Documentation/acpi/i2c-muxes.txt b/Documentation/acpi/i2c-muxes.txt
> deleted file mode 100644
> index 9fcc4f0b885e..000000000000
> --- a/Documentation/acpi/i2c-muxes.txt
> +++ /dev/null
> @@ -1,58 +0,0 @@
> -ACPI I2C Muxes
> ---------------
> -
> -Describing an I2C device hierarchy that includes I2C muxes requires an ACPI
> -Device () scope per mux channel.
> -
> -Consider this topology:
> -
> -+------+ +------+
> -| SMB1 |-->| MUX0 |--CH00--> i2c client A (0x50)
> -| | | 0x70 |--CH01--> i2c client B (0x50)
> -+------+ +------+
> -
> -which corresponds to the following ASL:
> -
> -Device (SMB1)
> -{
> - Name (_HID, ...)
> - Device (MUX0)
> - {
> - Name (_HID, ...)
> - Name (_CRS, ResourceTemplate () {
> - I2cSerialBus (0x70, ControllerInitiated, I2C_SPEED,
> - AddressingMode7Bit, "^SMB1", 0x00,
> - ResourceConsumer,,)
> - }
> -
> - Device (CH00)
> - {
> - Name (_ADR, 0)
> -
> - Device (CLIA)
> - {
> - Name (_HID, ...)
> - Name (_CRS, ResourceTemplate () {
> - I2cSerialBus (0x50, ControllerInitiated, I2C_SPEED,
> - AddressingMode7Bit, "^CH00", 0x00,
> - ResourceConsumer,,)
> - }
> - }
> - }
> -
> - Device (CH01)
> - {
> - Name (_ADR, 1)
> -
> - Device (CLIB)
> - {
> - Name (_HID, ...)
> - Name (_CRS, ResourceTemplate () {
> - I2cSerialBus (0x50, ControllerInitiated, I2C_SPEED,
> - AddressingMode7Bit, "^CH01", 0x00,
> - ResourceConsumer,,)
> - }
> - }
> - }
> - }
> -}
> diff --git a/Documentation/firmware-guide/acpi/i2c-muxes.rst b/Documentation/firmware-guide/acpi/i2c-muxes.rst
> new file mode 100644
> index 000000000000..3a8997ccd7c4
> --- /dev/null
> +++ b/Documentation/firmware-guide/acpi/i2c-muxes.rst
> @@ -0,0 +1,61 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +==============
> +ACPI I2C Muxes
> +==============
> +
> +Describing an I2C device hierarchy that includes I2C muxes requires an ACPI
> +Device () scope per mux channel.
> +
> +Consider this topology::
> +
> + +------+ +------+
> + | SMB1 |-->| MUX0 |--CH00--> i2c client A (0x50)
> + | | | 0x70 |--CH01--> i2c client B (0x50)
> + +------+ +------+
> +
> +which corresponds to the following ASL::
> +
> + Device (SMB1)
> + {
> + Name (_HID, ...)
> + Device (MUX0)
> + {
> + Name (_HID, ...)
> + Name (_CRS, ResourceTemplate () {
> + I2cSerialBus (0x70, ControllerInitiated, I2C_SPEED,
> + AddressingMode7Bit, "^SMB1", 0x00,
> + ResourceConsumer,,)
> + }
> +
> + Device (CH00)
> + {
> + Name (_ADR, 0)
> +
> + Device (CLIA)
> + {
> + Name (_HID, ...)
> + Name (_CRS, ResourceTemplate () {
> + I2cSerialBus (0x50, ControllerInitiated, I2C_SPEED,
> + AddressingMode7Bit, "^CH00", 0x00,
> + ResourceConsumer,,)
> + }
> + }
> + }
> +
> + Device (CH01)
> + {
> + Name (_ADR, 1)
> +
> + Device (CLIB)
> + {
> + Name (_HID, ...)
> + Name (_CRS, ResourceTemplate () {
> + I2cSerialBus (0x50, ControllerInitiated, I2C_SPEED,
> + AddressingMode7Bit, "^CH01", 0x00,
> + ResourceConsumer,,)
> + }
> + }
> + }
> + }
> + }
> diff --git a/Documentation/firmware-guide/acpi/index.rst b/Documentation/firmware-guide/acpi/index.rst
> index d1d069b26bbc..1c89888f6ee8 100644
> --- a/Documentation/firmware-guide/acpi/index.rst
> +++ b/Documentation/firmware-guide/acpi/index.rst
> @@ -12,4 +12,5 @@ ACPI Support
> osi
> method-customizing
> DSD-properties-rules
> - gpio-properties
> \ No newline at end of file
> + gpio-properties
> + i2c-muxes
Thanks,
Mauro
^ permalink raw reply
* [PATCH] powerpc: vdso: drop unnecessary cc-ldoption
From: Nick Desaulniers @ 2019-04-23 21:11 UTC (permalink / raw)
To: mpe, benh, paulus
Cc: Masahiro Yamada, Nick Desaulniers, linux-kernel, Nicholas Piggin,
clang-built-linux, Andrew Donnellan, linuxppc-dev, Dmitry Vyukov
Towards the goal of removing cc-ldoption, it seems that --hash-style=
was added to binutils 2.17.50.0.2 in 2006. The minimal required version
of binutils for the kernel according to
Documentation/process/changes.rst is 2.20.
Link: https://gcc.gnu.org/ml/gcc/2007-01/msg01141.html
Cc: clang-built-linux@googlegroups.com
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
arch/powerpc/kernel/vdso32/Makefile | 5 ++---
arch/powerpc/kernel/vdso64/Makefile | 5 ++---
2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/kernel/vdso32/Makefile b/arch/powerpc/kernel/vdso32/Makefile
index ce199f6e4256..06f54d947057 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -26,9 +26,8 @@ GCOV_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
-ccflags-y := -shared -fno-common -fno-builtin
-ccflags-y += -nostdlib -Wl,-soname=linux-vdso32.so.1 \
- $(call cc-ldoption, -Wl$(comma)--hash-style=both)
+ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
+ -Wl,-soname=linux-vdso32.so.1 -Wl,--hash-style=both
asflags-y := -D__VDSO32__ -s
obj-y += vdso32_wrapper.o
diff --git a/arch/powerpc/kernel/vdso64/Makefile b/arch/powerpc/kernel/vdso64/Makefile
index 28e7d112aa2f..32ebb3522ea1 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -12,9 +12,8 @@ GCOV_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
-ccflags-y := -shared -fno-common -fno-builtin
-ccflags-y += -nostdlib -Wl,-soname=linux-vdso64.so.1 \
- $(call cc-ldoption, -Wl$(comma)--hash-style=both)
+ccflags-y := -shared -fno-common -fno-builtin -nostdlib \
+ -Wl,-soname=linux-vdso64.so.1 -Wl,--hash-style=both
asflags-y := -D__VDSO64__ -s
obj-y += vdso64_wrapper.o
--
2.21.0.593.g511ec345e18-goog
^ permalink raw reply related
* Re: [PATCH v4 13/63] Documentation: ACPI: move acpi-lid.txt to firmware-guide/acpi and convert to reST
From: Mauro Carvalho Chehab @ 2019-04-23 21:12 UTC (permalink / raw)
To: Changbin Du
Cc: fenghua.yu, x86, linux-doc, linux-pci, linux-gpio,
Jonathan Corbet, rjw, linux-kernel, linux-acpi, mingo,
Bjorn Helgaas, tglx, linuxppc-dev
In-Reply-To: <20190423162932.21428-14-changbin.du@gmail.com>
Em Wed, 24 Apr 2019 00:28:42 +0800
Changbin Du <changbin.du@gmail.com> escreveu:
> This converts the plain text documentation to reStructuredText format and
> add it to Sphinx TOC tree. No essential content change.
>
> Signed-off-by: Changbin Du <changbin.du@gmail.com>
> ---
> .../acpi/acpi-lid.rst} | 48 ++++++++++++-------
> Documentation/firmware-guide/acpi/index.rst | 1 +
> 2 files changed, 33 insertions(+), 16 deletions(-)
> rename Documentation/{acpi/acpi-lid.txt => firmware-guide/acpi/acpi-lid.rst} (77%)
>
> diff --git a/Documentation/acpi/acpi-lid.txt b/Documentation/firmware-guide/acpi/acpi-lid.rst
> similarity index 77%
> rename from Documentation/acpi/acpi-lid.txt
> rename to Documentation/firmware-guide/acpi/acpi-lid.rst
> index effe7af3a5af..1d19e15a6945 100644
> --- a/Documentation/acpi/acpi-lid.txt
> +++ b/Documentation/firmware-guide/acpi/acpi-lid.rst
> @@ -1,25 +1,29 @@
> -Special Usage Model of the ACPI Control Method Lid Device
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: <isonum.txt>
>
> -Copyright (C) 2016, Intel Corporation
> -Author: Lv Zheng <lv.zheng@intel.com>
> +=========================================================
> +Special Usage Model of the ACPI Control Method Lid Device
> +=========================================================
>
> +:Copyright: |copy| 2016, Intel Corporation
>
> -Abstract:
> +:Author: Lv Zheng <lv.zheng@intel.com>
>
> -Platforms containing lids convey lid state (open/close) to OSPMs using a
> -control method lid device. To implement this, the AML tables issue
> -Notify(lid_device, 0x80) to notify the OSPMs whenever the lid state has
> -changed. The _LID control method for the lid device must be implemented to
> -report the "current" state of the lid as either "opened" or "closed".
> +:Abstract: Platforms containing lids convey lid state (open/close) to OSPMs
> + using a control method lid device. To implement this, the AML tables issue
> + Notify(lid_device, 0x80) to notify the OSPMs whenever the lid state has
> + changed. The _LID control method for the lid device must be implemented to
> + report the "current" state of the lid as either "opened" or "closed".
Same comment for the abstract.
>
> -For most platforms, both the _LID method and the lid notifications are
> -reliable. However, there are exceptions. In order to work with these
> -exceptional buggy platforms, special restrictions and expections should be
> -taken into account. This document describes the restrictions and the
> -expections of the Linux ACPI lid device driver.
> + For most platforms, both the _LID method and the lid notifications are
> + reliable. However, there are exceptions. In order to work with these
> + exceptional buggy platforms, special restrictions and expections should be
> + taken into account. This document describes the restrictions and the
> + expections of the Linux ACPI lid device driver.
>
>
> 1. Restrictions of the returning value of the _LID control method
> +=================================================================
>
> The _LID control method is described to return the "current" lid state.
> However the word of "current" has ambiguity, some buggy AML tables return
> @@ -31,6 +35,7 @@ with cached value, the initial returning value is likely not reliable.
> There are platforms always retun "closed" as initial lid state.
>
> 2. Restrictions of the lid state change notifications
> +=====================================================
>
> There are buggy AML tables never notifying when the lid device state is
> changed to "opened". Thus the "opened" notification is not guaranteed. But
> @@ -40,17 +45,21 @@ trigger some system power saving operations on Windows. Since it is fully
> tested, it is reliable from all AML tables.
>
> 3. Expections for the userspace users of the ACPI lid device driver
> +===================================================================
>
> The ACPI button driver exports the lid state to the userspace via the
> -following file:
> +following file::
> +
> /proc/acpi/button/lid/LID0/state
> +
> This file actually calls the _LID control method described above. And given
> the previous explanation, it is not reliable enough on some platforms. So
> it is advised for the userspace program to not to solely rely on this file
> to determine the actual lid state.
>
> The ACPI button driver emits the following input event to the userspace:
> - SW_LID
> + * SW_LID
> +
> The ACPI lid device driver is implemented to try to deliver the platform
> triggered events to the userspace. However, given the fact that the buggy
> firmware cannot make sure "opened"/"closed" events are paired, the ACPI
> @@ -59,20 +68,25 @@ button driver uses the following 3 modes in order not to trigger issues.
> If the userspace hasn't been prepared to ignore the unreliable "opened"
> events and the unreliable initial state notification, Linux users can use
> the following kernel parameters to handle the possible issues:
> +
> A. button.lid_init_state=method:
Just for the sake of a better visual at the html output, I would place those
button.* as:
A. ``button.lid_init_state=method``:
Anyway, with or without such change:
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
> When this option is specified, the ACPI button driver reports the
> initial lid state using the returning value of the _LID control method
> and whether the "opened"/"closed" events are paired fully relies on the
> firmware implementation.
> +
> This option can be used to fix some platforms where the returning value
> of the _LID control method is reliable but the initial lid state
> notification is missing.
> +
> This option is the default behavior during the period the userspace
> isn't ready to handle the buggy AML tables.
> +
> B. button.lid_init_state=open:
> When this option is specified, the ACPI button driver always reports the
> initial lid state as "opened" and whether the "opened"/"closed" events
> are paired fully relies on the firmware implementation.
> +
> This may fix some platforms where the returning value of the _LID
> control method is not reliable and the initial lid state notification is
> missing.
> @@ -80,6 +94,7 @@ B. button.lid_init_state=open:
> If the userspace has been prepared to ignore the unreliable "opened" events
> and the unreliable initial state notification, Linux users should always
> use the following kernel parameter:
> +
> C. button.lid_init_state=ignore:
> When this option is specified, the ACPI button driver never reports the
> initial lid state and there is a compensation mechanism implemented to
> @@ -89,6 +104,7 @@ C. button.lid_init_state=ignore:
> notifications can be delivered to the userspace when the lid is actually
> opens given that some AML tables do not send "opened" notifications
> reliably.
> +
> In this mode, if everything is correctly implemented by the platform
> firmware, the old userspace programs should still work. Otherwise, the
> new userspace programs are required to work with the ACPI button driver.
> diff --git a/Documentation/firmware-guide/acpi/index.rst b/Documentation/firmware-guide/acpi/index.rst
> index 1c89888f6ee8..bedcb0b242a2 100644
> --- a/Documentation/firmware-guide/acpi/index.rst
> +++ b/Documentation/firmware-guide/acpi/index.rst
> @@ -14,3 +14,4 @@ ACPI Support
> DSD-properties-rules
> gpio-properties
> i2c-muxes
> + acpi-lid
> \ No newline at end of file
Thanks,
Mauro
^ permalink raw reply
* Re: [PATCH v4 14/63] Documentation: ACPI: move dsd/graph.txt to firmware-guide/acpi and convert to reST
From: Mauro Carvalho Chehab @ 2019-04-23 21:14 UTC (permalink / raw)
To: Changbin Du
Cc: fenghua.yu, x86, linux-doc, linux-pci, linux-gpio,
Jonathan Corbet, rjw, linux-kernel, linux-acpi, mingo,
Bjorn Helgaas, tglx, linuxppc-dev
In-Reply-To: <20190423162932.21428-15-changbin.du@gmail.com>
Em Wed, 24 Apr 2019 00:28:43 +0800
Changbin Du <changbin.du@gmail.com> escreveu:
> This converts the plain text documentation to reStructuredText format and
> add it to Sphinx TOC tree. No essential content change.
>
> Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
> ---
> .../acpi/dsd/graph.rst} | 157 +++++++++---------
> Documentation/firmware-guide/acpi/index.rst | 1 +
> 2 files changed, 81 insertions(+), 77 deletions(-)
> rename Documentation/{acpi/dsd/graph.txt => firmware-guide/acpi/dsd/graph.rst} (56%)
>
> diff --git a/Documentation/acpi/dsd/graph.txt b/Documentation/firmware-guide/acpi/dsd/graph.rst
> similarity index 56%
> rename from Documentation/acpi/dsd/graph.txt
> rename to Documentation/firmware-guide/acpi/dsd/graph.rst
> index b9ce910781dc..e0baed35b037 100644
> --- a/Documentation/acpi/dsd/graph.txt
> +++ b/Documentation/firmware-guide/acpi/dsd/graph.rst
> @@ -1,8 +1,11 @@
> -Graphs
> +.. SPDX-License-Identifier: GPL-2.0
>
> +======
> +Graphs
> +======
>
> _DSD
> -----
> +====
>
> _DSD (Device Specific Data) [7] is a predefined ACPI device
> configuration object that can be used to convey information on
> @@ -30,7 +33,7 @@ hierarchical data extension array on each depth.
>
>
> Ports and endpoints
> --------------------
> +===================
>
> The port and endpoint concepts are very similar to those in Devicetree
> [3]. A port represents an interface in a device, and an endpoint
> @@ -38,9 +41,9 @@ represents a connection to that interface.
>
> All port nodes are located under the device's "_DSD" node in the hierarchical
> data extension tree. The data extension related to each port node must begin
> -with "port" and must be followed by the "@" character and the number of the port
> -as its key. The target object it refers to should be called "PRTX", where "X" is
> -the number of the port. An example of such a package would be:
> +with "port" and must be followed by the "@" character and the number of the
> +port as its key. The target object it refers to should be called "PRTX", where
> +"X" is the number of the port. An example of such a package would be::
>
> Package() { "port@4", PRT4 }
>
> @@ -49,7 +52,7 @@ data extension key of the endpoint nodes must begin with
> "endpoint" and must be followed by the "@" character and the number of the
> endpoint. The object it refers to should be called "EPXY", where "X" is the
> number of the port and "Y" is the number of the endpoint. An example of such a
> -package would be:
> +package would be::
>
> Package() { "endpoint@0", EP40 }
>
> @@ -62,85 +65,85 @@ of that port shall be zero. Similarly, if a port may only have a single
> endpoint, the number of that endpoint shall be zero.
>
> The endpoint reference uses property extension with "remote-endpoint" property
> -name followed by a reference in the same package. Such references consist of the
> +name followed by a reference in the same package. Such references consist of
> the remote device reference, the first package entry of the port data extension
> reference under the device and finally the first package entry of the endpoint
> -data extension reference under the port. Individual references thus appear as:
> +data extension reference under the port. Individual references thus appear as::
>
> Package() { device, "port@X", "endpoint@Y" }
>
> -In the above example, "X" is the number of the port and "Y" is the number of the
> -endpoint.
> +In the above example, "X" is the number of the port and "Y" is the number of
> +the endpoint.
>
> The references to endpoints must be always done both ways, to the
> remote endpoint and back from the referred remote endpoint node.
>
> -A simple example of this is show below:
> +A simple example of this is show below::
>
> Scope (\_SB.PCI0.I2C2)
> {
> - Device (CAM0)
> - {
> - Name (_DSD, Package () {
> - ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> - Package () {
> - Package () { "compatible", Package () { "nokia,smia" } },
> - },
> - ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
> - Package () {
> - Package () { "port@0", PRT0 },
> - }
> - })
> - Name (PRT0, Package() {
> - ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> - Package () {
> - Package () { "reg", 0 },
> - },
> - ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
> - Package () {
> - Package () { "endpoint@0", EP00 },
> - }
> - })
> - Name (EP00, Package() {
> - ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> - Package () {
> - Package () { "reg", 0 },
> - Package () { "remote-endpoint", Package() { \_SB.PCI0.ISP, "port@4", "endpoint@0" } },
> - }
> - })
> - }
> + Device (CAM0)
> + {
> + Name (_DSD, Package () {
> + ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> + Package () {
> + Package () { "compatible", Package () { "nokia,smia" } },
> + },
> + ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
> + Package () {
> + Package () { "port@0", PRT0 },
> + }
> + })
> + Name (PRT0, Package() {
> + ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> + Package () {
> + Package () { "reg", 0 },
> + },
> + ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
> + Package () {
> + Package () { "endpoint@0", EP00 },
> + }
> + })
> + Name (EP00, Package() {
> + ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> + Package () {
> + Package () { "reg", 0 },
> + Package () { "remote-endpoint", Package() { \_SB.PCI0.ISP, "port@4", "endpoint@0" } },
> + }
> + })
> + }
> }
>
> Scope (\_SB.PCI0)
> {
> - Device (ISP)
> - {
> - Name (_DSD, Package () {
> - ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
> - Package () {
> - Package () { "port@4", PRT4 },
> - }
> - })
> -
> - Name (PRT4, Package() {
> - ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> - Package () {
> - Package () { "reg", 4 }, /* CSI-2 port number */
> - },
> - ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
> - Package () {
> - Package () { "endpoint@0", EP40 },
> - }
> - })
> -
> - Name (EP40, Package() {
> - ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> - Package () {
> - Package () { "reg", 0 },
> - Package () { "remote-endpoint", Package () { \_SB.PCI0.I2C2.CAM0, "port@0", "endpoint@0" } },
> - }
> - })
> - }
> + Device (ISP)
> + {
> + Name (_DSD, Package () {
> + ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
> + Package () {
> + Package () { "port@4", PRT4 },
> + }
> + })
> +
> + Name (PRT4, Package() {
> + ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> + Package () {
> + Package () { "reg", 4 }, /* CSI-2 port number */
> + },
> + ToUUID("dbb8e3e6-5886-4ba6-8795-1319f52a966b"),
> + Package () {
> + Package () { "endpoint@0", EP40 },
> + }
> + })
> +
> + Name (EP40, Package() {
> + ToUUID("daffd814-6eba-4d8c-8a91-bc9bbf4aa301"),
> + Package () {
> + Package () { "reg", 0 },
> + Package () { "remote-endpoint", Package () { \_SB.PCI0.I2C2.CAM0, "port@0", "endpoint@0" } },
> + }
> + })
> + }
> }
>
> Here, the port 0 of the "CAM0" device is connected to the port 4 of
> @@ -148,27 +151,27 @@ the "ISP" device and vice versa.
>
>
> References
> -----------
> +==========
>
> [1] _DSD (Device Specific Data) Implementation Guide.
> - <URL:http://www.uefi.org/sites/default/files/resources/_DSD-implementation-guide-toplevel-1_1.htm>,
> + http://www.uefi.org/sites/default/files/resources/_DSD-implementation-guide-toplevel-1_1.htm,
> referenced 2016-10-03.
>
> -[2] Devicetree. <URL:http://www.devicetree.org>, referenced 2016-10-03.
> +[2] Devicetree. http://www.devicetree.org, referenced 2016-10-03.
>
> [3] Documentation/devicetree/bindings/graph.txt
>
> [4] Device Properties UUID For _DSD.
> - <URL:http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf>,
> + http://www.uefi.org/sites/default/files/resources/_DSD-device-properties-UUID.pdf,
> referenced 2016-10-04.
>
> [5] Hierarchical Data Extension UUID For _DSD.
> - <URL:http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf>,
> + http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf,
> referenced 2016-10-04.
>
> [6] Advanced Configuration and Power Interface Specification.
> - <URL:http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf>,
> + http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf,
> referenced 2016-10-04.
>
> [7] _DSD Device Properties Usage Rules.
> - Documentation/acpi/DSD-properties-rules.txt
> + :doc:`../DSD-properties-rules`
> diff --git a/Documentation/firmware-guide/acpi/index.rst b/Documentation/firmware-guide/acpi/index.rst
> index bedcb0b242a2..f81cfbcb6878 100644
> --- a/Documentation/firmware-guide/acpi/index.rst
> +++ b/Documentation/firmware-guide/acpi/index.rst
> @@ -8,6 +8,7 @@ ACPI Support
> :maxdepth: 1
>
> namespace
> + dsd/graph
> enumeration
> osi
> method-customizing
Thanks,
Mauro
^ permalink raw reply
* Re: [PATCH v4 15/63] Documentation: ACPI: move dsd/data-node-references.txt to firmware-guide/acpi and convert to reST
From: Mauro Carvalho Chehab @ 2019-04-23 21:17 UTC (permalink / raw)
To: Changbin Du
Cc: fenghua.yu, x86, linux-doc, linux-pci, linux-gpio,
Jonathan Corbet, rjw, linux-kernel, linux-acpi, mingo,
Bjorn Helgaas, tglx, linuxppc-dev
In-Reply-To: <20190423162932.21428-16-changbin.du@gmail.com>
Em Wed, 24 Apr 2019 00:28:44 +0800
Changbin Du <changbin.du@gmail.com> escreveu:
> This converts the plain text documentation to reStructuredText format and
> add it to Sphinx TOC tree. No essential content change.
>
> Signed-off-by: Changbin Du <changbin.du@gmail.com>
> ---
> .../acpi/dsd/data-node-references.rst} | 28 +++++++++++--------
> Documentation/firmware-guide/acpi/index.rst | 1 +
> 2 files changed, 17 insertions(+), 12 deletions(-)
> rename Documentation/{acpi/dsd/data-node-references.txt => firmware-guide/acpi/dsd/data-node-references.rst} (79%)
>
> diff --git a/Documentation/acpi/dsd/data-node-references.txt b/Documentation/firmware-guide/acpi/dsd/data-node-references.rst
> similarity index 79%
> rename from Documentation/acpi/dsd/data-node-references.txt
> rename to Documentation/firmware-guide/acpi/dsd/data-node-references.rst
> index c3871565c8cf..79c5368eaecf 100644
> --- a/Documentation/acpi/dsd/data-node-references.txt
> +++ b/Documentation/firmware-guide/acpi/dsd/data-node-references.rst
> @@ -1,9 +1,12 @@
> -Copyright (C) 2018 Intel Corporation
> -Author: Sakari Ailus <sakari.ailus@linux.intel.com>
> -
> +.. SPDX-License-Identifier: GPL-2.0
> +.. include:: <isonum.txt>
>
> +===================================
> Referencing hierarchical data nodes
> ------------------------------------
> +===================================
> +
> +:Copyright: |copy| 2018 Intel Corporation
> +:Author: Sakari Ailus <sakari.ailus@linux.intel.com>
>
> ACPI in general allows referring to device objects in the tree only.
> Hierarchical data extension nodes may not be referred to directly, hence this
> @@ -28,13 +31,14 @@ extension key.
>
>
> Example
> --------
> +=======
>
> - In the ASL snippet below, the "reference" _DSD property [2] contains a
> - device object reference to DEV0 and under that device object, a
> - hierarchical data extension key "node@1" referring to the NOD1 object
> - and lastly, a hierarchical data extension key "anothernode" referring to
> - the ANOD object which is also the final target node of the reference.
> +In the ASL snippet below, the "reference" _DSD property [2] contains a
> +device object reference to DEV0 and under that device object, a
> +hierarchical data extension key "node@1" referring to the NOD1 object
> +and lastly, a hierarchical data extension key "anothernode" referring to
> +the ANOD object which is also the final target node of the reference.
> +::
>
> Device (DEV0)
> {
> @@ -75,10 +79,10 @@ Example
> })
> }
>
> -Please also see a graph example in graph.txt .
> +Please also see a graph example in :doc:`graph`.
>
> References
> -----------
> +==========
>
> [1] Hierarchical Data Extension UUID For _DSD.
> <URL:http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-data-extension-UUID-v1.1.pdf>,
Hmm... on the previous patch, you replaced <URL:some_url> by some_url,
with makes sense. Please do the same here and on other patches on
this series with a similar way to describe URLs.
> diff --git a/Documentation/firmware-guide/acpi/index.rst b/Documentation/firmware-guide/acpi/index.rst
> index f81cfbcb6878..6d4e0df4f063 100644
> --- a/Documentation/firmware-guide/acpi/index.rst
> +++ b/Documentation/firmware-guide/acpi/index.rst
> @@ -9,6 +9,7 @@ ACPI Support
>
> namespace
> dsd/graph
> + dsd/data-node-references
> enumeration
> osi
> method-customizing
Thanks,
Mauro
^ permalink raw reply
* Re: [PATCH v4 16/63] Documentation: ACPI: move debug.txt to firmware-guide/acpi and convert to reST
From: Mauro Carvalho Chehab @ 2019-04-23 21:21 UTC (permalink / raw)
To: Changbin Du
Cc: fenghua.yu, x86, linux-doc, linux-pci, linux-gpio,
Jonathan Corbet, rjw, linux-kernel, linux-acpi, mingo,
Bjorn Helgaas, tglx, linuxppc-dev
In-Reply-To: <20190423162932.21428-17-changbin.du@gmail.com>
Em Wed, 24 Apr 2019 00:28:45 +0800
Changbin Du <changbin.du@gmail.com> escreveu:
> This converts the plain text documentation to reStructuredText format and
> add it to Sphinx TOC tree. No essential content change.
>
> Signed-off-by: Changbin Du <changbin.du@gmail.com>
> ---
> .../acpi/debug.rst} | 31 ++++++++++---------
> Documentation/firmware-guide/acpi/index.rst | 3 +-
> 2 files changed, 19 insertions(+), 15 deletions(-)
> rename Documentation/{acpi/debug.txt => firmware-guide/acpi/debug.rst} (91%)
>
> diff --git a/Documentation/acpi/debug.txt b/Documentation/firmware-guide/acpi/debug.rst
> similarity index 91%
> rename from Documentation/acpi/debug.txt
> rename to Documentation/firmware-guide/acpi/debug.rst
> index 65bf47c46b6d..1a152dd1d765 100644
> --- a/Documentation/acpi/debug.txt
> +++ b/Documentation/firmware-guide/acpi/debug.rst
> @@ -1,18 +1,21 @@
> - ACPI Debug Output
> +.. SPDX-License-Identifier: GPL-2.0
>
> +=================
> +ACPI Debug Output
> +=================
>
> The ACPI CA, the Linux ACPI core, and some ACPI drivers can generate debug
> output. This document describes how to use this facility.
>
> Compile-time configuration
> ---------------------------
> +==========================
>
> ACPI debug output is globally enabled by CONFIG_ACPI_DEBUG. If this config
> option is turned off, the debug messages are not even built into the
> kernel.
>
> Boot- and run-time configuration
> ---------------------------------
> +================================
>
> When CONFIG_ACPI_DEBUG=y, you can select the component and level of messages
> you're interested in. At boot-time, use the acpi.debug_layer and
> @@ -21,7 +24,7 @@ debug_layer and debug_level files in /sys/module/acpi/parameters/ to control
> the debug messages.
>
> debug_layer (component)
> ------------------------
> +=======================
>
> The "debug_layer" is a mask that selects components of interest, e.g., a
> specific driver or part of the ACPI interpreter. To build the debug_layer
> @@ -33,7 +36,7 @@ to /sys/module/acpi/parameters/debug_layer.
>
> The possible components are defined in include/acpi/acoutput.h and
> include/acpi/acpi_drivers.h. Reading /sys/module/acpi/parameters/debug_layer
> -shows the supported mask values, currently these:
> +shows the supported mask values, currently these::
>
> ACPI_UTILITIES 0x00000001
> ACPI_HARDWARE 0x00000002
> @@ -65,7 +68,7 @@ shows the supported mask values, currently these:
> ACPI_PROCESSOR_COMPONENT 0x20000000
This is one way of doing. The other one, with would likely produce a
better visual, would be to use tables, e. g.:
============== ==========
ACPI_UTILITIES 0x00000001
ACPI_HARDWARE 0x00000002
============== ==========
Of course, if you use tables here, you need to be consistent along
similar cases inside the document.
while both works, I prefer using tables on such cases.
Either way:
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
>
> debug_level
> ------------
> +===========
>
> The "debug_level" is a mask that selects different types of messages, e.g.,
> those related to initialization, method execution, informational messages, etc.
> @@ -81,7 +84,7 @@ to /sys/module/acpi/parameters/debug_level.
>
> The possible levels are defined in include/acpi/acoutput.h. Reading
> /sys/module/acpi/parameters/debug_level shows the supported mask values,
> -currently these:
> +currently these::
>
> ACPI_LV_INIT 0x00000001
> ACPI_LV_DEBUG_OBJECT 0x00000002
> @@ -113,9 +116,9 @@ currently these:
> ACPI_LV_EVENTS 0x80000000
>
> Examples
> ---------
> +========
>
> -For example, drivers/acpi/bus.c contains this:
> +For example, drivers/acpi/bus.c contains this::
>
> #define _COMPONENT ACPI_BUS_COMPONENT
> ...
> @@ -127,22 +130,22 @@ statement uses ACPI_DB_INFO, which is macro based on the ACPI_LV_INFO
> definition.)
>
> Enable all AML "Debug" output (stores to the Debug object while interpreting
> -AML) during boot:
> +AML) during boot::
>
> acpi.debug_layer=0xffffffff acpi.debug_level=0x2
>
> -Enable PCI and PCI interrupt routing debug messages:
> +Enable PCI and PCI interrupt routing debug messages::
>
> acpi.debug_layer=0x400000 acpi.debug_level=0x4
>
> -Enable all ACPI hardware-related messages:
> +Enable all ACPI hardware-related messages::
>
> acpi.debug_layer=0x2 acpi.debug_level=0xffffffff
>
> -Enable all ACPI_DB_INFO messages after boot:
> +Enable all ACPI_DB_INFO messages after boot::
>
> # echo 0x4 > /sys/module/acpi/parameters/debug_level
>
> -Show all valid component values:
> +Show all valid component values::
>
> # cat /sys/module/acpi/parameters/debug_layer
> diff --git a/Documentation/firmware-guide/acpi/index.rst b/Documentation/firmware-guide/acpi/index.rst
> index 6d4e0df4f063..a45fea11f998 100644
> --- a/Documentation/firmware-guide/acpi/index.rst
> +++ b/Documentation/firmware-guide/acpi/index.rst
> @@ -14,6 +14,7 @@ ACPI Support
> osi
> method-customizing
> DSD-properties-rules
> + debug
> gpio-properties
> i2c-muxes
> - acpi-lid
> \ No newline at end of file
> + acpi-lid
Thanks,
Mauro
^ permalink raw reply
* [PATCH v4] powerpc/pseries: Remove limit in wait for dying CPU
From: Thiago Jung Bauermann @ 2019-04-23 22:39 UTC (permalink / raw)
To: linuxppc-dev
Cc: Gautham R Shenoy, linux-kernel, Nicholas Piggin,
Michael Bringmann, Tyrel Datwyler, Vaidyanathan Srinivasan,
Thiago Jung Bauermann
When testing DLPAR CPU add/remove on a system under stress,
pseries_cpu_die() doesn't wait long enough for a CPU to die:
[ 446.983944] cpu 148 (hwid 148) Ready to die...
[ 446.984062] cpu 149 (hwid 149) Ready to die...
[ 446.993518] cpu 150 (hwid 150) Ready to die...
[ 446.993543] Querying DEAD? cpu 150 (150) shows 2
[ 446.994098] cpu 151 (hwid 151) Ready to die...
[ 447.133726] cpu 136 (hwid 136) Ready to die...
[ 447.403532] cpu 137 (hwid 137) Ready to die...
[ 447.403772] cpu 138 (hwid 138) Ready to die...
[ 447.403839] cpu 139 (hwid 139) Ready to die...
[ 447.403887] cpu 140 (hwid 140) Ready to die...
[ 447.403937] cpu 141 (hwid 141) Ready to die...
[ 447.403979] cpu 142 (hwid 142) Ready to die...
[ 447.404038] cpu 143 (hwid 143) Ready to die...
[ 447.513546] cpu 128 (hwid 128) Ready to die...
[ 447.693533] cpu 129 (hwid 129) Ready to die...
[ 447.693999] cpu 130 (hwid 130) Ready to die...
[ 447.703530] cpu 131 (hwid 131) Ready to die...
[ 447.704087] Querying DEAD? cpu 132 (132) shows 2
[ 447.704102] cpu 132 (hwid 132) Ready to die...
[ 447.713534] cpu 133 (hwid 133) Ready to die...
[ 447.714064] Querying DEAD? cpu 134 (134) shows 2
This is a race between one CPU stopping and another one calling
pseries_cpu_die() to wait for it to stop. That function does a short busy
loop calling RTAS query-cpu-stopped-state on the stopping CPU to verify
that it is stopped, but I think there's a lot for the stopping CPU to do
which may take longer than this loop allows.
As can be seen in the dmesg right before or after the "Querying DEAD?"
messages, if pseries_cpu_die() waited a little longer it would have seen
the CPU in the stopped state.
What I think is going on is that CPU 134 was inactive at the time it was
unplugged. In that case, dlpar_offline_cpu() calls H_PROD on that CPU and
immediately calls pseries_cpu_die(). Meanwhile, the prodded CPU activates
and start the process of stopping itself. The busy loop is not long enough
to allow for the CPU to wake up and complete the stopping process.
This can be a problem because if the busy loop finishes too early, then the
kernel may offline another CPU before the previous one finished dying,
which would lead to two concurrent calls to rtas-stop-self, which is
prohibited by the PAPR.
Since the hotplug machinery already assumes that cpu_die() is going to
work, we can simply loop until the CPU stops.
Also change the loop to wait 100 µs between each call to
smp_query_cpu_stopped() to avoid querying RTAS too often.
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Analyzed-by: Gautham R Shenoy <ego@linux.vnet.ibm.com>
---
arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
I have seen this problem since v4.8. Should this patch go to stable as
well?
Changes since v3:
- Changed to loop until the CPU stops rather than for a fixed amount
of time.
Changes since v2:
- Increased busy loop to 200 iterations so that it can last up to 20 ms
(suggested by Gautham).
- Changed commit message to include Gautham's remarks.
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 97feb6e79f1a..d75cee60644c 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -214,13 +214,17 @@ static void pseries_cpu_die(unsigned int cpu)
msleep(1);
}
} else if (get_preferred_offline_state(cpu) == CPU_STATE_OFFLINE) {
-
- for (tries = 0; tries < 25; tries++) {
+ /*
+ * rtas_stop_self() panics if the CPU fails to stop and our
+ * callers already assume that we are going to succeed, so we
+ * can just loop until the CPU stops.
+ */
+ while (true) {
cpu_status = smp_query_cpu_stopped(pcpu);
if (cpu_status == QCSS_STOPPED ||
cpu_status == QCSS_HARDWARE_ERROR)
break;
- cpu_relax();
+ udelay(100);
}
}
^ permalink raw reply related
* Re: [PATCH v2 26/79] docs: powerpc: convert docs to ReST and rename to *.rst
From: Andrew Donnellan @ 2019-04-24 1:15 UTC (permalink / raw)
To: Mauro Carvalho Chehab, Linux Doc Mailing List
Cc: Paul Mackerras, Qiang Zhao, linux-scsi, Jonathan Corbet,
linux-pci, Jiri Slaby, Mauro Carvalho Chehab, Manoj N. Kumar,
Bjorn Helgaas, linux-arm-kernel, Matthew R. Ochs, Uma Krishnan,
Sam Bobroff, Greg Kroah-Hartman, linux-kernel, Li Yang,
Frederic Barrat, Oliver O'Halloran, linuxppc-dev
In-Reply-To: <dbbfc37a221aa1a1320bdfbb34f593d8a76f643d.1555938376.git.mchehab+samsung@kernel.org>
On 22/4/19 11:27 pm, Mauro Carvalho Chehab wrote:
> Convert docs to ReST and add them to the arch-specific
> book.
>
> The conversion here was trivial, as almost every file there
> was already using an elegant format close to ReST standard.
>
> The changes were mostly to mark literal blocks and add a few
> missing section title identifiers.
>
> One note with regards to "--": on Sphinx, this can't be used
> to identify a list, as it will format it badly. This can be
> used, however, to identify a long hyphen - and "---" is an
> even longer one.
>
> At its new index.rst, let's add a :orphan: while this is not linked to
> the main index.rst file, in order to avoid build warnings.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> # cxl
Minor possible improvement below, otherwise compiled output looks very nice.
> diff --git a/Documentation/powerpc/cxl.txt b/Documentation/powerpc/cxl.rst
> similarity index 95%
> rename from Documentation/powerpc/cxl.txt
> rename to Documentation/powerpc/cxl.rst
> index c5e8d5098ed3..99e704afb09d 100644
> --- a/Documentation/powerpc/cxl.txt
> +++ b/Documentation/powerpc/cxl.rst
> @@ -1,3 +1,4 @@
> +====================================
> Coherent Accelerator Interface (CXL)
> ====================================
>
> @@ -21,6 +22,8 @@ Introduction
> Hardware overview
> =================
>
> + ::
> +
> POWER8/9 FPGA
> +----------+ +---------+
> | | | |
> @@ -59,14 +62,16 @@ Hardware overview
> the fault. The context to which this fault is serviced is based on
> who owns that acceleration function.
>
> - POWER8 <-----> PSL Version 8 is compliant to the CAIA Version 1.0.
> - POWER9 <-----> PSL Version 9 is compliant to the CAIA Version 2.0.
> + - POWER8 <------> PSL Version 8 is compliant to the CAIA Version 1.0.
> + - POWER9 <------> PSL Version 9 is compliant to the CAIA Version 2.0.
> +
This could probably be changed to "POWER8 and PSL Version 8 are
compliant" or something like that to avoid the ASCII art arrows turning
into dashes. Happy to pick this up myself if I ever get around to
revising the rest of the documentation.
--
Andrew Donnellan OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com IBM Australia Limited
^ permalink raw reply
* Re: [PATCH] powerpc/pseries: Use correct event modifier in rtas_parse_epow_errlog()
From: Russell Currey @ 2019-04-24 1:29 UTC (permalink / raw)
To: Yue Haibing, paulus, mpe, mahesh, npiggin, ganeshgr, anton
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20190423143533.26952-1-yuehaibing@huawei.com>
On Tue, 2019-04-23 at 22:35 +0800, Yue Haibing wrote:
> From: YueHaibing <yuehaibing@huawei.com>
>
> rtas_parse_epow_errlog() should pass 'modifier' to
> handle_system_shutdown, because event modifier only use
> bottom 4 bits.
>
> Fixes: 55fc0c561742 ("powerpc/pseries: Parse and handle EPOW
> interrupts")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Hello,
This fails to compile.
arch/powerpc/platforms/pseries/ras.c: In function
'rtas_parse_epow_errlog':
arch/powerpc/platforms/pseries/ras.c:288:26: error: 'event_modifier'
undeclared (first use in this function); did you mean 'modifier'?
handle_system_shutdown(event_modifier);
^~~~~~~~~~~~~~
modifier
So yes, I assume you meant "modifier" instead of "event_modifier" as
mentioned in your commit message. Did you compile this before sending?
(found by snowpatch)
- Russell
^ permalink raw reply
* Re: [PATCH] powerpc/pseries: Use correct event modifier in rtas_parse_epow_errlog()
From: YueHaibing @ 2019-04-24 1:48 UTC (permalink / raw)
To: Russell Currey, paulus, mpe, mahesh, npiggin, ganeshgr, anton
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1b208e33c4f51630e2a336e95f25509fe7286e34.camel@russell.cc>
On 2019/4/24 9:29, Russell Currey wrote:
> On Tue, 2019-04-23 at 22:35 +0800, Yue Haibing wrote:
>> From: YueHaibing <yuehaibing@huawei.com>
>>
>> rtas_parse_epow_errlog() should pass 'modifier' to
>> handle_system_shutdown, because event modifier only use
>> bottom 4 bits.
>>
>> Fixes: 55fc0c561742 ("powerpc/pseries: Parse and handle EPOW
>> interrupts")
>> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
>
> Hello,
>
> This fails to compile.
>
> arch/powerpc/platforms/pseries/ras.c: In function
> 'rtas_parse_epow_errlog':
> arch/powerpc/platforms/pseries/ras.c:288:26: error: 'event_modifier'
> undeclared (first use in this function); did you mean 'modifier'?
> handle_system_shutdown(event_modifier);
> ^~~~~~~~~~~~~~
> modifier
Yes, this should be 'modifier'
>
> So yes, I assume you meant "modifier" instead of "event_modifier" as
> mentioned in your commit message. Did you compile this before sending?
I am very sorry for this, this is not my compiled version,
I forgot to check again before sending.
>
> (found by snowpatch)
>
> - Russell
>
>
> .
>
^ permalink raw reply
* Re: [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic
From: Davidlohr Bueso @ 2019-04-24 2:15 UTC (permalink / raw)
To: Daniel Jordan
Cc: Alexey Kardashevskiy, linux-kernel, linux-mm, jgg, Paul Mackerras,
akpm, linuxppc-dev, Christoph Lameter
In-Reply-To: <20190403164002.hued52o4mga4yprw@ca-dmjordan1.us.oracle.com>
On Wed, 03 Apr 2019, Daniel Jordan wrote:
>On Wed, Apr 03, 2019 at 06:58:45AM +0200, Christophe Leroy wrote:
>> Le 02/04/2019 à 22:41, Daniel Jordan a écrit :
>> > With locked_vm now an atomic, there is no need to take mmap_sem as
>> > writer. Delete and refactor accordingly.
>>
>> Could you please detail the change ?
>
>Ok, I'll be more specific in the next version, using some of your language in
>fact. :)
>
>> It looks like this is not the only
>> change. I'm wondering what the consequences are.
>>
>> Before we did:
>> - lock
>> - calculate future value
>> - check the future value is acceptable
>> - update value if future value acceptable
>> - return error if future value non acceptable
>> - unlock
>>
>> Now we do:
>> - atomic update with future (possibly too high) value
>> - check the new value is acceptable
>> - atomic update back with older value if new value not acceptable and return
>> error
>>
>> So if a concurrent action wants to increase locked_vm with an acceptable
>> step while another one has temporarily set it too high, it will now fail.
>>
>> I think we should keep the previous approach and do a cmpxchg after
>> validating the new value.
Wouldn't the cmpxchg alternative also be exposed the locked_vm changing between
validating the new value and the cmpxchg() and we'd bogusly fail even when there
is still just because the value changed (I'm assuming we don't hold any locks,
otherwise all this is pointless).
current_locked = atomic_read(&mm->locked_vm);
new_locked = current_locked + npages;
if (new_locked < lock_limit)
if (cmpxchg(&mm->locked_vm, current_locked, new_locked) == current_locked)
/* ENOMEM */
>
>That's a good idea, and especially worth doing considering that an arbitrary
>number of threads that charge a low amount of locked_vm can fail just because
>one thread charges lots of it.
Yeah but the window for this is quite small, I doubt it would be a real issue.
What if before doing the atomic_add_return(), we first did the racy new_locked
check for ENOMEM, then do the speculative add and cleanup, if necessary. This
would further reduce the scope of the window where false ENOMEM can occur.
>pinned_vm appears to be broken the same way, so I can fix it too unless someone
>beats me to it.
This should not be a surprise for the rdma folks. Cc'ing Jason nonetheless.
Thanks,
Davidlohr
^ permalink raw reply
* [PATCH v2] powerpc/pseries: Use correct event modifier in rtas_parse_epow_errlog()
From: Yue Haibing @ 2019-04-24 2:17 UTC (permalink / raw)
To: benh, paulus, mpe, mahesh, npiggin, ganeshgr, anton, ruscur
Cc: YueHaibing, linuxppc-dev, linux-kernel
In-Reply-To: <20190423143533.26952-1-yuehaibing@huawei.com>
From: YueHaibing <yuehaibing@huawei.com>
rtas_parse_epow_errlog() should pass 'modifier' to
handle_system_shutdown, because event modifier only use
bottom 4 bits.
Fixes: 55fc0c561742 ("powerpc/pseries: Parse and handle EPOW interrupts")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
---
v2: fix compile issue by 'event_modifier'-->'modifier'
---
arch/powerpc/platforms/pseries/ras.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index c97d153..744604d 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -285,7 +285,7 @@ static void rtas_parse_epow_errlog(struct rtas_error_log *log)
break;
case EPOW_SYSTEM_SHUTDOWN:
- handle_system_shutdown(epow_log->event_modifier);
+ handle_system_shutdown(modifier);
break;
case EPOW_SYSTEM_HALT:
--
2.7.0
^ permalink raw reply related
* Re: [PATCH 5/6] powerpc/mmu: drop mmap_sem now that locked_vm is atomic
From: Davidlohr Bueso @ 2019-04-24 2:31 UTC (permalink / raw)
To: Daniel Jordan, Christophe Leroy, akpm, Alexey Kardashevskiy,
linux-kernel, linux-mm, Paul Mackerras, Christoph Lameter,
linuxppc-dev, jgg
In-Reply-To: <20190424021544.ygqa4hvwbyb6nuxp@linux-r8p5>
On Tue, 23 Apr 2019, Bueso wrote:
>On Wed, 03 Apr 2019, Daniel Jordan wrote:
>
>>On Wed, Apr 03, 2019 at 06:58:45AM +0200, Christophe Leroy wrote:
>>>Le 02/04/2019 à 22:41, Daniel Jordan a écrit :
>>>> With locked_vm now an atomic, there is no need to take mmap_sem as
>>>> writer. Delete and refactor accordingly.
>>>
>>>Could you please detail the change ?
>>
>>Ok, I'll be more specific in the next version, using some of your language in
>>fact. :)
>>
>>>It looks like this is not the only
>>>change. I'm wondering what the consequences are.
>>>
>>>Before we did:
>>>- lock
>>>- calculate future value
>>>- check the future value is acceptable
>>>- update value if future value acceptable
>>>- return error if future value non acceptable
>>>- unlock
>>>
>>>Now we do:
>>>- atomic update with future (possibly too high) value
>>>- check the new value is acceptable
>>>- atomic update back with older value if new value not acceptable and return
>>>error
>>>
>>>So if a concurrent action wants to increase locked_vm with an acceptable
>>>step while another one has temporarily set it too high, it will now fail.
>>>
>>>I think we should keep the previous approach and do a cmpxchg after
>>>validating the new value.
>
>Wouldn't the cmpxchg alternative also be exposed the locked_vm changing between
>validating the new value and the cmpxchg() and we'd bogusly fail even when there
>is still just because the value changed (I'm assuming we don't hold any locks,
>otherwise all this is pointless).
>
> current_locked = atomic_read(&mm->locked_vm);
> new_locked = current_locked + npages;
> if (new_locked < lock_limit)
> if (cmpxchg(&mm->locked_vm, current_locked, new_locked) == current_locked)
Err, this being != of course.
^ permalink raw reply
* Re: [RFC PATCH 1/1] KVM: PPC: Report single stepping capability
From: Paul Mackerras @ 2019-04-24 4:14 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: kvm, rkrcmar, aik, kvm-ppc, pbonzini, linuxppc-dev, david
In-Reply-To: <20190320183951.29537-2-farosas@linux.ibm.com>
On Wed, Mar 20, 2019 at 03:39:50PM -0300, Fabiano Rosas wrote:
> When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request
> the next instruction to be single stepped via the
> KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure.
>
> We currently don't have support for guest single stepping implemented
> in Book3S HV.
>
> This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order
> to inform userspace about the state of single stepping support.
>
> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
> ---
> arch/powerpc/kvm/powerpc.c | 5 +++++
> include/uapi/linux/kvm.h | 1 +
> 2 files changed, 6 insertions(+)
I assume since this is [RFC] that you're not expecting it to go
upstream as-is. If it were to go upstream it would need to include an
update to Documentation/virtual/kvm/api.txt.
Paul.
^ permalink raw reply
* Re: [PATCH v6 06/10] powerpc/64: Setup KUP on secondary CPUs
From: Christophe Leroy @ 2019-04-24 6:03 UTC (permalink / raw)
To: Michael Ellerman, linuxppc-dev
In-Reply-To: <20190418065125.2687-6-mpe@ellerman.id.au>
Le 18/04/2019 à 08:51, Michael Ellerman a écrit :
> From: Russell Currey <ruscur@russell.cc>
>
> Some platforms (i.e. Radix MMU) need per-CPU initialisation for KUP.
>
> Any platforms that only want to do KUP initialisation once
> globally can just check to see if they're running on the boot CPU, or
> check if whatever setup they need has already been performed.
>
> Note that this is only for 64-bit.
>
> Signed-off-by: Russell Currey <ruscur@russell.cc>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>
> v6: setup_kup() can't be __init anymore.
As reported by the kbuild robot, then setup_kuap() and setup_kuep()
can't be __init anymore in that case on the 8xx allthough the 8xx is not
SMP. That is rather pitty since they are only used at init.
Why can't we keep setup_kup() __init ? Are secondary CPUs started after
init section has been freed ?
Christophe
> ---
> arch/powerpc/kernel/setup_64.c | 3 +++
> arch/powerpc/mm/init-common.c | 2 +-
> 2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 6179c4200339..684e34493bf5 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -390,6 +390,9 @@ void early_setup_secondary(void)
> /* Initialize the hash table or TLB handling */
> early_init_mmu_secondary();
>
> + /* Perform any KUP setup that is per-cpu */
> + setup_kup();
> +
> /*
> * At this point, we can let interrupts switch to virtual mode
> * (the MMU has been setup), so adjust the MSR in the PACA to
> diff --git a/arch/powerpc/mm/init-common.c b/arch/powerpc/mm/init-common.c
> index ecaedfff9992..6ea5607fc564 100644
> --- a/arch/powerpc/mm/init-common.c
> +++ b/arch/powerpc/mm/init-common.c
> @@ -45,7 +45,7 @@ static int __init parse_nosmap(char *p)
> }
> early_param("nosmap", parse_nosmap);
>
> -void __init setup_kup(void)
> +void setup_kup(void)
> {
> setup_kuep(disable_kuep);
> setup_kuap(disable_kuap);
>
^ permalink raw reply
* Re: [powerpc:next-test 40/58] WARNING: vmlinux.o(.text+0xb038): Section mismatch in reference from the function setup_kup() to the function .init.text:setup_kuep()
From: Russell Currey @ 2019-04-24 6:14 UTC (permalink / raw)
To: Christophe Leroy, Michael Ellerman
Cc: linuxppc-dev, kbuild test robot, kbuild-all
In-Reply-To: <1d55b63d-3f7d-b8f5-03c8-ac59a3a98acf@c-s.fr>
On Tue, 2019-04-23 at 14:13 +0200, Christophe Leroy wrote:
> Russel, Michael,
>
> Looks like the reported problem comes from b28c97505eb1
> ("powerpc/64:
> Setup KUP on secondary CPUs")
> (
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next&id=b28c97505eb1a5265e367c398c3406be6ce5e313
> )
>
> Is it really necessary to remove the __init tag on setup_kup() ?
I don't think so, this was probably a mistake on my end, or something I
was testing and left in. I'll put it back in and test to make sure it
doesn't break anything.
- Russell
>
> Christophe
>
> Le 21/04/2019 à 18:23, kbuild test robot a écrit :
> > tree:
> > https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
> > next-test
> > head: 26ad26718dfaa7cf49d106d212ebf2370076c253
> > commit: 06fbe81b5909847aa13f9c86c2b6f9bbc5c2795b [40/58]
> > powerpc/8xx: Add Kernel Userspace Execution Prevention
> > config: powerpc-mpc885_ads_defconfig (attached as .config)
> > compiler: powerpc-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
> > reproduce:
> > wget
> > https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross
> > -O ~/bin/make.cross
> > chmod +x ~/bin/make.cross
> > git checkout 06fbe81b5909847aa13f9c86c2b6f9bbc5c2795b
> > # save the attached .config to linux build tree
> > GCC_VERSION=7.2.0 make.cross ARCH=powerpc
> >
> > If you fix the issue, kindly add following tag
> > Reported-by: kbuild test robot <lkp@intel.com>
> >
> >
> > All warnings (new ones prefixed by >>):
> >
> > > > WARNING: vmlinux.o(.text+0xb038): Section mismatch in reference
> > > > from the function setup_kup() to the function
> > > > .init.text:setup_kuep()
> > The function setup_kup() references
> > the function __init setup_kuep().
> > This is often because setup_kup lacks a __init
> > annotation or the annotation of setup_kuep is wrong.
> >
> > ---
> > 0-DAY kernel test infrastructure Open Source
> > Technology Center
> > https://lists.01.org/pipermail/kbuild-all Intel
> > Corporation
> >
^ permalink raw reply
* [PATCHv2] kernel/crash: make parse_crashkernel()'s return value more indicant
From: Pingfan Liu @ 2019-04-24 6:33 UTC (permalink / raw)
To: linux-kernel
Cc: Rich Felker, linux-ia64, Julien Thierry, Yangtao Li,
Palmer Dabbelt, Heiko Carstens, x86, Pingfan Liu, linux-mips,
Paul Mackerras, H. Peter Anvin, linux-s390, Florian Fainelli,
Yoshinori Sato, linux-sh, David Hildenbrand, Russell King,
Ingo Molnar, linux-arm-kernel, Catalin Marinas, James Hogan,
Dave Young, Fenghua Yu, Will Deacon, linuxppc-dev,
Borislav Petkov, Stefan Agner, Thomas Gleixner, Hari Bathini,
Jens Axboe, Tony Luck, Baoquan He, Ard Biesheuvel, Robin Murphy,
Greg Kroah-Hartman, Ralf Baechle, Thomas Bogendoerfer,
Paul Burton, Johannes Weiner, Martin Schwidefsky, Andrew Morton,
Logan Gunthorpe, Greg Hackmann
At present, both return and crash_size should be checked to guarantee the
success of parse_crashkernel().
Take a close look at the cases, which causes crash_size=0. Beside syntax
error, three cases cause parsing to get crash_size=0.
-1st. in parse_crashkernel_mem(), the demanded crash size is bigger than
system ram.
-2nd. in parse_crashkernel_mem(), the system ram size does not match any
item in the range list.
-3rd. "crashkernel=0MB", which is impractical.
All these cases can be treated as invalid argument.
By this way, only need a simple check on return value of
parse_crashkernel().
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Julien Thierry <julien.thierry@arm.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Greg Hackmann <ghackmann@android.com>
Cc: Stefan Agner <stefan@agner.ch>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Yangtao Li <tiny.windzz@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: x86@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-mips@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
---
v1 -> v2: On error, return -EINVAL for all failure cases
arch/arm/kernel/setup.c | 2 +-
arch/arm64/mm/init.c | 2 +-
arch/ia64/kernel/setup.c | 2 +-
arch/mips/kernel/setup.c | 2 +-
arch/powerpc/kernel/fadump.c | 2 +-
arch/powerpc/kernel/machine_kexec.c | 2 +-
arch/s390/kernel/setup.c | 2 +-
arch/sh/kernel/machine_kexec.c | 2 +-
arch/x86/kernel/setup.c | 4 ++--
kernel/crash_core.c | 10 +++++++++-
10 files changed, 19 insertions(+), 11 deletions(-)
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 5d78b6a..2feab13 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -997,7 +997,7 @@ static void __init reserve_crashkernel(void)
total_mem = get_total_mem();
ret = parse_crashkernel(boot_command_line, total_mem,
&crash_size, &crash_base);
- if (ret)
+ if (ret < 0)
return;
if (crash_base <= 0) {
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 6bc1350..240918c 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -79,7 +79,7 @@ static void __init reserve_crashkernel(void)
ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
&crash_size, &crash_base);
/* no crashkernel= or invalid value specified */
- if (ret || !crash_size)
+ if (ret < 0)
return;
crash_size = PAGE_ALIGN(crash_size);
diff --git a/arch/ia64/kernel/setup.c b/arch/ia64/kernel/setup.c
index 583a374..3bbb58b 100644
--- a/arch/ia64/kernel/setup.c
+++ b/arch/ia64/kernel/setup.c
@@ -277,7 +277,7 @@ static void __init setup_crashkernel(unsigned long total, int *n)
ret = parse_crashkernel(boot_command_line, total,
&size, &base);
- if (ret == 0 && size > 0) {
+ if (!ret) {
if (!base) {
sort_regions(rsvd_region, *n);
*n = merge_regions(rsvd_region, *n);
diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 8d1dc6c..168571b 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -715,7 +715,7 @@ static void __init mips_parse_crashkernel(void)
total_mem = get_total_mem();
ret = parse_crashkernel(boot_command_line, total_mem,
&crash_size, &crash_base);
- if (ret != 0 || crash_size <= 0)
+ if (ret < 0)
return;
if (!memory_region_available(crash_base, crash_size)) {
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 45a8d0b..3571504 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -376,7 +376,7 @@ static inline unsigned long fadump_calculate_reserve_size(void)
*/
ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
&size, &base);
- if (ret == 0 && size > 0) {
+ if (!ret) {
unsigned long max_size;
if (fw_dump.reserve_bootvar)
diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c
index 63f5a93..1697ad2 100644
--- a/arch/powerpc/kernel/machine_kexec.c
+++ b/arch/powerpc/kernel/machine_kexec.c
@@ -122,7 +122,7 @@ void __init reserve_crashkernel(void)
/* use common parsing */
ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
&crash_size, &crash_base);
- if (ret == 0 && crash_size > 0) {
+ if (!ret) {
crashk_res.start = crash_base;
crashk_res.end = crash_base + crash_size - 1;
}
diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 2c642af..d4bd61b 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -671,7 +671,7 @@ static void __init reserve_crashkernel(void)
crash_base = ALIGN(crash_base, KEXEC_CRASH_MEM_ALIGN);
crash_size = ALIGN(crash_size, KEXEC_CRASH_MEM_ALIGN);
- if (rc || crash_size == 0)
+ if (rc < 0)
return;
if (memblock.memory.regions[0].size < crash_size) {
diff --git a/arch/sh/kernel/machine_kexec.c b/arch/sh/kernel/machine_kexec.c
index 63d63a3..3c03240 100644
--- a/arch/sh/kernel/machine_kexec.c
+++ b/arch/sh/kernel/machine_kexec.c
@@ -157,7 +157,7 @@ void __init reserve_crashkernel(void)
ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
&crash_size, &crash_base);
- if (ret == 0 && crash_size > 0) {
+ if (!ret) {
crashk_res.start = crash_base;
crashk_res.end = crash_base + crash_size - 1;
}
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3d872a5..592d5ad 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -526,11 +526,11 @@ static void __init reserve_crashkernel(void)
/* crashkernel=XM */
ret = parse_crashkernel(boot_command_line, total_mem, &crash_size, &crash_base);
- if (ret != 0 || crash_size <= 0) {
+ if (ret < 0) {
/* crashkernel=X,high */
ret = parse_crashkernel_high(boot_command_line, total_mem,
&crash_size, &crash_base);
- if (ret != 0 || crash_size <= 0)
+ if (ret < 0)
return;
high = true;
}
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 093c9f9..83ee4a9 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -108,8 +108,10 @@ static int __init parse_crashkernel_mem(char *cmdline,
return -EINVAL;
}
}
- } else
+ } else {
pr_info("crashkernel size resulted in zero bytes\n");
+ return -EINVAL;
+ }
return 0;
}
@@ -139,6 +141,8 @@ static int __init parse_crashkernel_simple(char *cmdline,
pr_warn("crashkernel: unrecognized char: %c\n", *cur);
return -EINVAL;
}
+ if (*crash_size == 0)
+ return -EINVAL;
return 0;
}
@@ -181,6 +185,8 @@ static int __init parse_crashkernel_suffix(char *cmdline,
pr_warn("crashkernel: unrecognized char: %c\n", *cur);
return -EINVAL;
}
+ if (*crash_size == 0)
+ return -EINVAL;
return 0;
}
@@ -266,6 +272,8 @@ static int __init __parse_crashkernel(char *cmdline,
/*
* That function is the entry point for command line parsing and should be
* called from the arch-specific code.
+ * On success 0. On error for either syntax error or crash_size=0, -EINVAL is
+ * returned.
*/
int __init parse_crashkernel(char *cmdline,
unsigned long long system_ram,
--
2.7.4
^ permalink raw reply related
* [PATCH 1/2] powerpc/mm/ptdump: Wrap seq_printf() to handle NULL pointers
From: Russell Currey @ 2019-04-24 6:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Julia.Lawall, rashmica.g, Russell Currey
Lovingly borrowed from the arch/arm64 ptdump code.
This doesn't seem to be an issue in practice, but is necessary for my
upcoming commit.
Converts a putc() into a puts().
Signed-off-by: Russell Currey <ruscur@russell.cc>
---
arch/powerpc/mm/ptdump/ptdump.c | 32 ++++++++++++++++++++++----------
1 file changed, 22 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index 37138428ab55..c50cb7faa334 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -104,6 +104,18 @@ static struct addr_marker address_markers[] = {
{ -1, NULL },
};
+#define pt_dump_seq_printf(m, fmt, args...) \
+({ \
+ if (m) \
+ seq_printf(m, fmt, ##args); \
+})
+
+#define pt_dump_seq_puts(m, fmt) \
+({ \
+ if (m) \
+ seq_printf(m, fmt); \
+})
+
static void dump_flag_info(struct pg_state *st, const struct flag_info
*flag, u64 pte, int num)
{
@@ -121,19 +133,19 @@ static void dump_flag_info(struct pg_state *st, const struct flag_info
val = pte & flag->val;
if (flag->shift)
val = val >> flag->shift;
- seq_printf(st->seq, " %s:%llx", flag->set, val);
+ pt_dump_seq_printf(st->seq, " %s:%llx", flag->set, val);
} else {
if ((pte & flag->mask) == flag->val)
s = flag->set;
else
s = flag->clear;
if (s)
- seq_printf(st->seq, " %s", s);
+ pt_dump_seq_printf(st->seq, " %s", s);
}
st->current_flags &= ~flag->mask;
}
if (st->current_flags != 0)
- seq_printf(st->seq, " unknown flags:%llx", st->current_flags);
+ pt_dump_seq_printf(st->seq, " unknown flags:%llx", st->current_flags);
}
static void dump_addr(struct pg_state *st, unsigned long addr)
@@ -148,12 +160,12 @@ static void dump_addr(struct pg_state *st, unsigned long addr)
#define REG "0x%08lx"
#endif
- seq_printf(st->seq, REG "-" REG " ", st->start_address, addr - 1);
+ pt_dump_seq_printf(st->seq, REG "-" REG " ", st->start_address, addr - 1);
if (st->start_pa == st->last_pa && st->start_address + PAGE_SIZE != addr) {
- seq_printf(st->seq, "[" REG "]", st->start_pa);
+ pt_dump_seq_printf(st->seq, "[" REG "]", st->start_pa);
delta = PAGE_SIZE >> 10;
} else {
- seq_printf(st->seq, " " REG " ", st->start_pa);
+ pt_dump_seq_printf(st->seq, " " REG " ", st->start_pa);
delta = (addr - st->start_address) >> 10;
}
/* Work out what appropriate unit to use */
@@ -161,7 +173,7 @@ static void dump_addr(struct pg_state *st, unsigned long addr)
delta >>= 10;
unit++;
}
- seq_printf(st->seq, "%9lu%c", delta, *unit);
+ pt_dump_seq_printf(st->seq, "%9lu%c", delta, *unit);
}
@@ -178,7 +190,7 @@ static void note_page(struct pg_state *st, unsigned long addr,
st->start_address = addr;
st->start_pa = pa;
st->last_pa = pa;
- seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
+ pt_dump_seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
/*
* Dump the section of virtual memory when:
* - the PTE flags from one entry to the next differs.
@@ -202,7 +214,7 @@ static void note_page(struct pg_state *st, unsigned long addr,
st->current_flags,
pg_level[st->level].num);
- seq_putc(st->seq, '\n');
+ pt_dump_seq_puts(st->seq, "\n");
}
/*
@@ -211,7 +223,7 @@ static void note_page(struct pg_state *st, unsigned long addr,
*/
while (addr >= st->marker[1].start_address) {
st->marker++;
- seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
+ pt_dump_seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
}
st->start_address = addr;
st->start_pa = pa;
--
2.21.0
^ permalink raw reply related
* [PATCH 2/2] powerpc/mm: Warn if W+X pages found on boot
From: Russell Currey @ 2019-04-24 6:39 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Julia.Lawall, rashmica.g, Russell Currey
In-Reply-To: <20190424063958.24559-1-ruscur@russell.cc>
Implement code to walk all pages and warn if any are found to be both
writable and executable. Depends on STRICT_KERNEL_RWX enabled, and is
behind the DEBUG_WX config option.
This only runs on boot and has no runtime performance implications.
Very heavily influenced (and in some cases copied verbatim) from the
ARM64 code written by Laura Abbott (thanks!), since our ptdump
infrastructure is similar.
Signed-off-by: Russell Currey <ruscur@russell.cc>
---
arch/powerpc/Kconfig.debug | 19 +++++++++++++++
arch/powerpc/include/asm/pgtable.h | 5 ++++
arch/powerpc/mm/pgtable_32.c | 5 ++++
arch/powerpc/mm/pgtable_64.c | 5 ++++
arch/powerpc/mm/ptdump/ptdump.c | 38 ++++++++++++++++++++++++++++++
5 files changed, 72 insertions(+)
diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
index 4e00cb0a5464..a4160ff02ed4 100644
--- a/arch/powerpc/Kconfig.debug
+++ b/arch/powerpc/Kconfig.debug
@@ -361,6 +361,25 @@ config PPC_PTDUMP
If you are unsure, say N.
+config DEBUG_WX
+ bool "Warn on W+X mappings at boot"
+ select PPC_PTDUMP
+ ---help---
+ Generate a warning if any W+X mappings are found at boot.
+
+ This is useful for discovering cases where the kernel is leaving
+ W+X mappings after applying NX, as such mappings are a security risk.
+
+ Note that even if the check fails, your kernel is possibly
+ still fine, as W+X mappings are not a security hole in
+ themselves, what they do is that they make the exploitation
+ of other unfixed kernel bugs easier.
+
+ There is no runtime or memory usage effect of this option
+ once the kernel has booted up - it's a one time check.
+
+ If in doubt, say "Y".
+
config PPC_FAST_ENDIAN_SWITCH
bool "Deprecated fast endian-switch syscall"
depends on DEBUG_KERNEL && PPC_BOOK3S_64
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 505550fb2935..be785f221e56 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -104,6 +104,11 @@ void pgtable_cache_init(void);
#if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_PPC32)
void mark_initmem_nx(void);
+
+#ifdef CONFIG_DEBUG_WX
+extern void ptdump_check_wx(void);
+#endif /* CONFIG_DEBUG_WX */
+
#else
static inline void mark_initmem_nx(void) { }
#endif
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 6e56a6240bfa..054a6174ff7f 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -384,6 +384,11 @@ void mark_rodata_ro(void)
PFN_DOWN((unsigned long)__start_rodata);
change_page_attr(page, numpages, PAGE_KERNEL_RO);
+
+#ifdef CONFIG_DEBUG_WX
+ // mark_initmem_nx() should have already run by now
+ ptdump_check_wx();
+#endif
}
#endif
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index fb1375c07e8c..48036b25a958 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -328,6 +328,11 @@ void mark_rodata_ro(void)
radix__mark_rodata_ro();
else
hash__mark_rodata_ro();
+
+#ifdef CONFIG_DEBUG_WX
+ // mark_initmem_nx() should have already run by now
+ ptdump_check_wx();
+#endif
}
void mark_initmem_nx(void)
diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
index c50cb7faa334..b4b09df839bb 100644
--- a/arch/powerpc/mm/ptdump/ptdump.c
+++ b/arch/powerpc/mm/ptdump/ptdump.c
@@ -68,6 +68,8 @@ struct pg_state {
unsigned long last_pa;
unsigned int level;
u64 current_flags;
+ bool check_wx;
+ unsigned long wx_pages;
};
struct addr_marker {
@@ -177,6 +179,19 @@ static void dump_addr(struct pg_state *st, unsigned long addr)
}
+static void note_prot_wx(struct pg_state *st, unsigned long addr)
+{
+ if (!st->check_wx)
+ return;
+ if (!((st->current_flags & _PAGE_EXEC) && (st->current_flags & _PAGE_WRITE)))
+ return;
+
+ WARN_ONCE(1, "powerpc/mm: Found insecure W+X mapping at address %p/%pS\n",
+ (void *)st->start_address, (void *)st->start_address);
+
+ st->wx_pages += (addr - st->start_address) / PAGE_SIZE;
+}
+
static void note_page(struct pg_state *st, unsigned long addr,
unsigned int level, u64 val)
{
@@ -206,6 +221,7 @@ static void note_page(struct pg_state *st, unsigned long addr,
/* Check the PTE flags */
if (st->current_flags) {
+ note_prot_wx(st, addr);
dump_addr(st, addr);
/* Dump all the flags */
@@ -378,6 +394,28 @@ static void build_pgtable_complete_mask(void)
pg_level[i].mask |= pg_level[i].flag[j].mask;
}
+void ptdump_check_wx(void)
+{
+ struct pg_state st = {
+ .seq = NULL,
+ .marker = address_markers,
+ .check_wx = true,
+ };
+
+ if (radix_enabled())
+ st.start_address = PAGE_OFFSET;
+ else
+ st.start_address = KERN_VIRT_START;
+
+ walk_pagetables(&st);
+
+ if (st.wx_pages)
+ pr_warn("Checked W+X mappings: FAILED, %lu W+X pages found\n",
+ st.wx_pages);
+ else
+ pr_info("Checked W+X mappings: passed, no W+X pages found\n");
+}
+
static int ptdump_init(void)
{
struct dentry *debugfs_file;
--
2.21.0
^ permalink raw reply related
* Re: [PATCH 10/10] powerpc: select DYNAMIC_DEBUG_RELATIVE_POINTERS for PPC64
From: Rasmus Villemoes @ 2019-04-24 6:46 UTC (permalink / raw)
To: Andrew Morton, Christophe Leroy; +Cc: Jason Baron, linuxppc-dev, linux-kernel
In-Reply-To: <20190423123611.1f7276cc4eb9f7a7005899a9@linux-foundation.org>
On 23/04/2019 21.36, Andrew Morton wrote:
> On Tue, 23 Apr 2019 17:37:33 +0200 Christophe Leroy <christophe.leroy@c-s.fr> wrote:
>
>>> --- a/arch/powerpc/Kconfig
>>> +++ b/arch/powerpc/Kconfig
>>> @@ -155,6 +155,7 @@ config PPC
>>> select BUILDTIME_EXTABLE_SORT
>>> select CLONE_BACKWARDS
>>> select DCACHE_WORD_ACCESS if PPC64 && CPU_LITTLE_ENDIAN
>>> + select DYNAMIC_DEBUG_RELATIVE_POINTERS if PPC64
>>
>> Why only PPC64 ? Won't it work with PPC32 ? Or would it be
>> counter-performant on 32 bits ?
>
> Ditto arm and i386?
>
It's pointless on 32-bit platforms - I'm replacing absolute const char*
pointers with a relative s32 offset from the _ddebug descriptor, so if
sizeof(void*)==4 there's no gain.
And yes, the current implementation also wouldn't work out-of-the-box
for 32-bit platforms, since the asm needs to know how to properly
initialize a whole struct _ddebug, which (often) contains a static_key,
which in turn contains a pointer member, which both affects its size as
well as its placement inside _ddebug. The C code in dynamic_debug.c
would likely Just Work, but there's no point in complicating the asm
part for no gain, so there are static_assert()s in place to ensure
BITS_PER_LONG==64 (as well as checking the various offsetof()s etc.).
[I don't think performance matters at all, it's one extra addition to
access these fields, and that is only done in the rare cases where
someone interacts with the dynamic_debug/control sysfs file, and when
one of the activated pr_debug()s is actually hit (so a few extra
instructions drown in the printk overhead).]
I do now see that PPC64 does not select GENERIC_BUG_RELATIVE_POINTERS,
so maybe this scheme simply doesn't work on PPC64, or nobody has done
the work to reduce the sizeof(struct bug_entry) on PPC64? As I said,
I've only compile-tested arm64 and ppc64.
Rasmus
^ permalink raw reply
* Re: [PATCH 1/2] powerpc/mm/ptdump: Wrap seq_printf() to handle NULL pointers
From: Christophe Leroy @ 2019-04-24 6:56 UTC (permalink / raw)
To: Russell Currey, linuxppc-dev; +Cc: Julia.Lawall, rashmica.g
In-Reply-To: <20190424063958.24559-1-ruscur@russell.cc>
Le 24/04/2019 à 08:39, Russell Currey a écrit :
> Lovingly borrowed from the arch/arm64 ptdump code.
>
> This doesn't seem to be an issue in practice, but is necessary for my
> upcoming commit.
>
> Converts a putc() into a puts().
>
> Signed-off-by: Russell Currey <ruscur@russell.cc>
> ---
> arch/powerpc/mm/ptdump/ptdump.c | 32 ++++++++++++++++++++++----------
> 1 file changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
> index 37138428ab55..c50cb7faa334 100644
> --- a/arch/powerpc/mm/ptdump/ptdump.c
> +++ b/arch/powerpc/mm/ptdump/ptdump.c
> @@ -104,6 +104,18 @@ static struct addr_marker address_markers[] = {
> { -1, NULL },
> };
>
> +#define pt_dump_seq_printf(m, fmt, args...) \
> +({ \
> + if (m) \
> + seq_printf(m, fmt, ##args); \
> +})
> +
> +#define pt_dump_seq_puts(m, fmt) \
> +({ \
> + if (m) \
> + seq_printf(m, fmt); \
Why not use seq_puts() here ?
Christophe
> +})
> +
> static void dump_flag_info(struct pg_state *st, const struct flag_info
> *flag, u64 pte, int num)
> {
> @@ -121,19 +133,19 @@ static void dump_flag_info(struct pg_state *st, const struct flag_info
> val = pte & flag->val;
> if (flag->shift)
> val = val >> flag->shift;
> - seq_printf(st->seq, " %s:%llx", flag->set, val);
> + pt_dump_seq_printf(st->seq, " %s:%llx", flag->set, val);
> } else {
> if ((pte & flag->mask) == flag->val)
> s = flag->set;
> else
> s = flag->clear;
> if (s)
> - seq_printf(st->seq, " %s", s);
> + pt_dump_seq_printf(st->seq, " %s", s);
> }
> st->current_flags &= ~flag->mask;
> }
> if (st->current_flags != 0)
> - seq_printf(st->seq, " unknown flags:%llx", st->current_flags);
> + pt_dump_seq_printf(st->seq, " unknown flags:%llx", st->current_flags);
> }
>
> static void dump_addr(struct pg_state *st, unsigned long addr)
> @@ -148,12 +160,12 @@ static void dump_addr(struct pg_state *st, unsigned long addr)
> #define REG "0x%08lx"
> #endif
>
> - seq_printf(st->seq, REG "-" REG " ", st->start_address, addr - 1);
> + pt_dump_seq_printf(st->seq, REG "-" REG " ", st->start_address, addr - 1);
> if (st->start_pa == st->last_pa && st->start_address + PAGE_SIZE != addr) {
> - seq_printf(st->seq, "[" REG "]", st->start_pa);
> + pt_dump_seq_printf(st->seq, "[" REG "]", st->start_pa);
> delta = PAGE_SIZE >> 10;
> } else {
> - seq_printf(st->seq, " " REG " ", st->start_pa);
> + pt_dump_seq_printf(st->seq, " " REG " ", st->start_pa);
> delta = (addr - st->start_address) >> 10;
> }
> /* Work out what appropriate unit to use */
> @@ -161,7 +173,7 @@ static void dump_addr(struct pg_state *st, unsigned long addr)
> delta >>= 10;
> unit++;
> }
> - seq_printf(st->seq, "%9lu%c", delta, *unit);
> + pt_dump_seq_printf(st->seq, "%9lu%c", delta, *unit);
>
> }
>
> @@ -178,7 +190,7 @@ static void note_page(struct pg_state *st, unsigned long addr,
> st->start_address = addr;
> st->start_pa = pa;
> st->last_pa = pa;
> - seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
> + pt_dump_seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
> /*
> * Dump the section of virtual memory when:
> * - the PTE flags from one entry to the next differs.
> @@ -202,7 +214,7 @@ static void note_page(struct pg_state *st, unsigned long addr,
> st->current_flags,
> pg_level[st->level].num);
>
> - seq_putc(st->seq, '\n');
> + pt_dump_seq_puts(st->seq, "\n");
> }
>
> /*
> @@ -211,7 +223,7 @@ static void note_page(struct pg_state *st, unsigned long addr,
> */
> while (addr >= st->marker[1].start_address) {
> st->marker++;
> - seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
> + pt_dump_seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
> }
> st->start_address = addr;
> st->start_pa = pa;
>
^ permalink raw reply
* Re: [PATCH 1/2] powerpc/mm/ptdump: Wrap seq_printf() to handle NULL pointers
From: Russell Currey @ 2019-04-24 7:00 UTC (permalink / raw)
To: Christophe Leroy, linuxppc-dev; +Cc: Julia.Lawall, rashmica.g
In-Reply-To: <ab20f079-cf9f-f1c6-f3fa-54a842311fd3@c-s.fr>
whoops, will fix
--
Russell Currey
ruscur@russell.cc
On Wed, Apr 24, 2019, at 4:56 PM, Christophe Leroy wrote:
>
>
> Le 24/04/2019 à 08:39, Russell Currey a écrit :
> > Lovingly borrowed from the arch/arm64 ptdump code.
> >
> > This doesn't seem to be an issue in practice, but is necessary for my
> > upcoming commit.
> >
> > Converts a putc() into a puts().
> >
> > Signed-off-by: Russell Currey <ruscur@russell.cc>
> > ---
> > arch/powerpc/mm/ptdump/ptdump.c | 32 ++++++++++++++++++++++----------
> > 1 file changed, 22 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
> > index 37138428ab55..c50cb7faa334 100644
> > --- a/arch/powerpc/mm/ptdump/ptdump.c
> > +++ b/arch/powerpc/mm/ptdump/ptdump.c
> > @@ -104,6 +104,18 @@ static struct addr_marker address_markers[] = {
> > { -1, NULL },
> > };
> >
> > +#define pt_dump_seq_printf(m, fmt, args...) \
> > +({ \
> > + if (m) \
> > + seq_printf(m, fmt, ##args); \
> > +})
> > +
> > +#define pt_dump_seq_puts(m, fmt) \
> > +({ \
> > + if (m) \
> > + seq_printf(m, fmt); \
>
> Why not use seq_puts() here ?
>
> Christophe
>
> > +})
> > +
> > static void dump_flag_info(struct pg_state *st, const struct flag_info
> > *flag, u64 pte, int num)
> > {
> > @@ -121,19 +133,19 @@ static void dump_flag_info(struct pg_state *st, const struct flag_info
> > val = pte & flag->val;
> > if (flag->shift)
> > val = val >> flag->shift;
> > - seq_printf(st->seq, " %s:%llx", flag->set, val);
> > + pt_dump_seq_printf(st->seq, " %s:%llx", flag->set, val);
> > } else {
> > if ((pte & flag->mask) == flag->val)
> > s = flag->set;
> > else
> > s = flag->clear;
> > if (s)
> > - seq_printf(st->seq, " %s", s);
> > + pt_dump_seq_printf(st->seq, " %s", s);
> > }
> > st->current_flags &= ~flag->mask;
> > }
> > if (st->current_flags != 0)
> > - seq_printf(st->seq, " unknown flags:%llx", st->current_flags);
> > + pt_dump_seq_printf(st->seq, " unknown flags:%llx", st->current_flags);
> > }
> >
> > static void dump_addr(struct pg_state *st, unsigned long addr)
> > @@ -148,12 +160,12 @@ static void dump_addr(struct pg_state *st, unsigned long addr)
> > #define REG "0x%08lx"
> > #endif
> >
> > - seq_printf(st->seq, REG "-" REG " ", st->start_address, addr - 1);
> > + pt_dump_seq_printf(st->seq, REG "-" REG " ", st->start_address, addr - 1);
> > if (st->start_pa == st->last_pa && st->start_address + PAGE_SIZE != addr) {
> > - seq_printf(st->seq, "[" REG "]", st->start_pa);
> > + pt_dump_seq_printf(st->seq, "[" REG "]", st->start_pa);
> > delta = PAGE_SIZE >> 10;
> > } else {
> > - seq_printf(st->seq, " " REG " ", st->start_pa);
> > + pt_dump_seq_printf(st->seq, " " REG " ", st->start_pa);
> > delta = (addr - st->start_address) >> 10;
> > }
> > /* Work out what appropriate unit to use */
> > @@ -161,7 +173,7 @@ static void dump_addr(struct pg_state *st, unsigned long addr)
> > delta >>= 10;
> > unit++;
> > }
> > - seq_printf(st->seq, "%9lu%c", delta, *unit);
> > + pt_dump_seq_printf(st->seq, "%9lu%c", delta, *unit);
> >
> > }
> >
> > @@ -178,7 +190,7 @@ static void note_page(struct pg_state *st, unsigned long addr,
> > st->start_address = addr;
> > st->start_pa = pa;
> > st->last_pa = pa;
> > - seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
> > + pt_dump_seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
> > /*
> > * Dump the section of virtual memory when:
> > * - the PTE flags from one entry to the next differs.
> > @@ -202,7 +214,7 @@ static void note_page(struct pg_state *st, unsigned long addr,
> > st->current_flags,
> > pg_level[st->level].num);
> >
> > - seq_putc(st->seq, '\n');
> > + pt_dump_seq_puts(st->seq, "\n");
> > }
> >
> > /*
> > @@ -211,7 +223,7 @@ static void note_page(struct pg_state *st, unsigned long addr,
> > */
> > while (addr >= st->marker[1].start_address) {
> > st->marker++;
> > - seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
> > + pt_dump_seq_printf(st->seq, "---[ %s ]---\n", st->marker->name);
> > }
> > st->start_address = addr;
> > st->start_pa = pa;
> >
>
^ permalink raw reply
* Re: [PATCH 2/2] powerpc/mm: Warn if W+X pages found on boot
From: Christophe Leroy @ 2019-04-24 7:14 UTC (permalink / raw)
To: Russell Currey, linuxppc-dev; +Cc: Julia.Lawall, rashmica.g
In-Reply-To: <20190424063958.24559-2-ruscur@russell.cc>
Le 24/04/2019 à 08:39, Russell Currey a écrit :
> Implement code to walk all pages and warn if any are found to be both
> writable and executable. Depends on STRICT_KERNEL_RWX enabled, and is
> behind the DEBUG_WX config option.
>
> This only runs on boot and has no runtime performance implications.
>
> Very heavily influenced (and in some cases copied verbatim) from the
> ARM64 code written by Laura Abbott (thanks!), since our ptdump
> infrastructure is similar.
>
> Signed-off-by: Russell Currey <ruscur@russell.cc>
> ---
> arch/powerpc/Kconfig.debug | 19 +++++++++++++++
> arch/powerpc/include/asm/pgtable.h | 5 ++++
> arch/powerpc/mm/pgtable_32.c | 5 ++++
> arch/powerpc/mm/pgtable_64.c | 5 ++++
> arch/powerpc/mm/ptdump/ptdump.c | 38 ++++++++++++++++++++++++++++++
> 5 files changed, 72 insertions(+)
>
> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
> index 4e00cb0a5464..a4160ff02ed4 100644
> --- a/arch/powerpc/Kconfig.debug
> +++ b/arch/powerpc/Kconfig.debug
> @@ -361,6 +361,25 @@ config PPC_PTDUMP
>
> If you are unsure, say N.
>
> +config DEBUG_WX
I would call it PPC_DEBUG_WX to avoid confusion.
> + bool "Warn on W+X mappings at boot"
> + select PPC_PTDUMP
> + ---help---
> + Generate a warning if any W+X mappings are found at boot.
> +
> + This is useful for discovering cases where the kernel is leaving
> + W+X mappings after applying NX, as such mappings are a security risk.
> +
> + Note that even if the check fails, your kernel is possibly
> + still fine, as W+X mappings are not a security hole in
> + themselves, what they do is that they make the exploitation
> + of other unfixed kernel bugs easier.
> +
> + There is no runtime or memory usage effect of this option
> + once the kernel has booted up - it's a one time check.
> +
> + If in doubt, say "Y".
> +
> config PPC_FAST_ENDIAN_SWITCH
> bool "Deprecated fast endian-switch syscall"
> depends on DEBUG_KERNEL && PPC_BOOK3S_64
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 505550fb2935..be785f221e56 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -104,6 +104,11 @@ void pgtable_cache_init(void);
>
> #if defined(CONFIG_STRICT_KERNEL_RWX) || defined(CONFIG_PPC32)
> void mark_initmem_nx(void);
> +
> +#ifdef CONFIG_DEBUG_WX
I don't think this #ifdef is necessary at all when there is no matching
#else. You could leave the declaration of the function all the time.
> +extern void ptdump_check_wx(void);
'extern' keyword is superflous and checkpatch --strict will likely complain.
> +#endif /* CONFIG_DEBUG_WX */
> +
> #else
> static inline void mark_initmem_nx(void) { }
> #endif
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index 6e56a6240bfa..054a6174ff7f 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -384,6 +384,11 @@ void mark_rodata_ro(void)
> PFN_DOWN((unsigned long)__start_rodata);
>
> change_page_attr(page, numpages, PAGE_KERNEL_RO);
> +
> +#ifdef CONFIG_DEBUG_WX
> + // mark_initmem_nx() should have already run by now
> + ptdump_check_wx();
Please avoid #ifdefs in .c files as much as possible.
It would be better to define ptdump_check_wx() as static inline {} in
pgtable.h when CONFIG_DEBUG_WX is not selected.
> +#endif
> }
> #endif
>
> diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
> index fb1375c07e8c..48036b25a958 100644
> --- a/arch/powerpc/mm/pgtable_64.c
> +++ b/arch/powerpc/mm/pgtable_64.c
> @@ -328,6 +328,11 @@ void mark_rodata_ro(void)
> radix__mark_rodata_ro();
> else
> hash__mark_rodata_ro();
> +
> +#ifdef CONFIG_DEBUG_WX
> + // mark_initmem_nx() should have already run by now
> + ptdump_check_wx();
> +#endif
Idem
> }
>
> void mark_initmem_nx(void)
> diff --git a/arch/powerpc/mm/ptdump/ptdump.c b/arch/powerpc/mm/ptdump/ptdump.c
> index c50cb7faa334..b4b09df839bb 100644
> --- a/arch/powerpc/mm/ptdump/ptdump.c
> +++ b/arch/powerpc/mm/ptdump/ptdump.c
> @@ -68,6 +68,8 @@ struct pg_state {
> unsigned long last_pa;
> unsigned int level;
> u64 current_flags;
> + bool check_wx;
> + unsigned long wx_pages;
> };
>
> struct addr_marker {
> @@ -177,6 +179,19 @@ static void dump_addr(struct pg_state *st, unsigned long addr)
>
> }
>
> +static void note_prot_wx(struct pg_state *st, unsigned long addr)
> +{
> + if (!st->check_wx)
> + return;
> + if (!((st->current_flags & _PAGE_EXEC) && (st->current_flags & _PAGE_WRITE)))
The above won't work in all cases. _PAGE_WRITE is only defined in
book3s64. Other arches have _PAGE_RW or _PAGE_RO.
Please use the helpers defined in pgtable.h to check and not flags directly.
> + return;
> +
> + WARN_ONCE(1, "powerpc/mm: Found insecure W+X mapping at address %p/%pS\n",
> + (void *)st->start_address, (void *)st->start_address);
> +
> + st->wx_pages += (addr - st->start_address) / PAGE_SIZE;
> +}
> +
> static void note_page(struct pg_state *st, unsigned long addr,
> unsigned int level, u64 val)
> {
> @@ -206,6 +221,7 @@ static void note_page(struct pg_state *st, unsigned long addr,
>
> /* Check the PTE flags */
> if (st->current_flags) {
> + note_prot_wx(st, addr);
> dump_addr(st, addr);
>
> /* Dump all the flags */
> @@ -378,6 +394,28 @@ static void build_pgtable_complete_mask(void)
> pg_level[i].mask |= pg_level[i].flag[j].mask;
> }
>
> +void ptdump_check_wx(void)
> +{
> + struct pg_state st = {
> + .seq = NULL,
> + .marker = address_markers,
> + .check_wx = true,
> + };
> +
> + if (radix_enabled())
> + st.start_address = PAGE_OFFSET;
> + else
> + st.start_address = KERN_VIRT_START;
KERN_VIRT_START doesn't exist on PPC32.
Christophe
> +
> + walk_pagetables(&st);
> +
> + if (st.wx_pages)
> + pr_warn("Checked W+X mappings: FAILED, %lu W+X pages found\n",
> + st.wx_pages);
> + else
> + pr_info("Checked W+X mappings: passed, no W+X pages found\n");
> +}
> +
> static int ptdump_init(void)
> {
> struct dentry *debugfs_file;
>
^ permalink raw reply
* Re: [PATCH v12 00/31] Speculative page faults
From: Laurent Dufour @ 2019-04-24 7:33 UTC (permalink / raw)
To: Peter Zijlstra, Michel Lespinasse
Cc: Jan Kara, sergey.senozhatsky.work, Will Deacon, Michal Hocko,
linux-mm, Paul Mackerras, Punit Agrawal, H. Peter Anvin,
Mike Rapoport, Alexei Starovoitov, Andrea Arcangeli, Andi Kleen,
Minchan Kim, aneesh.kumar, x86, Matthew Wilcox, Daniel Jordan,
Ingo Molnar, David Rientjes, Paul E. McKenney, Haiyan Song,
Nick Piggin, sj38.park, Jerome Glisse, dave, kemi.wang,
Kirill A. Shutemov, Thomas Gleixner, zhong jiang,
Ganesh Mahendran, Yang Shi, linuxppc-dev, LKML,
Sergey Senozhatsky, vinayak menon, Andrew Morton, Tim Chen, haren
In-Reply-To: <20190423093851.GJ11158@hirez.programming.kicks-ass.net>
Le 23/04/2019 à 11:38, Peter Zijlstra a écrit :
> On Mon, Apr 22, 2019 at 02:29:16PM -0700, Michel Lespinasse wrote:
>> The proposed spf mechanism only handles anon vmas. Is there a
>> fundamental reason why it couldn't handle mapped files too ?
>> My understanding is that the mechanism of verifying the vma after
>> taking back the ptl at the end of the fault would work there too ?
>> The file has to stay referenced during the fault, but holding the vma's
>> refcount could be made to cover that ? the vm_file refcount would have
>> to be released in __free_vma() instead of remove_vma; I'm not quite sure
>> if that has more implications than I realize ?
>
> IIRC (and I really don't remember all that much) the trickiest bit was
> vs unmount. Since files can stay open past the 'expected' duration,
> umount could be delayed.
>
> But yes, I think I had a version that did all that just 'fine'. Like
> mentioned, I didn't keep the refcount because it sucked just as hard as
> the mmap_sem contention, but the SRCU callback did the fput() just fine
> (esp. now that we have delayed_fput).
I had to use a refcount for the VMA because I'm using RCU in place of
SRCU and only protecting the RB tree using RCU.
Regarding the file pointer, I decided to release it synchronously to
avoid the latency of RCU during the file closing. As you mentioned this
could delayed the umount but not only, as Linus Torvald demonstrated by
the past [1]. Anyway, since the file support is not yet here there is no
need for that currently.
Regarding the file mapping support, the concern is to ensure that
vm_ops->fault() will not try to release the mmap_sem. This is true for
most of the file system operation using the generic one, but there is
currently no clever way to identify that except by checking the
vm_ops->fault pointer. Adding a flag to the vm_operations_struct
structure is another option.
that's doable as far as the underlying fault() function is not dealing
with the mmap_sem, and I made a try by the past but was thinking that
first the anonymous case should be accepted before moving forward this way.
[1]
https://lore.kernel.org/linux-mm/alpine.LFD.2.00.1001041904250.3630@localhost.localdomain/
^ permalink raw reply
* Re: [PATCH v12 21/31] mm: Introduce find_vma_rcu()
From: Laurent Dufour @ 2019-04-24 7:57 UTC (permalink / raw)
To: Peter Zijlstra
Cc: jack, sergey.senozhatsky.work, Will Deacon, mhocko, linux-mm,
paulus, Punit Agrawal, hpa, Michel Lespinasse, Alexei Starovoitov,
Andrea Arcangeli, ak, Minchan Kim, aneesh.kumar, x86,
Matthew Wilcox, Daniel Jordan, Ingo Molnar, David Rientjes,
paulmck, Haiyan Song, npiggin, sj38.park, Jerome Glisse, dave,
kemi.wang, kirill, Thomas Gleixner, zhong jiang, Ganesh Mahendran,
Yang Shi, Mike Rapoport, linuxppc-dev, linux-kernel,
Sergey Senozhatsky, vinayak menon, akpm, Tim Chen, haren
In-Reply-To: <20190423092710.GI11158@hirez.programming.kicks-ass.net>
Le 23/04/2019 à 11:27, Peter Zijlstra a écrit :
> On Tue, Apr 16, 2019 at 03:45:12PM +0200, Laurent Dufour wrote:
>> This allows to search for a VMA structure without holding the mmap_sem.
>>
>> The search is repeated while the mm seqlock is changing and until we found
>> a valid VMA.
>>
>> While under the RCU protection, a reference is taken on the VMA, so the
>> caller must call put_vma() once it not more need the VMA structure.
>>
>> At the time a VMA is inserted in the MM RB tree, in vma_rb_insert(), a
>> reference is taken to the VMA by calling get_vma().
>>
>> When removing a VMA from the MM RB tree, the VMA is not release immediately
>> but at the end of the RCU grace period through vm_rcu_put(). This ensures
>> that the VMA remains allocated until the end the RCU grace period.
>>
>> Since the vm_file pointer, if valid, is released in put_vma(), there is no
>> guarantee that the file pointer will be valid on the returned VMA.
>
> What I'm missing here, and in the previous patch introducing the
> refcount (also see refcount_t), is _why_ we need the refcount thing at
> all.
The need for the VMA's refcount is to ensure that the VMA will remain
until the end of the SPF handler. This is a consequence of the use of
RCU instead of SRCU to protect the RB tree.
I was not aware of the refcount_t type, it would be better here to avoid
wrapping.
> My original plan was to use SRCU, which at the time was not complete
> enough so I abused/hacked preemptible RCU, but that is no longer the
> case, SRCU has all the required bits and pieces.
When I did test using SRCU it was involving a lot a scheduling to run
the SRCU callback mechanism. In some workload the impact on the
perfomance was significant [1].
I can't see this overhead using RCU.
>
> Also; the initial motivation was prefaulting large VMAs and the
> contention on mmap was killing things; but similarly, the contention on
> the refcount (I did try that) killed things just the same.
Doing prefaulting should be doable, I'll try to think further about that.
Regarding the refcount, I should I missed something, this is an atomic
counter, so there should not be contention on it but cache exclusivity,
not ideal I agree but I can't see what else to use here.
> So I'm really sad to see the refcount return; and without any apparent
> justification.
I'm not opposed to use another mechanism here, but SRCU didn't show good
performance with some workload, and I can't see how to use RCU without a
reference counter here. So please, advise.
Thanks,
Laurent.
[1]
https://lore.kernel.org/linux-mm/7ca80231-fe02-a3a7-84bc-ce81690ea051@intel.com/
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox