* Re: [RFC PATCH v3 0/6] Restricted DMA
From: Florian Fainelli @ 2021-01-07 17:59 UTC (permalink / raw)
To: Claire Chang
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
Joerg Roedel, rafael.j.wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
will, Konrad Rzeszutek Wilk, dan.j.williams, linuxppc-dev,
Rob Herring, boris.ostrovsky, Andy Shevchenko, jgross,
Nicolas Boichat, Greg KH, rdunlap, lkml, Tomasz Figa, iommu,
Jim Quinlan, xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <CALiNf28sU1VtGB7LeTXExkMwQiCeg8N5arqyEjw0CPZP72R4dg@mail.gmail.com>
On 1/7/21 9:42 AM, Claire Chang wrote:
>> Can you explain how ATF gets involved and to what extent it does help,
>> besides enforcing a secure region from the ARM CPU's perpsective? Does
>> the PCIe root complex not have an IOMMU but can somehow be denied access
>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
>> still some sort of basic protection that the HW enforces, right?
>
> We need the ATF support for memory MPU (memory protection unit).
> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
> region is for PCIe DMA only, but we still need MPU to locks down PCIe access to
> that specific regions.
OK so you do have a protection unit of some sort to enforce which region
in DRAM the PCIE bridge is allowed to access, that makes sense,
otherwise the restricted DMA region would only be a hint but nothing you
can really enforce. This is almost entirely analogous to our systems then.
There may be some value in standardizing on an ARM SMCCC call then since
you already support two different SoC vendors.
>
>>
>> On Broadcom STB SoCs we have had something similar for a while however
>> and while we don't have an IOMMU for the PCIe bridge, we do have a a
>> basic protection mechanism whereby we can configure a region in DRAM to
>> be PCIe read/write and CPU read/write which then gets used as the PCIe
>> inbound region for the PCIe EP. By default the PCIe bridge is not
>> allowed access to DRAM so we must call into a security agent to allow
>> the PCIe bridge to access the designated DRAM region.
>>
>> We have done this using a private CMA area region assigned via Device
>> Tree, assigned with a and requiring the PCIe EP driver to use
>> dma_alloc_from_contiguous() in order to allocate from this device
>> private CMA area. The only drawback with that approach is that it
>> requires knowing how much memory you need up front for buffers and DMA
>> descriptors that the PCIe EP will need to process. The problem is that
>> it requires driver modifications and that does not scale over the number
>> of PCIe EP drivers, some we absolutely do not control, but there is no
>> need to bounce buffer. Your approach scales better across PCIe EP
>> drivers however it does require bounce buffering which could be a
>> performance hit.
>
> Only the streaming DMA (map/unmap) needs bounce buffering.
True, and typically only on transmit since you don't really control
where the sk_buff are allocated from, right? On RX since you need to
hand buffer addresses to the WLAN chip prior to DMA, you can allocate
them from a pool that already falls within the restricted DMA region, right?
> I also added alloc/free support in this series
> (https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
> try to allocate memory from the predefined memory region.
>
> As for the performance hit, it should be similar to the default swiotlb.
> Here are my experiment results. Both SoCs lack IOMMU for PCIe.
>
> PCIe wifi vht80 throughput -
>
> MTK SoC tcp_tx tcp_rx udp_tx udp_rx
> w/o Restricted DMA 244.1 134.66 312.56 350.79
> w/ Restricted DMA 246.95 136.59 363.21 351.99
>
> Rockchip SoC tcp_tx tcp_rx udp_tx udp_rx
> w/o Restricted DMA 237.87 133.86 288.28 361.88
> w/ Restricted DMA 256.01 130.95 292.28 353.19
How come you get better throughput with restricted DMA? Is it because
doing DMA to/from a contiguous region allows for better grouping of
transactions from the DRAM controller's perspective somehow?
>
> The CPU usage doesn't increase too much either.
> Although I didn't measure the CPU usage very precisely, it's ~3% with a single
> big core (Cortex-A72) and ~5% with a single small core (Cortex-A53).
>
> Thanks!
>
>>
>> Thanks!
>> --
>> Florian
--
Florian
^ permalink raw reply
* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Florian Fainelli @ 2021-01-07 18:09 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, Claire Chang
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
Joerg Roedel, rafael.j.wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
will, dan.j.williams, linuxppc-dev, Rob Herring, boris.ostrovsky,
Andy Shevchenko, jgross, Nicolas Boichat, Greg KH, rdunlap, lkml,
Tomasz Figa, iommu, xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <20210107175740.GA16519@char.us.oracle.com>
On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
>> Hi Greg and Konrad,
>>
>> This change is intended to be non-arch specific. Any arch that lacks DMA access
>> control and has devices not behind an IOMMU can make use of it. Could you share
>> why you think this should be arch specific?
>
> The idea behind non-arch specific code is it to be generic. The devicetree
> is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
> be in arch specific code.
In premise the same code could be used with an ACPI enabled system with
an appropriate service to identify the restricted DMA regions and unlock
them.
More than 1 architecture requiring this function (ARM and ARM64 are the
two I can think of needing this immediately) sort of calls for making
the code architecture agnostic since past 2, you need something that scales.
There is already code today under kernel/dma/contiguous.c that is only
activated on a CONFIG_OF=y && CONFIG_OF_RESERVED_MEM=y system, this is
no different.
--
Florian
^ permalink raw reply
* Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool
From: Florian Fainelli @ 2021-01-07 18:14 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, Claire Chang
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
Joerg Roedel, rafael.j.wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
will, dan.j.williams, linuxppc-dev, Rob Herring, boris.ostrovsky,
Andy Shevchenko, jgross, Nicolas Boichat, Greg KH, rdunlap, lkml,
Tomasz Figa, iommu, xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <20210107180032.GB16519@char.us.oracle.com>
On 1/7/21 10:00 AM, Konrad Rzeszutek Wilk wrote:
>>>
>>>
>>> - Nothing stops the physical device from bypassing the SWIOTLB buffer.
>>> That is if an errant device screwed up the length or DMA address, the
>>> SWIOTLB would gladly do what the device told it do?
>>
>> So the system needs to provide a way to lock down the memory access, e.g. MPU.
>
> OK! Would it be prudent to have this in the description above perhaps?
Yes this is something that must be documented as a requirement for the
restricted DMA pool users, otherwise attempting to do restricted DMA
pool is no different than say, using a device private CMA region.
Without the enforcement, this is just a best effort.
--
Florian
^ permalink raw reply
* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Konrad Rzeszutek Wilk @ 2021-01-07 21:19 UTC (permalink / raw)
To: Florian Fainelli
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
xypron.glpk, Joerg Roedel, rafael.j.wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
will, dan.j.williams, linuxppc-dev, Rob Herring, Claire Chang,
boris.ostrovsky, Andy Shevchenko, jgross, Nicolas Boichat,
Greg KH, rdunlap, lkml, Tomasz Figa, list@263.net:IOMMU DRIVERS,
Robin Murphy, bauerman
In-Reply-To: <aa5af7d1-779e-f0f6-e6ba-8040e603523f@gmail.com>
On Thu, Jan 07, 2021 at 10:09:14AM -0800, Florian Fainelli wrote:
> On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
> >> Hi Greg and Konrad,
> >>
> >> This change is intended to be non-arch specific. Any arch that lacks DMA access
> >> control and has devices not behind an IOMMU can make use of it. Could you share
> >> why you think this should be arch specific?
> >
> > The idea behind non-arch specific code is it to be generic. The devicetree
> > is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
> > be in arch specific code.
>
> In premise the same code could be used with an ACPI enabled system with
> an appropriate service to identify the restricted DMA regions and unlock
> them.
Which this patchset is not.
>
> More than 1 architecture requiring this function (ARM and ARM64 are the
> two I can think of needing this immediately) sort of calls for making
> the code architecture agnostic since past 2, you need something that scales.
I believe the use-case is for ARM64 at this moment.
>
> There is already code today under kernel/dma/contiguous.c that is only
> activated on a CONFIG_OF=y && CONFIG_OF_RESERVED_MEM=y system, this is
> no different.
> --
> Florian
^ permalink raw reply
* Re: [RFC PATCH v3 0/6] Restricted DMA
From: Claire Chang @ 2021-01-07 17:38 UTC (permalink / raw)
To: Florian Fainelli
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
rafael.j.wysocki, Christoph Hellwig, Bartosz Golaszewski,
xen-devel, Thierry Reding, linux-devicetree, will, konrad.wilk,
dan.j.williams, linuxppc-dev, Rob Herring, boris.ostrovsky,
Andy Shevchenko, jgross, Nicolas Boichat, Greg KH, rdunlap, lkml,
Tomasz Figa,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
Jim Quinlan, xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <d7043239-12cf-3636-4726-2e3b90917dc6@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 7552 bytes --]
On Thu, Jan 7, 2021 at 2:48 AM Florian Fainelli <f.fainelli@gmail.com>
wrote:
>
> Hi,
>
> First of all let me say that I am glad that someone is working on a
> upstream solution for this issue, would appreciate if you could CC and
> Jim Quinlan on subsequent submissions.
Sure!
>
>
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > This series implements mitigations for lack of DMA access control on
> > systems without an IOMMU, which could result in the DMA accessing the
> > system memory at unexpected times and/or unexpected addresses, possibly
> > leading to data leakage or corruption.
> >
> > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus
is
> > not behind an IOMMU. As PCI-e, by design, gives the device full access
to
> > system memory, a vulnerability in the Wi-Fi firmware could easily
escalate
> > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> > full chain of exploits; [2], [3]).
> >
> > To mitigate the security concerns, we introduce restricted DMA.
Restricted
> > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of
a
> > specially allocated region and does memory allocation from the same
region.
> > The feature on its own provides a basic level of protection against the
DMA
> > overwriting buffer contents at unexpected times. However, to protect
> > against general data leakage and system memory corruption, the system
needs
> > to provide a way to restrict the DMA to a predefined memory region
(this is
> > usually done at firmware level, e.g. in ATF on some ARM platforms).
>
> Can you explain how ATF gets involved and to what extent it does help,
> besides enforcing a secure region from the ARM CPU's perpsective? Does
> the PCIe root complex not have an IOMMU but can somehow be denied access
> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> still some sort of basic protection that the HW enforces, right?
We need the ATF support for memory MPU (memory protection unit).
Restricted DMA (with reserved-memory in dts) makes sure the predefined
memory
region is for PCIe DMA only, but we still need MPU to locks down PCIe
access to
that specific regions.
>
> On Broadcom STB SoCs we have had something similar for a while however
> and while we don't have an IOMMU for the PCIe bridge, we do have a a
> basic protection mechanism whereby we can configure a region in DRAM to
> be PCIe read/write and CPU read/write which then gets used as the PCIe
> inbound region for the PCIe EP. By default the PCIe bridge is not
> allowed access to DRAM so we must call into a security agent to allow
> the PCIe bridge to access the designated DRAM region.
>
> We have done this using a private CMA area region assigned via Device
> Tree, assigned with a and requiring the PCIe EP driver to use
> dma_alloc_from_contiguous() in order to allocate from this device
> private CMA area. The only drawback with that approach is that it
> requires knowing how much memory you need up front for buffers and DMA
> descriptors that the PCIe EP will need to process. The problem is that
> it requires driver modifications and that does not scale over the number
> of PCIe EP drivers, some we absolutely do not control, but there is no
> need to bounce buffer. Your approach scales better across PCIe EP
> drivers however it does require bounce buffering which could be a
> performance hit.
Only the streaming DMA (map/unmap) needs bounce buffering.
I also added alloc/free support in this series
(https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc()
will
try to allocate memory from the predefined memory region.
As for the performance hit, it should be similar to the default swiotlb.
Here are my experiment results. Both SoCs lack IOMMU for PCIe.
PCIe wifi vht80 throughput -
MTK SoC tcp_tx tcp_rx udp_tx udp_rx
w/o Restricted DMA 244.1 134.66 312.56 350.79
w/ Restricted DMA 246.95 136.59 363.21 351.99
Rockchip SoC tcp_tx tcp_rx udp_tx udp_rx
w/o Restricted DMA 237.87 133.86 288.28 361.88
w/ Restricted DMA 256.01 130.95 292.28 353.19
The CPU usage doesn't increase too much either.
Although I didn't measure the CPU usage very precisely, it's ~3% with a
single
big core (Cortex-A72) and ~5% with a single small core (Cortex-A53).
Thanks!
>
> Thanks!
On Thu, Jan 7, 2021 at 2:48 AM Florian Fainelli <f.fainelli@gmail.com>
wrote:
> Hi,
>
> First of all let me say that I am glad that someone is working on a
> upstream solution for this issue, would appreciate if you could CC and
> Jim Quinlan on subsequent submissions.
>
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > This series implements mitigations for lack of DMA access control on
> > systems without an IOMMU, which could result in the DMA accessing the
> > system memory at unexpected times and/or unexpected addresses, possibly
> > leading to data leakage or corruption.
> >
> > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> > not behind an IOMMU. As PCI-e, by design, gives the device full access to
> > system memory, a vulnerability in the Wi-Fi firmware could easily
> escalate
> > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> > full chain of exploits; [2], [3]).
> >
> > To mitigate the security concerns, we introduce restricted DMA.
> Restricted
> > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> > specially allocated region and does memory allocation from the same
> region.
> > The feature on its own provides a basic level of protection against the
> DMA
> > overwriting buffer contents at unexpected times. However, to protect
> > against general data leakage and system memory corruption, the system
> needs
> > to provide a way to restrict the DMA to a predefined memory region (this
> is
> > usually done at firmware level, e.g. in ATF on some ARM platforms).
>
> Can you explain how ATF gets involved and to what extent it does help,
> besides enforcing a secure region from the ARM CPU's perpsective? Does
> the PCIe root complex not have an IOMMU but can somehow be denied access
> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> still some sort of basic protection that the HW enforces, right?
>
> On Broadcom STB SoCs we have had something similar for a while however
> and while we don't have an IOMMU for the PCIe bridge, we do have a a
> basic protection mechanism whereby we can configure a region in DRAM to
> be PCIe read/write and CPU read/write which then gets used as the PCIe
> inbound region for the PCIe EP. By default the PCIe bridge is not
> allowed access to DRAM so we must call into a security agent to allow
> the PCIe bridge to access the designated DRAM region.
>
> We have done this using a private CMA area region assigned via Device
> Tree, assigned with a and requiring the PCIe EP driver to use
> dma_alloc_from_contiguous() in order to allocate from this device
> private CMA area. The only drawback with that approach is that it
> requires knowing how much memory you need up front for buffers and DMA
> descriptors that the PCIe EP will need to process. The problem is that
> it requires driver modifications and that does not scale over the number
> of PCIe EP drivers, some we absolutely do not control, but there is no
> need to bounce buffer. Your approach scales better across PCIe EP
> drivers however it does require bounce buffering which could be a
> performance hit.
>
> Thanks!
> --
> Florian
>
[-- Attachment #2: Type: text/html, Size: 8961 bytes --]
^ permalink raw reply
* Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool
From: Konrad Rzeszutek Wilk @ 2021-01-07 18:00 UTC (permalink / raw)
To: Claire Chang
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
rafael.j.wysocki, Christoph Hellwig, Bartosz Golaszewski,
xen-devel, Thierry Reding, linux-devicetree, will, dan.j.williams,
linuxppc-dev, Rob Herring, boris.ostrovsky, Andy Shevchenko,
jgross, Nicolas Boichat, Greg KH, rdunlap, lkml, Tomasz Figa,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <CALiNf2_dV13jbHqLt-r1eK+dtOcAKBGcWQCVMQn+eL6MuOrETQ@mail.gmail.com>
On Fri, Jan 08, 2021 at 01:39:43AM +0800, Claire Chang wrote:
> On Thu, Jan 7, 2021 at 2:58 AM Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> >
> > On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> > > Introduce the new compatible string, restricted-dma-pool, for restricted
> > > DMA. One can specify the address and length of the restricted DMA memory
> > > region by restricted-dma-pool in the device tree.
> > >
> > > Signed-off-by: Claire Chang <tientzu@chromium.org>
> > > ---
> > > .../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
> > > 1 file changed, 24 insertions(+)
> > >
> > > diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > index e8d3096d922c..44975e2a1fd2 100644
> > > --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> > > used as a shared pool of DMA buffers for a set of devices. It can
> > > be used by an operating system to instantiate the necessary pool
> > > management subsystem if necessary.
> > > + - restricted-dma-pool: This indicates a region of memory meant to be
> > > + used as a pool of restricted DMA buffers for a set of devices. The
> > > + memory region would be the only region accessible to those devices.
> > > + When using this, the no-map and reusable properties must not be set,
> > > + so the operating system can create a virtual mapping that will be used
> > > + for synchronization. The main purpose for restricted DMA is to
> > > + mitigate the lack of DMA access control on systems without an IOMMU,
> > > + which could result in the DMA accessing the system memory at
> > > + unexpected times and/or unexpected addresses, possibly leading to data
> > > + leakage or corruption. The feature on its own provides a basic level
> > > + of protection against the DMA overwriting buffer contents at
> > > + unexpected times. However, to protect against general data leakage and
> > > + system memory corruption, the system needs to provide way to restrict
> > > + the DMA to a predefined memory region.
> >
> > Heya!
> >
> > I think I am missing something obvious here so please bear with my
> > questions:
> >
> > - This code adds the means of having the SWIOTLB pool tied to a specific
> > memory correct?
>
> It doesn't affect the existing SWIOTLB. It just utilizes the existing SWIOTLB
> code to create another DMA pool tied to a specific memory region for a given set
> of devices. It bounces the streaming DMA (map/unmap) in and out of that region
> and does the memory allocation (dma_direct_alloc) from the same region.
Right, so why can't it follow the same mechanism that Xen SWIOTLB does - which
had exactly the same problem (needed special handling on the pool) - and do
a similar code?
>
> >
> >
> > - Nothing stops the physical device from bypassing the SWIOTLB buffer.
> > That is if an errant device screwed up the length or DMA address, the
> > SWIOTLB would gladly do what the device told it do?
>
> So the system needs to provide a way to lock down the memory access, e.g. MPU.
OK! Would it be prudent to have this in the description above perhaps?
>
> >
> > - This has to be combined with SWIOTLB-force-ish to always use the
> > bounce buffer, otherwise you could still do DMA without using
> > SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?
>
> Since restricted DMA is for the devices that are not behind an IOMMU, I change
> the criteria
> `if (unlikely(swiotlb_force == SWIOTLB_FORCE))`
> to
> `if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)`
> in dma_direct_map_page().
>
> Also, even if SWIOTLB=force, the restricted DMA pool is preferred if available
> (get_io_tlb_mem in https://lore.kernel.org/patchwork/patch/1360995/).
>
> Thanks!
^ permalink raw reply
* Re: [PATCH] ibmvfc: fix missing cast of ibmvfc_event pointer to u64 handle
From: Martin K. Petersen @ 2021-01-08 4:19 UTC (permalink / raw)
To: Tyrel Datwyler, james.bottomley
Cc: kernel test robot, linux-scsi, linux-kernel, Martin K . Petersen,
brking, linuxppc-dev
In-Reply-To: <20210106203721.1054693-1-tyreld@linux.ibm.com>
On Wed, 6 Jan 2021 14:37:21 -0600, Tyrel Datwyler wrote:
> Commit 2aa0102c6688 ("scsi: ibmvfc: Use correlation token to tag
> commands") sets the vfcFrame correlation token to the pointer handle of
> the associated ibmvfc_event. However, that commit failed to cast the
> pointer to an appropriate type which in this case is a u64. As such
> sparse warnings are generated for both correlation token assignments.
>
> ibmvfc.c:2375:36: sparse: incorrect type in argument 1 (different base types)
> ibmvfc.c:2375:36: sparse: expected unsigned long long [usertype] val
> ibmvfc.c:2375:36: sparse: got struct ibmvfc_event *[assigned] evt
>
> [...]
Applied to 5.11/scsi-fixes, thanks!
[1/1] ibmvfc: fix missing cast of ibmvfc_event pointer to u64 handle
https://git.kernel.org/mkp/scsi/c/901d01c8e50c
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply
* Re: [PATCH v2 0/5] ibmvfc: MQ preparatory locking work
From: Martin K. Petersen @ 2021-01-08 3:38 UTC (permalink / raw)
To: Tyrel Datwyler
Cc: martin.petersen, linux-scsi, linux-kernel, james.bottomley,
brking, linuxppc-dev
In-Reply-To: <20210106201835.1053593-1-tyreld@linux.ibm.com>
Tyrel,
> The ibmvfc driver in its current form relies heavily on the
> host_lock. This patchset introduces a genric queue with its own queue
> lock and sent/free event list locks. This generic queue allows the
> driver to decouple the primary queue and future subordinate queues
> from the host lock reducing lock contention while also relaxing
> locking for submissions and completions to simply the list lock of the
> queue in question.
Applied to 5.12/scsi-staging, thanks!
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply
* RE: [PATCH 05/11] iov_iter: merge the compat case into rw_copy_check_uvector
From: David Laight @ 2021-01-08 11:49 UTC (permalink / raw)
To: 'Christoph Hellwig', Alexander Viro
Cc: Jens Axboe, linux-s390@vger.kernel.org,
linux-arch@vger.kernel.org, linux-parisc@vger.kernel.org,
Arnd Bergmann, linux-aio@kvack.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
David Howells, linux-fsdevel@vger.kernel.org,
linux-security-module@vger.kernel.org, keyrings@vger.kernel.org,
linux-block@vger.kernel.org, linux-mm@kvack.org,
linux-scsi@vger.kernel.org, sparclinux@vger.kernel.org,
Andrew Morton, linuxppc-dev@lists.ozlabs.org,
io-uring@vger.kernel.org, linux-arm-kernel@lists.infradead.org
In-Reply-To: <20200921143434.707844-6-hch@lst.de>
From: Christoph Hellwig <hch@lst.de>
> Sent: 21 September 2020 15:34
>
> Stop duplicating the iovec verify code, and instead add add a
> __import_iovec helper that does the whole verify and import, but takes
> a bool compat to decided on the native or compat layout. This also
> ends up massively simplifying the calling conventions.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> lib/iov_iter.c | 195 ++++++++++++++++++-------------------------------
> 1 file changed, 70 insertions(+), 125 deletions(-)
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index a64867501a7483..8bfa47b63d39aa 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -10,6 +10,7 @@
> #include <net/checksum.h>
> #include <linux/scatterlist.h>
> #include <linux/instrumented.h>
> +#include <linux/compat.h>
>
> #define PIPE_PARANOIA /* for now */
>
> @@ -1650,43 +1651,76 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
> }
> EXPORT_SYMBOL(dup_iter);
>
> -static ssize_t rw_copy_check_uvector(int type,
> - const struct iovec __user *uvector, unsigned long nr_segs,
> - unsigned long fast_segs, struct iovec *fast_pointer,
> - struct iovec **ret_pointer)
> +static int compat_copy_iovecs_from_user(struct iovec *iov,
> + const struct iovec __user *uvector, unsigned long nr_segs)
> +{
> + const struct compat_iovec __user *uiov =
> + (const struct compat_iovec __user *)uvector;
> + unsigned long i;
> + int ret = -EFAULT;
> +
> + if (!user_access_begin(uvector, nr_segs * sizeof(*uvector)))
> + return -EFAULT;
I little bit late, but the above isn't quite right.
It should be sizeof(*iouv) - the length is double what it should be.
Not that access_ok() can fail for compat addresses
and the extra length won't matter for architectures that
need the address/length to open an address hole into userspace.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply
* [powerpc:merge] BUILD SUCCESS 39a77db53cca8200b8bc99fc0993e127b59f08fb
From: kernel test robot @ 2021-01-08 17:01 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 39a77db53cca8200b8bc99fc0993e127b59f08fb Automatic merge of 'fixes' into merge (2021-01-07 17:43)
elapsed time: 899m
configs tested: 131
configs skipped: 2
The following configs have been built successfully.
More configs may be tested in the coming days.
gcc tested configs:
arm defconfig
arm64 allyesconfig
arm64 defconfig
arm allyesconfig
arm allmodconfig
nios2 allyesconfig
powerpc mvme5100_defconfig
mips ath25_defconfig
sh shx3_defconfig
m68k multi_defconfig
sh lboxre2_defconfig
openrisc defconfig
powerpc arches_defconfig
mips qi_lb60_defconfig
powerpc sbc8548_defconfig
openrisc simple_smp_defconfig
sh se7780_defconfig
arc haps_hs_smp_defconfig
um x86_64_defconfig
sparc64 alldefconfig
arm oxnas_v6_defconfig
sh se7721_defconfig
powerpc mgcoge_defconfig
mips malta_kvm_defconfig
m68k mvme16x_defconfig
powerpc wii_defconfig
riscv rv32_defconfig
m68k alldefconfig
openrisc or1klitex_defconfig
parisc generic-32bit_defconfig
powerpc warp_defconfig
powerpc motionpro_defconfig
sh microdev_defconfig
arm vexpress_defconfig
mips ath79_defconfig
sh rsk7201_defconfig
arm imx_v4_v5_defconfig
sh ap325rxa_defconfig
m68k amcore_defconfig
ia64 alldefconfig
sh sh7785lcr_defconfig
arm h3600_defconfig
m68k atari_defconfig
sparc sparc64_defconfig
c6x evmc6472_defconfig
powerpc sam440ep_defconfig
ia64 tiger_defconfig
powerpc ppc64_defconfig
powerpc tqm8555_defconfig
arm shannon_defconfig
powerpc ksi8560_defconfig
mips malta_kvm_guest_defconfig
powerpc pcm030_defconfig
mips gpr_defconfig
arm spear13xx_defconfig
nios2 3c120_defconfig
mips bigsur_defconfig
powerpc storcenter_defconfig
powerpc mpc7448_hpc2_defconfig
m68k m5272c3_defconfig
sh ecovec24_defconfig
arm eseries_pxa_defconfig
sh r7780mp_defconfig
powerpc acadia_defconfig
arc axs103_defconfig
ia64 allmodconfig
ia64 defconfig
ia64 allyesconfig
m68k allmodconfig
m68k defconfig
m68k allyesconfig
nios2 defconfig
arc allyesconfig
nds32 allnoconfig
c6x allyesconfig
nds32 defconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
parisc defconfig
s390 allyesconfig
parisc allyesconfig
s390 defconfig
i386 allyesconfig
sparc allyesconfig
sparc defconfig
i386 tinyconfig
i386 defconfig
mips allyesconfig
mips allmodconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
x86_64 randconfig-a004-20210108
x86_64 randconfig-a006-20210108
x86_64 randconfig-a001-20210108
x86_64 randconfig-a002-20210108
x86_64 randconfig-a003-20210108
x86_64 randconfig-a005-20210108
i386 randconfig-a005-20210108
i386 randconfig-a002-20210108
i386 randconfig-a001-20210108
i386 randconfig-a003-20210108
i386 randconfig-a006-20210108
i386 randconfig-a004-20210108
i386 randconfig-a016-20210108
i386 randconfig-a011-20210108
i386 randconfig-a014-20210108
i386 randconfig-a015-20210108
i386 randconfig-a013-20210108
i386 randconfig-a012-20210108
riscv nommu_k210_defconfig
riscv allyesconfig
riscv nommu_virt_defconfig
riscv allnoconfig
riscv defconfig
riscv allmodconfig
x86_64 rhel
x86_64 allyesconfig
x86_64 rhel-7.6-kselftests
x86_64 defconfig
x86_64 rhel-8.3
x86_64 rhel-8.3-kbuiltin
x86_64 kexec
clang tested configs:
x86_64 randconfig-a013-20210108
x86_64 randconfig-a011-20210108
x86_64 randconfig-a012-20210108
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply
* [powerpc:fixes-test] BUILD SUCCESS 3ce47d95b7346dcafd9bed3556a8d072cb2b8571
From: kernel test robot @ 2021-01-08 17:01 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test
branch HEAD: 3ce47d95b7346dcafd9bed3556a8d072cb2b8571 powerpc: Handle .text.{hot,unlikely}.* in linker script
elapsed time: 900m
configs tested: 55
configs skipped: 109
The following configs have been built successfully.
More configs may be tested in the coming days.
gcc tested configs:
arc haps_hs_smp_defconfig
um x86_64_defconfig
sparc64 alldefconfig
arm oxnas_v6_defconfig
sh se7721_defconfig
arm h3600_defconfig
m68k atari_defconfig
sparc sparc64_defconfig
c6x evmc6472_defconfig
mips ath25_defconfig
nios2 3c120_defconfig
mips bigsur_defconfig
powerpc storcenter_defconfig
powerpc mpc7448_hpc2_defconfig
m68k m5272c3_defconfig
sh ecovec24_defconfig
arm eseries_pxa_defconfig
sh r7780mp_defconfig
powerpc acadia_defconfig
arc axs103_defconfig
ia64 allmodconfig
ia64 defconfig
ia64 allyesconfig
nds32 defconfig
nios2 allyesconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
mips allyesconfig
mips allmodconfig
powerpc allyesconfig
powerpc allmodconfig
powerpc allnoconfig
x86_64 randconfig-a004-20210108
x86_64 randconfig-a006-20210108
x86_64 randconfig-a001-20210108
x86_64 randconfig-a002-20210108
x86_64 randconfig-a003-20210108
x86_64 randconfig-a005-20210108
i386 randconfig-a005-20210108
i386 randconfig-a002-20210108
i386 randconfig-a001-20210108
i386 randconfig-a003-20210108
i386 randconfig-a006-20210108
i386 randconfig-a004-20210108
i386 randconfig-a016-20210108
i386 randconfig-a011-20210108
i386 randconfig-a014-20210108
i386 randconfig-a015-20210108
i386 randconfig-a013-20210108
i386 randconfig-a012-20210108
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply
* [powerpc:next-test] BUILD SUCCESS 3e069ffc01566d93fa14c80f3c705c1610b9c402
From: kernel test robot @ 2021-01-08 20:58 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
branch HEAD: 3e069ffc01566d93fa14c80f3c705c1610b9c402 powerpc/pseries/eeh: Make pseries_send_allow_unfreeze() static
elapsed time: 1134m
configs tested: 157
configs skipped: 2
The following configs have been built successfully.
More configs may be tested in the coming days.
gcc tested configs:
arm defconfig
arm64 allyesconfig
arm allyesconfig
arm allmodconfig
arm64 defconfig
powerpc xes_mpc85xx_defconfig
powerpc mpc832x_rdb_defconfig
arm imx_v6_v7_defconfig
x86_64 defconfig
nios2 allyesconfig
powerpc mvme5100_defconfig
mips ath25_defconfig
sh shx3_defconfig
m68k multi_defconfig
sh lboxre2_defconfig
openrisc defconfig
powerpc arches_defconfig
mips qi_lb60_defconfig
powerpc sbc8548_defconfig
openrisc simple_smp_defconfig
sh se7780_defconfig
arc haps_hs_smp_defconfig
um x86_64_defconfig
sparc64 alldefconfig
arm oxnas_v6_defconfig
sh se7721_defconfig
powerpc mgcoge_defconfig
mips malta_kvm_defconfig
m68k mvme16x_defconfig
powerpc wii_defconfig
riscv rv32_defconfig
m68k alldefconfig
openrisc or1klitex_defconfig
parisc generic-32bit_defconfig
powerpc warp_defconfig
arm imx_v4_v5_defconfig
sh ap325rxa_defconfig
m68k amcore_defconfig
ia64 alldefconfig
sh sh7785lcr_defconfig
arm h3600_defconfig
m68k atari_defconfig
sparc sparc64_defconfig
c6x evmc6472_defconfig
powerpc sam440ep_defconfig
mips tb0219_defconfig
mips bmips_stb_defconfig
powerpc ppc6xx_defconfig
mips decstation_64_defconfig
nds32 alldefconfig
sh microdev_defconfig
powerpc mpc7448_hpc2_defconfig
mips workpad_defconfig
mips omega2p_defconfig
sh titan_defconfig
powerpc ppc64e_defconfig
sh dreamcast_defconfig
powerpc allnoconfig
arm pxa3xx_defconfig
riscv nommu_virt_defconfig
sh se7619_defconfig
mips ip22_defconfig
ia64 zx1_defconfig
powerpc ppa8548_defconfig
openrisc or1ksim_defconfig
powerpc mpc885_ads_defconfig
mips malta_kvm_guest_defconfig
i386 allyesconfig
powerpc ppc64_defconfig
powerpc tqm8560_defconfig
mips cobalt_defconfig
nios2 3c120_defconfig
mips bigsur_defconfig
powerpc storcenter_defconfig
m68k m5272c3_defconfig
sh ecovec24_defconfig
powerpc pseries_defconfig
arm lpc32xx_defconfig
powerpc kilauea_defconfig
m68k allyesconfig
arm eseries_pxa_defconfig
sh r7780mp_defconfig
powerpc acadia_defconfig
arc axs103_defconfig
arc axs101_defconfig
mips gcw0_defconfig
mips pic32mzda_defconfig
mips pistachio_defconfig
arm vexpress_defconfig
mips gpr_defconfig
sh hp6xx_defconfig
arm integrator_defconfig
sh rts7751r2dplus_defconfig
ia64 allmodconfig
ia64 defconfig
ia64 allyesconfig
m68k allmodconfig
m68k defconfig
nios2 defconfig
arc allyesconfig
nds32 allnoconfig
c6x allyesconfig
nds32 defconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
arc defconfig
sh allmodconfig
parisc defconfig
s390 allyesconfig
parisc allyesconfig
s390 defconfig
sparc allyesconfig
sparc defconfig
i386 tinyconfig
i386 defconfig
mips allyesconfig
mips allmodconfig
powerpc allyesconfig
powerpc allmodconfig
x86_64 randconfig-a004-20210108
x86_64 randconfig-a006-20210108
x86_64 randconfig-a001-20210108
x86_64 randconfig-a002-20210108
x86_64 randconfig-a003-20210108
x86_64 randconfig-a005-20210108
i386 randconfig-a005-20210108
i386 randconfig-a002-20210108
i386 randconfig-a001-20210108
i386 randconfig-a003-20210108
i386 randconfig-a006-20210108
i386 randconfig-a004-20210108
i386 randconfig-a016-20210108
i386 randconfig-a011-20210108
i386 randconfig-a014-20210108
i386 randconfig-a015-20210108
i386 randconfig-a013-20210108
i386 randconfig-a012-20210108
riscv nommu_k210_defconfig
riscv allyesconfig
riscv allnoconfig
riscv defconfig
riscv allmodconfig
x86_64 rhel
x86_64 allyesconfig
x86_64 rhel-7.6-kselftests
x86_64 rhel-8.3
x86_64 rhel-8.3-kbuiltin
x86_64 kexec
clang tested configs:
x86_64 randconfig-a013-20210108
x86_64 randconfig-a011-20210108
x86_64 randconfig-a012-20210108
x86_64 randconfig-a016-20210108
x86_64 randconfig-a014-20210108
x86_64 randconfig-a015-20210108
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply
* Re: [patch V3 13/37] mips/mm/highmem: Switch to generic kmap atomic
From: Paul Cercueil @ 2021-01-08 20:20 UTC (permalink / raw)
To: tglx
Cc: juri.lelli, linux-aio, airlied, nouveau, bigeasy, dri-devel,
linux-mips, bsegall, jcmvbkbc, ray.huang, paulus, kraxel,
sparclinux, deanbo422, hch, vincent.guittot, paulmck, x86, linux,
linux-csky, mingo, peterz, linux-graphics-maintainer, bskeggs,
airlied, linux-snps-arc, linux-mm, mgorman, linux-xtensa, arnd,
intel-gfx, sroland, josef, rostedt, torvalds, green.hu,
rodrigo.vivi, dsterba, virtualization, dietmar.eggemann,
linux-arm-kernel, chris, monstr, tsbogend, nickhu, clm,
linuxppc-dev, linux-kernel, christian.koenig, bcrl, spice-devel,
vgupta, linux-fsdevel, akpm, bristot, davem, linux-btrfs, viro
Hi Thomas,
5.11 does not boot anymore on Ingenic SoCs, I bisected it to this
commit.
Any idea what could be happening?
Cheers,
-Paul
^ permalink raw reply
* Re: [PATCH] powerpc/mm: Limit allocation of SWIOTLB on server machines
From: Thiago Jung Bauermann @ 2021-01-09 0:27 UTC (permalink / raw)
To: Ram Pai; +Cc: Satheesh Rajendran, linuxppc-dev, linux-kernel
In-Reply-To: <20201224031409.GB4102@ram-ibm-com.ibm.com>
Ram Pai <linuxram@us.ibm.com> writes:
> On Wed, Dec 23, 2020 at 09:06:01PM -0300, Thiago Jung Bauermann wrote:
>>
>> Hi Ram,
>>
>> Thanks for reviewing this patch.
>>
>> Ram Pai <linuxram@us.ibm.com> writes:
>>
>> > On Fri, Dec 18, 2020 at 03:21:03AM -0300, Thiago Jung Bauermann wrote:
>> >> On server-class POWER machines, we don't need the SWIOTLB unless we're a
>> >> secure VM. Nevertheless, if CONFIG_SWIOTLB is enabled we unconditionally
>> >> allocate it.
>> >>
>> >> In most cases this is harmless, but on a few machine configurations (e.g.,
>> >> POWER9 powernv systems with 4 GB area reserved for crashdump kernel) it can
>> >> happen that memblock can't find a 64 MB chunk of memory for the SWIOTLB and
>> >> fails with a scary-looking WARN_ONCE:
>> >>
>> >> ------------[ cut here ]------------
>> >> memblock: bottom-up allocation failed, memory hotremove may be affected
>> >> WARNING: CPU: 0 PID: 0 at mm/memblock.c:332 memblock_find_in_range_node+0x328/0x340
>> >> Modules linked in:
>> >> CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0-rc2-orig+ #6
>> >> NIP: c000000000442f38 LR: c000000000442f34 CTR: c0000000001e0080
>> >> REGS: c000000001def900 TRAP: 0700 Not tainted (5.10.0-rc2-orig+)
>> >> MSR: 9000000002021033 <SF,HV,VEC,ME,IR,DR,RI,LE> CR: 28022222 XER: 20040000
>> >> CFAR: c00000000014b7b4 IRQMASK: 1
>> >> GPR00: c000000000442f34 c000000001defba0 c000000001deff00 0000000000000047
>> >> GPR04: 00000000ffff7fff c000000001def828 c000000001def820 0000000000000000
>> >> GPR08: 0000001ffc3e0000 c000000001b75478 c000000001b75478 0000000000000001
>> >> GPR12: 0000000000002000 c000000002030000 0000000000000000 0000000000000000
>> >> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000002030000
>> >> GPR20: 0000000000000000 0000000000010000 0000000000010000 c000000001defc10
>> >> GPR24: c000000001defc08 c000000001c91868 c000000001defc18 c000000001c91890
>> >> GPR28: 0000000000000000 ffffffffffffffff 0000000004000000 00000000ffffffff
>> >> NIP [c000000000442f38] memblock_find_in_range_node+0x328/0x340
>> >> LR [c000000000442f34] memblock_find_in_range_node+0x324/0x340
>> >> Call Trace:
>> >> [c000000001defba0] [c000000000442f34] memblock_find_in_range_node+0x324/0x340 (unreliable)
>> >> [c000000001defc90] [c0000000015ac088] memblock_alloc_range_nid+0xec/0x1b0
>> >> [c000000001defd40] [c0000000015ac1f8] memblock_alloc_internal+0xac/0x110
>> >> [c000000001defda0] [c0000000015ac4d0] memblock_alloc_try_nid+0x94/0xcc
>> >> [c000000001defe30] [c00000000159c3c8] swiotlb_init+0x78/0x104
>> >> [c000000001defea0] [c00000000158378c] mem_init+0x4c/0x98
>> >> [c000000001defec0] [c00000000157457c] start_kernel+0x714/0xac8
>> >> [c000000001deff90] [c00000000000d244] start_here_common+0x1c/0x58
>> >> Instruction dump:
>> >> 2c230000 4182ffd4 ea610088 ea810090 4bfffe84 39200001 3d42fff4 3c62ff60
>> >> 3863c560 992a8bfc 4bd0881d 60000000 <0fe00000> ea610088 4bfffd94 60000000
>> >> random: get_random_bytes called from __warn+0x128/0x184 with crng_init=0
>> >> ---[ end trace 0000000000000000 ]---
>> >> software IO TLB: Cannot allocate buffer
>> >>
>> >> Unless this is a secure VM the message can actually be ignored, because the
>> >> SWIOTLB isn't needed. Therefore, let's avoid the SWIOTLB in those cases.
>> >
>> > The above warn_on is conveying a genuine warning. Should it be silenced?
>>
>> Not sure I understand your point. This patch doesn't silence the
>> warning, it avoids the problem it is warning about.
>
> Sorry, I should have explained it better. My point is...
>
> If CONFIG_SWIOTLB is enabled, it means that the kernel is
> promising the bounce buffering capability. I know, currently we
> do not have any kernel subsystems that use bounce buffers on
> non-secure-pseries-kernel or powernv-kernel. But that does not
> mean, there wont be any. In case there is such a third-party
> module needing bounce buffering, it wont be able to operate,
> because of the proposed change in your patch.
>
> Is that a good thing or a bad thing, I do not know. I will let
> the experts opine.
Ping? Does anyone else has an opinion on this? The other option I can
think of is changing the crashkernel code to not reserve so much memory
below 4 GB. Other people are considering this option, but it's not
planned for the near future.
Also, there's a patch currently in linux-next which removes the scary
warning because of unrelated reasons:
https://lore.kernel.org/lkml/20201217201214.3414100-2-guro@fb.com
So assuming that the patch above goes in and keeping the assumption that
the swiotlb won't be needed in the powernv machines where I've seen the
warning happen, we can just leave things as they are now.
--
Thiago Jung Bauermann
IBM Linux Technology Center
^ permalink raw reply
* Re: [patch V3 13/37] mips/mm/highmem: Switch to generic kmap atomic
From: Thomas Bogendoerfer @ 2021-01-08 23:58 UTC (permalink / raw)
To: Paul Cercueil
Cc: juri.lelli, linux-aio, airlied, nouveau, bigeasy, dri-devel,
linux-mips, bsegall, jcmvbkbc, ray.huang, paulus, kraxel,
sparclinux, deanbo422, hch, vincent.guittot, paulmck, x86, linux,
linux-csky, mingo, peterz, linux-graphics-maintainer, bskeggs,
airlied, linux-snps-arc, linux-mm, mgorman, linux-xtensa, arnd,
intel-gfx, sroland, josef, rostedt, torvalds, green.hu,
rodrigo.vivi, dsterba, tglx, virtualization, dietmar.eggemann,
linux-arm-kernel, chris, monstr, nickhu, clm, linuxppc-dev,
linux-kernel, christian.koenig, bcrl, spice-devel, vgupta,
linux-fsdevel, akpm, bristot, davem, linux-btrfs, viro
In-Reply-To: <JUTMMQ.NNFWKIUV7UUJ1@crapouillou.net>
On Fri, Jan 08, 2021 at 08:20:43PM +0000, Paul Cercueil wrote:
> Hi Thomas,
>
> 5.11 does not boot anymore on Ingenic SoCs, I bisected it to this commit.
>
> Any idea what could be happening?
not yet, kernel crash log of a Malta QEMU is below.
Thomas.
Kernel bug detected[#1]:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.11.0-rc1-00017-gccb21774863a #2
$ 0 : 00000000 00000001 00000000 00000010
$ 4 : 00000001 000005cf 9e00059f 00000000
$ 8 : 00118173 809e6db8 9e00059f 00000000
$12 : 82023c00 00000001 810da04c 0212422f
$16 : 810da000 00027800 000005cf 80b4bf9c
$20 : 809e968c 82602400 810da000 0000000b
$24 : 021558f9 00000000
$28 : 820e0000 820e3928 80b10000 802710d0
Hi : 0000346c
Lo : 000002dd
epc : 80271114 __kmap_local_pfn_prot+0x78/0x1c0
ra : 802710d0 __kmap_local_pfn_prot+0x34/0x1c0
Status: 1000a403 KERNEL EXL IE
Cause : 00800034 (ExcCode 0d)
PrId : 0001a800 (MIPS P5600)
Modules linked in:
Process swapper/0 (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
Stack : 7fffffff 820c2408 820e3990 ffffff04 ffff0a00 80518224 000081a4 810da000
00000001 000005cf fff64000 8011c77c 820e3b26 ffffff04 ffff0a00 80518440
80b30000 80b4bf64 9e0005cf 000005cf fff64000 80271188 00000000 820e3a60
80b10000 80194478 0000005e 80954406 809e0000 810da000 00000001 000005cf
fff68000 8011c77c 8088fd44 809f6074 000000f4 00000000 00000000 80b4bf68
...
Call Trace:
[<80271114>] __kmap_local_pfn_prot+0x78/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<8011c77c>] __update_cache+0x16c/0x174
[<80271188>] __kmap_local_pfn_prot+0xec/0x1c0
[<802c49a0>] copy_string_kernel+0x168/0x264
[<802c5d18>] kernel_execve+0xd0/0x164
[<801006cc>] try_to_run_init_process+0x18/0x5c
[<80859e0c>] kernel_init+0xd0/0x120
[<801037f8>] ret_from_kernel_thread+0x14/0x1c
Code: 8c630564 28640010 38840001 <00040336> 8f82000c 2463ffff 00021100 00431021 2403ffbf
--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]
^ permalink raw reply
* Re: [patch V3 13/37] mips/mm/highmem: Switch to generic kmap atomic
From: Thomas Bogendoerfer @ 2021-01-09 0:33 UTC (permalink / raw)
To: Paul Cercueil
Cc: juri.lelli, linux-aio, airlied, nouveau, bigeasy, dri-devel,
linux-mips, bsegall, jcmvbkbc, ray.huang, paulus, kraxel,
sparclinux, deanbo422, hch, vincent.guittot, paulmck, x86, linux,
linux-csky, mingo, peterz, linux-graphics-maintainer, bskeggs,
airlied, linux-snps-arc, linux-mm, mgorman, linux-xtensa, arnd,
intel-gfx, sroland, josef, rostedt, torvalds, green.hu,
rodrigo.vivi, dsterba, tglx, virtualization, dietmar.eggemann,
linux-arm-kernel, chris, monstr, nickhu, clm, linuxppc-dev,
linux-kernel, christian.koenig, bcrl, spice-devel, vgupta,
linux-fsdevel, akpm, bristot, davem, linux-btrfs, viro
In-Reply-To: <20210108235805.GA17543@alpha.franken.de>
On Sat, Jan 09, 2021 at 12:58:05AM +0100, Thomas Bogendoerfer wrote:
> On Fri, Jan 08, 2021 at 08:20:43PM +0000, Paul Cercueil wrote:
> > Hi Thomas,
> >
> > 5.11 does not boot anymore on Ingenic SoCs, I bisected it to this commit.
> >
> > Any idea what could be happening?
>
> not yet, kernel crash log of a Malta QEMU is below.
update:
This dirty hack lets the Malta QEMU boot again:
diff --git a/mm/highmem.c b/mm/highmem.c
index c3a9ea7875ef..190cdda1149d 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -515,7 +515,7 @@ void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot)
vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
BUG_ON(!pte_none(*(kmap_pte - idx)));
pteval = pfn_pte(pfn, prot);
- set_pte_at(&init_mm, vaddr, kmap_pte - idx, pteval);
+ set_pte(kmap_pte - idx, pteval);
arch_kmap_local_post_map(vaddr, pteval);
current->kmap_ctrl.pteval[kmap_local_idx()] = pteval;
preempt_enable();
set_pte_at() tries to update cache and could do an kmap_atomic() there.
Not sure, if this is allowed at this point.
Thomas.
--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]
^ permalink raw reply related
* [PATCH v3 7/8] powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess switches
From: Christopher M. Riedl @ 2021-01-09 3:25 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Daniel Axtens
In-Reply-To: <20210109032557.13831-1-cmr@codefail.de>
From: Daniel Axtens <dja@axtens.net>
Add uaccess blocks and use the 'unsafe' versions of functions doing user
access where possible to reduce the number of times uaccess has to be
opened/closed.
There is no 'unsafe' version of copy_siginfo_to_user, so move it
slightly to allow for a "longer" uaccess block.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Co-developed-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal_64.c | 54 +++++++++++++++++++++------------
1 file changed, 34 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 0c7f633b832b..7e62d62ef482 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -848,44 +848,51 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
unsigned long msr __maybe_unused = regs->msr;
frame = get_sigframe(ksig, tsk, sizeof(*frame), 0);
- if (!access_ok(frame, sizeof(*frame)))
- goto badframe;
- err |= __put_user(&frame->info, &frame->pinfo);
- err |= __put_user(&frame->uc, &frame->puc);
- err |= copy_siginfo_to_user(&frame->info, &ksig->info);
- if (err)
+ /* This only applies when calling unsafe_setup_sigcontext() and must be
+ * called before opening the uaccess window.
+ */
+ if (!MSR_TM_ACTIVE(msr))
+ prepare_setup_sigcontext(tsk, 1);
+
+ if (!user_write_access_begin(frame, sizeof(*frame)))
goto badframe;
+ unsafe_put_user(&frame->info, &frame->pinfo, badframe_block);
+ unsafe_put_user(&frame->uc, &frame->puc, badframe_block);
+
/* Create the ucontext. */
- err |= __put_user(0, &frame->uc.uc_flags);
- err |= __save_altstack(&frame->uc.uc_stack, regs->gpr[1]);
+ unsafe_put_user(0, &frame->uc.uc_flags, badframe_block);
+ unsafe_save_altstack(&frame->uc.uc_stack, regs->gpr[1], badframe_block);
if (MSR_TM_ACTIVE(msr)) {
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/* The ucontext_t passed to userland points to the second
* ucontext_t (for transactional state) with its uc_link ptr.
*/
- err |= __put_user(&frame->uc_transact, &frame->uc.uc_link);
+ unsafe_put_user(&frame->uc_transact, &frame->uc.uc_link, badframe_block);
+
+ user_write_access_end();
+
err |= setup_tm_sigcontexts(&frame->uc.uc_mcontext,
&frame->uc_transact.uc_mcontext,
tsk, ksig->sig, NULL,
(unsigned long)ksig->ka.sa.sa_handler,
msr);
+
+ if (!user_write_access_begin(frame, sizeof(struct rt_sigframe)))
+ goto badframe;
+
#endif
} else {
- err |= __put_user(0, &frame->uc.uc_link);
- prepare_setup_sigcontext(tsk, 1);
- if (!user_write_access_begin(frame, sizeof(struct rt_sigframe)))
- return -EFAULT;
- err |= __unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk,
- ksig->sig, NULL,
- (unsigned long)ksig->ka.sa.sa_handler, 1);
- user_write_access_end();
+ unsafe_put_user(0, &frame->uc.uc_link, badframe_block);
+ unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig,
+ NULL, (unsigned long)ksig->ka.sa.sa_handler,
+ 1, badframe_block);
}
- err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
- if (err)
- goto badframe;
+
+ unsafe_copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set), badframe_block);
+ user_write_access_end();
/* Make sure signal handler doesn't get spurious FP exceptions */
tsk->thread.fp_state.fpscr = 0;
@@ -900,6 +907,11 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
regs->nip = (unsigned long) &frame->tramp[0];
}
+
+ /* Save the siginfo outside of the unsafe block. */
+ if (copy_siginfo_to_user(&frame->info, &ksig->info))
+ goto badframe;
+
/* Allocate a dummy caller frame for the signal handler. */
newsp = ((unsigned long)frame) - __SIGNAL_FRAMESIZE;
err |= put_user(regs->gpr[1], (unsigned long __user *)newsp);
@@ -939,6 +951,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
return 0;
+badframe_block:
+ user_write_access_end();
badframe:
signal_fault(current, regs, "handle_rt_signal64", frame);
--
2.26.1
^ permalink raw reply related
* [PATCH v3 5/8] powerpc/signal64: Replace setup_sigcontext() w/ unsafe_setup_sigcontext()
From: Christopher M. Riedl @ 2021-01-09 3:25 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210109032557.13831-1-cmr@codefail.de>
Previously setup_sigcontext() performed a costly KUAP switch on every
uaccess operation. These repeated uaccess switches cause a significant
drop in signal handling performance.
Rewrite setup_sigcontext() to assume that a userspace write access window
is open. Replace all uaccess functions with their 'unsafe' versions
which avoid the repeated uaccess switches.
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal_64.c | 70 ++++++++++++++++++++-------------
1 file changed, 43 insertions(+), 27 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index dd3787f67a78..fd3c1ab9a1ea 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -101,9 +101,13 @@ static void prepare_setup_sigcontext(struct task_struct *tsk, int ctx_has_vsx_re
* Set up the sigcontext for the signal frame.
*/
-static long setup_sigcontext(struct sigcontext __user *sc,
- struct task_struct *tsk, int signr, sigset_t *set,
- unsigned long handler, int ctx_has_vsx_region)
+#define unsafe_setup_sigcontext(sc, tsk, signr, set, handler, \
+ ctx_has_vsx_region, e) \
+ unsafe_op_wrap(__unsafe_setup_sigcontext(sc, tsk, signr, set, \
+ handler, ctx_has_vsx_region), e)
+static long notrace __unsafe_setup_sigcontext(struct sigcontext __user *sc,
+ struct task_struct *tsk, int signr, sigset_t *set,
+ unsigned long handler, int ctx_has_vsx_region)
{
/* When CONFIG_ALTIVEC is set, we _always_ setup v_regs even if the
* process never used altivec yet (MSR_VEC is zero in pt_regs of
@@ -118,20 +122,19 @@ static long setup_sigcontext(struct sigcontext __user *sc,
#endif
struct pt_regs *regs = tsk->thread.regs;
unsigned long msr = regs->msr;
- long err = 0;
/* Force usr to alway see softe as 1 (interrupts enabled) */
unsigned long softe = 0x1;
BUG_ON(tsk != current);
#ifdef CONFIG_ALTIVEC
- err |= __put_user(v_regs, &sc->v_regs);
+ unsafe_put_user(v_regs, &sc->v_regs, efault_out);
/* save altivec registers */
if (tsk->thread.used_vr) {
/* Copy 33 vec registers (vr0..31 and vscr) to the stack */
- err |= __copy_to_user(v_regs, &tsk->thread.vr_state,
- 33 * sizeof(vector128));
+ unsafe_copy_to_user(v_regs, &tsk->thread.vr_state,
+ 33 * sizeof(vector128), efault_out);
/* set MSR_VEC in the MSR value in the frame to indicate that sc->v_reg)
* contains valid data.
*/
@@ -140,12 +143,12 @@ static long setup_sigcontext(struct sigcontext __user *sc,
/* We always copy to/from vrsave, it's 0 if we don't have or don't
* use altivec.
*/
- err |= __put_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33]);
+ unsafe_put_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33], efault_out);
#else /* CONFIG_ALTIVEC */
- err |= __put_user(0, &sc->v_regs);
+ unsafe_put_user(0, &sc->v_regs, efault_out);
#endif /* CONFIG_ALTIVEC */
/* copy fpr regs and fpscr */
- err |= copy_fpr_to_user(&sc->fp_regs, tsk);
+ unsafe_copy_fpr_to_user(&sc->fp_regs, tsk, efault_out);
/*
* Clear the MSR VSX bit to indicate there is no valid state attached
@@ -160,24 +163,27 @@ static long setup_sigcontext(struct sigcontext __user *sc,
*/
if (tsk->thread.used_vsr && ctx_has_vsx_region) {
v_regs += ELF_NVRREG;
- err |= copy_vsx_to_user(v_regs, tsk);
+ unsafe_copy_vsx_to_user(v_regs, tsk, efault_out);
/* set MSR_VSX in the MSR value in the frame to
* indicate that sc->vs_reg) contains valid data.
*/
msr |= MSR_VSX;
}
#endif /* CONFIG_VSX */
- err |= __put_user(&sc->gp_regs, &sc->regs);
+ unsafe_put_user(&sc->gp_regs, &sc->regs, efault_out);
WARN_ON(!FULL_REGS(regs));
- err |= __copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE);
- err |= __put_user(msr, &sc->gp_regs[PT_MSR]);
- err |= __put_user(softe, &sc->gp_regs[PT_SOFTE]);
- err |= __put_user(signr, &sc->signal);
- err |= __put_user(handler, &sc->handler);
+ unsafe_copy_to_user(&sc->gp_regs, regs, GP_REGS_SIZE, efault_out);
+ unsafe_put_user(msr, &sc->gp_regs[PT_MSR], efault_out);
+ unsafe_put_user(softe, &sc->gp_regs[PT_SOFTE], efault_out);
+ unsafe_put_user(signr, &sc->signal, efault_out);
+ unsafe_put_user(handler, &sc->handler, efault_out);
if (set != NULL)
- err |= __put_user(set->sig[0], &sc->oldmask);
+ unsafe_put_user(set->sig[0], &sc->oldmask, efault_out);
- return err;
+ return 0;
+
+efault_out:
+ return -EFAULT;
}
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
@@ -664,12 +670,15 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
if (old_ctx != NULL) {
prepare_setup_sigcontext(current, ctx_has_vsx_region);
- if (!access_ok(old_ctx, ctx_size)
- || setup_sigcontext(&old_ctx->uc_mcontext, current, 0, NULL, 0,
- ctx_has_vsx_region)
- || __copy_to_user(&old_ctx->uc_sigmask,
- ¤t->blocked, sizeof(sigset_t)))
+ if (!user_write_access_begin(old_ctx, ctx_size))
return -EFAULT;
+
+ unsafe_setup_sigcontext(&old_ctx->uc_mcontext, current, 0, NULL,
+ 0, ctx_has_vsx_region, efault_out);
+ unsafe_copy_to_user(&old_ctx->uc_sigmask, ¤t->blocked,
+ sizeof(sigset_t), efault_out);
+
+ user_write_access_end();
}
if (new_ctx == NULL)
return 0;
@@ -698,6 +707,10 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
/* This returns like rt_sigreturn */
set_thread_flag(TIF_RESTOREALL);
return 0;
+
+efault_out:
+ user_write_access_end();
+ return -EFAULT;
}
@@ -849,9 +862,12 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
} else {
err |= __put_user(0, &frame->uc.uc_link);
prepare_setup_sigcontext(tsk, 1);
- err |= setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig,
- NULL, (unsigned long)ksig->ka.sa.sa_handler,
- 1);
+ if (!user_write_access_begin(frame, sizeof(struct rt_sigframe)))
+ return -EFAULT;
+ err |= __unsafe_setup_sigcontext(&frame->uc.uc_mcontext, tsk,
+ ksig->sig, NULL,
+ (unsigned long)ksig->ka.sa.sa_handler, 1);
+ user_write_access_end();
}
err |= __copy_to_user(&frame->uc.uc_sigmask, set, sizeof(*set));
if (err)
--
2.26.1
^ permalink raw reply related
* [PATCH v3 1/8] powerpc/uaccess: Add unsafe_copy_from_user
From: Christopher M. Riedl @ 2021-01-09 3:25 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210109032557.13831-1-cmr@codefail.de>
Implement raw_copy_from_user_allowed() which assumes that userspace read
access is open. Use this new function to implement raw_copy_from_user().
Finally, wrap the new function to follow the usual "unsafe_" convention
of taking a label argument.
The new raw_copy_from_user_allowed() calls non-inline __copy_tofrom_user()
internally. This is still safe to call inside user access blocks formed
with user_*_access_begin()/user_*_access_end() since asm functions are not
instrumented for tracing.
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/include/asm/uaccess.h | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 501c9a79038c..698f3a6d6ae5 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -403,38 +403,45 @@ raw_copy_in_user(void __user *to, const void __user *from, unsigned long n)
}
#endif /* __powerpc64__ */
-static inline unsigned long raw_copy_from_user(void *to,
- const void __user *from, unsigned long n)
+static inline unsigned long
+raw_copy_from_user_allowed(void *to, const void __user *from, unsigned long n)
{
- unsigned long ret;
if (__builtin_constant_p(n) && (n <= 8)) {
- ret = 1;
+ unsigned long ret = 1;
switch (n) {
case 1:
barrier_nospec();
- __get_user_size(*(u8 *)to, from, 1, ret);
+ __get_user_size_allowed(*(u8 *)to, from, 1, ret);
break;
case 2:
barrier_nospec();
- __get_user_size(*(u16 *)to, from, 2, ret);
+ __get_user_size_allowed(*(u16 *)to, from, 2, ret);
break;
case 4:
barrier_nospec();
- __get_user_size(*(u32 *)to, from, 4, ret);
+ __get_user_size_allowed(*(u32 *)to, from, 4, ret);
break;
case 8:
barrier_nospec();
- __get_user_size(*(u64 *)to, from, 8, ret);
+ __get_user_size_allowed(*(u64 *)to, from, 8, ret);
break;
}
if (ret == 0)
return 0;
}
+ return __copy_tofrom_user((__force void __user *)to, from, n);
+}
+
+static inline unsigned long
+raw_copy_from_user(void *to, const void __user *from, unsigned long n)
+{
+ unsigned long ret;
+
barrier_nospec();
allow_read_from_user(from, n);
- ret = __copy_tofrom_user((__force void __user *)to, from, n);
+ ret = raw_copy_from_user_allowed(to, from, n);
prevent_read_from_user(from, n);
return ret;
}
@@ -542,6 +549,9 @@ user_write_access_begin(const void __user *ptr, size_t len)
#define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e)
#define unsafe_put_user(x, p, e) __put_user_goto(x, p, e)
+#define unsafe_copy_from_user(d, s, l, e) \
+ unsafe_op_wrap(raw_copy_from_user_allowed(d, s, l), e)
+
#define unsafe_copy_to_user(d, s, l, e) \
do { \
u8 __user *_dst = (u8 __user *)(d); \
--
2.26.1
^ permalink raw reply related
* [PATCH v3 6/8] powerpc/signal64: Replace restore_sigcontext() w/ unsafe_restore_sigcontext()
From: Christopher M. Riedl @ 2021-01-09 3:25 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210109032557.13831-1-cmr@codefail.de>
Previously restore_sigcontext() performed a costly KUAP switch on every
uaccess operation. These repeated uaccess switches cause a significant
drop in signal handling performance.
Rewrite restore_sigcontext() to assume that a userspace read access
window is open. Replace all uaccess functions with their 'unsafe'
versions which avoid the repeated uaccess switches.
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal_64.c | 68 ++++++++++++++++++++-------------
1 file changed, 41 insertions(+), 27 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index fd3c1ab9a1ea..0c7f633b832b 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -326,14 +326,14 @@ static long setup_tm_sigcontexts(struct sigcontext __user *sc,
/*
* Restore the sigcontext from the signal frame.
*/
-
-static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig,
- struct sigcontext __user *sc)
+#define unsafe_restore_sigcontext(tsk, set, sig, sc, e) \
+ unsafe_op_wrap(__unsafe_restore_sigcontext(tsk, set, sig, sc), e)
+static long notrace __unsafe_restore_sigcontext(struct task_struct *tsk, sigset_t *set,
+ int sig, struct sigcontext __user *sc)
{
#ifdef CONFIG_ALTIVEC
elf_vrreg_t __user *v_regs;
#endif
- unsigned long err = 0;
unsigned long save_r13 = 0;
unsigned long msr;
struct pt_regs *regs = tsk->thread.regs;
@@ -348,27 +348,28 @@ static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig,
save_r13 = regs->gpr[13];
/* copy the GPRs */
- err |= __copy_from_user(regs->gpr, sc->gp_regs, sizeof(regs->gpr));
- err |= __get_user(regs->nip, &sc->gp_regs[PT_NIP]);
+ unsafe_copy_from_user(regs->gpr, sc->gp_regs, sizeof(regs->gpr),
+ efault_out);
+ unsafe_get_user(regs->nip, &sc->gp_regs[PT_NIP], efault_out);
/* get MSR separately, transfer the LE bit if doing signal return */
- err |= __get_user(msr, &sc->gp_regs[PT_MSR]);
+ unsafe_get_user(msr, &sc->gp_regs[PT_MSR], efault_out);
if (sig)
regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
- err |= __get_user(regs->orig_gpr3, &sc->gp_regs[PT_ORIG_R3]);
- err |= __get_user(regs->ctr, &sc->gp_regs[PT_CTR]);
- err |= __get_user(regs->link, &sc->gp_regs[PT_LNK]);
- err |= __get_user(regs->xer, &sc->gp_regs[PT_XER]);
- err |= __get_user(regs->ccr, &sc->gp_regs[PT_CCR]);
+ unsafe_get_user(regs->orig_gpr3, &sc->gp_regs[PT_ORIG_R3], efault_out);
+ unsafe_get_user(regs->ctr, &sc->gp_regs[PT_CTR], efault_out);
+ unsafe_get_user(regs->link, &sc->gp_regs[PT_LNK], efault_out);
+ unsafe_get_user(regs->xer, &sc->gp_regs[PT_XER], efault_out);
+ unsafe_get_user(regs->ccr, &sc->gp_regs[PT_CCR], efault_out);
/* Don't allow userspace to set SOFTE */
set_trap_norestart(regs);
- err |= __get_user(regs->dar, &sc->gp_regs[PT_DAR]);
- err |= __get_user(regs->dsisr, &sc->gp_regs[PT_DSISR]);
- err |= __get_user(regs->result, &sc->gp_regs[PT_RESULT]);
+ unsafe_get_user(regs->dar, &sc->gp_regs[PT_DAR], efault_out);
+ unsafe_get_user(regs->dsisr, &sc->gp_regs[PT_DSISR], efault_out);
+ unsafe_get_user(regs->result, &sc->gp_regs[PT_RESULT], efault_out);
if (!sig)
regs->gpr[13] = save_r13;
if (set != NULL)
- err |= __get_user(set->sig[0], &sc->oldmask);
+ unsafe_get_user(set->sig[0], &sc->oldmask, efault_out);
/*
* Force reload of FP/VEC.
@@ -378,29 +379,28 @@ static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig,
regs->msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1 | MSR_VEC | MSR_VSX);
#ifdef CONFIG_ALTIVEC
- err |= __get_user(v_regs, &sc->v_regs);
- if (err)
- return err;
+ unsafe_get_user(v_regs, &sc->v_regs, efault_out);
if (v_regs && !access_ok(v_regs, 34 * sizeof(vector128)))
return -EFAULT;
/* Copy 33 vec registers (vr0..31 and vscr) from the stack */
if (v_regs != NULL && (msr & MSR_VEC) != 0) {
- err |= __copy_from_user(&tsk->thread.vr_state, v_regs,
- 33 * sizeof(vector128));
+ unsafe_copy_from_user(&tsk->thread.vr_state, v_regs,
+ 33 * sizeof(vector128), efault_out);
tsk->thread.used_vr = true;
} else if (tsk->thread.used_vr) {
memset(&tsk->thread.vr_state, 0, 33 * sizeof(vector128));
}
/* Always get VRSAVE back */
if (v_regs != NULL)
- err |= __get_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33]);
+ unsafe_get_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33],
+ efault_out);
else
tsk->thread.vrsave = 0;
if (cpu_has_feature(CPU_FTR_ALTIVEC))
mtspr(SPRN_VRSAVE, tsk->thread.vrsave);
#endif /* CONFIG_ALTIVEC */
/* restore floating point */
- err |= copy_fpr_from_user(tsk, &sc->fp_regs);
+ unsafe_copy_fpr_from_user(tsk, &sc->fp_regs, efault_out);
#ifdef CONFIG_VSX
/*
* Get additional VSX data. Update v_regs to point after the
@@ -409,14 +409,17 @@ static long restore_sigcontext(struct task_struct *tsk, sigset_t *set, int sig,
*/
v_regs += ELF_NVRREG;
if ((msr & MSR_VSX) != 0) {
- err |= copy_vsx_from_user(tsk, v_regs);
+ unsafe_copy_vsx_from_user(tsk, v_regs, efault_out);
tsk->thread.used_vsr = true;
} else {
for (i = 0; i < 32 ; i++)
tsk->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = 0;
}
#endif
- return err;
+ return 0;
+
+efault_out:
+ return -EFAULT;
}
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
@@ -701,8 +704,14 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
if (__copy_from_user(&set, &new_ctx->uc_sigmask, sizeof(set)))
do_exit(SIGSEGV);
set_current_blocked(&set);
- if (restore_sigcontext(current, NULL, 0, &new_ctx->uc_mcontext))
+
+ if (!user_read_access_begin(new_ctx, ctx_size))
+ return -EFAULT;
+ if (__unsafe_restore_sigcontext(current, NULL, 0, &new_ctx->uc_mcontext)) {
+ user_read_access_end();
do_exit(SIGSEGV);
+ }
+ user_read_access_end();
/* This returns like rt_sigreturn */
set_thread_flag(TIF_RESTOREALL);
@@ -806,8 +815,13 @@ SYSCALL_DEFINE0(rt_sigreturn)
* causing a TM bad thing.
*/
current->thread.regs->msr &= ~MSR_TS_MASK;
- if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext))
+ if (!user_read_access_begin(uc, sizeof(*uc)))
+ return -EFAULT;
+ if (__unsafe_restore_sigcontext(current, NULL, 1, &uc->uc_mcontext)) {
+ user_read_access_end();
goto badframe;
+ }
+ user_read_access_end();
}
if (restore_altstack(&uc->uc_stack))
--
2.26.1
^ permalink raw reply related
* [PATCH v3 2/8] powerpc/signal: Add unsafe_copy_{vsx,fpr}_from_user()
From: Christopher M. Riedl @ 2021-01-09 3:25 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210109032557.13831-1-cmr@codefail.de>
Reuse the "safe" implementation from signal.c except for calling
unsafe_copy_from_user() to copy into a local buffer.
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal.h | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/arch/powerpc/kernel/signal.h b/arch/powerpc/kernel/signal.h
index 2559a681536e..c18402d625f1 100644
--- a/arch/powerpc/kernel/signal.h
+++ b/arch/powerpc/kernel/signal.h
@@ -53,6 +53,33 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
&buf[i], label);\
} while (0)
+#define unsafe_copy_fpr_from_user(task, from, label) do { \
+ struct task_struct *__t = task; \
+ u64 __user *__f = (u64 __user *)from; \
+ u64 buf[ELF_NFPREG]; \
+ int i; \
+ \
+ unsafe_copy_from_user(buf, __f, ELF_NFPREG * sizeof(double), \
+ label); \
+ for (i = 0; i < ELF_NFPREG - 1; i++) \
+ __t->thread.TS_FPR(i) = buf[i]; \
+ __t->thread.fp_state.fpscr = buf[i]; \
+} while (0)
+
+#define unsafe_copy_vsx_from_user(task, from, label) do { \
+ struct task_struct *__t = task; \
+ u64 __user *__f = (u64 __user *)from; \
+ u64 buf[ELF_NVSRHALFREG]; \
+ int i; \
+ \
+ unsafe_copy_from_user(buf, __f, \
+ ELF_NVSRHALFREG * sizeof(double), \
+ label); \
+ for (i = 0; i < ELF_NVSRHALFREG ; i++) \
+ __t->thread.fp_state.fpr[i][TS_VSRLOWOFFSET] = buf[i]; \
+} while (0)
+
+
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
#define unsafe_copy_ckfpr_to_user(to, task, label) do { \
struct task_struct *__t = task; \
@@ -80,6 +107,10 @@ unsigned long copy_ckfpr_from_user(struct task_struct *task, void __user *from);
unsafe_copy_to_user(to, (task)->thread.fp_state.fpr, \
ELF_NFPREG * sizeof(double), label)
+#define unsafe_copy_fpr_from_user(task, from, label) \
+ unsafe_copy_from_user((task)->thread.fp_state.fpr, from, \
+ ELF_NFPREG * sizeof(double), label)
+
static inline unsigned long
copy_fpr_to_user(void __user *to, struct task_struct *task)
{
@@ -115,6 +146,8 @@ copy_ckfpr_from_user(struct task_struct *task, void __user *from)
#else
#define unsafe_copy_fpr_to_user(to, task, label) do { } while (0)
+#define unsafe_copy_fpr_from_user(task, from, label) do { } while (0)
+
static inline unsigned long
copy_fpr_to_user(void __user *to, struct task_struct *task)
{
--
2.26.1
^ permalink raw reply related
* [PATCH v3 4/8] powerpc/signal64: Remove TM ifdefery in middle of if/else block
From: Christopher M. Riedl @ 2021-01-09 3:25 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210109032557.13831-1-cmr@codefail.de>
Rework the messy ifdef breaking up the if-else for TM similar to
commit f1cf4f93de2f ("powerpc/signal32: Remove ifdefery in middle of if/else").
Unlike that commit for ppc32, the ifdef can't be removed entirely since
uc_transact in sigframe depends on CONFIG_PPC_TRANSACTIONAL_MEM.
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal_64.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index b211a8ea4f6e..dd3787f67a78 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -710,9 +710,7 @@ SYSCALL_DEFINE0(rt_sigreturn)
struct pt_regs *regs = current_pt_regs();
struct ucontext __user *uc = (struct ucontext __user *)regs->gpr[1];
sigset_t set;
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
unsigned long msr;
-#endif
/* Always make any pending restarted system calls return -EINTR */
current->restart_block.fn = do_no_restart_syscall;
@@ -762,10 +760,12 @@ SYSCALL_DEFINE0(rt_sigreturn)
* restore_tm_sigcontexts.
*/
regs->msr &= ~MSR_TS_MASK;
+#endif
if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR]))
goto badframe;
if (MSR_TM_ACTIVE(msr)) {
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/* We recheckpoint on return. */
struct ucontext __user *uc_transact;
@@ -778,9 +778,8 @@ SYSCALL_DEFINE0(rt_sigreturn)
if (restore_tm_sigcontexts(current, &uc->uc_mcontext,
&uc_transact->uc_mcontext))
goto badframe;
- } else
#endif
- {
+ } else {
/*
* Fall through, for non-TM restore
*
@@ -818,10 +817,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
unsigned long newsp = 0;
long err = 0;
struct pt_regs *regs = tsk->thread.regs;
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/* Save the thread's msr before get_tm_stackpointer() changes it */
- unsigned long msr = regs->msr;
-#endif
+ unsigned long msr __maybe_unused = regs->msr;
frame = get_sigframe(ksig, tsk, sizeof(*frame), 0);
if (!access_ok(frame, sizeof(*frame)))
@@ -836,8 +833,9 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
/* Create the ucontext. */
err |= __put_user(0, &frame->uc.uc_flags);
err |= __save_altstack(&frame->uc.uc_stack, regs->gpr[1]);
-#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
+
if (MSR_TM_ACTIVE(msr)) {
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/* The ucontext_t passed to userland points to the second
* ucontext_t (for transactional state) with its uc_link ptr.
*/
@@ -847,9 +845,8 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
tsk, ksig->sig, NULL,
(unsigned long)ksig->ka.sa.sa_handler,
msr);
- } else
#endif
- {
+ } else {
err |= __put_user(0, &frame->uc.uc_link);
prepare_setup_sigcontext(tsk, 1);
err |= setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig,
--
2.26.1
^ permalink raw reply related
* [PATCH v3 8/8] powerpc/signal64: Rewrite rt_sigreturn() to minimise uaccess switches
From: Christopher M. Riedl @ 2021-01-09 3:25 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Daniel Axtens
In-Reply-To: <20210109032557.13831-1-cmr@codefail.de>
From: Daniel Axtens <dja@axtens.net>
Add uaccess blocks and use the 'unsafe' versions of functions doing user
access where possible to reduce the number of times uaccess has to be
opened/closed.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Co-developed-by: Christopher M. Riedl <cmr@codefail.de>
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal_64.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index 7e62d62ef482..1b1e39663999 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -784,8 +784,11 @@ SYSCALL_DEFINE0(rt_sigreturn)
regs->msr &= ~MSR_TS_MASK;
#endif
- if (__get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR]))
+ if (!user_read_access_begin(uc, sizeof(*uc)))
goto badframe;
+
+ unsafe_get_user(msr, &uc->uc_mcontext.gp_regs[PT_MSR], badframe_block);
+
if (MSR_TM_ACTIVE(msr)) {
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
/* We recheckpoint on return. */
@@ -793,10 +796,12 @@ SYSCALL_DEFINE0(rt_sigreturn)
/* Trying to start TM on non TM system */
if (!cpu_has_feature(CPU_FTR_TM))
- goto badframe;
+ goto badframe_block;
+
+ unsafe_get_user(uc_transact, &uc->uc_link, badframe_block);
+
+ user_read_access_end();
- if (__get_user(uc_transact, &uc->uc_link))
- goto badframe;
if (restore_tm_sigcontexts(current, &uc->uc_mcontext,
&uc_transact->uc_mcontext))
goto badframe;
@@ -815,12 +820,9 @@ SYSCALL_DEFINE0(rt_sigreturn)
* causing a TM bad thing.
*/
current->thread.regs->msr &= ~MSR_TS_MASK;
- if (!user_read_access_begin(uc, sizeof(*uc)))
- return -EFAULT;
- if (__unsafe_restore_sigcontext(current, NULL, 1, &uc->uc_mcontext)) {
- user_read_access_end();
- goto badframe;
- }
+ unsafe_restore_sigcontext(current, NULL, 1, &uc->uc_mcontext,
+ badframe_block);
+
user_read_access_end();
}
@@ -830,6 +832,8 @@ SYSCALL_DEFINE0(rt_sigreturn)
set_thread_flag(TIF_RESTOREALL);
return 0;
+badframe_block:
+ user_read_access_end();
badframe:
signal_fault(current, regs, "rt_sigreturn", uc);
--
2.26.1
^ permalink raw reply related
* [PATCH v3 0/8] Improve signal performance on PPC64 with KUAP
From: Christopher M. Riedl @ 2021-01-09 3:25 UTC (permalink / raw)
To: linuxppc-dev
As reported by Anton, there is a large penalty to signal handling
performance on radix systems using KUAP. The signal handling code
performs many user access operations, each of which needs to switch the
KUAP permissions bit to open and then close user access. This involves a
costly 'mtspr' operation [0].
There is existing work done on x86 and by Christopher Leroy for PPC32 to
instead open up user access in "blocks" using user_*_access_{begin,end}.
We can do the same in PPC64 to bring performance back up on KUAP-enabled
radix and now also hash MMU systems [1].
Hash MMU KUAP support along with uaccess flush has landed in linuxppc/next
since the last revision. This series also provides a large benefit on hash
with KUAP. However, in the hash implementation of KUAP the user AMR is
always restored during system_call_exception() which cannot be avoided.
Fewer user access switches naturally also result in less uaccess flushing.
The first two patches add some needed 'unsafe' versions of copy-from
functions. While these do not make use of asm-goto they still allow for
avoiding the repeated uaccess switches.
The third patch moves functions called by setup_sigcontext() into a new
prepare_setup_sigcontext() to simplify converting setup_sigcontext()
into an 'unsafe' version which assumes an open uaccess window later.
The fourth patch cleans-up some of the Transactional Memory ifdef stuff
to simplify using uaccess blocks later.
The next two patches rewrite some of the signal64 helper functions to
be 'unsafe'. Finally, the last two patches update the main signal
handling functions to make use of the new 'unsafe' helpers and eliminate
some additional uaccess switching.
I used the will-it-scale signal1 benchmark to measure and compare
performance [2]. The below results are from running a minimal
kernel+initramfs QEMU/KVM guest on a POWER9 Blackbird:
signal1_threads -t1 -s10
| | hash | radix |
| --------------------------- | ------ | ------ |
| linuxppc/next | 118693 | 133296 |
| linuxppc/next w/o KUAP+KUEP | 228911 | 228654 |
| unsafe-signal64 | 200480 | 234067 |
There were questions previously about the unsafe_copy_from_user() and
unsafe_copy_{vsx,fpr}_from_user() implementations in the first two
patches of this series [3][4]. The two results below show the performance
degradations when using the proposed alternate implementations:
| unsafe-signal64-regs | 178688 | 201128 |
| unsafe-signal64-copy | 147443 | 165759 |
Full trees with the two alternate implementations are available [5][6].
[0]: https://github.com/linuxppc/issues/issues/277
[1]: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=196278
[2]: https://github.com/antonblanchard/will-it-scale/blob/master/tests/signal1.c
[3]: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-October/219355.html
[4]: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-October/219351.html
[5]: https://git.sr.ht/~cmr/linux/tree/unsafe-signal64-v3-regs
[6]: https://git.sr.ht/~cmr/linux/tree/unsafe-signal64-v3-copy
v3: * Rebase on latest linuxppc/next
* Reword confusing commit messages
* Add missing comma in macro in signal.h which broke compiles without
CONFIG_ALTIVEC
* Validate hash KUAP signal performance improvements
v2: * Rebase on latest linuxppc/next + Christophe Leroy's PPC32
signal series
* Simplify/remove TM ifdefery similar to PPC32 series and clean
up the uaccess begin/end calls
* Isolate non-inline functions so they are not called when
uaccess window is open
Christopher M. Riedl (6):
powerpc/uaccess: Add unsafe_copy_from_user
powerpc/signal: Add unsafe_copy_{vsx,fpr}_from_user()
powerpc/signal64: Move non-inline functions out of setup_sigcontext()
powerpc/signal64: Remove TM ifdefery in middle of if/else block
powerpc/signal64: Replace setup_sigcontext() w/
unsafe_setup_sigcontext()
powerpc/signal64: Replace restore_sigcontext() w/
unsafe_restore_sigcontext()
Daniel Axtens (2):
powerpc/signal64: Rewrite handle_rt_signal64() to minimise uaccess
switches
powerpc/signal64: Rewrite rt_sigreturn() to minimise uaccess switches
arch/powerpc/include/asm/uaccess.h | 28 ++--
arch/powerpc/kernel/signal.h | 33 ++++
arch/powerpc/kernel/signal_64.c | 237 ++++++++++++++++++-----------
3 files changed, 198 insertions(+), 100 deletions(-)
--
2.26.1
^ permalink raw reply
* [PATCH v3 3/8] powerpc/signal64: Move non-inline functions out of setup_sigcontext()
From: Christopher M. Riedl @ 2021-01-09 3:25 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <20210109032557.13831-1-cmr@codefail.de>
There are non-inline functions which get called in setup_sigcontext() to
save register state to the thread struct. Move these functions into a
separate prepare_setup_sigcontext() function so that
setup_sigcontext() can be refactored later into an "unsafe" version
which assumes an open uaccess window. Non-inline functions should be
avoided when uaccess is open.
The majority of setup_sigcontext() can be refactored to execute in an
"unsafe" context (uaccess window is opened) except for some non-inline
functions. Move these out into a separate prepare_setup_sigcontext()
function which must be called first and before opening up a uaccess
window. A follow-up commit converts setup_sigcontext() to be "unsafe".
Signed-off-by: Christopher M. Riedl <cmr@codefail.de>
---
arch/powerpc/kernel/signal_64.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index f9e4a1ac440f..b211a8ea4f6e 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -79,6 +79,24 @@ static elf_vrreg_t __user *sigcontext_vmx_regs(struct sigcontext __user *sc)
}
#endif
+static void prepare_setup_sigcontext(struct task_struct *tsk, int ctx_has_vsx_region)
+{
+#ifdef CONFIG_ALTIVEC
+ /* save altivec registers */
+ if (tsk->thread.used_vr)
+ flush_altivec_to_thread(tsk);
+ if (cpu_has_feature(CPU_FTR_ALTIVEC))
+ tsk->thread.vrsave = mfspr(SPRN_VRSAVE);
+#endif /* CONFIG_ALTIVEC */
+
+ flush_fp_to_thread(tsk);
+
+#ifdef CONFIG_VSX
+ if (tsk->thread.used_vsr && ctx_has_vsx_region)
+ flush_vsx_to_thread(tsk);
+#endif /* CONFIG_VSX */
+}
+
/*
* Set up the sigcontext for the signal frame.
*/
@@ -97,7 +115,6 @@ static long setup_sigcontext(struct sigcontext __user *sc,
*/
#ifdef CONFIG_ALTIVEC
elf_vrreg_t __user *v_regs = sigcontext_vmx_regs(sc);
- unsigned long vrsave;
#endif
struct pt_regs *regs = tsk->thread.regs;
unsigned long msr = regs->msr;
@@ -112,7 +129,6 @@ static long setup_sigcontext(struct sigcontext __user *sc,
/* save altivec registers */
if (tsk->thread.used_vr) {
- flush_altivec_to_thread(tsk);
/* Copy 33 vec registers (vr0..31 and vscr) to the stack */
err |= __copy_to_user(v_regs, &tsk->thread.vr_state,
33 * sizeof(vector128));
@@ -124,17 +140,10 @@ static long setup_sigcontext(struct sigcontext __user *sc,
/* We always copy to/from vrsave, it's 0 if we don't have or don't
* use altivec.
*/
- vrsave = 0;
- if (cpu_has_feature(CPU_FTR_ALTIVEC)) {
- vrsave = mfspr(SPRN_VRSAVE);
- tsk->thread.vrsave = vrsave;
- }
-
- err |= __put_user(vrsave, (u32 __user *)&v_regs[33]);
+ err |= __put_user(tsk->thread.vrsave, (u32 __user *)&v_regs[33]);
#else /* CONFIG_ALTIVEC */
err |= __put_user(0, &sc->v_regs);
#endif /* CONFIG_ALTIVEC */
- flush_fp_to_thread(tsk);
/* copy fpr regs and fpscr */
err |= copy_fpr_to_user(&sc->fp_regs, tsk);
@@ -150,7 +159,6 @@ static long setup_sigcontext(struct sigcontext __user *sc,
* VMX data.
*/
if (tsk->thread.used_vsr && ctx_has_vsx_region) {
- flush_vsx_to_thread(tsk);
v_regs += ELF_NVRREG;
err |= copy_vsx_to_user(v_regs, tsk);
/* set MSR_VSX in the MSR value in the frame to
@@ -655,6 +663,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
ctx_has_vsx_region = 1;
if (old_ctx != NULL) {
+ prepare_setup_sigcontext(current, ctx_has_vsx_region);
if (!access_ok(old_ctx, ctx_size)
|| setup_sigcontext(&old_ctx->uc_mcontext, current, 0, NULL, 0,
ctx_has_vsx_region)
@@ -842,6 +851,7 @@ int handle_rt_signal64(struct ksignal *ksig, sigset_t *set,
#endif
{
err |= __put_user(0, &frame->uc.uc_link);
+ prepare_setup_sigcontext(tsk, 1);
err |= setup_sigcontext(&frame->uc.uc_mcontext, tsk, ksig->sig,
NULL, (unsigned long)ksig->ka.sa.sa_handler,
1);
--
2.26.1
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox