* Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool
From: Konrad Rzeszutek Wilk @ 2021-01-07 18:00 UTC (permalink / raw)
To: Claire Chang
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
rafael.j.wysocki, Christoph Hellwig, Bartosz Golaszewski,
xen-devel, Thierry Reding, linux-devicetree, will, dan.j.williams,
linuxppc-dev, Rob Herring, boris.ostrovsky, Andy Shevchenko,
jgross, Nicolas Boichat, Greg KH, rdunlap, lkml, Tomasz Figa,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <CALiNf2_dV13jbHqLt-r1eK+dtOcAKBGcWQCVMQn+eL6MuOrETQ@mail.gmail.com>
On Fri, Jan 08, 2021 at 01:39:43AM +0800, Claire Chang wrote:
> On Thu, Jan 7, 2021 at 2:58 AM Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> >
> > On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> > > Introduce the new compatible string, restricted-dma-pool, for restricted
> > > DMA. One can specify the address and length of the restricted DMA memory
> > > region by restricted-dma-pool in the device tree.
> > >
> > > Signed-off-by: Claire Chang <tientzu@chromium.org>
> > > ---
> > > .../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
> > > 1 file changed, 24 insertions(+)
> > >
> > > diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > index e8d3096d922c..44975e2a1fd2 100644
> > > --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > > @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> > > used as a shared pool of DMA buffers for a set of devices. It can
> > > be used by an operating system to instantiate the necessary pool
> > > management subsystem if necessary.
> > > + - restricted-dma-pool: This indicates a region of memory meant to be
> > > + used as a pool of restricted DMA buffers for a set of devices. The
> > > + memory region would be the only region accessible to those devices.
> > > + When using this, the no-map and reusable properties must not be set,
> > > + so the operating system can create a virtual mapping that will be used
> > > + for synchronization. The main purpose for restricted DMA is to
> > > + mitigate the lack of DMA access control on systems without an IOMMU,
> > > + which could result in the DMA accessing the system memory at
> > > + unexpected times and/or unexpected addresses, possibly leading to data
> > > + leakage or corruption. The feature on its own provides a basic level
> > > + of protection against the DMA overwriting buffer contents at
> > > + unexpected times. However, to protect against general data leakage and
> > > + system memory corruption, the system needs to provide way to restrict
> > > + the DMA to a predefined memory region.
> >
> > Heya!
> >
> > I think I am missing something obvious here so please bear with my
> > questions:
> >
> > - This code adds the means of having the SWIOTLB pool tied to a specific
> > memory correct?
>
> It doesn't affect the existing SWIOTLB. It just utilizes the existing SWIOTLB
> code to create another DMA pool tied to a specific memory region for a given set
> of devices. It bounces the streaming DMA (map/unmap) in and out of that region
> and does the memory allocation (dma_direct_alloc) from the same region.
Right, so why can't it follow the same mechanism that Xen SWIOTLB does - which
had exactly the same problem (needed special handling on the pool) - and do
a similar code?
>
> >
> >
> > - Nothing stops the physical device from bypassing the SWIOTLB buffer.
> > That is if an errant device screwed up the length or DMA address, the
> > SWIOTLB would gladly do what the device told it do?
>
> So the system needs to provide a way to lock down the memory access, e.g. MPU.
OK! Would it be prudent to have this in the description above perhaps?
>
> >
> > - This has to be combined with SWIOTLB-force-ish to always use the
> > bounce buffer, otherwise you could still do DMA without using
> > SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?
>
> Since restricted DMA is for the devices that are not behind an IOMMU, I change
> the criteria
> `if (unlikely(swiotlb_force == SWIOTLB_FORCE))`
> to
> `if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)`
> in dma_direct_map_page().
>
> Also, even if SWIOTLB=force, the restricted DMA pool is preferred if available
> (get_io_tlb_mem in https://lore.kernel.org/patchwork/patch/1360995/).
>
> Thanks!
^ permalink raw reply
* Re: [RFC PATCH v3 0/6] Restricted DMA
From: Claire Chang @ 2021-01-07 17:38 UTC (permalink / raw)
To: Florian Fainelli
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
rafael.j.wysocki, Christoph Hellwig, Bartosz Golaszewski,
xen-devel, Thierry Reding, linux-devicetree, will, konrad.wilk,
dan.j.williams, linuxppc-dev, Rob Herring, boris.ostrovsky,
Andy Shevchenko, jgross, Nicolas Boichat, Greg KH, rdunlap, lkml,
Tomasz Figa,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
Jim Quinlan, xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <d7043239-12cf-3636-4726-2e3b90917dc6@gmail.com>
[-- Attachment #1: Type: text/plain, Size: 7552 bytes --]
On Thu, Jan 7, 2021 at 2:48 AM Florian Fainelli <f.fainelli@gmail.com>
wrote:
>
> Hi,
>
> First of all let me say that I am glad that someone is working on a
> upstream solution for this issue, would appreciate if you could CC and
> Jim Quinlan on subsequent submissions.
Sure!
>
>
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > This series implements mitigations for lack of DMA access control on
> > systems without an IOMMU, which could result in the DMA accessing the
> > system memory at unexpected times and/or unexpected addresses, possibly
> > leading to data leakage or corruption.
> >
> > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus
is
> > not behind an IOMMU. As PCI-e, by design, gives the device full access
to
> > system memory, a vulnerability in the Wi-Fi firmware could easily
escalate
> > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> > full chain of exploits; [2], [3]).
> >
> > To mitigate the security concerns, we introduce restricted DMA.
Restricted
> > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of
a
> > specially allocated region and does memory allocation from the same
region.
> > The feature on its own provides a basic level of protection against the
DMA
> > overwriting buffer contents at unexpected times. However, to protect
> > against general data leakage and system memory corruption, the system
needs
> > to provide a way to restrict the DMA to a predefined memory region
(this is
> > usually done at firmware level, e.g. in ATF on some ARM platforms).
>
> Can you explain how ATF gets involved and to what extent it does help,
> besides enforcing a secure region from the ARM CPU's perpsective? Does
> the PCIe root complex not have an IOMMU but can somehow be denied access
> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> still some sort of basic protection that the HW enforces, right?
We need the ATF support for memory MPU (memory protection unit).
Restricted DMA (with reserved-memory in dts) makes sure the predefined
memory
region is for PCIe DMA only, but we still need MPU to locks down PCIe
access to
that specific regions.
>
> On Broadcom STB SoCs we have had something similar for a while however
> and while we don't have an IOMMU for the PCIe bridge, we do have a a
> basic protection mechanism whereby we can configure a region in DRAM to
> be PCIe read/write and CPU read/write which then gets used as the PCIe
> inbound region for the PCIe EP. By default the PCIe bridge is not
> allowed access to DRAM so we must call into a security agent to allow
> the PCIe bridge to access the designated DRAM region.
>
> We have done this using a private CMA area region assigned via Device
> Tree, assigned with a and requiring the PCIe EP driver to use
> dma_alloc_from_contiguous() in order to allocate from this device
> private CMA area. The only drawback with that approach is that it
> requires knowing how much memory you need up front for buffers and DMA
> descriptors that the PCIe EP will need to process. The problem is that
> it requires driver modifications and that does not scale over the number
> of PCIe EP drivers, some we absolutely do not control, but there is no
> need to bounce buffer. Your approach scales better across PCIe EP
> drivers however it does require bounce buffering which could be a
> performance hit.
Only the streaming DMA (map/unmap) needs bounce buffering.
I also added alloc/free support in this series
(https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc()
will
try to allocate memory from the predefined memory region.
As for the performance hit, it should be similar to the default swiotlb.
Here are my experiment results. Both SoCs lack IOMMU for PCIe.
PCIe wifi vht80 throughput -
MTK SoC tcp_tx tcp_rx udp_tx udp_rx
w/o Restricted DMA 244.1 134.66 312.56 350.79
w/ Restricted DMA 246.95 136.59 363.21 351.99
Rockchip SoC tcp_tx tcp_rx udp_tx udp_rx
w/o Restricted DMA 237.87 133.86 288.28 361.88
w/ Restricted DMA 256.01 130.95 292.28 353.19
The CPU usage doesn't increase too much either.
Although I didn't measure the CPU usage very precisely, it's ~3% with a
single
big core (Cortex-A72) and ~5% with a single small core (Cortex-A53).
Thanks!
>
> Thanks!
On Thu, Jan 7, 2021 at 2:48 AM Florian Fainelli <f.fainelli@gmail.com>
wrote:
> Hi,
>
> First of all let me say that I am glad that someone is working on a
> upstream solution for this issue, would appreciate if you could CC and
> Jim Quinlan on subsequent submissions.
>
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > This series implements mitigations for lack of DMA access control on
> > systems without an IOMMU, which could result in the DMA accessing the
> > system memory at unexpected times and/or unexpected addresses, possibly
> > leading to data leakage or corruption.
> >
> > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> > not behind an IOMMU. As PCI-e, by design, gives the device full access to
> > system memory, a vulnerability in the Wi-Fi firmware could easily
> escalate
> > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> > full chain of exploits; [2], [3]).
> >
> > To mitigate the security concerns, we introduce restricted DMA.
> Restricted
> > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> > specially allocated region and does memory allocation from the same
> region.
> > The feature on its own provides a basic level of protection against the
> DMA
> > overwriting buffer contents at unexpected times. However, to protect
> > against general data leakage and system memory corruption, the system
> needs
> > to provide a way to restrict the DMA to a predefined memory region (this
> is
> > usually done at firmware level, e.g. in ATF on some ARM platforms).
>
> Can you explain how ATF gets involved and to what extent it does help,
> besides enforcing a secure region from the ARM CPU's perpsective? Does
> the PCIe root complex not have an IOMMU but can somehow be denied access
> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> still some sort of basic protection that the HW enforces, right?
>
> On Broadcom STB SoCs we have had something similar for a while however
> and while we don't have an IOMMU for the PCIe bridge, we do have a a
> basic protection mechanism whereby we can configure a region in DRAM to
> be PCIe read/write and CPU read/write which then gets used as the PCIe
> inbound region for the PCIe EP. By default the PCIe bridge is not
> allowed access to DRAM so we must call into a security agent to allow
> the PCIe bridge to access the designated DRAM region.
>
> We have done this using a private CMA area region assigned via Device
> Tree, assigned with a and requiring the PCIe EP driver to use
> dma_alloc_from_contiguous() in order to allocate from this device
> private CMA area. The only drawback with that approach is that it
> requires knowing how much memory you need up front for buffers and DMA
> descriptors that the PCIe EP will need to process. The problem is that
> it requires driver modifications and that does not scale over the number
> of PCIe EP drivers, some we absolutely do not control, but there is no
> need to bounce buffer. Your approach scales better across PCIe EP
> drivers however it does require bounce buffering which could be a
> performance hit.
>
> Thanks!
> --
> Florian
>
[-- Attachment #2: Type: text/html, Size: 8961 bytes --]
^ permalink raw reply
* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Konrad Rzeszutek Wilk @ 2021-01-07 21:19 UTC (permalink / raw)
To: Florian Fainelli
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
xypron.glpk, Joerg Roedel, rafael.j.wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
will, dan.j.williams, linuxppc-dev, Rob Herring, Claire Chang,
boris.ostrovsky, Andy Shevchenko, jgross, Nicolas Boichat,
Greg KH, rdunlap, lkml, Tomasz Figa, list@263.net:IOMMU DRIVERS,
Robin Murphy, bauerman
In-Reply-To: <aa5af7d1-779e-f0f6-e6ba-8040e603523f@gmail.com>
On Thu, Jan 07, 2021 at 10:09:14AM -0800, Florian Fainelli wrote:
> On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:
> > On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
> >> Hi Greg and Konrad,
> >>
> >> This change is intended to be non-arch specific. Any arch that lacks DMA access
> >> control and has devices not behind an IOMMU can make use of it. Could you share
> >> why you think this should be arch specific?
> >
> > The idea behind non-arch specific code is it to be generic. The devicetree
> > is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
> > be in arch specific code.
>
> In premise the same code could be used with an ACPI enabled system with
> an appropriate service to identify the restricted DMA regions and unlock
> them.
Which this patchset is not.
>
> More than 1 architecture requiring this function (ARM and ARM64 are the
> two I can think of needing this immediately) sort of calls for making
> the code architecture agnostic since past 2, you need something that scales.
I believe the use-case is for ARM64 at this moment.
>
> There is already code today under kernel/dma/contiguous.c that is only
> activated on a CONFIG_OF=y && CONFIG_OF_RESERVED_MEM=y system, this is
> no different.
> --
> Florian
^ permalink raw reply
* Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool
From: Florian Fainelli @ 2021-01-07 18:14 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, Claire Chang
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
Joerg Roedel, rafael.j.wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
will, dan.j.williams, linuxppc-dev, Rob Herring, boris.ostrovsky,
Andy Shevchenko, jgross, Nicolas Boichat, Greg KH, rdunlap, lkml,
Tomasz Figa, iommu, xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <20210107180032.GB16519@char.us.oracle.com>
On 1/7/21 10:00 AM, Konrad Rzeszutek Wilk wrote:
>>>
>>>
>>> - Nothing stops the physical device from bypassing the SWIOTLB buffer.
>>> That is if an errant device screwed up the length or DMA address, the
>>> SWIOTLB would gladly do what the device told it do?
>>
>> So the system needs to provide a way to lock down the memory access, e.g. MPU.
>
> OK! Would it be prudent to have this in the description above perhaps?
Yes this is something that must be documented as a requirement for the
restricted DMA pool users, otherwise attempting to do restricted DMA
pool is no different than say, using a device private CMA region.
Without the enforcement, this is just a best effort.
--
Florian
^ permalink raw reply
* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Florian Fainelli @ 2021-01-07 18:09 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, Claire Chang
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
Joerg Roedel, rafael.j.wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
will, dan.j.williams, linuxppc-dev, Rob Herring, boris.ostrovsky,
Andy Shevchenko, jgross, Nicolas Boichat, Greg KH, rdunlap, lkml,
Tomasz Figa, iommu, xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <20210107175740.GA16519@char.us.oracle.com>
On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
>> Hi Greg and Konrad,
>>
>> This change is intended to be non-arch specific. Any arch that lacks DMA access
>> control and has devices not behind an IOMMU can make use of it. Could you share
>> why you think this should be arch specific?
>
> The idea behind non-arch specific code is it to be generic. The devicetree
> is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
> be in arch specific code.
In premise the same code could be used with an ACPI enabled system with
an appropriate service to identify the restricted DMA regions and unlock
them.
More than 1 architecture requiring this function (ARM and ARM64 are the
two I can think of needing this immediately) sort of calls for making
the code architecture agnostic since past 2, you need something that scales.
There is already code today under kernel/dma/contiguous.c that is only
activated on a CONFIG_OF=y && CONFIG_OF_RESERVED_MEM=y system, this is
no different.
--
Florian
^ permalink raw reply
* Re: [RFC PATCH v3 0/6] Restricted DMA
From: Florian Fainelli @ 2021-01-07 17:59 UTC (permalink / raw)
To: Claire Chang
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
Joerg Roedel, rafael.j.wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
will, Konrad Rzeszutek Wilk, dan.j.williams, linuxppc-dev,
Rob Herring, boris.ostrovsky, Andy Shevchenko, jgross,
Nicolas Boichat, Greg KH, rdunlap, lkml, Tomasz Figa, iommu,
Jim Quinlan, xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <CALiNf28sU1VtGB7LeTXExkMwQiCeg8N5arqyEjw0CPZP72R4dg@mail.gmail.com>
On 1/7/21 9:42 AM, Claire Chang wrote:
>> Can you explain how ATF gets involved and to what extent it does help,
>> besides enforcing a secure region from the ARM CPU's perpsective? Does
>> the PCIe root complex not have an IOMMU but can somehow be denied access
>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
>> still some sort of basic protection that the HW enforces, right?
>
> We need the ATF support for memory MPU (memory protection unit).
> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
> region is for PCIe DMA only, but we still need MPU to locks down PCIe access to
> that specific regions.
OK so you do have a protection unit of some sort to enforce which region
in DRAM the PCIE bridge is allowed to access, that makes sense,
otherwise the restricted DMA region would only be a hint but nothing you
can really enforce. This is almost entirely analogous to our systems then.
There may be some value in standardizing on an ARM SMCCC call then since
you already support two different SoC vendors.
>
>>
>> On Broadcom STB SoCs we have had something similar for a while however
>> and while we don't have an IOMMU for the PCIe bridge, we do have a a
>> basic protection mechanism whereby we can configure a region in DRAM to
>> be PCIe read/write and CPU read/write which then gets used as the PCIe
>> inbound region for the PCIe EP. By default the PCIe bridge is not
>> allowed access to DRAM so we must call into a security agent to allow
>> the PCIe bridge to access the designated DRAM region.
>>
>> We have done this using a private CMA area region assigned via Device
>> Tree, assigned with a and requiring the PCIe EP driver to use
>> dma_alloc_from_contiguous() in order to allocate from this device
>> private CMA area. The only drawback with that approach is that it
>> requires knowing how much memory you need up front for buffers and DMA
>> descriptors that the PCIe EP will need to process. The problem is that
>> it requires driver modifications and that does not scale over the number
>> of PCIe EP drivers, some we absolutely do not control, but there is no
>> need to bounce buffer. Your approach scales better across PCIe EP
>> drivers however it does require bounce buffering which could be a
>> performance hit.
>
> Only the streaming DMA (map/unmap) needs bounce buffering.
True, and typically only on transmit since you don't really control
where the sk_buff are allocated from, right? On RX since you need to
hand buffer addresses to the WLAN chip prior to DMA, you can allocate
them from a pool that already falls within the restricted DMA region, right?
> I also added alloc/free support in this series
> (https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
> try to allocate memory from the predefined memory region.
>
> As for the performance hit, it should be similar to the default swiotlb.
> Here are my experiment results. Both SoCs lack IOMMU for PCIe.
>
> PCIe wifi vht80 throughput -
>
> MTK SoC tcp_tx tcp_rx udp_tx udp_rx
> w/o Restricted DMA 244.1 134.66 312.56 350.79
> w/ Restricted DMA 246.95 136.59 363.21 351.99
>
> Rockchip SoC tcp_tx tcp_rx udp_tx udp_rx
> w/o Restricted DMA 237.87 133.86 288.28 361.88
> w/ Restricted DMA 256.01 130.95 292.28 353.19
How come you get better throughput with restricted DMA? Is it because
doing DMA to/from a contiguous region allows for better grouping of
transactions from the DRAM controller's perspective somehow?
>
> The CPU usage doesn't increase too much either.
> Although I didn't measure the CPU usage very precisely, it's ~3% with a single
> big core (Cortex-A72) and ~5% with a single small core (Cortex-A53).
>
> Thanks!
>
>>
>> Thanks!
>> --
>> Florian
--
Florian
^ permalink raw reply
* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Konrad Rzeszutek Wilk @ 2021-01-07 17:57 UTC (permalink / raw)
To: Claire Chang
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
rafael.j.wysocki, Christoph Hellwig, Bartosz Golaszewski,
xen-devel, Thierry Reding, linux-devicetree, will, dan.j.williams,
linuxppc-dev, Rob Herring, boris.ostrovsky, Andy Shevchenko,
jgross, Nicolas Boichat, Greg KH, rdunlap, lkml, Tomasz Figa,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <CALiNf2-HDf6tFcvVgCttr-ta=88ZMH=OvB5XoryTPc6MNvwV+Q@mail.gmail.com>
On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
> Hi Greg and Konrad,
>
> This change is intended to be non-arch specific. Any arch that lacks DMA access
> control and has devices not behind an IOMMU can make use of it. Could you share
> why you think this should be arch specific?
The idea behind non-arch specific code is it to be generic. The devicetree
is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
be in arch specific code.
>
> Thanks!
^ permalink raw reply
* Re: [RFC PATCH v3 0/6] Restricted DMA
From: Claire Chang @ 2021-01-07 17:42 UTC (permalink / raw)
To: Florian Fainelli
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
rafael.j.wysocki, Christoph Hellwig, Bartosz Golaszewski,
xen-devel, Thierry Reding, linux-devicetree, will,
Konrad Rzeszutek Wilk, dan.j.williams, linuxppc-dev, Rob Herring,
boris.ostrovsky, Andy Shevchenko, jgross, Nicolas Boichat,
Greg KH, rdunlap, lkml, Tomasz Figa,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
Jim Quinlan, xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <d7043239-12cf-3636-4726-2e3b90917dc6@gmail.com>
On Thu, Jan 7, 2021 at 2:48 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> Hi,
>
> First of all let me say that I am glad that someone is working on a
> upstream solution for this issue, would appreciate if you could CC and
> Jim Quinlan on subsequent submissions.
Sure!
>
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > This series implements mitigations for lack of DMA access control on
> > systems without an IOMMU, which could result in the DMA accessing the
> > system memory at unexpected times and/or unexpected addresses, possibly
> > leading to data leakage or corruption.
> >
> > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> > not behind an IOMMU. As PCI-e, by design, gives the device full access to
> > system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> > full chain of exploits; [2], [3]).
> >
> > To mitigate the security concerns, we introduce restricted DMA. Restricted
> > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> > specially allocated region and does memory allocation from the same region.
> > The feature on its own provides a basic level of protection against the DMA
> > overwriting buffer contents at unexpected times. However, to protect
> > against general data leakage and system memory corruption, the system needs
> > to provide a way to restrict the DMA to a predefined memory region (this is
> > usually done at firmware level, e.g. in ATF on some ARM platforms).
>
> Can you explain how ATF gets involved and to what extent it does help,
> besides enforcing a secure region from the ARM CPU's perpsective? Does
> the PCIe root complex not have an IOMMU but can somehow be denied access
> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> still some sort of basic protection that the HW enforces, right?
We need the ATF support for memory MPU (memory protection unit).
Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
region is for PCIe DMA only, but we still need MPU to locks down PCIe access to
that specific regions.
>
> On Broadcom STB SoCs we have had something similar for a while however
> and while we don't have an IOMMU for the PCIe bridge, we do have a a
> basic protection mechanism whereby we can configure a region in DRAM to
> be PCIe read/write and CPU read/write which then gets used as the PCIe
> inbound region for the PCIe EP. By default the PCIe bridge is not
> allowed access to DRAM so we must call into a security agent to allow
> the PCIe bridge to access the designated DRAM region.
>
> We have done this using a private CMA area region assigned via Device
> Tree, assigned with a and requiring the PCIe EP driver to use
> dma_alloc_from_contiguous() in order to allocate from this device
> private CMA area. The only drawback with that approach is that it
> requires knowing how much memory you need up front for buffers and DMA
> descriptors that the PCIe EP will need to process. The problem is that
> it requires driver modifications and that does not scale over the number
> of PCIe EP drivers, some we absolutely do not control, but there is no
> need to bounce buffer. Your approach scales better across PCIe EP
> drivers however it does require bounce buffering which could be a
> performance hit.
Only the streaming DMA (map/unmap) needs bounce buffering.
I also added alloc/free support in this series
(https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
try to allocate memory from the predefined memory region.
As for the performance hit, it should be similar to the default swiotlb.
Here are my experiment results. Both SoCs lack IOMMU for PCIe.
PCIe wifi vht80 throughput -
MTK SoC tcp_tx tcp_rx udp_tx udp_rx
w/o Restricted DMA 244.1 134.66 312.56 350.79
w/ Restricted DMA 246.95 136.59 363.21 351.99
Rockchip SoC tcp_tx tcp_rx udp_tx udp_rx
w/o Restricted DMA 237.87 133.86 288.28 361.88
w/ Restricted DMA 256.01 130.95 292.28 353.19
The CPU usage doesn't increase too much either.
Although I didn't measure the CPU usage very precisely, it's ~3% with a single
big core (Cortex-A72) and ~5% with a single small core (Cortex-A53).
Thanks!
>
> Thanks!
> --
> Florian
^ permalink raw reply
* Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool
From: Claire Chang @ 2021-01-07 17:39 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
rafael.j.wysocki, Christoph Hellwig, Bartosz Golaszewski,
xen-devel, Thierry Reding, linux-devicetree, will, dan.j.williams,
linuxppc-dev, Rob Herring, boris.ostrovsky, Andy Shevchenko,
jgross, Nicolas Boichat, Greg KH, rdunlap, lkml, Tomasz Figa,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <20210106185757.GB109735@localhost.localdomain>
On Thu, Jan 7, 2021 at 2:58 AM Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>
> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> > Introduce the new compatible string, restricted-dma-pool, for restricted
> > DMA. One can specify the address and length of the restricted DMA memory
> > region by restricted-dma-pool in the device tree.
> >
> > Signed-off-by: Claire Chang <tientzu@chromium.org>
> > ---
> > .../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
> > 1 file changed, 24 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > index e8d3096d922c..44975e2a1fd2 100644
> > --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> > @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> > used as a shared pool of DMA buffers for a set of devices. It can
> > be used by an operating system to instantiate the necessary pool
> > management subsystem if necessary.
> > + - restricted-dma-pool: This indicates a region of memory meant to be
> > + used as a pool of restricted DMA buffers for a set of devices. The
> > + memory region would be the only region accessible to those devices.
> > + When using this, the no-map and reusable properties must not be set,
> > + so the operating system can create a virtual mapping that will be used
> > + for synchronization. The main purpose for restricted DMA is to
> > + mitigate the lack of DMA access control on systems without an IOMMU,
> > + which could result in the DMA accessing the system memory at
> > + unexpected times and/or unexpected addresses, possibly leading to data
> > + leakage or corruption. The feature on its own provides a basic level
> > + of protection against the DMA overwriting buffer contents at
> > + unexpected times. However, to protect against general data leakage and
> > + system memory corruption, the system needs to provide way to restrict
> > + the DMA to a predefined memory region.
>
> Heya!
>
> I think I am missing something obvious here so please bear with my
> questions:
>
> - This code adds the means of having the SWIOTLB pool tied to a specific
> memory correct?
It doesn't affect the existing SWIOTLB. It just utilizes the existing SWIOTLB
code to create another DMA pool tied to a specific memory region for a given set
of devices. It bounces the streaming DMA (map/unmap) in and out of that region
and does the memory allocation (dma_direct_alloc) from the same region.
>
>
> - Nothing stops the physical device from bypassing the SWIOTLB buffer.
> That is if an errant device screwed up the length or DMA address, the
> SWIOTLB would gladly do what the device told it do?
So the system needs to provide a way to lock down the memory access, e.g. MPU.
>
> - This has to be combined with SWIOTLB-force-ish to always use the
> bounce buffer, otherwise you could still do DMA without using
> SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?
Since restricted DMA is for the devices that are not behind an IOMMU, I change
the criteria
`if (unlikely(swiotlb_force == SWIOTLB_FORCE))`
to
`if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)`
in dma_direct_map_page().
Also, even if SWIOTLB=force, the restricted DMA pool is preferred if available
(get_io_tlb_mem in https://lore.kernel.org/patchwork/patch/1360995/).
Thanks!
^ permalink raw reply
* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Claire Chang @ 2021-01-07 17:39 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
rafael.j.wysocki, Christoph Hellwig, Bartosz Golaszewski,
xen-devel, Thierry Reding, linux-devicetree, will, dan.j.williams,
linuxppc-dev, Rob Herring, boris.ostrovsky, Andy Shevchenko,
jgross, Nicolas Boichat, Greg KH, rdunlap, lkml, Tomasz Figa,
list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <20210106185241.GA109735@localhost.localdomain>
Hi Greg and Konrad,
This change is intended to be non-arch specific. Any arch that lacks DMA access
control and has devices not behind an IOMMU can make use of it. Could you share
why you think this should be arch specific?
Thanks!
^ permalink raw reply
* [PATCH v2] powerpc/mce: Remove per cpu variables from MCE handlers
From: Ganesh Goudar @ 2021-01-07 11:00 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: Ganesh Goudar, mahesh, npiggin
Access to per-cpu variables requires translation to be enabled on
pseries machine running in hash mmu mode, Since part of MCE handler
runs in realmode and part of MCE handling code is shared between ppc
architectures pseries and powernv, it becomes difficult to manage
these variables differently on different architectures, So have
these variables in paca instead of having them as per-cpu variables
to avoid complications.
Maximum recursive depth of MCE is 4, Considering the maximum depth
allowed reduce the size of event to 10 from 100.
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
---
v2: Dynamically allocate memory for machine check event info
---
arch/powerpc/include/asm/mce.h | 21 +++++++-
arch/powerpc/include/asm/paca.h | 4 ++
arch/powerpc/kernel/mce.c | 86 ++++++++++++++++++------------
arch/powerpc/kernel/setup-common.c | 2 +-
4 files changed, 78 insertions(+), 35 deletions(-)
diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index e6c27ae843dc..8d6e3a7a9f37 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -204,7 +204,18 @@ struct mce_error_info {
bool ignore_event;
};
-#define MAX_MC_EVT 100
+#define MAX_MC_EVT 10
+
+struct mce_info {
+ int mce_nest_count;
+ struct machine_check_event mce_event[MAX_MC_EVT];
+ /* Queue for delayed MCE events. */
+ int mce_queue_count;
+ struct machine_check_event mce_event_queue[MAX_MC_EVT];
+ /* Queue for delayed MCE UE events. */
+ int mce_ue_count;
+ struct machine_check_event mce_ue_event_queue[MAX_MC_EVT];
+};
/* Release flags for get_mce_event() */
#define MCE_EVENT_RELEASE true
@@ -233,5 +244,13 @@ long __machine_check_early_realmode_p7(struct pt_regs *regs);
long __machine_check_early_realmode_p8(struct pt_regs *regs);
long __machine_check_early_realmode_p9(struct pt_regs *regs);
long __machine_check_early_realmode_p10(struct pt_regs *regs);
+#define get_mce_info() local_paca->mce_info
+#endif /* CONFIG_PPC_BOOK3S_64 */
+
+#ifdef CONFIG_PPC_BOOK3S_64
+void mce_init(void);
+#else
+static inline void mce_init(void) { };
#endif /* CONFIG_PPC_BOOK3S_64 */
+
#endif /* __ASM_PPC64_MCE_H__ */
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 9454d29ff4b4..38e0c55e845d 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -29,6 +29,7 @@
#include <asm/hmi.h>
#include <asm/cpuidle.h>
#include <asm/atomic.h>
+#include <asm/mce.h>
#include <asm-generic/mmiowb_types.h>
@@ -273,6 +274,9 @@ struct paca_struct {
#ifdef CONFIG_MMIOWB
struct mmiowb_state mmiowb_state;
#endif
+#ifdef CONFIG_PPC_BOOK3S_64
+ struct mce_info *mce_info;
+#endif /* CONFIG_PPC_BOOK3S_64 */
} ____cacheline_aligned;
extern void copy_mm_to_paca(struct mm_struct *mm);
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 9f3e133b57b7..14142ddbedf2 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -17,23 +17,12 @@
#include <linux/irq_work.h>
#include <linux/extable.h>
#include <linux/ftrace.h>
+#include <linux/memblock.h>
#include <asm/machdep.h>
#include <asm/mce.h>
#include <asm/nmi.h>
-static DEFINE_PER_CPU(int, mce_nest_count);
-static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event);
-
-/* Queue for delayed MCE events. */
-static DEFINE_PER_CPU(int, mce_queue_count);
-static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event_queue);
-
-/* Queue for delayed MCE UE events. */
-static DEFINE_PER_CPU(int, mce_ue_count);
-static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT],
- mce_ue_event_queue);
-
static void machine_check_process_queued_event(struct irq_work *work);
static void machine_check_ue_irq_work(struct irq_work *work);
static void machine_check_ue_event(struct machine_check_event *evt);
@@ -103,8 +92,8 @@ void save_mce_event(struct pt_regs *regs, long handled,
struct mce_error_info *mce_err,
uint64_t nip, uint64_t addr, uint64_t phys_addr)
{
- int index = __this_cpu_inc_return(mce_nest_count) - 1;
- struct machine_check_event *mce = this_cpu_ptr(&mce_event[index]);
+ int index = get_mce_info()->mce_nest_count++;
+ struct machine_check_event *mce = &get_mce_info()->mce_event[index];
/*
* Return if we don't have enough space to log mce event.
@@ -191,7 +180,7 @@ void save_mce_event(struct pt_regs *regs, long handled,
*/
int get_mce_event(struct machine_check_event *mce, bool release)
{
- int index = __this_cpu_read(mce_nest_count) - 1;
+ int index = get_mce_info()->mce_nest_count - 1;
struct machine_check_event *mc_evt;
int ret = 0;
@@ -201,7 +190,7 @@ int get_mce_event(struct machine_check_event *mce, bool release)
/* Check if we have MCE info to process. */
if (index < MAX_MC_EVT) {
- mc_evt = this_cpu_ptr(&mce_event[index]);
+ mc_evt = &get_mce_info()->mce_event[index];
/* Copy the event structure and release the original */
if (mce)
*mce = *mc_evt;
@@ -211,7 +200,7 @@ int get_mce_event(struct machine_check_event *mce, bool release)
}
/* Decrement the count to free the slot. */
if (release)
- __this_cpu_dec(mce_nest_count);
+ get_mce_info()->mce_nest_count--;
return ret;
}
@@ -233,13 +222,13 @@ static void machine_check_ue_event(struct machine_check_event *evt)
{
int index;
- index = __this_cpu_inc_return(mce_ue_count) - 1;
+ index = get_mce_info()->mce_ue_count++;
/* If queue is full, just return for now. */
if (index >= MAX_MC_EVT) {
- __this_cpu_dec(mce_ue_count);
+ get_mce_info()->mce_ue_count--;
return;
}
- memcpy(this_cpu_ptr(&mce_ue_event_queue[index]), evt, sizeof(*evt));
+ memcpy(&get_mce_info()->mce_ue_event_queue[index], evt, sizeof(*evt));
/* Queue work to process this event later. */
irq_work_queue(&mce_ue_event_irq_work);
@@ -256,13 +245,13 @@ void machine_check_queue_event(void)
if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
return;
- index = __this_cpu_inc_return(mce_queue_count) - 1;
+ index = get_mce_info()->mce_queue_count++;
/* If queue is full, just return for now. */
if (index >= MAX_MC_EVT) {
- __this_cpu_dec(mce_queue_count);
+ get_mce_info()->mce_queue_count--;
return;
}
- memcpy(this_cpu_ptr(&mce_event_queue[index]), &evt, sizeof(evt));
+ memcpy(&get_mce_info()->mce_event_queue[index], &evt, sizeof(evt));
/* Queue irq work to process this event later. */
irq_work_queue(&mce_event_process_work);
@@ -289,9 +278,9 @@ static void machine_process_ue_event(struct work_struct *work)
int index;
struct machine_check_event *evt;
- while (__this_cpu_read(mce_ue_count) > 0) {
- index = __this_cpu_read(mce_ue_count) - 1;
- evt = this_cpu_ptr(&mce_ue_event_queue[index]);
+ while (get_mce_info()->mce_ue_count > 0) {
+ index = get_mce_info()->mce_ue_count - 1;
+ evt = &get_mce_info()->mce_ue_event_queue[index];
blocking_notifier_call_chain(&mce_notifier_list, 0, evt);
#ifdef CONFIG_MEMORY_FAILURE
/*
@@ -304,7 +293,7 @@ static void machine_process_ue_event(struct work_struct *work)
*/
if (evt->error_type == MCE_ERROR_TYPE_UE) {
if (evt->u.ue_error.ignore_event) {
- __this_cpu_dec(mce_ue_count);
+ get_mce_info()->mce_ue_count--;
continue;
}
@@ -320,7 +309,7 @@ static void machine_process_ue_event(struct work_struct *work)
"was generated\n");
}
#endif
- __this_cpu_dec(mce_ue_count);
+ get_mce_info()->mce_ue_count--;
}
}
/*
@@ -338,17 +327,17 @@ static void machine_check_process_queued_event(struct irq_work *work)
* For now just print it to console.
* TODO: log this error event to FSP or nvram.
*/
- while (__this_cpu_read(mce_queue_count) > 0) {
- index = __this_cpu_read(mce_queue_count) - 1;
- evt = this_cpu_ptr(&mce_event_queue[index]);
+ while (get_mce_info()->mce_queue_count > 0) {
+ index = get_mce_info()->mce_queue_count - 1;
+ evt = &get_mce_info()->mce_event_queue[index];
if (evt->error_type == MCE_ERROR_TYPE_UE &&
evt->u.ue_error.ignore_event) {
- __this_cpu_dec(mce_queue_count);
+ get_mce_info()->mce_queue_count--;
continue;
}
machine_check_print_event_info(evt, false, false);
- __this_cpu_dec(mce_queue_count);
+ get_mce_info()->mce_queue_count--;
}
}
@@ -741,3 +730,34 @@ long hmi_exception_realmode(struct pt_regs *regs)
return 1;
}
+
+void __init mce_init(void)
+{
+ int i, size;
+ struct mce_info *mce_info;
+
+ if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
+ size = sizeof(struct mce_info) * num_possible_cpus();
+ mce_info = memblock_alloc_try_nid(size, sizeof(struct mce_info),
+ MEMBLOCK_LOW_LIMIT,
+ ppc64_rma_size,
+ NUMA_NO_NODE);
+ if (!mce_info)
+ goto err;
+ for_each_possible_cpu(i)
+ paca_ptrs[i]->mce_info = mce_info + i;
+ return;
+ }
+ for_each_possible_cpu(i) {
+ mce_info =
+ memblock_alloc_node(sizeof(struct mce_info),
+ __alignof__(*paca_ptrs[i]->mce_info),
+ cpu_to_node(i));
+ if (!mce_info)
+ goto err;
+ paca_ptrs[i]->mce_info = mce_info;
+ }
+ return;
+err:
+ panic("Failed allocate memory MCE event data\n");
+}
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 71f38e9248be..17dc451f0e45 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -916,7 +916,6 @@ void __init setup_arch(char **cmdline_p)
/* On BookE, setup per-core TLB data structures. */
setup_tlb_core_data();
#endif
-
/* Print various info about the machine that has been gathered so far. */
print_system_info();
@@ -938,6 +937,7 @@ void __init setup_arch(char **cmdline_p)
exc_lvl_early_init();
emergency_stack_init();
+ mce_init();
smp_release_cpus();
initmem_init();
--
2.26.2
^ permalink raw reply related
* Re: [PATCH v3 1/2] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
From: Bharata B Rao @ 2021-01-07 4:08 UTC (permalink / raw)
To: Fabiano Rosas; +Cc: aneesh.kumar, npiggin, kvm-ppc, linuxppc-dev, david
In-Reply-To: <87k0sp95g0.fsf@linux.ibm.com>
On Wed, Jan 06, 2021 at 05:27:27PM -0300, Fabiano Rosas wrote:
> Bharata B Rao <bharata@linux.ibm.com> writes:
> > +
> > +long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
> > + unsigned long type, unsigned long pg_sizes,
> > + unsigned long start, unsigned long end)
> > +{
> > + struct kvm_nested_guest *gp;
> > + long ret;
> > + unsigned long psize, ap;
> > +
> > + /*
> > + * If L2 lpid isn't valid, we need to return H_PARAMETER.
> > + *
> > + * However, nested KVM issues a L2 lpid flush call when creating
> > + * partition table entries for L2. This happens even before the
> > + * corresponding shadow lpid is created in HV which happens in
> > + * H_ENTER_NESTED call. Since we can't differentiate this case from
> > + * the invalid case, we ignore such flush requests and return success.
> > + */
>
> So for a nested lpid the H_TLB_INVALIDATE in:
>
> kvmppc_core_init_vm_hv -> kvmppc_setup_partition_table ->
> kvmhv_set_ptbl_entry -> kvmhv_flush_lpid
>
> has always been a noop? It seems that we could just skip
> kvmhv_flush_lpid in L1 during init_vm then.
May be, but I suppose that flush is required and could be fixed
eventually.
Regards,
Bharata.
^ permalink raw reply
* [PATCH] powerpc/pseries/dlpar: handle ibm, configure-connector delay status
From: Nathan Lynch @ 2021-01-07 2:59 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, brking
dlpar_configure_connector() has two problems in its handling of
ibm,configure-connector's return status:
1. When the status is -2 (busy, call again), we call
ibm,configure-connector again immediately without checking whether
to schedule, which can result in monopolizing the CPU.
2. Extended delay status (9900..9905) goes completely unhandled,
causing the configuration to unnecessarily terminate.
Fix both of these issues by using rtas_busy_delay().
Fixes: ab519a011caa ("powerpc/pseries: Kernel DLPAR Infrastructure")
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
arch/powerpc/platforms/pseries/dlpar.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
index 16e86ba8aa20..f6b7749d6ada 100644
--- a/arch/powerpc/platforms/pseries/dlpar.c
+++ b/arch/powerpc/platforms/pseries/dlpar.c
@@ -127,7 +127,6 @@ void dlpar_free_cc_nodes(struct device_node *dn)
#define NEXT_PROPERTY 3
#define PREV_PARENT 4
#define MORE_MEMORY 5
-#define CALL_AGAIN -2
#define ERR_CFG_USE -9003
struct device_node *dlpar_configure_connector(__be32 drc_index,
@@ -168,6 +167,9 @@ struct device_node *dlpar_configure_connector(__be32 drc_index,
spin_unlock(&rtas_data_buf_lock);
+ if (rtas_busy_delay(rc))
+ continue;
+
switch (rc) {
case COMPLETE:
break;
@@ -216,9 +218,6 @@ struct device_node *dlpar_configure_connector(__be32 drc_index,
last_dn = last_dn->parent;
break;
- case CALL_AGAIN:
- break;
-
case MORE_MEMORY:
case ERR_CFG_USE:
default:
--
2.29.2
^ permalink raw reply related
* [PATCH] ibmvfc: fix missing cast of ibmvfc_event pointer to u64 handle
From: Tyrel Datwyler @ 2021-01-06 20:37 UTC (permalink / raw)
To: james.bottomley
Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel, brking,
linuxppc-dev, kernel test robot
Commit 2aa0102c6688 ("scsi: ibmvfc: Use correlation token to tag
commands") sets the vfcFrame correlation token to the pointer handle of
the associated ibmvfc_event. However, that commit failed to cast the
pointer to an appropriate type which in this case is a u64. As such
sparse warnings are generated for both correlation token assignments.
ibmvfc.c:2375:36: sparse: incorrect type in argument 1 (different base types)
ibmvfc.c:2375:36: sparse: expected unsigned long long [usertype] val
ibmvfc.c:2375:36: sparse: got struct ibmvfc_event *[assigned] evt
Add the apporpriate u64 casts when assigning an ibmvfc_event as a
correlation token.
Fixes: Commit 2aa0102c6688 ("scsi: ibmvfc: Use correlation token to tag commands")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 42e4d35e0d35..7312f31df878 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -1744,7 +1744,7 @@ static int ibmvfc_queuecommand_lck(struct scsi_cmnd *cmnd,
iu->pri_task_attr = IBMVFC_SIMPLE_TASK;
}
- vfc_cmd->correlation = cpu_to_be64(evt);
+ vfc_cmd->correlation = cpu_to_be64((u64)evt);
if (likely(!(rc = ibmvfc_map_sg_data(cmnd, evt, vfc_cmd, vhost->dev))))
return ibmvfc_send_event(evt, vhost, 0);
@@ -2418,7 +2418,7 @@ static int ibmvfc_abort_task_set(struct scsi_device *sdev)
tmf->flags = cpu_to_be16((IBMVFC_NO_MEM_DESC | IBMVFC_TMF));
evt->sync_iu = &rsp_iu;
- tmf->correlation = cpu_to_be64(evt);
+ tmf->correlation = cpu_to_be64((u64)evt);
init_completion(&evt->comp);
rsp_rc = ibmvfc_send_event(evt, vhost, default_timeout);
--
2.27.0
^ permalink raw reply related
* [PATCH v2 5/5] ibmvfc: relax locking around ibmvfc_queuecommand
From: Tyrel Datwyler @ 2021-01-06 20:18 UTC (permalink / raw)
To: james.bottomley
Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
Brian King, brking, linuxppc-dev
In-Reply-To: <20210106201835.1053593-1-tyreld@linux.ibm.com>
The drivers queuecommand routine is still wrapped to hold the host lock
for the duration of the call. This will become problematic when moving
to multiple queues due to the lock contention preventing asynchronous
submissions to mulitple queues. There is no real legatimate reason to
hold the host lock, and previous patches have insured proper protection
of moving ibmvfc_event objects between free and sent lists.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index f680f96d5d06..ff86c43b4b33 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -1793,10 +1793,9 @@ static struct ibmvfc_cmd *ibmvfc_init_vfc_cmd(struct ibmvfc_event *evt, struct s
* Returns:
* 0 on success / other on failure
**/
-static int ibmvfc_queuecommand_lck(struct scsi_cmnd *cmnd,
- void (*done) (struct scsi_cmnd *))
+static int ibmvfc_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmnd)
{
- struct ibmvfc_host *vhost = shost_priv(cmnd->device->host);
+ struct ibmvfc_host *vhost = shost_priv(shost);
struct fc_rport *rport = starget_to_rport(scsi_target(cmnd->device));
struct ibmvfc_cmd *vfc_cmd;
struct ibmvfc_fcp_cmd_iu *iu;
@@ -1806,7 +1805,7 @@ static int ibmvfc_queuecommand_lck(struct scsi_cmnd *cmnd,
if (unlikely((rc = fc_remote_port_chkready(rport))) ||
unlikely((rc = ibmvfc_host_chkready(vhost)))) {
cmnd->result = rc;
- done(cmnd);
+ cmnd->scsi_done(cmnd);
return 0;
}
@@ -1814,7 +1813,6 @@ static int ibmvfc_queuecommand_lck(struct scsi_cmnd *cmnd,
evt = ibmvfc_get_event(&vhost->crq);
ibmvfc_init_event(evt, ibmvfc_scsi_done, IBMVFC_CMD_FORMAT);
evt->cmnd = cmnd;
- cmnd->scsi_done = done;
vfc_cmd = ibmvfc_init_vfc_cmd(evt, cmnd->device);
iu = ibmvfc_get_fcp_iu(vhost, vfc_cmd);
@@ -1841,12 +1839,10 @@ static int ibmvfc_queuecommand_lck(struct scsi_cmnd *cmnd,
"Failed to map DMA buffer for command. rc=%d\n", rc);
cmnd->result = DID_ERROR << 16;
- done(cmnd);
+ cmnd->scsi_done(cmnd);
return 0;
}
-static DEF_SCSI_QCMD(ibmvfc_queuecommand)
-
/**
* ibmvfc_sync_completion - Signal that a synchronous command has completed
* @evt: ibmvfc event struct
--
2.27.0
^ permalink raw reply related
* [PATCH v2 4/5] ibmvfc: complete commands outside the host/queue lock
From: Tyrel Datwyler @ 2021-01-06 20:18 UTC (permalink / raw)
To: james.bottomley
Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
Brian King, brking, linuxppc-dev
In-Reply-To: <20210106201835.1053593-1-tyreld@linux.ibm.com>
Drain the command queue and place all commands on a completion list.
Perform command completion on that list outside the host/queue locks.
Further, move purged command compeletions outside the host_lock as well.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 58 ++++++++++++++++++++++++++--------
drivers/scsi/ibmvscsi/ibmvfc.h | 3 +-
2 files changed, 47 insertions(+), 14 deletions(-)
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 69a6401ca504..f680f96d5d06 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -894,7 +894,7 @@ static void ibmvfc_scsi_eh_done(struct ibmvfc_event *evt)
* @purge_list: list head of failed commands
*
* This function runs completions on commands to fail as a result of a
- * host reset or platform migration. Caller must hold host_lock.
+ * host reset or platform migration.
**/
static void ibmvfc_complete_purge(struct list_head *purge_list)
{
@@ -1407,6 +1407,23 @@ static struct ibmvfc_event *ibmvfc_get_event(struct ibmvfc_queue *queue)
return evt;
}
+/**
+ * ibmvfc_locked_done - Calls evt completion with host_lock held
+ * @evt: ibmvfc evt to complete
+ *
+ * All non-scsi command completion callbacks have the expectation that the
+ * host_lock is held. This callback is used by ibmvfc_init_event to wrap a
+ * MAD evt with the host_lock.
+ **/
+static void ibmvfc_locked_done(struct ibmvfc_event *evt)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(evt->vhost->host->host_lock, flags);
+ evt->_done(evt);
+ spin_unlock_irqrestore(evt->vhost->host->host_lock, flags);
+}
+
/**
* ibmvfc_init_event - Initialize fields in an event struct that are always
* required.
@@ -1419,9 +1436,14 @@ static void ibmvfc_init_event(struct ibmvfc_event *evt,
{
evt->cmnd = NULL;
evt->sync_iu = NULL;
- evt->crq.format = format;
- evt->done = done;
evt->eh_comp = NULL;
+ evt->crq.format = format;
+ if (format == IBMVFC_CMD_FORMAT)
+ evt->done = done;
+ else {
+ evt->_done = done;
+ evt->done = ibmvfc_locked_done;
+ }
}
/**
@@ -1640,7 +1662,9 @@ static void ibmvfc_relogin(struct scsi_device *sdev)
struct ibmvfc_host *vhost = shost_priv(sdev->host);
struct fc_rport *rport = starget_to_rport(scsi_target(sdev));
struct ibmvfc_target *tgt;
+ unsigned long flags;
+ spin_lock_irqsave(vhost->host->host_lock, flags);
list_for_each_entry(tgt, &vhost->targets, queue) {
if (rport == tgt->rport) {
ibmvfc_del_tgt(tgt);
@@ -1649,6 +1673,7 @@ static void ibmvfc_relogin(struct scsi_device *sdev)
}
ibmvfc_reinit_host(vhost);
+ spin_unlock_irqrestore(vhost->host->host_lock, flags);
}
/**
@@ -2901,7 +2926,8 @@ static void ibmvfc_handle_async(struct ibmvfc_async_crq *crq,
* @vhost: ibmvfc host struct
*
**/
-static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
+static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost,
+ struct list_head *evt_doneq)
{
long rc;
struct ibmvfc_event *evt = (struct ibmvfc_event *)be64_to_cpu(crq->ioba);
@@ -2972,12 +2998,9 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
return;
}
- del_timer(&evt->timer);
spin_lock(&evt->queue->l_lock);
- list_del(&evt->queue_list);
+ list_move_tail(&evt->queue_list, evt_doneq);
spin_unlock(&evt->queue->l_lock);
- ibmvfc_trc_end(evt);
- evt->done(evt);
}
/**
@@ -3364,8 +3387,10 @@ static void ibmvfc_tasklet(void *data)
struct vio_dev *vdev = to_vio_dev(vhost->dev);
struct ibmvfc_crq *crq;
struct ibmvfc_async_crq *async;
+ struct ibmvfc_event *evt, *temp;
unsigned long flags;
int done = 0;
+ LIST_HEAD(evt_doneq);
spin_lock_irqsave(vhost->host->host_lock, flags);
spin_lock(vhost->crq.q_lock);
@@ -3379,7 +3404,7 @@ static void ibmvfc_tasklet(void *data)
/* Pull all the valid messages off the CRQ */
while ((crq = ibmvfc_next_crq(vhost)) != NULL) {
- ibmvfc_handle_crq(crq, vhost);
+ ibmvfc_handle_crq(crq, vhost, &evt_doneq);
crq->valid = 0;
wmb();
}
@@ -3392,7 +3417,7 @@ static void ibmvfc_tasklet(void *data)
wmb();
} else if ((crq = ibmvfc_next_crq(vhost)) != NULL) {
vio_disable_interrupts(vdev);
- ibmvfc_handle_crq(crq, vhost);
+ ibmvfc_handle_crq(crq, vhost, &evt_doneq);
crq->valid = 0;
wmb();
} else
@@ -3401,6 +3426,13 @@ static void ibmvfc_tasklet(void *data)
spin_unlock(vhost->crq.q_lock);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
+
+ list_for_each_entry_safe(evt, temp, &evt_doneq, queue_list) {
+ del_timer(&evt->timer);
+ list_del(&evt->queue_list);
+ ibmvfc_trc_end(evt);
+ evt->done(evt);
+ }
}
/**
@@ -4790,8 +4822,8 @@ static void ibmvfc_do_work(struct ibmvfc_host *vhost)
case IBMVFC_HOST_ACTION_RESET:
vhost->action = IBMVFC_HOST_ACTION_TGT_DEL;
list_splice_init(&vhost->purge, &purge);
- ibmvfc_complete_purge(&purge);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
+ ibmvfc_complete_purge(&purge);
rc = ibmvfc_reset_crq(vhost);
spin_lock_irqsave(vhost->host->host_lock, flags);
if (rc == H_CLOSED)
@@ -4805,8 +4837,8 @@ static void ibmvfc_do_work(struct ibmvfc_host *vhost)
case IBMVFC_HOST_ACTION_REENABLE:
vhost->action = IBMVFC_HOST_ACTION_TGT_DEL;
list_splice_init(&vhost->purge, &purge);
- ibmvfc_complete_purge(&purge);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
+ ibmvfc_complete_purge(&purge);
rc = ibmvfc_reenable_crq_queue(vhost);
spin_lock_irqsave(vhost->host->host_lock, flags);
if (rc || (rc = ibmvfc_send_crq_init(vhost))) {
@@ -5369,8 +5401,8 @@ static int ibmvfc_remove(struct vio_dev *vdev)
spin_lock_irqsave(vhost->host->host_lock, flags);
ibmvfc_purge_requests(vhost, DID_ERROR);
list_splice_init(&vhost->purge, &purge);
- ibmvfc_complete_purge(&purge);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
+ ibmvfc_complete_purge(&purge);
ibmvfc_free_event_pool(vhost, &vhost->crq);
ibmvfc_free_mem(vhost);
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index faf5b50d65b9..632e977449c5 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -733,7 +733,8 @@ struct ibmvfc_event {
struct scsi_cmnd *cmnd;
atomic_t free;
union ibmvfc_iu *xfer_iu;
- void (*done) (struct ibmvfc_event *);
+ void (*done)(struct ibmvfc_event *evt);
+ void (*_done)(struct ibmvfc_event *evt);
struct ibmvfc_crq crq;
union ibmvfc_iu iu;
union ibmvfc_iu *sync_iu;
--
2.27.0
^ permalink raw reply related
* [PATCH v2 0/5] ibmvfc: MQ preparatory locking work
From: Tyrel Datwyler @ 2021-01-06 20:18 UTC (permalink / raw)
To: james.bottomley
Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel, brking,
linuxppc-dev
The ibmvfc driver in its current form relies heavily on the host_lock. This
patchset introduces a genric queue with its own queue lock and sent/free event
list locks. This generic queue allows the driver to decouple the primary queue
and future subordinate queues from the host lock reducing lock contention while
also relaxing locking for submissions and completions to simply the list lock of
the queue in question.
changes in v2:
* Patch 4: Made ibmvfc_locked_done() static fixing a no-prototype warning
Tyrel Datwyler (5):
ibmvfc: define generic queue structure for CRQs
ibmvfc: make command event pool queue specific
ibmvfc: define per-queue state/list locks
ibmvfc: complete commands outside the host/queue lock
ibmvfc: relax locking around ibmvfc_queuecommand
drivers/scsi/ibmvscsi/ibmvfc.c | 379 ++++++++++++++++++++++-----------
drivers/scsi/ibmvscsi/ibmvfc.h | 54 +++--
2 files changed, 286 insertions(+), 147 deletions(-)
--
2.27.0
^ permalink raw reply
* [PATCH v2 3/5] ibmvfc: define per-queue state/list locks
From: Tyrel Datwyler @ 2021-01-06 20:18 UTC (permalink / raw)
To: james.bottomley
Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
Brian King, brking, linuxppc-dev
In-Reply-To: <20210106201835.1053593-1-tyreld@linux.ibm.com>
Define per-queue locks for protecting queue state and event pool
sent/free lists. The evt list lock is initially redundant but it allows
the driver to be modified in the follow-up patches to relax the queue
locking around submissions and completions.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 93 +++++++++++++++++++++++++++-------
drivers/scsi/ibmvscsi/ibmvfc.h | 7 ++-
2 files changed, 80 insertions(+), 20 deletions(-)
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 8de2a25b05ee..69a6401ca504 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -176,8 +176,9 @@ static void ibmvfc_trc_start(struct ibmvfc_event *evt)
struct ibmvfc_mad_common *mad = &evt->iu.mad_common;
struct ibmvfc_fcp_cmd_iu *iu = ibmvfc_get_fcp_iu(vhost, vfc_cmd);
struct ibmvfc_trace_entry *entry;
+ int index = atomic_inc_return(&vhost->trace_index) & IBMVFC_TRACE_INDEX_MASK;
- entry = &vhost->trace[vhost->trace_index++];
+ entry = &vhost->trace[index];
entry->evt = evt;
entry->time = jiffies;
entry->fmt = evt->crq.format;
@@ -211,8 +212,10 @@ static void ibmvfc_trc_end(struct ibmvfc_event *evt)
struct ibmvfc_mad_common *mad = &evt->xfer_iu->mad_common;
struct ibmvfc_fcp_cmd_iu *iu = ibmvfc_get_fcp_iu(vhost, vfc_cmd);
struct ibmvfc_fcp_rsp *rsp = ibmvfc_get_fcp_rsp(vhost, vfc_cmd);
- struct ibmvfc_trace_entry *entry = &vhost->trace[vhost->trace_index++];
+ struct ibmvfc_trace_entry *entry;
+ int index = atomic_inc_return(&vhost->trace_index) & IBMVFC_TRACE_INDEX_MASK;
+ entry = &vhost->trace[index];
entry->evt = evt;
entry->time = jiffies;
entry->fmt = evt->crq.format;
@@ -805,6 +808,7 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
} while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
spin_lock_irqsave(vhost->host->host_lock, flags);
+ spin_lock(vhost->crq.q_lock);
vhost->state = IBMVFC_NO_CRQ;
vhost->logged_in = 0;
@@ -821,6 +825,7 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
dev_warn(vhost->dev, "Partner adapter not ready\n");
else if (rc != 0)
dev_warn(vhost->dev, "Couldn't register crq (rc=%d)\n", rc);
+ spin_unlock(vhost->crq.q_lock);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
return rc;
@@ -853,10 +858,16 @@ static int ibmvfc_valid_event(struct ibmvfc_event_pool *pool,
static void ibmvfc_free_event(struct ibmvfc_event *evt)
{
struct ibmvfc_event_pool *pool = &evt->queue->evt_pool;
+ unsigned long flags;
BUG_ON(!ibmvfc_valid_event(pool, evt));
BUG_ON(atomic_inc_return(&evt->free) != 1);
+
+ spin_lock_irqsave(&evt->queue->l_lock, flags);
list_add_tail(&evt->queue_list, &evt->queue->free);
+ if (evt->eh_comp)
+ complete(evt->eh_comp);
+ spin_unlock_irqrestore(&evt->queue->l_lock, flags);
}
/**
@@ -875,12 +886,27 @@ static void ibmvfc_scsi_eh_done(struct ibmvfc_event *evt)
cmnd->scsi_done(cmnd);
}
- if (evt->eh_comp)
- complete(evt->eh_comp);
-
ibmvfc_free_event(evt);
}
+/**
+ * ibmvfc_complete_purge - Complete failed command list
+ * @purge_list: list head of failed commands
+ *
+ * This function runs completions on commands to fail as a result of a
+ * host reset or platform migration. Caller must hold host_lock.
+ **/
+static void ibmvfc_complete_purge(struct list_head *purge_list)
+{
+ struct ibmvfc_event *evt, *pos;
+
+ list_for_each_entry_safe(evt, pos, purge_list, queue_list) {
+ list_del(&evt->queue_list);
+ ibmvfc_trc_end(evt);
+ evt->done(evt);
+ }
+}
+
/**
* ibmvfc_fail_request - Fail request with specified error code
* @evt: ibmvfc event struct
@@ -897,10 +923,7 @@ static void ibmvfc_fail_request(struct ibmvfc_event *evt, int error_code)
} else
evt->xfer_iu->mad_common.status = cpu_to_be16(IBMVFC_MAD_DRIVER_FAILED);
- list_del(&evt->queue_list);
del_timer(&evt->timer);
- ibmvfc_trc_end(evt);
- evt->done(evt);
}
/**
@@ -914,10 +937,14 @@ static void ibmvfc_fail_request(struct ibmvfc_event *evt, int error_code)
static void ibmvfc_purge_requests(struct ibmvfc_host *vhost, int error_code)
{
struct ibmvfc_event *evt, *pos;
+ unsigned long flags;
ibmvfc_dbg(vhost, "Purging all requests\n");
+ spin_lock_irqsave(&vhost->crq.l_lock, flags);
list_for_each_entry_safe(evt, pos, &vhost->crq.sent, queue_list)
ibmvfc_fail_request(evt, error_code);
+ list_splice_init(&vhost->crq.sent, &vhost->purge);
+ spin_unlock_irqrestore(&vhost->crq.l_lock, flags);
}
/**
@@ -1314,6 +1341,7 @@ static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost,
INIT_LIST_HEAD(&queue->sent);
INIT_LIST_HEAD(&queue->free);
+ spin_lock_init(&queue->l_lock);
for (i = 0; i < pool->size; ++i) {
struct ibmvfc_event *evt = &pool->events[i];
@@ -1368,11 +1396,14 @@ static void ibmvfc_free_event_pool(struct ibmvfc_host *vhost,
static struct ibmvfc_event *ibmvfc_get_event(struct ibmvfc_queue *queue)
{
struct ibmvfc_event *evt;
+ unsigned long flags;
+ spin_lock_irqsave(&queue->l_lock, flags);
BUG_ON(list_empty(&queue->free));
evt = list_entry(queue->free.next, struct ibmvfc_event, queue_list);
atomic_set(&evt->free, 0);
list_del(&evt->queue_list);
+ spin_unlock_irqrestore(&queue->l_lock, flags);
return evt;
}
@@ -1506,6 +1537,7 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
struct ibmvfc_host *vhost, unsigned long timeout)
{
__be64 *crq_as_u64 = (__be64 *) &evt->crq;
+ unsigned long flags;
int rc;
/* Copy the IU into the transfer area */
@@ -1517,7 +1549,6 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
else
BUG();
- list_add_tail(&evt->queue_list, &evt->queue->sent);
timer_setup(&evt->timer, ibmvfc_timeout, 0);
if (timeout) {
@@ -1525,11 +1556,15 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
add_timer(&evt->timer);
}
+ spin_lock_irqsave(&evt->queue->l_lock, flags);
+ list_add_tail(&evt->queue_list, &evt->queue->sent);
+
mb();
if ((rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]),
be64_to_cpu(crq_as_u64[1])))) {
list_del(&evt->queue_list);
+ spin_unlock_irqrestore(&evt->queue->l_lock, flags);
del_timer(&evt->timer);
/* If send_crq returns H_CLOSED, return SCSI_MLQUEUE_HOST_BUSY.
@@ -1554,8 +1589,10 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
evt->xfer_iu->mad_common.status = cpu_to_be16(IBMVFC_MAD_CRQ_ERROR);
evt->done(evt);
- } else
+ } else {
+ spin_unlock_irqrestore(&evt->queue->l_lock, flags);
ibmvfc_trc_start(evt);
+ }
return 0;
}
@@ -1663,9 +1700,6 @@ static void ibmvfc_scsi_done(struct ibmvfc_event *evt)
cmnd->scsi_done(cmnd);
}
- if (evt->eh_comp)
- complete(evt->eh_comp);
-
ibmvfc_free_event(evt);
}
@@ -2219,28 +2253,28 @@ static int ibmvfc_wait_for_ops(struct ibmvfc_host *vhost, void *device,
ENTER;
do {
wait = 0;
- spin_lock_irqsave(vhost->host->host_lock, flags);
+ spin_lock_irqsave(&vhost->crq.l_lock, flags);
list_for_each_entry(evt, &vhost->crq.sent, queue_list) {
if (match(evt, device)) {
evt->eh_comp = ∁
wait++;
}
}
- spin_unlock_irqrestore(vhost->host->host_lock, flags);
+ spin_unlock_irqrestore(&vhost->crq.l_lock, flags);
if (wait) {
timeout = wait_for_completion_timeout(&comp, timeout);
if (!timeout) {
wait = 0;
- spin_lock_irqsave(vhost->host->host_lock, flags);
+ spin_lock_irqsave(&vhost->crq.l_lock, flags);
list_for_each_entry(evt, &vhost->crq.sent, queue_list) {
if (match(evt, device)) {
evt->eh_comp = NULL;
wait++;
}
}
- spin_unlock_irqrestore(vhost->host->host_lock, flags);
+ spin_unlock_irqrestore(&vhost->crq.l_lock, flags);
if (wait)
dev_err(vhost->dev, "Timed out waiting for aborted commands\n");
LEAVE;
@@ -2277,14 +2311,16 @@ static int ibmvfc_cancel_all(struct scsi_device *sdev, int type)
u16 status;
ENTER;
- spin_lock_irqsave(vhost->host->host_lock, flags);
found_evt = NULL;
+ spin_lock_irqsave(vhost->host->host_lock, flags);
+ spin_lock(&vhost->crq.l_lock);
list_for_each_entry(evt, &vhost->crq.sent, queue_list) {
if (evt->cmnd && evt->cmnd->device == sdev) {
found_evt = evt;
break;
}
}
+ spin_unlock(&vhost->crq.l_lock);
if (!found_evt) {
if (vhost->log_level > IBMVFC_DEFAULT_LOG_LEVEL)
@@ -2414,14 +2450,16 @@ static int ibmvfc_abort_task_set(struct scsi_device *sdev)
unsigned long flags, timeout = IBMVFC_ABORT_TIMEOUT;
int rsp_code = 0;
- spin_lock_irqsave(vhost->host->host_lock, flags);
found_evt = NULL;
+ spin_lock_irqsave(vhost->host->host_lock, flags);
+ spin_lock(&vhost->crq.l_lock);
list_for_each_entry(evt, &vhost->crq.sent, queue_list) {
if (evt->cmnd && evt->cmnd->device == sdev) {
found_evt = evt;
break;
}
}
+ spin_unlock(&vhost->crq.l_lock);
if (!found_evt) {
if (vhost->log_level > IBMVFC_DEFAULT_LOG_LEVEL)
@@ -2935,7 +2973,9 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
}
del_timer(&evt->timer);
+ spin_lock(&evt->queue->l_lock);
list_del(&evt->queue_list);
+ spin_unlock(&evt->queue->l_lock);
ibmvfc_trc_end(evt);
evt->done(evt);
}
@@ -3328,6 +3368,7 @@ static void ibmvfc_tasklet(void *data)
int done = 0;
spin_lock_irqsave(vhost->host->host_lock, flags);
+ spin_lock(vhost->crq.q_lock);
while (!done) {
/* Pull all the valid messages off the async CRQ */
while ((async = ibmvfc_next_async_crq(vhost)) != NULL) {
@@ -3358,6 +3399,7 @@ static void ibmvfc_tasklet(void *data)
done = 1;
}
+ spin_unlock(vhost->crq.q_lock);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
}
@@ -4734,6 +4776,7 @@ static void ibmvfc_do_work(struct ibmvfc_host *vhost)
struct ibmvfc_target *tgt;
unsigned long flags;
struct fc_rport *rport;
+ LIST_HEAD(purge);
int rc;
ibmvfc_log_ae(vhost, vhost->events_to_log);
@@ -4746,6 +4789,8 @@ static void ibmvfc_do_work(struct ibmvfc_host *vhost)
break;
case IBMVFC_HOST_ACTION_RESET:
vhost->action = IBMVFC_HOST_ACTION_TGT_DEL;
+ list_splice_init(&vhost->purge, &purge);
+ ibmvfc_complete_purge(&purge);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
rc = ibmvfc_reset_crq(vhost);
spin_lock_irqsave(vhost->host->host_lock, flags);
@@ -4759,6 +4804,8 @@ static void ibmvfc_do_work(struct ibmvfc_host *vhost)
break;
case IBMVFC_HOST_ACTION_REENABLE:
vhost->action = IBMVFC_HOST_ACTION_TGT_DEL;
+ list_splice_init(&vhost->purge, &purge);
+ ibmvfc_complete_purge(&purge);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
rc = ibmvfc_reenable_crq_queue(vhost);
spin_lock_irqsave(vhost->host->host_lock, flags);
@@ -4936,6 +4983,9 @@ static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost,
size_t fmt_size;
ENTER;
+ spin_lock_init(&queue->_lock);
+ queue->q_lock = &queue->_lock;
+
switch (fmt) {
case IBMVFC_CRQ_FMT:
fmt_size = sizeof(*queue->msgs.crq);
@@ -5098,6 +5148,7 @@ static int ibmvfc_alloc_mem(struct ibmvfc_host *vhost)
vhost->trace = kcalloc(IBMVFC_NUM_TRACE_ENTRIES,
sizeof(struct ibmvfc_trace_entry), GFP_KERNEL);
+ atomic_set(&vhost->trace_index, -1);
if (!vhost->trace)
goto free_disc_buffer;
@@ -5214,6 +5265,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
vhost = shost_priv(shost);
INIT_LIST_HEAD(&vhost->targets);
+ INIT_LIST_HEAD(&vhost->purge);
sprintf(vhost->name, IBMVFC_NAME);
vhost->host = shost;
vhost->dev = dev;
@@ -5298,6 +5350,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
static int ibmvfc_remove(struct vio_dev *vdev)
{
struct ibmvfc_host *vhost = dev_get_drvdata(&vdev->dev);
+ LIST_HEAD(purge);
unsigned long flags;
ENTER;
@@ -5315,6 +5368,8 @@ static int ibmvfc_remove(struct vio_dev *vdev)
spin_lock_irqsave(vhost->host->host_lock, flags);
ibmvfc_purge_requests(vhost, DID_ERROR);
+ list_splice_init(&vhost->purge, &purge);
+ ibmvfc_complete_purge(&purge);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
ibmvfc_free_event_pool(vhost, &vhost->crq);
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 61c73b6f7a77..faf5b50d65b9 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -768,10 +768,13 @@ struct ibmvfc_queue {
dma_addr_t msg_token;
enum ibmvfc_msg_fmt fmt;
int size, cur;
+ spinlock_t _lock;
+ spinlock_t *q_lock;
struct ibmvfc_event_pool evt_pool;
struct list_head sent;
struct list_head free;
+ spinlock_t l_lock;
};
enum ibmvfc_host_action {
@@ -808,11 +811,13 @@ struct ibmvfc_host {
enum ibmvfc_host_action action;
#define IBMVFC_NUM_TRACE_INDEX_BITS 8
#define IBMVFC_NUM_TRACE_ENTRIES (1 << IBMVFC_NUM_TRACE_INDEX_BITS)
+#define IBMVFC_TRACE_INDEX_MASK (IBMVFC_NUM_TRACE_ENTRIES - 1)
#define IBMVFC_TRACE_SIZE (sizeof(struct ibmvfc_trace_entry) * IBMVFC_NUM_TRACE_ENTRIES)
struct ibmvfc_trace_entry *trace;
- u32 trace_index:IBMVFC_NUM_TRACE_INDEX_BITS;
+ atomic_t trace_index;
int num_targets;
struct list_head targets;
+ struct list_head purge;
struct device *dev;
struct dma_pool *sg_pool;
mempool_t *tgt_pool;
--
2.27.0
^ permalink raw reply related
* [PATCH v2 2/5] ibmvfc: make command event pool queue specific
From: Tyrel Datwyler @ 2021-01-06 20:18 UTC (permalink / raw)
To: james.bottomley
Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
Brian King, brking, linuxppc-dev
In-Reply-To: <20210106201835.1053593-1-tyreld@linux.ibm.com>
There is currently a single command event pool per host. In anticipation
of providing multiple queues add a per-queue event pool definition and
reimplement the existing CRQ to use its queue defined event pool for
command submission and completion.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 95 ++++++++++++++++++----------------
drivers/scsi/ibmvscsi/ibmvfc.h | 10 ++--
2 files changed, 55 insertions(+), 50 deletions(-)
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index c8e7c4701ac4..8de2a25b05ee 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -852,12 +852,11 @@ static int ibmvfc_valid_event(struct ibmvfc_event_pool *pool,
**/
static void ibmvfc_free_event(struct ibmvfc_event *evt)
{
- struct ibmvfc_host *vhost = evt->vhost;
- struct ibmvfc_event_pool *pool = &vhost->pool;
+ struct ibmvfc_event_pool *pool = &evt->queue->evt_pool;
BUG_ON(!ibmvfc_valid_event(pool, evt));
BUG_ON(atomic_inc_return(&evt->free) != 1);
- list_add_tail(&evt->queue, &vhost->free);
+ list_add_tail(&evt->queue_list, &evt->queue->free);
}
/**
@@ -898,7 +897,7 @@ static void ibmvfc_fail_request(struct ibmvfc_event *evt, int error_code)
} else
evt->xfer_iu->mad_common.status = cpu_to_be16(IBMVFC_MAD_DRIVER_FAILED);
- list_del(&evt->queue);
+ list_del(&evt->queue_list);
del_timer(&evt->timer);
ibmvfc_trc_end(evt);
evt->done(evt);
@@ -917,7 +916,7 @@ static void ibmvfc_purge_requests(struct ibmvfc_host *vhost, int error_code)
struct ibmvfc_event *evt, *pos;
ibmvfc_dbg(vhost, "Purging all requests\n");
- list_for_each_entry_safe(evt, pos, &vhost->sent, queue)
+ list_for_each_entry_safe(evt, pos, &vhost->crq.sent, queue_list)
ibmvfc_fail_request(evt, error_code);
}
@@ -1292,10 +1291,11 @@ static void ibmvfc_set_login_info(struct ibmvfc_host *vhost)
*
* Returns zero on success.
**/
-static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost)
+static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost,
+ struct ibmvfc_queue *queue)
{
int i;
- struct ibmvfc_event_pool *pool = &vhost->pool;
+ struct ibmvfc_event_pool *pool = &queue->evt_pool;
ENTER;
pool->size = max_requests + IBMVFC_NUM_INTERNAL_REQ;
@@ -1312,6 +1312,9 @@ static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost)
return -ENOMEM;
}
+ INIT_LIST_HEAD(&queue->sent);
+ INIT_LIST_HEAD(&queue->free);
+
for (i = 0; i < pool->size; ++i) {
struct ibmvfc_event *evt = &pool->events[i];
atomic_set(&evt->free, 1);
@@ -1319,8 +1322,9 @@ static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost)
evt->crq.ioba = cpu_to_be64(pool->iu_token + (sizeof(*evt->xfer_iu) * i));
evt->xfer_iu = pool->iu_storage + i;
evt->vhost = vhost;
+ evt->queue = queue;
evt->ext_list = NULL;
- list_add_tail(&evt->queue, &vhost->free);
+ list_add_tail(&evt->queue_list, &queue->free);
}
LEAVE;
@@ -1332,14 +1336,15 @@ static int ibmvfc_init_event_pool(struct ibmvfc_host *vhost)
* @vhost: ibmvfc host who owns the event pool
*
**/
-static void ibmvfc_free_event_pool(struct ibmvfc_host *vhost)
+static void ibmvfc_free_event_pool(struct ibmvfc_host *vhost,
+ struct ibmvfc_queue *queue)
{
int i;
- struct ibmvfc_event_pool *pool = &vhost->pool;
+ struct ibmvfc_event_pool *pool = &queue->evt_pool;
ENTER;
for (i = 0; i < pool->size; ++i) {
- list_del(&pool->events[i].queue);
+ list_del(&pool->events[i].queue_list);
BUG_ON(atomic_read(&pool->events[i].free) != 1);
if (pool->events[i].ext_list)
dma_pool_free(vhost->sg_pool,
@@ -1360,14 +1365,14 @@ static void ibmvfc_free_event_pool(struct ibmvfc_host *vhost)
*
* Returns a free event from the pool.
**/
-static struct ibmvfc_event *ibmvfc_get_event(struct ibmvfc_host *vhost)
+static struct ibmvfc_event *ibmvfc_get_event(struct ibmvfc_queue *queue)
{
struct ibmvfc_event *evt;
- BUG_ON(list_empty(&vhost->free));
- evt = list_entry(vhost->free.next, struct ibmvfc_event, queue);
+ BUG_ON(list_empty(&queue->free));
+ evt = list_entry(queue->free.next, struct ibmvfc_event, queue_list);
atomic_set(&evt->free, 0);
- list_del(&evt->queue);
+ list_del(&evt->queue_list);
return evt;
}
@@ -1512,7 +1517,7 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
else
BUG();
- list_add_tail(&evt->queue, &vhost->sent);
+ list_add_tail(&evt->queue_list, &evt->queue->sent);
timer_setup(&evt->timer, ibmvfc_timeout, 0);
if (timeout) {
@@ -1524,7 +1529,7 @@ static int ibmvfc_send_event(struct ibmvfc_event *evt,
if ((rc = ibmvfc_send_crq(vhost, be64_to_cpu(crq_as_u64[0]),
be64_to_cpu(crq_as_u64[1])))) {
- list_del(&evt->queue);
+ list_del(&evt->queue_list);
del_timer(&evt->timer);
/* If send_crq returns H_CLOSED, return SCSI_MLQUEUE_HOST_BUSY.
@@ -1747,7 +1752,7 @@ static int ibmvfc_queuecommand_lck(struct scsi_cmnd *cmnd,
}
cmnd->result = (DID_OK << 16);
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
ibmvfc_init_event(evt, ibmvfc_scsi_done, IBMVFC_CMD_FORMAT);
evt->cmnd = cmnd;
cmnd->scsi_done = done;
@@ -1836,7 +1841,7 @@ static int ibmvfc_bsg_timeout(struct bsg_job *job)
}
vhost->aborting_passthru = 1;
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
ibmvfc_init_event(evt, ibmvfc_bsg_timeout_done, IBMVFC_MAD_FORMAT);
tmf = &evt->iu.tmf;
@@ -1894,7 +1899,7 @@ static int ibmvfc_bsg_plogi(struct ibmvfc_host *vhost, unsigned int port_id)
if (unlikely((rc = ibmvfc_host_chkready(vhost))))
goto unlock_out;
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
ibmvfc_init_event(evt, ibmvfc_sync_completion, IBMVFC_MAD_FORMAT);
plogi = &evt->iu.plogi;
memset(plogi, 0, sizeof(*plogi));
@@ -2012,7 +2017,7 @@ static int ibmvfc_bsg_request(struct bsg_job *job)
goto out;
}
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
ibmvfc_init_event(evt, ibmvfc_sync_completion, IBMVFC_MAD_FORMAT);
mad = &evt->iu.passthru;
@@ -2096,7 +2101,7 @@ static int ibmvfc_reset_device(struct scsi_device *sdev, int type, char *desc)
spin_lock_irqsave(vhost->host->host_lock, flags);
if (vhost->state == IBMVFC_ACTIVE) {
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
ibmvfc_init_event(evt, ibmvfc_sync_completion, IBMVFC_CMD_FORMAT);
tmf = ibmvfc_init_vfc_cmd(evt, sdev);
iu = ibmvfc_get_fcp_iu(vhost, tmf);
@@ -2215,7 +2220,7 @@ static int ibmvfc_wait_for_ops(struct ibmvfc_host *vhost, void *device,
do {
wait = 0;
spin_lock_irqsave(vhost->host->host_lock, flags);
- list_for_each_entry(evt, &vhost->sent, queue) {
+ list_for_each_entry(evt, &vhost->crq.sent, queue_list) {
if (match(evt, device)) {
evt->eh_comp = ∁
wait++;
@@ -2229,7 +2234,7 @@ static int ibmvfc_wait_for_ops(struct ibmvfc_host *vhost, void *device,
if (!timeout) {
wait = 0;
spin_lock_irqsave(vhost->host->host_lock, flags);
- list_for_each_entry(evt, &vhost->sent, queue) {
+ list_for_each_entry(evt, &vhost->crq.sent, queue_list) {
if (match(evt, device)) {
evt->eh_comp = NULL;
wait++;
@@ -2274,7 +2279,7 @@ static int ibmvfc_cancel_all(struct scsi_device *sdev, int type)
ENTER;
spin_lock_irqsave(vhost->host->host_lock, flags);
found_evt = NULL;
- list_for_each_entry(evt, &vhost->sent, queue) {
+ list_for_each_entry(evt, &vhost->crq.sent, queue_list) {
if (evt->cmnd && evt->cmnd->device == sdev) {
found_evt = evt;
break;
@@ -2289,7 +2294,7 @@ static int ibmvfc_cancel_all(struct scsi_device *sdev, int type)
}
if (vhost->logged_in) {
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
ibmvfc_init_event(evt, ibmvfc_sync_completion, IBMVFC_MAD_FORMAT);
tmf = &evt->iu.tmf;
@@ -2411,7 +2416,7 @@ static int ibmvfc_abort_task_set(struct scsi_device *sdev)
spin_lock_irqsave(vhost->host->host_lock, flags);
found_evt = NULL;
- list_for_each_entry(evt, &vhost->sent, queue) {
+ list_for_each_entry(evt, &vhost->crq.sent, queue_list) {
if (evt->cmnd && evt->cmnd->device == sdev) {
found_evt = evt;
break;
@@ -2426,7 +2431,7 @@ static int ibmvfc_abort_task_set(struct scsi_device *sdev)
}
if (vhost->state == IBMVFC_ACTIVE) {
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
ibmvfc_init_event(evt, ibmvfc_sync_completion, IBMVFC_CMD_FORMAT);
tmf = ibmvfc_init_vfc_cmd(evt, sdev);
iu = ibmvfc_get_fcp_iu(vhost, tmf);
@@ -2917,7 +2922,7 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
* things we send. Make sure this response is to something we
* actually sent
*/
- if (unlikely(!ibmvfc_valid_event(&vhost->pool, evt))) {
+ if (unlikely(!ibmvfc_valid_event(&vhost->crq.evt_pool, evt))) {
dev_err(vhost->dev, "Returned correlation_token 0x%08llx is invalid!\n",
crq->ioba);
return;
@@ -2930,7 +2935,7 @@ static void ibmvfc_handle_crq(struct ibmvfc_crq *crq, struct ibmvfc_host *vhost)
}
del_timer(&evt->timer);
- list_del(&evt->queue);
+ list_del(&evt->queue_list);
ibmvfc_trc_end(evt);
evt->done(evt);
}
@@ -3508,7 +3513,7 @@ static void ibmvfc_tgt_send_prli(struct ibmvfc_target *tgt)
return;
kref_get(&tgt->kref);
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
vhost->discovery_threads++;
ibmvfc_init_event(evt, ibmvfc_tgt_prli_done, IBMVFC_MAD_FORMAT);
evt->tgt = tgt;
@@ -3615,7 +3620,7 @@ static void ibmvfc_tgt_send_plogi(struct ibmvfc_target *tgt)
kref_get(&tgt->kref);
tgt->logo_rcvd = 0;
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
vhost->discovery_threads++;
ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_INIT_WAIT);
ibmvfc_init_event(evt, ibmvfc_tgt_plogi_done, IBMVFC_MAD_FORMAT);
@@ -3690,7 +3695,7 @@ static struct ibmvfc_event *__ibmvfc_tgt_get_implicit_logout_evt(struct ibmvfc_t
struct ibmvfc_event *evt;
kref_get(&tgt->kref);
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
ibmvfc_init_event(evt, done, IBMVFC_MAD_FORMAT);
evt->tgt = tgt;
mad = &evt->iu.implicit_logout;
@@ -3855,7 +3860,7 @@ static void ibmvfc_tgt_move_login(struct ibmvfc_target *tgt)
return;
kref_get(&tgt->kref);
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
vhost->discovery_threads++;
ibmvfc_set_tgt_action(tgt, IBMVFC_TGT_ACTION_INIT_WAIT);
ibmvfc_init_event(evt, ibmvfc_tgt_move_login_done, IBMVFC_MAD_FORMAT);
@@ -4021,7 +4026,7 @@ static void ibmvfc_adisc_timeout(struct timer_list *t)
vhost->abort_threads++;
kref_get(&tgt->kref);
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
ibmvfc_init_event(evt, ibmvfc_tgt_adisc_cancel_done, IBMVFC_MAD_FORMAT);
evt->tgt = tgt;
@@ -4071,7 +4076,7 @@ static void ibmvfc_tgt_adisc(struct ibmvfc_target *tgt)
return;
kref_get(&tgt->kref);
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
vhost->discovery_threads++;
ibmvfc_init_event(evt, ibmvfc_tgt_adisc_done, IBMVFC_MAD_FORMAT);
evt->tgt = tgt;
@@ -4174,7 +4179,7 @@ static void ibmvfc_tgt_query_target(struct ibmvfc_target *tgt)
return;
kref_get(&tgt->kref);
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
vhost->discovery_threads++;
evt->tgt = tgt;
ibmvfc_init_event(evt, ibmvfc_tgt_query_target_done, IBMVFC_MAD_FORMAT);
@@ -4341,7 +4346,7 @@ static void ibmvfc_discover_targets_done(struct ibmvfc_event *evt)
static void ibmvfc_discover_targets(struct ibmvfc_host *vhost)
{
struct ibmvfc_discover_targets *mad;
- struct ibmvfc_event *evt = ibmvfc_get_event(vhost);
+ struct ibmvfc_event *evt = ibmvfc_get_event(&vhost->crq);
ibmvfc_init_event(evt, ibmvfc_discover_targets_done, IBMVFC_MAD_FORMAT);
mad = &evt->iu.discover_targets;
@@ -4454,7 +4459,7 @@ static void ibmvfc_npiv_login_done(struct ibmvfc_event *evt)
static void ibmvfc_npiv_login(struct ibmvfc_host *vhost)
{
struct ibmvfc_npiv_login_mad *mad;
- struct ibmvfc_event *evt = ibmvfc_get_event(vhost);
+ struct ibmvfc_event *evt = ibmvfc_get_event(&vhost->crq);
ibmvfc_gather_partition_info(vhost);
ibmvfc_set_login_info(vhost);
@@ -4491,7 +4496,7 @@ static void ibmvfc_npiv_logout_done(struct ibmvfc_event *evt)
switch (mad_status) {
case IBMVFC_MAD_SUCCESS:
- if (list_empty(&vhost->sent) &&
+ if (list_empty(&vhost->crq.sent) &&
vhost->action == IBMVFC_HOST_ACTION_LOGO_WAIT) {
ibmvfc_init_host(vhost);
return;
@@ -4519,7 +4524,7 @@ static void ibmvfc_npiv_logout(struct ibmvfc_host *vhost)
struct ibmvfc_npiv_logout_mad *mad;
struct ibmvfc_event *evt;
- evt = ibmvfc_get_event(vhost);
+ evt = ibmvfc_get_event(&vhost->crq);
ibmvfc_init_event(evt, ibmvfc_npiv_logout_done, IBMVFC_MAD_FORMAT);
mad = &evt->iu.npiv_logout;
@@ -5208,8 +5213,6 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
shost->unique_id = shost->host_no;
vhost = shost_priv(shost);
- INIT_LIST_HEAD(&vhost->sent);
- INIT_LIST_HEAD(&vhost->free);
INIT_LIST_HEAD(&vhost->targets);
sprintf(vhost->name, IBMVFC_NAME);
vhost->host = shost;
@@ -5241,7 +5244,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
goto kill_kthread;
}
- if ((rc = ibmvfc_init_event_pool(vhost))) {
+ if ((rc = ibmvfc_init_event_pool(vhost, &vhost->crq))) {
dev_err(dev, "Couldn't initialize event pool. rc=%d\n", rc);
goto release_crq;
}
@@ -5271,7 +5274,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
remove_shost:
scsi_remove_host(shost);
release_event_pool:
- ibmvfc_free_event_pool(vhost);
+ ibmvfc_free_event_pool(vhost, &vhost->crq);
release_crq:
ibmvfc_release_crq_queue(vhost);
kill_kthread:
@@ -5313,7 +5316,7 @@ static int ibmvfc_remove(struct vio_dev *vdev)
spin_lock_irqsave(vhost->host->host_lock, flags);
ibmvfc_purge_requests(vhost, DID_ERROR);
spin_unlock_irqrestore(vhost->host->host_lock, flags);
- ibmvfc_free_event_pool(vhost);
+ ibmvfc_free_event_pool(vhost, &vhost->crq);
ibmvfc_free_mem(vhost);
spin_lock(&ibmvfc_driver_lock);
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 5bf1621223d6..61c73b6f7a77 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -726,8 +726,9 @@ struct ibmvfc_target {
/* a unit of work for the hosting partition */
struct ibmvfc_event {
- struct list_head queue;
+ struct list_head queue_list;
struct ibmvfc_host *vhost;
+ struct ibmvfc_queue *queue;
struct ibmvfc_target *tgt;
struct scsi_cmnd *cmnd;
atomic_t free;
@@ -767,6 +768,10 @@ struct ibmvfc_queue {
dma_addr_t msg_token;
enum ibmvfc_msg_fmt fmt;
int size, cur;
+
+ struct ibmvfc_event_pool evt_pool;
+ struct list_head sent;
+ struct list_head free;
};
enum ibmvfc_host_action {
@@ -808,10 +813,7 @@ struct ibmvfc_host {
u32 trace_index:IBMVFC_NUM_TRACE_INDEX_BITS;
int num_targets;
struct list_head targets;
- struct list_head sent;
- struct list_head free;
struct device *dev;
- struct ibmvfc_event_pool pool;
struct dma_pool *sg_pool;
mempool_t *tgt_pool;
struct ibmvfc_queue crq;
--
2.27.0
^ permalink raw reply related
* [PATCH v2 1/5] ibmvfc: define generic queue structure for CRQs
From: Tyrel Datwyler @ 2021-01-06 20:18 UTC (permalink / raw)
To: james.bottomley
Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
Brian King, brking, linuxppc-dev
In-Reply-To: <20210106201835.1053593-1-tyreld@linux.ibm.com>
The primary and async CRQs are nearly identical outside of the format
and length of each message entry in the dma mapped page that represents
the queue data. These queues can be represented with a generic queue
structure that uses a union to differentiate between message format of
the mapped page.
This structure will further be leveraged in a followup patcheset that
introduce Sub-CRQs.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
drivers/scsi/ibmvscsi/ibmvfc.c | 135 +++++++++++++++++++++------------
drivers/scsi/ibmvscsi/ibmvfc.h | 34 +++++----
2 files changed, 107 insertions(+), 62 deletions(-)
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 42e4d35e0d35..c8e7c4701ac4 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -660,7 +660,7 @@ static void ibmvfc_init_host(struct ibmvfc_host *vhost)
}
if (!ibmvfc_set_host_state(vhost, IBMVFC_INITIALIZING)) {
- memset(vhost->async_crq.msgs, 0, PAGE_SIZE);
+ memset(vhost->async_crq.msgs.async, 0, PAGE_SIZE);
vhost->async_crq.cur = 0;
list_for_each_entry(tgt, &vhost->targets, queue)
@@ -713,6 +713,23 @@ static int ibmvfc_send_crq_init_complete(struct ibmvfc_host *vhost)
return ibmvfc_send_crq(vhost, 0xC002000000000000LL, 0);
}
+/**
+ * ibmvfc_free_queue - Deallocate queue
+ * @vhost: ibmvfc host struct
+ * @queue: ibmvfc queue struct
+ *
+ * Unmaps dma and deallocates page for messages
+ **/
+static void ibmvfc_free_queue(struct ibmvfc_host *vhost,
+ struct ibmvfc_queue *queue)
+{
+ struct device *dev = vhost->dev;
+
+ dma_unmap_single(dev, queue->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
+ free_page((unsigned long)queue->msgs.handle);
+ queue->msgs.handle = NULL;
+}
+
/**
* ibmvfc_release_crq_queue - Deallocates data and unregisters CRQ
* @vhost: ibmvfc host struct
@@ -724,7 +741,7 @@ static void ibmvfc_release_crq_queue(struct ibmvfc_host *vhost)
{
long rc = 0;
struct vio_dev *vdev = to_vio_dev(vhost->dev);
- struct ibmvfc_crq_queue *crq = &vhost->crq;
+ struct ibmvfc_queue *crq = &vhost->crq;
ibmvfc_dbg(vhost, "Releasing CRQ\n");
free_irq(vdev->irq, vhost);
@@ -737,8 +754,8 @@ static void ibmvfc_release_crq_queue(struct ibmvfc_host *vhost)
vhost->state = IBMVFC_NO_CRQ;
vhost->logged_in = 0;
- dma_unmap_single(vhost->dev, crq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
- free_page((unsigned long)crq->msgs);
+
+ ibmvfc_free_queue(vhost, crq);
}
/**
@@ -778,7 +795,7 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
int rc = 0;
unsigned long flags;
struct vio_dev *vdev = to_vio_dev(vhost->dev);
- struct ibmvfc_crq_queue *crq = &vhost->crq;
+ struct ibmvfc_queue *crq = &vhost->crq;
/* Close the CRQ */
do {
@@ -792,7 +809,7 @@ static int ibmvfc_reset_crq(struct ibmvfc_host *vhost)
vhost->logged_in = 0;
/* Clean out the queue */
- memset(crq->msgs, 0, PAGE_SIZE);
+ memset(crq->msgs.crq, 0, PAGE_SIZE);
crq->cur = 0;
/* And re-open it again */
@@ -1238,6 +1255,7 @@ static void ibmvfc_gather_partition_info(struct ibmvfc_host *vhost)
static void ibmvfc_set_login_info(struct ibmvfc_host *vhost)
{
struct ibmvfc_npiv_login *login_info = &vhost->login_info;
+ struct ibmvfc_queue *async_crq = &vhost->async_crq;
struct device_node *of_node = vhost->dev->of_node;
const char *location;
@@ -1257,7 +1275,8 @@ static void ibmvfc_set_login_info(struct ibmvfc_host *vhost)
login_info->max_cmds = cpu_to_be32(max_requests + IBMVFC_NUM_INTERNAL_REQ);
login_info->capabilities = cpu_to_be64(IBMVFC_CAN_MIGRATE | IBMVFC_CAN_SEND_VF_WWPN);
login_info->async.va = cpu_to_be64(vhost->async_crq.msg_token);
- login_info->async.len = cpu_to_be32(vhost->async_crq.size * sizeof(*vhost->async_crq.msgs));
+ login_info->async.len = cpu_to_be32(async_crq->size *
+ sizeof(*async_crq->msgs.async));
strncpy(login_info->partition_name, vhost->partition_name, IBMVFC_MAX_NAME);
strncpy(login_info->device_name,
dev_name(&vhost->host->shost_gendev), IBMVFC_MAX_NAME);
@@ -3230,10 +3249,10 @@ static struct scsi_host_template driver_template = {
**/
static struct ibmvfc_async_crq *ibmvfc_next_async_crq(struct ibmvfc_host *vhost)
{
- struct ibmvfc_async_crq_queue *async_crq = &vhost->async_crq;
+ struct ibmvfc_queue *async_crq = &vhost->async_crq;
struct ibmvfc_async_crq *crq;
- crq = &async_crq->msgs[async_crq->cur];
+ crq = &async_crq->msgs.async[async_crq->cur];
if (crq->valid & 0x80) {
if (++async_crq->cur == async_crq->size)
async_crq->cur = 0;
@@ -3253,10 +3272,10 @@ static struct ibmvfc_async_crq *ibmvfc_next_async_crq(struct ibmvfc_host *vhost)
**/
static struct ibmvfc_crq *ibmvfc_next_crq(struct ibmvfc_host *vhost)
{
- struct ibmvfc_crq_queue *queue = &vhost->crq;
+ struct ibmvfc_queue *queue = &vhost->crq;
struct ibmvfc_crq *crq;
- crq = &queue->msgs[queue->cur];
+ crq = &queue->msgs.crq[queue->cur];
if (crq->valid & 0x80) {
if (++queue->cur == queue->size)
queue->cur = 0;
@@ -4895,6 +4914,54 @@ static int ibmvfc_work(void *data)
return 0;
}
+/**
+ * ibmvfc_alloc_queue - Allocate queue
+ * @vhost: ibmvfc host struct
+ * @queue: ibmvfc queue to allocate
+ * @fmt: queue format to allocate
+ *
+ * Returns:
+ * 0 on success / non-zero on failure
+ **/
+static int ibmvfc_alloc_queue(struct ibmvfc_host *vhost,
+ struct ibmvfc_queue *queue,
+ enum ibmvfc_msg_fmt fmt)
+{
+ struct device *dev = vhost->dev;
+ size_t fmt_size;
+
+ ENTER;
+ switch (fmt) {
+ case IBMVFC_CRQ_FMT:
+ fmt_size = sizeof(*queue->msgs.crq);
+ break;
+ case IBMVFC_ASYNC_FMT:
+ fmt_size = sizeof(*queue->msgs.async);
+ break;
+ default:
+ dev_warn(dev, "Unknown command/response queue message format: %d\n", fmt);
+ return -EINVAL;
+ }
+
+ queue->msgs.handle = (void *)get_zeroed_page(GFP_KERNEL);
+ if (!queue->msgs.handle)
+ return -ENOMEM;
+
+ queue->msg_token = dma_map_single(dev, queue->msgs.handle, PAGE_SIZE,
+ DMA_BIDIRECTIONAL);
+
+ if (dma_mapping_error(dev, queue->msg_token)) {
+ free_page((unsigned long)queue->msgs.handle);
+ queue->msgs.handle = NULL;
+ return -ENOMEM;
+ }
+
+ queue->cur = 0;
+ queue->fmt = fmt;
+ queue->size = PAGE_SIZE / fmt_size;
+ return 0;
+}
+
/**
* ibmvfc_init_crq - Initializes and registers CRQ with hypervisor
* @vhost: ibmvfc host struct
@@ -4910,21 +4977,12 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost)
int rc, retrc = -ENOMEM;
struct device *dev = vhost->dev;
struct vio_dev *vdev = to_vio_dev(dev);
- struct ibmvfc_crq_queue *crq = &vhost->crq;
+ struct ibmvfc_queue *crq = &vhost->crq;
ENTER;
- crq->msgs = (struct ibmvfc_crq *)get_zeroed_page(GFP_KERNEL);
-
- if (!crq->msgs)
+ if (ibmvfc_alloc_queue(vhost, crq, IBMVFC_CRQ_FMT))
return -ENOMEM;
- crq->size = PAGE_SIZE / sizeof(*crq->msgs);
- crq->msg_token = dma_map_single(dev, crq->msgs,
- PAGE_SIZE, DMA_BIDIRECTIONAL);
-
- if (dma_mapping_error(dev, crq->msg_token))
- goto map_failed;
-
retrc = rc = plpar_hcall_norets(H_REG_CRQ, vdev->unit_address,
crq->msg_token, PAGE_SIZE);
@@ -4953,7 +5011,6 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost)
goto req_irq_failed;
}
- crq->cur = 0;
LEAVE;
return retrc;
@@ -4963,9 +5020,7 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost)
rc = plpar_hcall_norets(H_FREE_CRQ, vdev->unit_address);
} while (rc == H_BUSY || H_IS_LONG_BUSY(rc));
reg_crq_failed:
- dma_unmap_single(dev, crq->msg_token, PAGE_SIZE, DMA_BIDIRECTIONAL);
-map_failed:
- free_page((unsigned long)crq->msgs);
+ ibmvfc_free_queue(vhost, crq);
return retrc;
}
@@ -4978,7 +5033,7 @@ static int ibmvfc_init_crq(struct ibmvfc_host *vhost)
**/
static void ibmvfc_free_mem(struct ibmvfc_host *vhost)
{
- struct ibmvfc_async_crq_queue *async_q = &vhost->async_crq;
+ struct ibmvfc_queue *async_q = &vhost->async_crq;
ENTER;
mempool_destroy(vhost->tgt_pool);
@@ -4988,9 +5043,7 @@ static void ibmvfc_free_mem(struct ibmvfc_host *vhost)
dma_free_coherent(vhost->dev, sizeof(*vhost->login_buf),
vhost->login_buf, vhost->login_buf_dma);
dma_pool_destroy(vhost->sg_pool);
- dma_unmap_single(vhost->dev, async_q->msg_token,
- async_q->size * sizeof(*async_q->msgs), DMA_BIDIRECTIONAL);
- free_page((unsigned long)async_q->msgs);
+ ibmvfc_free_queue(vhost, async_q);
LEAVE;
}
@@ -5003,26 +5056,15 @@ static void ibmvfc_free_mem(struct ibmvfc_host *vhost)
**/
static int ibmvfc_alloc_mem(struct ibmvfc_host *vhost)
{
- struct ibmvfc_async_crq_queue *async_q = &vhost->async_crq;
+ struct ibmvfc_queue *async_q = &vhost->async_crq;
struct device *dev = vhost->dev;
ENTER;
- async_q->msgs = (struct ibmvfc_async_crq *)get_zeroed_page(GFP_KERNEL);
- if (!async_q->msgs) {
- dev_err(dev, "Couldn't allocate async queue.\n");
+ if (ibmvfc_alloc_queue(vhost, async_q, IBMVFC_ASYNC_FMT)) {
+ dev_err(dev, "Couldn't allocate/map async queue.\n");
goto nomem;
}
- async_q->size = PAGE_SIZE / sizeof(struct ibmvfc_async_crq);
- async_q->msg_token = dma_map_single(dev, async_q->msgs,
- async_q->size * sizeof(*async_q->msgs),
- DMA_BIDIRECTIONAL);
-
- if (dma_mapping_error(dev, async_q->msg_token)) {
- dev_err(dev, "Failed to map async queue\n");
- goto free_async_crq;
- }
-
vhost->sg_pool = dma_pool_create(IBMVFC_NAME, dev,
SG_ALL * sizeof(struct srp_direct_buf),
sizeof(struct srp_direct_buf), 0);
@@ -5077,10 +5119,7 @@ static int ibmvfc_alloc_mem(struct ibmvfc_host *vhost)
free_sg_pool:
dma_pool_destroy(vhost->sg_pool);
unmap_async_crq:
- dma_unmap_single(dev, async_q->msg_token,
- async_q->size * sizeof(*async_q->msgs), DMA_BIDIRECTIONAL);
-free_async_crq:
- free_page((unsigned long)async_q->msgs);
+ ibmvfc_free_queue(vhost, async_q);
nomem:
LEAVE;
return -ENOMEM;
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.h b/drivers/scsi/ibmvscsi/ibmvfc.h
index 9d58cfd774d3..5bf1621223d6 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.h
+++ b/drivers/scsi/ibmvscsi/ibmvfc.h
@@ -645,12 +645,6 @@ struct ibmvfc_crq {
volatile __be64 ioba;
} __packed __aligned(8);
-struct ibmvfc_crq_queue {
- struct ibmvfc_crq *msgs;
- int size, cur;
- dma_addr_t msg_token;
-};
-
enum ibmvfc_ae_link_state {
IBMVFC_AE_LS_LINK_UP = 0x01,
IBMVFC_AE_LS_LINK_BOUNCED = 0x02,
@@ -678,12 +672,6 @@ struct ibmvfc_async_crq {
__be64 reserved;
} __packed __aligned(8);
-struct ibmvfc_async_crq_queue {
- struct ibmvfc_async_crq *msgs;
- int size, cur;
- dma_addr_t msg_token;
-};
-
union ibmvfc_iu {
struct ibmvfc_mad_common mad_common;
struct ibmvfc_npiv_login_mad npiv_login;
@@ -763,6 +751,24 @@ struct ibmvfc_event_pool {
dma_addr_t iu_token;
};
+enum ibmvfc_msg_fmt {
+ IBMVFC_CRQ_FMT = 0,
+ IBMVFC_ASYNC_FMT,
+};
+
+union ibmvfc_msgs {
+ void *handle;
+ struct ibmvfc_crq *crq;
+ struct ibmvfc_async_crq *async;
+};
+
+struct ibmvfc_queue {
+ union ibmvfc_msgs msgs;
+ dma_addr_t msg_token;
+ enum ibmvfc_msg_fmt fmt;
+ int size, cur;
+};
+
enum ibmvfc_host_action {
IBMVFC_HOST_ACTION_NONE = 0,
IBMVFC_HOST_ACTION_RESET,
@@ -808,8 +814,8 @@ struct ibmvfc_host {
struct ibmvfc_event_pool pool;
struct dma_pool *sg_pool;
mempool_t *tgt_pool;
- struct ibmvfc_crq_queue crq;
- struct ibmvfc_async_crq_queue async_crq;
+ struct ibmvfc_queue crq;
+ struct ibmvfc_queue async_crq;
struct ibmvfc_npiv_login login_info;
union ibmvfc_npiv_login_data *login_buf;
dma_addr_t login_buf_dma;
--
2.27.0
^ permalink raw reply related
* Re: [PATCH -next] pci/controller/dwc: convert comma to semicolon
From: Bjorn Helgaas @ 2021-01-06 19:07 UTC (permalink / raw)
To: Zheng Yongjun
Cc: robh, roy.zang, linux-pci, linux-kernel, minghuan.Lian,
linux-arm-kernel, linuxppc-dev, mingkai.hu
In-Reply-To: <20201216131944.14990-1-zhengyongjun3@huawei.com>
On Wed, Dec 16, 2020 at 09:19:44PM +0800, Zheng Yongjun wrote:
> Replace a comma between expression statements by a semicolon.
Looks like a good fix, but read this about the changelog title:
https://lore.kernel.org/r/20171026223701.GA25649@bhelgaas-glaptop.roam.corp.google.com
> Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
> ---
> drivers/pci/controller/dwc/pci-layerscape-ep.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> index 84206f265e54..917ba8d254fc 100644
> --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> @@ -178,7 +178,7 @@ static int __init ls_pcie_ep_probe(struct platform_device *pdev)
> pci->dev = dev;
> pci->ops = pcie->drvdata->dw_pcie_ops;
>
> - ls_epc->bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
> + ls_epc->bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4);
>
> pcie->pci = pci;
> pcie->ls_epc = ls_epc;
> --
> 2.22.0
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply
* Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool
From: Konrad Rzeszutek Wilk @ 2021-01-06 18:57 UTC (permalink / raw)
To: Claire Chang
Cc: heikki.krogerus, peterz, grant.likely, paulus, frowand.list,
mingo, m.szyprowski, sstabellini, saravanak, joro,
rafael.j.wysocki, hch, bgolaszewski, xen-devel, treding,
devicetree, will, dan.j.williams, linuxppc-dev, robh+dt,
boris.ostrovsky, andriy.shevchenko, jgross, drinkcat, gregkh,
rdunlap, linux-kernel, tfiga, iommu, xypron.glpk, robin.murphy,
bauerman
In-Reply-To: <20210106034124.30560-6-tientzu@chromium.org>
On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
> Introduce the new compatible string, restricted-dma-pool, for restricted
> DMA. One can specify the address and length of the restricted DMA memory
> region by restricted-dma-pool in the device tree.
>
> Signed-off-by: Claire Chang <tientzu@chromium.org>
> ---
> .../reserved-memory/reserved-memory.txt | 24 +++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> index e8d3096d922c..44975e2a1fd2 100644
> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> @@ -51,6 +51,20 @@ compatible (optional) - standard definition
> used as a shared pool of DMA buffers for a set of devices. It can
> be used by an operating system to instantiate the necessary pool
> management subsystem if necessary.
> + - restricted-dma-pool: This indicates a region of memory meant to be
> + used as a pool of restricted DMA buffers for a set of devices. The
> + memory region would be the only region accessible to those devices.
> + When using this, the no-map and reusable properties must not be set,
> + so the operating system can create a virtual mapping that will be used
> + for synchronization. The main purpose for restricted DMA is to
> + mitigate the lack of DMA access control on systems without an IOMMU,
> + which could result in the DMA accessing the system memory at
> + unexpected times and/or unexpected addresses, possibly leading to data
> + leakage or corruption. The feature on its own provides a basic level
> + of protection against the DMA overwriting buffer contents at
> + unexpected times. However, to protect against general data leakage and
> + system memory corruption, the system needs to provide way to restrict
> + the DMA to a predefined memory region.
Heya!
I think I am missing something obvious here so please bear with my
questions:
- This code adds the means of having the SWIOTLB pool tied to a specific
memory correct?
- Nothing stops the physical device from bypassing the SWIOTLB buffer.
That is if an errant device screwed up the length or DMA address, the
SWIOTLB would gladly do what the device told it do?
- This has to be combined with SWIOTLB-force-ish to always use the
bounce buffer, otherwise you could still do DMA without using
SWIOTLB (by not hitting the criteria for needing to use SWIOTLB)?
^ permalink raw reply
* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Konrad Rzeszutek Wilk @ 2021-01-06 18:52 UTC (permalink / raw)
To: Claire Chang
Cc: heikki.krogerus, peterz, grant.likely, paulus, frowand.list,
mingo, m.szyprowski, sstabellini, saravanak, joro,
rafael.j.wysocki, hch, bgolaszewski, xen-devel, treding,
devicetree, will, dan.j.williams, linuxppc-dev, robh+dt,
boris.ostrovsky, andriy.shevchenko, jgross, drinkcat, gregkh,
rdunlap, linux-kernel, tfiga, iommu, xypron.glpk, robin.murphy,
bauerman
In-Reply-To: <20210106034124.30560-3-tientzu@chromium.org>
Hello!
In this file:
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index e4368159f88a..7fb2ac087d23 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
..
> +static const struct reserved_mem_ops rmem_swiotlb_ops = {
> + .device_init = rmem_swiotlb_device_init,
> + .device_release = rmem_swiotlb_device_release,
> +};
> +
> +static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
> +{
> + unsigned long node = rmem->fdt_node;
> +
> + if (of_get_flat_dt_prop(node, "reusable", NULL) ||
> + of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
> + of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
> + of_get_flat_dt_prop(node, "no-map", NULL))
> + return -EINVAL;
> +
> + rmem->ops = &rmem_swiotlb_ops;
> + pr_info("Reserved memory: created device swiotlb memory pool at %pa, size %ld MiB\n",
> + &rmem->base, (unsigned long)rmem->size / SZ_1M);
> + return 0;
> +}
> +
> +RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);
The code should be as much as possible arch-agnostic. That is why there
are multiple -swiotlb files scattered in arch directories that own the
architecture specific code.
Would it be possible to move the code there and perhaps have a ARM
specific front-end for this DMA restricted pool there? See for example
the xen-swiotlb code.
Cheers!
Konrad
^ permalink raw reply
* Re: [RFC PATCH v3 0/6] Restricted DMA
From: Florian Fainelli @ 2021-01-06 18:48 UTC (permalink / raw)
To: Claire Chang, robh+dt, mpe, benh, paulus, joro, will,
frowand.list, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
hch, m.szyprowski, robin.murphy
Cc: heikki.krogerus, peterz, grant.likely, mingo, drinkcat, saravanak,
xypron.glpk, rafael.j.wysocki, bgolaszewski, xen-devel, treding,
devicetree, dan.j.williams, andriy.shevchenko, gregkh, rdunlap,
linux-kernel, tfiga, iommu, Jim Quinlan, linuxppc-dev, bauerman
In-Reply-To: <20210106034124.30560-1-tientzu@chromium.org>
Hi,
First of all let me say that I am glad that someone is working on a
upstream solution for this issue, would appreciate if you could CC and
Jim Quinlan on subsequent submissions.
On 1/5/21 7:41 PM, Claire Chang wrote:
> This series implements mitigations for lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
>
> For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> not behind an IOMMU. As PCI-e, by design, gives the device full access to
> system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> full chain of exploits; [2], [3]).
>
> To mitigate the security concerns, we introduce restricted DMA. Restricted
> DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> specially allocated region and does memory allocation from the same region.
> The feature on its own provides a basic level of protection against the DMA
> overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system needs
> to provide a way to restrict the DMA to a predefined memory region (this is
> usually done at firmware level, e.g. in ATF on some ARM platforms).
Can you explain how ATF gets involved and to what extent it does help,
besides enforcing a secure region from the ARM CPU's perpsective? Does
the PCIe root complex not have an IOMMU but can somehow be denied access
to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
still some sort of basic protection that the HW enforces, right?
On Broadcom STB SoCs we have had something similar for a while however
and while we don't have an IOMMU for the PCIe bridge, we do have a a
basic protection mechanism whereby we can configure a region in DRAM to
be PCIe read/write and CPU read/write which then gets used as the PCIe
inbound region for the PCIe EP. By default the PCIe bridge is not
allowed access to DRAM so we must call into a security agent to allow
the PCIe bridge to access the designated DRAM region.
We have done this using a private CMA area region assigned via Device
Tree, assigned with a and requiring the PCIe EP driver to use
dma_alloc_from_contiguous() in order to allocate from this device
private CMA area. The only drawback with that approach is that it
requires knowing how much memory you need up front for buffers and DMA
descriptors that the PCIe EP will need to process. The problem is that
it requires driver modifications and that does not scale over the number
of PCIe EP drivers, some we absolutely do not control, but there is no
need to bounce buffer. Your approach scales better across PCIe EP
drivers however it does require bounce buffering which could be a
performance hit.
Thanks!
--
Florian
^ permalink raw reply
* Re: [PATCH v2 4/5] ibmvfc: complete commands outside the host/queue lock
From: Tyrel Datwyler @ 2021-01-06 17:18 UTC (permalink / raw)
To: Martin K. Petersen
Cc: linux-scsi, linux-kernel, james.bottomley, Brian King, brking,
linuxppc-dev
In-Reply-To: <yq1v9caekxl.fsf@ca-mkp.ca.oracle.com>
On 1/5/21 8:42 PM, Martin K. Petersen wrote:
>
> Tyrel,
>
>> Drain the command queue and place all commands on a completion list.
>> Perform command completion on that list outside the host/queue locks.
>> Further, move purged command compeletions outside the host_lock as well.
>
> Please resubmit entire series instead of amending individual patches.
>
> thanks!
>
No problem. I wasn't sure since it was simply adding a "static" keyword. I'll
send a v2 out today.
-Tyrel
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox