* Re: [PATCH] tools/perf: Fix powerpc gap between kernel end and module start
From: Athira Rajeev @ 2021-01-13 6:44 UTC (permalink / raw)
To: Jiri Olsa
Cc: Kajol Jain, Madhavan Srinivasan, linuxppc-dev, Jiri Olsa,
Arnaldo Carvalho de Melo
In-Reply-To: <20210112093811.GA1272772@krava>
> On 12-Jan-2021, at 3:08 PM, Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Mon, Dec 28, 2020 at 09:14:14PM -0500, Athira Rajeev wrote:
>
> SNIP
>
>> c000000002799370 b backtrace_flag
>> c000000002799378 B radix_tree_node_cachep
>> c000000002799380 B __bss_stop
>> c0000000027a0000 B _end
>> c008000003890000 t icmp_checkentry [ip_tables]
>> c008000003890038 t ipt_alloc_initial_table [ip_tables]
>> c008000003890468 T ipt_do_table [ip_tables]
>> c008000003890de8 T ipt_unregister_table_pre_exit [ip_tables]
>> ...
>>
>> Perf calls function symbols__fixup_end() which sets the end of symbol
>> to 0xc008000003890000, which is the next address and this is the start
>> address of first module (icmp_checkentry in above) which will make the
>> huge symbol size of 0x80000010f0000.
>>
>> After symbols__fixup_end:
>> symbols__fixup_end: sym->name: _end, sym->start: 0xc0000000027a0000,
>> sym->end: 0xc008000003890000
>>
>> On powerpc, kernel text segment is located at 0xc000000000000000
>> whereas the modules are located at very high memory addresses,
>> 0xc00800000xxxxxxx. Since the gap between end of kernel text segment
>> and beginning of first module's address is high, histogram allocation
>> using calloc fails.
>>
>> Fix this by detecting the kernel's last symbol and limiting
>> the range of last kernel symbol to pagesize.
>>
>> Signed-off-by: Athira Rajeev<atrajeev@linux.vnet.ibm.com>
>
> I can't test, but since the same approach works for arm and s390,
> this also looks ok
>
> Acked-by: Jiri Olsa <jolsa@redhat.com>
>
> thanks,
> jirka
Thanks Jiri for reviewing the patch,
Athira
>
>> ---
>> tools/perf/arch/powerpc/util/Build | 1 +
>> tools/perf/arch/powerpc/util/machine.c | 24 ++++++++++++++++++++++++
>> 2 files changed, 25 insertions(+)
>> create mode 100644 tools/perf/arch/powerpc/util/machine.c
>>
>> diff --git a/tools/perf/arch/powerpc/util/Build b/tools/perf/arch/powerpc/util/Build
>> index e86e210bf514..b7945e5a543b 100644
>> --- a/tools/perf/arch/powerpc/util/Build
>> +++ b/tools/perf/arch/powerpc/util/Build
>> @@ -1,4 +1,5 @@
>> perf-y += header.o
>> +perf-y += machine.o
>> perf-y += kvm-stat.o
>> perf-y += perf_regs.o
>> perf-y += mem-events.o
>> diff --git a/tools/perf/arch/powerpc/util/machine.c b/tools/perf/arch/powerpc/util/machine.c
>> new file mode 100644
>> index 000000000000..c30e5cc88c16
>> --- /dev/null
>> +++ b/tools/perf/arch/powerpc/util/machine.c
>> @@ -0,0 +1,24 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#include <stdio.h>
>> +#include <string.h>
>> +#include <internal/lib.h> // page_size
>> +#include "debug.h"
>> +#include "symbol.h"
>> +
>> +/* On powerpc kernel text segment start at memory addresses, 0xc000000000000000
>> + * whereas the modules are located at very high memory addresses,
>> + * for example 0xc00800000xxxxxxx. The gap between end of kernel text segment
>> + * and beginning of first module's text segment is very high.
>> + * Therefore do not fill this gap and do not assign it to the kernel dso map.
>> + */
>> +
>> +void arch__symbols__fixup_end(struct symbol *p, struct symbol *c)
>> +{
>> + if (strchr(p->name, '[') == NULL && strchr(c->name, '['))
>> + /* Limit the range of last kernel symbol */
>> + p->end += page_size;
>> + else
>> + p->end = c->start;
>> + pr_debug4("%s sym:%s end:%#lx\n", __func__, p->name, p->end);
>> +}
>> --
>> 1.8.3.1
^ permalink raw reply
* [PATCH for 5.10] powerpc/32s: Fix RTAS machine check with VMAP stack
From: Christophe Leroy @ 2021-01-13 6:40 UTC (permalink / raw)
To: stable; +Cc: linuxppc-dev, linux-kernel
This is backport for 5.10
(cherry picked from commit 98bf2d3f4970179c702ef64db658e0553bc6ef3a)
When we have VMAP stack, exception prolog 1 sets r1, not r11.
When it is not an RTAS machine check, don't trash r1 because it is
needed by prolog 1.
Fixes: da7bb43ab9da ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Squash in fixup for RTAS machine check from Christophe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bc77d61d1c18940e456a2dee464f1e2eda65a3f0.1608621048.git.christophe.leroy@csgroup.eu
---
arch/powerpc/kernel/head_book3s_32.S | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index a0dda2a1f2df..d66da35f2e8d 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -262,10 +262,19 @@ __secondary_hold_acknowledge:
MachineCheck:
EXCEPTION_PROLOG_0
#ifdef CONFIG_PPC_CHRP
+#ifdef CONFIG_VMAP_STACK
+ mr r11, r1
+ mfspr r1, SPRN_SPRG_THREAD
+ lwz r1, RTAS_SP(r1)
+ cmpwi cr1, r1, 0
+ bne cr1, 7f
+ mr r1, r11
+#else
mfspr r11, SPRN_SPRG_THREAD
lwz r11, RTAS_SP(r11)
cmpwi cr1, r11, 0
bne cr1, 7f
+#endif
#endif /* CONFIG_PPC_CHRP */
EXCEPTION_PROLOG_1 for_rtas=1
7: EXCEPTION_PROLOG_2
--
2.25.0
^ permalink raw reply related
* Re: [PATCH v2 0/5] ibmvfc: MQ preparatory locking work
From: Martin K. Petersen @ 2021-01-13 5:48 UTC (permalink / raw)
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-kernel, Martin K . Petersen,
linux-scsi
In-Reply-To: <20210106201835.1053593-1-tyreld@linux.ibm.com>
On Wed, 6 Jan 2021 14:18:30 -0600, Tyrel Datwyler wrote:
> The ibmvfc driver in its current form relies heavily on the host_lock. This
> patchset introduces a genric queue with its own queue lock and sent/free event
> list locks. This generic queue allows the driver to decouple the primary queue
> and future subordinate queues from the host lock reducing lock contention while
> also relaxing locking for submissions and completions to simply the list lock of
> the queue in question.
>
> [...]
Applied to 5.12/scsi-queue, thanks!
[1/5] ibmvfc: define generic queue structure for CRQs
https://git.kernel.org/mkp/scsi/c/f8968665af28
[2/5] ibmvfc: make command event pool queue specific
https://git.kernel.org/mkp/scsi/c/e4b26f3db864
[3/5] ibmvfc: define per-queue state/list locks
https://git.kernel.org/mkp/scsi/c/57e80e0bc108
[4/5] ibmvfc: complete commands outside the host/queue lock
https://git.kernel.org/mkp/scsi/c/1f4a4a19508d
[5/5] ibmvfc: relax locking around ibmvfc_queuecommand
https://git.kernel.org/mkp/scsi/c/654080d02edb
--
Martin K. Petersen Oracle Linux Engineering
^ permalink raw reply
* Re: [RFC PATCH v3 0/6] Restricted DMA
From: Florian Fainelli @ 2021-01-13 4:41 UTC (permalink / raw)
To: Tomasz Figa
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
xypron.glpk, Joerg Roedel, Rafael J . Wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
Will Deacon, Konrad Rzeszutek Wilk, Dan Williams, linuxppc-dev,
Rob Herring, Claire Chang, boris.ostrovsky, Andy Shevchenko,
jgross, Nicolas Boichat, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, bauerman
In-Reply-To: <CAAFQd5AGm4U8hD4jHmw10S7MRS1-ZUSq7eGgoUifMMyfZgP2NA@mail.gmail.com>
On 1/12/2021 8:25 PM, Tomasz Figa wrote:
> On Wed, Jan 13, 2021 at 12:56 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>>
>>
>> On 1/12/2021 6:29 PM, Tomasz Figa wrote:
>>> Hi Florian,
>>>
>>> On Wed, Jan 13, 2021 at 3:01 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>>
>>>> On 1/11/21 11:48 PM, Claire Chang wrote:
>>>>> On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>>>>
>>>>>> On 1/7/21 9:42 AM, Claire Chang wrote:
>>>>>>
>>>>>>>> Can you explain how ATF gets involved and to what extent it does help,
>>>>>>>> besides enforcing a secure region from the ARM CPU's perpsective? Does
>>>>>>>> the PCIe root complex not have an IOMMU but can somehow be denied access
>>>>>>>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
>>>>>>>> still some sort of basic protection that the HW enforces, right?
>>>>>>>
>>>>>>> We need the ATF support for memory MPU (memory protection unit).
>>>>>>> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
>>>>>>> region is for PCIe DMA only, but we still need MPU to locks down PCIe access to
>>>>>>> that specific regions.
>>>>>>
>>>>>> OK so you do have a protection unit of some sort to enforce which region
>>>>>> in DRAM the PCIE bridge is allowed to access, that makes sense,
>>>>>> otherwise the restricted DMA region would only be a hint but nothing you
>>>>>> can really enforce. This is almost entirely analogous to our systems then.
>>>>>
>>>>> Here is the example of setting the MPU:
>>>>> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
>>>>>
>>>>>>
>>>>>> There may be some value in standardizing on an ARM SMCCC call then since
>>>>>> you already support two different SoC vendors.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> On Broadcom STB SoCs we have had something similar for a while however
>>>>>>>> and while we don't have an IOMMU for the PCIe bridge, we do have a a
>>>>>>>> basic protection mechanism whereby we can configure a region in DRAM to
>>>>>>>> be PCIe read/write and CPU read/write which then gets used as the PCIe
>>>>>>>> inbound region for the PCIe EP. By default the PCIe bridge is not
>>>>>>>> allowed access to DRAM so we must call into a security agent to allow
>>>>>>>> the PCIe bridge to access the designated DRAM region.
>>>>>>>>
>>>>>>>> We have done this using a private CMA area region assigned via Device
>>>>>>>> Tree, assigned with a and requiring the PCIe EP driver to use
>>>>>>>> dma_alloc_from_contiguous() in order to allocate from this device
>>>>>>>> private CMA area. The only drawback with that approach is that it
>>>>>>>> requires knowing how much memory you need up front for buffers and DMA
>>>>>>>> descriptors that the PCIe EP will need to process. The problem is that
>>>>>>>> it requires driver modifications and that does not scale over the number
>>>>>>>> of PCIe EP drivers, some we absolutely do not control, but there is no
>>>>>>>> need to bounce buffer. Your approach scales better across PCIe EP
>>>>>>>> drivers however it does require bounce buffering which could be a
>>>>>>>> performance hit.
>>>>>>>
>>>>>>> Only the streaming DMA (map/unmap) needs bounce buffering.
>>>>>>
>>>>>> True, and typically only on transmit since you don't really control
>>>>>> where the sk_buff are allocated from, right? On RX since you need to
>>>>>> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
>>>>>> them from a pool that already falls within the restricted DMA region, right?
>>>>>>
>>>>>
>>>>> Right, but applying bounce buffering to RX will make it more secure.
>>>>> The device won't be able to modify the content after unmap. Just like what
>>>>> iommu_unmap does.
>>>>
>>>> Sure, however the goals of using bounce buffering equally applies to RX
>>>> and TX in that this is the only layer sitting between a stack (block,
>>>> networking, USB, etc.) and the underlying device driver that scales well
>>>> in order to massage a dma_addr_t to be within a particular physical range.
>>>>
>>>> There is however room for improvement if the drivers are willing to
>>>> change their buffer allocation strategy. When you receive Wi-Fi frames
>>>> you need to allocate buffers for the Wi-Fi device to DMA into, and that
>>>> happens ahead of the DMA transfers by the Wi-Fi device. At buffer
>>>> allocation time you could very well allocate these frames from the
>>>> restricted DMA region without having to bounce buffer them since the
>>>> host CPU is in control over where and when to DMA into.
>>>>
>>>
>>> That is, however, still a trade-off between saving that one copy and
>>> protection from the DMA tampering with the packet contents when the
>>> kernel is reading them. Notice how the copy effectively makes a
>>> snapshot of the contents, guaranteeing that the kernel has a
>>> consistent view of the packet, which is not true if the DMA could
>>> modify the buffer contents in the middle of CPU accesses.
>>
>> I would say that the window just became so much narrower for the PCIe
>> end-point to overwrite contents with the copy because it would have to
>> happen within the dma_unmap_{page,single} time and before the copy is
>> finished to the bounce buffer.
>
> Not only. Imagine this:
>
> a) Without bouncing:
>
> - RX interrupt
> - Pass the packet to the network stack
> - Network stack validates the packet
> - DMA overwrites the packet
> - Network stack goes boom, because the packet changed after validation
>
> b) With bouncing:
>
> - RX interrupt
> - Copy the packet to a DMA-inaccessible buffer
> - Network stack validates the packet
> - Network stack is happy, because the packet is guaranteed to stay the
> same after validation
Yes that's a much safer set of operations, thanks for walking through a
practical example.
--
Florian
^ permalink raw reply
* Re: [RFC PATCH v3 0/6] Restricted DMA
From: Tomasz Figa @ 2021-01-13 4:25 UTC (permalink / raw)
To: Florian Fainelli
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
xypron.glpk, Joerg Roedel, Rafael J . Wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
Will Deacon, Konrad Rzeszutek Wilk, Dan Williams, linuxppc-dev,
Rob Herring, Claire Chang, boris.ostrovsky, Andy Shevchenko,
jgross, Nicolas Boichat, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, bauerman
In-Reply-To: <c7f7941d-b8bd-f0f3-4e40-b899a77188bf@gmail.com>
On Wed, Jan 13, 2021 at 12:56 PM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 1/12/2021 6:29 PM, Tomasz Figa wrote:
> > Hi Florian,
> >
> > On Wed, Jan 13, 2021 at 3:01 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
> >>
> >> On 1/11/21 11:48 PM, Claire Chang wrote:
> >>> On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
> >>>>
> >>>> On 1/7/21 9:42 AM, Claire Chang wrote:
> >>>>
> >>>>>> Can you explain how ATF gets involved and to what extent it does help,
> >>>>>> besides enforcing a secure region from the ARM CPU's perpsective? Does
> >>>>>> the PCIe root complex not have an IOMMU but can somehow be denied access
> >>>>>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> >>>>>> still some sort of basic protection that the HW enforces, right?
> >>>>>
> >>>>> We need the ATF support for memory MPU (memory protection unit).
> >>>>> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
> >>>>> region is for PCIe DMA only, but we still need MPU to locks down PCIe access to
> >>>>> that specific regions.
> >>>>
> >>>> OK so you do have a protection unit of some sort to enforce which region
> >>>> in DRAM the PCIE bridge is allowed to access, that makes sense,
> >>>> otherwise the restricted DMA region would only be a hint but nothing you
> >>>> can really enforce. This is almost entirely analogous to our systems then.
> >>>
> >>> Here is the example of setting the MPU:
> >>> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
> >>>
> >>>>
> >>>> There may be some value in standardizing on an ARM SMCCC call then since
> >>>> you already support two different SoC vendors.
> >>>>
> >>>>>
> >>>>>>
> >>>>>> On Broadcom STB SoCs we have had something similar for a while however
> >>>>>> and while we don't have an IOMMU for the PCIe bridge, we do have a a
> >>>>>> basic protection mechanism whereby we can configure a region in DRAM to
> >>>>>> be PCIe read/write and CPU read/write which then gets used as the PCIe
> >>>>>> inbound region for the PCIe EP. By default the PCIe bridge is not
> >>>>>> allowed access to DRAM so we must call into a security agent to allow
> >>>>>> the PCIe bridge to access the designated DRAM region.
> >>>>>>
> >>>>>> We have done this using a private CMA area region assigned via Device
> >>>>>> Tree, assigned with a and requiring the PCIe EP driver to use
> >>>>>> dma_alloc_from_contiguous() in order to allocate from this device
> >>>>>> private CMA area. The only drawback with that approach is that it
> >>>>>> requires knowing how much memory you need up front for buffers and DMA
> >>>>>> descriptors that the PCIe EP will need to process. The problem is that
> >>>>>> it requires driver modifications and that does not scale over the number
> >>>>>> of PCIe EP drivers, some we absolutely do not control, but there is no
> >>>>>> need to bounce buffer. Your approach scales better across PCIe EP
> >>>>>> drivers however it does require bounce buffering which could be a
> >>>>>> performance hit.
> >>>>>
> >>>>> Only the streaming DMA (map/unmap) needs bounce buffering.
> >>>>
> >>>> True, and typically only on transmit since you don't really control
> >>>> where the sk_buff are allocated from, right? On RX since you need to
> >>>> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
> >>>> them from a pool that already falls within the restricted DMA region, right?
> >>>>
> >>>
> >>> Right, but applying bounce buffering to RX will make it more secure.
> >>> The device won't be able to modify the content after unmap. Just like what
> >>> iommu_unmap does.
> >>
> >> Sure, however the goals of using bounce buffering equally applies to RX
> >> and TX in that this is the only layer sitting between a stack (block,
> >> networking, USB, etc.) and the underlying device driver that scales well
> >> in order to massage a dma_addr_t to be within a particular physical range.
> >>
> >> There is however room for improvement if the drivers are willing to
> >> change their buffer allocation strategy. When you receive Wi-Fi frames
> >> you need to allocate buffers for the Wi-Fi device to DMA into, and that
> >> happens ahead of the DMA transfers by the Wi-Fi device. At buffer
> >> allocation time you could very well allocate these frames from the
> >> restricted DMA region without having to bounce buffer them since the
> >> host CPU is in control over where and when to DMA into.
> >>
> >
> > That is, however, still a trade-off between saving that one copy and
> > protection from the DMA tampering with the packet contents when the
> > kernel is reading them. Notice how the copy effectively makes a
> > snapshot of the contents, guaranteeing that the kernel has a
> > consistent view of the packet, which is not true if the DMA could
> > modify the buffer contents in the middle of CPU accesses.
>
> I would say that the window just became so much narrower for the PCIe
> end-point to overwrite contents with the copy because it would have to
> happen within the dma_unmap_{page,single} time and before the copy is
> finished to the bounce buffer.
Not only. Imagine this:
a) Without bouncing:
- RX interrupt
- Pass the packet to the network stack
- Network stack validates the packet
- DMA overwrites the packet
- Network stack goes boom, because the packet changed after validation
b) With bouncing:
- RX interrupt
- Copy the packet to a DMA-inaccessible buffer
- Network stack validates the packet
- Network stack is happy, because the packet is guaranteed to stay the
same after validation
Best regards,
Tomasz
^ permalink raw reply
* Re: [RFC PATCH v3 0/6] Restricted DMA
From: Florian Fainelli @ 2021-01-13 3:56 UTC (permalink / raw)
To: Tomasz Figa
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
xypron.glpk, Joerg Roedel, Rafael J . Wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
Will Deacon, Konrad Rzeszutek Wilk, Dan Williams, linuxppc-dev,
Rob Herring, Claire Chang, boris.ostrovsky, Andy Shevchenko,
jgross, Nicolas Boichat, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, bauerman
In-Reply-To: <CAAFQd5DX=AdaYSYQbxgnrYYojkM5q7EE_Qs-BYPOiNjcQWbN1A@mail.gmail.com>
On 1/12/2021 6:29 PM, Tomasz Figa wrote:
> Hi Florian,
>
> On Wed, Jan 13, 2021 at 3:01 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>> On 1/11/21 11:48 PM, Claire Chang wrote:
>>> On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>>
>>>> On 1/7/21 9:42 AM, Claire Chang wrote:
>>>>
>>>>>> Can you explain how ATF gets involved and to what extent it does help,
>>>>>> besides enforcing a secure region from the ARM CPU's perpsective? Does
>>>>>> the PCIe root complex not have an IOMMU but can somehow be denied access
>>>>>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
>>>>>> still some sort of basic protection that the HW enforces, right?
>>>>>
>>>>> We need the ATF support for memory MPU (memory protection unit).
>>>>> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
>>>>> region is for PCIe DMA only, but we still need MPU to locks down PCIe access to
>>>>> that specific regions.
>>>>
>>>> OK so you do have a protection unit of some sort to enforce which region
>>>> in DRAM the PCIE bridge is allowed to access, that makes sense,
>>>> otherwise the restricted DMA region would only be a hint but nothing you
>>>> can really enforce. This is almost entirely analogous to our systems then.
>>>
>>> Here is the example of setting the MPU:
>>> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
>>>
>>>>
>>>> There may be some value in standardizing on an ARM SMCCC call then since
>>>> you already support two different SoC vendors.
>>>>
>>>>>
>>>>>>
>>>>>> On Broadcom STB SoCs we have had something similar for a while however
>>>>>> and while we don't have an IOMMU for the PCIe bridge, we do have a a
>>>>>> basic protection mechanism whereby we can configure a region in DRAM to
>>>>>> be PCIe read/write and CPU read/write which then gets used as the PCIe
>>>>>> inbound region for the PCIe EP. By default the PCIe bridge is not
>>>>>> allowed access to DRAM so we must call into a security agent to allow
>>>>>> the PCIe bridge to access the designated DRAM region.
>>>>>>
>>>>>> We have done this using a private CMA area region assigned via Device
>>>>>> Tree, assigned with a and requiring the PCIe EP driver to use
>>>>>> dma_alloc_from_contiguous() in order to allocate from this device
>>>>>> private CMA area. The only drawback with that approach is that it
>>>>>> requires knowing how much memory you need up front for buffers and DMA
>>>>>> descriptors that the PCIe EP will need to process. The problem is that
>>>>>> it requires driver modifications and that does not scale over the number
>>>>>> of PCIe EP drivers, some we absolutely do not control, but there is no
>>>>>> need to bounce buffer. Your approach scales better across PCIe EP
>>>>>> drivers however it does require bounce buffering which could be a
>>>>>> performance hit.
>>>>>
>>>>> Only the streaming DMA (map/unmap) needs bounce buffering.
>>>>
>>>> True, and typically only on transmit since you don't really control
>>>> where the sk_buff are allocated from, right? On RX since you need to
>>>> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
>>>> them from a pool that already falls within the restricted DMA region, right?
>>>>
>>>
>>> Right, but applying bounce buffering to RX will make it more secure.
>>> The device won't be able to modify the content after unmap. Just like what
>>> iommu_unmap does.
>>
>> Sure, however the goals of using bounce buffering equally applies to RX
>> and TX in that this is the only layer sitting between a stack (block,
>> networking, USB, etc.) and the underlying device driver that scales well
>> in order to massage a dma_addr_t to be within a particular physical range.
>>
>> There is however room for improvement if the drivers are willing to
>> change their buffer allocation strategy. When you receive Wi-Fi frames
>> you need to allocate buffers for the Wi-Fi device to DMA into, and that
>> happens ahead of the DMA transfers by the Wi-Fi device. At buffer
>> allocation time you could very well allocate these frames from the
>> restricted DMA region without having to bounce buffer them since the
>> host CPU is in control over where and when to DMA into.
>>
>
> That is, however, still a trade-off between saving that one copy and
> protection from the DMA tampering with the packet contents when the
> kernel is reading them. Notice how the copy effectively makes a
> snapshot of the contents, guaranteeing that the kernel has a
> consistent view of the packet, which is not true if the DMA could
> modify the buffer contents in the middle of CPU accesses.
I would say that the window just became so much narrower for the PCIe
end-point to overwrite contents with the copy because it would have to
happen within the dma_unmap_{page,single} time and before the copy is
finished to the bounce buffer.
--
Florian
^ permalink raw reply
* Re: [RFC PATCH v3 0/6] Restricted DMA
From: Tomasz Figa @ 2021-01-13 2:29 UTC (permalink / raw)
To: Florian Fainelli
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
xypron.glpk, Joerg Roedel, Rafael J . Wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
Will Deacon, Konrad Rzeszutek Wilk, Dan Williams, linuxppc-dev,
Rob Herring, Claire Chang, boris.ostrovsky, Andy Shevchenko,
jgross, Nicolas Boichat, Greg KH, Randy Dunlap, lkml,
list@263.net:IOMMU DRIVERS, Jim Quinlan, Robin Murphy, bauerman
In-Reply-To: <23a09b9a-70fc-a7a8-f3ea-b0bfa60507f0@gmail.com>
Hi Florian,
On Wed, Jan 13, 2021 at 3:01 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>
> On 1/11/21 11:48 PM, Claire Chang wrote:
> > On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
> >>
> >> On 1/7/21 9:42 AM, Claire Chang wrote:
> >>
> >>>> Can you explain how ATF gets involved and to what extent it does help,
> >>>> besides enforcing a secure region from the ARM CPU's perpsective? Does
> >>>> the PCIe root complex not have an IOMMU but can somehow be denied access
> >>>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
> >>>> still some sort of basic protection that the HW enforces, right?
> >>>
> >>> We need the ATF support for memory MPU (memory protection unit).
> >>> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
> >>> region is for PCIe DMA only, but we still need MPU to locks down PCIe access to
> >>> that specific regions.
> >>
> >> OK so you do have a protection unit of some sort to enforce which region
> >> in DRAM the PCIE bridge is allowed to access, that makes sense,
> >> otherwise the restricted DMA region would only be a hint but nothing you
> >> can really enforce. This is almost entirely analogous to our systems then.
> >
> > Here is the example of setting the MPU:
> > https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
> >
> >>
> >> There may be some value in standardizing on an ARM SMCCC call then since
> >> you already support two different SoC vendors.
> >>
> >>>
> >>>>
> >>>> On Broadcom STB SoCs we have had something similar for a while however
> >>>> and while we don't have an IOMMU for the PCIe bridge, we do have a a
> >>>> basic protection mechanism whereby we can configure a region in DRAM to
> >>>> be PCIe read/write and CPU read/write which then gets used as the PCIe
> >>>> inbound region for the PCIe EP. By default the PCIe bridge is not
> >>>> allowed access to DRAM so we must call into a security agent to allow
> >>>> the PCIe bridge to access the designated DRAM region.
> >>>>
> >>>> We have done this using a private CMA area region assigned via Device
> >>>> Tree, assigned with a and requiring the PCIe EP driver to use
> >>>> dma_alloc_from_contiguous() in order to allocate from this device
> >>>> private CMA area. The only drawback with that approach is that it
> >>>> requires knowing how much memory you need up front for buffers and DMA
> >>>> descriptors that the PCIe EP will need to process. The problem is that
> >>>> it requires driver modifications and that does not scale over the number
> >>>> of PCIe EP drivers, some we absolutely do not control, but there is no
> >>>> need to bounce buffer. Your approach scales better across PCIe EP
> >>>> drivers however it does require bounce buffering which could be a
> >>>> performance hit.
> >>>
> >>> Only the streaming DMA (map/unmap) needs bounce buffering.
> >>
> >> True, and typically only on transmit since you don't really control
> >> where the sk_buff are allocated from, right? On RX since you need to
> >> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
> >> them from a pool that already falls within the restricted DMA region, right?
> >>
> >
> > Right, but applying bounce buffering to RX will make it more secure.
> > The device won't be able to modify the content after unmap. Just like what
> > iommu_unmap does.
>
> Sure, however the goals of using bounce buffering equally applies to RX
> and TX in that this is the only layer sitting between a stack (block,
> networking, USB, etc.) and the underlying device driver that scales well
> in order to massage a dma_addr_t to be within a particular physical range.
>
> There is however room for improvement if the drivers are willing to
> change their buffer allocation strategy. When you receive Wi-Fi frames
> you need to allocate buffers for the Wi-Fi device to DMA into, and that
> happens ahead of the DMA transfers by the Wi-Fi device. At buffer
> allocation time you could very well allocate these frames from the
> restricted DMA region without having to bounce buffer them since the
> host CPU is in control over where and when to DMA into.
>
That is, however, still a trade-off between saving that one copy and
protection from the DMA tampering with the packet contents when the
kernel is reading them. Notice how the copy effectively makes a
snapshot of the contents, guaranteeing that the kernel has a
consistent view of the packet, which is not true if the DMA could
modify the buffer contents in the middle of CPU accesses.
Best regards,
Tomasz
> The issue is that each network driver may implement its own buffer
> allocation strategy, some may simply call netdev_alloc_skb() which gives
> zero control over where the buffer comes from unless you play tricks
> with NUMA node allocations and somehow declare that your restricted DMA
> region is a different NUMA node. If the driver allocates pages and then
> attaches a SKB to that page using build_skb(), then you have much more
> control over where that page comes from, and this is where using a
> device private CMA are helps, because you can just do
> dma_alloc_from_contiguous() and that will ensure that the pages are
> coming from your specific CMA area.
>
> Few questions on the implementation:
>
> - is there any warning or error being printed if the restricted DMA
> region is outside of a device's DMA addressable range?
>
> - are there are any helpful statistics that could be shown to indicate
> that the restricted DMA region was sized too small, e.g.: that
> allocation of a DMA buffer failed because we ran out of space in the
> swiotlb pool?
>
> >
> >>> I also added alloc/free support in this series
> >>> (https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
> >>> try to allocate memory from the predefined memory region.
> >>>
> >>> As for the performance hit, it should be similar to the default swiotlb.
> >>> Here are my experiment results. Both SoCs lack IOMMU for PCIe.
> >>>
> >>> PCIe wifi vht80 throughput -
> >>>
> >>> MTK SoC tcp_tx tcp_rx udp_tx udp_rx
> >>> w/o Restricted DMA 244.1 134.66 312.56 350.79
> >>> w/ Restricted DMA 246.95 136.59 363.21 351.99
> >>>
> >>> Rockchip SoC tcp_tx tcp_rx udp_tx udp_rx
> >>> w/o Restricted DMA 237.87 133.86 288.28 361.88
> >>> w/ Restricted DMA 256.01 130.95 292.28 353.19
> >>
> >> How come you get better throughput with restricted DMA? Is it because
> >> doing DMA to/from a contiguous region allows for better grouping of
> >> transactions from the DRAM controller's perspective somehow?
> >
> > I'm not sure, but actually, enabling the default swiotlb for wifi also helps the
> > throughput a little bit for me.
>
> OK, it would be interesting if you could get to the bottom of why
> performance does increase with swiotlb.
> --
> Florian
^ permalink raw reply
* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Robin Murphy @ 2021-01-13 1:53 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, Claire Chang
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
Joerg Roedel, rafael.j.wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
will, dan.j.williams, Rob Herring, boris.ostrovsky,
Andy Shevchenko, jgross, Nicolas Boichat, Greg KH, rdunlap, lkml,
Tomasz Figa, iommu, xypron.glpk, linuxppc-dev, bauerman
In-Reply-To: <20210107175740.GA16519@char.us.oracle.com>
On 2021-01-07 17:57, Konrad Rzeszutek Wilk wrote:
> On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
>> Hi Greg and Konrad,
>>
>> This change is intended to be non-arch specific. Any arch that lacks DMA access
>> control and has devices not behind an IOMMU can make use of it. Could you share
>> why you think this should be arch specific?
>
> The idea behind non-arch specific code is it to be generic. The devicetree
> is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
> be in arch specific code.
Sorry, but that's an absurd argument. By the same token you'd equally
have to claim that bits of, say, the Broadcom WiFi driver (not to
mention dozens of others) should be split out into arch code, since not
all platforms use the devicetree parts, nor the ACPI parts, nor the PCI
parts...
There is nothing architecture-specific about using devicetree as a
system description - AFAIK there *are* a handful of x86 platforms that
use it, besides even more architectures than you've listed above. It has
long been the policy that devicetree-related code for a particular
subsystem should just live within that subsystem. Sometimes if there's
enough of it it gets collected together into its own file - e.g.
drivers/pci/of.c - otherwise it tends to just get #ifdef'ed - e.g.
of_spi_parse_dt(), or the other DMA reserved-memory consumers that
already exist as Florian points out.
Besides, there are far more platforms that enable CONFIG_OF than enable
CONFIG_SWIOTLB, so by that metric the whole of the SWIOTLB code itself
is even less "generic" than any DT parsing :P
Robin.
^ permalink raw reply
* Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
From: Tyrel Datwyler @ 2021-01-13 0:33 UTC (permalink / raw)
To: Brian King, james.bottomley
Cc: martin.petersen, linux-scsi, james.smart, linux-kernel, brking,
linuxppc-dev
In-Reply-To: <0525bee7-433f-dcc7-9e35-e8706d6edee5@linux.vnet.ibm.com>
On 1/12/21 2:54 PM, Brian King wrote:
> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
>> Introduce several new vhost fields for managing MQ state of the adapter
>> as well as initial defaults for MQ enablement.
>>
>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>> ---
>> drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
>> drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
>> 2 files changed, 17 insertions(+)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>> index ba95438a8912..9200fe49c57e 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
>> .max_sectors = IBMVFC_MAX_SECTORS,
>> .shost_attrs = ibmvfc_attrs,
>> .track_queue_depth = 1,
>> + .host_tagset = 1,
>
> This doesn't seem right. You are setting host_tagset, which means you want a
> shared, host wide, tag set for commands. It also means that the total
> queue depth for the host is can_queue. However, it looks like you are allocating
> max_requests events for each sub crq, which means you are over allocating memory.
With the shared tagset yes the queue depth for the host is can_queue, but this
also implies that the max queue depth for each hw queue is also can_queue. So,
in the worst case that all commands are queued down the same hw queue we need an
event pool with can_queue commands.
>
> Looking at this closer, we might have bigger problems. There is a host wide
> max number of commands that the VFC host supports, which gets returned on
> NPIV Login. This value can change across a live migration event.
From what I understand the max commands can only become less.
>
> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
> can_queue on the scsi_host *after* the tag set has been allocated. This looks
> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
> we look at can_queue once the tag set is setup, and I'm not seeing a good way
> to dynamically change the host queue depth once the tag set is setup.
>
> Unless I'm missing something, our best options appear to either be to implement
> our own host wide busy reference counting, which doesn't sound very good, or
> we need to add some API to block / scsi that allows us to dynamically change
> can_queue.
Changing can_queue won't do use any good with the shared tagset becasue each
queue still needs to be able to queue can_queue number of commands in the worst
case.
The complaint before was not using shared tags increases the tag memory
allocation because they are statically allocated for each queue. The question is
what claims more memory our event pool allocation or the tagset per queue
allocation.
If we chose to not use the shared tagset then the queue depth for each hw queue
becomes (can_queue / nr_hw_queues).
-Tyrel
>
> Thanks,
>
> Brian
>
>
^ permalink raw reply
* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Florian Fainelli @ 2021-01-13 0:03 UTC (permalink / raw)
To: Claire Chang, robh+dt, mpe, benh, paulus, joro, will,
frowand.list, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
hch, m.szyprowski, robin.murphy
Cc: drinkcat, devicetree, heikki.krogerus, saravanak, peterz,
xypron.glpk, rafael.j.wysocki, linux-kernel, andriy.shevchenko,
tfiga, bgolaszewski, iommu, grant.likely, rdunlap, gregkh,
xen-devel, dan.j.williams, treding, linuxppc-dev, mingo, bauerman
In-Reply-To: <20210106034124.30560-3-tientzu@chromium.org>
On 1/5/21 7:41 PM, Claire Chang wrote:
> Add the initialization function to create restricted DMA pools from
> matching reserved-memory nodes in the device tree.
>
> Signed-off-by: Claire Chang <tientzu@chromium.org>
> ---
> include/linux/device.h | 4 ++
> include/linux/swiotlb.h | 7 +-
> kernel/dma/Kconfig | 1 +
> kernel/dma/swiotlb.c | 144 ++++++++++++++++++++++++++++++++++------
> 4 files changed, 131 insertions(+), 25 deletions(-)
>
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 89bb8b84173e..ca6f71ec8871 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -413,6 +413,7 @@ struct dev_links_info {
> * @dma_pools: Dma pools (if dma'ble device).
> * @dma_mem: Internal for coherent mem override.
> * @cma_area: Contiguous memory area for dma allocations
> + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
> * @archdata: For arch-specific additions.
> * @of_node: Associated device tree node.
> * @fwnode: Associated device node supplied by platform firmware.
> @@ -515,6 +516,9 @@ struct device {
> #ifdef CONFIG_DMA_CMA
> struct cma *cma_area; /* contiguous memory area for dma
> allocations */
> +#endif
> +#ifdef CONFIG_SWIOTLB
> + struct io_tlb_mem *dma_io_tlb_mem;
> #endif
> /* arch specific additions */
> struct dev_archdata archdata;
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index dd8eb57cbb8f..a1bbd7788885 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
> *
> * @start: The start address of the swiotlb memory pool. Used to do a quick
> * range check to see if the memory was in fact allocated by this
> - * API.
> + * API. For restricted DMA pool, this is device tree adjustable.
Maybe write it as this is "firmware adjustable" such that when/if ACPI
needs something like this, the description does not need updating.
[snip]
> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> + struct device *dev)
> +{
> + struct io_tlb_mem *mem = rmem->priv;
> + int ret;
> +
> + if (dev->dma_io_tlb_mem)
> + return -EBUSY;
> +
> + if (!mem) {
> + mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> + if (!mem)
> + return -ENOMEM;
> +
> + if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {
MEMREMAP_WB sounds appropriate as a default.
Documentation/devicetree/bindings/reserved-memory/ramoops.txt does
define an "unbuffered" property which in premise could be applied to the
generic reserved memory binding as well and that we may have to be
honoring here, if we were to make it more generic. Oh well, this does
not need to be addressed right now I guess.
--
Florian
^ permalink raw reply
* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Florian Fainelli @ 2021-01-12 23:52 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
xypron.glpk, Joerg Roedel, rafael.j.wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
will, dan.j.williams, linuxppc-dev, Rob Herring, Claire Chang,
boris.ostrovsky, Andy Shevchenko, jgross, Nicolas Boichat,
Greg KH, rdunlap, lkml, Tomasz Figa, list@263.net:IOMMU DRIVERS,
Robin Murphy, bauerman
In-Reply-To: <20210107211937.GA19460@char.us.oracle.com>
On 1/7/21 1:19 PM, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 07, 2021 at 10:09:14AM -0800, Florian Fainelli wrote:
>> On 1/7/21 9:57 AM, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Jan 08, 2021 at 01:39:18AM +0800, Claire Chang wrote:
>>>> Hi Greg and Konrad,
>>>>
>>>> This change is intended to be non-arch specific. Any arch that lacks DMA access
>>>> control and has devices not behind an IOMMU can make use of it. Could you share
>>>> why you think this should be arch specific?
>>>
>>> The idea behind non-arch specific code is it to be generic. The devicetree
>>> is specific to PowerPC, Sparc, and ARM, and not to x86 - hence it should
>>> be in arch specific code.
>>
>> In premise the same code could be used with an ACPI enabled system with
>> an appropriate service to identify the restricted DMA regions and unlock
>> them.
>
> Which this patchset is not.
ACPI is not included, but the comment about Device Tree being specific
to PowerPC, SPARC and ARM is x86 is not quite correct. There is an
architecture specific part to obtaining where the Device Tree lives in
memory, but the implementation itself is architecture agnostic (with
some early SPARC/OpenFirmware shenanigans), and x86 does, or rather did
support Device Tree to a very small extent with the CE4100 platform.
Would you prefer that an swiotlb_of.c file be created instead or
something along those lines to better encapsulate where the OF specific
code lives?
>
>>
>> More than 1 architecture requiring this function (ARM and ARM64 are the
>> two I can think of needing this immediately) sort of calls for making
>> the code architecture agnostic since past 2, you need something that scales.
>
> I believe the use-case is for ARM64 at this moment.
For the platforms that Claire uses, certainly for the ones we use, ARM
and ARM64 are in scope.
--
Florian
^ permalink raw reply
* Re: [RFC PATCH v3 6/6] of: Add plumbing for restricted DMA pool
From: Florian Fainelli @ 2021-01-12 23:48 UTC (permalink / raw)
To: Claire Chang, robh+dt, mpe, benh, paulus, joro, will,
frowand.list, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
hch, m.szyprowski, robin.murphy
Cc: drinkcat, devicetree, heikki.krogerus, saravanak, peterz,
xypron.glpk, rafael.j.wysocki, linux-kernel, andriy.shevchenko,
tfiga, bgolaszewski, iommu, grant.likely, rdunlap, gregkh,
xen-devel, dan.j.williams, treding, linuxppc-dev, mingo, bauerman
In-Reply-To: <20210106034124.30560-7-tientzu@chromium.org>
On 1/5/21 7:41 PM, Claire Chang wrote:
> If a device is not behind an IOMMU, we look up the device node and set
> up the restricted DMA when the restricted-dma-pool is presented.
>
> Signed-off-by: Claire Chang <tientzu@chromium.org>
> ---
[snip]
> +int of_dma_set_restricted_buffer(struct device *dev)
> +{
> + struct device_node *node;
> + int count, i;
> +
> + if (!dev->of_node)
> + return 0;
> +
> + count = of_property_count_elems_of_size(dev->of_node, "memory-region",
> + sizeof(phandle));
You could have an early check for count < 0, along with an error
message, if that is deemed useful.
> + for (i = 0; i < count; i++) {
> + node = of_parse_phandle(dev->of_node, "memory-region", i);
> + if (of_device_is_compatible(node, "restricted-dma-pool"))
And you may want to add here an of_device_is_available(node). A platform
that provides the Device Tree firmware and try to support multiple
different SoCs may try to determine if an IOMMU is present, and if it
is, it could be marking the restriced-dma-pool region with a 'status =
"disabled"' property, or any variant of that scheme.
> + return of_reserved_mem_device_init_by_idx(
> + dev, dev->of_node, i);
This does not seem to be supporting more than one memory region, did not
you want something like instead:
ret = of_reserved_mem_device_init_by_idx(...);
if (ret)
return ret;
> + }
> +
> + return 0;
> +}
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index aedfaaafd3e7..e2c7409956ab 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -182,6 +182,10 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
> arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
>
> dev->dma_range_map = map;
> +
> + if (!iommu)
> + return of_dma_set_restricted_buffer(dev);
> +
> return 0;
> }
> EXPORT_SYMBOL_GPL(of_dma_configure_id);
> diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
> index d9e6a324de0a..28a2dfa197ba 100644
> --- a/drivers/of/of_private.h
> +++ b/drivers/of/of_private.h
> @@ -161,12 +161,17 @@ struct bus_dma_region;
> #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
> int of_dma_get_range(struct device_node *np,
> const struct bus_dma_region **map);
> +int of_dma_set_restricted_buffer(struct device *dev);
> #else
> static inline int of_dma_get_range(struct device_node *np,
> const struct bus_dma_region **map)
> {
> return -ENODEV;
> }
> +static inline int of_dma_get_restricted_buffer(struct device *dev)
> +{
> + return -ENODEV;
> +}
> #endif
>
> #endif /* _LINUX_OF_PRIVATE_H */
>
--
Florian
^ permalink raw reply
* Re: [RFC PATCH v3 4/6] swiotlb: Add restricted DMA alloc/free support.
From: Florian Fainelli @ 2021-01-12 23:41 UTC (permalink / raw)
To: Claire Chang, robh+dt, mpe, benh, paulus, joro, will,
frowand.list, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
hch, m.szyprowski, robin.murphy
Cc: drinkcat, devicetree, heikki.krogerus, saravanak, peterz,
xypron.glpk, rafael.j.wysocki, linux-kernel, andriy.shevchenko,
tfiga, bgolaszewski, iommu, grant.likely, rdunlap, gregkh,
xen-devel, dan.j.williams, treding, linuxppc-dev, mingo, bauerman
In-Reply-To: <20210106034124.30560-5-tientzu@chromium.org>
On 1/5/21 7:41 PM, Claire Chang wrote:
> Add the functions, swiotlb_alloc and swiotlb_free to support the
> memory allocation from restricted DMA pool.
>
> Signed-off-by: Claire Chang <tientzu@chromium.org>
> ---
[snip]
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 30ccbc08e229..126e9b3354d6 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -137,6 +137,11 @@ void *dma_direct_alloc(struct device *dev, size_t size,
> void *ret;
> int err;
>
> +#ifdef CONFIG_SWIOTLB
> + if (unlikely(dev->dma_io_tlb_mem))
> + return swiotlb_alloc(dev, size, dma_handle, attrs);
> +#endif
While this is potentially a hot path, I am not sure of the unkikely is
warranted, maybe best left as a plain conditional.
--
Florian
^ permalink raw reply
* Re: [RFC PATCH v3 3/6] swiotlb: Use restricted DMA pool if available
From: Florian Fainelli @ 2021-01-12 23:39 UTC (permalink / raw)
To: Claire Chang, robh+dt, mpe, benh, paulus, joro, will,
frowand.list, konrad.wilk, boris.ostrovsky, jgross, sstabellini,
hch, m.szyprowski, robin.murphy
Cc: drinkcat, devicetree, heikki.krogerus, saravanak, peterz,
xypron.glpk, rafael.j.wysocki, linux-kernel, andriy.shevchenko,
tfiga, bgolaszewski, iommu, grant.likely, rdunlap, gregkh,
xen-devel, dan.j.williams, treding, linuxppc-dev, mingo, bauerman
In-Reply-To: <20210106034124.30560-4-tientzu@chromium.org>
On 1/5/21 7:41 PM, Claire Chang wrote:
> Regardless of swiotlb setting, the restricted DMA pool is preferred if
> available.
>
> The restricted DMA pools provide a basic level of protection against
> the DMA overwriting buffer contents at unexpected times. However, to
> protect against general data leakage and system memory corruption, the
> system needs to provide a way to restrict the DMA to a predefined memory
> region.
>
> Signed-off-by: Claire Chang <tientzu@chromium.org>
You could probably split this patch into two:
- one that introduces the get_io_tlb_mem() getter, updates all callers
of is_swiotlb_buffer() to gain a 'struct device' argument
- another one that does add support for a non-default swiotlb pool and
adds dev->dma_io_tlb_mem
Other than that, LGTM!
--
Florian
^ permalink raw reply
* Re: [PATCH v4 18/21] ibmvfc: send Cancel MAD down each hw scsi channel
From: Brian King @ 2021-01-12 21:46 UTC (permalink / raw)
To: Tyrel Datwyler, james.bottomley
Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20210111231225.105347-19-tyreld@linux.ibm.com>
On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index b0b0212344f3..24e1278acfeb 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -2418,18 +2418,79 @@ static struct ibmvfc_event *ibmvfc_init_tmf(struct ibmvfc_queue *queue,
> return evt;
> }
>
> -/**
> - * ibmvfc_cancel_all - Cancel all outstanding commands to the device
> - * @sdev: scsi device to cancel commands
> - * @type: type of error recovery being performed
> - *
> - * This sends a cancel to the VIOS for the specified device. This does
> - * NOT send any abort to the actual device. That must be done separately.
> - *
> - * Returns:
> - * 0 on success / other on failure
> - **/
> -static int ibmvfc_cancel_all(struct scsi_device *sdev, int type)
> +static int ibmvfc_cancel_all_mq(struct scsi_device *sdev, int type)
> +{
> + struct ibmvfc_host *vhost = shost_priv(sdev->host);
> + struct ibmvfc_event *evt, *found_evt, *temp;
> + struct ibmvfc_queue *queues = vhost->scsi_scrqs.scrqs;
> + unsigned long flags;
> + int num_hwq, i;
> + LIST_HEAD(cancelq);
> + u16 status;
> +
> + ENTER;
> + spin_lock_irqsave(vhost->host->host_lock, flags);
> + num_hwq = vhost->scsi_scrqs.active_queues;
> + for (i = 0; i < num_hwq; i++) {
> + spin_lock(queues[i].q_lock);
> + spin_lock(&queues[i].l_lock);
> + found_evt = NULL;
> + list_for_each_entry(evt, &queues[i].sent, queue_list) {
> + if (evt->cmnd && evt->cmnd->device == sdev) {
> + found_evt = evt;
> + break;
> + }
> + }
> + spin_unlock(&queues[i].l_lock);
> +
I really don't like the way the first for loop grabs all the q_locks, and then
there is a second for loop that drops all these locks... I think this code would
be cleaner if it looked like:
if (found_evt && vhost->logged_in) {
evt = ibmvfc_init_tmf(&queues[i], sdev, type);
evt->sync_iu = &queues[i].cancel_rsp;
ibmvfc_send_event(evt, vhost, default_timeout);
list_add_tail(&evt->cancel, &cancelq);
}
spin_unlock(queues[i].q_lock);
}
> + if (!found_evt)
> + continue;
> +
> + if (vhost->logged_in) {
> + evt = ibmvfc_init_tmf(&queues[i], sdev, type);
> + evt->sync_iu = &queues[i].cancel_rsp;
> + ibmvfc_send_event(evt, vhost, default_timeout);
> + list_add_tail(&evt->cancel, &cancelq);
> + }
> + }
> +
> + for (i = 0; i < num_hwq; i++)
> + spin_unlock(queues[i].q_lock);
> + spin_unlock_irqrestore(vhost->host->host_lock, flags);
> +
> + if (list_empty(&cancelq)) {
> + if (vhost->log_level > IBMVFC_DEFAULT_LOG_LEVEL)
> + sdev_printk(KERN_INFO, sdev, "No events found to cancel\n");
> + return 0;
> + }
> +
> + sdev_printk(KERN_INFO, sdev, "Cancelling outstanding commands.\n");
> +
> + list_for_each_entry_safe(evt, temp, &cancelq, cancel) {
> + wait_for_completion(&evt->comp);
> + status = be16_to_cpu(evt->queue->cancel_rsp.mad_common.status);
You probably want a list_del(&evt->cancel) here.
> + ibmvfc_free_event(evt);
> +
> + if (status != IBMVFC_MAD_SUCCESS) {
> + sdev_printk(KERN_WARNING, sdev, "Cancel failed with rc=%x\n", status);
> + switch (status) {
> + case IBMVFC_MAD_DRIVER_FAILED:
> + case IBMVFC_MAD_CRQ_ERROR:
> + /* Host adapter most likely going through reset, return success to
> + * the caller will wait for the command being cancelled to get returned
> + */
> + break;
> + default:
> + break;
I think this default break needs to be a return -EIO.
> + }
> + }
> + }
> +
> + sdev_printk(KERN_INFO, sdev, "Successfully cancelled outstanding commands\n");
> + return 0;
> +}
> +
> +static int ibmvfc_cancel_all_sq(struct scsi_device *sdev, int type)
> {
> struct ibmvfc_host *vhost = shost_priv(sdev->host);
> struct ibmvfc_event *evt, *found_evt;
> @@ -2498,6 +2559,27 @@ static int ibmvfc_cancel_all(struct scsi_device *sdev, int type)
> return 0;
> }
>
--
Brian King
Power Linux I/O
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
From: Brian King @ 2021-01-12 22:54 UTC (permalink / raw)
To: Tyrel Datwyler, james.bottomley
Cc: martin.petersen, linux-scsi, james.smart, linux-kernel, brking,
linuxppc-dev
In-Reply-To: <20210111231225.105347-2-tyreld@linux.ibm.com>
On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> Introduce several new vhost fields for managing MQ state of the adapter
> as well as initial defaults for MQ enablement.
>
> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
> ---
> drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
> drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
> 2 files changed, 17 insertions(+)
>
> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> index ba95438a8912..9200fe49c57e 100644
> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
> .max_sectors = IBMVFC_MAX_SECTORS,
> .shost_attrs = ibmvfc_attrs,
> .track_queue_depth = 1,
> + .host_tagset = 1,
This doesn't seem right. You are setting host_tagset, which means you want a
shared, host wide, tag set for commands. It also means that the total
queue depth for the host is can_queue. However, it looks like you are allocating
max_requests events for each sub crq, which means you are over allocating memory.
Looking at this closer, we might have bigger problems. There is a host wide
max number of commands that the VFC host supports, which gets returned on
NPIV Login. This value can change across a live migration event.
The ibmvfc driver, which does the same thing the lpfc driver does, modifies
can_queue on the scsi_host *after* the tag set has been allocated. This looks
to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
we look at can_queue once the tag set is setup, and I'm not seeing a good way
to dynamically change the host queue depth once the tag set is setup.
Unless I'm missing something, our best options appear to either be to implement
our own host wide busy reference counting, which doesn't sound very good, or
we need to add some API to block / scsi that allows us to dynamically change
can_queue.
Thanks,
Brian
--
Brian King
Power Linux I/O
IBM Linux Technology Center
^ permalink raw reply
* Re: [PATCH] ASoC: imx-hdmi: Fix warning of the uninitialized variable ret
From: Mark Brown @ 2021-01-12 20:44 UTC (permalink / raw)
To: Nathan Chancellor
Cc: alsa-devel, timur, Xiubo.Lee, linuxppc-dev, Shengjiu Wang, tiwai,
perex, nicoleotsuka, festevam, linux-kernel
In-Reply-To: <20210112190921.GA3561911@ubuntu-m3-large-x86>
[-- Attachment #1: Type: text/plain, Size: 2405 bytes --]
On Tue, Jan 12, 2021 at 12:09:21PM -0700, Nathan Chancellor wrote:
> On Tue, Jan 12, 2021 at 06:48:48PM +0000, Mark Brown wrote:
> > This is a random warning fix, why would you expect it to be sent as a
> > bug fix? This is the first indication I've seen that anyone is seeing
> > it in mainline, in general the people who report and fix warnings are
> > doing so based on -next and the patch seems to be from a month ago. I
> > don't have this in my inbox so I assume it's applied already or needs to
> > be resubmitted anyway.
> Well, I consider compiler warnings bugs. I have had plenty of my
> compiler warning patches sent as bug fixes for an -rc. Furthermore, this
> patch was sent three times by different people, that should give you some
> indication that people are tripping over it.
I really don't have that good a recall of what warning fixes people are
sending, I might notice if I get two versions of the same thing that I
look at at roughly the same time but even with a few hours between it's
most likely that I'll have completely forgotten. Warning fixes are in
general not memorable, it's not a good sign if they are. The default
assumption for any warning fix that doesn't say anything else is going
to be that either the issue or the toolchain is very new.
For any kind of fix if you think that things are in some way urgent you
should say something promptly (or provide some indication of this in the
submission if you're sending the fix yourself, such as with a fixes
tag). If nobody says anything then you should assume that nobody else
is going to be aware of any urgency and that this will affect handling.
Should it happen that things aren't flagged up then of course do so but
consider that this may well be the first time people will be aware
there's any urgency, don't assume that people will have been operating
with information they didn't have.
> The first version was sent on December 11th, it looks like your pull for
> 5.11 went on the December 14th, then the second version was applied on
> December 16th so I figured it might be destined for 5.11 but I could not
> tell (your for-next branch is a merge of your for-5.11 and for-5.12):
If it's on the for-5.11 branch then it will be for 5.11, which it must
be if it was applied then. If it was and it was applied that long ago
it'll already be queued in Takashi's tree and I guess he didn't send it
on yet.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: [PATCH] ASoC: imx-hdmi: Fix warning of the uninitialized variable ret
From: Nathan Chancellor @ 2021-01-12 19:09 UTC (permalink / raw)
To: Mark Brown
Cc: alsa-devel, timur, Xiubo.Lee, linuxppc-dev, Shengjiu Wang, tiwai,
perex, nicoleotsuka, festevam, linux-kernel
In-Reply-To: <20210112184848.GG4646@sirena.org.uk>
On Tue, Jan 12, 2021 at 06:48:48PM +0000, Mark Brown wrote:
> This is a random warning fix, why would you expect it to be sent as a
> bug fix? This is the first indication I've seen that anyone is seeing
> it in mainline, in general the people who report and fix warnings are
> doing so based on -next and the patch seems to be from a month ago. I
> don't have this in my inbox so I assume it's applied already or needs to
> be resubmitted anyway.
Well, I consider compiler warnings bugs. I have had plenty of my
compiler warning patches sent as bug fixes for an -rc. Furthermore, this
patch was sent three times by different people, that should give you some
indication that people are tripping over it.
https://lore.kernel.org/alsa-devel/X9NGQaF4pmK8oUAF@mwanda/
https://lore.kernel.org/alsa-devel/1608115464-18710-1-git-send-email-shengjiu.wang@nxp.com/
https://lore.kernel.org/alsa-devel/20201230154443.656997-1-arnd@kernel.org/
The first version was sent on December 11th, it looks like your pull for
5.11 went on the December 14th, then the second version was applied on
December 16th so I figured it might be destined for 5.11 but I could not
tell (your for-next branch is a merge of your for-5.11 and for-5.12):
https://lore.kernel.org/alsa-devel/160813397775.31838.8934909997692637790.b4-ty@kernel.org/
Cheers,
Nathan
^ permalink raw reply
* Re: [PATCH] ASoC: imx-hdmi: Fix warning of the uninitialized variable ret
From: Mark Brown @ 2021-01-12 18:48 UTC (permalink / raw)
To: Nathan Chancellor
Cc: alsa-devel, timur, Xiubo.Lee, linuxppc-dev, Shengjiu Wang, tiwai,
perex, nicoleotsuka, festevam, linux-kernel
In-Reply-To: <20210112181949.GA3241630@ubuntu-m3-large-x86>
[-- Attachment #1: Type: text/plain, Size: 943 bytes --]
On Tue, Jan 12, 2021 at 11:19:49AM -0700, Nathan Chancellor wrote:
Please delete unneeded context from mails when replying. Doing this
makes it much easier to find your reply in the message, helping ensure
it won't be missed by people scrolling through the irrelevant quoted
material.
> >
> > Signed-off-by: shengjiu wang <shengjiu.wang@nxp.com>
> > Reported-by: kernel test robot <lkp@intel.com>
> > ---
> I still see a warning in v5.11-rc3 that is fixed by this patch, is it
> not going in this release cycle? It is a regression fix, seems like it
> should.
This is a random warning fix, why would you expect it to be sent as a
bug fix? This is the first indication I've seen that anyone is seeing
it in mainline, in general the people who report and fix warnings are
doing so based on -next and the patch seems to be from a month ago. I
don't have this in my inbox so I assume it's applied already or needs to
be resubmitted anyway.
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply
* Re: [PATCH] tty: serial: cpm_uart: Add udbg support for enabling xmon
From: Christophe Leroy @ 2021-01-12 18:25 UTC (permalink / raw)
To: linuxppc-dev
In-Reply-To: <e4471bf81089252470efb3eed735d71a5b32adbd.1608716197.git.christophe.leroy@csgroup.eu>
Le 23/12/2020 à 10:38, Christophe Leroy a écrit :
> In order to use xmon with powerpc 8xx, the serial driver
> must provide udbg_putc() and udpb_getc().
>
> Provide them via cpm_put_poll_char() and cpm_get_poll_char().
>
> This requires CONFIG_CONSOLE_POLL.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
This patch has been merged in tty-next, it is visible in linux-next
Christophe
> ---
> drivers/tty/serial/cpm_uart/cpm_uart_core.c | 40 ++++++++++++++++++++-
> 1 file changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tty/serial/cpm_uart/cpm_uart_core.c b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> index ba14ec5b9bc4..2920b9b602b3 100644
> --- a/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> +++ b/drivers/tty/serial/cpm_uart/cpm_uart_core.c
> @@ -1145,6 +1145,32 @@ static void cpm_put_poll_char(struct uart_port *port,
> ch[0] = (char)c;
> cpm_uart_early_write(pinfo, ch, 1, false);
> }
> +
> +static struct uart_port *udbg_port;
> +
> +static void udbg_cpm_putc(char c)
> +{
> + if (c == '\n')
> + cpm_put_poll_char(udbg_port, '\r');
> + cpm_put_poll_char(udbg_port, c);
> +}
> +
> +static int udbg_cpm_getc_poll(void)
> +{
> + int c = cpm_get_poll_char(udbg_port);
> +
> + return c == NO_POLL_CHAR ? -1 : c;
> +}
> +
> +static int udbg_cpm_getc(void)
> +{
> + int c;
> +
> + while ((c = udbg_cpm_getc_poll()) == -1)
> + cpu_relax();
> + return c;
> +}
> +
> #endif /* CONFIG_CONSOLE_POLL */
>
> static const struct uart_ops cpm_uart_pops = {
> @@ -1251,7 +1277,10 @@ static int cpm_uart_init_port(struct device_node *np,
> pinfo->gpios[i] = NULL;
>
> #ifdef CONFIG_PPC_EARLY_DEBUG_CPM
> - udbg_putc = NULL;
> +#ifdef CONFIG_CONSOLE_POLL
> + if (!udbg_port)
> +#endif
> + udbg_putc = NULL;
> #endif
>
> return cpm_uart_request_port(&pinfo->port);
> @@ -1370,6 +1399,15 @@ static int __init cpm_uart_console_setup(struct console *co, char *options)
> uart_set_options(port, co, baud, parity, bits, flow);
> cpm_line_cr_cmd(pinfo, CPM_CR_RESTART_TX);
>
> +#ifdef CONFIG_CONSOLE_POLL
> + if (!udbg_port) {
> + udbg_port = &pinfo->port;
> + udbg_putc = udbg_cpm_putc;
> + udbg_getc = udbg_cpm_getc;
> + udbg_getc_poll = udbg_cpm_getc_poll;
> + }
> +#endif
> +
> return 0;
> }
>
>
^ permalink raw reply
* Re: [PATCH] ASoC: imx-hdmi: Fix warning of the uninitialized variable ret
From: Nathan Chancellor @ 2021-01-12 18:19 UTC (permalink / raw)
To: Shengjiu Wang
Cc: alsa-devel, timur, Xiubo.Lee, linuxppc-dev, tiwai, perex,
nicoleotsuka, broonie, festevam, linux-kernel
In-Reply-To: <1608115464-18710-1-git-send-email-shengjiu.wang@nxp.com>
On Wed, Dec 16, 2020 at 06:44:24PM +0800, Shengjiu Wang wrote:
> From: shengjiu wang <shengjiu.wang@nxp.com>
>
> When condition ((hdmi_out && hdmi_in) || (!hdmi_out && !hdmi_in))
> is true, then goto fail, the uninitialized variable ret will be
> returned.
>
> Signed-off-by: shengjiu wang <shengjiu.wang@nxp.com>
> Reported-by: kernel test robot <lkp@intel.com>
> ---
> sound/soc/fsl/imx-hdmi.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/sound/soc/fsl/imx-hdmi.c b/sound/soc/fsl/imx-hdmi.c
> index 2c2a76a71940..ede4a9ad1054 100644
> --- a/sound/soc/fsl/imx-hdmi.c
> +++ b/sound/soc/fsl/imx-hdmi.c
> @@ -164,6 +164,7 @@ static int imx_hdmi_probe(struct platform_device *pdev)
>
> if ((hdmi_out && hdmi_in) || (!hdmi_out && !hdmi_in)) {
> dev_err(&pdev->dev, "Invalid HDMI DAI link\n");
> + ret = -EINVAL;
> goto fail;
> }
>
> --
> 2.17.1
>
I still see a warning in v5.11-rc3 that is fixed by this patch, is it
not going in this release cycle? It is a regression fix, seems like it
should.
Cheers,
Nathan
^ permalink raw reply
* Re: [RFC PATCH v3 0/6] Restricted DMA
From: Florian Fainelli @ 2021-01-12 18:01 UTC (permalink / raw)
To: Claire Chang
Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
Joerg Roedel, rafael.j.wysocki, Christoph Hellwig,
Bartosz Golaszewski, xen-devel, Thierry Reding, linux-devicetree,
will, Konrad Rzeszutek Wilk, dan.j.williams, linuxppc-dev,
Rob Herring, boris.ostrovsky, Andy Shevchenko, jgross,
Nicolas Boichat, Greg KH, rdunlap, lkml, Tomasz Figa,
list@263.net:IOMMU DRIVERS, Jim Quinlan, xypron.glpk,
Robin Murphy, bauerman
In-Reply-To: <CALiNf29_PmLJTVLksSHp3NFAaL52idqehSMOtatJ=jaM2Muq1g@mail.gmail.com>
On 1/11/21 11:48 PM, Claire Chang wrote:
> On Fri, Jan 8, 2021 at 1:59 AM Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>> On 1/7/21 9:42 AM, Claire Chang wrote:
>>
>>>> Can you explain how ATF gets involved and to what extent it does help,
>>>> besides enforcing a secure region from the ARM CPU's perpsective? Does
>>>> the PCIe root complex not have an IOMMU but can somehow be denied access
>>>> to a region that is marked NS=0 in the ARM CPU's MMU? If so, that is
>>>> still some sort of basic protection that the HW enforces, right?
>>>
>>> We need the ATF support for memory MPU (memory protection unit).
>>> Restricted DMA (with reserved-memory in dts) makes sure the predefined memory
>>> region is for PCIe DMA only, but we still need MPU to locks down PCIe access to
>>> that specific regions.
>>
>> OK so you do have a protection unit of some sort to enforce which region
>> in DRAM the PCIE bridge is allowed to access, that makes sense,
>> otherwise the restricted DMA region would only be a hint but nothing you
>> can really enforce. This is almost entirely analogous to our systems then.
>
> Here is the example of setting the MPU:
> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
>
>>
>> There may be some value in standardizing on an ARM SMCCC call then since
>> you already support two different SoC vendors.
>>
>>>
>>>>
>>>> On Broadcom STB SoCs we have had something similar for a while however
>>>> and while we don't have an IOMMU for the PCIe bridge, we do have a a
>>>> basic protection mechanism whereby we can configure a region in DRAM to
>>>> be PCIe read/write and CPU read/write which then gets used as the PCIe
>>>> inbound region for the PCIe EP. By default the PCIe bridge is not
>>>> allowed access to DRAM so we must call into a security agent to allow
>>>> the PCIe bridge to access the designated DRAM region.
>>>>
>>>> We have done this using a private CMA area region assigned via Device
>>>> Tree, assigned with a and requiring the PCIe EP driver to use
>>>> dma_alloc_from_contiguous() in order to allocate from this device
>>>> private CMA area. The only drawback with that approach is that it
>>>> requires knowing how much memory you need up front for buffers and DMA
>>>> descriptors that the PCIe EP will need to process. The problem is that
>>>> it requires driver modifications and that does not scale over the number
>>>> of PCIe EP drivers, some we absolutely do not control, but there is no
>>>> need to bounce buffer. Your approach scales better across PCIe EP
>>>> drivers however it does require bounce buffering which could be a
>>>> performance hit.
>>>
>>> Only the streaming DMA (map/unmap) needs bounce buffering.
>>
>> True, and typically only on transmit since you don't really control
>> where the sk_buff are allocated from, right? On RX since you need to
>> hand buffer addresses to the WLAN chip prior to DMA, you can allocate
>> them from a pool that already falls within the restricted DMA region, right?
>>
>
> Right, but applying bounce buffering to RX will make it more secure.
> The device won't be able to modify the content after unmap. Just like what
> iommu_unmap does.
Sure, however the goals of using bounce buffering equally applies to RX
and TX in that this is the only layer sitting between a stack (block,
networking, USB, etc.) and the underlying device driver that scales well
in order to massage a dma_addr_t to be within a particular physical range.
There is however room for improvement if the drivers are willing to
change their buffer allocation strategy. When you receive Wi-Fi frames
you need to allocate buffers for the Wi-Fi device to DMA into, and that
happens ahead of the DMA transfers by the Wi-Fi device. At buffer
allocation time you could very well allocate these frames from the
restricted DMA region without having to bounce buffer them since the
host CPU is in control over where and when to DMA into.
The issue is that each network driver may implement its own buffer
allocation strategy, some may simply call netdev_alloc_skb() which gives
zero control over where the buffer comes from unless you play tricks
with NUMA node allocations and somehow declare that your restricted DMA
region is a different NUMA node. If the driver allocates pages and then
attaches a SKB to that page using build_skb(), then you have much more
control over where that page comes from, and this is where using a
device private CMA are helps, because you can just do
dma_alloc_from_contiguous() and that will ensure that the pages are
coming from your specific CMA area.
Few questions on the implementation:
- is there any warning or error being printed if the restricted DMA
region is outside of a device's DMA addressable range?
- are there are any helpful statistics that could be shown to indicate
that the restricted DMA region was sized too small, e.g.: that
allocation of a DMA buffer failed because we ran out of space in the
swiotlb pool?
>
>>> I also added alloc/free support in this series
>>> (https://lore.kernel.org/patchwork/patch/1360995/), so dma_direct_alloc() will
>>> try to allocate memory from the predefined memory region.
>>>
>>> As for the performance hit, it should be similar to the default swiotlb.
>>> Here are my experiment results. Both SoCs lack IOMMU for PCIe.
>>>
>>> PCIe wifi vht80 throughput -
>>>
>>> MTK SoC tcp_tx tcp_rx udp_tx udp_rx
>>> w/o Restricted DMA 244.1 134.66 312.56 350.79
>>> w/ Restricted DMA 246.95 136.59 363.21 351.99
>>>
>>> Rockchip SoC tcp_tx tcp_rx udp_tx udp_rx
>>> w/o Restricted DMA 237.87 133.86 288.28 361.88
>>> w/ Restricted DMA 256.01 130.95 292.28 353.19
>>
>> How come you get better throughput with restricted DMA? Is it because
>> doing DMA to/from a contiguous region allows for better grouping of
>> transactions from the DRAM controller's perspective somehow?
>
> I'm not sure, but actually, enabling the default swiotlb for wifi also helps the
> throughput a little bit for me.
OK, it would be interesting if you could get to the bottom of why
performance does increase with swiotlb.
--
Florian
^ permalink raw reply
* [patch 4/4] powerpc/mm/highmem: Use __set_pte_at() for kmap_local()
From: Thomas Gleixner @ 2021-01-12 17:01 UTC (permalink / raw)
To: LKML
Cc: Thomas Bogendoerfer, Andreas Larsson, Peter Zijlstra,
Paul Cercueil, linux-mm, sparclinux, Andrew Morton, linuxppc-dev,
David S. Miller
In-Reply-To: <20210112170136.078559026@linutronix.de>
The original PowerPC highmem mapping function used __set_pte_at() to denote
that the mapping is per CPU. This got lost with the conversion to the
generic implementation.
Override the default map function.
Fixes: 47da42b27a56 ("powerpc/mm/highmem: Switch to generic kmap atomic")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
---
arch/powerpc/include/asm/highmem.h | 2 ++
1 file changed, 2 insertions(+)
--- a/arch/powerpc/include/asm/highmem.h
+++ b/arch/powerpc/include/asm/highmem.h
@@ -58,6 +58,8 @@ extern pte_t *pkmap_page_table;
#define flush_cache_kmaps() flush_cache_all()
+#define arch_kmap_local_set_pte(mm, vaddr, ptep, ptev) \
+ __set_pte_at(mm, vaddr, ptep, ptev, 1)
#define arch_kmap_local_post_map(vaddr, pteval) \
local_flush_tlb_page(NULL, vaddr)
#define arch_kmap_local_post_unmap(vaddr) \
^ permalink raw reply
* [patch 0/4] mm/highmem: Fix fallout from generic kmap_local conversions
From: Thomas Gleixner @ 2021-01-12 17:01 UTC (permalink / raw)
To: LKML
Cc: Thomas Bogendoerfer, Andreas Larsson, Peter Zijlstra,
Paul Cercueil, linux-mm, sparclinux, Andrew Morton, linuxppc-dev,
David S. Miller
The kmap_local conversion wreckaged sparc, mips and powerpc as it missed
some of the details in the original implementation.
The following series addresses that.
Thanks,
tglx
---
arch/mips/include/asm/highmem.h | 1 +
arch/sparc/include/asm/highmem.h | 9 +++++----
b/arch/powerpc/include/asm/highmem.h | 2 ++
mm/highmem.c | 7 ++++++-
4 files changed, 14 insertions(+), 5 deletions(-)
^ permalink raw reply
* [patch 3/4] mips/mm/highmem: Use set_pte() for kmap_local()
From: Thomas Gleixner @ 2021-01-12 17:01 UTC (permalink / raw)
To: LKML
Cc: Thomas Bogendoerfer, Andreas Larsson, Peter Zijlstra,
Paul Cercueil, linux-mm, sparclinux, Andrew Morton, linuxppc-dev,
David S. Miller
In-Reply-To: <20210112170136.078559026@linutronix.de>
set_pte_at() on MIPS invokes update_cache() which might recurse into
kmap_local(). Use set_pte() like the original MIPS highmem implementation
did.
Fixes: a4c33e83bca1 ("mips/mm/highmem: Switch to generic kmap atomic")
Reported-by: Paul Cercueil <paul@crapouillou.net>
Reported-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
arch/mips/include/asm/highmem.h | 1 +
1 file changed, 1 insertion(+)
--- a/arch/mips/include/asm/highmem.h
+++ b/arch/mips/include/asm/highmem.h
@@ -51,6 +51,7 @@ extern void kmap_flush_tlb(unsigned long
#define flush_cache_kmaps() BUG_ON(cpu_has_dc_aliases)
+#define arch_kmap_local_set_pte(mm, vaddr, ptep, ptev) set_pte(ptep, ptev)
#define arch_kmap_local_post_map(vaddr, pteval) local_flush_tlb_one(vaddr)
#define arch_kmap_local_post_unmap(vaddr) local_flush_tlb_one(vaddr)
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox