LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [RFC PATCH v3 4/6] swiotlb: Add restricted DMA alloc/free support.
From: Robin Murphy @ 2021-01-13 18:27 UTC (permalink / raw)
  To: Christoph Hellwig, Claire Chang
  Cc: heikki.krogerus, peterz, grant.likely, paulus, frowand.list,
	mingo, m.szyprowski, sstabellini, saravanak, joro,
	rafael.j.wysocki, bgolaszewski, xen-devel, treding, devicetree,
	will, konrad.wilk, dan.j.williams, robh+dt, boris.ostrovsky,
	andriy.shevchenko, jgross, drinkcat, gregkh, rdunlap,
	linux-kernel, tfiga, iommu, xypron.glpk, linuxppc-dev, bauerman
In-Reply-To: <20210113124847.GC1383@lst.de>

On 2021-01-13 12:48, Christoph Hellwig wrote:
>> +#ifdef CONFIG_SWIOTLB
>> +	if (unlikely(dev->dma_io_tlb_mem))
>> +		return swiotlb_alloc(dev, size, dma_handle, attrs);
>> +#endif
> 
> Another place where the dma_io_tlb_mem is useful to avoid the ifdef.
> 
>> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
>> -		size_t mapping_size, size_t alloc_size,
>> -		enum dma_data_direction dir, unsigned long attrs)
>> +static int swiotlb_tbl_find_free_region(struct device *hwdev,
>> +					dma_addr_t tbl_dma_addr,
>> +					size_t alloc_size,
>> +					unsigned long attrs)
> 
>> +static void swiotlb_tbl_release_region(struct device *hwdev, int index,
>> +				       size_t size)
> 
> This refactoring should be another prep patch.
> 
> 
>> +void *swiotlb_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
>> +		    unsigned long attrs)
> 
> I'd rather have the names convey there are for the per-device bounce
> buffer in some form.
> 
>> +	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> 
> While we're at it I wonder if the io_tlb is something we could change
> while we're at it.  Maybe replace io_tlb_mem with struct swiotlb
> and rename the field in struct device to dev_swiotlb?
> 
>> +	int index;
>> +	void *vaddr;
>> +	phys_addr_t tlb_addr;
>> +
>> +	size = PAGE_ALIGN(size);
>> +	index = swiotlb_tbl_find_free_region(dev, mem->start, size, attrs);
>> +	if (index < 0)
>> +		return NULL;
>> +
>> +	tlb_addr = mem->start + (index << IO_TLB_SHIFT);
>> +	*dma_handle = phys_to_dma_unencrypted(dev, tlb_addr);
>> +
>> +	if (!dev_is_dma_coherent(dev)) {
>> +		unsigned long pfn = PFN_DOWN(tlb_addr);
>> +
>> +		/* remove any dirty cache lines on the kernel alias */
>> +		arch_dma_prep_coherent(pfn_to_page(pfn), size);
> 
> Can we hook in somewhat lower level in the dma-direct code so that all
> the remapping in dma-direct can be reused instead of duplicated?  That
> also becomes important if we want to use non-remapping uncached support,
> e.g. on mips or x86, or the direct changing of the attributes that Will
> planned to look into for arm64.

Indeed, AFAICS this ought to boil down to a direct equivalent of 
__dma_direct_alloc_pages() - other than the address there should be no 
conceptual difference between pages from the restricted pool and those 
from the regular page allocator, so this probably deserves to be plumbed 
in as an alternative to that.

Robin.

^ permalink raw reply

* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Robin Murphy @ 2021-01-13 18:03 UTC (permalink / raw)
  To: Florian Fainelli, Nicolas Saenz Julienne, Claire Chang, robh+dt,
	mpe, benh, paulus, joro, will, frowand.list, konrad.wilk,
	boris.ostrovsky, jgross, sstabellini, hch, m.szyprowski
  Cc: devicetree, heikki.krogerus, grant.likely, saravanak, peterz,
	xypron.glpk, rafael.j.wysocki, linux-kernel, treding,
	bgolaszewski, iommu, drinkcat, rdunlap, gregkh, xen-devel,
	dan.j.williams, andriy.shevchenko, linuxppc-dev, mingo
In-Reply-To: <9b4fe35f-a880-fcea-0591-b65406abbfa8@gmail.com>

On 2021-01-13 17:43, Florian Fainelli wrote:
> On 1/13/21 7:27 AM, Robin Murphy wrote:
>> On 2021-01-13 13:59, Nicolas Saenz Julienne wrote:
>>> Hi All,
>>>
>>> On Tue, 2021-01-12 at 16:03 -0800, Florian Fainelli wrote:
>>>> On 1/5/21 7:41 PM, Claire Chang wrote:
>>>>> Add the initialization function to create restricted DMA pools from
>>>>> matching reserved-memory nodes in the device tree.
>>>>>
>>>>> Signed-off-by: Claire Chang <tientzu@chromium.org>
>>>>> ---
>>>>>    include/linux/device.h  |   4 ++
>>>>>    include/linux/swiotlb.h |   7 +-
>>>>>    kernel/dma/Kconfig      |   1 +
>>>>>    kernel/dma/swiotlb.c    | 144
>>>>> ++++++++++++++++++++++++++++++++++------
>>>>>    4 files changed, 131 insertions(+), 25 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>>>> index 89bb8b84173e..ca6f71ec8871 100644
>>>>> --- a/include/linux/device.h
>>>>> +++ b/include/linux/device.h
>>>>> @@ -413,6 +413,7 @@ struct dev_links_info {
>>>>>     * @dma_pools:    Dma pools (if dma'ble device).
>>>>>     * @dma_mem:    Internal for coherent mem override.
>>>>>     * @cma_area:    Contiguous memory area for dma allocations
>>>>> + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
>>>>>     * @archdata:    For arch-specific additions.
>>>>>     * @of_node:    Associated device tree node.
>>>>>     * @fwnode:    Associated device node supplied by platform firmware.
>>>>> @@ -515,6 +516,9 @@ struct device {
>>>>>    #ifdef CONFIG_DMA_CMA
>>>>>        struct cma *cma_area;        /* contiguous memory area for dma
>>>>>                           allocations */
>>>>> +#endif
>>>>> +#ifdef CONFIG_SWIOTLB
>>>>> +    struct io_tlb_mem    *dma_io_tlb_mem;
>>>>>    #endif
>>>>>        /* arch specific additions */
>>>>>        struct dev_archdata    archdata;
>>>>> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>>>>> index dd8eb57cbb8f..a1bbd7788885 100644
>>>>> --- a/include/linux/swiotlb.h
>>>>> +++ b/include/linux/swiotlb.h
>>>>> @@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
>>>>>     *
>>>>>     * @start:    The start address of the swiotlb memory pool. Used
>>>>> to do a quick
>>>>>     *        range check to see if the memory was in fact allocated
>>>>> by this
>>>>> - *        API.
>>>>> + *        API. For restricted DMA pool, this is device tree
>>>>> adjustable.
>>>>
>>>> Maybe write it as this is "firmware adjustable" such that when/if ACPI
>>>> needs something like this, the description does not need updating.
>>
>> TBH I really don't think this needs calling out at all. Even in the
>> regular case, the details of exactly how and where the pool is allocated
>> are beyond the scope of this code - architectures already have several
>> ways to control that and make their own decisions.
>>
>>>>
>>>> [snip]
>>>>
>>>>> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>>>>> +                    struct device *dev)
>>>>> +{
>>>>> +    struct io_tlb_mem *mem = rmem->priv;
>>>>> +    int ret;
>>>>> +
>>>>> +    if (dev->dma_io_tlb_mem)
>>>>> +        return -EBUSY;
>>>>> +
>>>>> +    if (!mem) {
>>>>> +        mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>>>> +        if (!mem)
>>>>> +            return -ENOMEM;
>>>>> +
>>>>> +        if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {
>>>>
>>>> MEMREMAP_WB sounds appropriate as a default.
>>>
>>> As per the binding 'no-map' has to be disabled here. So AFAIU, this
>>> memory will
>>> be part of the linear mapping. Is this really needed then?
>>
>> More than that, I'd assume that we *have* to use the linear/direct map
>> address rather than anything that has any possibility of being a vmalloc
>> remap, otherwise we can no longer safely rely on
>> phys_to_dma/dma_to_phys, no?
> 
> I believe you are right, which means that if we want to make use of the
> restricted DMA pool on a 32-bit architecture (and we do, at least, I do)
> we should probably add some error checking/warning to ensure the
> restricted DMA pool falls within the linear map.

Oh, good point - I'm so used to 64-bit that I instinctively just blanked 
out the !PageHighMem() condition in try_ram_remap(). So maybe the 
original intent here *was* to effectively just implement that check, but 
if so it could still do with being a lot more explicit.

Cheers,
Robin.

^ permalink raw reply

* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Florian Fainelli @ 2021-01-13 17:43 UTC (permalink / raw)
  To: Robin Murphy, Nicolas Saenz Julienne, Claire Chang, robh+dt, mpe,
	benh, paulus, joro, will, frowand.list, konrad.wilk,
	boris.ostrovsky, jgross, sstabellini, hch, m.szyprowski
  Cc: devicetree, heikki.krogerus, grant.likely, saravanak, peterz,
	xypron.glpk, rafael.j.wysocki, linux-kernel, treding,
	bgolaszewski, iommu, drinkcat, rdunlap, gregkh, xen-devel,
	dan.j.williams, andriy.shevchenko, linuxppc-dev, mingo
In-Reply-To: <4c4989b5-f825-7e04-ca66-038cf6b9d5e9@arm.com>

On 1/13/21 7:27 AM, Robin Murphy wrote:
> On 2021-01-13 13:59, Nicolas Saenz Julienne wrote:
>> Hi All,
>>
>> On Tue, 2021-01-12 at 16:03 -0800, Florian Fainelli wrote:
>>> On 1/5/21 7:41 PM, Claire Chang wrote:
>>>> Add the initialization function to create restricted DMA pools from
>>>> matching reserved-memory nodes in the device tree.
>>>>
>>>> Signed-off-by: Claire Chang <tientzu@chromium.org>
>>>> ---
>>>>   include/linux/device.h  |   4 ++
>>>>   include/linux/swiotlb.h |   7 +-
>>>>   kernel/dma/Kconfig      |   1 +
>>>>   kernel/dma/swiotlb.c    | 144
>>>> ++++++++++++++++++++++++++++++++++------
>>>>   4 files changed, 131 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>>> index 89bb8b84173e..ca6f71ec8871 100644
>>>> --- a/include/linux/device.h
>>>> +++ b/include/linux/device.h
>>>> @@ -413,6 +413,7 @@ struct dev_links_info {
>>>>    * @dma_pools:    Dma pools (if dma'ble device).
>>>>    * @dma_mem:    Internal for coherent mem override.
>>>>    * @cma_area:    Contiguous memory area for dma allocations
>>>> + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
>>>>    * @archdata:    For arch-specific additions.
>>>>    * @of_node:    Associated device tree node.
>>>>    * @fwnode:    Associated device node supplied by platform firmware.
>>>> @@ -515,6 +516,9 @@ struct device {
>>>>   #ifdef CONFIG_DMA_CMA
>>>>       struct cma *cma_area;        /* contiguous memory area for dma
>>>>                          allocations */
>>>> +#endif
>>>> +#ifdef CONFIG_SWIOTLB
>>>> +    struct io_tlb_mem    *dma_io_tlb_mem;
>>>>   #endif
>>>>       /* arch specific additions */
>>>>       struct dev_archdata    archdata;
>>>> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>>>> index dd8eb57cbb8f..a1bbd7788885 100644
>>>> --- a/include/linux/swiotlb.h
>>>> +++ b/include/linux/swiotlb.h
>>>> @@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
>>>>    *
>>>>    * @start:    The start address of the swiotlb memory pool. Used
>>>> to do a quick
>>>>    *        range check to see if the memory was in fact allocated
>>>> by this
>>>> - *        API.
>>>> + *        API. For restricted DMA pool, this is device tree
>>>> adjustable.
>>>
>>> Maybe write it as this is "firmware adjustable" such that when/if ACPI
>>> needs something like this, the description does not need updating.
> 
> TBH I really don't think this needs calling out at all. Even in the
> regular case, the details of exactly how and where the pool is allocated
> are beyond the scope of this code - architectures already have several
> ways to control that and make their own decisions.
> 
>>>
>>> [snip]
>>>
>>>> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>>>> +                    struct device *dev)
>>>> +{
>>>> +    struct io_tlb_mem *mem = rmem->priv;
>>>> +    int ret;
>>>> +
>>>> +    if (dev->dma_io_tlb_mem)
>>>> +        return -EBUSY;
>>>> +
>>>> +    if (!mem) {
>>>> +        mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>>> +        if (!mem)
>>>> +            return -ENOMEM;
>>>> +
>>>> +        if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {
>>>
>>> MEMREMAP_WB sounds appropriate as a default.
>>
>> As per the binding 'no-map' has to be disabled here. So AFAIU, this
>> memory will
>> be part of the linear mapping. Is this really needed then?
> 
> More than that, I'd assume that we *have* to use the linear/direct map
> address rather than anything that has any possibility of being a vmalloc
> remap, otherwise we can no longer safely rely on
> phys_to_dma/dma_to_phys, no?

I believe you are right, which means that if we want to make use of the
restricted DMA pool on a 32-bit architecture (and we do, at least, I do)
we should probably add some error checking/warning to ensure the
restricted DMA pool falls within the linear map.
-- 
Florian

^ permalink raw reply

* Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
From: Brian King @ 2021-01-13 17:13 UTC (permalink / raw)
  To: Tyrel Datwyler, james.bottomley
  Cc: martin.petersen, linux-scsi, james.smart, linux-kernel, brking,
	Ming Lei, linuxppc-dev
In-Reply-To: <a8623705-6d49-2056-09bb-80190e0b6f52@linux.ibm.com>

On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> On 1/12/21 2:54 PM, Brian King wrote:
>> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
>>> Introduce several new vhost fields for managing MQ state of the adapter
>>> as well as initial defaults for MQ enablement.
>>>
>>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>>> ---
>>>  drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
>>>  drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
>>>  2 files changed, 17 insertions(+)
>>>
>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>>> index ba95438a8912..9200fe49c57e 100644
>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
>>>  	.max_sectors = IBMVFC_MAX_SECTORS,
>>>  	.shost_attrs = ibmvfc_attrs,
>>>  	.track_queue_depth = 1,
>>> +	.host_tagset = 1,
>>
>> This doesn't seem right. You are setting host_tagset, which means you want a
>> shared, host wide, tag set for commands. It also means that the total
>> queue depth for the host is can_queue. However, it looks like you are allocating
>> max_requests events for each sub crq, which means you are over allocating memory.
> 
> With the shared tagset yes the queue depth for the host is can_queue, but this
> also implies that the max queue depth for each hw queue is also can_queue. So,
> in the worst case that all commands are queued down the same hw queue we need an
> event pool with can_queue commands.
> 
>>
>> Looking at this closer, we might have bigger problems. There is a host wide
>> max number of commands that the VFC host supports, which gets returned on
>> NPIV Login. This value can change across a live migration event.
> 
> From what I understand the max commands can only become less.
> 
>>
>> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
>> can_queue on the scsi_host *after* the tag set has been allocated. This looks
>> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
>> we look at can_queue once the tag set is setup, and I'm not seeing a good way
>> to dynamically change the host queue depth once the tag set is setup. 
>>
>> Unless I'm missing something, our best options appear to either be to implement
>> our own host wide busy reference counting, which doesn't sound very good, or
>> we need to add some API to block / scsi that allows us to dynamically change
>> can_queue.
> 
> Changing can_queue won't do use any good with the shared tagset becasue each
> queue still needs to be able to queue can_queue number of commands in the worst
> case.

The issue I'm trying to highlight here is the following scenario:

1. We set shost->can_queue, then call scsi_add_host, which allocates the tag set.

2. On our NPIV login response from the VIOS, we might get a lower value than we
initially set in shost->can_queue, so we update it, but nobody ever looks at it
again, and we don't have any protection against sending too many commands to the host.


Basically, we no longer have any code that ensures we don't send more
commands to the VIOS than we are told it supports. According to the architecture,
if we actually do this, the VIOS will do an h_free_crq, which would be a bit
of a bug on our part.

I don't think it was ever clearly defined in the API that a driver can
change shost->can_queue after calling scsi_add_host, but up until
commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
it doesn't. 

I started looking through drivers that do this, and so far, it looks like the
following drivers do: ibmvfc, lpfc, aix94xx, libfc, BusLogic, and likely others...

We probably need an API that lets us change shost->can_queue dynamically.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center


^ permalink raw reply

* Re: [PATCH for 5.10] powerpc/32s: Fix RTAS machine check with VMAP stack
From: Sasha Levin @ 2021-01-13 16:50 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev, linux-kernel, stable
In-Reply-To: <790e46767a5f10ae991969b064682c8c82f96bc3.1610519852.git.christophe.leroy@csgroup.eu>

On Wed, Jan 13, 2021 at 06:40:20AM +0000, Christophe Leroy wrote:
>This is backport for 5.10
>
>(cherry picked from commit 98bf2d3f4970179c702ef64db658e0553bc6ef3a)
>
>When we have VMAP stack, exception prolog 1 sets r1, not r11.
>
>When it is not an RTAS machine check, don't trash r1 because it is
>needed by prolog 1.
>
>Fixes: da7bb43ab9da ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU")
>Cc: stable@vger.kernel.org # v5.10+
>Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>[mpe: Squash in fixup for RTAS machine check from Christophe]
>Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>Link: https://lore.kernel.org/r/bc77d61d1c18940e456a2dee464f1e2eda65a3f0.1608621048.git.christophe.leroy@csgroup.eu

Queued up, thanks!

-- 
Thanks,
Sasha

^ permalink raw reply

* Re: [PATCH for 4.9/4.14] powerpc: Fix incorrect stw{, ux, u, x} instructions in __set_pte_at
From: Sasha Levin @ 2021-01-13 16:48 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev, linux-kernel, stable
In-Reply-To: <af07f8f0bb734bc1f906c1f79219f81c13c6ad2c.1610521804.git.christophe.leroy@csgroup.eu>

On Wed, Jan 13, 2021 at 07:12:44AM +0000, Christophe Leroy wrote:
>From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>
>Backport for 4.9 and 4.14
>
>(cherry picked from commit d85be8a49e733dcd23674aa6202870d54bf5600d)
>
>The placeholder for instruction selection should use the second
>argument's operand, which is %1, not %0. This could generate incorrect
>assembly code if the memory addressing of operand %0 is a different
>form from that of operand %1.
>
>Also remove the %Un placeholder because having %Un placeholders
>for two operands which are based on the same local var (ptep) doesn't
>make much sense. By the way, it doesn't change the current behaviour
>because "<>" constraint is missing for the associated "=m".
>
>[chleroy: revised commit log iaw segher's comments and removed %U0]
>
>Fixes: 9bf2b5cdc5fe ("powerpc: Fixes for CONFIG_PTE_64BIT for SMP support")
>Cc: <stable@vger.kernel.org> # v2.6.28+
>Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
>Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>Link: https://lore.kernel.org/r/96354bd77977a6a933fe9020da57629007fdb920.1603358942.git.christophe.leroy@csgroup.eu

I took this and the 4.4 backport, thanks!

-- 
Thanks,
Sasha

^ permalink raw reply

* [PATCH 6/7] net: ethernet: toshiba: ps3_gelic_net: Fix some kernel-doc misdemeanours
From: Lee Jones @ 2021-01-13 16:41 UTC (permalink / raw)
  To: lee.jones
  Cc: Andrew Lunn, Geoff Levand, linux-kernel, Jens Osterkamp, netdev,
	Paul Mackerras, Utz Bacher, Jakub Kicinski, linuxppc-dev,
	David S. Miller
In-Reply-To: <20210113164123.1334116-1-lee.jones@linaro.org>

Fixes the following W=1 kernel build warning(s):

 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1107: warning: Function parameter or member 'irq' not described in 'gelic_card_interrupt'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1107: warning: Function parameter or member 'ptr' not described in 'gelic_card_interrupt'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1407: warning: Function parameter or member 'txqueue' not described in 'gelic_net_tx_timeout'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1439: warning: Function parameter or member 'napi' not described in 'gelic_ether_setup_netdev_ops'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1639: warning: Function parameter or member 'dev' not described in 'ps3_gelic_driver_probe'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1795: warning: Function parameter or member 'dev' not described in 'ps3_gelic_driver_remove'

Cc: Geoff Levand <geoff@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Utz Bacher <utz.bacher@de.ibm.com>
Cc: Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 drivers/net/ethernet/toshiba/ps3_gelic_net.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
index 3d1fc8d2ca667..55e652624bd76 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
@@ -1100,7 +1100,7 @@ static int gelic_net_poll(struct napi_struct *napi, int budget)
 	return packets_done;
 }
 
-/**
+/*
  * gelic_card_interrupt - event handler for gelic_net
  */
 static irqreturn_t gelic_card_interrupt(int irq, void *ptr)
@@ -1400,6 +1400,7 @@ static void gelic_net_tx_timeout_task(struct work_struct *work)
 /**
  * gelic_net_tx_timeout - called when the tx timeout watchdog kicks in.
  * @netdev: interface device structure
+ * @txqueue: unused
  *
  * called, if tx hangs. Schedules a task that resets the interface
  */
@@ -1431,6 +1432,7 @@ static const struct net_device_ops gelic_netdevice_ops = {
 /**
  * gelic_ether_setup_netdev_ops - initialization of net_device operations
  * @netdev: net_device structure
+ * @napi: napi structure
  *
  * fills out function pointers in the net_device structure
  */
@@ -1632,7 +1634,7 @@ static void gelic_card_get_vlan_info(struct gelic_card *card)
 	dev_info(ctodev(card), "internal vlan %s\n",
 		 card->vlan_required? "enabled" : "disabled");
 }
-/**
+/*
  * ps3_gelic_driver_probe - add a device to the control of this driver
  */
 static int ps3_gelic_driver_probe(struct ps3_system_bus_device *dev)
@@ -1787,7 +1789,7 @@ static int ps3_gelic_driver_probe(struct ps3_system_bus_device *dev)
 	return result;
 }
 
-/**
+/*
  * ps3_gelic_driver_remove - remove a device from the control of this driver
  */
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 0/7] Rid W=1 warnings in Ethernet
From: Lee Jones @ 2021-01-13 16:41 UTC (permalink / raw)
  To: lee.jones
  Cc: Paul Durrant, Kurt Kanzenbach, Alexei Starovoitov,
	Gustavo A. R. Silva, Peter Cammaert, Paul Mackerras,
	Sukadev Bhattiprolu, Wei Liu, Daniel Borkmann, John Fastabend,
	Santiago Leon, Jakub Kicinski, Grygorii Strashko, Thomas Falcon,
	Jesper Dangaard Brouer, Jens Osterkamp, Rusty Russell,
	Daris A Nevil, Lijun Pan, xen-devel, Ivan Khoronzhuk,
	Nicolas Pitre, Geoff Levand, netdev, linux-kernel, Erik Stahlman,
	John Allen, Utz Bacher, Dany Madden, bpf, linuxppc-dev,
	David S. Miller, Russell King

Resending the stragglers again.                                                                                  

This set is part of a larger effort attempting to clean-up W=1                                                   
kernel builds, which are currently overwhelmingly riddled with                                                   
niggly little warnings.                                                                                          
                                                                                                                 
v2:                                                                                                              
 - Squashed IBM patches                                                                                      
 - Fixed real issue in SMSC
 - Added Andrew's Reviewed-by tags on remainder

Lee Jones (7):
  net: ethernet: smsc: smc91x: Fix function name in kernel-doc header
  net: xen-netback: xenbus: Demote nonconformant kernel-doc headers
  net: ethernet: ti: am65-cpsw-qos: Demote non-conformant function
    header
  net: ethernet: ti: am65-cpts: Document am65_cpts_rx_enable()'s 'en'
    parameter
  net: ethernet: ibm: ibmvnic: Fix some kernel-doc misdemeanours
  net: ethernet: toshiba: ps3_gelic_net: Fix some kernel-doc
    misdemeanours
  net: ethernet: toshiba: spider_net: Document a whole bunch of function
    parameters

 drivers/net/ethernet/ibm/ibmvnic.c           | 27 ++++++++++----------
 drivers/net/ethernet/smsc/smc91x.c           |  2 +-
 drivers/net/ethernet/ti/am65-cpsw-qos.c      |  2 +-
 drivers/net/ethernet/ti/am65-cpts.c          |  2 +-
 drivers/net/ethernet/toshiba/ps3_gelic_net.c |  8 +++---
 drivers/net/ethernet/toshiba/spider_net.c    | 18 ++++++++-----
 drivers/net/xen-netback/xenbus.c             |  4 +--
 drivers/net/xen-netfront.c                   |  6 ++---
 8 files changed, 37 insertions(+), 32 deletions(-)

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: bpf@vger.kernel.org
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dany Madden <drt@linux.ibm.com>
Cc: Daris A Nevil <dnevil@snmc.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Erik Stahlman <erik@vt.edu>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Ishizaki Kou <kou.ishizaki@toshiba.co.jp>
Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kurt Kanzenbach <kurt@linutronix.de>
Cc: Lijun Pan <ljp@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: netdev@vger.kernel.org
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Paul Durrant <paul@xen.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Cammaert <pc@denkart.be>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Santiago Leon <santi_leon@yahoo.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Cc: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Cc: Utz Bacher <utz.bacher@de.ibm.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: xen-devel@lists.xenproject.org
-- 
2.25.1


^ permalink raw reply

* [PATCH 5/7] net: ethernet: ibm: ibmvnic: Fix some kernel-doc misdemeanours
From: Lee Jones @ 2021-01-13 16:41 UTC (permalink / raw)
  To: lee.jones
  Cc: Thomas Falcon, John Allen, linux-kernel, Santiago Leon,
	Jakub Kicinski, netdev, Lijun Pan, Dany Madden, Paul Mackerras,
	Sukadev Bhattiprolu, linuxppc-dev, David S. Miller
In-Reply-To: <20210113164123.1334116-1-lee.jones@linaro.org>

Fixes the following W=1 kernel build warning(s):

 from drivers/net/ethernet/ibm/ibmvnic.c:35:
 inlined from ‘handle_vpd_rsp’ at drivers/net/ethernet/ibm/ibmvnic.c:4124:3:
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_field' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'skb' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_len' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_data' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_field' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_data' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'len' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_len' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'scrq_arr' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'txbuff' not described in 'build_hdr_descs_arr'
 drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'num_entries' not described in 'build_hdr_descs_arr'
 drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'hdr_field' not described in 'build_hdr_descs_arr'
 drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'adapter' not described in 'do_change_param_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'rwi' not described in 'do_change_param_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'reset_state' not described in 'do_change_param_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'adapter' not described in 'do_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'rwi' not described in 'do_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'reset_state' not described in 'do_reset'

Cc: Dany Madden <drt@linux.ibm.com>
Cc: Lijun Pan <ljp@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Santiago Leon <santi_leon@yahoo.com>
Cc: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 27 +++++++++++++--------------
 drivers/net/xen-netfront.c         |  6 +++---
 2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index aed985e08e8ad..4c4252e68de5a 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1384,10 +1384,10 @@ static int ibmvnic_close(struct net_device *netdev)
 
 /**
  * build_hdr_data - creates L2/L3/L4 header data buffer
- * @hdr_field - bitfield determining needed headers
- * @skb - socket buffer
- * @hdr_len - array of header lengths
- * @tot_len - total length of data
+ * @hdr_field: bitfield determining needed headers
+ * @skb: socket buffer
+ * @hdr_len: array of header lengths
+ * @hdr_data: buffer to write the header to
  *
  * Reads hdr_field to determine which headers are needed by firmware.
  * Builds a buffer containing these headers.  Saves individual header
@@ -1444,11 +1444,11 @@ static int build_hdr_data(u8 hdr_field, struct sk_buff *skb,
 
 /**
  * create_hdr_descs - create header and header extension descriptors
- * @hdr_field - bitfield determining needed headers
- * @data - buffer containing header data
- * @len - length of data buffer
- * @hdr_len - array of individual header lengths
- * @scrq_arr - descriptor array
+ * @hdr_field: bitfield determining needed headers
+ * @hdr_data: buffer containing header data
+ * @len: length of data buffer
+ * @hdr_len: array of individual header lengths
+ * @scrq_arr: descriptor array
  *
  * Creates header and, if needed, header extension descriptors and
  * places them in a descriptor array, scrq_arr
@@ -1496,10 +1496,9 @@ static int create_hdr_descs(u8 hdr_field, u8 *hdr_data, int len, int *hdr_len,
 
 /**
  * build_hdr_descs_arr - build a header descriptor array
- * @skb - socket buffer
- * @num_entries - number of descriptors to be sent
- * @subcrq - first TX descriptor
- * @hdr_field - bit field determining which headers will be sent
+ * @txbuff: tx buffer
+ * @num_entries: number of descriptors to be sent
+ * @hdr_field: bit field determining which headers will be sent
  *
  * This function will build a TX descriptor array with applicable
  * L2/L3/L4 packet header descriptors to be sent by send_subcrq_indirect.
@@ -1925,7 +1924,7 @@ static int ibmvnic_set_mac(struct net_device *netdev, void *p)
 	return rc;
 }
 
-/**
+/*
  * do_reset returns zero if we are able to keep processing reset events, or
  * non-zero if we hit a fatal error and must halt.
  */
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index c20b78120bb42..6ef2adbd283a3 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1580,7 +1580,7 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	return ERR_PTR(err);
 }
 
-/**
+/*
  * Entry point to this code when a new device is created.  Allocate the basic
  * structures and the ring buffers for communication with the backend, and
  * inform the backend of the appropriate details for those.
@@ -1657,7 +1657,7 @@ static void xennet_disconnect_backend(struct netfront_info *info)
 	}
 }
 
-/**
+/*
  * We are reconnecting to the backend, due to a suspend/resume, or a backend
  * driver restart.  We tear down our netif structure and recreate it, but
  * leave the device-layer structures intact so that this is transparent to the
@@ -2303,7 +2303,7 @@ static int xennet_connect(struct net_device *dev)
 	return 0;
 }
 
-/**
+/*
  * Callback received when the backend's state changes.
  */
 static void netback_changed(struct xenbus_device *dev,
-- 
2.25.1


^ permalink raw reply related

* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Robin Murphy @ 2021-01-13 15:27 UTC (permalink / raw)
  To: Nicolas Saenz Julienne, Florian Fainelli, Claire Chang, robh+dt,
	mpe, benh, paulus, joro, will, frowand.list, konrad.wilk,
	boris.ostrovsky, jgross, sstabellini, hch, m.szyprowski
  Cc: devicetree, heikki.krogerus, grant.likely, saravanak, peterz,
	xypron.glpk, rafael.j.wysocki, linux-kernel, treding,
	bgolaszewski, iommu, drinkcat, rdunlap, gregkh, xen-devel,
	dan.j.williams, andriy.shevchenko, linuxppc-dev, mingo
In-Reply-To: <7ed51025f051f65f3dfe10a88caeb648821994b1.camel@suse.de>

On 2021-01-13 13:59, Nicolas Saenz Julienne wrote:
> Hi All,
> 
> On Tue, 2021-01-12 at 16:03 -0800, Florian Fainelli wrote:
>> On 1/5/21 7:41 PM, Claire Chang wrote:
>>> Add the initialization function to create restricted DMA pools from
>>> matching reserved-memory nodes in the device tree.
>>>
>>> Signed-off-by: Claire Chang <tientzu@chromium.org>
>>> ---
>>>   include/linux/device.h  |   4 ++
>>>   include/linux/swiotlb.h |   7 +-
>>>   kernel/dma/Kconfig      |   1 +
>>>   kernel/dma/swiotlb.c    | 144 ++++++++++++++++++++++++++++++++++------
>>>   4 files changed, 131 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>> index 89bb8b84173e..ca6f71ec8871 100644
>>> --- a/include/linux/device.h
>>> +++ b/include/linux/device.h
>>> @@ -413,6 +413,7 @@ struct dev_links_info {
>>>    * @dma_pools:	Dma pools (if dma'ble device).
>>>    * @dma_mem:	Internal for coherent mem override.
>>>    * @cma_area:	Contiguous memory area for dma allocations
>>> + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
>>>    * @archdata:	For arch-specific additions.
>>>    * @of_node:	Associated device tree node.
>>>    * @fwnode:	Associated device node supplied by platform firmware.
>>> @@ -515,6 +516,9 @@ struct device {
>>>   #ifdef CONFIG_DMA_CMA
>>>   	struct cma *cma_area;		/* contiguous memory area for dma
>>>   					   allocations */
>>> +#endif
>>> +#ifdef CONFIG_SWIOTLB
>>> +	struct io_tlb_mem	*dma_io_tlb_mem;
>>>   #endif
>>>   	/* arch specific additions */
>>>   	struct dev_archdata	archdata;
>>> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>>> index dd8eb57cbb8f..a1bbd7788885 100644
>>> --- a/include/linux/swiotlb.h
>>> +++ b/include/linux/swiotlb.h
>>> @@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
>>>    *
>>>    * @start:	The start address of the swiotlb memory pool. Used to do a quick
>>>    *		range check to see if the memory was in fact allocated by this
>>> - *		API.
>>> + *		API. For restricted DMA pool, this is device tree adjustable.
>>
>> Maybe write it as this is "firmware adjustable" such that when/if ACPI
>> needs something like this, the description does not need updating.

TBH I really don't think this needs calling out at all. Even in the 
regular case, the details of exactly how and where the pool is allocated 
are beyond the scope of this code - architectures already have several 
ways to control that and make their own decisions.

>>
>> [snip]
>>
>>> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>>> +				    struct device *dev)
>>> +{
>>> +	struct io_tlb_mem *mem = rmem->priv;
>>> +	int ret;
>>> +
>>> +	if (dev->dma_io_tlb_mem)
>>> +		return -EBUSY;
>>> +
>>> +	if (!mem) {
>>> +		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>> +		if (!mem)
>>> +			return -ENOMEM;
>>> +
>>> +		if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {
>>
>> MEMREMAP_WB sounds appropriate as a default.
> 
> As per the binding 'no-map' has to be disabled here. So AFAIU, this memory will
> be part of the linear mapping. Is this really needed then?

More than that, I'd assume that we *have* to use the linear/direct map 
address rather than anything that has any possibility of being a vmalloc 
remap, otherwise we can no longer safely rely on 
phys_to_dma/dma_to_phys, no?

That said, given that we're not actually using the returned address, I'm 
not entirely sure what the point of this call is anyway. If we can 
assume it's always going to return the linear map address via 
try_ram_remap() then we can equally just go ahead and use the linear map 
address straight away. I don't really see how we could ever hit the 
"is_ram == REGION_MIXED" case in memremap() that would return NULL, if 
we passed the memblock check earlier in __reserved_mem_alloc_size() such 
that this rmem node ever got to be initialised at all.

Robin.

>> Documentation/devicetree/bindings/reserved-memory/ramoops.txt does
>> define an "unbuffered" property which in premise could be applied to the
>> generic reserved memory binding as well and that we may have to be
>> honoring here, if we were to make it more generic. Oh well, this does
>> not need to be addressed right now I guess.
> 
> 
> 

^ permalink raw reply

* Re: [PATCH v5 18/21] powerpc: move NMI entry/exit code into wrapper
From: Christophe Leroy @ 2021-01-13 15:13 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20210113073215.516986-19-npiggin@gmail.com>



Le 13/01/2021 à 08:32, Nicholas Piggin a écrit :
> This moves the common NMI entry and exit code into the interrupt handler
> wrappers.
> 
> This changes the behaviour of soft-NMI (watchdog) and HMI interrupts, and
> also MCE interrupts on 64e, by adding missing parts of the NMI entry to
> them.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/include/asm/interrupt.h | 24 ++++++++++++++++
>   arch/powerpc/kernel/mce.c            | 11 --------
>   arch/powerpc/kernel/traps.c          | 42 +++++-----------------------
>   arch/powerpc/kernel/watchdog.c       | 10 +++----
>   4 files changed, 35 insertions(+), 52 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
> index e278dffe7657..01192e213f9a 100644
> --- a/arch/powerpc/include/asm/interrupt.h
> +++ b/arch/powerpc/include/asm/interrupt.h
> @@ -95,14 +95,38 @@ static inline void interrupt_async_exit_prepare(struct pt_regs *regs, struct int
>   }
>   
>   struct interrupt_nmi_state {
> +#ifdef CONFIG_PPC64
> +	u8 ftrace_enabled;
> +#endif
>   };
>   
>   static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
>   {
> +#ifdef CONFIG_PPC64
> +	state->ftrace_enabled = this_cpu_get_ftrace_enabled();
> +	this_cpu_set_ftrace_enabled(0);
> +#endif
> +
> +	/*
> +	 * Do not use nmi_enter() for pseries hash guest taking a real-mode
> +	 * NMI because not everything it touches is within the RMA limit.
> +	 */
> +	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
> +			!firmware_has_feature(FW_FEATURE_LPAR) ||
> +			radix_enabled() || (mfmsr() & MSR_DR))
> +		nmi_enter();
>   }
>   
>   static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
>   {
> +	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
> +			!firmware_has_feature(FW_FEATURE_LPAR) ||
> +			radix_enabled() || (mfmsr() & MSR_DR))
> +		nmi_exit();
> +
> +#ifdef CONFIG_PPC64
> +	this_cpu_set_ftrace_enabled(state->ftrace_enabled);
> +#endif
>   }
>   
>   /**
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index 54269947113d..51456217ec40 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -592,12 +592,6 @@ EXPORT_SYMBOL_GPL(machine_check_print_event_info);
>   DEFINE_INTERRUPT_HANDLER_NMI(machine_check_early)
>   {
>   	long handled = 0;
> -	u8 ftrace_enabled = this_cpu_get_ftrace_enabled();
> -
> -	this_cpu_set_ftrace_enabled(0);
> -	/* Do not use nmi_enter/exit for pseries hpte guest */
> -	if (radix_enabled() || !firmware_has_feature(FW_FEATURE_LPAR))
> -		nmi_enter();
>   
>   	hv_nmi_check_nonrecoverable(regs);
>   
> @@ -607,11 +601,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(machine_check_early)
>   	if (ppc_md.machine_check_early)
>   		handled = ppc_md.machine_check_early(regs);
>   
> -	if (radix_enabled() || !firmware_has_feature(FW_FEATURE_LPAR))
> -		nmi_exit();
> -
> -	this_cpu_set_ftrace_enabled(ftrace_enabled);
> -
>   	return handled;
>   }
>   
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index b4f23e871a68..43d23232ef5c 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -435,11 +435,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(system_reset_exception)
>   {
>   	unsigned long hsrr0, hsrr1;
>   	bool saved_hsrrs = false;
> -	u8 ftrace_enabled = this_cpu_get_ftrace_enabled();
> -
> -	this_cpu_set_ftrace_enabled(0);
> -
> -	nmi_enter();
>   
>   	/*
>   	 * System reset can interrupt code where HSRRs are live and MSR[RI]=1.
> @@ -511,10 +506,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(system_reset_exception)
>   		mtspr(SPRN_HSRR1, hsrr1);
>   	}
>   
> -	nmi_exit();
> -
> -	this_cpu_set_ftrace_enabled(ftrace_enabled);
> -
>   	/* What should we do here? We could issue a shutdown or hard reset. */
>   
>   	return 0;
> @@ -792,6 +783,12 @@ int machine_check_generic(struct pt_regs *regs)
>   #endif /* everything else */
>   
>   
> +/*
> + * BOOK3S_64 does not call this handler as a non-maskable interrupt
> + * (it uses its own early real-mode handler to handle the MCE proper
> + * and then raises irq_work to call this handler when interrupts are
> + * enabled).
> + */
>   #ifdef CONFIG_PPC_BOOK3S_64
>   DEFINE_INTERRUPT_HANDLER_ASYNC(machine_check_exception)
>   #else
> @@ -800,20 +797,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(machine_check_exception)
>   {
>   	int recover = 0;
>   
> -	/*
> -	 * BOOK3S_64 does not call this handler as a non-maskable interrupt
> -	 * (it uses its own early real-mode handler to handle the MCE proper
> -	 * and then raises irq_work to call this handler when interrupts are
> -	 * enabled).
> -	 *
> -	 * This is silly. The BOOK3S_64 should just call a different function
> -	 * rather than expecting semantics to magically change. Something
> -	 * like 'non_nmi_machine_check_exception()', perhaps?
> -	 */
> -	const bool nmi = !IS_ENABLED(CONFIG_PPC_BOOK3S_64);
> -
> -	if (nmi) nmi_enter();
> -
>   	__this_cpu_inc(irq_stat.mce_exceptions);
>   
>   	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
> @@ -838,24 +821,17 @@ DEFINE_INTERRUPT_HANDLER_NMI(machine_check_exception)
>   	if (check_io_access(regs))
>   		goto bail;
>   
> -	if (nmi) nmi_exit();
> -

IIRC, not doing the nmi_exit() before the die() is problematic.

See 
https://github.com/linuxppc/linux/commit/daf00ae71dad8aa05965713c62558aeebf2df48e#diff-70077148c383252ca949063eaf1b0250620e4607b43f4ef3fd2d8f448a83ab0a

>   	die("Machine check", regs, SIGBUS);
>   
>   	/* Must die if the interrupt is not recoverable */
>   	if (!(regs->msr & MSR_RI))
>   		die("Unrecoverable Machine check", regs, SIGBUS);
>   
> -#ifdef CONFIG_PPC_BOOK3S_64
>   bail:
> +#ifdef CONFIG_PPC_BOOK3S_64
>   	return;
>   #else
>   	return 0;
> -
> -bail:
> -	if (nmi) nmi_exit();
> -
> -	return 0;
>   #endif
>   }
>   NOKPROBE_SYMBOL(machine_check_exception);
> @@ -1873,14 +1849,10 @@ DEFINE_INTERRUPT_HANDLER(vsx_unavailable_tm)
>   #ifdef CONFIG_PPC64
>   DEFINE_INTERRUPT_HANDLER_NMI(performance_monitor_exception_nmi)
>   {
> -	nmi_enter();
> -
>   	__this_cpu_inc(irq_stat.pmu_irqs);
>   
>   	perf_irq(regs);
>   
> -	nmi_exit();
> -
>   	return 0;
>   }
>   #endif
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index 824b9376ac35..dc39534836a3 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -254,11 +254,12 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>   	int cpu = raw_smp_processor_id();
>   	u64 tb;
>   
> +	/* should only arrive from kernel, with irqs disabled */
> +	WARN_ON_ONCE(!arch_irq_disabled_regs(regs));
> +
>   	if (!cpumask_test_cpu(cpu, &wd_cpus_enabled))
>   		return 0;
>   
> -	nmi_enter();
> -
>   	__this_cpu_inc(irq_stat.soft_nmi_irqs);
>   
>   	tb = get_tb();
> @@ -266,7 +267,7 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>   		wd_smp_lock(&flags);
>   		if (cpumask_test_cpu(cpu, &wd_smp_cpus_stuck)) {
>   			wd_smp_unlock(&flags);
> -			goto out;
> +			return 0;
>   		}
>   		set_cpu_stuck(cpu, tb);
>   
> @@ -290,9 +291,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>   	if (wd_panic_timeout_tb < 0x7fffffff)
>   		mtspr(SPRN_DEC, wd_panic_timeout_tb);
>   
> -out:
> -	nmi_exit();
> -
>   	return 0;
>   }
>   
> 

^ permalink raw reply

* Re: [PATCH v5 17/21] powerpc/64: entry cpu time accounting in C
From: Christophe Leroy @ 2021-01-13 15:05 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20210113073215.516986-18-npiggin@gmail.com>



Le 13/01/2021 à 08:32, Nicholas Piggin a écrit :
> There is no need for this to be in asm, use the new intrrupt entry wrapper.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/include/asm/interrupt.h |  7 +++++++
>   arch/powerpc/include/asm/ppc_asm.h   | 24 ------------------------
>   arch/powerpc/kernel/exceptions-64e.S |  1 -
>   arch/powerpc/kernel/exceptions-64s.S |  5 -----
>   4 files changed, 7 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
> index 6eba7c489753..e278dffe7657 100644
> --- a/arch/powerpc/include/asm/interrupt.h
> +++ b/arch/powerpc/include/asm/interrupt.h
> @@ -4,6 +4,7 @@
>   
>   #include <linux/context_tracking.h>
>   #include <linux/hardirq.h>
> +#include <asm/cputime.h>
>   #include <asm/ftrace.h>
>   
>   struct interrupt_state {
> @@ -25,6 +26,9 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrup
>   	if (user_mode(regs)) {
>   		CT_WARN_ON(ct_state() != CONTEXT_USER);
>   		user_exit_irqoff();
> +
> +		account_cpu_user_entry();

Are interrupts still disabled here ? Otherwise you risk getting IRQ time accounted on user.

> +		account_stolen_time();
>   	} else {
>   		/*
>   		 * CT_WARN_ON comes here via program_check_exception,
> @@ -38,6 +42,9 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrup
>   #ifdef CONFIG_PPC_BOOK3E_64
>   	state->ctx_state = exception_enter();
>   #endif
> +
> +	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) && user_mode(regs))
> +		account_cpu_user_entry();

Isn't this interrupt_enter_prepare() function called also on PPC32 ?
Have you removed the ACCOUNT_CPU_USER_ENTRY() from entry_32.S ?

>   }
>   
>   /*
> diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
> index cc1bca571332..3dceb64fc9af 100644
> --- a/arch/powerpc/include/asm/ppc_asm.h
> +++ b/arch/powerpc/include/asm/ppc_asm.h
> @@ -25,7 +25,6 @@
>   #ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
>   #define ACCOUNT_CPU_USER_ENTRY(ptr, ra, rb)
>   #define ACCOUNT_CPU_USER_EXIT(ptr, ra, rb)
> -#define ACCOUNT_STOLEN_TIME
>   #else
>   #define ACCOUNT_CPU_USER_ENTRY(ptr, ra, rb)				\
>   	MFTB(ra);			/* get timebase */		\
> @@ -44,29 +43,6 @@
>   	PPC_LL	ra, ACCOUNT_SYSTEM_TIME(ptr);				\
>   	add	ra,ra,rb;		/* add on to system time */	\
>   	PPC_STL	ra, ACCOUNT_SYSTEM_TIME(ptr)
> -
> -#ifdef CONFIG_PPC_SPLPAR
> -#define ACCOUNT_STOLEN_TIME						\
> -BEGIN_FW_FTR_SECTION;							\
> -	beq	33f;							\
> -	/* from user - see if there are any DTL entries to process */	\
> -	ld	r10,PACALPPACAPTR(r13);	/* get ptr to VPA */		\
> -	ld	r11,PACA_DTL_RIDX(r13);	/* get log read index */	\
> -	addi	r10,r10,LPPACA_DTLIDX;					\
> -	LDX_BE	r10,0,r10;		/* get log write index */	\
> -	cmpd	cr1,r11,r10;						\
> -	beq+	cr1,33f;						\
> -	bl	accumulate_stolen_time;				\
> -	ld	r12,_MSR(r1);						\
> -	andi.	r10,r12,MSR_PR;		/* Restore cr0 (coming from user) */ \
> -33:									\
> -END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
> -
> -#else  /* CONFIG_PPC_SPLPAR */
> -#define ACCOUNT_STOLEN_TIME
> -
> -#endif /* CONFIG_PPC_SPLPAR */
> -
>   #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
>   
>   /*
> diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
> index 52421042a020..87b3e74ded41 100644
> --- a/arch/powerpc/kernel/exceptions-64e.S
> +++ b/arch/powerpc/kernel/exceptions-64e.S
> @@ -398,7 +398,6 @@ exc_##n##_common:							    \
>   	std	r10,_NIP(r1);		/* save SRR0 to stackframe */	    \
>   	std	r11,_MSR(r1);		/* save SRR1 to stackframe */	    \
>   	beq	2f;			/* if from kernel mode */	    \
> -	ACCOUNT_CPU_USER_ENTRY(r13,r10,r11);/* accounting (uses cr0+eq) */  \
>   2:	ld	r3,excf+EX_R10(r13);	/* get back r10 */		    \
>   	ld	r4,excf+EX_R11(r13);	/* get back r11 */		    \
>   	mfspr	r5,scratch;		/* get back r13 */		    \
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index df4ee073386b..68505e35bcf7 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -577,7 +577,6 @@ DEFINE_FIXED_SYMBOL(\name\()_common_real)
>   	kuap_save_amr_and_lock r9, r10, cr1, cr0
>   	.endif
>   	beq	101f			/* if from kernel mode		*/
> -	ACCOUNT_CPU_USER_ENTRY(r13, r9, r10)
>   BEGIN_FTR_SECTION
>   	ld	r9,IAREA+EX_PPR(r13)	/* Read PPR from paca		*/
>   	std	r9,_PPR(r1)
> @@ -645,10 +644,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
>   	ld	r11,exception_marker@toc(r2)
>   	std	r10,RESULT(r1)		/* clear regs->result		*/
>   	std	r11,STACK_FRAME_OVERHEAD-16(r1) /* mark the frame	*/
> -
> -	.if ISTACK
> -	ACCOUNT_STOLEN_TIME
> -	.endif
>   .endm
>   
>   /*
> 

^ permalink raw reply

* Re: [PATCH v5 16/21] powerpc/64: move account_stolen_time into its own function
From: Christophe Leroy @ 2021-01-13 14:59 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20210113073215.516986-17-npiggin@gmail.com>



Le 13/01/2021 à 08:32, Nicholas Piggin a écrit :
> This will be used by interrupt entry as well.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/include/asm/cputime.h | 15 +++++++++++++++
>   arch/powerpc/kernel/syscall_64.c   | 10 +---------
>   2 files changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
> index ed75d1c318e3..3f61604e1fcf 100644
> --- a/arch/powerpc/include/asm/cputime.h
> +++ b/arch/powerpc/include/asm/cputime.h
> @@ -87,6 +87,18 @@ static notrace inline void account_cpu_user_exit(void)
>   	acct->starttime_user = tb;
>   }
>   
> +static notrace inline void account_stolen_time(void)
> +{
> +#ifdef CONFIG_PPC_SPLPAR
> +	if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) &&

Arent' you already inside a CONFIG_VIRT_CPU_ACCOUNTING_NATIVE section ?

> +	    firmware_has_feature(FW_FEATURE_SPLPAR)) {
> +		struct lppaca *lp = local_paca->lppaca_ptr;
> +
> +		if (unlikely(local_paca->dtl_ridx != be64_to_cpu(lp->dtl_idx)))
> +			accumulate_stolen_time();
> +	}
> +#endif
> +}
>   
>   #endif /* __KERNEL__ */
>   #else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
> @@ -96,5 +108,8 @@ static inline void account_cpu_user_entry(void)
>   static inline void account_cpu_user_exit(void)
>   {
>   }
> +static notrace inline void account_stolen_time(void)
> +{
> +}
>   #endif /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
>   #endif /* __POWERPC_CPUTIME_H */
> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> index 42f0ad4b2fbb..32f72965da26 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -69,15 +69,7 @@ notrace long system_call_exception(long r3, long r4, long r5,
>   
>   	account_cpu_user_entry();
>   
> -#ifdef CONFIG_PPC_SPLPAR
> -	if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) &&
> -	    firmware_has_feature(FW_FEATURE_SPLPAR)) {
> -		struct lppaca *lp = local_paca->lppaca_ptr;
> -
> -		if (unlikely(local_paca->dtl_ridx != be64_to_cpu(lp->dtl_idx)))
> -			accumulate_stolen_time();
> -	}
> -#endif
> +	account_stolen_time();
>   
>   	/*
>   	 * This is not required for the syscall exit path, but makes the
> 

^ permalink raw reply

* Re: [PATCH v5 15/21] powerpc/64s: reconcile interrupts in C
From: Christophe Leroy @ 2021-01-13 14:54 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20210113073215.516986-16-npiggin@gmail.com>



Le 13/01/2021 à 08:32, Nicholas Piggin a écrit :
> There is no need for this to be in asm, use the new intrrupt entry wrapper.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/include/asm/interrupt.h | 15 +++++++++++----
>   arch/powerpc/kernel/exceptions-64s.S | 26 --------------------------
>   2 files changed, 11 insertions(+), 30 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
> index 34d7cca2cb2e..6eba7c489753 100644
> --- a/arch/powerpc/include/asm/interrupt.h
> +++ b/arch/powerpc/include/asm/interrupt.h
> @@ -14,11 +14,14 @@ struct interrupt_state {
>   
>   static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrupt_state *state)
>   {
> -#ifdef CONFIG_PPC_BOOK3E_64
> -	state->ctx_state = exception_enter();
> -#endif
> -

Can't the above stay on top of the function ?

> +	/*
> +	 * Book3E reconciles irq soft mask in asm
> +	 */
>   #ifdef CONFIG_PPC_BOOK3S_64
> +	if (irq_soft_mask_set_return(IRQS_ALL_DISABLED) == IRQS_ENABLED)
> +		trace_hardirqs_off();
> +	local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
> +
>   	if (user_mode(regs)) {
>   		CT_WARN_ON(ct_state() != CONTEXT_USER);
>   		user_exit_irqoff();
> @@ -31,6 +34,10 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrup
>   			CT_WARN_ON(ct_state() != CONTEXT_KERNEL);
>   	}
>   #endif
> +
> +#ifdef CONFIG_PPC_BOOK3E_64
> +	state->ctx_state = exception_enter();
> +#endif
>   }
>   
>   /*
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 8b0db807974c..df4ee073386b 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -139,7 +139,6 @@ name:
>   #define IKVM_VIRT	.L_IKVM_VIRT_\name\()	/* Virt entry tests KVM */
>   #define ISTACK		.L_ISTACK_\name\()	/* Set regular kernel stack */
>   #define __ISTACK(name)	.L_ISTACK_ ## name
> -#define IRECONCILE	.L_IRECONCILE_\name\()	/* Do RECONCILE_IRQ_STATE */
>   #define IKUAP		.L_IKUAP_\name\()	/* Do KUAP lock */
>   
>   #define INT_DEFINE_BEGIN(n)						\
> @@ -203,9 +202,6 @@ do_define_int n
>   	.ifndef ISTACK
>   		ISTACK=1
>   	.endif
> -	.ifndef IRECONCILE
> -		IRECONCILE=1
> -	.endif
>   	.ifndef IKUAP
>   		IKUAP=1
>   	.endif
> @@ -653,10 +649,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_CFAR)
>   	.if ISTACK
>   	ACCOUNT_STOLEN_TIME
>   	.endif
> -
> -	.if IRECONCILE
> -	RECONCILE_IRQ_STATE(r10, r11)
> -	.endif
>   .endm
>   
>   /*
> @@ -935,7 +927,6 @@ INT_DEFINE_BEGIN(system_reset)
>   	 */
>   	ISET_RI=0
>   	ISTACK=0
> -	IRECONCILE=0
>   	IKVM_REAL=1
>   INT_DEFINE_END(system_reset)
>   
> @@ -1123,7 +1114,6 @@ INT_DEFINE_BEGIN(machine_check_early)
>   	ISTACK=0
>   	IDAR=1
>   	IDSISR=1
> -	IRECONCILE=0
>   	IKUAP=0 /* We don't touch AMR here, we never go to virtual mode */
>   INT_DEFINE_END(machine_check_early)
>   
> @@ -1476,7 +1466,6 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
>   INT_DEFINE_BEGIN(data_access_slb)
>   	IVEC=0x380
>   	IAREA=PACA_EXSLB
> -	IRECONCILE=0
>   	IDAR=1
>   	IKVM_SKIP=1
>   	IKVM_REAL=1
> @@ -1503,7 +1492,6 @@ MMU_FTR_SECTION_ELSE
>   	li	r3,-EFAULT
>   ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
>   	std	r3,RESULT(r1)
> -	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	do_bad_slb_fault
>   	b	interrupt_return
> @@ -1565,7 +1553,6 @@ ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
>   INT_DEFINE_BEGIN(instruction_access_slb)
>   	IVEC=0x480
>   	IAREA=PACA_EXSLB
> -	IRECONCILE=0
>   	IISIDE=1
>   	IDAR=1
>   #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
> @@ -1594,7 +1581,6 @@ MMU_FTR_SECTION_ELSE
>   	li	r3,-EFAULT
>   ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
>   	std	r3,RESULT(r1)
> -	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	do_bad_slb_fault
>   	b	interrupt_return
> @@ -1754,7 +1740,6 @@ EXC_COMMON_BEGIN(program_check_common)
>    */
>   INT_DEFINE_BEGIN(fp_unavailable)
>   	IVEC=0x800
> -	IRECONCILE=0
>   #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>   	IKVM_REAL=1
>   #endif
> @@ -1769,7 +1754,6 @@ EXC_VIRT_END(fp_unavailable, 0x4800, 0x100)
>   EXC_COMMON_BEGIN(fp_unavailable_common)
>   	GEN_COMMON fp_unavailable
>   	bne	1f			/* if from user, just load it up */
> -	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	kernel_fp_unavailable_exception
>   0:	trap
> @@ -1788,7 +1772,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_TM)
>   	b	fast_interrupt_return
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   2:	/* User process was in a transaction */
> -	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	fp_unavailable_tm
>   	b	interrupt_return
> @@ -1853,7 +1836,6 @@ INT_DEFINE_BEGIN(hdecrementer)
>   	IVEC=0x980
>   	IHSRR=1
>   	ISTACK=0
> -	IRECONCILE=0
>   	IKVM_REAL=1
>   	IKVM_VIRT=1
>   INT_DEFINE_END(hdecrementer)
> @@ -2227,7 +2209,6 @@ INT_DEFINE_BEGIN(hmi_exception_early)
>   	IHSRR=1
>   	IREALMODE_COMMON=1
>   	ISTACK=0
> -	IRECONCILE=0
>   	IKUAP=0 /* We don't touch AMR here, we never go to virtual mode */
>   	IKVM_REAL=1
>   INT_DEFINE_END(hmi_exception_early)
> @@ -2401,7 +2382,6 @@ EXC_COMMON_BEGIN(performance_monitor_common)
>    */
>   INT_DEFINE_BEGIN(altivec_unavailable)
>   	IVEC=0xf20
> -	IRECONCILE=0
>   #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>   	IKVM_REAL=1
>   #endif
> @@ -2431,7 +2411,6 @@ BEGIN_FTR_SECTION
>   	b	fast_interrupt_return
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   2:	/* User process was in a transaction */
> -	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	altivec_unavailable_tm
>   	b	interrupt_return
> @@ -2439,7 +2418,6 @@ BEGIN_FTR_SECTION
>   1:
>   END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
>   #endif
> -	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	altivec_unavailable_exception
>   	b	interrupt_return
> @@ -2455,7 +2433,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ALTIVEC)
>    */
>   INT_DEFINE_BEGIN(vsx_unavailable)
>   	IVEC=0xf40
> -	IRECONCILE=0
>   #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>   	IKVM_REAL=1
>   #endif
> @@ -2484,7 +2461,6 @@ BEGIN_FTR_SECTION
>   	b	load_up_vsx
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   2:	/* User process was in a transaction */
> -	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	vsx_unavailable_tm
>   	b	interrupt_return
> @@ -2492,7 +2468,6 @@ BEGIN_FTR_SECTION
>   1:
>   END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>   #endif
> -	RECONCILE_IRQ_STATE(r10, r11)
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	bl	vsx_unavailable_exception
>   	b	interrupt_return
> @@ -2827,7 +2802,6 @@ EXC_VIRT_NONE(0x5800, 0x100)
>   INT_DEFINE_BEGIN(soft_nmi)
>   	IVEC=0x900
>   	ISTACK=0
> -	IRECONCILE=0	/* Soft-NMI may fire under local_irq_disable */
>   INT_DEFINE_END(soft_nmi)
>   
>   /*
> 

^ permalink raw reply

* Re: [PATCH v5 09/21] powerpc/64: context tracking remove _TIF_NOHZ
From: Christophe Leroy @ 2021-01-13 14:50 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20210113073215.516986-10-npiggin@gmail.com>



Le 13/01/2021 à 08:32, Nicholas Piggin a écrit :
> Add context tracking to the system call handler explicitly, and remove
> _TIF_NOHZ.
> 
> This saves 35 cycles on gettid system call cost on POWER9 with a
> CONFIG_NOHZ_FULL kernel.

35 cycles among 100 cycles, or among 5000 cycles ? I meant what pourcentage to you win ?

Christophe

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/Kconfig                   |  1 -
>   arch/powerpc/include/asm/thread_info.h |  4 +---
>   arch/powerpc/kernel/ptrace/ptrace.c    |  4 ----
>   arch/powerpc/kernel/signal.c           |  4 ----
>   arch/powerpc/kernel/syscall_64.c       | 10 ++++++++++
>   5 files changed, 11 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 107bb4319e0e..28d5a1b1510f 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -196,7 +196,6 @@ config PPC
>   	select HAVE_STACKPROTECTOR		if PPC64 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r13)
>   	select HAVE_STACKPROTECTOR		if PPC32 && $(cc-option,-mstack-protector-guard=tls -mstack-protector-guard-reg=r2)
>   	select HAVE_CONTEXT_TRACKING		if PPC64
> -	select HAVE_TIF_NOHZ			if PPC64
>   	select HAVE_DEBUG_KMEMLEAK
>   	select HAVE_DEBUG_STACKOVERFLOW
>   	select HAVE_DYNAMIC_FTRACE
> diff --git a/arch/powerpc/include/asm/thread_info.h b/arch/powerpc/include/asm/thread_info.h
> index 3d8a47af7a25..386d576673a1 100644
> --- a/arch/powerpc/include/asm/thread_info.h
> +++ b/arch/powerpc/include/asm/thread_info.h
> @@ -94,7 +94,6 @@ void arch_setup_new_exec(void);
>   #define TIF_PATCH_PENDING	6	/* pending live patching update */
>   #define TIF_SYSCALL_AUDIT	7	/* syscall auditing active */
>   #define TIF_SINGLESTEP		8	/* singlestepping active */
> -#define TIF_NOHZ		9	/* in adaptive nohz mode */
>   #define TIF_SECCOMP		10	/* secure computing */
>   #define TIF_RESTOREALL		11	/* Restore all regs (implies NOERROR) */
>   #define TIF_NOERROR		12	/* Force successful syscall return */
> @@ -128,11 +127,10 @@ void arch_setup_new_exec(void);
>   #define _TIF_UPROBE		(1<<TIF_UPROBE)
>   #define _TIF_SYSCALL_TRACEPOINT	(1<<TIF_SYSCALL_TRACEPOINT)
>   #define _TIF_EMULATE_STACK_STORE	(1<<TIF_EMULATE_STACK_STORE)
> -#define _TIF_NOHZ		(1<<TIF_NOHZ)
>   #define _TIF_SYSCALL_EMU	(1<<TIF_SYSCALL_EMU)
>   #define _TIF_SYSCALL_DOTRACE	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
>   				 _TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT | \
> -				 _TIF_NOHZ | _TIF_SYSCALL_EMU)
> +				 _TIF_SYSCALL_EMU)
>   
>   #define _TIF_USER_WORK_MASK	(_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
>   				 _TIF_NOTIFY_RESUME | _TIF_UPROBE | \
> diff --git a/arch/powerpc/kernel/ptrace/ptrace.c b/arch/powerpc/kernel/ptrace/ptrace.c
> index 3d44b73adb83..4f3d4ff3728c 100644
> --- a/arch/powerpc/kernel/ptrace/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace/ptrace.c
> @@ -262,8 +262,6 @@ long do_syscall_trace_enter(struct pt_regs *regs)
>   {
>   	u32 flags;
>   
> -	user_exit();
> -
>   	flags = READ_ONCE(current_thread_info()->flags) &
>   		(_TIF_SYSCALL_EMU | _TIF_SYSCALL_TRACE);
>   
> @@ -340,8 +338,6 @@ void do_syscall_trace_leave(struct pt_regs *regs)
>   	step = test_thread_flag(TIF_SINGLESTEP);
>   	if (step || test_thread_flag(TIF_SYSCALL_TRACE))
>   		tracehook_report_syscall_exit(regs, step);
> -
> -	user_enter();
>   }
>   
>   void __init pt_regs_check(void);
> diff --git a/arch/powerpc/kernel/signal.c b/arch/powerpc/kernel/signal.c
> index 53782aa60ade..9ded046edb0e 100644
> --- a/arch/powerpc/kernel/signal.c
> +++ b/arch/powerpc/kernel/signal.c
> @@ -282,8 +282,6 @@ static void do_signal(struct task_struct *tsk)
>   
>   void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
>   {
> -	user_exit();
> -
>   	if (thread_info_flags & _TIF_UPROBE)
>   		uprobe_notify_resume(regs);
>   
> @@ -299,8 +297,6 @@ void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
>   		tracehook_notify_resume(regs);
>   		rseq_handle_notify_resume(NULL, regs);
>   	}
> -
> -	user_enter();
>   }
>   
>   static unsigned long get_tm_stackpointer(struct task_struct *tsk)
> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> index dd87b2118620..d7d256a7a41f 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -1,9 +1,11 @@
>   // SPDX-License-Identifier: GPL-2.0-or-later
>   
> +#include <linux/context_tracking.h>
>   #include <linux/err.h>
>   #include <asm/asm-prototypes.h>
>   #include <asm/kup.h>
>   #include <asm/cputime.h>
> +#include <asm/interrupt.h>
>   #include <asm/hw_irq.h>
>   #include <asm/interrupt.h>
>   #include <asm/kprobes.h>
> @@ -28,6 +30,9 @@ notrace long system_call_exception(long r3, long r4, long r5,
>   	if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
>   		BUG_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED);
>   
> +	CT_WARN_ON(ct_state() == CONTEXT_KERNEL);
> +	user_exit_irqoff();
> +
>   	trace_hardirqs_off(); /* finish reconciling */
>   
>   	if (IS_ENABLED(CONFIG_PPC_BOOK3S))
> @@ -182,6 +187,8 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
>   	unsigned long ti_flags;
>   	unsigned long ret = 0;
>   
> +	CT_WARN_ON(ct_state() == CONTEXT_USER);
> +
>   	kuap_check_amr();
>   
>   	regs->result = r3;
> @@ -258,8 +265,11 @@ notrace unsigned long syscall_exit_prepare(unsigned long r3,
>   		}
>   	}
>   
> +	user_enter_irqoff();
> +
>   	/* scv need not set RI=0 because SRRs are not used */
>   	if (unlikely(!prep_irq_for_enabled_exit(!scv))) {
> +		user_exit_irqoff();
>   		local_irq_enable();
>   		goto again;
>   	}
> 

^ permalink raw reply

* Re: [PATCH v5 06/21] powerpc: interrupt handler wrapper functions
From: Christophe Leroy @ 2021-01-13 14:45 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20210113073215.516986-7-npiggin@gmail.com>



Le 13/01/2021 à 08:32, Nicholas Piggin a écrit :
> Add wrapper functions (derived from x86 macros) for interrupt handler
> functions. This allows interrupt entry code to be written in C.

Looks like you are doing more than just that is this patch. WOuld be worth splitting in several 
patches I think,

I'd suggest:
- Other patches for unrelated changes, see below for details
- One patch that brings the wrapper macros
- One patch that uses those macros

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/include/asm/asm-prototypes.h     |  29 ---
>   arch/powerpc/include/asm/book3s/64/mmu-hash.h |   1 -
>   arch/powerpc/include/asm/hw_irq.h             |   9 -
>   arch/powerpc/include/asm/interrupt.h          | 218 ++++++++++++++++++
>   arch/powerpc/include/asm/time.h               |   2 +
>   arch/powerpc/kernel/dbell.c                   |  12 +-
>   arch/powerpc/kernel/exceptions-64s.S          |   7 +-
>   arch/powerpc/kernel/head_book3s_32.S          |   6 +-
>   arch/powerpc/kernel/irq.c                     |   3 +-
>   arch/powerpc/kernel/mce.c                     |   5 +-
>   arch/powerpc/kernel/syscall_64.c              |   1 +
>   arch/powerpc/kernel/tau_6xx.c                 |   2 +-
>   arch/powerpc/kernel/time.c                    |   3 +-
>   arch/powerpc/kernel/traps.c                   |  90 +++++---
>   arch/powerpc/kernel/watchdog.c                |   7 +-
>   arch/powerpc/kvm/book3s_hv.c                  |   1 +
>   arch/powerpc/kvm/book3s_hv_builtin.c          |   1 +
>   arch/powerpc/kvm/booke.c                      |   1 +
>   arch/powerpc/mm/book3s64/hash_utils.c         |  57 +++--
>   arch/powerpc/mm/book3s64/slb.c                |  29 +--
>   arch/powerpc/mm/fault.c                       |  15 +-
>   arch/powerpc/platforms/powernv/idle.c         |   1 +
>   22 files changed, 374 insertions(+), 126 deletions(-)
>   create mode 100644 arch/powerpc/include/asm/interrupt.h
> 

...

> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
> new file mode 100644
> index 000000000000..60363e5eeffa
> --- /dev/null
> +++ b/arch/powerpc/include/asm/interrupt.h

...

> +/* Interrupt handlers */
> +DECLARE_INTERRUPT_HANDLER_NMI(machine_check_early);
> +DECLARE_INTERRUPT_HANDLER_NMI(hmi_exception_realmode);
> +DECLARE_INTERRUPT_HANDLER(SMIException);
> +DECLARE_INTERRUPT_HANDLER(handle_hmi_exception);
> +DECLARE_INTERRUPT_HANDLER(instruction_breakpoint_exception);
> +DECLARE_INTERRUPT_HANDLER(RunModeException);
> +DECLARE_INTERRUPT_HANDLER(single_step_exception);
> +DECLARE_INTERRUPT_HANDLER(program_check_exception);
> +DECLARE_INTERRUPT_HANDLER(alignment_exception);
> +DECLARE_INTERRUPT_HANDLER(StackOverflow);
> +DECLARE_INTERRUPT_HANDLER(stack_overflow_exception);
> +DECLARE_INTERRUPT_HANDLER(kernel_fp_unavailable_exception);
> +DECLARE_INTERRUPT_HANDLER(altivec_unavailable_exception);
> +DECLARE_INTERRUPT_HANDLER(vsx_unavailable_exception);
> +DECLARE_INTERRUPT_HANDLER(fp_unavailable_tm);
> +DECLARE_INTERRUPT_HANDLER(altivec_unavailable_tm);
> +DECLARE_INTERRUPT_HANDLER(vsx_unavailable_tm);
> +DECLARE_INTERRUPT_HANDLER(facility_unavailable_exception);
> +DECLARE_INTERRUPT_HANDLER_ASYNC(TAUException);
> +DECLARE_INTERRUPT_HANDLER(altivec_assist_exception);
> +DECLARE_INTERRUPT_HANDLER(unrecoverable_exception);
> +DECLARE_INTERRUPT_HANDLER(kernel_bad_stack);
> +DECLARE_INTERRUPT_HANDLER_NMI(system_reset_exception);
> +#ifdef CONFIG_PPC_BOOK3S_64
> +DECLARE_INTERRUPT_HANDLER_ASYNC(machine_check_exception);
> +#else
> +DECLARE_INTERRUPT_HANDLER_NMI(machine_check_exception);
> +#endif
> +DECLARE_INTERRUPT_HANDLER(emulation_assist_interrupt);
> +DECLARE_INTERRUPT_HANDLER_RAW(do_slb_fault);
> +DECLARE_INTERRUPT_HANDLER(do_bad_slb_fault);
> +DECLARE_INTERRUPT_HANDLER_RAW(do_hash_fault);
> +DECLARE_INTERRUPT_HANDLER_RET(do_page_fault);
> +DECLARE_INTERRUPT_HANDLER(__do_bad_page_fault);
> +DECLARE_INTERRUPT_HANDLER(do_bad_page_fault);

Missing DECLARE_INTERRUPT_HANDLER(do_break)

> +
> +DECLARE_INTERRUPT_HANDLER_ASYNC(timer_interrupt);
> +DECLARE_INTERRUPT_HANDLER_NMI(performance_monitor_exception_nmi);
> +DECLARE_INTERRUPT_HANDLER_ASYNC(performance_monitor_exception_async);
> +DECLARE_INTERRUPT_HANDLER_RAW(performance_monitor_exception);
> +DECLARE_INTERRUPT_HANDLER(WatchdogException);
> +DECLARE_INTERRUPT_HANDLER(unknown_exception);
> +DECLARE_INTERRUPT_HANDLER_ASYNC(unknown_async_exception);
> +
> +void replay_system_reset(void);
> +void replay_soft_interrupts(void);
> +
> +#endif /* _ASM_POWERPC_INTERRUPT_H */
> diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
> index 8f789b597bae..8dd3cdb25338 100644
> --- a/arch/powerpc/include/asm/time.h
> +++ b/arch/powerpc/include/asm/time.h
> @@ -102,6 +102,8 @@ DECLARE_PER_CPU(u64, decrementers_next_tb);
>   /* Convert timebase ticks to nanoseconds */
>   unsigned long long tb_to_ns(unsigned long long tb_ticks);
>   
> +void timer_broadcast_interrupt(void);

This seems unrelated. I think a separate patch woud be better for moving prototypes without making 
them wrappers.

> +
>   /* SPLPAR */
>   void accumulate_stolen_time(void);
>   
> diff --git a/arch/powerpc/kernel/dbell.c b/arch/powerpc/kernel/dbell.c
> index 52680cf07c9d..c0f99f8ffa7d 100644
> --- a/arch/powerpc/kernel/dbell.c
> +++ b/arch/powerpc/kernel/dbell.c
> @@ -12,14 +12,14 @@
>   #include <linux/hardirq.h>
>   
>   #include <asm/dbell.h>
> +#include <asm/interrupt.h>
>   #include <asm/irq_regs.h>
>   #include <asm/kvm_ppc.h>
>   #include <asm/trace.h>
>   
> -#ifdef CONFIG_SMP
> -

This seems unrelated, is that needed ? What's the problem with having to full versions of 
doorbell_exception() ?

> -void doorbell_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER_ASYNC(doorbell_exception)
>   {
> +#ifdef CONFIG_SMP
>   	struct pt_regs *old_regs = set_irq_regs(regs);
>   
>   	irq_enter();
> @@ -37,11 +37,7 @@ void doorbell_exception(struct pt_regs *regs)
>   	trace_doorbell_exit(regs);
>   	irq_exit();
>   	set_irq_regs(old_regs);
> -}
>   #else /* CONFIG_SMP */
> -void doorbell_exception(struct pt_regs *regs)
> -{
>   	printk(KERN_WARNING "Received doorbell on non-smp system\n");
> -}
>   #endif /* CONFIG_SMP */
> -
> +}
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 36dea2020ec5..8b0db807974c 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1923,7 +1923,7 @@ EXC_COMMON_BEGIN(doorbell_super_common)
>   #ifdef CONFIG_PPC_DOORBELL
>   	bl	doorbell_exception
>   #else
> -	bl	unknown_exception
> +	bl	unknown_async_exception

Unrelated  to wrappers ?

>   #endif
>   	b	interrupt_return
>   
> @@ -2136,8 +2136,7 @@ EXC_COMMON_BEGIN(h_data_storage_common)
>   	GEN_COMMON h_data_storage
>   	addi    r3,r1,STACK_FRAME_OVERHEAD
>   BEGIN_MMU_FTR_SECTION
> -	li	r4,SIGSEGV
> -	bl      bad_page_fault
> +	bl      do_bad_page_fault

Is this name change related ?

>   MMU_FTR_SECTION_ELSE
>   	bl      unknown_exception
>   ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX)
> @@ -2310,7 +2309,7 @@ EXC_COMMON_BEGIN(h_doorbell_common)
>   #ifdef CONFIG_PPC_DOORBELL
>   	bl	doorbell_exception
>   #else
> -	bl	unknown_exception
> +	bl	unknown_async_exception
>   #endif
>   	b	interrupt_return
>   
> diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
> index 94ad1372c490..9b4d5432e2db 100644
> --- a/arch/powerpc/kernel/head_book3s_32.S
> +++ b/arch/powerpc/kernel/head_book3s_32.S
> @@ -238,8 +238,8 @@ __secondary_hold_acknowledge:
>   
>   /* System reset */
>   /* core99 pmac starts the seconary here by changing the vector, and
> -   putting it back to what it was (unknown_exception) when done.  */
> -	EXCEPTION(0x100, Reset, unknown_exception, EXC_XFER_STD)
> +   putting it back to what it was (unknown_async_exception) when done.  */
> +	EXCEPTION(0x100, Reset, unknown_async_exception, EXC_XFER_STD)
>   
>   /* Machine check */
>   /*
> @@ -631,7 +631,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_DTLB_SW_LRU)
>   #endif
>   
>   #ifndef CONFIG_TAU_INT
> -#define TAUException	unknown_exception
> +#define TAUException	unknown_async_exception
>   #endif
>   
>   	EXCEPTION(0x1300, Trap_13, instruction_breakpoint_exception, EXC_XFER_STD)
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index 6b1eca53e36c..2055d204d08e 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -54,6 +54,7 @@
>   #include <linux/pgtable.h>
>   
>   #include <linux/uaccess.h>
> +#include <asm/interrupt.h>
>   #include <asm/io.h>
>   #include <asm/irq.h>
>   #include <asm/cache.h>
> @@ -665,7 +666,7 @@ void __do_irq(struct pt_regs *regs)
>   	irq_exit();
>   }
>   
> -void do_IRQ(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER_ASYNC(do_IRQ)
>   {
>   	struct pt_regs *old_regs = set_irq_regs(regs);
>   	void *cursp, *irqsp, *sirqsp;
> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index 9f3e133b57b7..54269947113d 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -18,6 +18,7 @@
>   #include <linux/extable.h>
>   #include <linux/ftrace.h>
>   
> +#include <asm/interrupt.h>
>   #include <asm/machdep.h>
>   #include <asm/mce.h>
>   #include <asm/nmi.h>
> @@ -588,7 +589,7 @@ EXPORT_SYMBOL_GPL(machine_check_print_event_info);
>    *
>    * regs->nip and regs->msr contains srr0 and ssr1.
>    */
> -long notrace machine_check_early(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER_NMI(machine_check_early)
>   {
>   	long handled = 0;
>   	u8 ftrace_enabled = this_cpu_get_ftrace_enabled();
> @@ -722,7 +723,7 @@ long hmi_handle_debugtrig(struct pt_regs *regs)
>   /*
>    * Return values:
>    */
> -long hmi_exception_realmode(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER_NMI(hmi_exception_realmode)
>   {	
>   	int ret;
>   
> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> index 7c85ed04a164..dd87b2118620 100644
> --- a/arch/powerpc/kernel/syscall_64.c
> +++ b/arch/powerpc/kernel/syscall_64.c
> @@ -5,6 +5,7 @@
>   #include <asm/kup.h>
>   #include <asm/cputime.h>
>   #include <asm/hw_irq.h>
> +#include <asm/interrupt.h>
>   #include <asm/kprobes.h>
>   #include <asm/paca.h>
>   #include <asm/ptrace.h>
> diff --git a/arch/powerpc/kernel/tau_6xx.c b/arch/powerpc/kernel/tau_6xx.c
> index 0b4694b8d248..46b2e5de4ef5 100644
> --- a/arch/powerpc/kernel/tau_6xx.c
> +++ b/arch/powerpc/kernel/tau_6xx.c
> @@ -100,7 +100,7 @@ static void TAUupdate(int cpu)
>    * with interrupts disabled
>    */
>   
> -void TAUException(struct pt_regs * regs)
> +DEFINE_INTERRUPT_HANDLER_ASYNC(TAUException)
>   {
>   	int cpu = smp_processor_id();
>   
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 67feb3524460..435a251247ed 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -56,6 +56,7 @@
>   #include <linux/processor.h>
>   #include <asm/trace.h>
>   
> +#include <asm/interrupt.h>
>   #include <asm/io.h>
>   #include <asm/nvram.h>
>   #include <asm/cache.h>
> @@ -570,7 +571,7 @@ void arch_irq_work_raise(void)
>    * timer_interrupt - gets called when the decrementer overflows,
>    * with interrupts disabled.
>    */
> -void timer_interrupt(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER_ASYNC(timer_interrupt)
>   {
>   	struct clock_event_device *evt = this_cpu_ptr(&decrementers);
>   	u64 *next_tb = this_cpu_ptr(&decrementers_next_tb);
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 9b5298c016c7..f4462b481248 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -41,6 +41,7 @@
>   #include <asm/emulated_ops.h>
>   #include <linux/uaccess.h>
>   #include <asm/debugfs.h>
> +#include <asm/interrupt.h>
>   #include <asm/io.h>
>   #include <asm/machdep.h>
>   #include <asm/rtas.h>
> @@ -430,8 +431,7 @@ void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
>   	regs->msr &= ~MSR_RI;
>   #endif
>   }
> -
> -void system_reset_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER_NMI(system_reset_exception)
>   {
>   	unsigned long hsrr0, hsrr1;
>   	bool saved_hsrrs = false;
> @@ -516,7 +516,10 @@ void system_reset_exception(struct pt_regs *regs)
>   	this_cpu_set_ftrace_enabled(ftrace_enabled);
>   
>   	/* What should we do here? We could issue a shutdown or hard reset. */
> +
> +	return 0;
>   }
> +NOKPROBE_SYMBOL(system_reset_exception);

Is this NOKPROBE_SYMBOL() related to wrappers or just a bug fix ?

>   
>   /*
>    * I/O accesses can cause machine checks on powermacs.
> @@ -788,7 +791,12 @@ int machine_check_generic(struct pt_regs *regs)
>   }
>   #endif /* everything else */
>   
> -void machine_check_exception(struct pt_regs *regs)
> +
> +#ifdef CONFIG_PPC_BOOK3S_64
> +DEFINE_INTERRUPT_HANDLER_ASYNC(machine_check_exception)
> +#else
> +DEFINE_INTERRUPT_HANDLER_NMI(machine_check_exception)
> +#endif
>   {
>   	int recover = 0;
>   
> @@ -838,13 +846,21 @@ void machine_check_exception(struct pt_regs *regs)
>   	if (!(regs->msr & MSR_RI))
>   		die("Unrecoverable Machine check", regs, SIGBUS);
>   
> +#ifdef CONFIG_PPC_BOOK3S_64
> +bail:
>   	return;
> +#else
> +	return 0;
>   
>   bail:
>   	if (nmi) nmi_exit();
> +
> +	return 0;
> +#endif

Looks fishy. Can't we have both returning either long or void ?

>   }
> +NOKPROBE_SYMBOL(machine_check_exception);

Is this NOKPROBE_SYMBOL() related to wrappers or just a bug fix ?

>   
> -void SMIException(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(SMIException) /* async? */
>   {
>   	die("System Management Interrupt", regs, SIGABRT);
>   }
> @@ -1030,7 +1046,7 @@ static void p9_hmi_special_emu(struct pt_regs *regs)
>   }
>   #endif /* CONFIG_VSX */
>   
> -void handle_hmi_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER_ASYNC(handle_hmi_exception)
>   {
>   	struct pt_regs *old_regs;
>   
> @@ -1059,7 +1075,7 @@ void handle_hmi_exception(struct pt_regs *regs)
>   	set_irq_regs(old_regs);
>   }
>   
> -void unknown_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(unknown_exception)
>   {
>   	enum ctx_state prev_state = exception_enter();
>   
> @@ -1071,7 +1087,19 @@ void unknown_exception(struct pt_regs *regs)
>   	exception_exit(prev_state);
>   }
>   
> -void instruction_breakpoint_exception(struct pt_regs *regs)

shouldn't unknown_async_exception() be added in a preceeding patch ?

> +DEFINE_INTERRUPT_HANDLER_ASYNC(unknown_async_exception)
> +{
> +	enum ctx_state prev_state = exception_enter();
> +
> +	printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n",
> +	       regs->nip, regs->msr, regs->trap);
> +
> +	_exception(SIGTRAP, regs, TRAP_UNK, 0);
> +
> +	exception_exit(prev_state);
> +}
> +
> +DEFINE_INTERRUPT_HANDLER(instruction_breakpoint_exception)
>   {
>   	enum ctx_state prev_state = exception_enter();
>   
> @@ -1086,12 +1114,12 @@ void instruction_breakpoint_exception(struct pt_regs *regs)
>   	exception_exit(prev_state);
>   }
>   
> -void RunModeException(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(RunModeException)
>   {
>   	_exception(SIGTRAP, regs, TRAP_UNK, 0);
>   }
>   
> -void single_step_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(single_step_exception)
>   {
>   	enum ctx_state prev_state = exception_enter();
>   
> @@ -1436,7 +1464,7 @@ static int emulate_math(struct pt_regs *regs)
>   static inline int emulate_math(struct pt_regs *regs) { return -1; }
>   #endif
>   
> -void program_check_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(program_check_exception)
>   {
>   	enum ctx_state prev_state = exception_enter();
>   	unsigned int reason = get_reason(regs);
> @@ -1561,14 +1589,14 @@ NOKPROBE_SYMBOL(program_check_exception);
>    * This occurs when running in hypervisor mode on POWER6 or later
>    * and an illegal instruction is encountered.
>    */
> -void emulation_assist_interrupt(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(emulation_assist_interrupt)
>   {
>   	regs->msr |= REASON_ILLEGAL;
>   	program_check_exception(regs);
>   }
>   NOKPROBE_SYMBOL(emulation_assist_interrupt);
>   
> -void alignment_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(alignment_exception)
>   {
>   	enum ctx_state prev_state = exception_enter();
>   	int sig, code, fixed = 0;
> @@ -1618,7 +1646,7 @@ void alignment_exception(struct pt_regs *regs)
>   	exception_exit(prev_state);
>   }
>   
> -void StackOverflow(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(StackOverflow)
>   {
>   	pr_crit("Kernel stack overflow in process %s[%d], r1=%lx\n",
>   		current->comm, task_pid_nr(current), regs->gpr[1]);
> @@ -1627,7 +1655,7 @@ void StackOverflow(struct pt_regs *regs)
>   	panic("kernel stack overflow");
>   }
>   
> -void stack_overflow_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(stack_overflow_exception)
>   {
>   	enum ctx_state prev_state = exception_enter();
>   
> @@ -1636,7 +1664,7 @@ void stack_overflow_exception(struct pt_regs *regs)
>   	exception_exit(prev_state);
>   }
>   
> -void kernel_fp_unavailable_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(kernel_fp_unavailable_exception)
>   {
>   	enum ctx_state prev_state = exception_enter();
>   
> @@ -1647,7 +1675,7 @@ void kernel_fp_unavailable_exception(struct pt_regs *regs)
>   	exception_exit(prev_state);
>   }
>   
> -void altivec_unavailable_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(altivec_unavailable_exception)
>   {
>   	enum ctx_state prev_state = exception_enter();
>   
> @@ -1666,7 +1694,7 @@ void altivec_unavailable_exception(struct pt_regs *regs)
>   	exception_exit(prev_state);
>   }
>   
> -void vsx_unavailable_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(vsx_unavailable_exception)
>   {
>   	if (user_mode(regs)) {
>   		/* A user program has executed an vsx instruction,
> @@ -1697,7 +1725,7 @@ static void tm_unavailable(struct pt_regs *regs)
>   	die("Unrecoverable TM Unavailable Exception", regs, SIGABRT);
>   }
>   
> -void facility_unavailable_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(facility_unavailable_exception)
>   {
>   	static char *facility_strings[] = {
>   		[FSCR_FP_LG] = "FPU",
> @@ -1817,7 +1845,7 @@ void facility_unavailable_exception(struct pt_regs *regs)
>   
>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>   
> -void fp_unavailable_tm(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(fp_unavailable_tm)
>   {
>   	/* Note:  This does not handle any kind of FP laziness. */
>   
> @@ -1850,7 +1878,7 @@ void fp_unavailable_tm(struct pt_regs *regs)
>   	tm_recheckpoint(&current->thread);
>   }
>   
> -void altivec_unavailable_tm(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(altivec_unavailable_tm)
>   {
>   	/* See the comments in fp_unavailable_tm().  This function operates
>   	 * the same way.
> @@ -1865,7 +1893,7 @@ void altivec_unavailable_tm(struct pt_regs *regs)
>   	current->thread.used_vr = 1;
>   }
>   
> -void vsx_unavailable_tm(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(vsx_unavailable_tm)
>   {
>   	/* See the comments in fp_unavailable_tm().  This works similarly,
>   	 * though we're loading both FP and VEC registers in here.
> @@ -1890,7 +1918,8 @@ void vsx_unavailable_tm(struct pt_regs *regs)
>   }
>   #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
>   
> -static void performance_monitor_exception_nmi(struct pt_regs *regs)
> +#ifdef CONFIG_PPC64
> +DEFINE_INTERRUPT_HANDLER_NMI(performance_monitor_exception_nmi)
>   {
>   	nmi_enter();
>   
> @@ -1899,9 +1928,12 @@ static void performance_monitor_exception_nmi(struct pt_regs *regs)
>   	perf_irq(regs);
>   
>   	nmi_exit();
> +
> +	return 0;
>   }
> +#endif
>   
> -static void performance_monitor_exception_async(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER_ASYNC(performance_monitor_exception_async)
>   {
>   	irq_enter();
>   
> @@ -1912,7 +1944,7 @@ static void performance_monitor_exception_async(struct pt_regs *regs)
>   	irq_exit();
>   }
>   
> -void performance_monitor_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER_RAW(performance_monitor_exception)
>   {
>   	/*
>   	 * On 64-bit, if perf interrupts hit in a local_irq_disable
> @@ -1924,6 +1956,8 @@ void performance_monitor_exception(struct pt_regs *regs)
>   		performance_monitor_exception_nmi(regs);
>   	else
>   		performance_monitor_exception_async(regs);
> +
> +	return 0;
>   }
>   
>   #ifdef CONFIG_PPC_ADV_DEBUG_REGS
> @@ -2057,7 +2091,7 @@ NOKPROBE_SYMBOL(DebugException);
>   #endif /* CONFIG_PPC_ADV_DEBUG_REGS */
>   
>   #ifdef CONFIG_ALTIVEC
> -void altivec_assist_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(altivec_assist_exception)
>   {
>   	int err;
>   
> @@ -2199,7 +2233,7 @@ void SPEFloatingPointRoundException(struct pt_regs *regs)
>    * in the MSR is 0.  This indicates that SRR0/1 are live, and that
>    * we therefore lost state by taking this exception.
>    */
> -void unrecoverable_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(unrecoverable_exception)
>   {
>   	pr_emerg("Unrecoverable exception %lx at %lx (msr=%lx)\n",
>   		 regs->trap, regs->nip, regs->msr);
> @@ -2219,7 +2253,7 @@ void __attribute__ ((weak)) WatchdogHandler(struct pt_regs *regs)
>   	return;
>   }
>   
> -void WatchdogException(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(WatchdogException) /* XXX NMI? async? */
>   {
>   	printk (KERN_EMERG "PowerPC Book-E Watchdog Exception\n");
>   	WatchdogHandler(regs);
> @@ -2230,7 +2264,7 @@ void WatchdogException(struct pt_regs *regs)
>    * We enter here if we discover during exception entry that we are
>    * running in supervisor mode with a userspace value in the stack pointer.
>    */
> -void kernel_bad_stack(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(kernel_bad_stack)
>   {
>   	printk(KERN_EMERG "Bad kernel stack pointer %lx at %lx\n",
>   	       regs->gpr[1], regs->nip);
> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
> index af3c15a1d41e..824b9376ac35 100644
> --- a/arch/powerpc/kernel/watchdog.c
> +++ b/arch/powerpc/kernel/watchdog.c
> @@ -26,6 +26,7 @@
>   #include <linux/delay.h>
>   #include <linux/smp.h>
>   
> +#include <asm/interrupt.h>
>   #include <asm/paca.h>
>   
>   /*
> @@ -247,14 +248,14 @@ static void watchdog_timer_interrupt(int cpu)
>   		watchdog_smp_panic(cpu, tb);
>   }
>   
> -void soft_nmi_interrupt(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>   {
>   	unsigned long flags;
>   	int cpu = raw_smp_processor_id();
>   	u64 tb;
>   
>   	if (!cpumask_test_cpu(cpu, &wd_cpus_enabled))
> -		return;
> +		return 0;
>   
>   	nmi_enter();
>   
> @@ -291,6 +292,8 @@ void soft_nmi_interrupt(struct pt_regs *regs)
>   
>   out:
>   	nmi_exit();
> +
> +	return 0;
>   }
>   
>   static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 6f612d240392..3f9a229f82a2 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -53,6 +53,7 @@
>   #include <asm/cputable.h>
>   #include <asm/cacheflush.h>
>   #include <linux/uaccess.h>
> +#include <asm/interrupt.h>
>   #include <asm/io.h>
>   #include <asm/kvm_ppc.h>
>   #include <asm/kvm_book3s.h>
> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
> index 8053efdf7ea7..10fc274bea65 100644
> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
> @@ -17,6 +17,7 @@
>   
>   #include <asm/asm-prototypes.h>
>   #include <asm/cputable.h>
> +#include <asm/interrupt.h>
>   #include <asm/kvm_ppc.h>
>   #include <asm/kvm_book3s.h>
>   #include <asm/archrandom.h>
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 288a9820ec01..bd2bb73021d8 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -20,6 +20,7 @@
>   
>   #include <asm/cputable.h>
>   #include <linux/uaccess.h>
> +#include <asm/interrupt.h>
>   #include <asm/kvm_ppc.h>
>   #include <asm/cacheflush.h>
>   #include <asm/dbell.h>
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 77073a256cff..453afb9ae9b4 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -38,6 +38,7 @@
>   #include <linux/pgtable.h>
>   
>   #include <asm/debugfs.h>
> +#include <asm/interrupt.h>
>   #include <asm/processor.h>
>   #include <asm/mmu.h>
>   #include <asm/mmu_context.h>
> @@ -1512,7 +1513,7 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap,
>   }
>   EXPORT_SYMBOL_GPL(hash_page);
>   
> -long do_hash_fault(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER_RET(__do_hash_fault)
>   {
>   	unsigned long ea = regs->dar;
>   	unsigned long dsisr = regs->dsisr;
> @@ -1522,27 +1523,6 @@ long do_hash_fault(struct pt_regs *regs)
>   	unsigned int region_id;
>   	long err;
>   
> -	if (unlikely(dsisr & (DSISR_BAD_FAULT_64S | DSISR_DABRMATCH | DSISR_KEYFAULT)))
> -		goto page_fault;
> -
> -	/*
> -	 * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then
> -	 * don't call hash_page, just fail the fault. This is required to
> -	 * prevent re-entrancy problems in the hash code, namely perf
> -	 * interrupts hitting while something holds H_PAGE_BUSY, and taking a
> -	 * hash fault. See the comment in hash_preload().
> -	 *
> -	 * We come here as a result of a DSI at a point where we don't want
> -	 * to call hash_page, such as when we are accessing memory (possibly
> -	 * user memory) inside a PMU interrupt that occurred while interrupts
> -	 * were soft-disabled.  We want to invoke the exception handler for
> -	 * the access, or panic if there isn't a handler.
> -	 */
> -	if (unlikely(in_nmi())) {
> -		bad_page_fault(regs, SIGSEGV);
> -		return 0;
> -	}
> -
>   	region_id = get_region_id(ea);
>   	if ((region_id == VMALLOC_REGION_ID) || (region_id == IO_REGION_ID))
>   		mm = &init_mm;
> @@ -1583,13 +1563,44 @@ long do_hash_fault(struct pt_regs *regs)
>   		err = 0;
>   
>   	} else if (err) {
> -page_fault:
>   		err = do_page_fault(regs);
>   	}
>   
>   	return err;
>   }
>   
> +/*
> + * The _RAW interrupt entry checks for the in_nmi() case before
> + * running the full handler.
> + */
> +DEFINE_INTERRUPT_HANDLER_RAW(do_hash_fault)

Could we do that split into __do_hash_fault() / do_hash_fault() in a preceeding patch ?

> +{
> +	unsigned long dsisr = regs->dsisr;
> +
> +	if (unlikely(dsisr & (DSISR_BAD_FAULT_64S | DSISR_DABRMATCH | DSISR_KEYFAULT)))
> +		return do_page_fault(regs);
> +
> +	/*
> +	 * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then
> +	 * don't call hash_page, just fail the fault. This is required to
> +	 * prevent re-entrancy problems in the hash code, namely perf
> +	 * interrupts hitting while something holds H_PAGE_BUSY, and taking a
> +	 * hash fault. See the comment in hash_preload().
> +	 *
> +	 * We come here as a result of a DSI at a point where we don't want
> +	 * to call hash_page, such as when we are accessing memory (possibly
> +	 * user memory) inside a PMU interrupt that occurred while interrupts
> +	 * were soft-disabled.  We want to invoke the exception handler for
> +	 * the access, or panic if there isn't a handler.
> +	 */
> +	if (unlikely(in_nmi())) {
> +		do_bad_page_fault(regs);
> +		return 0;
> +	}
> +
> +	return __do_hash_fault(regs);
> +}
> +
>   #ifdef CONFIG_PPC_MM_SLICES
>   static bool should_hash_preload(struct mm_struct *mm, unsigned long ea)
>   {
> diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
> index c581548b533f..0ae10adae203 100644
> --- a/arch/powerpc/mm/book3s64/slb.c
> +++ b/arch/powerpc/mm/book3s64/slb.c
> @@ -10,6 +10,7 @@
>    */
>   
>   #include <asm/asm-prototypes.h>
> +#include <asm/interrupt.h>
>   #include <asm/mmu.h>
>   #include <asm/mmu_context.h>
>   #include <asm/paca.h>
> @@ -813,7 +814,7 @@ static long slb_allocate_user(struct mm_struct *mm, unsigned long ea)
>   	return slb_insert_entry(ea, context, flags, ssize, false);
>   }
>   
> -long do_slb_fault(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER_RAW(do_slb_fault)
>   {
>   	unsigned long ea = regs->dar;
>   	unsigned long id = get_region_id(ea);
> @@ -827,17 +828,19 @@ long do_slb_fault(struct pt_regs *regs)
>   	/*
>   	 * SLB kernel faults must be very careful not to touch anything
>   	 * that is not bolted. E.g., PACA and global variables are okay,
> -	 * mm->context stuff is not.
> -	 *
> -	 * SLB user faults can access all of kernel memory, but must be
> -	 * careful not to touch things like IRQ state because it is not
> -	 * "reconciled" here. The difficulty is that we must use
> -	 * fast_exception_return to return from kernel SLB faults without
> -	 * looking at possible non-bolted memory. We could test user vs
> -	 * kernel faults in the interrupt handler asm and do a full fault,
> -	 * reconcile, ret_from_except for user faults which would make them
> -	 * first class kernel code. But for performance it's probably nicer
> -	 * if they go via fast_exception_return too.
> +	 * mm->context stuff is not. SLB user faults may access all of
> +	 * memory (and induce one recursive SLB kernel fault), so the
> +	 * kernel fault must not trample on the user fault state at those
> +	 * points.
> +	 */
> +
> +	/*
> +	 * This is a _RAW interrupt handler, so it must not touch local
> +	 * irq state, or schedule. We could test for usermode and upgrade
> +	 * to a normal process context (synchronous) interrupt for those,
> +	 * which would make them first-class kernel code and able to be
> +	 * traced and instrumented, although performance would suffer a
> +	 * bit, it would probably be a good tradeoff.

Is the comment change really related to the wrapper macros ?

>   	 */
>   	if (id >= LINEAR_MAP_REGION_ID) {
>   		long err;
> @@ -866,7 +869,7 @@ long do_slb_fault(struct pt_regs *regs)
>   	}
>   }
>   
> -void do_bad_slb_fault(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(do_bad_slb_fault)
>   {
>   	int err = regs->result;
>   
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 36604ff8b3ec..9e1cd74ebb13 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -34,6 +34,7 @@
>   #include <linux/uaccess.h>
>   
>   #include <asm/firmware.h>
> +#include <asm/interrupt.h>
>   #include <asm/page.h>
>   #include <asm/mmu.h>
>   #include <asm/mmu_context.h>
> @@ -547,7 +548,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>   }
>   NOKPROBE_SYMBOL(__do_page_fault);
>   
> -long do_page_fault(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER_RET(do_page_fault)
>   {
>   	enum ctx_state prev_state = exception_enter();
>   	unsigned long address = regs->dar;
> @@ -641,3 +642,15 @@ void bad_page_fault(struct pt_regs *regs, int sig)
>   	else
>   		__bad_page_fault(regs, sig);
>   }
> +
> +#ifdef CONFIG_PPC_BOOK3S_64
> +DEFINE_INTERRUPT_HANDLER(__do_bad_page_fault)
> +{
> +	__bad_page_fault(regs, SIGSEGV);
> +}
> +
> +DEFINE_INTERRUPT_HANDLER(do_bad_page_fault)
> +{
> +	bad_page_fault(regs, SIGSEGV);
> +}
> +#endif
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index e6f461812856..999997d9e9a9 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -14,6 +14,7 @@
>   
>   #include <asm/asm-prototypes.h>
>   #include <asm/firmware.h>
> +#include <asm/interrupt.h>
>   #include <asm/machdep.h>
>   #include <asm/opal.h>
>   #include <asm/cputhreads.h>
> 

^ permalink raw reply

* Re: [PATCH v5 04/21] powerpc: bad_page_fault, do_break get registers from regs
From: Christophe Leroy @ 2021-01-13 14:25 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20210113073215.516986-5-npiggin@gmail.com>



Le 13/01/2021 à 08:31, Nicholas Piggin a écrit :
> Similar to the previous patch this makes interrupt handler function
> types more regular so they can be wrapped with the next patch.
> 
> bad_page_fault and do_break are not performance critical.

It's a bit different between do_break() and bad_page_fault():
- do_break() is not performance critical for sure
- bad_page_fault(), it doesn't matter, because bad_page_fault() was not using the address param so 
it doesn't get anything from regs at the end.

Maybe it would be worth splitting in two patches, one for bad_page_fault() and one for do_break()

Christophe

> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/include/asm/bug.h             |  4 ++--
>   arch/powerpc/include/asm/debug.h           |  3 +--
>   arch/powerpc/kernel/entry_32.S             |  3 +--
>   arch/powerpc/kernel/exceptions-64e.S       |  3 +--
>   arch/powerpc/kernel/exceptions-64s.S       |  3 +--
>   arch/powerpc/kernel/head_8xx.S             |  5 ++---
>   arch/powerpc/kernel/process.c              |  7 +++----
>   arch/powerpc/kernel/traps.c                |  2 +-
>   arch/powerpc/mm/book3s64/hash_utils.c      |  4 ++--
>   arch/powerpc/mm/book3s64/slb.c             |  2 +-
>   arch/powerpc/mm/fault.c                    | 10 +++++-----
>   arch/powerpc/platforms/8xx/machine_check.c |  2 +-
>   12 files changed, 21 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index f7827e993196..4220789b9a97 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -112,8 +112,8 @@
>   
>   struct pt_regs;
>   long do_page_fault(struct pt_regs *);
> -extern void bad_page_fault(struct pt_regs *, unsigned long, int);
> -void __bad_page_fault(struct pt_regs *regs, unsigned long address, int sig);
> +void bad_page_fault(struct pt_regs *, int);
> +void __bad_page_fault(struct pt_regs *regs, int sig);
>   extern void _exception(int, struct pt_regs *, int, unsigned long);
>   extern void _exception_pkey(struct pt_regs *, unsigned long, int);
>   extern void die(const char *, struct pt_regs *, long);
> diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
> index ec57daf87f40..0550eceab3ca 100644
> --- a/arch/powerpc/include/asm/debug.h
> +++ b/arch/powerpc/include/asm/debug.h
> @@ -52,8 +52,7 @@ extern void do_send_trap(struct pt_regs *regs, unsigned long address,
>   			 unsigned long error_code, int brkpt);
>   #else
>   
> -extern void do_break(struct pt_regs *regs, unsigned long address,
> -		     unsigned long error_code);
> +void do_break(struct pt_regs *regs);
>   #endif
>   
>   #endif /* _ASM_POWERPC_DEBUG_H */
> diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
> index a32157ce0551..a94127eed56b 100644
> --- a/arch/powerpc/kernel/entry_32.S
> +++ b/arch/powerpc/kernel/entry_32.S
> @@ -673,9 +673,8 @@ handle_page_fault:
>   	lwz	r0,_TRAP(r1)
>   	clrrwi	r0,r0,1
>   	stw	r0,_TRAP(r1)
> -	mr	r5,r3
> +	mr	r4,r3		/* err arg for bad_page_fault */
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	lwz	r4,_DAR(r1)
>   	bl	__bad_page_fault
>   	b	ret_from_except_full
>   
> diff --git a/arch/powerpc/kernel/exceptions-64e.S b/arch/powerpc/kernel/exceptions-64e.S
> index 43e71d86dcbf..52421042a020 100644
> --- a/arch/powerpc/kernel/exceptions-64e.S
> +++ b/arch/powerpc/kernel/exceptions-64e.S
> @@ -1018,9 +1018,8 @@ storage_fault_common:
>   	bne-	1f
>   	b	ret_from_except_lite
>   1:	bl	save_nvgprs
> -	mr	r5,r3
> +	mr	r4,r3
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	ld	r4,_DAR(r1)
>   	bl	__bad_page_fault
>   	b	ret_from_except
>   
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 814cff2c649e..36dea2020ec5 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -2136,8 +2136,7 @@ EXC_COMMON_BEGIN(h_data_storage_common)
>   	GEN_COMMON h_data_storage
>   	addi    r3,r1,STACK_FRAME_OVERHEAD
>   BEGIN_MMU_FTR_SECTION
> -	ld	r4,_DAR(r1)
> -	li	r5,SIGSEGV
> +	li	r4,SIGSEGV
>   	bl      bad_page_fault
>   MMU_FTR_SECTION_ELSE
>   	bl      unknown_exception
> diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
> index 0b2c247cfdff..7869db974185 100644
> --- a/arch/powerpc/kernel/head_8xx.S
> +++ b/arch/powerpc/kernel/head_8xx.S
> @@ -364,10 +364,9 @@ do_databreakpoint:
>   	addi	r3,r1,STACK_FRAME_OVERHEAD
>   	mfspr	r4,SPRN_BAR
>   	stw	r4,_DAR(r11)
> -#ifdef CONFIG_VMAP_STACK
> -	lwz	r5,_DSISR(r11)
> -#else
> +#ifndef CONFIG_VMAP_STACK
>   	mfspr	r5,SPRN_DSISR
> +	stw	r5,_DSISR(r11)
>   #endif
>   	EXC_XFER_STD(0x1c00, do_break)
>   
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index a66f435dabbf..4f0f81e9420b 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -659,11 +659,10 @@ static void do_break_handler(struct pt_regs *regs)
>   	}
>   }
>   
> -void do_break (struct pt_regs *regs, unsigned long address,
> -		    unsigned long error_code)
> +void do_break(struct pt_regs *regs)
>   {
>   	current->thread.trap_nr = TRAP_HWBKPT;
> -	if (notify_die(DIE_DABR_MATCH, "dabr_match", regs, error_code,
> +	if (notify_die(DIE_DABR_MATCH, "dabr_match", regs, regs->dsisr,
>   			11, SIGSEGV) == NOTIFY_STOP)
>   		return;
>   
> @@ -681,7 +680,7 @@ void do_break (struct pt_regs *regs, unsigned long address,
>   		do_break_handler(regs);
>   
>   	/* Deliver the signal to userspace */
> -	force_sig_fault(SIGTRAP, TRAP_HWBKPT, (void __user *)address);
> +	force_sig_fault(SIGTRAP, TRAP_HWBKPT, (void __user *)regs->dar);
>   }
>   #endif	/* CONFIG_PPC_ADV_DEBUG_REGS */
>   
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 3ec7b443fe6b..f3f6af3141ee 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1612,7 +1612,7 @@ void alignment_exception(struct pt_regs *regs)
>   	if (user_mode(regs))
>   		_exception(sig, regs, code, regs->dar);
>   	else
> -		bad_page_fault(regs, regs->dar, sig);
> +		bad_page_fault(regs, sig);
>   
>   bail:
>   	exception_exit(prev_state);
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 8d014924ee0d..77073a256cff 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1539,7 +1539,7 @@ long do_hash_fault(struct pt_regs *regs)
>   	 * the access, or panic if there isn't a handler.
>   	 */
>   	if (unlikely(in_nmi())) {
> -		bad_page_fault(regs, ea, SIGSEGV);
> +		bad_page_fault(regs, SIGSEGV);
>   		return 0;
>   	}
>   
> @@ -1578,7 +1578,7 @@ long do_hash_fault(struct pt_regs *regs)
>   			else
>   				_exception(SIGBUS, regs, BUS_ADRERR, ea);
>   		} else {
> -			bad_page_fault(regs, ea, SIGBUS);
> +			bad_page_fault(regs, SIGBUS);
>   		}
>   		err = 0;
>   
> diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
> index 985902ce0272..c581548b533f 100644
> --- a/arch/powerpc/mm/book3s64/slb.c
> +++ b/arch/powerpc/mm/book3s64/slb.c
> @@ -874,7 +874,7 @@ void do_bad_slb_fault(struct pt_regs *regs)
>   		if (user_mode(regs))
>   			_exception(SIGSEGV, regs, SEGV_BNDERR, regs->dar);
>   		else
> -			bad_page_fault(regs, regs->dar, SIGSEGV);
> +			bad_page_fault(regs, SIGSEGV);
>   	} else if (err == -EINVAL) {
>   		unrecoverable_exception(regs);
>   	} else {
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index e170501081a7..36604ff8b3ec 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -569,14 +569,14 @@ long do_page_fault(struct pt_regs *regs)
>   	/* 32 and 64e handle these errors in asm */
>   	if (unlikely(err)) {
>   		if (err > 0) {
> -			__bad_page_fault(regs, address, err);
> +			__bad_page_fault(regs, err);
>   			err = 0;
>   		} else {
>   			/*
>   			 * do_break() may change NV GPRS while handling the
>   			 * breakpoint. Return -ve to caller to do that.
>   			 */
> -			do_break(regs, address, error_code);
> +			do_break(regs);
>   		}
>   	}
>   #endif
> @@ -592,7 +592,7 @@ NOKPROBE_SYMBOL(do_page_fault);
>    * It is called from the DSI and ISI handlers in head.S and from some
>    * of the procedures in traps.c.
>    */
> -void __bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
> +void __bad_page_fault(struct pt_regs *regs, int sig)
>   {
>   	int is_write = page_fault_is_write(regs->dsisr);
>   
> @@ -630,7 +630,7 @@ void __bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
>   	die("Kernel access of bad area", regs, sig);
>   }
>   
> -void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
> +void bad_page_fault(struct pt_regs *regs, int sig)
>   {
>   	const struct exception_table_entry *entry;
>   
> @@ -639,5 +639,5 @@ void bad_page_fault(struct pt_regs *regs, unsigned long address, int sig)
>   	if (entry)
>   		instruction_pointer_set(regs, extable_fixup(entry));
>   	else
> -		__bad_page_fault(regs, address, sig);
> +		__bad_page_fault(regs, sig);
>   }
> diff --git a/arch/powerpc/platforms/8xx/machine_check.c b/arch/powerpc/platforms/8xx/machine_check.c
> index 88dedf38eccd..656365975895 100644
> --- a/arch/powerpc/platforms/8xx/machine_check.c
> +++ b/arch/powerpc/platforms/8xx/machine_check.c
> @@ -26,7 +26,7 @@ int machine_check_8xx(struct pt_regs *regs)
>   	 * to deal with that than having a wart in the mcheck handler.
>   	 * -- BenH
>   	 */
> -	bad_page_fault(regs, regs->dar, SIGBUS);
> +	bad_page_fault(regs, SIGBUS);
>   	return 1;
>   #else
>   	return 0;
> 

^ permalink raw reply

* Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
From: Christophe Leroy @ 2021-01-13 14:12 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <20210113073215.516986-3-npiggin@gmail.com>



Le 13/01/2021 à 08:31, Nicholas Piggin a écrit :
> The page fault handling still has some complex logic particularly around
> hash table handling, in asm. Implement this in C instead.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>   arch/powerpc/include/asm/book3s/64/mmu-hash.h |   1 +
>   arch/powerpc/kernel/exceptions-64s.S          | 131 +++---------------
>   arch/powerpc/mm/book3s64/hash_utils.c         |  77 ++++++----
>   arch/powerpc/mm/fault.c                       |  46 ++++--
>   4 files changed, 107 insertions(+), 148 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> index 066b1d34c7bc..60a669379aa0 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
> @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn,
>   #define HPTE_NOHPTE_UPDATE	0x2
>   #define HPTE_USE_KERNEL_KEY	0x4
>   
> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr);
>   extern int __hash_page_4K(unsigned long ea, unsigned long access,
>   			  unsigned long vsid, pte_t *ptep, unsigned long trap,
>   			  unsigned long flags, int ssize, int subpage_prot);
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index 6e53f7638737..bcb5e81d2088 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>    *
>    * Handling:
>    * - Hash MMU
> - *   Go to do_hash_page first to see if the HPT can be filled from an entry in
> - *   the Linux page table. Hash faults can hit in kernel mode in a fairly
> + *   Go to do_hash_fault, which attempts to fill the HPT from an entry in the
> + *   Linux page table. Hash faults can hit in kernel mode in a fairly
>    *   arbitrary state (e.g., interrupts disabled, locks held) when accessing
>    *   "non-bolted" regions, e.g., vmalloc space. However these should always be
> - *   backed by Linux page tables.
> + *   backed by Linux page table entries.
>    *
> - *   If none is found, do a Linux page fault. Linux page faults can happen in
> - *   kernel mode due to user copy operations of course.
> + *   If no entry is found the Linux page fault handler is invoked (by
> + *   do_hash_fault). Linux page faults can happen in kernel mode due to user
> + *   copy operations of course.
>    *
>    *   KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest
>    *   MMU context, which may cause a DSI in the host, which must go to the
> @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common)
>   	GEN_COMMON data_access
>   	ld	r4,_DAR(r1)
>   	ld	r5,_DSISR(r1)

We have DSISR here. I think the dispatch between page fault or do_break() should be done here:
- It would be more similar to other arches
- Would avoid doing it also in instruction fault
- Would avoid that -1 return which looks more like a hack.

> +	addi	r3,r1,STACK_FRAME_OVERHEAD
>   BEGIN_MMU_FTR_SECTION
> -	ld	r6,_MSR(r1)
> -	li	r3,0x300
> -	b	do_hash_page		/* Try to handle as hpte fault */
> +	bl	do_hash_fault
>   MMU_FTR_SECTION_ELSE
> -	b	handle_page_fault
> +	bl	do_page_fault
>   ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
> +        cmpdi	r3,0

IIUC, this is for do_break(). Would be better done in a separate path.

> +	beq+	interrupt_return
> +	/* We need to restore NVGPRS */
> +	REST_NVGPRS(r1)
> +	b       interrupt_return
>   
>   	GEN_KVM data_access
>   
> @@ -1540,13 +1545,17 @@ EXC_COMMON_BEGIN(instruction_access_common)
>   	GEN_COMMON instruction_access
>   	ld	r4,_DAR(r1)
>   	ld	r5,_DSISR(r1)
> +	addi	r3,r1,STACK_FRAME_OVERHEAD
>   BEGIN_MMU_FTR_SECTION
> -	ld      r6,_MSR(r1)
> -	li	r3,0x400
> -	b	do_hash_page		/* Try to handle as hpte fault */
> +	bl	do_hash_fault
>   MMU_FTR_SECTION_ELSE
> -	b	handle_page_fault
> +	bl	do_page_fault
>   ALT_MMU_FTR_SECTION_END_IFCLR(MMU_FTR_TYPE_RADIX)
> +        cmpdi	r3,0

What is that for ? If that's for do_break(), its irrelant in ISI.

> +	beq+	interrupt_return
> +	/* We need to restore NVGPRS */
> +	REST_NVGPRS(r1)
> +	b       interrupt_return
>   
>   	GEN_KVM instruction_access
>   
> @@ -3221,99 +3230,3 @@ disable_machine_check:
>   	RFI_TO_KERNEL
>   1:	mtlr	r0
>   	blr
> -
> -/*
> - * Hash table stuff
> - */
> -	.balign	IFETCH_ALIGN_BYTES
> -do_hash_page:
> -#ifdef CONFIG_PPC_BOOK3S_64
> -	lis	r0,(DSISR_BAD_FAULT_64S | DSISR_DABRMATCH | DSISR_KEYFAULT)@h
> -	ori	r0,r0,DSISR_BAD_FAULT_64S@l
> -	and.	r0,r5,r0		/* weird error? */
> -	bne-	handle_page_fault	/* if not, try to insert a HPTE */
> -
> -	/*
> -	 * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then
> -	 * don't call hash_page, just fail the fault. This is required to
> -	 * prevent re-entrancy problems in the hash code, namely perf
> -	 * interrupts hitting while something holds H_PAGE_BUSY, and taking a
> -	 * hash fault. See the comment in hash_preload().
> -	 */
> -	ld	r11, PACA_THREAD_INFO(r13)
> -	lwz	r0,TI_PREEMPT(r11)
> -	andis.	r0,r0,NMI_MASK@h
> -	bne	77f
> -
> -	/*
> -	 * r3 contains the trap number
> -	 * r4 contains the faulting address
> -	 * r5 contains dsisr
> -	 * r6 msr
> -	 *
> -	 * at return r3 = 0 for success, 1 for page fault, negative for error
> -	 */
> -	bl	__hash_page		/* build HPTE if possible */
> -        cmpdi	r3,0			/* see if __hash_page succeeded */
> -
> -	/* Success */
> -	beq	interrupt_return	/* Return from exception on success */
> -
> -	/* Error */
> -	blt-	13f
> -
> -	/* Reload DAR/DSISR into r4/r5 for the DABR check below */
> -	ld	r4,_DAR(r1)
> -	ld      r5,_DSISR(r1)
> -#endif /* CONFIG_PPC_BOOK3S_64 */
> -
> -/* Here we have a page fault that hash_page can't handle. */
> -handle_page_fault:
> -11:	andis.  r0,r5,DSISR_DABRMATCH@h
> -	bne-    handle_dabr_fault
> -	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	bl	do_page_fault
> -	cmpdi	r3,0
> -	beq+	interrupt_return
> -	mr	r5,r3
> -	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	ld	r4,_DAR(r1)
> -	bl	__bad_page_fault
> -	b	interrupt_return
> -
> -/* We have a data breakpoint exception - handle it */
> -handle_dabr_fault:
> -	ld      r4,_DAR(r1)
> -	ld      r5,_DSISR(r1)
> -	addi    r3,r1,STACK_FRAME_OVERHEAD
> -	bl      do_break
> -	/*
> -	 * do_break() may have changed the NV GPRS while handling a breakpoint.
> -	 * If so, we need to restore them with their updated values.
> -	 */
> -	REST_NVGPRS(r1)
> -	b       interrupt_return
> -
> -
> -#ifdef CONFIG_PPC_BOOK3S_64
> -/* We have a page fault that hash_page could handle but HV refused
> - * the PTE insertion
> - */
> -13:	mr	r5,r3
> -	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	ld	r4,_DAR(r1)
> -	bl	low_hash_fault
> -	b	interrupt_return
> -#endif
> -
> -/*
> - * We come here as a result of a DSI at a point where we don't want
> - * to call hash_page, such as when we are accessing memory (possibly
> - * user memory) inside a PMU interrupt that occurred while interrupts
> - * were soft-disabled.  We want to invoke the exception handler for
> - * the access, or panic if there isn't a handler.
> - */
> -77:	addi	r3,r1,STACK_FRAME_OVERHEAD
> -	li	r5,SIGSEGV
> -	bl	bad_page_fault
> -	b	interrupt_return
> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
> index 73b06adb6eeb..5a61182ddf75 100644
> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -1512,16 +1512,40 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap,
>   }
>   EXPORT_SYMBOL_GPL(hash_page);
>   
> -int __hash_page(unsigned long trap, unsigned long ea, unsigned long dsisr,
> -		unsigned long msr)
> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr)
>   {
>   	unsigned long access = _PAGE_PRESENT | _PAGE_READ;
>   	unsigned long flags = 0;
> -	struct mm_struct *mm = current->mm;
> -	unsigned int region_id = get_region_id(ea);
> +	struct mm_struct *mm;
> +	unsigned int region_id;
> +	int err;
> +
> +	if (unlikely(dsisr & (DSISR_BAD_FAULT_64S | DSISR_DABRMATCH | DSISR_KEYFAULT)))
> +		goto page_fault;
> +
> +	/*
> +	 * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then
> +	 * don't call hash_page, just fail the fault. This is required to
> +	 * prevent re-entrancy problems in the hash code, namely perf
> +	 * interrupts hitting while something holds H_PAGE_BUSY, and taking a
> +	 * hash fault. See the comment in hash_preload().
> +	 *
> +	 * We come here as a result of a DSI at a point where we don't want
> +	 * to call hash_page, such as when we are accessing memory (possibly
> +	 * user memory) inside a PMU interrupt that occurred while interrupts
> +	 * were soft-disabled.  We want to invoke the exception handler for
> +	 * the access, or panic if there isn't a handler.
> +	 */
> +	if (unlikely(in_nmi())) {
> +		bad_page_fault(regs, ea, SIGSEGV);
> +		return 0;
> +	}
>   
> +	region_id = get_region_id(ea);
>   	if ((region_id == VMALLOC_REGION_ID) || (region_id == IO_REGION_ID))
>   		mm = &init_mm;
> +	else
> +		mm = current->mm;
>   
>   	if (dsisr & DSISR_NOHPTE)
>   		flags |= HPTE_NOHPTE_UPDATE;
> @@ -1537,13 +1561,31 @@ int __hash_page(unsigned long trap, unsigned long ea, unsigned long dsisr,
>   	 * 2) user space access kernel space.
>   	 */
>   	access |= _PAGE_PRIVILEGED;
> -	if ((msr & MSR_PR) || (region_id == USER_REGION_ID))
> +	if (user_mode(regs) || (region_id == USER_REGION_ID))
>   		access &= ~_PAGE_PRIVILEGED;
>   
> -	if (trap == 0x400)
> +	if (regs->trap == 0x400)
>   		access |= _PAGE_EXEC;
>   
> -	return hash_page_mm(mm, ea, access, trap, flags);
> +	err = hash_page_mm(mm, ea, access, regs->trap, flags);
> +	if (unlikely(err < 0)) {
> +		// failed to instert a hash PTE due to an hypervisor error
> +		if (user_mode(regs)) {
> +			if (IS_ENABLED(CONFIG_PPC_SUBPAGE_PROT) && err == -2)
> +				_exception(SIGSEGV, regs, SEGV_ACCERR, ea);
> +			else
> +				_exception(SIGBUS, regs, BUS_ADRERR, ea);
> +		} else {
> +			bad_page_fault(regs, ea, SIGBUS);
> +		}
> +		err = 0;
> +
> +	} else if (err) {
> +page_fault:
> +		err = do_page_fault(regs, ea, dsisr);
> +	}
> +
> +	return err;
>   }
>   
>   #ifdef CONFIG_PPC_MM_SLICES
> @@ -1843,27 +1885,6 @@ void flush_hash_range(unsigned long number, int local)
>   	}
>   }
>   
> -/*
> - * low_hash_fault is called when we the low level hash code failed
> - * to instert a PTE due to an hypervisor error
> - */
> -void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
> -{
> -	enum ctx_state prev_state = exception_enter();
> -
> -	if (user_mode(regs)) {
> -#ifdef CONFIG_PPC_SUBPAGE_PROT
> -		if (rc == -2)
> -			_exception(SIGSEGV, regs, SEGV_ACCERR, address);
> -		else
> -#endif
> -			_exception(SIGBUS, regs, BUS_ADRERR, address);
> -	} else
> -		bad_page_fault(regs, address, SIGBUS);
> -
> -	exception_exit(prev_state);
> -}
> -
>   long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
>   			   unsigned long pa, unsigned long rflags,
>   			   unsigned long vflags, int psize, int ssize)
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 8961b44f350c..77a3155c77b6 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -369,7 +369,9 @@ static void sanity_check_fault(bool is_write, bool is_user,
>   #define page_fault_is_bad(__err)	(0)
>   #elif defined(CONFIG_PPC_8xx)
>   #define page_fault_is_bad(__err)	((__err) & DSISR_NOEXEC_OR_G)
> -#elif defined(CONFIG_PPC64)
> +#elif defined(CONFIG_PPC_BOOK3S_64)
> +#define page_fault_is_bad(__err)	((__err) & (DSISR_BAD_FAULT_64S | DSISR_DABRMATCH))
> +#elif defined(CONFIG_PPC_BOOK3E_64)
>   #define page_fault_is_bad(__err)	((__err) & DSISR_BAD_FAULT_64S)
>   #else
>   #define page_fault_is_bad(__err)	((__err) & DSISR_BAD_FAULT_32S)
> @@ -404,6 +406,9 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>   		return 0;
>   
>   	if (unlikely(page_fault_is_bad(error_code))) {
> +		if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && (error_code & DSISR_DABRMATCH))
> +			return -1;
> +
>   		if (is_user) {
>   			_exception(SIGBUS, regs, BUS_OBJERR, address);
>   			return 0;
> @@ -545,20 +550,39 @@ NOKPROBE_SYMBOL(__do_page_fault);
>   int do_page_fault(struct pt_regs *regs, unsigned long address,
>   		  unsigned long error_code)
>   {
> -	const struct exception_table_entry *entry;
>   	enum ctx_state prev_state = exception_enter();
> -	int rc = __do_page_fault(regs, address, error_code);
> -	exception_exit(prev_state);
> -	if (likely(!rc))
> -		return 0;
> +	int err;
>   
> -	entry = search_exception_tables(regs->nip);
> -	if (unlikely(!entry))
> -		return rc;
> +	err = __do_page_fault(regs, address, error_code);
> +	if (unlikely(err)) {
> +		const struct exception_table_entry *entry;
>   
> -	instruction_pointer_set(regs, extable_fixup(entry));
> +		entry = search_exception_tables(regs->nip);
> +		if (likely(entry)) {
> +			instruction_pointer_set(regs, extable_fixup(entry));
> +			err = 0;
> +		}
> +	}
>   
> -	return 0;
> +#ifdef CONFIG_PPC_BOOK3S_64

Seems like you are re-implementing handle_page_fault() inside do_page_fault(). Wouldn't it be 
possible to keep do_page_fault() as is for the moment and implement a C version of handle_page_fault() ?

Or just keep it in assembly ? It is not that big, keeping it in assembly would keep things more 
common with PPC32, and would still allow to save NV GPRS only when needed.

> +	/* 32 and 64e handle these errors in asm */
> +	if (unlikely(err)) {

Really looks like a hack. I'd rather see do_break() dispatch being done as early as possible, same 
as we now do on book3s/32

> +		if (err > 0) {
> +			__bad_page_fault(regs, address, err);
> +			err = 0;
> +		} else {
> +			/*
> +			 * do_break() may change NV GPRS while handling the
> +			 * breakpoint. Return -ve to caller to do that.
> +			 */
> +			do_break(regs, address, error_code);
> +		}
> +	}
> +#endif
> +
> +	exception_exit(prev_state);
> +
> +	return err;
>   }
>   NOKPROBE_SYMBOL(do_page_fault);
>   
> 

Christophe

^ permalink raw reply

* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Nicolas Saenz Julienne @ 2021-01-13 13:59 UTC (permalink / raw)
  To: Florian Fainelli, Claire Chang, robh+dt, mpe, benh, paulus, joro,
	will, frowand.list, konrad.wilk, boris.ostrovsky, jgross,
	sstabellini, hch, m.szyprowski, robin.murphy
  Cc: devicetree, heikki.krogerus, grant.likely, saravanak, peterz,
	xypron.glpk, rafael.j.wysocki, linux-kernel, treding,
	bgolaszewski, iommu, drinkcat, rdunlap, gregkh, xen-devel,
	dan.j.williams, andriy.shevchenko, linuxppc-dev, mingo
In-Reply-To: <95ae9c1e-c1f1-5736-fe86-12ced1f648f9@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3191 bytes --]

Hi All,

On Tue, 2021-01-12 at 16:03 -0800, Florian Fainelli wrote:
> On 1/5/21 7:41 PM, Claire Chang wrote:
> > Add the initialization function to create restricted DMA pools from
> > matching reserved-memory nodes in the device tree.
> > 
> > Signed-off-by: Claire Chang <tientzu@chromium.org>
> > ---
> >  include/linux/device.h  |   4 ++
> >  include/linux/swiotlb.h |   7 +-
> >  kernel/dma/Kconfig      |   1 +
> >  kernel/dma/swiotlb.c    | 144 ++++++++++++++++++++++++++++++++++------
> >  4 files changed, 131 insertions(+), 25 deletions(-)
> > 
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index 89bb8b84173e..ca6f71ec8871 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -413,6 +413,7 @@ struct dev_links_info {
> >   * @dma_pools:	Dma pools (if dma'ble device).
> >   * @dma_mem:	Internal for coherent mem override.
> >   * @cma_area:	Contiguous memory area for dma allocations
> > + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
> >   * @archdata:	For arch-specific additions.
> >   * @of_node:	Associated device tree node.
> >   * @fwnode:	Associated device node supplied by platform firmware.
> > @@ -515,6 +516,9 @@ struct device {
> >  #ifdef CONFIG_DMA_CMA
> >  	struct cma *cma_area;		/* contiguous memory area for dma
> >  					   allocations */
> > +#endif
> > +#ifdef CONFIG_SWIOTLB
> > +	struct io_tlb_mem	*dma_io_tlb_mem;
> >  #endif
> >  	/* arch specific additions */
> >  	struct dev_archdata	archdata;
> > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > index dd8eb57cbb8f..a1bbd7788885 100644
> > --- a/include/linux/swiotlb.h
> > +++ b/include/linux/swiotlb.h
> > @@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
> >   *
> >   * @start:	The start address of the swiotlb memory pool. Used to do a quick
> >   *		range check to see if the memory was in fact allocated by this
> > - *		API.
> > + *		API. For restricted DMA pool, this is device tree adjustable.
> 
> Maybe write it as this is "firmware adjustable" such that when/if ACPI
> needs something like this, the description does not need updating.
> 
> [snip]
> 
> > +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> > +				    struct device *dev)
> > +{
> > +	struct io_tlb_mem *mem = rmem->priv;
> > +	int ret;
> > +
> > +	if (dev->dma_io_tlb_mem)
> > +		return -EBUSY;
> > +
> > +	if (!mem) {
> > +		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > +		if (!mem)
> > +			return -ENOMEM;
> > +
> > +		if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {
> 
> MEMREMAP_WB sounds appropriate as a default.

As per the binding 'no-map' has to be disabled here. So AFAIU, this memory will
be part of the linear mapping. Is this really needed then?

> Documentation/devicetree/bindings/reserved-memory/ramoops.txt does
> define an "unbuffered" property which in premise could be applied to the
> generic reserved memory binding as well and that we may have to be
> honoring here, if we were to make it more generic. Oh well, this does
> not need to be addressed right now I guess.




[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [RFC PATCH v3 4/6] swiotlb: Add restricted DMA alloc/free support.
From: Christoph Hellwig @ 2021-01-13 12:48 UTC (permalink / raw)
  To: Claire Chang
  Cc: heikki.krogerus, peterz, grant.likely, paulus, frowand.list,
	mingo, m.szyprowski, sstabellini, saravanak, joro,
	rafael.j.wysocki, hch, bgolaszewski, xen-devel, treding,
	devicetree, will, konrad.wilk, dan.j.williams, linuxppc-dev,
	robh+dt, boris.ostrovsky, andriy.shevchenko, jgross, drinkcat,
	gregkh, rdunlap, linux-kernel, tfiga, iommu, xypron.glpk,
	robin.murphy, bauerman
In-Reply-To: <20210106034124.30560-5-tientzu@chromium.org>

> +#ifdef CONFIG_SWIOTLB
> +	if (unlikely(dev->dma_io_tlb_mem))
> +		return swiotlb_alloc(dev, size, dma_handle, attrs);
> +#endif

Another place where the dma_io_tlb_mem is useful to avoid the ifdef.

> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
> -		size_t mapping_size, size_t alloc_size,
> -		enum dma_data_direction dir, unsigned long attrs)
> +static int swiotlb_tbl_find_free_region(struct device *hwdev,
> +					dma_addr_t tbl_dma_addr,
> +					size_t alloc_size,
> +					unsigned long attrs)

> +static void swiotlb_tbl_release_region(struct device *hwdev, int index,
> +				       size_t size)

This refactoring should be another prep patch.


> +void *swiotlb_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
> +		    unsigned long attrs)

I'd rather have the names convey there are for the per-device bounce
buffer in some form.

> +	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

While we're at it I wonder if the io_tlb is something we could change
while we're at it.  Maybe replace io_tlb_mem with struct swiotlb
and rename the field in struct device to dev_swiotlb?

> +	int index;
> +	void *vaddr;
> +	phys_addr_t tlb_addr;
> +
> +	size = PAGE_ALIGN(size);
> +	index = swiotlb_tbl_find_free_region(dev, mem->start, size, attrs);
> +	if (index < 0)
> +		return NULL;
> +
> +	tlb_addr = mem->start + (index << IO_TLB_SHIFT);
> +	*dma_handle = phys_to_dma_unencrypted(dev, tlb_addr);
> +
> +	if (!dev_is_dma_coherent(dev)) {
> +		unsigned long pfn = PFN_DOWN(tlb_addr);
> +
> +		/* remove any dirty cache lines on the kernel alias */
> +		arch_dma_prep_coherent(pfn_to_page(pfn), size);

Can we hook in somewhat lower level in the dma-direct code so that all
the remapping in dma-direct can be reused instead of duplicated?  That
also becomes important if we want to use non-remapping uncached support,
e.g. on mips or x86, or the direct changing of the attributes that Will
planned to look into for arm64.

^ permalink raw reply

* Re: [RFC PATCH v3 3/6] swiotlb: Use restricted DMA pool if available
From: Christoph Hellwig @ 2021-01-13 12:44 UTC (permalink / raw)
  To: Claire Chang
  Cc: heikki.krogerus, peterz, grant.likely, paulus, frowand.list,
	mingo, m.szyprowski, sstabellini, saravanak, joro,
	rafael.j.wysocki, hch, bgolaszewski, xen-devel, treding,
	devicetree, will, konrad.wilk, dan.j.williams, linuxppc-dev,
	robh+dt, boris.ostrovsky, andriy.shevchenko, jgross, drinkcat,
	gregkh, rdunlap, linux-kernel, tfiga, iommu, xypron.glpk,
	robin.murphy, bauerman
In-Reply-To: <20210106034124.30560-4-tientzu@chromium.org>

> +#ifdef CONFIG_SWIOTLB
> +	if (unlikely(swiotlb_force == SWIOTLB_FORCE) || dev->dma_io_tlb_mem)
>  		return swiotlb_map(dev, phys, size, dir, attrs);
> +#endif

Please provide a wrapper for the dev->dma_io_tlb_mem check that
always returns false if the per-device swiotlb support is not enabled.

> index 7fb2ac087d23..1f05af09e61a 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -222,7 +222,6 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
>  		mem->orig_addr[i] = INVALID_PHYS_ADDR;
>  	}
>  	mem->index = 0;
> -	no_iotlb_memory = false;

How does this fit in here?


^ permalink raw reply

* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Christoph Hellwig @ 2021-01-13 12:42 UTC (permalink / raw)
  To: Claire Chang
  Cc: heikki.krogerus, peterz, grant.likely, paulus, frowand.list,
	mingo, m.szyprowski, sstabellini, saravanak, joro,
	rafael.j.wysocki, hch, bgolaszewski, xen-devel, treding,
	devicetree, will, konrad.wilk, dan.j.williams, linuxppc-dev,
	robh+dt, boris.ostrovsky, andriy.shevchenko, jgross, drinkcat,
	gregkh, rdunlap, linux-kernel, tfiga, iommu, xypron.glpk,
	robin.murphy, bauerman
In-Reply-To: <20210106034124.30560-3-tientzu@chromium.org>

> +#ifdef CONFIG_SWIOTLB
> +	struct io_tlb_mem	*dma_io_tlb_mem;
>  #endif

Please add a new config option for this code instead of always building
it when swiotlb is enabled.

> +static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> +				   size_t size)

Can you split the refactoring in swiotlb.c into one or more prep
patches?

> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> +				    struct device *dev)
> +{
> +	struct io_tlb_mem *mem = rmem->priv;
> +	int ret;
> +
> +	if (dev->dma_io_tlb_mem)
> +		return -EBUSY;
> +
> +	if (!mem) {
> +		mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> +		if (!mem)
> +			return -ENOMEM;

What is the calling convention here that allows for a NULL and non-NULL
private data?

^ permalink raw reply

* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Christoph Hellwig @ 2021-01-13 12:37 UTC (permalink / raw)
  To: Greg KH
  Cc: heikki.krogerus, peterz, grant.likely, paulus, frowand.list,
	mingo, m.szyprowski, sstabellini, saravanak, xypron.glpk, joro,
	rafael.j.wysocki, Christoph Hellwig, bgolaszewski, xen-devel,
	treding, devicetree, will, konrad.wilk, dan.j.williams, robh+dt,
	Claire Chang, boris.ostrovsky, andriy.shevchenko, jgross,
	drinkcat, linuxppc-dev, rdunlap, linux-kernel, tfiga, iommu,
	robin.murphy, bauerman
In-Reply-To: <X/7nkb/YDpKlakRO@kroah.com>

On Wed, Jan 13, 2021 at 01:29:05PM +0100, Greg KH wrote:
> > > Why does this have to be added here?  Shouldn't the platform-specific
> > > code handle it instead?
> > 
> > The whole code added here is pretty generic.  What we need to eventually
> > do, though is to add a separate dma_device instead of adding more and more
> > bloat to struct device.
> 
> I have no objections for that happening!

I'm pretty sure you agreed to it before in fact.  Now someone just needs
to find the time to do this heavy lifting, where "someone" probably means
me.

^ permalink raw reply

* Re: SPI not working on 5.10 and 5.11, bisected to 766c6b63aa04 ("spi: fix client driver breakages when using GPIO descriptors")
From: Mark Brown @ 2021-01-13 12:33 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linuxppc-dev@ozlabs.org, Sven Van Asbroeck, Linus Walleij,
	linux-kernel@vger.kernel.org, linux-spi
In-Reply-To: <dc5d8d35-31aa-b36d-72b0-17c8a7c13061@csgroup.eu>

[-- Attachment #1: Type: text/plain, Size: 445 bytes --]

On Wed, Jan 13, 2021 at 09:49:12AM +0100, Christophe Leroy wrote:

> With commit 766c6b63aa04 ("spi: fix client driver breakages when using GPIO
> descriptors") reverted, it is back to work:

...

> What shall I do ?

I would guess that there's an error with the chip select polarity
configuration on your system that just happened to work previously, I'd
suggest fixing this in the board configuration to bring it in line with
everything else.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Greg KH @ 2021-01-13 12:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: heikki.krogerus, peterz, grant.likely, paulus, frowand.list,
	mingo, m.szyprowski, sstabellini, saravanak, xypron.glpk, joro,
	rafael.j.wysocki, bgolaszewski, xen-devel, treding, devicetree,
	will, konrad.wilk, dan.j.williams, robh+dt, Claire Chang,
	boris.ostrovsky, andriy.shevchenko, jgross, drinkcat,
	linuxppc-dev, rdunlap, linux-kernel, tfiga, iommu, robin.murphy,
	bauerman
In-Reply-To: <20210113115126.GB29376@lst.de>

On Wed, Jan 13, 2021 at 12:51:26PM +0100, Christoph Hellwig wrote:
> On Wed, Jan 06, 2021 at 08:50:03AM +0100, Greg KH wrote:
> > > --- a/include/linux/device.h
> > > +++ b/include/linux/device.h
> > > @@ -413,6 +413,7 @@ struct dev_links_info {
> > >   * @dma_pools:	Dma pools (if dma'ble device).
> > >   * @dma_mem:	Internal for coherent mem override.
> > >   * @cma_area:	Contiguous memory area for dma allocations
> > > + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
> > 
> > Why does this have to be added here?  Shouldn't the platform-specific
> > code handle it instead?
> 
> The whole code added here is pretty generic.  What we need to eventually
> do, though is to add a separate dma_device instead of adding more and more
> bloat to struct device.

I have no objections for that happening!

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox