LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 5/7] net: ethernet: ibm: ibmvnic: Fix some kernel-doc misdemeanours
From: Lee Jones @ 2021-01-13 16:41 UTC (permalink / raw)
  To: lee.jones
  Cc: Thomas Falcon, John Allen, linux-kernel, Santiago Leon,
	Jakub Kicinski, netdev, Lijun Pan, Dany Madden, Paul Mackerras,
	Sukadev Bhattiprolu, linuxppc-dev, David S. Miller
In-Reply-To: <20210113164123.1334116-1-lee.jones@linaro.org>

Fixes the following W=1 kernel build warning(s):

 from drivers/net/ethernet/ibm/ibmvnic.c:35:
 inlined from ‘handle_vpd_rsp’ at drivers/net/ethernet/ibm/ibmvnic.c:4124:3:
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_field' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'skb' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_len' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1362: warning: Function parameter or member 'hdr_data' not described in 'build_hdr_data'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_field' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_data' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'len' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'hdr_len' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1423: warning: Function parameter or member 'scrq_arr' not described in 'create_hdr_descs'
 drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'txbuff' not described in 'build_hdr_descs_arr'
 drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'num_entries' not described in 'build_hdr_descs_arr'
 drivers/net/ethernet/ibm/ibmvnic.c:1474: warning: Function parameter or member 'hdr_field' not described in 'build_hdr_descs_arr'
 drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'adapter' not described in 'do_change_param_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'rwi' not described in 'do_change_param_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1832: warning: Function parameter or member 'reset_state' not described in 'do_change_param_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'adapter' not described in 'do_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'rwi' not described in 'do_reset'
 drivers/net/ethernet/ibm/ibmvnic.c:1911: warning: Function parameter or member 'reset_state' not described in 'do_reset'

Cc: Dany Madden <drt@linux.ibm.com>
Cc: Lijun Pan <ljp@linux.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Santiago Leon <santi_leon@yahoo.com>
Cc: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 27 +++++++++++++--------------
 drivers/net/xen-netfront.c         |  6 +++---
 2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index aed985e08e8ad..4c4252e68de5a 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1384,10 +1384,10 @@ static int ibmvnic_close(struct net_device *netdev)
 
 /**
  * build_hdr_data - creates L2/L3/L4 header data buffer
- * @hdr_field - bitfield determining needed headers
- * @skb - socket buffer
- * @hdr_len - array of header lengths
- * @tot_len - total length of data
+ * @hdr_field: bitfield determining needed headers
+ * @skb: socket buffer
+ * @hdr_len: array of header lengths
+ * @hdr_data: buffer to write the header to
  *
  * Reads hdr_field to determine which headers are needed by firmware.
  * Builds a buffer containing these headers.  Saves individual header
@@ -1444,11 +1444,11 @@ static int build_hdr_data(u8 hdr_field, struct sk_buff *skb,
 
 /**
  * create_hdr_descs - create header and header extension descriptors
- * @hdr_field - bitfield determining needed headers
- * @data - buffer containing header data
- * @len - length of data buffer
- * @hdr_len - array of individual header lengths
- * @scrq_arr - descriptor array
+ * @hdr_field: bitfield determining needed headers
+ * @hdr_data: buffer containing header data
+ * @len: length of data buffer
+ * @hdr_len: array of individual header lengths
+ * @scrq_arr: descriptor array
  *
  * Creates header and, if needed, header extension descriptors and
  * places them in a descriptor array, scrq_arr
@@ -1496,10 +1496,9 @@ static int create_hdr_descs(u8 hdr_field, u8 *hdr_data, int len, int *hdr_len,
 
 /**
  * build_hdr_descs_arr - build a header descriptor array
- * @skb - socket buffer
- * @num_entries - number of descriptors to be sent
- * @subcrq - first TX descriptor
- * @hdr_field - bit field determining which headers will be sent
+ * @txbuff: tx buffer
+ * @num_entries: number of descriptors to be sent
+ * @hdr_field: bit field determining which headers will be sent
  *
  * This function will build a TX descriptor array with applicable
  * L2/L3/L4 packet header descriptors to be sent by send_subcrq_indirect.
@@ -1925,7 +1924,7 @@ static int ibmvnic_set_mac(struct net_device *netdev, void *p)
 	return rc;
 }
 
-/**
+/*
  * do_reset returns zero if we are able to keep processing reset events, or
  * non-zero if we hit a fatal error and must halt.
  */
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index c20b78120bb42..6ef2adbd283a3 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1580,7 +1580,7 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	return ERR_PTR(err);
 }
 
-/**
+/*
  * Entry point to this code when a new device is created.  Allocate the basic
  * structures and the ring buffers for communication with the backend, and
  * inform the backend of the appropriate details for those.
@@ -1657,7 +1657,7 @@ static void xennet_disconnect_backend(struct netfront_info *info)
 	}
 }
 
-/**
+/*
  * We are reconnecting to the backend, due to a suspend/resume, or a backend
  * driver restart.  We tear down our netif structure and recreate it, but
  * leave the device-layer structures intact so that this is transparent to the
@@ -2303,7 +2303,7 @@ static int xennet_connect(struct net_device *dev)
 	return 0;
 }
 
-/**
+/*
  * Callback received when the backend's state changes.
  */
 static void netback_changed(struct xenbus_device *dev,
-- 
2.25.1


^ permalink raw reply related

* [PATCH v2 0/7] Rid W=1 warnings in Ethernet
From: Lee Jones @ 2021-01-13 16:41 UTC (permalink / raw)
  To: lee.jones
  Cc: Paul Durrant, Kurt Kanzenbach, Alexei Starovoitov,
	Gustavo A. R. Silva, Peter Cammaert, Paul Mackerras,
	Sukadev Bhattiprolu, Wei Liu, Daniel Borkmann, John Fastabend,
	Santiago Leon, Jakub Kicinski, Grygorii Strashko, Thomas Falcon,
	Jesper Dangaard Brouer, Jens Osterkamp, Rusty Russell,
	Daris A Nevil, Lijun Pan, xen-devel, Ivan Khoronzhuk,
	Nicolas Pitre, Geoff Levand, netdev, linux-kernel, Erik Stahlman,
	John Allen, Utz Bacher, Dany Madden, bpf, linuxppc-dev,
	David S. Miller, Russell King

Resending the stragglers again.                                                                                  

This set is part of a larger effort attempting to clean-up W=1                                                   
kernel builds, which are currently overwhelmingly riddled with                                                   
niggly little warnings.                                                                                          
                                                                                                                 
v2:                                                                                                              
 - Squashed IBM patches                                                                                      
 - Fixed real issue in SMSC
 - Added Andrew's Reviewed-by tags on remainder

Lee Jones (7):
  net: ethernet: smsc: smc91x: Fix function name in kernel-doc header
  net: xen-netback: xenbus: Demote nonconformant kernel-doc headers
  net: ethernet: ti: am65-cpsw-qos: Demote non-conformant function
    header
  net: ethernet: ti: am65-cpts: Document am65_cpts_rx_enable()'s 'en'
    parameter
  net: ethernet: ibm: ibmvnic: Fix some kernel-doc misdemeanours
  net: ethernet: toshiba: ps3_gelic_net: Fix some kernel-doc
    misdemeanours
  net: ethernet: toshiba: spider_net: Document a whole bunch of function
    parameters

 drivers/net/ethernet/ibm/ibmvnic.c           | 27 ++++++++++----------
 drivers/net/ethernet/smsc/smc91x.c           |  2 +-
 drivers/net/ethernet/ti/am65-cpsw-qos.c      |  2 +-
 drivers/net/ethernet/ti/am65-cpts.c          |  2 +-
 drivers/net/ethernet/toshiba/ps3_gelic_net.c |  8 +++---
 drivers/net/ethernet/toshiba/spider_net.c    | 18 ++++++++-----
 drivers/net/xen-netback/xenbus.c             |  4 +--
 drivers/net/xen-netfront.c                   |  6 ++---
 8 files changed, 37 insertions(+), 32 deletions(-)

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: bpf@vger.kernel.org
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dany Madden <drt@linux.ibm.com>
Cc: Daris A Nevil <dnevil@snmc.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Erik Stahlman <erik@vt.edu>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Ishizaki Kou <kou.ishizaki@toshiba.co.jp>
Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kurt Kanzenbach <kurt@linutronix.de>
Cc: Lijun Pan <ljp@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: netdev@vger.kernel.org
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: Paul Durrant <paul@xen.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Cammaert <pc@denkart.be>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Santiago Leon <santi_leon@yahoo.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Cc: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Cc: Utz Bacher <utz.bacher@de.ibm.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: xen-devel@lists.xenproject.org
-- 
2.25.1


^ permalink raw reply

* [PATCH 6/7] net: ethernet: toshiba: ps3_gelic_net: Fix some kernel-doc misdemeanours
From: Lee Jones @ 2021-01-13 16:41 UTC (permalink / raw)
  To: lee.jones
  Cc: Andrew Lunn, Geoff Levand, linux-kernel, Jens Osterkamp, netdev,
	Paul Mackerras, Utz Bacher, Jakub Kicinski, linuxppc-dev,
	David S. Miller
In-Reply-To: <20210113164123.1334116-1-lee.jones@linaro.org>

Fixes the following W=1 kernel build warning(s):

 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1107: warning: Function parameter or member 'irq' not described in 'gelic_card_interrupt'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1107: warning: Function parameter or member 'ptr' not described in 'gelic_card_interrupt'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1407: warning: Function parameter or member 'txqueue' not described in 'gelic_net_tx_timeout'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1439: warning: Function parameter or member 'napi' not described in 'gelic_ether_setup_netdev_ops'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1639: warning: Function parameter or member 'dev' not described in 'ps3_gelic_driver_probe'
 drivers/net/ethernet/toshiba/ps3_gelic_net.c:1795: warning: Function parameter or member 'dev' not described in 'ps3_gelic_driver_remove'

Cc: Geoff Levand <geoff@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Utz Bacher <utz.bacher@de.ibm.com>
Cc: Jens Osterkamp <Jens.Osterkamp@de.ibm.com>
Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
---
 drivers/net/ethernet/toshiba/ps3_gelic_net.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
index 3d1fc8d2ca667..55e652624bd76 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
@@ -1100,7 +1100,7 @@ static int gelic_net_poll(struct napi_struct *napi, int budget)
 	return packets_done;
 }
 
-/**
+/*
  * gelic_card_interrupt - event handler for gelic_net
  */
 static irqreturn_t gelic_card_interrupt(int irq, void *ptr)
@@ -1400,6 +1400,7 @@ static void gelic_net_tx_timeout_task(struct work_struct *work)
 /**
  * gelic_net_tx_timeout - called when the tx timeout watchdog kicks in.
  * @netdev: interface device structure
+ * @txqueue: unused
  *
  * called, if tx hangs. Schedules a task that resets the interface
  */
@@ -1431,6 +1432,7 @@ static const struct net_device_ops gelic_netdevice_ops = {
 /**
  * gelic_ether_setup_netdev_ops - initialization of net_device operations
  * @netdev: net_device structure
+ * @napi: napi structure
  *
  * fills out function pointers in the net_device structure
  */
@@ -1632,7 +1634,7 @@ static void gelic_card_get_vlan_info(struct gelic_card *card)
 	dev_info(ctodev(card), "internal vlan %s\n",
 		 card->vlan_required? "enabled" : "disabled");
 }
-/**
+/*
  * ps3_gelic_driver_probe - add a device to the control of this driver
  */
 static int ps3_gelic_driver_probe(struct ps3_system_bus_device *dev)
@@ -1787,7 +1789,7 @@ static int ps3_gelic_driver_probe(struct ps3_system_bus_device *dev)
 	return result;
 }
 
-/**
+/*
  * ps3_gelic_driver_remove - remove a device from the control of this driver
  */
 
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH for 4.9/4.14] powerpc: Fix incorrect stw{, ux, u, x} instructions in __set_pte_at
From: Sasha Levin @ 2021-01-13 16:48 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev, linux-kernel, stable
In-Reply-To: <af07f8f0bb734bc1f906c1f79219f81c13c6ad2c.1610521804.git.christophe.leroy@csgroup.eu>

On Wed, Jan 13, 2021 at 07:12:44AM +0000, Christophe Leroy wrote:
>From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>
>Backport for 4.9 and 4.14
>
>(cherry picked from commit d85be8a49e733dcd23674aa6202870d54bf5600d)
>
>The placeholder for instruction selection should use the second
>argument's operand, which is %1, not %0. This could generate incorrect
>assembly code if the memory addressing of operand %0 is a different
>form from that of operand %1.
>
>Also remove the %Un placeholder because having %Un placeholders
>for two operands which are based on the same local var (ptep) doesn't
>make much sense. By the way, it doesn't change the current behaviour
>because "<>" constraint is missing for the associated "=m".
>
>[chleroy: revised commit log iaw segher's comments and removed %U0]
>
>Fixes: 9bf2b5cdc5fe ("powerpc: Fixes for CONFIG_PTE_64BIT for SMP support")
>Cc: <stable@vger.kernel.org> # v2.6.28+
>Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>Acked-by: Segher Boessenkool <segher@kernel.crashing.org>
>Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>Link: https://lore.kernel.org/r/96354bd77977a6a933fe9020da57629007fdb920.1603358942.git.christophe.leroy@csgroup.eu

I took this and the 4.4 backport, thanks!

-- 
Thanks,
Sasha

^ permalink raw reply

* Re: [PATCH for 5.10] powerpc/32s: Fix RTAS machine check with VMAP stack
From: Sasha Levin @ 2021-01-13 16:50 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev, linux-kernel, stable
In-Reply-To: <790e46767a5f10ae991969b064682c8c82f96bc3.1610519852.git.christophe.leroy@csgroup.eu>

On Wed, Jan 13, 2021 at 06:40:20AM +0000, Christophe Leroy wrote:
>This is backport for 5.10
>
>(cherry picked from commit 98bf2d3f4970179c702ef64db658e0553bc6ef3a)
>
>When we have VMAP stack, exception prolog 1 sets r1, not r11.
>
>When it is not an RTAS machine check, don't trash r1 because it is
>needed by prolog 1.
>
>Fixes: da7bb43ab9da ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU")
>Cc: stable@vger.kernel.org # v5.10+
>Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>[mpe: Squash in fixup for RTAS machine check from Christophe]
>Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>Link: https://lore.kernel.org/r/bc77d61d1c18940e456a2dee464f1e2eda65a3f0.1608621048.git.christophe.leroy@csgroup.eu

Queued up, thanks!

-- 
Thanks,
Sasha

^ permalink raw reply

* Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
From: Brian King @ 2021-01-13 17:13 UTC (permalink / raw)
  To: Tyrel Datwyler, james.bottomley
  Cc: martin.petersen, linux-scsi, james.smart, linux-kernel, brking,
	Ming Lei, linuxppc-dev
In-Reply-To: <a8623705-6d49-2056-09bb-80190e0b6f52@linux.ibm.com>

On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> On 1/12/21 2:54 PM, Brian King wrote:
>> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
>>> Introduce several new vhost fields for managing MQ state of the adapter
>>> as well as initial defaults for MQ enablement.
>>>
>>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
>>> ---
>>>  drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
>>>  drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
>>>  2 files changed, 17 insertions(+)
>>>
>>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
>>> index ba95438a8912..9200fe49c57e 100644
>>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
>>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
>>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
>>>  	.max_sectors = IBMVFC_MAX_SECTORS,
>>>  	.shost_attrs = ibmvfc_attrs,
>>>  	.track_queue_depth = 1,
>>> +	.host_tagset = 1,
>>
>> This doesn't seem right. You are setting host_tagset, which means you want a
>> shared, host wide, tag set for commands. It also means that the total
>> queue depth for the host is can_queue. However, it looks like you are allocating
>> max_requests events for each sub crq, which means you are over allocating memory.
> 
> With the shared tagset yes the queue depth for the host is can_queue, but this
> also implies that the max queue depth for each hw queue is also can_queue. So,
> in the worst case that all commands are queued down the same hw queue we need an
> event pool with can_queue commands.
> 
>>
>> Looking at this closer, we might have bigger problems. There is a host wide
>> max number of commands that the VFC host supports, which gets returned on
>> NPIV Login. This value can change across a live migration event.
> 
> From what I understand the max commands can only become less.
> 
>>
>> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
>> can_queue on the scsi_host *after* the tag set has been allocated. This looks
>> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
>> we look at can_queue once the tag set is setup, and I'm not seeing a good way
>> to dynamically change the host queue depth once the tag set is setup. 
>>
>> Unless I'm missing something, our best options appear to either be to implement
>> our own host wide busy reference counting, which doesn't sound very good, or
>> we need to add some API to block / scsi that allows us to dynamically change
>> can_queue.
> 
> Changing can_queue won't do use any good with the shared tagset becasue each
> queue still needs to be able to queue can_queue number of commands in the worst
> case.

The issue I'm trying to highlight here is the following scenario:

1. We set shost->can_queue, then call scsi_add_host, which allocates the tag set.

2. On our NPIV login response from the VIOS, we might get a lower value than we
initially set in shost->can_queue, so we update it, but nobody ever looks at it
again, and we don't have any protection against sending too many commands to the host.


Basically, we no longer have any code that ensures we don't send more
commands to the VIOS than we are told it supports. According to the architecture,
if we actually do this, the VIOS will do an h_free_crq, which would be a bit
of a bug on our part.

I don't think it was ever clearly defined in the API that a driver can
change shost->can_queue after calling scsi_add_host, but up until
commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
it doesn't. 

I started looking through drivers that do this, and so far, it looks like the
following drivers do: ibmvfc, lpfc, aix94xx, libfc, BusLogic, and likely others...

We probably need an API that lets us change shost->can_queue dynamically.

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center


^ permalink raw reply

* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Florian Fainelli @ 2021-01-13 17:43 UTC (permalink / raw)
  To: Robin Murphy, Nicolas Saenz Julienne, Claire Chang, robh+dt, mpe,
	benh, paulus, joro, will, frowand.list, konrad.wilk,
	boris.ostrovsky, jgross, sstabellini, hch, m.szyprowski
  Cc: devicetree, heikki.krogerus, grant.likely, saravanak, peterz,
	xypron.glpk, rafael.j.wysocki, linux-kernel, treding,
	bgolaszewski, iommu, drinkcat, rdunlap, gregkh, xen-devel,
	dan.j.williams, andriy.shevchenko, linuxppc-dev, mingo
In-Reply-To: <4c4989b5-f825-7e04-ca66-038cf6b9d5e9@arm.com>

On 1/13/21 7:27 AM, Robin Murphy wrote:
> On 2021-01-13 13:59, Nicolas Saenz Julienne wrote:
>> Hi All,
>>
>> On Tue, 2021-01-12 at 16:03 -0800, Florian Fainelli wrote:
>>> On 1/5/21 7:41 PM, Claire Chang wrote:
>>>> Add the initialization function to create restricted DMA pools from
>>>> matching reserved-memory nodes in the device tree.
>>>>
>>>> Signed-off-by: Claire Chang <tientzu@chromium.org>
>>>> ---
>>>>   include/linux/device.h  |   4 ++
>>>>   include/linux/swiotlb.h |   7 +-
>>>>   kernel/dma/Kconfig      |   1 +
>>>>   kernel/dma/swiotlb.c    | 144
>>>> ++++++++++++++++++++++++++++++++++------
>>>>   4 files changed, 131 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>>> index 89bb8b84173e..ca6f71ec8871 100644
>>>> --- a/include/linux/device.h
>>>> +++ b/include/linux/device.h
>>>> @@ -413,6 +413,7 @@ struct dev_links_info {
>>>>    * @dma_pools:    Dma pools (if dma'ble device).
>>>>    * @dma_mem:    Internal for coherent mem override.
>>>>    * @cma_area:    Contiguous memory area for dma allocations
>>>> + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
>>>>    * @archdata:    For arch-specific additions.
>>>>    * @of_node:    Associated device tree node.
>>>>    * @fwnode:    Associated device node supplied by platform firmware.
>>>> @@ -515,6 +516,9 @@ struct device {
>>>>   #ifdef CONFIG_DMA_CMA
>>>>       struct cma *cma_area;        /* contiguous memory area for dma
>>>>                          allocations */
>>>> +#endif
>>>> +#ifdef CONFIG_SWIOTLB
>>>> +    struct io_tlb_mem    *dma_io_tlb_mem;
>>>>   #endif
>>>>       /* arch specific additions */
>>>>       struct dev_archdata    archdata;
>>>> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>>>> index dd8eb57cbb8f..a1bbd7788885 100644
>>>> --- a/include/linux/swiotlb.h
>>>> +++ b/include/linux/swiotlb.h
>>>> @@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
>>>>    *
>>>>    * @start:    The start address of the swiotlb memory pool. Used
>>>> to do a quick
>>>>    *        range check to see if the memory was in fact allocated
>>>> by this
>>>> - *        API.
>>>> + *        API. For restricted DMA pool, this is device tree
>>>> adjustable.
>>>
>>> Maybe write it as this is "firmware adjustable" such that when/if ACPI
>>> needs something like this, the description does not need updating.
> 
> TBH I really don't think this needs calling out at all. Even in the
> regular case, the details of exactly how and where the pool is allocated
> are beyond the scope of this code - architectures already have several
> ways to control that and make their own decisions.
> 
>>>
>>> [snip]
>>>
>>>> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>>>> +                    struct device *dev)
>>>> +{
>>>> +    struct io_tlb_mem *mem = rmem->priv;
>>>> +    int ret;
>>>> +
>>>> +    if (dev->dma_io_tlb_mem)
>>>> +        return -EBUSY;
>>>> +
>>>> +    if (!mem) {
>>>> +        mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>>> +        if (!mem)
>>>> +            return -ENOMEM;
>>>> +
>>>> +        if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {
>>>
>>> MEMREMAP_WB sounds appropriate as a default.
>>
>> As per the binding 'no-map' has to be disabled here. So AFAIU, this
>> memory will
>> be part of the linear mapping. Is this really needed then?
> 
> More than that, I'd assume that we *have* to use the linear/direct map
> address rather than anything that has any possibility of being a vmalloc
> remap, otherwise we can no longer safely rely on
> phys_to_dma/dma_to_phys, no?

I believe you are right, which means that if we want to make use of the
restricted DMA pool on a 32-bit architecture (and we do, at least, I do)
we should probably add some error checking/warning to ensure the
restricted DMA pool falls within the linear map.
-- 
Florian

^ permalink raw reply

* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Robin Murphy @ 2021-01-13 18:03 UTC (permalink / raw)
  To: Florian Fainelli, Nicolas Saenz Julienne, Claire Chang, robh+dt,
	mpe, benh, paulus, joro, will, frowand.list, konrad.wilk,
	boris.ostrovsky, jgross, sstabellini, hch, m.szyprowski
  Cc: devicetree, heikki.krogerus, grant.likely, saravanak, peterz,
	xypron.glpk, rafael.j.wysocki, linux-kernel, treding,
	bgolaszewski, iommu, drinkcat, rdunlap, gregkh, xen-devel,
	dan.j.williams, andriy.shevchenko, linuxppc-dev, mingo
In-Reply-To: <9b4fe35f-a880-fcea-0591-b65406abbfa8@gmail.com>

On 2021-01-13 17:43, Florian Fainelli wrote:
> On 1/13/21 7:27 AM, Robin Murphy wrote:
>> On 2021-01-13 13:59, Nicolas Saenz Julienne wrote:
>>> Hi All,
>>>
>>> On Tue, 2021-01-12 at 16:03 -0800, Florian Fainelli wrote:
>>>> On 1/5/21 7:41 PM, Claire Chang wrote:
>>>>> Add the initialization function to create restricted DMA pools from
>>>>> matching reserved-memory nodes in the device tree.
>>>>>
>>>>> Signed-off-by: Claire Chang <tientzu@chromium.org>
>>>>> ---
>>>>>    include/linux/device.h  |   4 ++
>>>>>    include/linux/swiotlb.h |   7 +-
>>>>>    kernel/dma/Kconfig      |   1 +
>>>>>    kernel/dma/swiotlb.c    | 144
>>>>> ++++++++++++++++++++++++++++++++++------
>>>>>    4 files changed, 131 insertions(+), 25 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>>>> index 89bb8b84173e..ca6f71ec8871 100644
>>>>> --- a/include/linux/device.h
>>>>> +++ b/include/linux/device.h
>>>>> @@ -413,6 +413,7 @@ struct dev_links_info {
>>>>>     * @dma_pools:    Dma pools (if dma'ble device).
>>>>>     * @dma_mem:    Internal for coherent mem override.
>>>>>     * @cma_area:    Contiguous memory area for dma allocations
>>>>> + * @dma_io_tlb_mem: Internal for swiotlb io_tlb_mem override.
>>>>>     * @archdata:    For arch-specific additions.
>>>>>     * @of_node:    Associated device tree node.
>>>>>     * @fwnode:    Associated device node supplied by platform firmware.
>>>>> @@ -515,6 +516,9 @@ struct device {
>>>>>    #ifdef CONFIG_DMA_CMA
>>>>>        struct cma *cma_area;        /* contiguous memory area for dma
>>>>>                           allocations */
>>>>> +#endif
>>>>> +#ifdef CONFIG_SWIOTLB
>>>>> +    struct io_tlb_mem    *dma_io_tlb_mem;
>>>>>    #endif
>>>>>        /* arch specific additions */
>>>>>        struct dev_archdata    archdata;
>>>>> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
>>>>> index dd8eb57cbb8f..a1bbd7788885 100644
>>>>> --- a/include/linux/swiotlb.h
>>>>> +++ b/include/linux/swiotlb.h
>>>>> @@ -76,12 +76,13 @@ extern enum swiotlb_force swiotlb_force;
>>>>>     *
>>>>>     * @start:    The start address of the swiotlb memory pool. Used
>>>>> to do a quick
>>>>>     *        range check to see if the memory was in fact allocated
>>>>> by this
>>>>> - *        API.
>>>>> + *        API. For restricted DMA pool, this is device tree
>>>>> adjustable.
>>>>
>>>> Maybe write it as this is "firmware adjustable" such that when/if ACPI
>>>> needs something like this, the description does not need updating.
>>
>> TBH I really don't think this needs calling out at all. Even in the
>> regular case, the details of exactly how and where the pool is allocated
>> are beyond the scope of this code - architectures already have several
>> ways to control that and make their own decisions.
>>
>>>>
>>>> [snip]
>>>>
>>>>> +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
>>>>> +                    struct device *dev)
>>>>> +{
>>>>> +    struct io_tlb_mem *mem = rmem->priv;
>>>>> +    int ret;
>>>>> +
>>>>> +    if (dev->dma_io_tlb_mem)
>>>>> +        return -EBUSY;
>>>>> +
>>>>> +    if (!mem) {
>>>>> +        mem = kzalloc(sizeof(*mem), GFP_KERNEL);
>>>>> +        if (!mem)
>>>>> +            return -ENOMEM;
>>>>> +
>>>>> +        if (!memremap(rmem->base, rmem->size, MEMREMAP_WB)) {
>>>>
>>>> MEMREMAP_WB sounds appropriate as a default.
>>>
>>> As per the binding 'no-map' has to be disabled here. So AFAIU, this
>>> memory will
>>> be part of the linear mapping. Is this really needed then?
>>
>> More than that, I'd assume that we *have* to use the linear/direct map
>> address rather than anything that has any possibility of being a vmalloc
>> remap, otherwise we can no longer safely rely on
>> phys_to_dma/dma_to_phys, no?
> 
> I believe you are right, which means that if we want to make use of the
> restricted DMA pool on a 32-bit architecture (and we do, at least, I do)
> we should probably add some error checking/warning to ensure the
> restricted DMA pool falls within the linear map.

Oh, good point - I'm so used to 64-bit that I instinctively just blanked 
out the !PageHighMem() condition in try_ram_remap(). So maybe the 
original intent here *was* to effectively just implement that check, but 
if so it could still do with being a lot more explicit.

Cheers,
Robin.

^ permalink raw reply

* Re: [RFC PATCH v3 4/6] swiotlb: Add restricted DMA alloc/free support.
From: Robin Murphy @ 2021-01-13 18:27 UTC (permalink / raw)
  To: Christoph Hellwig, Claire Chang
  Cc: heikki.krogerus, peterz, grant.likely, paulus, frowand.list,
	mingo, m.szyprowski, sstabellini, saravanak, joro,
	rafael.j.wysocki, bgolaszewski, xen-devel, treding, devicetree,
	will, konrad.wilk, dan.j.williams, robh+dt, boris.ostrovsky,
	andriy.shevchenko, jgross, drinkcat, gregkh, rdunlap,
	linux-kernel, tfiga, iommu, xypron.glpk, linuxppc-dev, bauerman
In-Reply-To: <20210113124847.GC1383@lst.de>

On 2021-01-13 12:48, Christoph Hellwig wrote:
>> +#ifdef CONFIG_SWIOTLB
>> +	if (unlikely(dev->dma_io_tlb_mem))
>> +		return swiotlb_alloc(dev, size, dma_handle, attrs);
>> +#endif
> 
> Another place where the dma_io_tlb_mem is useful to avoid the ifdef.
> 
>> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
>> -		size_t mapping_size, size_t alloc_size,
>> -		enum dma_data_direction dir, unsigned long attrs)
>> +static int swiotlb_tbl_find_free_region(struct device *hwdev,
>> +					dma_addr_t tbl_dma_addr,
>> +					size_t alloc_size,
>> +					unsigned long attrs)
> 
>> +static void swiotlb_tbl_release_region(struct device *hwdev, int index,
>> +				       size_t size)
> 
> This refactoring should be another prep patch.
> 
> 
>> +void *swiotlb_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
>> +		    unsigned long attrs)
> 
> I'd rather have the names convey there are for the per-device bounce
> buffer in some form.
> 
>> +	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> 
> While we're at it I wonder if the io_tlb is something we could change
> while we're at it.  Maybe replace io_tlb_mem with struct swiotlb
> and rename the field in struct device to dev_swiotlb?
> 
>> +	int index;
>> +	void *vaddr;
>> +	phys_addr_t tlb_addr;
>> +
>> +	size = PAGE_ALIGN(size);
>> +	index = swiotlb_tbl_find_free_region(dev, mem->start, size, attrs);
>> +	if (index < 0)
>> +		return NULL;
>> +
>> +	tlb_addr = mem->start + (index << IO_TLB_SHIFT);
>> +	*dma_handle = phys_to_dma_unencrypted(dev, tlb_addr);
>> +
>> +	if (!dev_is_dma_coherent(dev)) {
>> +		unsigned long pfn = PFN_DOWN(tlb_addr);
>> +
>> +		/* remove any dirty cache lines on the kernel alias */
>> +		arch_dma_prep_coherent(pfn_to_page(pfn), size);
> 
> Can we hook in somewhat lower level in the dma-direct code so that all
> the remapping in dma-direct can be reused instead of duplicated?  That
> also becomes important if we want to use non-remapping uncached support,
> e.g. on mips or x86, or the direct changing of the attributes that Will
> planned to look into for arm64.

Indeed, AFAICS this ought to boil down to a direct equivalent of 
__dma_direct_alloc_pages() - other than the address there should be no 
conceptual difference between pages from the restricted pool and those 
from the regular page allocator, so this probably deserves to be plumbed 
in as an alternative to that.

Robin.

^ permalink raw reply

* Re: [RFC PATCH v3 4/6] swiotlb: Add restricted DMA alloc/free support.
From: Christoph Hellwig @ 2021-01-13 18:32 UTC (permalink / raw)
  To: Robin Murphy
  Cc: heikki.krogerus, peterz, grant.likely, paulus, frowand.list,
	mingo, m.szyprowski, sstabellini, saravanak, xypron.glpk, joro,
	rafael.j.wysocki, Christoph Hellwig, bgolaszewski, xen-devel,
	treding, devicetree, will, konrad.wilk, dan.j.williams, robh+dt,
	Claire Chang, boris.ostrovsky, andriy.shevchenko, jgross,
	drinkcat, gregkh, rdunlap, linux-kernel, tfiga, iommu,
	linuxppc-dev, bauerman
In-Reply-To: <82bb75bc-11e6-ac94-9d24-7c896e3aae98@arm.com>

On Wed, Jan 13, 2021 at 06:27:08PM +0000, Robin Murphy wrote:
>> Can we hook in somewhat lower level in the dma-direct code so that all
>> the remapping in dma-direct can be reused instead of duplicated?  That
>> also becomes important if we want to use non-remapping uncached support,
>> e.g. on mips or x86, or the direct changing of the attributes that Will
>> planned to look into for arm64.
>
> Indeed, AFAICS this ought to boil down to a direct equivalent of 
> __dma_direct_alloc_pages() - other than the address there should be no 
> conceptual difference between pages from the restricted pool and those from 
> the regular page allocator, so this probably deserves to be plumbed in as 
> an alternative to that.

Yes, that's what I mean.  You managed to word it much better, though.

^ permalink raw reply

* Re: [PATCH v4 21/21] ibmvfc: provide modules parameters for MQ settings
From: Brian King @ 2021-01-13 18:57 UTC (permalink / raw)
  To: Tyrel Datwyler, james.bottomley
  Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20210111231225.105347-22-tyreld@linux.ibm.com>

On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> @@ -5880,12 +5936,13 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>  
>  	shost->transportt = ibmvfc_transport_template;
>  	shost->can_queue = max_requests;
> +	shost->can_queue = (max_requests / nr_scsi_hw_queues);

This looks to be in conflict with the fact that the first patch requested a shared tag set, right?

>  	shost->max_lun = max_lun;
>  	shost->max_id = max_targets;
>  	shost->max_sectors = IBMVFC_MAX_SECTORS;
>  	shost->max_cmd_len = IBMVFC_MAX_CDB_LEN;
>  	shost->unique_id = shost->host_no;
> -	shost->nr_hw_queues = IBMVFC_MQ ? IBMVFC_SCSI_HW_QUEUES : 1;
> +	shost->nr_hw_queues = mq_enabled ? nr_scsi_hw_queues : 1;

You might want to range check this, to make sure its sane.
>  
>  	vhost = shost_priv(shost);
>  	INIT_LIST_HEAD(&vhost->targets);
> @@ -5897,8 +5954,8 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
>  	vhost->log_level = log_level;
>  	vhost->task_set = 1;
>  
> -	vhost->mq_enabled = IBMVFC_MQ;
> -	vhost->client_scsi_channels = IBMVFC_SCSI_CHANNELS;
> +	vhost->mq_enabled = mq_enabled;
> +	vhost->client_scsi_channels = min(nr_scsi_hw_queues, nr_scsi_channels);
>  	vhost->using_channels = 0;
>  	vhost->do_enquiry = 1;
>  
> 


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center


^ permalink raw reply

* Re: [PATCH v4 00/21] ibmvfc: initial MQ development
From: Brian King @ 2021-01-13 18:58 UTC (permalink / raw)
  To: Tyrel Datwyler, james.bottomley
  Cc: brking, linuxppc-dev, linux-scsi, martin.petersen, linux-kernel
In-Reply-To: <20210111231225.105347-1-tyreld@linux.ibm.com>

With the exception of the few comments I've shared, the rest of this looks
good to me and you can add my:

Reviewed-by: Brian King <brking@linux.vnet.ibm.com>

Thanks,

Brian

-- 
Brian King
Power Linux I/O
IBM Linux Technology Center


^ permalink raw reply

* Re: [PATCH v4 01/21] ibmvfc: add vhost fields and defaults for MQ enablement
From: Ming Lei @ 2021-01-14  1:27 UTC (permalink / raw)
  To: Brian King
  Cc: Tyrel Datwyler, martin.petersen, linux-scsi, linux-kernel,
	james.smart, james.bottomley, brking, linuxppc-dev
In-Reply-To: <51bfc34b-c2c4-bf14-c903-d37015f65361@linux.vnet.ibm.com>

On Wed, Jan 13, 2021 at 11:13:07AM -0600, Brian King wrote:
> On 1/12/21 6:33 PM, Tyrel Datwyler wrote:
> > On 1/12/21 2:54 PM, Brian King wrote:
> >> On 1/11/21 5:12 PM, Tyrel Datwyler wrote:
> >>> Introduce several new vhost fields for managing MQ state of the adapter
> >>> as well as initial defaults for MQ enablement.
> >>>
> >>> Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
> >>> ---
> >>>  drivers/scsi/ibmvscsi/ibmvfc.c | 8 ++++++++
> >>>  drivers/scsi/ibmvscsi/ibmvfc.h | 9 +++++++++
> >>>  2 files changed, 17 insertions(+)
> >>>
> >>> diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
> >>> index ba95438a8912..9200fe49c57e 100644
> >>> --- a/drivers/scsi/ibmvscsi/ibmvfc.c
> >>> +++ b/drivers/scsi/ibmvscsi/ibmvfc.c
> >>> @@ -3302,6 +3302,7 @@ static struct scsi_host_template driver_template = {
> >>>  	.max_sectors = IBMVFC_MAX_SECTORS,
> >>>  	.shost_attrs = ibmvfc_attrs,
> >>>  	.track_queue_depth = 1,
> >>> +	.host_tagset = 1,
> >>
> >> This doesn't seem right. You are setting host_tagset, which means you want a
> >> shared, host wide, tag set for commands. It also means that the total
> >> queue depth for the host is can_queue. However, it looks like you are allocating
> >> max_requests events for each sub crq, which means you are over allocating memory.
> > 
> > With the shared tagset yes the queue depth for the host is can_queue, but this
> > also implies that the max queue depth for each hw queue is also can_queue. So,
> > in the worst case that all commands are queued down the same hw queue we need an
> > event pool with can_queue commands.
> > 
> >>
> >> Looking at this closer, we might have bigger problems. There is a host wide
> >> max number of commands that the VFC host supports, which gets returned on
> >> NPIV Login. This value can change across a live migration event.
> > 
> > From what I understand the max commands can only become less.
> > 
> >>
> >> The ibmvfc driver, which does the same thing the lpfc driver does, modifies
> >> can_queue on the scsi_host *after* the tag set has been allocated. This looks
> >> to be a concern with ibmvfc, not sure about lpfc, as it doesn't look like
> >> we look at can_queue once the tag set is setup, and I'm not seeing a good way
> >> to dynamically change the host queue depth once the tag set is setup. 
> >>
> >> Unless I'm missing something, our best options appear to either be to implement
> >> our own host wide busy reference counting, which doesn't sound very good, or
> >> we need to add some API to block / scsi that allows us to dynamically change
> >> can_queue.
> > 
> > Changing can_queue won't do use any good with the shared tagset becasue each
> > queue still needs to be able to queue can_queue number of commands in the worst
> > case.
> 
> The issue I'm trying to highlight here is the following scenario:
> 
> 1. We set shost->can_queue, then call scsi_add_host, which allocates the tag set.
> 
> 2. On our NPIV login response from the VIOS, we might get a lower value than we
> initially set in shost->can_queue, so we update it, but nobody ever looks at it
> again, and we don't have any protection against sending too many commands to the host.
> 
> 
> Basically, we no longer have any code that ensures we don't send more
> commands to the VIOS than we are told it supports. According to the architecture,
> if we actually do this, the VIOS will do an h_free_crq, which would be a bit
> of a bug on our part.
> 
> I don't think it was ever clearly defined in the API that a driver can
> change shost->can_queue after calling scsi_add_host, but up until
> commit 6eb045e092efefafc6687409a6fa6d1dabf0fb69, this worked and now
> it doesn't. 

Actually it isn't related with commit 6eb045e092ef, because blk_mq_alloc_tag_set()
uses .can_queue to create driver tag sbitmap and request pool.

So even thought without 6eb045e092ef, the updated .can_queue can't work
as expected because the max driver tag depth has been fixed by blk-mq already.

What 6eb045e092ef does is just to remove the double check on max
host-wide allowed commands because that has been respected by blk-mq
driver tag allocation already.

> 
> I started looking through drivers that do this, and so far, it looks like the
> following drivers do: ibmvfc, lpfc, aix94xx, libfc, BusLogic, and likely others...
> 
> We probably need an API that lets us change shost->can_queue dynamically.

I'd suggest to confirm changing .can_queue is one real usecase.


Thanks,
Ming


^ permalink raw reply

* Re: [PATCH v2 0/7] Rid W=1 warnings in Ethernet
From: Jakub Kicinski @ 2021-01-14  2:35 UTC (permalink / raw)
  To: Lee Jones
  Cc: Paul Durrant, Kurt Kanzenbach, Alexei Starovoitov,
	Gustavo A. R. Silva, Peter Cammaert, Paul Mackerras,
	Sukadev Bhattiprolu, Wei Liu, Daniel Borkmann, John Fastabend,
	Santiago Leon, xen-devel, Grygorii Strashko, Thomas Falcon,
	Jesper Dangaard Brouer, Jens Osterkamp, Rusty Russell,
	Daris A Nevil, Lijun Pan, Ivan Khoronzhuk, Nicolas Pitre,
	Geoff Levand, netdev, linux-kernel, Erik Stahlman, John Allen,
	Utz Bacher, Dany Madden, bpf, linuxppc-dev, David S. Miller,
	Russell King
In-Reply-To: <20210113164123.1334116-1-lee.jones@linaro.org>

On Wed, 13 Jan 2021 16:41:16 +0000 Lee Jones wrote:
> Resending the stragglers again.                                                                                  
> 
> This set is part of a larger effort attempting to clean-up W=1                                                   
> kernel builds, which are currently overwhelmingly riddled with                                                   
> niggly little warnings.                                                                                          
>                                                                                                                  
> v2:                                                                                                              
>  - Squashed IBM patches                                                                                      
>  - Fixed real issue in SMSC
>  - Added Andrew's Reviewed-by tags on remainder

Does not apply, please rebase on net-next/master.

^ permalink raw reply

* Re: [PATCH v5 02/21] powerpc/64s: move the last of the page fault handling logic to C
From: Nicholas Piggin @ 2021-01-14  3:24 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
In-Reply-To: <b3f8fffd-ebbe-277d-9c71-cf3a6d8c4475@csgroup.eu>

Excerpts from Christophe Leroy's message of January 14, 2021 12:12 am:
> 
> 
> Le 13/01/2021 à 08:31, Nicholas Piggin a écrit :
>> The page fault handling still has some complex logic particularly around
>> hash table handling, in asm. Implement this in C instead.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   arch/powerpc/include/asm/book3s/64/mmu-hash.h |   1 +
>>   arch/powerpc/kernel/exceptions-64s.S          | 131 +++---------------
>>   arch/powerpc/mm/book3s64/hash_utils.c         |  77 ++++++----
>>   arch/powerpc/mm/fault.c                       |  46 ++++--
>>   4 files changed, 107 insertions(+), 148 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>> index 066b1d34c7bc..60a669379aa0 100644
>> --- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>> +++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
>> @@ -454,6 +454,7 @@ static inline unsigned long hpt_hash(unsigned long vpn,
>>   #define HPTE_NOHPTE_UPDATE	0x2
>>   #define HPTE_USE_KERNEL_KEY	0x4
>>   
>> +int do_hash_fault(struct pt_regs *regs, unsigned long ea, unsigned long dsisr);
>>   extern int __hash_page_4K(unsigned long ea, unsigned long access,
>>   			  unsigned long vsid, pte_t *ptep, unsigned long trap,
>>   			  unsigned long flags, int ssize, int subpage_prot);
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>> index 6e53f7638737..bcb5e81d2088 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -1401,14 +1401,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
>>    *
>>    * Handling:
>>    * - Hash MMU
>> - *   Go to do_hash_page first to see if the HPT can be filled from an entry in
>> - *   the Linux page table. Hash faults can hit in kernel mode in a fairly
>> + *   Go to do_hash_fault, which attempts to fill the HPT from an entry in the
>> + *   Linux page table. Hash faults can hit in kernel mode in a fairly
>>    *   arbitrary state (e.g., interrupts disabled, locks held) when accessing
>>    *   "non-bolted" regions, e.g., vmalloc space. However these should always be
>> - *   backed by Linux page tables.
>> + *   backed by Linux page table entries.
>>    *
>> - *   If none is found, do a Linux page fault. Linux page faults can happen in
>> - *   kernel mode due to user copy operations of course.
>> + *   If no entry is found the Linux page fault handler is invoked (by
>> + *   do_hash_fault). Linux page faults can happen in kernel mode due to user
>> + *   copy operations of course.
>>    *
>>    *   KVM: The KVM HDSI handler may perform a load with MSR[DR]=1 in guest
>>    *   MMU context, which may cause a DSI in the host, which must go to the
>> @@ -1439,13 +1440,17 @@ EXC_COMMON_BEGIN(data_access_common)
>>   	GEN_COMMON data_access
>>   	ld	r4,_DAR(r1)
>>   	ld	r5,_DSISR(r1)
> 
> We have DSISR here. I think the dispatch between page fault or do_break() should be done here:
> - It would be more similar to other arches

Other sub-archs?

> - Would avoid doing it also in instruction fault

True but it's hidden under an unlikely branch so won't really help 
instruction fault.

> - Would avoid that -1 return which looks more like a hack.

I don't really see it as a hack, we return a code to asm caller to
direct whether to restore registers or not, we alrady have this
pattern.

(I'm hoping all that might be go away one day by conrolling NV
regs from C if we can get good code generation but even if not we
still have it in the interrupt returns).

That said I will give it a try here. At very least it might be a
better intermediate step.

[snip]

>> +#ifdef CONFIG_PPC_BOOK3S_64
> 
> Seems like you are re-implementing handle_page_fault() inside do_page_fault(). Wouldn't it be 
> possible to keep do_page_fault() as is for the moment and implement a C version of handle_page_fault() ?

The test goes in a better place (existing unlikely branch) if we do it 
in do_page_fault.

> Or just keep it in assembly ? It is not that big, keeping it in assembly would keep things more 
> common with PPC32, and would still allow to save NV GPRS only when needed.

I think it's better to go the other way and move more of the other archs 
to C (in general that is, but for this patch as I said I will try the DABR
test in asm).

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v5 04/21] powerpc: bad_page_fault, do_break get registers from regs
From: Nicholas Piggin @ 2021-01-14  3:26 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
In-Reply-To: <27997243-fbec-acb5-6399-f0ee4cccfa27@csgroup.eu>

Excerpts from Christophe Leroy's message of January 14, 2021 12:25 am:
> 
> 
> Le 13/01/2021 à 08:31, Nicholas Piggin a écrit :
>> Similar to the previous patch this makes interrupt handler function
>> types more regular so they can be wrapped with the next patch.
>> 
>> bad_page_fault and do_break are not performance critical.
> 
> It's a bit different between do_break() and bad_page_fault():
> - do_break() is not performance critical for sure
> - bad_page_fault(), it doesn't matter, because bad_page_fault() was not using the address param so 
> it doesn't get anything from regs at the end.
> 
> Maybe it would be worth splitting in two patches, one for bad_page_fault() and one for do_break()

Okay I'll try it.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v5 06/21] powerpc: interrupt handler wrapper functions
From: Nicholas Piggin @ 2021-01-14  3:41 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
In-Reply-To: <b9710efb-5790-0206-237f-fd184015c811@csgroup.eu>

Excerpts from Christophe Leroy's message of January 14, 2021 12:45 am:
> 
> 
> Le 13/01/2021 à 08:32, Nicholas Piggin a écrit :
>> Add wrapper functions (derived from x86 macros) for interrupt handler
>> functions. This allows interrupt entry code to be written in C.
> 
> Looks like you are doing more than just that is this patch. WOuld be worth splitting in several 
> patches I think,
> 
> I'd suggest:
> - Other patches for unrelated changes, see below for details
> - One patch that brings the wrapper macros
> - One patch that uses those macros
> 
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   arch/powerpc/include/asm/asm-prototypes.h     |  29 ---
>>   arch/powerpc/include/asm/book3s/64/mmu-hash.h |   1 -
>>   arch/powerpc/include/asm/hw_irq.h             |   9 -
>>   arch/powerpc/include/asm/interrupt.h          | 218 ++++++++++++++++++
>>   arch/powerpc/include/asm/time.h               |   2 +
>>   arch/powerpc/kernel/dbell.c                   |  12 +-
>>   arch/powerpc/kernel/exceptions-64s.S          |   7 +-
>>   arch/powerpc/kernel/head_book3s_32.S          |   6 +-
>>   arch/powerpc/kernel/irq.c                     |   3 +-
>>   arch/powerpc/kernel/mce.c                     |   5 +-
>>   arch/powerpc/kernel/syscall_64.c              |   1 +
>>   arch/powerpc/kernel/tau_6xx.c                 |   2 +-
>>   arch/powerpc/kernel/time.c                    |   3 +-
>>   arch/powerpc/kernel/traps.c                   |  90 +++++---
>>   arch/powerpc/kernel/watchdog.c                |   7 +-
>>   arch/powerpc/kvm/book3s_hv.c                  |   1 +
>>   arch/powerpc/kvm/book3s_hv_builtin.c          |   1 +
>>   arch/powerpc/kvm/booke.c                      |   1 +
>>   arch/powerpc/mm/book3s64/hash_utils.c         |  57 +++--
>>   arch/powerpc/mm/book3s64/slb.c                |  29 +--
>>   arch/powerpc/mm/fault.c                       |  15 +-
>>   arch/powerpc/platforms/powernv/idle.c         |   1 +
>>   22 files changed, 374 insertions(+), 126 deletions(-)
>>   create mode 100644 arch/powerpc/include/asm/interrupt.h
>> 
> 
> ...
> 
>> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
>> new file mode 100644
>> index 000000000000..60363e5eeffa
>> --- /dev/null
>> +++ b/arch/powerpc/include/asm/interrupt.h
> 
> ...
> 
>> +/* Interrupt handlers */
>> +DECLARE_INTERRUPT_HANDLER_NMI(machine_check_early);
>> +DECLARE_INTERRUPT_HANDLER_NMI(hmi_exception_realmode);
>> +DECLARE_INTERRUPT_HANDLER(SMIException);
>> +DECLARE_INTERRUPT_HANDLER(handle_hmi_exception);
>> +DECLARE_INTERRUPT_HANDLER(instruction_breakpoint_exception);
>> +DECLARE_INTERRUPT_HANDLER(RunModeException);
>> +DECLARE_INTERRUPT_HANDLER(single_step_exception);
>> +DECLARE_INTERRUPT_HANDLER(program_check_exception);
>> +DECLARE_INTERRUPT_HANDLER(alignment_exception);
>> +DECLARE_INTERRUPT_HANDLER(StackOverflow);
>> +DECLARE_INTERRUPT_HANDLER(stack_overflow_exception);
>> +DECLARE_INTERRUPT_HANDLER(kernel_fp_unavailable_exception);
>> +DECLARE_INTERRUPT_HANDLER(altivec_unavailable_exception);
>> +DECLARE_INTERRUPT_HANDLER(vsx_unavailable_exception);
>> +DECLARE_INTERRUPT_HANDLER(fp_unavailable_tm);
>> +DECLARE_INTERRUPT_HANDLER(altivec_unavailable_tm);
>> +DECLARE_INTERRUPT_HANDLER(vsx_unavailable_tm);
>> +DECLARE_INTERRUPT_HANDLER(facility_unavailable_exception);
>> +DECLARE_INTERRUPT_HANDLER_ASYNC(TAUException);
>> +DECLARE_INTERRUPT_HANDLER(altivec_assist_exception);
>> +DECLARE_INTERRUPT_HANDLER(unrecoverable_exception);
>> +DECLARE_INTERRUPT_HANDLER(kernel_bad_stack);
>> +DECLARE_INTERRUPT_HANDLER_NMI(system_reset_exception);
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +DECLARE_INTERRUPT_HANDLER_ASYNC(machine_check_exception);
>> +#else
>> +DECLARE_INTERRUPT_HANDLER_NMI(machine_check_exception);
>> +#endif
>> +DECLARE_INTERRUPT_HANDLER(emulation_assist_interrupt);
>> +DECLARE_INTERRUPT_HANDLER_RAW(do_slb_fault);
>> +DECLARE_INTERRUPT_HANDLER(do_bad_slb_fault);
>> +DECLARE_INTERRUPT_HANDLER_RAW(do_hash_fault);
>> +DECLARE_INTERRUPT_HANDLER_RET(do_page_fault);
>> +DECLARE_INTERRUPT_HANDLER(__do_bad_page_fault);
>> +DECLARE_INTERRUPT_HANDLER(do_bad_page_fault);
> 
> Missing DECLARE_INTERRUPT_HANDLER(do_break)
> 
>> +
>> +DECLARE_INTERRUPT_HANDLER_ASYNC(timer_interrupt);
>> +DECLARE_INTERRUPT_HANDLER_NMI(performance_monitor_exception_nmi);
>> +DECLARE_INTERRUPT_HANDLER_ASYNC(performance_monitor_exception_async);
>> +DECLARE_INTERRUPT_HANDLER_RAW(performance_monitor_exception);
>> +DECLARE_INTERRUPT_HANDLER(WatchdogException);
>> +DECLARE_INTERRUPT_HANDLER(unknown_exception);
>> +DECLARE_INTERRUPT_HANDLER_ASYNC(unknown_async_exception);
>> +
>> +void replay_system_reset(void);
>> +void replay_soft_interrupts(void);
>> +
>> +#endif /* _ASM_POWERPC_INTERRUPT_H */
>> diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
>> index 8f789b597bae..8dd3cdb25338 100644
>> --- a/arch/powerpc/include/asm/time.h
>> +++ b/arch/powerpc/include/asm/time.h
>> @@ -102,6 +102,8 @@ DECLARE_PER_CPU(u64, decrementers_next_tb);
>>   /* Convert timebase ticks to nanoseconds */
>>   unsigned long long tb_to_ns(unsigned long long tb_ticks);
>>   
>> +void timer_broadcast_interrupt(void);
> 
> This seems unrelated. I think a separate patch woud be better for moving prototypes without making 
> them wrappers.

Yeah this might have just slipped in, thanks.

>> +
>>   /* SPLPAR */
>>   void accumulate_stolen_time(void);
>>   
>> diff --git a/arch/powerpc/kernel/dbell.c b/arch/powerpc/kernel/dbell.c
>> index 52680cf07c9d..c0f99f8ffa7d 100644
>> --- a/arch/powerpc/kernel/dbell.c
>> +++ b/arch/powerpc/kernel/dbell.c
>> @@ -12,14 +12,14 @@
>>   #include <linux/hardirq.h>
>>   
>>   #include <asm/dbell.h>
>> +#include <asm/interrupt.h>
>>   #include <asm/irq_regs.h>
>>   #include <asm/kvm_ppc.h>
>>   #include <asm/trace.h>
>>   
>> -#ifdef CONFIG_SMP
>> -
> 
> This seems unrelated, is that needed ? What's the problem with having to full versions of 
> doorbell_exception() ?

No real problem I think, it might have been from some earlier work in 
progress. I can adjust it.

> 
>> -void doorbell_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER_ASYNC(doorbell_exception)
>>   {
>> +#ifdef CONFIG_SMP
>>   	struct pt_regs *old_regs = set_irq_regs(regs);
>>   
>>   	irq_enter();
>> @@ -37,11 +37,7 @@ void doorbell_exception(struct pt_regs *regs)
>>   	trace_doorbell_exit(regs);
>>   	irq_exit();
>>   	set_irq_regs(old_regs);
>> -}
>>   #else /* CONFIG_SMP */
>> -void doorbell_exception(struct pt_regs *regs)
>> -{
>>   	printk(KERN_WARNING "Received doorbell on non-smp system\n");
>> -}
>>   #endif /* CONFIG_SMP */
>> -
>> +}
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>> index 36dea2020ec5..8b0db807974c 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -1923,7 +1923,7 @@ EXC_COMMON_BEGIN(doorbell_super_common)
>>   #ifdef CONFIG_PPC_DOORBELL
>>   	bl	doorbell_exception
>>   #else
>> -	bl	unknown_exception
>> +	bl	unknown_async_exception
> 
> Unrelated  to wrappers ?

Well there's now a difference between sync and async exceptions, but it 
could be introduced in a different patch.

>>   #endif
>>   	b	interrupt_return
>>   
>> @@ -2136,8 +2136,7 @@ EXC_COMMON_BEGIN(h_data_storage_common)
>>   	GEN_COMMON h_data_storage
>>   	addi    r3,r1,STACK_FRAME_OVERHEAD
>>   BEGIN_MMU_FTR_SECTION
>> -	li	r4,SIGSEGV
>> -	bl      bad_page_fault
>> +	bl      do_bad_page_fault
> 
> Is this name change related ?

The do_ variant is a "handler", the other can be called from C.

>>   MMU_FTR_SECTION_ELSE
>>   	bl      unknown_exception
>>   ALT_MMU_FTR_SECTION_END_IFSET(MMU_FTR_TYPE_RADIX)
>> @@ -2310,7 +2309,7 @@ EXC_COMMON_BEGIN(h_doorbell_common)
>>   #ifdef CONFIG_PPC_DOORBELL
>>   	bl	doorbell_exception
>>   #else
>> -	bl	unknown_exception
>> +	bl	unknown_async_exception
>>   #endif
>>   	b	interrupt_return
>>   
>> diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
>> index 94ad1372c490..9b4d5432e2db 100644
>> --- a/arch/powerpc/kernel/head_book3s_32.S
>> +++ b/arch/powerpc/kernel/head_book3s_32.S
>> @@ -238,8 +238,8 @@ __secondary_hold_acknowledge:
>>   
>>   /* System reset */
>>   /* core99 pmac starts the seconary here by changing the vector, and
>> -   putting it back to what it was (unknown_exception) when done.  */
>> -	EXCEPTION(0x100, Reset, unknown_exception, EXC_XFER_STD)
>> +   putting it back to what it was (unknown_async_exception) when done.  */
>> +	EXCEPTION(0x100, Reset, unknown_async_exception, EXC_XFER_STD)
>>   
>>   /* Machine check */
>>   /*
>> @@ -631,7 +631,7 @@ END_MMU_FTR_SECTION_IFSET(MMU_FTR_NEED_DTLB_SW_LRU)
>>   #endif
>>   
>>   #ifndef CONFIG_TAU_INT
>> -#define TAUException	unknown_exception
>> +#define TAUException	unknown_async_exception
>>   #endif
>>   
>>   	EXCEPTION(0x1300, Trap_13, instruction_breakpoint_exception, EXC_XFER_STD)
>> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
>> index 6b1eca53e36c..2055d204d08e 100644
>> --- a/arch/powerpc/kernel/irq.c
>> +++ b/arch/powerpc/kernel/irq.c
>> @@ -54,6 +54,7 @@
>>   #include <linux/pgtable.h>
>>   
>>   #include <linux/uaccess.h>
>> +#include <asm/interrupt.h>
>>   #include <asm/io.h>
>>   #include <asm/irq.h>
>>   #include <asm/cache.h>
>> @@ -665,7 +666,7 @@ void __do_irq(struct pt_regs *regs)
>>   	irq_exit();
>>   }
>>   
>> -void do_IRQ(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER_ASYNC(do_IRQ)
>>   {
>>   	struct pt_regs *old_regs = set_irq_regs(regs);
>>   	void *cursp, *irqsp, *sirqsp;
>> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
>> index 9f3e133b57b7..54269947113d 100644
>> --- a/arch/powerpc/kernel/mce.c
>> +++ b/arch/powerpc/kernel/mce.c
>> @@ -18,6 +18,7 @@
>>   #include <linux/extable.h>
>>   #include <linux/ftrace.h>
>>   
>> +#include <asm/interrupt.h>
>>   #include <asm/machdep.h>
>>   #include <asm/mce.h>
>>   #include <asm/nmi.h>
>> @@ -588,7 +589,7 @@ EXPORT_SYMBOL_GPL(machine_check_print_event_info);
>>    *
>>    * regs->nip and regs->msr contains srr0 and ssr1.
>>    */
>> -long notrace machine_check_early(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER_NMI(machine_check_early)
>>   {
>>   	long handled = 0;
>>   	u8 ftrace_enabled = this_cpu_get_ftrace_enabled();
>> @@ -722,7 +723,7 @@ long hmi_handle_debugtrig(struct pt_regs *regs)
>>   /*
>>    * Return values:
>>    */
>> -long hmi_exception_realmode(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER_NMI(hmi_exception_realmode)
>>   {	
>>   	int ret;
>>   
>> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
>> index 7c85ed04a164..dd87b2118620 100644
>> --- a/arch/powerpc/kernel/syscall_64.c
>> +++ b/arch/powerpc/kernel/syscall_64.c
>> @@ -5,6 +5,7 @@
>>   #include <asm/kup.h>
>>   #include <asm/cputime.h>
>>   #include <asm/hw_irq.h>
>> +#include <asm/interrupt.h>
>>   #include <asm/kprobes.h>
>>   #include <asm/paca.h>
>>   #include <asm/ptrace.h>
>> diff --git a/arch/powerpc/kernel/tau_6xx.c b/arch/powerpc/kernel/tau_6xx.c
>> index 0b4694b8d248..46b2e5de4ef5 100644
>> --- a/arch/powerpc/kernel/tau_6xx.c
>> +++ b/arch/powerpc/kernel/tau_6xx.c
>> @@ -100,7 +100,7 @@ static void TAUupdate(int cpu)
>>    * with interrupts disabled
>>    */
>>   
>> -void TAUException(struct pt_regs * regs)
>> +DEFINE_INTERRUPT_HANDLER_ASYNC(TAUException)
>>   {
>>   	int cpu = smp_processor_id();
>>   
>> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
>> index 67feb3524460..435a251247ed 100644
>> --- a/arch/powerpc/kernel/time.c
>> +++ b/arch/powerpc/kernel/time.c
>> @@ -56,6 +56,7 @@
>>   #include <linux/processor.h>
>>   #include <asm/trace.h>
>>   
>> +#include <asm/interrupt.h>
>>   #include <asm/io.h>
>>   #include <asm/nvram.h>
>>   #include <asm/cache.h>
>> @@ -570,7 +571,7 @@ void arch_irq_work_raise(void)
>>    * timer_interrupt - gets called when the decrementer overflows,
>>    * with interrupts disabled.
>>    */
>> -void timer_interrupt(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER_ASYNC(timer_interrupt)
>>   {
>>   	struct clock_event_device *evt = this_cpu_ptr(&decrementers);
>>   	u64 *next_tb = this_cpu_ptr(&decrementers_next_tb);
>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> index 9b5298c016c7..f4462b481248 100644
>> --- a/arch/powerpc/kernel/traps.c
>> +++ b/arch/powerpc/kernel/traps.c
>> @@ -41,6 +41,7 @@
>>   #include <asm/emulated_ops.h>
>>   #include <linux/uaccess.h>
>>   #include <asm/debugfs.h>
>> +#include <asm/interrupt.h>
>>   #include <asm/io.h>
>>   #include <asm/machdep.h>
>>   #include <asm/rtas.h>
>> @@ -430,8 +431,7 @@ void hv_nmi_check_nonrecoverable(struct pt_regs *regs)
>>   	regs->msr &= ~MSR_RI;
>>   #endif
>>   }
>> -
>> -void system_reset_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER_NMI(system_reset_exception)
>>   {
>>   	unsigned long hsrr0, hsrr1;
>>   	bool saved_hsrrs = false;
>> @@ -516,7 +516,10 @@ void system_reset_exception(struct pt_regs *regs)
>>   	this_cpu_set_ftrace_enabled(ftrace_enabled);
>>   
>>   	/* What should we do here? We could issue a shutdown or hard reset. */
>> +
>> +	return 0;
>>   }
>> +NOKPROBE_SYMBOL(system_reset_exception);
> 
> Is this NOKPROBE_SYMBOL() related to wrappers or just a bug fix ?

Hmm, I don't remember off the top of my head now, may be bug fix but
I don't know why we haven't seen it already. I have tested MCE with
tracing I thought but maybe not carefully enough.

I will move them to another patch.

> 
>>   
>>   /*
>>    * I/O accesses can cause machine checks on powermacs.
>> @@ -788,7 +791,12 @@ int machine_check_generic(struct pt_regs *regs)
>>   }
>>   #endif /* everything else */
>>   
>> -void machine_check_exception(struct pt_regs *regs)
>> +
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +DEFINE_INTERRUPT_HANDLER_ASYNC(machine_check_exception)
>> +#else
>> +DEFINE_INTERRUPT_HANDLER_NMI(machine_check_exception)
>> +#endif
>>   {
>>   	int recover = 0;
>>   
>> @@ -838,13 +846,21 @@ void machine_check_exception(struct pt_regs *regs)
>>   	if (!(regs->msr & MSR_RI))
>>   		die("Unrecoverable Machine check", regs, SIGBUS);
>>   
>> +#ifdef CONFIG_PPC_BOOK3S_64
>> +bail:
>>   	return;
>> +#else
>> +	return 0;
>>   
>>   bail:
>>   	if (nmi) nmi_exit();
>> +
>> +	return 0;
>> +#endif
> 
> Looks fishy. Can't we have both returning either long or void ?

All handlers could return long, then you would have lots of pointless
`return 0` and `li r3,0`. HMI exception NMI wants to return something.

Arguably HMI should handle local_irq_disable with irq_work, like MCE
does. I have noted that in the HMI interrupt comment and will get to
it eventually, which should unify this.

> 
>>   }
>> +NOKPROBE_SYMBOL(machine_check_exception);
> 
> Is this NOKPROBE_SYMBOL() related to wrappers or just a bug fix ?
> 
>>   
>> -void SMIException(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(SMIException) /* async? */
>>   {
>>   	die("System Management Interrupt", regs, SIGABRT);
>>   }
>> @@ -1030,7 +1046,7 @@ static void p9_hmi_special_emu(struct pt_regs *regs)
>>   }
>>   #endif /* CONFIG_VSX */
>>   
>> -void handle_hmi_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER_ASYNC(handle_hmi_exception)
>>   {
>>   	struct pt_regs *old_regs;
>>   
>> @@ -1059,7 +1075,7 @@ void handle_hmi_exception(struct pt_regs *regs)
>>   	set_irq_regs(old_regs);
>>   }
>>   
>> -void unknown_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(unknown_exception)
>>   {
>>   	enum ctx_state prev_state = exception_enter();
>>   
>> @@ -1071,7 +1087,19 @@ void unknown_exception(struct pt_regs *regs)
>>   	exception_exit(prev_state);
>>   }
>>   
>> -void instruction_breakpoint_exception(struct pt_regs *regs)
> 
> shouldn't unknown_async_exception() be added in a preceeding patch ?

+1

>> +DEFINE_INTERRUPT_HANDLER_ASYNC(unknown_async_exception)
>> +{
>> +	enum ctx_state prev_state = exception_enter();
>> +
>> +	printk("Bad trap at PC: %lx, SR: %lx, vector=%lx\n",
>> +	       regs->nip, regs->msr, regs->trap);
>> +
>> +	_exception(SIGTRAP, regs, TRAP_UNK, 0);
>> +
>> +	exception_exit(prev_state);
>> +}
>> +
>> +DEFINE_INTERRUPT_HANDLER(instruction_breakpoint_exception)
>>   {
>>   	enum ctx_state prev_state = exception_enter();
>>   
>> @@ -1086,12 +1114,12 @@ void instruction_breakpoint_exception(struct pt_regs *regs)
>>   	exception_exit(prev_state);
>>   }
>>   
>> -void RunModeException(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(RunModeException)
>>   {
>>   	_exception(SIGTRAP, regs, TRAP_UNK, 0);
>>   }
>>   
>> -void single_step_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(single_step_exception)
>>   {
>>   	enum ctx_state prev_state = exception_enter();
>>   
>> @@ -1436,7 +1464,7 @@ static int emulate_math(struct pt_regs *regs)
>>   static inline int emulate_math(struct pt_regs *regs) { return -1; }
>>   #endif
>>   
>> -void program_check_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(program_check_exception)
>>   {
>>   	enum ctx_state prev_state = exception_enter();
>>   	unsigned int reason = get_reason(regs);
>> @@ -1561,14 +1589,14 @@ NOKPROBE_SYMBOL(program_check_exception);
>>    * This occurs when running in hypervisor mode on POWER6 or later
>>    * and an illegal instruction is encountered.
>>    */
>> -void emulation_assist_interrupt(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(emulation_assist_interrupt)
>>   {
>>   	regs->msr |= REASON_ILLEGAL;
>>   	program_check_exception(regs);
>>   }
>>   NOKPROBE_SYMBOL(emulation_assist_interrupt);
>>   
>> -void alignment_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(alignment_exception)
>>   {
>>   	enum ctx_state prev_state = exception_enter();
>>   	int sig, code, fixed = 0;
>> @@ -1618,7 +1646,7 @@ void alignment_exception(struct pt_regs *regs)
>>   	exception_exit(prev_state);
>>   }
>>   
>> -void StackOverflow(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(StackOverflow)
>>   {
>>   	pr_crit("Kernel stack overflow in process %s[%d], r1=%lx\n",
>>   		current->comm, task_pid_nr(current), regs->gpr[1]);
>> @@ -1627,7 +1655,7 @@ void StackOverflow(struct pt_regs *regs)
>>   	panic("kernel stack overflow");
>>   }
>>   
>> -void stack_overflow_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(stack_overflow_exception)
>>   {
>>   	enum ctx_state prev_state = exception_enter();
>>   
>> @@ -1636,7 +1664,7 @@ void stack_overflow_exception(struct pt_regs *regs)
>>   	exception_exit(prev_state);
>>   }
>>   
>> -void kernel_fp_unavailable_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(kernel_fp_unavailable_exception)
>>   {
>>   	enum ctx_state prev_state = exception_enter();
>>   
>> @@ -1647,7 +1675,7 @@ void kernel_fp_unavailable_exception(struct pt_regs *regs)
>>   	exception_exit(prev_state);
>>   }
>>   
>> -void altivec_unavailable_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(altivec_unavailable_exception)
>>   {
>>   	enum ctx_state prev_state = exception_enter();
>>   
>> @@ -1666,7 +1694,7 @@ void altivec_unavailable_exception(struct pt_regs *regs)
>>   	exception_exit(prev_state);
>>   }
>>   
>> -void vsx_unavailable_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(vsx_unavailable_exception)
>>   {
>>   	if (user_mode(regs)) {
>>   		/* A user program has executed an vsx instruction,
>> @@ -1697,7 +1725,7 @@ static void tm_unavailable(struct pt_regs *regs)
>>   	die("Unrecoverable TM Unavailable Exception", regs, SIGABRT);
>>   }
>>   
>> -void facility_unavailable_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(facility_unavailable_exception)
>>   {
>>   	static char *facility_strings[] = {
>>   		[FSCR_FP_LG] = "FPU",
>> @@ -1817,7 +1845,7 @@ void facility_unavailable_exception(struct pt_regs *regs)
>>   
>>   #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
>>   
>> -void fp_unavailable_tm(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(fp_unavailable_tm)
>>   {
>>   	/* Note:  This does not handle any kind of FP laziness. */
>>   
>> @@ -1850,7 +1878,7 @@ void fp_unavailable_tm(struct pt_regs *regs)
>>   	tm_recheckpoint(&current->thread);
>>   }
>>   
>> -void altivec_unavailable_tm(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(altivec_unavailable_tm)
>>   {
>>   	/* See the comments in fp_unavailable_tm().  This function operates
>>   	 * the same way.
>> @@ -1865,7 +1893,7 @@ void altivec_unavailable_tm(struct pt_regs *regs)
>>   	current->thread.used_vr = 1;
>>   }
>>   
>> -void vsx_unavailable_tm(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(vsx_unavailable_tm)
>>   {
>>   	/* See the comments in fp_unavailable_tm().  This works similarly,
>>   	 * though we're loading both FP and VEC registers in here.
>> @@ -1890,7 +1918,8 @@ void vsx_unavailable_tm(struct pt_regs *regs)
>>   }
>>   #endif /* CONFIG_PPC_TRANSACTIONAL_MEM */
>>   
>> -static void performance_monitor_exception_nmi(struct pt_regs *regs)
>> +#ifdef CONFIG_PPC64
>> +DEFINE_INTERRUPT_HANDLER_NMI(performance_monitor_exception_nmi)
>>   {
>>   	nmi_enter();
>>   
>> @@ -1899,9 +1928,12 @@ static void performance_monitor_exception_nmi(struct pt_regs *regs)
>>   	perf_irq(regs);
>>   
>>   	nmi_exit();
>> +
>> +	return 0;
>>   }
>> +#endif
>>   
>> -static void performance_monitor_exception_async(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER_ASYNC(performance_monitor_exception_async)
>>   {
>>   	irq_enter();
>>   
>> @@ -1912,7 +1944,7 @@ static void performance_monitor_exception_async(struct pt_regs *regs)
>>   	irq_exit();
>>   }
>>   
>> -void performance_monitor_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER_RAW(performance_monitor_exception)
>>   {
>>   	/*
>>   	 * On 64-bit, if perf interrupts hit in a local_irq_disable
>> @@ -1924,6 +1956,8 @@ void performance_monitor_exception(struct pt_regs *regs)
>>   		performance_monitor_exception_nmi(regs);
>>   	else
>>   		performance_monitor_exception_async(regs);
>> +
>> +	return 0;
>>   }
>>   
>>   #ifdef CONFIG_PPC_ADV_DEBUG_REGS
>> @@ -2057,7 +2091,7 @@ NOKPROBE_SYMBOL(DebugException);
>>   #endif /* CONFIG_PPC_ADV_DEBUG_REGS */
>>   
>>   #ifdef CONFIG_ALTIVEC
>> -void altivec_assist_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(altivec_assist_exception)
>>   {
>>   	int err;
>>   
>> @@ -2199,7 +2233,7 @@ void SPEFloatingPointRoundException(struct pt_regs *regs)
>>    * in the MSR is 0.  This indicates that SRR0/1 are live, and that
>>    * we therefore lost state by taking this exception.
>>    */
>> -void unrecoverable_exception(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(unrecoverable_exception)
>>   {
>>   	pr_emerg("Unrecoverable exception %lx at %lx (msr=%lx)\n",
>>   		 regs->trap, regs->nip, regs->msr);
>> @@ -2219,7 +2253,7 @@ void __attribute__ ((weak)) WatchdogHandler(struct pt_regs *regs)
>>   	return;
>>   }
>>   
>> -void WatchdogException(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(WatchdogException) /* XXX NMI? async? */
>>   {
>>   	printk (KERN_EMERG "PowerPC Book-E Watchdog Exception\n");
>>   	WatchdogHandler(regs);
>> @@ -2230,7 +2264,7 @@ void WatchdogException(struct pt_regs *regs)
>>    * We enter here if we discover during exception entry that we are
>>    * running in supervisor mode with a userspace value in the stack pointer.
>>    */
>> -void kernel_bad_stack(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER(kernel_bad_stack)
>>   {
>>   	printk(KERN_EMERG "Bad kernel stack pointer %lx at %lx\n",
>>   	       regs->gpr[1], regs->nip);
>> diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
>> index af3c15a1d41e..824b9376ac35 100644
>> --- a/arch/powerpc/kernel/watchdog.c
>> +++ b/arch/powerpc/kernel/watchdog.c
>> @@ -26,6 +26,7 @@
>>   #include <linux/delay.h>
>>   #include <linux/smp.h>
>>   
>> +#include <asm/interrupt.h>
>>   #include <asm/paca.h>
>>   
>>   /*
>> @@ -247,14 +248,14 @@ static void watchdog_timer_interrupt(int cpu)
>>   		watchdog_smp_panic(cpu, tb);
>>   }
>>   
>> -void soft_nmi_interrupt(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER_NMI(soft_nmi_interrupt)
>>   {
>>   	unsigned long flags;
>>   	int cpu = raw_smp_processor_id();
>>   	u64 tb;
>>   
>>   	if (!cpumask_test_cpu(cpu, &wd_cpus_enabled))
>> -		return;
>> +		return 0;
>>   
>>   	nmi_enter();
>>   
>> @@ -291,6 +292,8 @@ void soft_nmi_interrupt(struct pt_regs *regs)
>>   
>>   out:
>>   	nmi_exit();
>> +
>> +	return 0;
>>   }
>>   
>>   static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 6f612d240392..3f9a229f82a2 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -53,6 +53,7 @@
>>   #include <asm/cputable.h>
>>   #include <asm/cacheflush.h>
>>   #include <linux/uaccess.h>
>> +#include <asm/interrupt.h>
>>   #include <asm/io.h>
>>   #include <asm/kvm_ppc.h>
>>   #include <asm/kvm_book3s.h>
>> diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
>> index 8053efdf7ea7..10fc274bea65 100644
>> --- a/arch/powerpc/kvm/book3s_hv_builtin.c
>> +++ b/arch/powerpc/kvm/book3s_hv_builtin.c
>> @@ -17,6 +17,7 @@
>>   
>>   #include <asm/asm-prototypes.h>
>>   #include <asm/cputable.h>
>> +#include <asm/interrupt.h>
>>   #include <asm/kvm_ppc.h>
>>   #include <asm/kvm_book3s.h>
>>   #include <asm/archrandom.h>
>> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
>> index 288a9820ec01..bd2bb73021d8 100644
>> --- a/arch/powerpc/kvm/booke.c
>> +++ b/arch/powerpc/kvm/booke.c
>> @@ -20,6 +20,7 @@
>>   
>>   #include <asm/cputable.h>
>>   #include <linux/uaccess.h>
>> +#include <asm/interrupt.h>
>>   #include <asm/kvm_ppc.h>
>>   #include <asm/cacheflush.h>
>>   #include <asm/dbell.h>
>> diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
>> index 77073a256cff..453afb9ae9b4 100644
>> --- a/arch/powerpc/mm/book3s64/hash_utils.c
>> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
>> @@ -38,6 +38,7 @@
>>   #include <linux/pgtable.h>
>>   
>>   #include <asm/debugfs.h>
>> +#include <asm/interrupt.h>
>>   #include <asm/processor.h>
>>   #include <asm/mmu.h>
>>   #include <asm/mmu_context.h>
>> @@ -1512,7 +1513,7 @@ int hash_page(unsigned long ea, unsigned long access, unsigned long trap,
>>   }
>>   EXPORT_SYMBOL_GPL(hash_page);
>>   
>> -long do_hash_fault(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER_RET(__do_hash_fault)
>>   {
>>   	unsigned long ea = regs->dar;
>>   	unsigned long dsisr = regs->dsisr;
>> @@ -1522,27 +1523,6 @@ long do_hash_fault(struct pt_regs *regs)
>>   	unsigned int region_id;
>>   	long err;
>>   
>> -	if (unlikely(dsisr & (DSISR_BAD_FAULT_64S | DSISR_DABRMATCH | DSISR_KEYFAULT)))
>> -		goto page_fault;
>> -
>> -	/*
>> -	 * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then
>> -	 * don't call hash_page, just fail the fault. This is required to
>> -	 * prevent re-entrancy problems in the hash code, namely perf
>> -	 * interrupts hitting while something holds H_PAGE_BUSY, and taking a
>> -	 * hash fault. See the comment in hash_preload().
>> -	 *
>> -	 * We come here as a result of a DSI at a point where we don't want
>> -	 * to call hash_page, such as when we are accessing memory (possibly
>> -	 * user memory) inside a PMU interrupt that occurred while interrupts
>> -	 * were soft-disabled.  We want to invoke the exception handler for
>> -	 * the access, or panic if there isn't a handler.
>> -	 */
>> -	if (unlikely(in_nmi())) {
>> -		bad_page_fault(regs, SIGSEGV);
>> -		return 0;
>> -	}
>> -
>>   	region_id = get_region_id(ea);
>>   	if ((region_id == VMALLOC_REGION_ID) || (region_id == IO_REGION_ID))
>>   		mm = &init_mm;
>> @@ -1583,13 +1563,44 @@ long do_hash_fault(struct pt_regs *regs)
>>   		err = 0;
>>   
>>   	} else if (err) {
>> -page_fault:
>>   		err = do_page_fault(regs);
>>   	}
>>   
>>   	return err;
>>   }
>>   
>> +/*
>> + * The _RAW interrupt entry checks for the in_nmi() case before
>> + * running the full handler.
>> + */
>> +DEFINE_INTERRUPT_HANDLER_RAW(do_hash_fault)
> 
> Could we do that split into __do_hash_fault() / do_hash_fault() in a preceeding patch ?

Yeah sure.

>> +{
>> +	unsigned long dsisr = regs->dsisr;
>> +
>> +	if (unlikely(dsisr & (DSISR_BAD_FAULT_64S | DSISR_DABRMATCH | DSISR_KEYFAULT)))
>> +		return do_page_fault(regs);
>> +
>> +	/*
>> +	 * If we are in an "NMI" (e.g., an interrupt when soft-disabled), then
>> +	 * don't call hash_page, just fail the fault. This is required to
>> +	 * prevent re-entrancy problems in the hash code, namely perf
>> +	 * interrupts hitting while something holds H_PAGE_BUSY, and taking a
>> +	 * hash fault. See the comment in hash_preload().
>> +	 *
>> +	 * We come here as a result of a DSI at a point where we don't want
>> +	 * to call hash_page, such as when we are accessing memory (possibly
>> +	 * user memory) inside a PMU interrupt that occurred while interrupts
>> +	 * were soft-disabled.  We want to invoke the exception handler for
>> +	 * the access, or panic if there isn't a handler.
>> +	 */
>> +	if (unlikely(in_nmi())) {
>> +		do_bad_page_fault(regs);
>> +		return 0;
>> +	}
>> +
>> +	return __do_hash_fault(regs);
>> +}
>> +
>>   #ifdef CONFIG_PPC_MM_SLICES
>>   static bool should_hash_preload(struct mm_struct *mm, unsigned long ea)
>>   {
>> diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
>> index c581548b533f..0ae10adae203 100644
>> --- a/arch/powerpc/mm/book3s64/slb.c
>> +++ b/arch/powerpc/mm/book3s64/slb.c
>> @@ -10,6 +10,7 @@
>>    */
>>   
>>   #include <asm/asm-prototypes.h>
>> +#include <asm/interrupt.h>
>>   #include <asm/mmu.h>
>>   #include <asm/mmu_context.h>
>>   #include <asm/paca.h>
>> @@ -813,7 +814,7 @@ static long slb_allocate_user(struct mm_struct *mm, unsigned long ea)
>>   	return slb_insert_entry(ea, context, flags, ssize, false);
>>   }
>>   
>> -long do_slb_fault(struct pt_regs *regs)
>> +DEFINE_INTERRUPT_HANDLER_RAW(do_slb_fault)
>>   {
>>   	unsigned long ea = regs->dar;
>>   	unsigned long id = get_region_id(ea);
>> @@ -827,17 +828,19 @@ long do_slb_fault(struct pt_regs *regs)
>>   	/*
>>   	 * SLB kernel faults must be very careful not to touch anything
>>   	 * that is not bolted. E.g., PACA and global variables are okay,
>> -	 * mm->context stuff is not.
>> -	 *
>> -	 * SLB user faults can access all of kernel memory, but must be
>> -	 * careful not to touch things like IRQ state because it is not
>> -	 * "reconciled" here. The difficulty is that we must use
>> -	 * fast_exception_return to return from kernel SLB faults without
>> -	 * looking at possible non-bolted memory. We could test user vs
>> -	 * kernel faults in the interrupt handler asm and do a full fault,
>> -	 * reconcile, ret_from_except for user faults which would make them
>> -	 * first class kernel code. But for performance it's probably nicer
>> -	 * if they go via fast_exception_return too.
>> +	 * mm->context stuff is not. SLB user faults may access all of
>> +	 * memory (and induce one recursive SLB kernel fault), so the
>> +	 * kernel fault must not trample on the user fault state at those
>> +	 * points.
>> +	 */
>> +
>> +	/*
>> +	 * This is a _RAW interrupt handler, so it must not touch local
>> +	 * irq state, or schedule. We could test for usermode and upgrade
>> +	 * to a normal process context (synchronous) interrupt for those,
>> +	 * which would make them first-class kernel code and able to be
>> +	 * traced and instrumented, although performance would suffer a
>> +	 * bit, it would probably be a good tradeoff.
> 
> Is the comment change really related to the wrapper macros ?

You're right. Aneesh wanted comments for _RAW so I added them but 
pointed out that the restriction already exists, it's just that it 
may not have been obvious before.

Thanks for the good review.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v5 09/21] powerpc/64: context tracking remove _TIF_NOHZ
From: Nicholas Piggin @ 2021-01-14  3:48 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
In-Reply-To: <6527f4ec-2011-75b3-1db2-ec6d9b120dd9@csgroup.eu>

Excerpts from Christophe Leroy's message of January 14, 2021 12:50 am:
> 
> 
> Le 13/01/2021 à 08:32, Nicholas Piggin a écrit :
>> Add context tracking to the system call handler explicitly, and remove
>> _TIF_NOHZ.
>> 
>> This saves 35 cycles on gettid system call cost on POWER9 with a
>> CONFIG_NOHZ_FULL kernel.
> 
> 35 cycles among 100 cycles, or among 5000 cycles ? I meant what pourcentage to you win ?

I can re-check when I update and retest. On the order of about 500
IIRC, so quite significant proportion of the cost.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v5 15/21] powerpc/64s: reconcile interrupts in C
From: Nicholas Piggin @ 2021-01-14  3:51 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
In-Reply-To: <c035e8cc-642b-819a-343e-f7d0b0307315@csgroup.eu>

Excerpts from Christophe Leroy's message of January 14, 2021 12:54 am:
> 
> 
> Le 13/01/2021 à 08:32, Nicholas Piggin a écrit :
>> There is no need for this to be in asm, use the new intrrupt entry wrapper.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   arch/powerpc/include/asm/interrupt.h | 15 +++++++++++----
>>   arch/powerpc/kernel/exceptions-64s.S | 26 --------------------------
>>   2 files changed, 11 insertions(+), 30 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
>> index 34d7cca2cb2e..6eba7c489753 100644
>> --- a/arch/powerpc/include/asm/interrupt.h
>> +++ b/arch/powerpc/include/asm/interrupt.h
>> @@ -14,11 +14,14 @@ struct interrupt_state {
>>   
>>   static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrupt_state *state)
>>   {
>> -#ifdef CONFIG_PPC_BOOK3E_64
>> -	state->ctx_state = exception_enter();
>> -#endif
>> -
> 
> Can't the above stay on top of the function ?

It could but I prefer to do it this way because exception_enter
needs the irq soft-mask state to be set up. It is E vs S, but it
reads better this way (and one day I hope to get E to use C
interrupt returns).

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v5 16/21] powerpc/64: move account_stolen_time into its own function
From: Nicholas Piggin @ 2021-01-14  3:51 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
In-Reply-To: <fa2c9884-81c8-d4cc-a33d-3aacdacfafd7@csgroup.eu>

Excerpts from Christophe Leroy's message of January 14, 2021 12:59 am:
> 
> 
> Le 13/01/2021 à 08:32, Nicholas Piggin a écrit :
>> This will be used by interrupt entry as well.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   arch/powerpc/include/asm/cputime.h | 15 +++++++++++++++
>>   arch/powerpc/kernel/syscall_64.c   | 10 +---------
>>   2 files changed, 16 insertions(+), 9 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
>> index ed75d1c318e3..3f61604e1fcf 100644
>> --- a/arch/powerpc/include/asm/cputime.h
>> +++ b/arch/powerpc/include/asm/cputime.h
>> @@ -87,6 +87,18 @@ static notrace inline void account_cpu_user_exit(void)
>>   	acct->starttime_user = tb;
>>   }
>>   
>> +static notrace inline void account_stolen_time(void)
>> +{
>> +#ifdef CONFIG_PPC_SPLPAR
>> +	if (IS_ENABLED(CONFIG_VIRT_CPU_ACCOUNTING_NATIVE) &&
> 
> Arent' you already inside a CONFIG_VIRT_CPU_ACCOUNTING_NATIVE section ?

Yes, will fix.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v5 17/21] powerpc/64: entry cpu time accounting in C
From: Nicholas Piggin @ 2021-01-14  3:58 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
In-Reply-To: <3304762a-d6d6-df70-5546-e7e4dc2d3380@csgroup.eu>

Excerpts from Christophe Leroy's message of January 14, 2021 1:05 am:
> 
> 
> Le 13/01/2021 à 08:32, Nicholas Piggin a écrit :
>> There is no need for this to be in asm, use the new intrrupt entry wrapper.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   arch/powerpc/include/asm/interrupt.h |  7 +++++++
>>   arch/powerpc/include/asm/ppc_asm.h   | 24 ------------------------
>>   arch/powerpc/kernel/exceptions-64e.S |  1 -
>>   arch/powerpc/kernel/exceptions-64s.S |  5 -----
>>   4 files changed, 7 insertions(+), 30 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
>> index 6eba7c489753..e278dffe7657 100644
>> --- a/arch/powerpc/include/asm/interrupt.h
>> +++ b/arch/powerpc/include/asm/interrupt.h
>> @@ -4,6 +4,7 @@
>>   
>>   #include <linux/context_tracking.h>
>>   #include <linux/hardirq.h>
>> +#include <asm/cputime.h>
>>   #include <asm/ftrace.h>
>>   
>>   struct interrupt_state {
>> @@ -25,6 +26,9 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrup
>>   	if (user_mode(regs)) {
>>   		CT_WARN_ON(ct_state() != CONTEXT_USER);
>>   		user_exit_irqoff();
>> +
>> +		account_cpu_user_entry();
> 
> Are interrupts still disabled here ? Otherwise you risk getting IRQ time accounted on user.

Yes. Only the handlers themselves will enable interrupts, with
interrupt_cond_local_irq_enable.

> 
>> +		account_stolen_time();
>>   	} else {
>>   		/*
>>   		 * CT_WARN_ON comes here via program_check_exception,
>> @@ -38,6 +42,9 @@ static inline void interrupt_enter_prepare(struct pt_regs *regs, struct interrup
>>   #ifdef CONFIG_PPC_BOOK3E_64
>>   	state->ctx_state = exception_enter();
>>   #endif
>> +
>> +	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) && user_mode(regs))
>> +		account_cpu_user_entry();
> 
> Isn't this interrupt_enter_prepare() function called also on PPC32 ?
> Have you removed the ACCOUNT_CPU_USER_ENTRY() from entry_32.S ?

Yes and no, I was thinking of 64 only :( I can make that for 64E. 32-bit
could be another patch if you want it.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v5 18/21] powerpc: move NMI entry/exit code into wrapper
From: Nicholas Piggin @ 2021-01-14  4:00 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev
In-Reply-To: <b8006c6e-0828-66b7-067f-cbfd0ddf99a1@csgroup.eu>

Excerpts from Christophe Leroy's message of January 14, 2021 1:13 am:
> 
> 
> Le 13/01/2021 à 08:32, Nicholas Piggin a écrit :
>> This moves the common NMI entry and exit code into the interrupt handler
>> wrappers.
>> 
>> This changes the behaviour of soft-NMI (watchdog) and HMI interrupts, and
>> also MCE interrupts on 64e, by adding missing parts of the NMI entry to
>> them.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   arch/powerpc/include/asm/interrupt.h | 24 ++++++++++++++++
>>   arch/powerpc/kernel/mce.c            | 11 --------
>>   arch/powerpc/kernel/traps.c          | 42 +++++-----------------------
>>   arch/powerpc/kernel/watchdog.c       | 10 +++----
>>   4 files changed, 35 insertions(+), 52 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
>> index e278dffe7657..01192e213f9a 100644
>> --- a/arch/powerpc/include/asm/interrupt.h
>> +++ b/arch/powerpc/include/asm/interrupt.h
>> @@ -95,14 +95,38 @@ static inline void interrupt_async_exit_prepare(struct pt_regs *regs, struct int
>>   }
>>   
>>   struct interrupt_nmi_state {
>> +#ifdef CONFIG_PPC64
>> +	u8 ftrace_enabled;
>> +#endif
>>   };
>>   
>>   static inline void interrupt_nmi_enter_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
>>   {
>> +#ifdef CONFIG_PPC64
>> +	state->ftrace_enabled = this_cpu_get_ftrace_enabled();
>> +	this_cpu_set_ftrace_enabled(0);
>> +#endif
>> +
>> +	/*
>> +	 * Do not use nmi_enter() for pseries hash guest taking a real-mode
>> +	 * NMI because not everything it touches is within the RMA limit.
>> +	 */
>> +	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
>> +			!firmware_has_feature(FW_FEATURE_LPAR) ||
>> +			radix_enabled() || (mfmsr() & MSR_DR))
>> +		nmi_enter();
>>   }
>>   
>>   static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
>>   {
>> +	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
>> +			!firmware_has_feature(FW_FEATURE_LPAR) ||
>> +			radix_enabled() || (mfmsr() & MSR_DR))
>> +		nmi_exit();
>> +
>> +#ifdef CONFIG_PPC64
>> +	this_cpu_set_ftrace_enabled(state->ftrace_enabled);
>> +#endif
>>   }
>>   
>>   /**
>> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
>> index 54269947113d..51456217ec40 100644
>> --- a/arch/powerpc/kernel/mce.c
>> +++ b/arch/powerpc/kernel/mce.c
>> @@ -592,12 +592,6 @@ EXPORT_SYMBOL_GPL(machine_check_print_event_info);
>>   DEFINE_INTERRUPT_HANDLER_NMI(machine_check_early)
>>   {
>>   	long handled = 0;
>> -	u8 ftrace_enabled = this_cpu_get_ftrace_enabled();
>> -
>> -	this_cpu_set_ftrace_enabled(0);
>> -	/* Do not use nmi_enter/exit for pseries hpte guest */
>> -	if (radix_enabled() || !firmware_has_feature(FW_FEATURE_LPAR))
>> -		nmi_enter();
>>   
>>   	hv_nmi_check_nonrecoverable(regs);
>>   
>> @@ -607,11 +601,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(machine_check_early)
>>   	if (ppc_md.machine_check_early)
>>   		handled = ppc_md.machine_check_early(regs);
>>   
>> -	if (radix_enabled() || !firmware_has_feature(FW_FEATURE_LPAR))
>> -		nmi_exit();
>> -
>> -	this_cpu_set_ftrace_enabled(ftrace_enabled);
>> -
>>   	return handled;
>>   }
>>   
>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> index b4f23e871a68..43d23232ef5c 100644
>> --- a/arch/powerpc/kernel/traps.c
>> +++ b/arch/powerpc/kernel/traps.c
>> @@ -435,11 +435,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(system_reset_exception)
>>   {
>>   	unsigned long hsrr0, hsrr1;
>>   	bool saved_hsrrs = false;
>> -	u8 ftrace_enabled = this_cpu_get_ftrace_enabled();
>> -
>> -	this_cpu_set_ftrace_enabled(0);
>> -
>> -	nmi_enter();
>>   
>>   	/*
>>   	 * System reset can interrupt code where HSRRs are live and MSR[RI]=1.
>> @@ -511,10 +506,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(system_reset_exception)
>>   		mtspr(SPRN_HSRR1, hsrr1);
>>   	}
>>   
>> -	nmi_exit();
>> -
>> -	this_cpu_set_ftrace_enabled(ftrace_enabled);
>> -
>>   	/* What should we do here? We could issue a shutdown or hard reset. */
>>   
>>   	return 0;
>> @@ -792,6 +783,12 @@ int machine_check_generic(struct pt_regs *regs)
>>   #endif /* everything else */
>>   
>>   
>> +/*
>> + * BOOK3S_64 does not call this handler as a non-maskable interrupt
>> + * (it uses its own early real-mode handler to handle the MCE proper
>> + * and then raises irq_work to call this handler when interrupts are
>> + * enabled).
>> + */
>>   #ifdef CONFIG_PPC_BOOK3S_64
>>   DEFINE_INTERRUPT_HANDLER_ASYNC(machine_check_exception)
>>   #else
>> @@ -800,20 +797,6 @@ DEFINE_INTERRUPT_HANDLER_NMI(machine_check_exception)
>>   {
>>   	int recover = 0;
>>   
>> -	/*
>> -	 * BOOK3S_64 does not call this handler as a non-maskable interrupt
>> -	 * (it uses its own early real-mode handler to handle the MCE proper
>> -	 * and then raises irq_work to call this handler when interrupts are
>> -	 * enabled).
>> -	 *
>> -	 * This is silly. The BOOK3S_64 should just call a different function
>> -	 * rather than expecting semantics to magically change. Something
>> -	 * like 'non_nmi_machine_check_exception()', perhaps?
>> -	 */
>> -	const bool nmi = !IS_ENABLED(CONFIG_PPC_BOOK3S_64);
>> -
>> -	if (nmi) nmi_enter();
>> -
>>   	__this_cpu_inc(irq_stat.mce_exceptions);
>>   
>>   	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
>> @@ -838,24 +821,17 @@ DEFINE_INTERRUPT_HANDLER_NMI(machine_check_exception)
>>   	if (check_io_access(regs))
>>   		goto bail;
>>   
>> -	if (nmi) nmi_exit();
>> -
> 
> IIRC, not doing the nmi_exit() before the die() is problematic.
> 
> See 
> https://github.com/linuxppc/linux/commit/daf00ae71dad8aa05965713c62558aeebf2df48e#diff-70077148c383252ca949063eaf1b0250620e4607b43f4ef3fd2d8f448a83ab0a

Yes good catch. Maybe putting it into a nmi_die() or having die 
explicitly check for the NMI case might be the go.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH] perf tools: Resolve symbols against debug file first
From: Namhyung Kim @ 2021-01-14  4:54 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Mark Rutland, Peter Zijlstra, linuxppc-dev, linux-kernel,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Ingo Molnar,
	Jiri Olsa
In-Reply-To: <388a2e21-14ee-4609-84d0-c8824154c015@suse.cz>

Hi both of Jiri,

On Wed, Jan 13, 2021 at 8:43 PM Jiri Slaby <jslaby@suse.cz> wrote:
>
> On 13. 01. 21, 11:46, Jiri Olsa wrote:
> > On Wed, Jan 13, 2021 at 09:01:28AM +0100, Jiri Slaby wrote:
> >> With LTO, there are symbols like these:
> >> /usr/lib/debug/usr/lib64/libantlr4-runtime.so.4.8-4.8-1.4.x86_64.debug
> >>   10305: 0000000000955fa4     0 NOTYPE  LOCAL  DEFAULT   29 Predicate.cpp.2bc410e7
> >>
> >> This comes from a runtime/debug split done by the standard way:
> >> objcopy --only-keep-debug $runtime $debug
> >> objcopy --add-gnu-debuglink=$debugfn -R .comment -R .GCC.command.line --strip-all $runtime
> >>
> >> perf currently cannot resolve such symbols (relicts of LTO), as section
> >> 29 exists only in the debug file (29 is .debug_info). And perf resolves
> >> symbols only against runtime file. This results in all symbols from such
> >> a library being unresolved:
> >>       0.38%  main2    libantlr4-runtime.so.4.8  [.] 0x00000000000671e0
> >>
> >> So try resolving against the debug file first. And only if it fails (the
> >> section has NOBITS set), try runtime file. We can do this, as "objcopy
> >> --only-keep-debug" per documentation preserves all sections, but clears
> >> data of some of them (the runtime ones) and marks them as NOBITS.
> >>
> >> The correct result is now:
> >>       0.38%  main2    libantlr4-runtime.so.4.8  [.] antlr4::IntStream::~IntStream
> >>
> >> Note that these LTO symbols are properly skipped anyway as they belong
> >> neither to *text* nor to *data* (is_label && !elf_sec__filter(&shdr,
> >> secstrs) is true).
> >>
> >> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> Cc: Ingo Molnar <mingo@redhat.com>
> >> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> >> Cc: Mark Rutland <mark.rutland@arm.com>
> >> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> >> Cc: Jiri Olsa <jolsa@redhat.com>
> >> Cc: Namhyung Kim <namhyung@kernel.org>
> >> ---
> >>   tools/perf/util/symbol-elf.c | 10 +++++++++-
> >>   1 file changed, 9 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> >> index f3577f7d72fe..a31b716fa61c 100644
> >> --- a/tools/perf/util/symbol-elf.c
> >> +++ b/tools/perf/util/symbol-elf.c
> >> @@ -1226,12 +1226,20 @@ int dso__load_sym(struct dso *dso, struct map *map, struct symsrc *syms_ss,
> >>              if (sym.st_shndx == SHN_ABS)
> >>                      continue;
> >>
> >> -            sec = elf_getscn(runtime_ss->elf, sym.st_shndx);
> >> +            sec = elf_getscn(syms_ss->elf, sym.st_shndx);
> >>              if (!sec)
> >>                      goto out_elf_end;
> >
> > we iterate symbols from syms_ss, so the fix seems to be correct
> > to call elf_getscn on syms_ss, not on runtime_ss as we do now
> >
> > I'd think this worked only when runtime_ss == syms_ss
>
> No, because the headers are copied 1:1 from runtime_ss to syms_ss. And
> runtime_ss is then stripped, so only .debug* sections are removed there.
> (And syms_ss's are set as NOBITS.)
>
> We iterated .debug* sections in syms_ss and used runtime_ss section
> _headers_ only to adjust symbols (sometimes). That worked.

It seems PPC has an opd section only in the runtime_ss and that's why
we use it for section headers.

>
> >>              gelf_getshdr(sec, &shdr);
> >>
> >> +            if (shdr.sh_type == SHT_NOBITS) {
> >> +                    sec = elf_getscn(runtime_ss->elf, sym.st_shndx);
> >> +                    if (!sec)
> >> +                            goto out_elf_end;
> >> +
> >> +                    gelf_getshdr(sec, &shdr);
> >> +            }
> >
> > is that fallback necessary? the symbol is from syms_ss
>
> Provided the above, we don't need the section data here, only headers,
> so the NOBITS test is superfluous and the fallback shouldn't be needed.
> Let me test it.

We need to talk to PPC folks like I said.  Or maybe we can change the
default ss depending on the arch.

Thanks,
Namhyung

^ permalink raw reply

* Re: [PATCH v2 0/7] Rid W=1 warnings in Ethernet
From: Lee Jones @ 2021-01-14  8:33 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Paul Durrant, Kurt Kanzenbach, Alexei Starovoitov,
	Gustavo A. R. Silva, Peter Cammaert, Paul Mackerras,
	Sukadev Bhattiprolu, Wei Liu, Daniel Borkmann, John Fastabend,
	Santiago Leon, xen-devel, Grygorii Strashko, Thomas Falcon,
	Jesper Dangaard Brouer, Jens Osterkamp, Rusty Russell,
	Daris A Nevil, Lijun Pan, Ivan Khoronzhuk, Nicolas Pitre,
	Geoff Levand, netdev, linux-kernel, Erik Stahlman, John Allen,
	Utz Bacher, Dany Madden, bpf, linuxppc-dev, David S. Miller,
	Russell King
In-Reply-To: <20210113183551.6551a6a2@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Wed, 13 Jan 2021, Jakub Kicinski wrote:

> On Wed, 13 Jan 2021 16:41:16 +0000 Lee Jones wrote:
> > Resending the stragglers again.                                                                                  
> > 
> > This set is part of a larger effort attempting to clean-up W=1                                                   
> > kernel builds, which are currently overwhelmingly riddled with                                                   
> > niggly little warnings.                                                                                          
> >                                                                                                                  
> > v2:                                                                                                              
> >  - Squashed IBM patches                                                                                      
> >  - Fixed real issue in SMSC
> >  - Added Andrew's Reviewed-by tags on remainder
> 
> Does not apply, please rebase on net-next/master.

These are based on Tuesday's next/master.

I just rebased them now with no issue.

What conflict are you seeing?

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

^ permalink raw reply

* Re: [RFC PATCH v3 2/6] swiotlb: Add restricted DMA pool
From: Claire Chang @ 2021-01-14  9:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: heikki.krogerus, peterz, grant.likely, paulus, Frank Rowand,
	mingo, Marek Szyprowski, sstabellini, Saravana Kannan,
	list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
	rafael.j.wysocki, Bartosz Golaszewski, xen-devel, Thierry Reding,
	linux-devicetree, will, Konrad Rzeszutek Wilk, dan.j.williams,
	linuxppc-dev, Rob Herring, boris.ostrovsky, Andy Shevchenko,
	jgross, Nicolas Boichat, Greg KH, rdunlap, lkml, Tomasz Figa,
	list@263.net:IOMMU DRIVERS <iommu@lists.linux-foundation.org>, Joerg Roedel <joro@8bytes.org>, ,
	xypron.glpk, Robin Murphy, bauerman
In-Reply-To: <20210113124209.GA1383@lst.de>

On Wed, Jan 13, 2021 at 8:42 PM Christoph Hellwig <hch@lst.de> wrote:
>
> > +#ifdef CONFIG_SWIOTLB
> > +     struct io_tlb_mem       *dma_io_tlb_mem;
> >  #endif
>
> Please add a new config option for this code instead of always building
> it when swiotlb is enabled.
>
> > +static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
> > +                                size_t size)
>
> Can you split the refactoring in swiotlb.c into one or more prep
> patches?
>
> > +static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
> > +                                 struct device *dev)
> > +{
> > +     struct io_tlb_mem *mem = rmem->priv;
> > +     int ret;
> > +
> > +     if (dev->dma_io_tlb_mem)
> > +             return -EBUSY;
> > +
> > +     if (!mem) {
> > +             mem = kzalloc(sizeof(*mem), GFP_KERNEL);
> > +             if (!mem)
> > +                     return -ENOMEM;
>
> What is the calling convention here that allows for a NULL and non-NULL
> private data?

Since multiple devices can share the same pool, the private data,
io_tlb_mem struct, will be initialized by the first device attached to
it.
This is similar to rmem_dma_device_init() in kernel/dma/coherent.c.
I'll add a comment for it in next version.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox