LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] net/ethernet/freescale: Fix incorrect IS_ERR_VALUE macro usages
From: Li Yang @ 2020-11-25 17:05 UTC (permalink / raw)
  To: liwei (GF)
  Cc: Netdev, lkml, Timur Tabi, Paul Gortmaker, guohanjun,
	Jakub Kicinski, linuxppc-dev, David S. Miller, Zhao Qiang
In-Reply-To: <2a03fb50-7900-d6e9-bbd6-0ad45b003657@huawei.com>

On Tue, Nov 24, 2020 at 8:00 PM liwei (GF) <liwei391@huawei.com> wrote:
>
> Hi Yang,
>
> On 2020/11/25 6:13, Li Yang wrote:
> > On Tue, Nov 24, 2020 at 3:44 PM Li Yang <leoyang.li@nxp.com> wrote:
> >>
> >> On Tue, Nov 24, 2020 at 12:24 AM Wei Li <liwei391@huawei.com> wrote:
> >>>
> >>> IS_ERR_VALUE macro should be used only with unsigned long type.
> >>> Especially it works incorrectly with unsigned shorter types on
> >>> 64bit machines.
> >>
> >> This is truly a problem for the driver to run on 64-bit architectures.
> >> But from an earlier discussion
> >> https://patchwork.kernel.org/project/linux-kbuild/patch/1464384685-347275-1-git-send-email-arnd@arndb.de/,
> >> the preferred solution would be removing the IS_ERR_VALUE() usage or
> >> make the values to be unsigned long.
> >>
> >> It looks like we are having a bigger problem with the 64-bit support
> >> for the driver that the offset variables can also be real pointers
> >> which cannot be held with 32-bit data types(when uf_info->bd_mem_part
> >> == MEM_PART_SYSTEM).  So actually we have to change these offsets to
> >> unsigned long, otherwise we are having more serious issues on 64-bit
> >> systems.  Are you willing to make such changes or you want us to deal
> >> with it?
> >
> > Well, it looks like this hardware block was never integrated on a
> > 64-bit SoC and will very likely to keep so.  So probably we can keep
> > the driver 32-bit only.  It is currently limited to PPC32 in Kconfig,
> > how did you build it for 64-bit?
> >
> >>
>
> Thank you for providing the earlier discussion archive. In fact, this
> issue is detected by our static analysis tool.

Thanks for the effort, but this probably is a false positive for the
static analysis tool as the 64-bit case is not buildable.

>
> From my view, there is no harm to fix these potential misuses. But if you
> really have decided to keep the driver 32-bit only, please just ingore this patch.

It is not an easy task to add proper 64-bit support, so probably we
just keep it 32-bit only for now.  Thanks for the patch anyway.

Regards,
Leo

>
> Thanks,
> Wei
>
> >>>
> >>> Fixes: 4c35630ccda5 ("[POWERPC] Change rheap functions to use ulongs instead of pointers")
> >>> Signed-off-by: Wei Li <liwei391@huawei.com>
> >>> ---
> >>>  drivers/net/ethernet/freescale/ucc_geth.c | 30 +++++++++++------------
> >>>  1 file changed, 15 insertions(+), 15 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ethernet/freescale/ucc_geth.c b/drivers/net/ethernet/freescale/ucc_geth.c
> >>> index 714b501be7d0..8656d9be256a 100644
> >>> --- a/drivers/net/ethernet/freescale/ucc_geth.c
> >>> +++ b/drivers/net/ethernet/freescale/ucc_geth.c
> >>> @@ -286,7 +286,7 @@ static int fill_init_enet_entries(struct ucc_geth_private *ugeth,
> >>>                 else {
> >>>                         init_enet_offset =
> >>>                             qe_muram_alloc(thread_size, thread_alignment);
> >>> -                       if (IS_ERR_VALUE(init_enet_offset)) {
> >>> +                       if (IS_ERR_VALUE((unsigned long)(int)init_enet_offset)) {
> >>>                                 if (netif_msg_ifup(ugeth))
> >>>                                         pr_err("Can not allocate DPRAM memory\n");
> >>>                                 qe_put_snum((u8) snum);
> >>> @@ -2223,7 +2223,7 @@ static int ucc_geth_alloc_tx(struct ucc_geth_private *ugeth)
> >>>                         ugeth->tx_bd_ring_offset[j] =
> >>>                             qe_muram_alloc(length,
> >>>                                            UCC_GETH_TX_BD_RING_ALIGNMENT);
> >>> -                       if (!IS_ERR_VALUE(ugeth->tx_bd_ring_offset[j]))
> >>> +                       if (!IS_ERR_VALUE((unsigned long)(int)ugeth->tx_bd_ring_offset[j]))
> >>>                                 ugeth->p_tx_bd_ring[j] =
> >>>                                     (u8 __iomem *) qe_muram_addr(ugeth->
> >>>                                                          tx_bd_ring_offset[j]);
> >>> @@ -2300,7 +2300,7 @@ static int ucc_geth_alloc_rx(struct ucc_geth_private *ugeth)
> >>>                         ugeth->rx_bd_ring_offset[j] =
> >>>                             qe_muram_alloc(length,
> >>>                                            UCC_GETH_RX_BD_RING_ALIGNMENT);
> >>> -                       if (!IS_ERR_VALUE(ugeth->rx_bd_ring_offset[j]))
> >>> +                       if (!IS_ERR_VALUE((unsigned long)(int)ugeth->rx_bd_ring_offset[j]))
> >>>                                 ugeth->p_rx_bd_ring[j] =
> >>>                                     (u8 __iomem *) qe_muram_addr(ugeth->
> >>>                                                          rx_bd_ring_offset[j]);
> >>> @@ -2510,7 +2510,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>         ugeth->tx_glbl_pram_offset =
> >>>             qe_muram_alloc(sizeof(struct ucc_geth_tx_global_pram),
> >>>                            UCC_GETH_TX_GLOBAL_PRAM_ALIGNMENT);
> >>> -       if (IS_ERR_VALUE(ugeth->tx_glbl_pram_offset)) {
> >>> +       if (IS_ERR_VALUE((unsigned long)(int)ugeth->tx_glbl_pram_offset)) {
> >>>                 if (netif_msg_ifup(ugeth))
> >>>                         pr_err("Can not allocate DPRAM memory for p_tx_glbl_pram\n");
> >>>                 return -ENOMEM;
> >>> @@ -2530,7 +2530,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>                            sizeof(struct ucc_geth_thread_data_tx) +
> >>>                            32 * (numThreadsTxNumerical == 1),
> >>>                            UCC_GETH_THREAD_DATA_ALIGNMENT);
> >>> -       if (IS_ERR_VALUE(ugeth->thread_dat_tx_offset)) {
> >>> +       if (IS_ERR_VALUE((unsigned long)(int)ugeth->thread_dat_tx_offset)) {
> >>>                 if (netif_msg_ifup(ugeth))
> >>>                         pr_err("Can not allocate DPRAM memory for p_thread_data_tx\n");
> >>>                 return -ENOMEM;
> >>> @@ -2557,7 +2557,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>             qe_muram_alloc(ug_info->numQueuesTx *
> >>>                            sizeof(struct ucc_geth_send_queue_qd),
> >>>                            UCC_GETH_SEND_QUEUE_QUEUE_DESCRIPTOR_ALIGNMENT);
> >>> -       if (IS_ERR_VALUE(ugeth->send_q_mem_reg_offset)) {
> >>> +       if (IS_ERR_VALUE((unsigned long)(int)ugeth->send_q_mem_reg_offset)) {
> >>>                 if (netif_msg_ifup(ugeth))
> >>>                         pr_err("Can not allocate DPRAM memory for p_send_q_mem_reg\n");
> >>>                 return -ENOMEM;
> >>> @@ -2597,7 +2597,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>                 ugeth->scheduler_offset =
> >>>                     qe_muram_alloc(sizeof(struct ucc_geth_scheduler),
> >>>                                    UCC_GETH_SCHEDULER_ALIGNMENT);
> >>> -               if (IS_ERR_VALUE(ugeth->scheduler_offset)) {
> >>> +               if (IS_ERR_VALUE((unsigned long)(int)ugeth->scheduler_offset)) {
> >>>                         if (netif_msg_ifup(ugeth))
> >>>                                 pr_err("Can not allocate DPRAM memory for p_scheduler\n");
> >>>                         return -ENOMEM;
> >>> @@ -2644,7 +2644,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>                     qe_muram_alloc(sizeof
> >>>                                    (struct ucc_geth_tx_firmware_statistics_pram),
> >>>                                    UCC_GETH_TX_STATISTICS_ALIGNMENT);
> >>> -               if (IS_ERR_VALUE(ugeth->tx_fw_statistics_pram_offset)) {
> >>> +               if (IS_ERR_VALUE((unsigned long)(int)ugeth->tx_fw_statistics_pram_offset)) {
> >>>                         if (netif_msg_ifup(ugeth))
> >>>                                 pr_err("Can not allocate DPRAM memory for p_tx_fw_statistics_pram\n");
> >>>                         return -ENOMEM;
> >>> @@ -2681,7 +2681,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>         ugeth->rx_glbl_pram_offset =
> >>>             qe_muram_alloc(sizeof(struct ucc_geth_rx_global_pram),
> >>>                            UCC_GETH_RX_GLOBAL_PRAM_ALIGNMENT);
> >>> -       if (IS_ERR_VALUE(ugeth->rx_glbl_pram_offset)) {
> >>> +       if (IS_ERR_VALUE((unsigned long)(int)ugeth->rx_glbl_pram_offset)) {
> >>>                 if (netif_msg_ifup(ugeth))
> >>>                         pr_err("Can not allocate DPRAM memory for p_rx_glbl_pram\n");
> >>>                 return -ENOMEM;
> >>> @@ -2700,7 +2700,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>             qe_muram_alloc(numThreadsRxNumerical *
> >>>                            sizeof(struct ucc_geth_thread_data_rx),
> >>>                            UCC_GETH_THREAD_DATA_ALIGNMENT);
> >>> -       if (IS_ERR_VALUE(ugeth->thread_dat_rx_offset)) {
> >>> +       if (IS_ERR_VALUE((unsigned long)(int)ugeth->thread_dat_rx_offset)) {
> >>>                 if (netif_msg_ifup(ugeth))
> >>>                         pr_err("Can not allocate DPRAM memory for p_thread_data_rx\n");
> >>>                 return -ENOMEM;
> >>> @@ -2721,7 +2721,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>                     qe_muram_alloc(sizeof
> >>>                                    (struct ucc_geth_rx_firmware_statistics_pram),
> >>>                                    UCC_GETH_RX_STATISTICS_ALIGNMENT);
> >>> -               if (IS_ERR_VALUE(ugeth->rx_fw_statistics_pram_offset)) {
> >>> +               if (IS_ERR_VALUE((unsigned long)(int)ugeth->rx_fw_statistics_pram_offset)) {
> >>>                         if (netif_msg_ifup(ugeth))
> >>>                                 pr_err("Can not allocate DPRAM memory for p_rx_fw_statistics_pram\n");
> >>>                         return -ENOMEM;
> >>> @@ -2741,7 +2741,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>             qe_muram_alloc(ug_info->numQueuesRx *
> >>>                            sizeof(struct ucc_geth_rx_interrupt_coalescing_entry)
> >>>                            + 4, UCC_GETH_RX_INTERRUPT_COALESCING_ALIGNMENT);
> >>> -       if (IS_ERR_VALUE(ugeth->rx_irq_coalescing_tbl_offset)) {
> >>> +       if (IS_ERR_VALUE((unsigned long)(int)ugeth->rx_irq_coalescing_tbl_offset)) {
> >>>                 if (netif_msg_ifup(ugeth))
> >>>                         pr_err("Can not allocate DPRAM memory for p_rx_irq_coalescing_tbl\n");
> >>>                 return -ENOMEM;
> >>> @@ -2807,7 +2807,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>                            (sizeof(struct ucc_geth_rx_bd_queues_entry) +
> >>>                             sizeof(struct ucc_geth_rx_prefetched_bds)),
> >>>                            UCC_GETH_RX_BD_QUEUES_ALIGNMENT);
> >>> -       if (IS_ERR_VALUE(ugeth->rx_bd_qs_tbl_offset)) {
> >>> +       if (IS_ERR_VALUE((unsigned long)(int)ugeth->rx_bd_qs_tbl_offset)) {
> >>>                 if (netif_msg_ifup(ugeth))
> >>>                         pr_err("Can not allocate DPRAM memory for p_rx_bd_qs_tbl\n");
> >>>                 return -ENOMEM;
> >>> @@ -2892,7 +2892,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>                 ugeth->exf_glbl_param_offset =
> >>>                     qe_muram_alloc(sizeof(struct ucc_geth_exf_global_pram),
> >>>                 UCC_GETH_RX_EXTENDED_FILTERING_GLOBAL_PARAMETERS_ALIGNMENT);
> >>> -               if (IS_ERR_VALUE(ugeth->exf_glbl_param_offset)) {
> >>> +               if (IS_ERR_VALUE((unsigned long)(int)ugeth->exf_glbl_param_offset)) {
> >>>                         if (netif_msg_ifup(ugeth))
> >>>                                 pr_err("Can not allocate DPRAM memory for p_exf_glbl_param\n");
> >>>                         return -ENOMEM;
> >>> @@ -3026,7 +3026,7 @@ static int ucc_geth_startup(struct ucc_geth_private *ugeth)
> >>>
> >>>         /* Allocate InitEnet command parameter structure */
> >>>         init_enet_pram_offset = qe_muram_alloc(sizeof(struct ucc_geth_init_pram), 4);
> >>> -       if (IS_ERR_VALUE(init_enet_pram_offset)) {
> >>> +       if (IS_ERR_VALUE((unsigned long)(int)init_enet_pram_offset)) {
> >>>                 if (netif_msg_ifup(ugeth))
> >>>                         pr_err("Can not allocate DPRAM memory for p_init_enet_pram\n");
> >>>                 return -ENOMEM;
> >>> --
> >>> 2.17.1
> >>>

^ permalink raw reply

* Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
From: Marc Zyngier @ 2020-11-25 16:42 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Michael S . Tsirkin, linux-pci, Denis Kirjanov, Greg Kurz,
	linux-kernel, linux-block, Paul Mackerras, Thomas Gleixner,
	linuxppc-dev, Christoph Hellwig
In-Reply-To: <7184880b-0351-ae18-d2e1-fab7b79fc864@redhat.com>

On 2020-11-25 16:24, Laurent Vivier wrote:
> On 25/11/2020 17:05, Denis Kirjanov wrote:
>> On 11/25/20, Laurent Vivier <lvivier@redhat.com> wrote:
>>> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>>> 
>>> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ 
>>> affinity")
>>> this is broken on pseries.
>> 
>> Please add "Fixes" tag.
> 
> In fact, the code in commit 0d9f0a52c8b9f is correct.
> 
> The problem is with MSI/X irq affinity and pseries. So this patch
> fixes more than virtio_scsi. I put this information because this
> commit allows to clearly show the problem. Perhaps I should remove
> this line in fact?

This patch does not fix virtio_scsi at all, which as you noticed, is
correct. It really fixes the PPC MSI setup, which is starting to show
its age. So getting rid of the reference seems like the right thing to 
do.

I'm also not keen on the BugId thing. It should really be a lore link.
I also cannot find any such tag in the kernel, nor is it a documented
practice. The last reference to a Bugzilla entry seems to have happened
with 786b5219081ff16 (five years ago).

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply

* Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
From: Laurent Vivier @ 2020-11-25 16:24 UTC (permalink / raw)
  To: Denis Kirjanov
  Cc: Marc Zyngier, Michael S . Tsirkin, linux-pci, Greg Kurz,
	linux-kernel, linux-block, Paul Mackerras, Thomas Gleixner,
	linuxppc-dev, Christoph Hellwig
In-Reply-To: <CAOJe8K1Q7sGf67bdj-2Mthkj4XNR4fOSskV1dyh62AdzefhpAQ@mail.gmail.com>

On 25/11/2020 17:05, Denis Kirjanov wrote:
> On 11/25/20, Laurent Vivier <lvivier@redhat.com> wrote:
>> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>>
>> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
>> this is broken on pseries.
> 
> Please add "Fixes" tag.

In fact, the code in commit 0d9f0a52c8b9f is correct.

The problem is with MSI/X irq affinity and pseries. So this patch fixes more than
virtio_scsi. I put this information because this commit allows to clearly show the
problem. Perhaps I should remove this line in fact?

Thanks,
Laurent

> 
> Thanks!
> 
>>
>> The affinity is correctly computed in msi_desc but this is not applied
>> to the system IRQs.
>>
>> It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
>> lost at this point and never passed to irq_domain_alloc_descs()
>> (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
>> because irq_create_mapping() doesn't take an affinity parameter.
>>
>> As the previous patch has added the affinity parameter to
>> irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs()
>> to irq_domain_alloc_descs().
>>
>> With this change, the virtqueues are correctly dispatched between the CPUs
>> on pseries.
>>
>> BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>> Reviewed-by: Greg Kurz <groug@kaod.org>
>> ---
>>  arch/powerpc/platforms/pseries/msi.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/msi.c
>> b/arch/powerpc/platforms/pseries/msi.c
>> index 133f6adcb39c..b3ac2455faad 100644
>> --- a/arch/powerpc/platforms/pseries/msi.c
>> +++ b/arch/powerpc/platforms/pseries/msi.c
>> @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int
>> nvec_in, int type)
>>  			return hwirq;
>>  		}
>>
>> -		virq = irq_create_mapping(NULL, hwirq);
>> +		virq = irq_create_mapping_affinity(NULL, hwirq,
>> +						   entry->affinity);
>>
>>  		if (!virq) {
>>  			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
>> --
>> 2.28.0
>>
>>
> 


^ permalink raw reply

* Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
From: Greg Kurz @ 2020-11-25 12:45 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Michael S . Tsirkin, linux-pci, linux-kernel, linux-block,
	Paul Mackerras, Marc Zyngier, Thomas Gleixner, linuxppc-dev,
	Christoph Hellwig
In-Reply-To: <20201125111657.1141295-2-lvivier@redhat.com>

On Wed, 25 Nov 2020 12:16:56 +0100
Laurent Vivier <lvivier@redhat.com> wrote:

> This function adds an affinity parameter to irq_create_mapping().
> This parameter is needed to pass it to irq_domain_alloc_descs().
> 
> irq_create_mapping() is a wrapper around irq_create_mapping_affinity()
> to pass NULL for the affinity parameter.
> 
> No functional change.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>

>  include/linux/irqdomain.h | 12 ++++++++++--
>  kernel/irq/irqdomain.c    | 13 ++++++++-----
>  2 files changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
> index 71535e87109f..ea5a337e0f8b 100644
> --- a/include/linux/irqdomain.h
> +++ b/include/linux/irqdomain.h
> @@ -384,11 +384,19 @@ extern void irq_domain_associate_many(struct irq_domain *domain,
>  extern void irq_domain_disassociate(struct irq_domain *domain,
>  				    unsigned int irq);
>  
> -extern unsigned int irq_create_mapping(struct irq_domain *host,
> -				       irq_hw_number_t hwirq);
> +extern unsigned int irq_create_mapping_affinity(struct irq_domain *host,
> +				      irq_hw_number_t hwirq,
> +				      const struct irq_affinity_desc *affinity);
>  extern unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec);
>  extern void irq_dispose_mapping(unsigned int virq);
>  
> +static inline unsigned int irq_create_mapping(struct irq_domain *host,
> +					      irq_hw_number_t hwirq)
> +{
> +	return irq_create_mapping_affinity(host, hwirq, NULL);
> +}
> +
> +
>  /**
>   * irq_linear_revmap() - Find a linux irq from a hw irq number.
>   * @domain: domain owning this hardware interrupt
> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
> index cf8b374b892d..e4ca69608f3b 100644
> --- a/kernel/irq/irqdomain.c
> +++ b/kernel/irq/irqdomain.c
> @@ -624,17 +624,19 @@ unsigned int irq_create_direct_mapping(struct irq_domain *domain)
>  EXPORT_SYMBOL_GPL(irq_create_direct_mapping);
>  
>  /**
> - * irq_create_mapping() - Map a hardware interrupt into linux irq space
> + * irq_create_mapping_affinity() - Map a hardware interrupt into linux irq space
>   * @domain: domain owning this hardware interrupt or NULL for default domain
>   * @hwirq: hardware irq number in that domain space
> + * @affinity: irq affinity
>   *
>   * Only one mapping per hardware interrupt is permitted. Returns a linux
>   * irq number.
>   * If the sense/trigger is to be specified, set_irq_type() should be called
>   * on the number returned from that call.
>   */
> -unsigned int irq_create_mapping(struct irq_domain *domain,
> -				irq_hw_number_t hwirq)
> +unsigned int irq_create_mapping_affinity(struct irq_domain *domain,
> +				       irq_hw_number_t hwirq,
> +				       const struct irq_affinity_desc *affinity)
>  {
>  	struct device_node *of_node;
>  	int virq;
> @@ -660,7 +662,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
>  	}
>  
>  	/* Allocate a virtual interrupt number */
> -	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);
> +	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node),
> +				      affinity);
>  	if (virq <= 0) {
>  		pr_debug("-> virq allocation failed\n");
>  		return 0;
> @@ -676,7 +679,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
>  
>  	return virq;
>  }
> -EXPORT_SYMBOL_GPL(irq_create_mapping);
> +EXPORT_SYMBOL_GPL(irq_create_mapping_affinity);
>  
>  /**
>   * irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs


^ permalink raw reply

* Re: [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
From: Denis Kirjanov @ 2020-11-25 16:05 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Marc Zyngier, Michael S . Tsirkin, linux-pci, Greg Kurz,
	linux-kernel, linux-block, Paul Mackerras, Thomas Gleixner,
	linuxppc-dev, Christoph Hellwig
In-Reply-To: <20201125150932.1150619-3-lvivier@redhat.com>

On 11/25/20, Laurent Vivier <lvivier@redhat.com> wrote:
> With virtio multiqueue, normally each queue IRQ is mapped to a CPU.
>
> But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
> this is broken on pseries.

Please add "Fixes" tag.

Thanks!

>
> The affinity is correctly computed in msi_desc but this is not applied
> to the system IRQs.
>
> It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
> lost at this point and never passed to irq_domain_alloc_descs()
> (see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
> because irq_create_mapping() doesn't take an affinity parameter.
>
> As the previous patch has added the affinity parameter to
> irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs()
> to irq_domain_alloc_descs().
>
> With this change, the virtqueues are correctly dispatched between the CPUs
> on pseries.
>
> BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> Reviewed-by: Greg Kurz <groug@kaod.org>
> ---
>  arch/powerpc/platforms/pseries/msi.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/platforms/pseries/msi.c
> b/arch/powerpc/platforms/pseries/msi.c
> index 133f6adcb39c..b3ac2455faad 100644
> --- a/arch/powerpc/platforms/pseries/msi.c
> +++ b/arch/powerpc/platforms/pseries/msi.c
> @@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int
> nvec_in, int type)
>  			return hwirq;
>  		}
>
> -		virq = irq_create_mapping(NULL, hwirq);
> +		virq = irq_create_mapping_affinity(NULL, hwirq,
> +						   entry->affinity);
>
>  		if (!virq) {
>  			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
> --
> 2.28.0
>
>

^ permalink raw reply

* [PATCH V4 4/5] ocxl: Add mmu notifier
From: Christophe Lombard @ 2020-11-25 15:50 UTC (permalink / raw)
  To: linuxppc-dev, fbarrat, ajd
In-Reply-To: <20201125155013.39955-1-clombard@linux.vnet.ibm.com>

Add invalidate_range mmu notifier, when required (ATSD access of MMIO
registers is available), to initiate TLB invalidation commands.
For the time being, the ATSD0 set of registers is used by default.

The pasid and bdf values have to be configured in the Process Element
Entry.
The PEE must be set up to match the BDF/PASID of the AFU.

Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
 drivers/misc/ocxl/link.c | 62 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 77381dda2c45..129d4eddc4d2 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -2,8 +2,10 @@
 // Copyright 2017 IBM Corp.
 #include <linux/sched/mm.h>
 #include <linux/mutex.h>
+#include <linux/mm.h>
 #include <linux/mm_types.h>
 #include <linux/mmu_context.h>
+#include <linux/mmu_notifier.h>
 #include <asm/copro.h>
 #include <asm/pnv-ocxl.h>
 #include <asm/xive.h>
@@ -33,6 +35,7 @@
 
 #define SPA_PE_VALID		0x80000000
 
+struct ocxl_link;
 
 struct pe_data {
 	struct mm_struct *mm;
@@ -41,6 +44,8 @@ struct pe_data {
 	/* opaque pointer to be passed to the above callback */
 	void *xsl_err_data;
 	struct rcu_head rcu;
+	struct ocxl_link *link;
+	struct mmu_notifier mmu_notifier;
 };
 
 struct spa {
@@ -83,6 +88,8 @@ struct ocxl_link {
 	int domain;
 	int bus;
 	int dev;
+	void __iomem *arva;     /* ATSD register virtual address */
+	spinlock_t atsd_lock;   /* to serialize shootdowns */
 	atomic_t irq_available;
 	struct spa *spa;
 	void *platform_data;
@@ -388,6 +395,7 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l
 	link->bus = dev->bus->number;
 	link->dev = PCI_SLOT(dev->devfn);
 	atomic_set(&link->irq_available, MAX_IRQ_PER_LINK);
+	spin_lock_init(&link->atsd_lock);
 
 	rc = alloc_spa(dev, link);
 	if (rc)
@@ -403,6 +411,13 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l
 	if (rc)
 		goto err_xsl_irq;
 
+	/* if link->arva is not defeined, MMIO registers are not used to
+	 * generate TLB invalidate. PowerBus snooping is enabled.
+	 * Otherwise, PowerBus snooping is disabled. TLB Invalidates are
+	 * initiated using MMIO registers.
+	 */
+	pnv_ocxl_map_lpar(dev, mfspr(SPRN_LPID), 0, &link->arva);
+
 	*out_link = link;
 	return 0;
 
@@ -454,6 +469,11 @@ static void release_xsl(struct kref *ref)
 {
 	struct ocxl_link *link = container_of(ref, struct ocxl_link, ref);
 
+	if (link->arva) {
+		pnv_ocxl_unmap_lpar(link->arva);
+		link->arva = NULL;
+	}
+
 	list_del(&link->list);
 	/* call platform code before releasing data */
 	pnv_ocxl_spa_release(link->platform_data);
@@ -470,6 +490,26 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle)
 }
 EXPORT_SYMBOL_GPL(ocxl_link_release);
 
+static void invalidate_range(struct mmu_notifier *mn,
+			     struct mm_struct *mm,
+			     unsigned long start, unsigned long end)
+{
+	struct pe_data *pe_data = container_of(mn, struct pe_data, mmu_notifier);
+	struct ocxl_link *link = pe_data->link;
+	unsigned long addr, pid, page_size = PAGE_SIZE;
+
+	pid = mm->context.id;
+
+	spin_lock(&link->atsd_lock);
+	for (addr = start; addr < end; addr += page_size)
+		pnv_ocxl_tlb_invalidate(link->arva, pid, addr, page_size);
+	spin_unlock(&link->atsd_lock);
+}
+
+static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
+	.invalidate_range = invalidate_range,
+};
+
 static u64 calculate_cfg_state(bool kernel)
 {
 	u64 state;
@@ -526,6 +566,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 	pe_data->mm = mm;
 	pe_data->xsl_err_cb = xsl_err_cb;
 	pe_data->xsl_err_data = xsl_err_data;
+	pe_data->link = link;
+	pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
 
 	memset(pe, 0, sizeof(struct ocxl_process_element));
 	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
@@ -542,8 +584,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 	 * by the nest MMU. If we have a kernel context, TLBIs are
 	 * already global.
 	 */
-	if (mm)
+	if (mm) {
 		mm_context_add_copro(mm);
+		if (link->arva) {
+			/* Use MMIO registers for the TLB Invalidate
+			 * operations.
+			 */
+			mmu_notifier_register(&pe_data->mmu_notifier, mm);
+		}
+	}
+
 	/*
 	 * Barrier is to make sure PE is visible in the SPA before it
 	 * is used by the device. It also helps with the global TLBI
@@ -674,6 +724,16 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
 		WARN(1, "Couldn't find pe data when removing PE\n");
 	} else {
 		if (pe_data->mm) {
+			if (link->arva) {
+				mmu_notifier_unregister(&pe_data->mmu_notifier,
+							pe_data->mm);
+				spin_lock(&link->atsd_lock);
+				pnv_ocxl_tlb_invalidate(link->arva,
+							pe_data->mm->context.id,
+							0ull,
+							PAGE_SIZE);
+				spin_unlock(&link->atsd_lock);
+			}
 			mm_context_remove_copro(pe_data->mm);
 			mmdrop(pe_data->mm);
 		}
-- 
2.28.0


^ permalink raw reply related

* [PATCH V4 2/5] ocxl: Initiate a TLB invalidate command
From: Christophe Lombard @ 2020-11-25 15:50 UTC (permalink / raw)
  To: linuxppc-dev, fbarrat, ajd
In-Reply-To: <20201125155013.39955-1-clombard@linux.vnet.ibm.com>

When a TLB Invalidate is required for the Logical Partition, the following
sequence has to be performed:

1. Load MMIO ATSD AVA register with the necessary value, if required.
2. Write the MMIO ATSD launch register to initiate the TLB Invalidate
command.
3. Poll the MMIO ATSD status register to determine when the TLB Invalidate
   has been completed.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pnv-ocxl.h   | 51 ++++++++++++++++++++
 arch/powerpc/platforms/powernv/ocxl.c | 69 +++++++++++++++++++++++++++
 2 files changed, 120 insertions(+)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index 60c3c74427d9..9acd1fbf1197 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -3,12 +3,59 @@
 #ifndef _ASM_PNV_OCXL_H
 #define _ASM_PNV_OCXL_H
 
+#include <linux/bitfield.h>
 #include <linux/pci.h>
 
 #define PNV_OCXL_TL_MAX_TEMPLATE        63
 #define PNV_OCXL_TL_BITS_PER_RATE       4
 #define PNV_OCXL_TL_RATE_BUF_SIZE       ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
 
+#define PNV_OCXL_ATSD_TIMEOUT		1
+
+/* TLB Management Instructions */
+#define PNV_OCXL_ATSD_LNCH		0x00
+/* Radix Invalidate */
+#define   PNV_OCXL_ATSD_LNCH_R		PPC_BIT(0)
+/* Radix Invalidation Control
+ * 0b00 Just invalidate TLB.
+ * 0b01 Invalidate just Page Walk Cache.
+ * 0b10 Invalidate TLB, Page Walk Cache, and any
+ * caching of Partition and Process Table Entries.
+ */
+#define   PNV_OCXL_ATSD_LNCH_RIC	PPC_BITMASK(1, 2)
+/* Number and Page Size of translations to be invalidated */
+#define   PNV_OCXL_ATSD_LNCH_LP		PPC_BITMASK(3, 10)
+/* Invalidation Criteria
+ * 0b00 Invalidate just the target VA.
+ * 0b01 Invalidate matching PID.
+ */
+#define   PNV_OCXL_ATSD_LNCH_IS		PPC_BITMASK(11, 12)
+/* 0b1: Process Scope, 0b0: Partition Scope */
+#define   PNV_OCXL_ATSD_LNCH_PRS	PPC_BIT(13)
+/* Invalidation Flag */
+#define   PNV_OCXL_ATSD_LNCH_B		PPC_BIT(14)
+/* Actual Page Size to be invalidated
+ * 000 4KB
+ * 101 64KB
+ * 001 2MB
+ * 010 1GB
+ */
+#define   PNV_OCXL_ATSD_LNCH_AP		PPC_BITMASK(15, 17)
+/* Defines the large page select
+ * L=0b0 for 4KB pages
+ * L=0b1 for large pages)
+ */
+#define   PNV_OCXL_ATSD_LNCH_L		PPC_BIT(18)
+/* Process ID */
+#define   PNV_OCXL_ATSD_LNCH_PID	PPC_BITMASK(19, 38)
+/* NoFlush – Assumed to be 0b0 */
+#define   PNV_OCXL_ATSD_LNCH_F		PPC_BIT(39)
+#define   PNV_OCXL_ATSD_LNCH_OCAPI_SLBI	PPC_BIT(40)
+#define   PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON	PPC_BIT(41)
+#define PNV_OCXL_ATSD_AVA		0x08
+#define   PNV_OCXL_ATSD_AVA_AVA		PPC_BITMASK(0, 51)
+#define PNV_OCXL_ATSD_STAT		0x10
+
 int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled, u16 *supported);
 int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count);
 
@@ -31,4 +78,8 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
 int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
 		      uint64_t lpcr, void __iomem **arva);
 void pnv_ocxl_unmap_lpar(void __iomem *arva);
+void pnv_ocxl_tlb_invalidate(void __iomem *arva,
+			     unsigned long pid,
+			     unsigned long addr,
+			     unsigned long page_size);
 #endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index 57fc1062677b..9105efcf242a 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -528,3 +528,72 @@ void pnv_ocxl_unmap_lpar(void __iomem *arva)
 	iounmap(arva);
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
+
+void pnv_ocxl_tlb_invalidate(void __iomem *arva,
+			     unsigned long pid,
+			     unsigned long addr,
+			     unsigned long page_size)
+{
+	unsigned long timeout = jiffies + (HZ * PNV_OCXL_ATSD_TIMEOUT);
+	u64 val = 0ull;
+	int pend;
+	u8 size;
+
+	if (!(arva))
+		return;
+
+	if (addr) {
+		/* load Abbreviated Virtual Address register with
+		 * the necessary value
+		 */
+		val |= FIELD_PREP(PNV_OCXL_ATSD_AVA_AVA, addr >> (63-51));
+		out_be64(arva + PNV_OCXL_ATSD_AVA, val);
+	}
+
+	/* Write access initiates a shoot down to initiate the
+	 * TLB Invalidate command
+	 */
+	val = PNV_OCXL_ATSD_LNCH_R;
+	val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b10);
+	if (addr)
+		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b00);
+	else {
+		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b01);
+		val |= PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON;
+	}
+	val |= PNV_OCXL_ATSD_LNCH_PRS;
+	/* Actual Page Size to be invalidated
+	 * 000 4KB
+	 * 101 64KB
+	 * 001 2MB
+	 * 010 1GB
+	 */
+	size = 0b101;
+	if (page_size == 0x1000)
+		size = 0b000;
+	if (page_size == 0x200000)
+		size = 0b001;
+	if (page_size == 0x40000000)
+		size = 0b010;
+	val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_AP, size);
+	val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_PID, pid);
+	out_be64(arva + PNV_OCXL_ATSD_LNCH, val);
+
+	/* Poll the ATSD status register to determine when the
+	 * TLB Invalidate has been completed.
+	 */
+	val = in_be64(arva + PNV_OCXL_ATSD_STAT);
+	pend = val >> 63;
+
+	while (pend) {
+		if (time_after_eq(jiffies, timeout)) {
+			pr_err("%s - Timeout while reading XTS MMIO ATSD status register (val=%#llx, pidr=0x%lx)\n",
+			       __func__, val, pid);
+			return;
+		}
+		cpu_relax();
+		val = in_be64(arva + PNV_OCXL_ATSD_STAT);
+		pend = val >> 63;
+	}
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_tlb_invalidate);
-- 
2.28.0


^ permalink raw reply related

* [PATCH V4 5/5] ocxl: Add new kernel traces
From: Christophe Lombard @ 2020-11-25 15:50 UTC (permalink / raw)
  To: linuxppc-dev, fbarrat, ajd
In-Reply-To: <20201125155013.39955-1-clombard@linux.vnet.ibm.com>

Add specific kernel traces which provide information on mmu notifier and on
pages range.

Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
 drivers/misc/ocxl/link.c  |  4 +++
 drivers/misc/ocxl/trace.h | 64 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 129d4eddc4d2..ab039c115381 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -499,6 +499,7 @@ static void invalidate_range(struct mmu_notifier *mn,
 	unsigned long addr, pid, page_size = PAGE_SIZE;
 
 	pid = mm->context.id;
+	trace_ocxl_mmu_notifier_range(start, end, pid);
 
 	spin_lock(&link->atsd_lock);
 	for (addr = start; addr < end; addr += page_size)
@@ -590,6 +591,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 			/* Use MMIO registers for the TLB Invalidate
 			 * operations.
 			 */
+			trace_ocxl_init_mmu_notifier(pasid, mm->context.id);
 			mmu_notifier_register(&pe_data->mmu_notifier, mm);
 		}
 	}
@@ -725,6 +727,8 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
 	} else {
 		if (pe_data->mm) {
 			if (link->arva) {
+				trace_ocxl_release_mmu_notifier(pasid,
+								pe_data->mm->context.id);
 				mmu_notifier_unregister(&pe_data->mmu_notifier,
 							pe_data->mm);
 				spin_lock(&link->atsd_lock);
diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
index 17e21cb2addd..a33a5094ff6c 100644
--- a/drivers/misc/ocxl/trace.h
+++ b/drivers/misc/ocxl/trace.h
@@ -8,6 +8,70 @@
 
 #include <linux/tracepoint.h>
 
+
+TRACE_EVENT(ocxl_mmu_notifier_range,
+	TP_PROTO(unsigned long start, unsigned long end, unsigned long pidr),
+	TP_ARGS(start, end, pidr),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, start)
+		__field(unsigned long, end)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->start = start;
+		__entry->end = end;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("start=0x%lx end=0x%lx pidr=0x%lx",
+		__entry->start,
+		__entry->end,
+		__entry->pidr
+	)
+);
+
+TRACE_EVENT(ocxl_init_mmu_notifier,
+	TP_PROTO(int pasid, unsigned long pidr),
+	TP_ARGS(pasid, pidr),
+
+	TP_STRUCT__entry(
+		__field(int, pasid)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->pasid = pasid;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("pasid=%d, pidr=0x%lx",
+		__entry->pasid,
+		__entry->pidr
+	)
+);
+
+TRACE_EVENT(ocxl_release_mmu_notifier,
+	TP_PROTO(int pasid, unsigned long pidr),
+	TP_ARGS(pasid, pidr),
+
+	TP_STRUCT__entry(
+		__field(int, pasid)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->pasid = pasid;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("pasid=%d, pidr=0x%lx",
+		__entry->pasid,
+		__entry->pidr
+	)
+);
+
 DECLARE_EVENT_CLASS(ocxl_context,
 	TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
 	TP_ARGS(pid, spa, pasid, pidr, tidr),
-- 
2.28.0


^ permalink raw reply related

* [PATCH V4 1/5] ocxl: Assign a register set to a Logical Partition
From: Christophe Lombard @ 2020-11-25 15:50 UTC (permalink / raw)
  To: linuxppc-dev, fbarrat, ajd
In-Reply-To: <20201125155013.39955-1-clombard@linux.vnet.ibm.com>

Platform specific function to assign a register set to a Logical Partition.
The "ibm,mmio-atsd" property, provided by the firmware, contains the 16
base ATSD physical addresses (ATSD0 through ATSD15) of the set of MMIO
registers (XTS MMIO ATSDx LPARID/AVA/launch/status register).

For the time being, the ATSD0 set of registers is used by default.

Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pnv-ocxl.h   |  3 ++
 arch/powerpc/platforms/powernv/ocxl.c | 45 +++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index d37ededca3ee..60c3c74427d9 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -28,4 +28,7 @@ int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask, void **p
 void pnv_ocxl_spa_release(void *platform_data);
 int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
 
+int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
+		      uint64_t lpcr, void __iomem **arva);
+void pnv_ocxl_unmap_lpar(void __iomem *arva);
 #endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index ecdad219d704..57fc1062677b 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -483,3 +483,48 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
 	return rc;
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache);
+
+int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
+		      uint64_t lpcr, void __iomem **arva)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	u64 mmio_atsd;
+	int rc;
+
+	/* ATSD physical address.
+	 * ATSD LAUNCH register: write access initiates a shoot down to
+	 * initiate the TLB Invalidate command.
+	 */
+	rc = of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
+					0, &mmio_atsd);
+	if (rc) {
+		dev_info(&dev->dev, "No available ATSD found\n");
+		return rc;
+	}
+
+	/* Assign a register set to a Logical Partition and MMIO ATSD
+	 * LPARID register to the required value.
+	 */
+	rc = opal_npu_map_lpar(phb->opal_id, pci_dev_id(dev),
+			       lparid, lpcr);
+	if (rc) {
+		dev_err(&dev->dev, "Error mapping device to LPAR: %d\n", rc);
+		return rc;
+	}
+
+	*arva = ioremap(mmio_atsd, 24);
+	if (!(*arva)) {
+		dev_warn(&dev->dev, "ioremap failed - mmio_atsd: %#llx\n", mmio_atsd);
+		rc = -ENOMEM;
+	}
+
+	return rc;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_map_lpar);
+
+void pnv_ocxl_unmap_lpar(void __iomem *arva)
+{
+	iounmap(arva);
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
-- 
2.28.0


^ permalink raw reply related

* [PATCH V4 3/5] ocxl: Update the Process Element Entry
From: Christophe Lombard @ 2020-11-25 15:50 UTC (permalink / raw)
  To: linuxppc-dev, fbarrat, ajd
In-Reply-To: <20201125155013.39955-1-clombard@linux.vnet.ibm.com>

To complete the MMIO based mechanism, the fields: PASID, bus, device and
function of the Process Element Entry have to be filled. (See
OpenCAPI Power Platform Architecture document)

                   Hypervisor Process Element Entry
Word
    0 1 .... 7  8  ...... 12  13 ..15  16.... 19  20 ........... 31
0                  OSL Configuration State (0:31)
1                  OSL Configuration State (32:63)
2               PASID                      |    Reserved
3       Bus   |   Device    |Function |        Reserved
4                             Reserved
5                             Reserved
6                               ....

Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
 drivers/misc/ocxl/context.c       | 4 +++-
 drivers/misc/ocxl/link.c          | 4 +++-
 drivers/misc/ocxl/ocxl_internal.h | 9 ++++++---
 drivers/scsi/cxlflash/ocxl_hw.c   | 6 ++++--
 include/misc/ocxl.h               | 2 +-
 5 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index c21f65a5c762..9eb0d93b01c6 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -70,6 +70,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm)
 {
 	int rc;
 	unsigned long pidr = 0;
+	struct pci_dev *dev;
 
 	// Locks both status & tidr
 	mutex_lock(&ctx->status_mutex);
@@ -81,8 +82,9 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm)
 	if (mm)
 		pidr = mm->context.id;
 
+	dev = to_pci_dev(ctx->afu->fn->dev.parent);
 	rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid, pidr, ctx->tidr,
-			      amr, mm, xsl_fault_error, ctx);
+			      amr, pci_dev_id(dev), mm, xsl_fault_error, ctx);
 	if (rc)
 		goto out;
 
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index fd73d3bc0eb6..77381dda2c45 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -494,7 +494,7 @@ static u64 calculate_cfg_state(bool kernel)
 }
 
 int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
-		u64 amr, struct mm_struct *mm,
+		u64 amr, u16 bdf, struct mm_struct *mm,
 		void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
 		void *xsl_err_data)
 {
@@ -529,6 +529,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 
 	memset(pe, 0, sizeof(struct ocxl_process_element));
 	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
+	pe->pasid = cpu_to_be32(pasid << (31 - 19));
+	pe->bdf = cpu_to_be16(bdf);
 	pe->lpid = cpu_to_be32(mfspr(SPRN_LPID));
 	pe->pid = cpu_to_be32(pidr);
 	pe->tid = cpu_to_be32(tidr);
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 0bad0a123af6..10125a22d5a5 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -84,13 +84,16 @@ struct ocxl_context {
 
 struct ocxl_process_element {
 	__be64 config_state;
-	__be32 reserved1[11];
+	__be32 pasid;
+	__be16 bdf;
+	__be16 reserved1;
+	__be32 reserved2[9];
 	__be32 lpid;
 	__be32 tid;
 	__be32 pid;
-	__be32 reserved2[10];
+	__be32 reserved3[10];
 	__be64 amr;
-	__be32 reserved3[3];
+	__be32 reserved4[3];
 	__be32 software_state;
 };
 
diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c
index e4e0d767b98e..244fc27215dc 100644
--- a/drivers/scsi/cxlflash/ocxl_hw.c
+++ b/drivers/scsi/cxlflash/ocxl_hw.c
@@ -329,6 +329,7 @@ static int start_context(struct ocxlflash_context *ctx)
 	struct ocxl_hw_afu *afu = ctx->hw_afu;
 	struct ocxl_afu_config *acfg = &afu->acfg;
 	void *link_token = afu->link_token;
+	struct pci_dev *pdev = afu->pdev;
 	struct device *dev = afu->dev;
 	bool master = ctx->master;
 	struct mm_struct *mm;
@@ -360,8 +361,9 @@ static int start_context(struct ocxlflash_context *ctx)
 		mm = current->mm;
 	}
 
-	rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0, mm,
-			      ocxlflash_xsl_fault, ctx);
+	rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0,
+			      pci_dev_id(pdev), mm, ocxlflash_xsl_fault,
+			      ctx);
 	if (unlikely(rc)) {
 		dev_err(dev, "%s: ocxl_link_add_pe failed rc=%d\n",
 			__func__, rc);
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index e013736e275d..3ed736da02c8 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -447,7 +447,7 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle);
  * defined
  */
 int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
-		u64 amr, struct mm_struct *mm,
+		u64 amr, u16 bdf, struct mm_struct *mm,
 		void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
 		void *xsl_err_data);
 
-- 
2.28.0


^ permalink raw reply related

* [PATCH V4 0/5] ocxl: Mmio invalidation support
From: Christophe Lombard @ 2020-11-25 15:50 UTC (permalink / raw)
  To: linuxppc-dev, fbarrat, ajd

OpenCAPI 4.0/5.0 with TLBI/SLBI Snooping, is not used due to performance
problems caused by the PAU having to process all incoming TLBI/SLBI
commands which will cause them to back up on the PowerBus.

When the Address Translation Mode requires TLB operations to be initiated
using MMIO registers, a set of registers like the following is used:
• XTS MMIO ATSD0 LPARID register
• XTS MMIO ATSD0 AVA register
• XTS MMIO ATSD0 launch register, write access initiates a shoot down
• XTS MMIO ATSD0 status register

The MMIO based mechanism also blocks the NPU/PAU from snooping TLBIE
commands from the PowerBus.

The Shootdown commands (ATSD) will be generated using MMIO registers
in the NPU/PAU and sent to the device.

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>

---
Changelog[v4]
 - Rebase to latest upstream.
 - Correct a typo in page size

Changelog[v3]
 - Rebase to latest upstream.
 - Add page_size argument in pnv_ocxl_tlb_invalidate()
 - Remove double pointer
 
Changelog[v2]
 - Rebase to latest upstream.
 - Create a set of smaller patches
 - Move the device tree parsing and ioremap() for the shootdown page in a
   platform-specific file (powernv)
 - Release the shootdown page in release_xsl()
 - Initialize atsd_lock
 - Move the code to initiate the TLB Invalidate command in a
   platform-specific file (powernv)
 - Use the notifier invalidate_range
---

Christophe Lombard (5):
  ocxl: Assign a register set to a Logical Partition
  ocxl: Initiate a TLB invalidate command
  ocxl: Update the Process Element Entry
  ocxl: Add mmu notifier
  ocxl: Add new kernel traces

 arch/powerpc/include/asm/pnv-ocxl.h   |  54 ++++++++++++
 arch/powerpc/platforms/powernv/ocxl.c | 114 ++++++++++++++++++++++++++
 drivers/misc/ocxl/context.c           |   4 +-
 drivers/misc/ocxl/link.c              |  70 +++++++++++++++-
 drivers/misc/ocxl/ocxl_internal.h     |   9 +-
 drivers/misc/ocxl/trace.h             |  64 +++++++++++++++
 drivers/scsi/cxlflash/ocxl_hw.c       |   6 +-
 include/misc/ocxl.h                   |   2 +-
 8 files changed, 314 insertions(+), 9 deletions(-)

-- 
2.28.0


^ permalink raw reply

* Re: [PATCH net 1/2] ibmvnic: Ensure that SCRQ entry reads are correctly ordered
From: Thomas Falcon @ 2020-11-25 15:26 UTC (permalink / raw)
  To: Michael Ellerman, netdev
  Cc: cforno12, ljp, ricklind, dnbanerg, drt, brking, sukadev,
	linuxppc-dev
In-Reply-To: <87o8jmyosh.fsf@mpe.ellerman.id.au>

On 11/24/20 11:43 PM, Michael Ellerman wrote:
> Thomas Falcon <tlfalcon@linux.ibm.com> writes:
>> Ensure that received Subordinate Command-Response Queue (SCRQ)
>> entries are properly read in order by the driver. These queues
>> are used in the ibmvnic device to process RX buffer and TX completion
>> descriptors. dma_rmb barriers have been added after checking for a
>> pending descriptor to ensure the correct descriptor entry is checked
>> and after reading the SCRQ descriptor to ensure the entire
>> descriptor is read before processing.
>>
>> Fixes: 032c5e828 ("Driver for IBM System i/p VNIC protocol")
>> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
>> ---
>>   drivers/net/ethernet/ibm/ibmvnic.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
>> index 2aa40b2..489ed5e 100644
>> --- a/drivers/net/ethernet/ibm/ibmvnic.c
>> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
>> @@ -2403,6 +2403,8 @@ static int ibmvnic_poll(struct napi_struct *napi, int budget)
>>   
>>   		if (!pending_scrq(adapter, adapter->rx_scrq[scrq_num]))
>>   			break;
>> +		/* ensure that we do not prematurely exit the polling loop */
>> +		dma_rmb();
> I'd be happier if these comments were more specific about which read(s)
> they are ordering vs which other read(s).
>
> I'm sure it's obvious to you, but it may not be to a future author,
> and/or after the code has been refactored over time.

Thank you for reviewing! I will submit a v2 soon with clearer comments 
on the reads being ordered here.

Thanks,

Tom


>
>>   		next = ibmvnic_next_scrq(adapter, adapter->rx_scrq[scrq_num]);
>>   		rx_buff =
>>   		    (struct ibmvnic_rx_buff *)be64_to_cpu(next->
>> @@ -3098,6 +3100,9 @@ static int ibmvnic_complete_tx(struct ibmvnic_adapter *adapter,
>>   		unsigned int pool = scrq->pool_index;
>>   		int num_entries = 0;
>>   
>> +		/* ensure that the correct descriptor entry is read */
>> +		dma_rmb();
>> +
>>   		next = ibmvnic_next_scrq(adapter, scrq);
>>   		for (i = 0; i < next->tx_comp.num_comps; i++) {
>>   			if (next->tx_comp.rcs[i]) {
>> @@ -3498,6 +3503,9 @@ static union sub_crq *ibmvnic_next_scrq(struct ibmvnic_adapter *adapter,
>>   	}
>>   	spin_unlock_irqrestore(&scrq->lock, flags);
>>   
>> +	/* ensure that the entire SCRQ descriptor is read */
>> +	dma_rmb();
>> +
>>   	return entry;
>>   }
> cheers

^ permalink raw reply

* [PATCH v3 1/2] genirq/irqdomain: Add an irq_create_mapping_affinity() function
From: Laurent Vivier @ 2020-11-25 15:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Laurent Vivier, Marc Zyngier, Michael S . Tsirkin, linux-pci,
	Greg Kurz, linux-block, Paul Mackerras, Thomas Gleixner,
	linuxppc-dev, Christoph Hellwig
In-Reply-To: <20201125150932.1150619-1-lvivier@redhat.com>

There is currently no way to convey the affinity of an interrupt
via irq_create_mapping(), which creates issues for devices that
expect that affinity to be managed by the kernel.

In order to sort this out, rename irq_create_mapping() to
irq_create_mapping_affinity() with an additional affinity parameter
that can conveniently passed down to irq_domain_alloc_descs().

irq_create_mapping() is then re-implemented as a wrapper around
irq_create_mapping_affinity().

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
---
 include/linux/irqdomain.h | 12 ++++++++++--
 kernel/irq/irqdomain.c    | 13 ++++++++-----
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 71535e87109f..ea5a337e0f8b 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -384,11 +384,19 @@ extern void irq_domain_associate_many(struct irq_domain *domain,
 extern void irq_domain_disassociate(struct irq_domain *domain,
 				    unsigned int irq);
 
-extern unsigned int irq_create_mapping(struct irq_domain *host,
-				       irq_hw_number_t hwirq);
+extern unsigned int irq_create_mapping_affinity(struct irq_domain *host,
+				      irq_hw_number_t hwirq,
+				      const struct irq_affinity_desc *affinity);
 extern unsigned int irq_create_fwspec_mapping(struct irq_fwspec *fwspec);
 extern void irq_dispose_mapping(unsigned int virq);
 
+static inline unsigned int irq_create_mapping(struct irq_domain *host,
+					      irq_hw_number_t hwirq)
+{
+	return irq_create_mapping_affinity(host, hwirq, NULL);
+}
+
+
 /**
  * irq_linear_revmap() - Find a linux irq from a hw irq number.
  * @domain: domain owning this hardware interrupt
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374b892d..e4ca69608f3b 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -624,17 +624,19 @@ unsigned int irq_create_direct_mapping(struct irq_domain *domain)
 EXPORT_SYMBOL_GPL(irq_create_direct_mapping);
 
 /**
- * irq_create_mapping() - Map a hardware interrupt into linux irq space
+ * irq_create_mapping_affinity() - Map a hardware interrupt into linux irq space
  * @domain: domain owning this hardware interrupt or NULL for default domain
  * @hwirq: hardware irq number in that domain space
+ * @affinity: irq affinity
  *
  * Only one mapping per hardware interrupt is permitted. Returns a linux
  * irq number.
  * If the sense/trigger is to be specified, set_irq_type() should be called
  * on the number returned from that call.
  */
-unsigned int irq_create_mapping(struct irq_domain *domain,
-				irq_hw_number_t hwirq)
+unsigned int irq_create_mapping_affinity(struct irq_domain *domain,
+				       irq_hw_number_t hwirq,
+				       const struct irq_affinity_desc *affinity)
 {
 	struct device_node *of_node;
 	int virq;
@@ -660,7 +662,8 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 	}
 
 	/* Allocate a virtual interrupt number */
-	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);
+	virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node),
+				      affinity);
 	if (virq <= 0) {
 		pr_debug("-> virq allocation failed\n");
 		return 0;
@@ -676,7 +679,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 
 	return virq;
 }
-EXPORT_SYMBOL_GPL(irq_create_mapping);
+EXPORT_SYMBOL_GPL(irq_create_mapping_affinity);
 
 /**
  * irq_create_strict_mappings() - Map a range of hw irqs to fixed linux irqs
-- 
2.28.0


^ permalink raw reply related

* [PATCH v3 2/2] powerpc/pseries: pass MSI affinity to irq_create_mapping()
From: Laurent Vivier @ 2020-11-25 15:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Laurent Vivier, Marc Zyngier, Michael S . Tsirkin, linux-pci,
	Greg Kurz, linux-block, Paul Mackerras, Thomas Gleixner,
	linuxppc-dev, Christoph Hellwig
In-Reply-To: <20201125150932.1150619-1-lvivier@redhat.com>

With virtio multiqueue, normally each queue IRQ is mapped to a CPU.

But since commit 0d9f0a52c8b9f ("virtio_scsi: use virtio IRQ affinity")
this is broken on pseries.

The affinity is correctly computed in msi_desc but this is not applied
to the system IRQs.

It appears the affinity is correctly passed to rtas_setup_msi_irqs() but
lost at this point and never passed to irq_domain_alloc_descs()
(see commit 06ee6d571f0e ("genirq: Add affinity hint to irq allocation"))
because irq_create_mapping() doesn't take an affinity parameter.

As the previous patch has added the affinity parameter to
irq_create_mapping() we can forward the affinity from rtas_setup_msi_irqs()
to irq_domain_alloc_descs().

With this change, the virtqueues are correctly dispatched between the CPUs
on pseries.

BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1702939
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
---
 arch/powerpc/platforms/pseries/msi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c
index 133f6adcb39c..b3ac2455faad 100644
--- a/arch/powerpc/platforms/pseries/msi.c
+++ b/arch/powerpc/platforms/pseries/msi.c
@@ -458,7 +458,8 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type)
 			return hwirq;
 		}

-		virq = irq_create_mapping(NULL, hwirq);
+		virq = irq_create_mapping_affinity(NULL, hwirq,
+						   entry->affinity);

 		if (!virq) {
 			pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);
-- 
2.28.0

^ permalink raw reply related

* [PATCH v3 0/2] powerpc/pseries: fix MSI/X IRQ affinity on pseries
From: Laurent Vivier @ 2020-11-25 15:09 UTC (permalink / raw)
  To: linux-kernel
  Cc: Laurent Vivier, Marc Zyngier, Michael S . Tsirkin, linux-pci,
	Greg Kurz, linux-block, Paul Mackerras, Thomas Gleixner,
	linuxppc-dev, Christoph Hellwig

With virtio, in multiqueue case, each queue IRQ is normally
bound to a different CPU using the affinity mask.

This works fine on x86_64 but totally ignored on pseries.

This is not obvious at first look because irqbalance is doing
some balancing to improve that.

It appears that the "managed" flag set in the MSI entry
is never copied to the system IRQ entry.

This series passes the affinity mask from rtas_setup_msi_irqs()
to irq_domain_alloc_descs() by adding an affinity parameter to
irq_create_mapping().

The first patch adds the parameter (no functional change), the
second patch passes the actual affinity mask to irq_create_mapping()
in rtas_setup_msi_irqs().

For instance, with 32 CPUs VM and 32 queues virtio-scsi interface:

... -smp 32 -device virtio-scsi-pci,id=virtio_scsi_pci0,num_queues=32

for IRQ in $(grep virtio2-request /proc/interrupts |cut -d: -f1); do
    for file in /proc/irq/$IRQ/ ; do
        echo -n "IRQ: $(basename $file) CPU: " ; cat $file/smp_affinity_list
    done
done

Without the patch (and without irqbalanced)

IRQ: 268 CPU: 0-31
IRQ: 269 CPU: 0-31
IRQ: 270 CPU: 0-31
IRQ: 271 CPU: 0-31
IRQ: 272 CPU: 0-31
IRQ: 273 CPU: 0-31
IRQ: 274 CPU: 0-31
IRQ: 275 CPU: 0-31
IRQ: 276 CPU: 0-31
IRQ: 277 CPU: 0-31
IRQ: 278 CPU: 0-31
IRQ: 279 CPU: 0-31
IRQ: 280 CPU: 0-31
IRQ: 281 CPU: 0-31
IRQ: 282 CPU: 0-31
IRQ: 283 CPU: 0-31
IRQ: 284 CPU: 0-31
IRQ: 285 CPU: 0-31
IRQ: 286 CPU: 0-31
IRQ: 287 CPU: 0-31
IRQ: 288 CPU: 0-31
IRQ: 289 CPU: 0-31
IRQ: 290 CPU: 0-31
IRQ: 291 CPU: 0-31
IRQ: 292 CPU: 0-31
IRQ: 293 CPU: 0-31
IRQ: 294 CPU: 0-31
IRQ: 295 CPU: 0-31
IRQ: 296 CPU: 0-31
IRQ: 297 CPU: 0-31
IRQ: 298 CPU: 0-31
IRQ: 299 CPU: 0-31

With the patch:

IRQ: 265 CPU: 0
IRQ: 266 CPU: 1
IRQ: 267 CPU: 2
IRQ: 268 CPU: 3
IRQ: 269 CPU: 4
IRQ: 270 CPU: 5
IRQ: 271 CPU: 6
IRQ: 272 CPU: 7
IRQ: 273 CPU: 8
IRQ: 274 CPU: 9
IRQ: 275 CPU: 10
IRQ: 276 CPU: 11
IRQ: 277 CPU: 12
IRQ: 278 CPU: 13
IRQ: 279 CPU: 14
IRQ: 280 CPU: 15
IRQ: 281 CPU: 16
IRQ: 282 CPU: 17
IRQ: 283 CPU: 18
IRQ: 284 CPU: 19
IRQ: 285 CPU: 20
IRQ: 286 CPU: 21
IRQ: 287 CPU: 22
IRQ: 288 CPU: 23
IRQ: 289 CPU: 24
IRQ: 290 CPU: 25
IRQ: 291 CPU: 26
IRQ: 292 CPU: 27
IRQ: 293 CPU: 28
IRQ: 294 CPU: 29
IRQ: 295 CPU: 30
IRQ: 299 CPU: 31

This matches what we have on an x86_64 system.

v3: update changelog of PATCH 1 with comments from Thomas Gleixner and
    Marc Zyngier.
v2: add a wrapper around original irq_create_mapping() with the
    affinity parameter. Update comments

Laurent Vivier (2):
  genirq/irqdomain: Add an irq_create_mapping_affinity() function
  powerpc/pseries: pass MSI affinity to irq_create_mapping()

 arch/powerpc/platforms/pseries/msi.c |  3 ++-
 include/linux/irqdomain.h            | 12 ++++++++++--
 kernel/irq/irqdomain.c               | 13 ++++++++-----
 3 files changed, 20 insertions(+), 8 deletions(-)

-- 
2.28.0



^ permalink raw reply

* Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
From: Laurent Vivier @ 2020-11-25 15:00 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Michael S . Tsirkin, linux-pci, linux-kernel, Greg Kurz,
	linux-block, Paul Mackerras, Thomas Gleixner, linuxppc-dev,
	Christoph Hellwig
In-Reply-To: <5100171ff6d4c3efffe008e1e0bf3707@kernel.org>

On 25/11/2020 15:54, Marc Zyngier wrote:
> On 2020-11-25 14:09, Laurent Vivier wrote:
>> On 25/11/2020 14:20, Thomas Gleixner wrote:
>>> Laurent,
>>>
>>> On Wed, Nov 25 2020 at 12:16, Laurent Vivier wrote:
>>>
>>> The proper subsystem prefix is: 'genirq/irqdomain:' and the first letter
>>> after the colon wants to be uppercase.
>>
>> Ok.
>>
>>>> This function adds an affinity parameter to irq_create_mapping().
>>>> This parameter is needed to pass it to irq_domain_alloc_descs().
>>>
>>> A changelog has to explain the WHY. 'The parameter is needed' is not
>>> really useful information.
>>>
>>
>> The reason of this change is explained in PATCH 2.
>>
>> I have two patches, one to change the interface with no functional
>> change (PATCH 1) and
>> one to fix the problem (PATCH 2). Moreover they don't cover the same subsystems.
>>
>> I can either:
>> - merge the two patches
>> - or make a reference in the changelog of PATCH 1 to PATCH 2
>>   (something like "(see folowing patch "powerpc/pseries: pass MSI affinity to
>>    irq_create_mapping()")")
>> - or copy some information from PATCH 2
>>   (something like "this parameter is needed by rtas_setup_msi_irqs()
>> to pass the affinity
>>    to irq_domain_alloc_descs() to fix multiqueue affinity")
>>
>> What do you prefer?
> 
> How about something like this for the first patch:
> 
> "There is currently no way to convey the affinity of an interrupt
>  via irq_create_mapping(), which creates issues for devices that
>  expect that affinity to be managed by the kernel.
> 
>  In order to sort this out, rename irq_create_mapping() to
>  irq_create_mapping_affinity() with an additional affinity parameter
>  that can conveniently passed down to irq_domain_alloc_descs().
> 
>  irq_create_mapping() is then re-implemented as a wrapper around
>  irq_create_mapping_affinity()."

It looks perfect. I update the changelog with that.

Thanks,
Laurent


^ permalink raw reply

* Re: [PATCH v6 03/22] powerpc/book3s64/kuap/kuep: Make KUAP and KUEP a subfeature of PPC_MEM_KEYS
From: Aneesh Kumar K.V @ 2020-11-25 14:57 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev, mpe
In-Reply-To: <a98c3e0e-ddc9-3278-b707-669627c45e88@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:

> Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit :
....

> diff --git a/arch/powerpc/mm/book3s64/pkeys.c b/arch/powerpc/mm/book3s64/pkeys.c
>> index b1d091a97611..7dc71f85683d 100644
>> --- a/arch/powerpc/mm/book3s64/pkeys.c
>> +++ b/arch/powerpc/mm/book3s64/pkeys.c
>> @@ -89,12 +89,14 @@ static int scan_pkey_feature(void)
>>   		}
>>   	}
>>   
>> +#ifdef CONFIG_PPC_MEM_KEYS
>>   	/*
>>   	 * Adjust the upper limit, based on the number of bits supported by
>>   	 * arch-neutral code.
>>   	 */
>>   	pkeys_total = min_t(int, pkeys_total,
>>   			    ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1));
>
> I don't think we need an #ifdef here. I thing an 'if (IS_ENABLED(CONFIG_PPC_MEM_KEYS))' should make it.

ppc64/arch/powerpc/mm/book3s64/pkeys.c: In function ‘scan_pkey_feature’:                                                                                     
ppc64/arch/powerpc/mm/book3s64/pkeys.c:98:33: error: ‘VM_PKEY_SHIFT’ undeclared (first use in this function)                                                 
   98 |         ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT) + 1));                                
      |                                 ^~~~~~~~~~~~~                                          
pkey headers only include arch headers if PPC_MEM_KEYS is enabled. ie, 

#ifdef CONFIG_ARCH_HAS_PKEYS
#include <asm/pkeys.h>
#else /* ! CONFIG_ARCH_HAS_PKEYS */
#define arch_max_pkey() (1)
#define execute_only_pkey(mm) (0)
#define arch_override_mprotect_pkey(vma, prot, pkey) (0)
#define PKEY_DEDICATED_EXECUTE_ONLY 0
#define ARCH_VM_PKEY_FLAGS 0
..

Sorting that out should be another patch series. 



>
>> +#endif
>>   	return pkeys_total;
>>   }

^ permalink raw reply

* Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
From: Marc Zyngier @ 2020-11-25 14:54 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Michael S . Tsirkin, linux-pci, linux-kernel, Greg Kurz,
	linux-block, Paul Mackerras, Thomas Gleixner, linuxppc-dev,
	Christoph Hellwig
In-Reply-To: <e32641f7-0993-8923-7d74-5ac57a60f10d@redhat.com>

On 2020-11-25 14:09, Laurent Vivier wrote:
> On 25/11/2020 14:20, Thomas Gleixner wrote:
>> Laurent,
>> 
>> On Wed, Nov 25 2020 at 12:16, Laurent Vivier wrote:
>> 
>> The proper subsystem prefix is: 'genirq/irqdomain:' and the first 
>> letter
>> after the colon wants to be uppercase.
> 
> Ok.
> 
>>> This function adds an affinity parameter to irq_create_mapping().
>>> This parameter is needed to pass it to irq_domain_alloc_descs().
>> 
>> A changelog has to explain the WHY. 'The parameter is needed' is not
>> really useful information.
>> 
> 
> The reason of this change is explained in PATCH 2.
> 
> I have two patches, one to change the interface with no functional
> change (PATCH 1) and
> one to fix the problem (PATCH 2). Moreover they don't cover the same 
> subsystems.
> 
> I can either:
> - merge the two patches
> - or make a reference in the changelog of PATCH 1 to PATCH 2
>   (something like "(see folowing patch "powerpc/pseries: pass MSI 
> affinity to
>    irq_create_mapping()")")
> - or copy some information from PATCH 2
>   (something like "this parameter is needed by rtas_setup_msi_irqs()
> to pass the affinity
>    to irq_domain_alloc_descs() to fix multiqueue affinity")
> 
> What do you prefer?

How about something like this for the first patch:

"There is currently no way to convey the affinity of an interrupt
  via irq_create_mapping(), which creates issues for devices that
  expect that affinity to be managed by the kernel.

  In order to sort this out, rename irq_create_mapping() to
  irq_create_mapping_affinity() with an additional affinity parameter
  that can conveniently passed down to irq_domain_alloc_descs().

  irq_create_mapping() is then re-implemented as a wrapper around
  irq_create_mapping_affinity()."

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply

* Re: [PATCH V3 5/5] ocxl: Add new kernel traces
From: Frederic Barrat @ 2020-11-25 14:29 UTC (permalink / raw)
  To: Christophe Lombard, linuxppc-dev, fbarrat, ajd
In-Reply-To: <20201124095838.18665-6-clombard@linux.vnet.ibm.com>



On 24/11/2020 10:58, Christophe Lombard wrote:
> Add specific kernel traces which provide information on mmu notifier and on
> pages range.
> 
> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
> ---


Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>


>   drivers/misc/ocxl/link.c  |  4 +++
>   drivers/misc/ocxl/trace.h | 64 +++++++++++++++++++++++++++++++++++++++
>   2 files changed, 68 insertions(+)
> 
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index 129d4eddc4d2..ab039c115381 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -499,6 +499,7 @@ static void invalidate_range(struct mmu_notifier *mn,
>   	unsigned long addr, pid, page_size = PAGE_SIZE;
> 
>   	pid = mm->context.id;
> +	trace_ocxl_mmu_notifier_range(start, end, pid);
> 
>   	spin_lock(&link->atsd_lock);
>   	for (addr = start; addr < end; addr += page_size)
> @@ -590,6 +591,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>   			/* Use MMIO registers for the TLB Invalidate
>   			 * operations.
>   			 */
> +			trace_ocxl_init_mmu_notifier(pasid, mm->context.id);
>   			mmu_notifier_register(&pe_data->mmu_notifier, mm);
>   		}
>   	}
> @@ -725,6 +727,8 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
>   	} else {
>   		if (pe_data->mm) {
>   			if (link->arva) {
> +				trace_ocxl_release_mmu_notifier(pasid,
> +								pe_data->mm->context.id);
>   				mmu_notifier_unregister(&pe_data->mmu_notifier,
>   							pe_data->mm);
>   				spin_lock(&link->atsd_lock);
> diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
> index 17e21cb2addd..a33a5094ff6c 100644
> --- a/drivers/misc/ocxl/trace.h
> +++ b/drivers/misc/ocxl/trace.h
> @@ -8,6 +8,70 @@
> 
>   #include <linux/tracepoint.h>
> 
> +
> +TRACE_EVENT(ocxl_mmu_notifier_range,
> +	TP_PROTO(unsigned long start, unsigned long end, unsigned long pidr),
> +	TP_ARGS(start, end, pidr),
> +
> +	TP_STRUCT__entry(
> +		__field(unsigned long, start)
> +		__field(unsigned long, end)
> +		__field(unsigned long, pidr)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->start = start;
> +		__entry->end = end;
> +		__entry->pidr = pidr;
> +	),
> +
> +	TP_printk("start=0x%lx end=0x%lx pidr=0x%lx",
> +		__entry->start,
> +		__entry->end,
> +		__entry->pidr
> +	)
> +);
> +
> +TRACE_EVENT(ocxl_init_mmu_notifier,
> +	TP_PROTO(int pasid, unsigned long pidr),
> +	TP_ARGS(pasid, pidr),
> +
> +	TP_STRUCT__entry(
> +		__field(int, pasid)
> +		__field(unsigned long, pidr)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->pasid = pasid;
> +		__entry->pidr = pidr;
> +	),
> +
> +	TP_printk("pasid=%d, pidr=0x%lx",
> +		__entry->pasid,
> +		__entry->pidr
> +	)
> +);
> +
> +TRACE_EVENT(ocxl_release_mmu_notifier,
> +	TP_PROTO(int pasid, unsigned long pidr),
> +	TP_ARGS(pasid, pidr),
> +
> +	TP_STRUCT__entry(
> +		__field(int, pasid)
> +		__field(unsigned long, pidr)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->pasid = pasid;
> +		__entry->pidr = pidr;
> +	),
> +
> +	TP_printk("pasid=%d, pidr=0x%lx",
> +		__entry->pasid,
> +		__entry->pidr
> +	)
> +);
> +
>   DECLARE_EVENT_CLASS(ocxl_context,
>   	TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
>   	TP_ARGS(pid, spa, pasid, pidr, tidr),
> 

^ permalink raw reply

* Re: [PATCH V3 4/5] ocxl: Add mmu notifier
From: Frederic Barrat @ 2020-11-25 14:27 UTC (permalink / raw)
  To: Christophe Lombard, linuxppc-dev, fbarrat, ajd
In-Reply-To: <20201124095838.18665-5-clombard@linux.vnet.ibm.com>



On 24/11/2020 10:58, Christophe Lombard wrote:
> Add invalidate_range mmu notifier, when required (ATSD access of MMIO
> registers is available), to initiate TLB invalidation commands.
> For the time being, the ATSD0 set of registers is used by default.
> 
> The pasid and bdf values have to be configured in the Process Element
> Entry.
> The PEE must be set up to match the BDF/PASID of the AFU.
> 
> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
> ---


That looks ok too.
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>


>   drivers/misc/ocxl/link.c | 62 +++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 61 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index 77381dda2c45..129d4eddc4d2 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -2,8 +2,10 @@
>   // Copyright 2017 IBM Corp.
>   #include <linux/sched/mm.h>
>   #include <linux/mutex.h>
> +#include <linux/mm.h>
>   #include <linux/mm_types.h>
>   #include <linux/mmu_context.h>
> +#include <linux/mmu_notifier.h>
>   #include <asm/copro.h>
>   #include <asm/pnv-ocxl.h>
>   #include <asm/xive.h>
> @@ -33,6 +35,7 @@
> 
>   #define SPA_PE_VALID		0x80000000
> 
> +struct ocxl_link;
> 
>   struct pe_data {
>   	struct mm_struct *mm;
> @@ -41,6 +44,8 @@ struct pe_data {
>   	/* opaque pointer to be passed to the above callback */
>   	void *xsl_err_data;
>   	struct rcu_head rcu;
> +	struct ocxl_link *link;
> +	struct mmu_notifier mmu_notifier;
>   };
> 
>   struct spa {
> @@ -83,6 +88,8 @@ struct ocxl_link {
>   	int domain;
>   	int bus;
>   	int dev;
> +	void __iomem *arva;     /* ATSD register virtual address */
> +	spinlock_t atsd_lock;   /* to serialize shootdowns */
>   	atomic_t irq_available;
>   	struct spa *spa;
>   	void *platform_data;
> @@ -388,6 +395,7 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l
>   	link->bus = dev->bus->number;
>   	link->dev = PCI_SLOT(dev->devfn);
>   	atomic_set(&link->irq_available, MAX_IRQ_PER_LINK);
> +	spin_lock_init(&link->atsd_lock);
> 
>   	rc = alloc_spa(dev, link);
>   	if (rc)
> @@ -403,6 +411,13 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l
>   	if (rc)
>   		goto err_xsl_irq;
> 
> +	/* if link->arva is not defeined, MMIO registers are not used to
> +	 * generate TLB invalidate. PowerBus snooping is enabled.
> +	 * Otherwise, PowerBus snooping is disabled. TLB Invalidates are
> +	 * initiated using MMIO registers.
> +	 */
> +	pnv_ocxl_map_lpar(dev, mfspr(SPRN_LPID), 0, &link->arva);
> +
>   	*out_link = link;
>   	return 0;
> 
> @@ -454,6 +469,11 @@ static void release_xsl(struct kref *ref)
>   {
>   	struct ocxl_link *link = container_of(ref, struct ocxl_link, ref);
> 
> +	if (link->arva) {
> +		pnv_ocxl_unmap_lpar(link->arva);
> +		link->arva = NULL;
> +	}
> +
>   	list_del(&link->list);
>   	/* call platform code before releasing data */
>   	pnv_ocxl_spa_release(link->platform_data);
> @@ -470,6 +490,26 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle)
>   }
>   EXPORT_SYMBOL_GPL(ocxl_link_release);
> 
> +static void invalidate_range(struct mmu_notifier *mn,
> +			     struct mm_struct *mm,
> +			     unsigned long start, unsigned long end)
> +{
> +	struct pe_data *pe_data = container_of(mn, struct pe_data, mmu_notifier);
> +	struct ocxl_link *link = pe_data->link;
> +	unsigned long addr, pid, page_size = PAGE_SIZE;
> +
> +	pid = mm->context.id;
> +
> +	spin_lock(&link->atsd_lock);
> +	for (addr = start; addr < end; addr += page_size)
> +		pnv_ocxl_tlb_invalidate(link->arva, pid, addr, page_size);
> +	spin_unlock(&link->atsd_lock);
> +}
> +
> +static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
> +	.invalidate_range = invalidate_range,
> +};
> +
>   static u64 calculate_cfg_state(bool kernel)
>   {
>   	u64 state;
> @@ -526,6 +566,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>   	pe_data->mm = mm;
>   	pe_data->xsl_err_cb = xsl_err_cb;
>   	pe_data->xsl_err_data = xsl_err_data;
> +	pe_data->link = link;
> +	pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
> 
>   	memset(pe, 0, sizeof(struct ocxl_process_element));
>   	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
> @@ -542,8 +584,16 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
>   	 * by the nest MMU. If we have a kernel context, TLBIs are
>   	 * already global.
>   	 */
> -	if (mm)
> +	if (mm) {
>   		mm_context_add_copro(mm);
> +		if (link->arva) {
> +			/* Use MMIO registers for the TLB Invalidate
> +			 * operations.
> +			 */
> +			mmu_notifier_register(&pe_data->mmu_notifier, mm);
> +		}
> +	}
> +
>   	/*
>   	 * Barrier is to make sure PE is visible in the SPA before it
>   	 * is used by the device. It also helps with the global TLBI
> @@ -674,6 +724,16 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
>   		WARN(1, "Couldn't find pe data when removing PE\n");
>   	} else {
>   		if (pe_data->mm) {
> +			if (link->arva) {
> +				mmu_notifier_unregister(&pe_data->mmu_notifier,
> +							pe_data->mm);
> +				spin_lock(&link->atsd_lock);
> +				pnv_ocxl_tlb_invalidate(link->arva,
> +							pe_data->mm->context.id,
> +							0ull,
> +							PAGE_SIZE);
> +				spin_unlock(&link->atsd_lock);
> +			}
>   			mm_context_remove_copro(pe_data->mm);
>   			mmdrop(pe_data->mm);
>   		}
> 

^ permalink raw reply

* Re: [PATCH v6 10/22] powerpc/book3s64/pkeys: Store/restore userspace AMR/IAMR correctly on entry and exit from kernel
From: Christophe Leroy @ 2020-11-25 14:16 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, mpe; +Cc: Sandipan Das
In-Reply-To: <5a471282-ff4f-cfea-8d30-a0479f0385a1@linux.ibm.com>



Le 25/11/2020 à 14:55, Aneesh Kumar K.V a écrit :
> On 11/25/20 7:22 PM, Christophe Leroy wrote:
>>
>>
>> Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit :
>>> This prepare kernel to operate with a different value than userspace AMR/IAMR.
>>> For this, AMR/IAMR need to be saved and restored on entry and return from the
>>> kernel.
>>>
>>> With KUAP we modify kernel AMR when accessing user address from the kernel
>>> via copy_to/from_user interfaces. We don't need to modify IAMR value in
>>> similar fashion.
>>>
>>> If MMU_FTR_PKEY is enabled we need to save AMR/IAMR in pt_regs on entering
>>> kernel from userspace. If not we can assume that AMR/IAMR is not modified
>>> from userspace.
>>>
>>> We need to save AMR if we have MMU_FTR_KUAP feature enabled and we are
>>> interrupted within kernel. This is required so that if we get interrupted
>>> within copy_to/from_user we continue with the right AMR value.
>>>
>>> If we hae MMU_FTR_KUEP enabled we need to restore IAMR on return to userspace
>>> beause kernel will be running with a different IAMR value.
>>>
>>> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com>
>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
>>> ---
>>>   arch/powerpc/include/asm/book3s/64/kup.h | 222 +++++++++++++++++++----
>>>   arch/powerpc/include/asm/ptrace.h        |   5 +-
>>>   arch/powerpc/kernel/asm-offsets.c        |   2 +
>>>   arch/powerpc/kernel/entry_64.S           |   6 +-
>>>   arch/powerpc/kernel/exceptions-64s.S     |   4 +-
>>>   arch/powerpc/kernel/syscall_64.c         |  32 +++-
>>>   6 files changed, 225 insertions(+), 46 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
>>> index 1d38eab83d48..4dbb2d53fd8f 100644
>>> --- a/arch/powerpc/include/asm/book3s/64/kup.h
>>> +++ b/arch/powerpc/include/asm/book3s/64/kup.h
>>> @@ -13,17 +13,46 @@
>>>   #ifdef __ASSEMBLY__
>>> -.macro kuap_restore_amr    gpr1, gpr2
>>> -#ifdef CONFIG_PPC_KUAP
>>> +.macro kuap_restore_user_amr gpr1
>>> +#if defined(CONFIG_PPC_PKEY)
>>>       BEGIN_MMU_FTR_SECTION_NESTED(67)
>>> -    mfspr    \gpr1, SPRN_AMR
>>> +    /*
>>> +     * AMR and IAMR are going to be different when
>>> +     * returning to userspace.
>>> +     */
>>> +    ld    \gpr1, STACK_REGS_AMR(r1)
>>> +    isync
>>> +    mtspr    SPRN_AMR, \gpr1
>>> +    /*
>>> +     * Restore IAMR only when returning to userspace
>>> +     */
>>> +    ld    \gpr1, STACK_REGS_IAMR(r1)
>>> +    mtspr    SPRN_IAMR, \gpr1
>>> +
>>> +    /* No isync required, see kuap_restore_user_amr() */
>>> +    END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_PKEY, 67)
>>> +#endif
>>> +.endm
>>> +
>>> +.macro kuap_restore_kernel_amr    gpr1, gpr2
>>> +#if defined(CONFIG_PPC_PKEY)
>>> +
>>> +    BEGIN_MMU_FTR_SECTION_NESTED(67)
>>> +    /*
>>> +     * AMR is going to be mostly the same since we are
>>> +     * returning to the kernel. Compare and do a mtspr.
>>> +     */
>>>       ld    \gpr2, STACK_REGS_AMR(r1)
>>> +    mfspr    \gpr1, SPRN_AMR
>>>       cmpd    \gpr1, \gpr2
>>> -    beq    998f
>>> +    beq    100f
>>>       isync
>>>       mtspr    SPRN_AMR, \gpr2
>>> -    /* No isync required, see kuap_restore_amr() */
>>> -998:
>>> +    /*
>>> +     * No isync required, see kuap_restore_amr()
>>> +     * No need to restore IAMR when returning to kernel space.
>>> +     */
>>> +100:
>>>       END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67)
>>>   #endif
>>>   .endm
>>> @@ -42,23 +71,98 @@
>>>   .endm
>>>   #endif
>>> +/*
>>> + *    if (pkey) {
>>> + *
>>> + *        save AMR -> stack;
>>> + *        if (kuap) {
>>> + *            if (AMR != BLOCKED)
>>> + *                KUAP_BLOCKED -> AMR;
>>> + *        }
>>> + *        if (from_user) {
>>> + *            save IAMR -> stack;
>>> + *            if (kuep) {
>>> + *                KUEP_BLOCKED ->IAMR
>>> + *            }
>>> + *        }
>>> + *        return;
>>> + *    }
>>> + *
>>> + *    if (kuap) {
>>> + *        if (from_kernel) {
>>> + *            save AMR -> stack;
>>> + *            if (AMR != BLOCKED)
>>> + *                KUAP_BLOCKED -> AMR;
>>> + *        }
>>> + *
>>> + *    }
>>> + */
>>>   .macro kuap_save_amr_and_lock gpr1, gpr2, use_cr, msr_pr_cr
>>> -#ifdef CONFIG_PPC_KUAP
>>> +#if defined(CONFIG_PPC_PKEY)
>>> +
>>> +    /*
>>> +     * if both pkey and kuap is disabled, nothing to do
>>> +     */
>>> +    BEGIN_MMU_FTR_SECTION_NESTED(68)
>>> +    b    100f  // skip_save_amr
>>> +    END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY | MMU_FTR_KUAP, 68)
>>> +
>>> +    /*
>>> +     * if pkey is disabled and we are entering from userspace
>>> +     * don't do anything.
>>> +     */
>>>       BEGIN_MMU_FTR_SECTION_NESTED(67)
>>>       .ifnb \msr_pr_cr
>>> -    bne    \msr_pr_cr, 99f
>>> +    /*
>>> +     * Without pkey we are not changing AMR outside the kernel
>>> +     * hence skip this completely.
>>> +     */
>>> +    bne    \msr_pr_cr, 100f  // from userspace
>>>       .endif
>>> +        END_MMU_FTR_SECTION_NESTED_IFCLR(MMU_FTR_PKEY, 67)
>>> +
>>> +    /*
>>> +     * pkey is enabled or pkey is disabled but entering from kernel
>>> +     */
>>>       mfspr    \gpr1, SPRN_AMR
>>>       std    \gpr1, STACK_REGS_AMR(r1)
>>> -    li    \gpr2, (AMR_KUAP_BLOCKED >> AMR_KUAP_SHIFT)
>>> -    sldi    \gpr2, \gpr2, AMR_KUAP_SHIFT
>>> +
>>> +    /*
>>> +     * update kernel AMR with AMR_KUAP_BLOCKED only
>>> +     * if KUAP feature is enabled
>>> +     */
>>> +    BEGIN_MMU_FTR_SECTION_NESTED(69)
>>> +    LOAD_REG_IMMEDIATE(\gpr2, AMR_KUAP_BLOCKED)
>>>       cmpd    \use_cr, \gpr1, \gpr2
>>> -    beq    \use_cr, 99f
>>> -    // We don't isync here because we very recently entered via rfid
>>> +    beq    \use_cr, 102f
>>> +    /*
>>> +     * We don't isync here because we very recently entered via an interrupt
>>> +     */
>>>       mtspr    SPRN_AMR, \gpr2
>>>       isync
>>> -99:
>>> -    END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 67)
>>> +102:
>>> +    END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUAP, 69)
>>> +
>>> +    /*
>>> +     * if entering from kernel we don't need save IAMR
>>> +     */
>>> +    .ifnb \msr_pr_cr
>>> +    beq    \msr_pr_cr, 100f // from kernel space
>>> +    mfspr    \gpr1, SPRN_IAMR
>>> +    std    \gpr1, STACK_REGS_IAMR(r1)
>>> +
>>> +    /*
>>> +     * update kernel IAMR with AMR_KUEP_BLOCKED only
>>> +     * if KUEP feature is enabled
>>> +     */
>>> +    BEGIN_MMU_FTR_SECTION_NESTED(70)
>>> +    LOAD_REG_IMMEDIATE(\gpr2, AMR_KUEP_BLOCKED)
>>> +    mtspr    SPRN_IAMR, \gpr2
>>> +    isync
>>> +    END_MMU_FTR_SECTION_NESTED_IFSET(MMU_FTR_KUEP, 70)
>>> +    .endif
>>> +
>>> +100: // skip_save_amr
>>>   #endif
>>>   .endm
>>> @@ -66,22 +170,42 @@
>>>   DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
>>> -#ifdef CONFIG_PPC_KUAP
>>> +#ifdef CONFIG_PPC_PKEY
>>>   #include <asm/mmu.h>
>>>   #include <asm/ptrace.h>
>>> -static inline void kuap_restore_amr(struct pt_regs *regs, unsigned long amr)
>>> +static inline void kuap_restore_user_amr(struct pt_regs *regs)
>>
>> While we are at changing the function's names, could we remove the _amr from the names in order to 
>> make it more generic and allow to re-use that name when we migrate PPC32 to C interrupt/syscall 
>> entries/exits ? (see 
>> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/302a0e88e15ce4569d9619631b36248041d5ed27.1586196948.git.christophe.leroy@c-s.fr/) 
>>
> 
> How do you suggest we rename it? kuap_restore_user is a bit ambiguous right?
> 

I have never been really good at finding names.

What about kuap_user_restore() and kuap_kernel_restore() ?

^ permalink raw reply

* Re: [PATCH V3 3/5] ocxl: Update the Process Element Entry
From: Frederic Barrat @ 2020-11-25 14:08 UTC (permalink / raw)
  To: Christophe Lombard, linuxppc-dev, fbarrat, ajd
In-Reply-To: <20201124095838.18665-4-clombard@linux.vnet.ibm.com>



On 24/11/2020 10:58, Christophe Lombard wrote:
> To complete the MMIO based mechanism, the fields: PASID, bus, device and
> function of the Process Element Entry have to be filled. (See
> OpenCAPI Power Platform Architecture document)
> 
>                     Hypervisor Process Element Entry
> Word
>      0 1 .... 7  8  ...... 12  13 ..15  16.... 19  20 ........... 31
> 0                  OSL Configuration State (0:31)
> 1                  OSL Configuration State (32:63)
> 2               PASID                      |    Reserved
> 3       Bus   |   Device    |Function |        Reserved
> 4                             Reserved
> 5                             Reserved
> 6                               ....
> 
> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
> ---


LGTM
Acked-by: Frederic Barrat <fbarrat@linux.ibm.com>


>   drivers/misc/ocxl/context.c       | 4 +++-
>   drivers/misc/ocxl/link.c          | 4 +++-
>   drivers/misc/ocxl/ocxl_internal.h | 9 ++++++---
>   drivers/scsi/cxlflash/ocxl_hw.c   | 6 ++++--
>   include/misc/ocxl.h               | 2 +-
>   5 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
> index c21f65a5c762..9eb0d93b01c6 100644
> --- a/drivers/misc/ocxl/context.c
> +++ b/drivers/misc/ocxl/context.c
> @@ -70,6 +70,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm)
>   {
>   	int rc;
>   	unsigned long pidr = 0;
> +	struct pci_dev *dev;
> 
>   	// Locks both status & tidr
>   	mutex_lock(&ctx->status_mutex);
> @@ -81,8 +82,9 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr, struct mm_struct *mm)
>   	if (mm)
>   		pidr = mm->context.id;
> 
> +	dev = to_pci_dev(ctx->afu->fn->dev.parent);
>   	rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid, pidr, ctx->tidr,
> -			      amr, mm, xsl_fault_error, ctx);
> +			      amr, pci_dev_id(dev), mm, xsl_fault_error, ctx);
>   	if (rc)
>   		goto out;
> 
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index fd73d3bc0eb6..77381dda2c45 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -494,7 +494,7 @@ static u64 calculate_cfg_state(bool kernel)
>   }
> 
>   int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> -		u64 amr, struct mm_struct *mm,
> +		u64 amr, u16 bdf, struct mm_struct *mm,
>   		void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
>   		void *xsl_err_data)
>   {
> @@ -529,6 +529,8 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> 
>   	memset(pe, 0, sizeof(struct ocxl_process_element));
>   	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
> +	pe->pasid = cpu_to_be32(pasid << (31 - 19));
> +	pe->bdf = cpu_to_be16(bdf);
>   	pe->lpid = cpu_to_be32(mfspr(SPRN_LPID));
>   	pe->pid = cpu_to_be32(pidr);
>   	pe->tid = cpu_to_be32(tidr);
> diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
> index 0bad0a123af6..10125a22d5a5 100644
> --- a/drivers/misc/ocxl/ocxl_internal.h
> +++ b/drivers/misc/ocxl/ocxl_internal.h
> @@ -84,13 +84,16 @@ struct ocxl_context {
> 
>   struct ocxl_process_element {
>   	__be64 config_state;
> -	__be32 reserved1[11];
> +	__be32 pasid;
> +	__be16 bdf;
> +	__be16 reserved1;
> +	__be32 reserved2[9];
>   	__be32 lpid;
>   	__be32 tid;
>   	__be32 pid;
> -	__be32 reserved2[10];
> +	__be32 reserved3[10];
>   	__be64 amr;
> -	__be32 reserved3[3];
> +	__be32 reserved4[3];
>   	__be32 software_state;
>   };
> 
> diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c
> index e4e0d767b98e..244fc27215dc 100644
> --- a/drivers/scsi/cxlflash/ocxl_hw.c
> +++ b/drivers/scsi/cxlflash/ocxl_hw.c
> @@ -329,6 +329,7 @@ static int start_context(struct ocxlflash_context *ctx)
>   	struct ocxl_hw_afu *afu = ctx->hw_afu;
>   	struct ocxl_afu_config *acfg = &afu->acfg;
>   	void *link_token = afu->link_token;
> +	struct pci_dev *pdev = afu->pdev;
>   	struct device *dev = afu->dev;
>   	bool master = ctx->master;
>   	struct mm_struct *mm;
> @@ -360,8 +361,9 @@ static int start_context(struct ocxlflash_context *ctx)
>   		mm = current->mm;
>   	}
> 
> -	rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0, mm,
> -			      ocxlflash_xsl_fault, ctx);
> +	rc = ocxl_link_add_pe(link_token, ctx->pe, pid, 0, 0,
> +			      pci_dev_id(pdev), mm, ocxlflash_xsl_fault,
> +			      ctx);
>   	if (unlikely(rc)) {
>   		dev_err(dev, "%s: ocxl_link_add_pe failed rc=%d\n",
>   			__func__, rc);
> diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
> index e013736e275d..3ed736da02c8 100644
> --- a/include/misc/ocxl.h
> +++ b/include/misc/ocxl.h
> @@ -447,7 +447,7 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle);
>    * defined
>    */
>   int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> -		u64 amr, struct mm_struct *mm,
> +		u64 amr, u16 bdf, struct mm_struct *mm,
>   		void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
>   		void *xsl_err_data);
> 

^ permalink raw reply

* Re: [PATCH v2 1/2] genirq: add an irq_create_mapping_affinity() function
From: Laurent Vivier @ 2020-11-25 14:09 UTC (permalink / raw)
  To: Thomas Gleixner, linux-kernel
  Cc: Marc Zyngier, Michael S . Tsirkin, linux-pci, Greg Kurz,
	linux-block, Paul Mackerras, linuxppc-dev, Christoph Hellwig
In-Reply-To: <87sg8xk1yi.fsf@nanos.tec.linutronix.de>

On 25/11/2020 14:20, Thomas Gleixner wrote:
> Laurent,
> 
> On Wed, Nov 25 2020 at 12:16, Laurent Vivier wrote:
> 
> The proper subsystem prefix is: 'genirq/irqdomain:' and the first letter
> after the colon wants to be uppercase.

Ok.

>> This function adds an affinity parameter to irq_create_mapping().
>> This parameter is needed to pass it to irq_domain_alloc_descs().
> 
> A changelog has to explain the WHY. 'The parameter is needed' is not
> really useful information.
> 

The reason of this change is explained in PATCH 2.

I have two patches, one to change the interface with no functional change (PATCH 1) and
one to fix the problem (PATCH 2). Moreover they don't cover the same subsystems.

I can either:
- merge the two patches
- or make a reference in the changelog of PATCH 1 to PATCH 2
  (something like "(see folowing patch "powerpc/pseries: pass MSI affinity to
   irq_create_mapping()")")
- or copy some information from PATCH 2
  (something like "this parameter is needed by rtas_setup_msi_irqs() to pass the affinity
   to irq_domain_alloc_descs() to fix multiqueue affinity")

What do you prefer?

Thanks,
Laurent


^ permalink raw reply

* Re: [PATCH V3 2/5] ocxl: Initiate a TLB invalidate command
From: Frederic Barrat @ 2020-11-25 14:06 UTC (permalink / raw)
  To: Christophe Lombard, linuxppc-dev, fbarrat, ajd
In-Reply-To: <20201124095838.18665-3-clombard@linux.vnet.ibm.com>



On 24/11/2020 10:58, Christophe Lombard wrote:
> When a TLB Invalidate is required for the Logical Partition, the following
> sequence has to be performed:
> 
> 1. Load MMIO ATSD AVA register with the necessary value, if required.
> 2. Write the MMIO ATSD launch register to initiate the TLB Invalidate
> command.
> 3. Poll the MMIO ATSD status register to determine when the TLB Invalidate
>     has been completed.
> 
> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
> ---
>   arch/powerpc/include/asm/pnv-ocxl.h   | 51 +++++++++++++++++++
>   arch/powerpc/platforms/powernv/ocxl.c | 70 +++++++++++++++++++++++++++
>   2 files changed, 121 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
> index 60c3c74427d9..9acd1fbf1197 100644
> --- a/arch/powerpc/include/asm/pnv-ocxl.h
> +++ b/arch/powerpc/include/asm/pnv-ocxl.h
> @@ -3,12 +3,59 @@
>   #ifndef _ASM_PNV_OCXL_H
>   #define _ASM_PNV_OCXL_H
> 
> +#include <linux/bitfield.h>
>   #include <linux/pci.h>
> 
>   #define PNV_OCXL_TL_MAX_TEMPLATE        63
>   #define PNV_OCXL_TL_BITS_PER_RATE       4
>   #define PNV_OCXL_TL_RATE_BUF_SIZE       ((PNV_OCXL_TL_MAX_TEMPLATE+1) * PNV_OCXL_TL_BITS_PER_RATE / 8)
> 
> +#define PNV_OCXL_ATSD_TIMEOUT		1
> +
> +/* TLB Management Instructions */
> +#define PNV_OCXL_ATSD_LNCH		0x00
> +/* Radix Invalidate */
> +#define   PNV_OCXL_ATSD_LNCH_R		PPC_BIT(0)
> +/* Radix Invalidation Control
> + * 0b00 Just invalidate TLB.
> + * 0b01 Invalidate just Page Walk Cache.
> + * 0b10 Invalidate TLB, Page Walk Cache, and any
> + * caching of Partition and Process Table Entries.
> + */
> +#define   PNV_OCXL_ATSD_LNCH_RIC	PPC_BITMASK(1, 2)
> +/* Number and Page Size of translations to be invalidated */
> +#define   PNV_OCXL_ATSD_LNCH_LP		PPC_BITMASK(3, 10)
> +/* Invalidation Criteria
> + * 0b00 Invalidate just the target VA.
> + * 0b01 Invalidate matching PID.
> + */
> +#define   PNV_OCXL_ATSD_LNCH_IS		PPC_BITMASK(11, 12)
> +/* 0b1: Process Scope, 0b0: Partition Scope */
> +#define   PNV_OCXL_ATSD_LNCH_PRS	PPC_BIT(13)
> +/* Invalidation Flag */
> +#define   PNV_OCXL_ATSD_LNCH_B		PPC_BIT(14)
> +/* Actual Page Size to be invalidated
> + * 000 4KB
> + * 101 64KB
> + * 001 2MB
> + * 010 1GB
> + */
> +#define   PNV_OCXL_ATSD_LNCH_AP		PPC_BITMASK(15, 17)
> +/* Defines the large page select
> + * L=0b0 for 4KB pages
> + * L=0b1 for large pages)
> + */
> +#define   PNV_OCXL_ATSD_LNCH_L		PPC_BIT(18)
> +/* Process ID */
> +#define   PNV_OCXL_ATSD_LNCH_PID	PPC_BITMASK(19, 38)
> +/* NoFlush – Assumed to be 0b0 */
> +#define   PNV_OCXL_ATSD_LNCH_F		PPC_BIT(39)
> +#define   PNV_OCXL_ATSD_LNCH_OCAPI_SLBI	PPC_BIT(40)
> +#define   PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON	PPC_BIT(41)
> +#define PNV_OCXL_ATSD_AVA		0x08
> +#define   PNV_OCXL_ATSD_AVA_AVA		PPC_BITMASK(0, 51)
> +#define PNV_OCXL_ATSD_STAT		0x10
> +
>   int pnv_ocxl_get_actag(struct pci_dev *dev, u16 *base, u16 *enabled, u16 *supported);
>   int pnv_ocxl_get_pasid_count(struct pci_dev *dev, int *count);
> 
> @@ -31,4 +78,8 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
>   int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
>   		      uint64_t lpcr, void __iomem **arva);
>   void pnv_ocxl_unmap_lpar(void __iomem *arva);
> +void pnv_ocxl_tlb_invalidate(void __iomem *arva,
> +			     unsigned long pid,
> +			     unsigned long addr,
> +			     unsigned long page_size);
>   #endif /* _ASM_PNV_OCXL_H */
> diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
> index 57fc1062677b..f665846d2b28 100644
> --- a/arch/powerpc/platforms/powernv/ocxl.c
> +++ b/arch/powerpc/platforms/powernv/ocxl.c
> @@ -528,3 +528,73 @@ void pnv_ocxl_unmap_lpar(void __iomem *arva)
>   	iounmap(arva);
>   }
>   EXPORT_SYMBOL_GPL(pnv_ocxl_unmap_lpar);
> +
> +void pnv_ocxl_tlb_invalidate(void __iomem *arva,
> +			     unsigned long pid,
> +			     unsigned long addr,
> +			     unsigned long page_size)
> +{
> +	unsigned long timeout = jiffies + (HZ * PNV_OCXL_ATSD_TIMEOUT);
> +	u64 val = 0ull;
> +	int pend;
> +	u8 size;
> +
> +	if (!(arva))
> +		return;
> +
> +	if (addr) {
> +		/* load Abbreviated Virtual Address register with
> +		 * the necessary value
> +		 */
> +		val |= FIELD_PREP(PNV_OCXL_ATSD_AVA_AVA, addr >> (63-51));
> +		out_be64(arva + PNV_OCXL_ATSD_AVA, val);
> +	}
> +
> +	/* Write access initiates a shoot down to initiate the
> +	 * TLB Invalidate command
> +	 */
> +	val = PNV_OCXL_ATSD_LNCH_R;
> +	if (addr) {
> +		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b00);
> +		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b00);
> +	} else {
> +		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_RIC, 0b10);
> +		val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_IS, 0b01);
> +		val |= PNV_OCXL_ATSD_LNCH_OCAPI_SINGLETON;
> +	}
> +	val |= PNV_OCXL_ATSD_LNCH_PRS;
> +	/* Actual Page Size to be invalidated
> +	 * 000 4KB
> +	 * 101 64KB
> +	 * 001 2MB
> +	 * 010 1GB
> +	 */
> +	size = 0b101;
> +	if (page_size == 0x10000)
> +		size = 0b000;


Ooops, typo, we want page_size == 0x1000

   Fred


> +	if (page_size == 0x200000)
> +		size = 0b001;
> +	if (page_size == 0x40000000)
> +		size = 0b010;
> +	val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_AP, size);
> +	val |= FIELD_PREP(PNV_OCXL_ATSD_LNCH_PID, pid);
> +	out_be64(arva + PNV_OCXL_ATSD_LNCH, val);
> +
> +	/* Poll the ATSD status register to determine when the
> +	 * TLB Invalidate has been completed.
> +	 */
> +	val = in_be64(arva + PNV_OCXL_ATSD_STAT);
> +	pend = val >> 63;
> +
> +	while (pend) {
> +		if (time_after_eq(jiffies, timeout)) {
> +			pr_err("%s - Timeout while reading XTS MMIO ATSD status register (val=%#llx, pidr=0x%lx)\n",
> +			       __func__, val, pid);
> +			return;
> +		}
> +		cpu_relax();
> +		val = in_be64(arva + PNV_OCXL_ATSD_STAT);
> +		pend = val >> 63;
> +	}
> +}
> +EXPORT_SYMBOL_GPL(pnv_ocxl_tlb_invalidate);
> 

^ permalink raw reply

* Re: [PATCH v6 16/22] powerpc/book3s64/kuap: Improve error reporting with KUAP
From: Christophe Leroy @ 2020-11-25 14:04 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, mpe
In-Reply-To: <20201125051634.509286-17-aneesh.kumar@linux.ibm.com>



Le 25/11/2020 à 06:16, Aneesh Kumar K.V a écrit :
> With hash translation use DSISR_KEYFAULT to identify a wrong access.
> With Radix we look at the AMR value and type of fault.
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   arch/powerpc/include/asm/book3s/32/kup.h     |  4 +--
>   arch/powerpc/include/asm/book3s/64/kup.h     | 27 ++++++++++++++++----
>   arch/powerpc/include/asm/kup.h               |  4 +--
>   arch/powerpc/include/asm/nohash/32/kup-8xx.h |  4 +--
>   arch/powerpc/mm/fault.c                      |  2 +-
>   5 files changed, 29 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/32/kup.h b/arch/powerpc/include/asm/book3s/32/kup.h
> index 32fd4452e960..b18cd931e325 100644
> --- a/arch/powerpc/include/asm/book3s/32/kup.h
> +++ b/arch/powerpc/include/asm/book3s/32/kup.h
> @@ -177,8 +177,8 @@ static inline void restore_user_access(unsigned long flags)
>   		allow_user_access(to, to, end - addr, KUAP_READ_WRITE);
>   }
>   
> -static inline bool
> -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
> +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address,
> +				  bool is_write, unsigned long error_code)
>   {
>   	unsigned long begin = regs->kuap & 0xf0000000;
>   	unsigned long end = regs->kuap << 28;
> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
> index 4a3d0d601745..2922c442a218 100644
> --- a/arch/powerpc/include/asm/book3s/64/kup.h
> +++ b/arch/powerpc/include/asm/book3s/64/kup.h
> @@ -301,12 +301,29 @@ static inline void set_kuap(unsigned long value)
>   	isync();
>   }
>   
> -static inline bool
> -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
> +#define RADIX_KUAP_BLOCK_READ	UL(0x4000000000000000)
> +#define RADIX_KUAP_BLOCK_WRITE	UL(0x8000000000000000)
> +
> +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address,
> +				  bool is_write, unsigned long error_code)
>   {
> -	return WARN(mmu_has_feature(MMU_FTR_KUAP) &&
> -		    (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),
> -		    "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
> +	if (!mmu_has_feature(MMU_FTR_KUAP))
> +		return false;
> +
> +	if (radix_enabled()) {
> +		/*
> +		 * Will be a storage protection fault.
> +		 * Only check the details of AMR[0]
> +		 */
> +		return WARN((regs->kuap & (is_write ? RADIX_KUAP_BLOCK_WRITE : RADIX_KUAP_BLOCK_READ)),
> +			    "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");

I think it is pointless to keep the WARN() here.

I have a series aiming at removing them. See 
https://patchwork.ozlabs.org/project/linuxppc-dev/patch/cc9129bdda1dbc2f0a09cf45fece7d0b0e690784.1605541983.git.christophe.leroy@csgroup.eu/

> +	}
> +	/*
> +	 * We don't want to WARN here because userspace can setup
> +	 * keys such that a kernel access to user address can cause
> +	 * fault
> +	 */
> +	return !!(error_code & DSISR_KEYFAULT);
>   }
>   
>   static __always_inline void allow_user_access(void __user *to, const void __user *from,
> diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
> index a06e50b68d40..952be0414f43 100644
> --- a/arch/powerpc/include/asm/kup.h
> +++ b/arch/powerpc/include/asm/kup.h
> @@ -59,8 +59,8 @@ void setup_kuap(bool disabled);
>   #else
>   static inline void setup_kuap(bool disabled) { }
>   
> -static inline bool
> -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
> +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address,
> +				  bool is_write, unsigned long error_code)
>   {
>   	return false;
>   }
> diff --git a/arch/powerpc/include/asm/nohash/32/kup-8xx.h b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
> index 567cdc557402..7bdd9e5b63ed 100644
> --- a/arch/powerpc/include/asm/nohash/32/kup-8xx.h
> +++ b/arch/powerpc/include/asm/nohash/32/kup-8xx.h
> @@ -60,8 +60,8 @@ static inline void restore_user_access(unsigned long flags)
>   	mtspr(SPRN_MD_AP, flags);
>   }
>   
> -static inline bool
> -bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
> +static inline bool bad_kuap_fault(struct pt_regs *regs, unsigned long address,
> +				  bool is_write, unsigned long error_code)
>   {
>   	return WARN(!((regs->kuap ^ MD_APG_KUAP) & 0xff000000),
>   		    "Bug: fault blocked by AP register !");
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 0add963a849b..c91621df0c61 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -227,7 +227,7 @@ static bool bad_kernel_fault(struct pt_regs *regs, unsigned long error_code,
>   
>   	// Read/write fault in a valid region (the exception table search passed
>   	// above), but blocked by KUAP is bad, it can never succeed.
> -	if (bad_kuap_fault(regs, address, is_write))
> +	if (bad_kuap_fault(regs, address, is_write, error_code))
>   		return true;
>   
>   	// What's left? Kernel fault on user in well defined regions (extable
> 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox