LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 5.4 22/40] ibmvnic: continue to init in CRQ reset returns H_CLOSED
From: Sasha Levin @ 2020-07-02  1:23 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dany Madden, Sasha Levin, linuxppc-dev, David S . Miller, netdev
In-Reply-To: <20200702012402.2701121-1-sashal@kernel.org>

From: Dany Madden <drt@linux.ibm.com>

[ Upstream commit 8b40eb73509f5704a0e8cd25de0163876299f1a7 ]

Continue the reset path when partner adapter is not ready or H_CLOSED is
returned from reset crq. This patch allows the CRQ init to proceed to
establish a valid CRQ for traffic to flow after reset.

Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 5a42ddeecfe50..143a9722ad11a 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1865,13 +1865,18 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 			release_sub_crqs(adapter, 1);
 		} else {
 			rc = ibmvnic_reset_crq(adapter);
-			if (!rc)
+			if (rc == H_CLOSED || rc == H_SUCCESS) {
 				rc = vio_enable_interrupts(adapter->vdev);
+				if (rc)
+					netdev_err(adapter->netdev,
+						   "Reset failed to enable interrupts. rc=%d\n",
+						   rc);
+			}
 		}
 
 		if (rc) {
 			netdev_err(adapter->netdev,
-				   "Couldn't initialize crq. rc=%d\n", rc);
+				   "Reset couldn't initialize crq. rc=%d\n", rc);
 			goto out;
 		}
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.7 29/53] powerpc/kvm/book3s64: Fix kernel crash with nested kvm & DEBUG_VIRTUAL
From: Sasha Levin @ 2020-07-02  1:21 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Sasha Levin, Aneesh Kumar K.V, linuxppc-dev, kvm-ppc
In-Reply-To: <20200702012202.2700645-1-sashal@kernel.org>

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>

[ Upstream commit c1ed1754f271f6b7acb1bfdc8cfb62220fbed423 ]

With CONFIG_DEBUG_VIRTUAL=y, __pa() checks for addr value and if it's
less than PAGE_OFFSET it leads to a BUG().

  #define __pa(x)
  ({
  	VIRTUAL_BUG_ON((unsigned long)(x) < PAGE_OFFSET);
  	(unsigned long)(x) & 0x0fffffffffffffffUL;
  })

  kernel BUG at arch/powerpc/kvm/book3s_64_mmu_radix.c:43!
  cpu 0x70: Vector: 700 (Program Check) at [c0000018a2187360]
      pc: c000000000161b30: __kvmhv_copy_tofrom_guest_radix+0x130/0x1f0
      lr: c000000000161d5c: kvmhv_copy_from_guest_radix+0x3c/0x80
  ...
  kvmhv_copy_from_guest_radix+0x3c/0x80
  kvmhv_load_from_eaddr+0x48/0xc0
  kvmppc_ld+0x98/0x1e0
  kvmppc_load_last_inst+0x50/0x90
  kvmppc_hv_emulate_mmio+0x288/0x2b0
  kvmppc_book3s_radix_page_fault+0xd8/0x2b0
  kvmppc_book3s_hv_page_fault+0x37c/0x1050
  kvmppc_vcpu_run_hv+0xbb8/0x1080
  kvmppc_vcpu_run+0x34/0x50
  kvm_arch_vcpu_ioctl_run+0x2fc/0x410
  kvm_vcpu_ioctl+0x2b4/0x8f0
  ksys_ioctl+0xf4/0x150
  sys_ioctl+0x28/0x80
  system_call_exception+0x104/0x1d0
  system_call_common+0xe8/0x214

kvmhv_copy_tofrom_guest_radix() uses a NULL value for to/from to
indicate direction of copy.

Avoid calling __pa() if the value is NULL to avoid the BUG().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Massage change log a bit to mention CONFIG_DEBUG_VIRTUAL]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200611120159.680284-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index bc6c1aa3d0e92..ef40addd52c65 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -40,7 +40,8 @@ unsigned long __kvmhv_copy_tofrom_guest_radix(int lpid, int pid,
 	/* Can't access quadrants 1 or 2 in non-HV mode, call the HV to do it */
 	if (kvmhv_on_pseries())
 		return plpar_hcall_norets(H_COPY_TOFROM_GUEST, lpid, pid, eaddr,
-					  __pa(to), __pa(from), n);
+					  (to != NULL) ? __pa(to): 0,
+					  (from != NULL) ? __pa(from): 0, n);
 
 	quadrant = 1;
 	if (!pid)
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.7 28/53] ibmvnic: continue to init in CRQ reset returns H_CLOSED
From: Sasha Levin @ 2020-07-02  1:21 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dany Madden, Sasha Levin, linuxppc-dev, David S . Miller, netdev
In-Reply-To: <20200702012202.2700645-1-sashal@kernel.org>

From: Dany Madden <drt@linux.ibm.com>

[ Upstream commit 8b40eb73509f5704a0e8cd25de0163876299f1a7 ]

Continue the reset path when partner adapter is not ready or H_CLOSED is
returned from reset crq. This patch allows the CRQ init to proceed to
establish a valid CRQ for traffic to flow after reset.

Signed-off-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 1b4d04e4474bb..2dbcbdbb0e4ad 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1958,13 +1958,18 @@ static int do_reset(struct ibmvnic_adapter *adapter,
 			release_sub_crqs(adapter, 1);
 		} else {
 			rc = ibmvnic_reset_crq(adapter);
-			if (!rc)
+			if (rc == H_CLOSED || rc == H_SUCCESS) {
 				rc = vio_enable_interrupts(adapter->vdev);
+				if (rc)
+					netdev_err(adapter->netdev,
+						   "Reset failed to enable interrupts. rc=%d\n",
+						   rc);
+			}
 		}
 
 		if (rc) {
 			netdev_err(adapter->netdev,
-				   "Couldn't initialize crq. rc=%d\n", rc);
+				   "Reset couldn't initialize crq. rc=%d\n", rc);
 			goto out;
 		}
 
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH v2 5/6] powerpc/pseries/iommu: Make use of DDW even if it does not map the partition
From: Leonardo Bras @ 2020-07-02  0:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <2c5dc8d2-f379-5a5f-844a-f4eea233f265@ozlabs.ru>

On Thu, 2020-07-02 at 10:31 +1000, Alexey Kardashevskiy wrote:
> > In fact, there is a lot of places in this file where it's called direct
> > window. Should I replace everything?
> > Should it be in a separated patch?
> 
> If it looks simple and you write a nice commit log explaining all that
> and why you are not reusing the existing ibm,dma-window property 
> for that - sure, do it :)

Nice, I will do that :)

> (to provide a clue what "reset" will reset to? is there any other
> reason?)

That's the main reason here. 

The way I perceive this, ibm,dma-window should only point to the
default DMA window, which is guaranteed to always be the same, even if
it's destroyed and re-created. So there I see no point destroying /
overwriting it.

On the other hand, I also thought about using a new node name for this
window, but it would be very troublesome and I could see no real gain.

Thanks !


^ permalink raw reply

* Re: Memory:  880608K/983040K  .... 36896K reserved ?
From: Michael Ellerman @ 2020-07-02  0:52 UTC (permalink / raw)
  To: Joakim Tjernlund, linuxppc-dev@ozlabs.org
In-Reply-To: <a598dcaea9e62379c74d1d78083d92898373c4de.camel@infinera.com>

Joakim Tjernlund <Joakim.Tjernlund@infinera.com> writes:
> I cannot figure out how the xxxK reserved item works in:
>  Memory: 880608K/983040K available (9532K kernel code, 1104K rwdata, 3348K rodata, 1088K init, 1201K bss, 36896K reserved ...

It's calculated as:

	(physpages - totalram_pages() - totalcma_pages)

The naming is a bit historical I guess.

But roughly physpages is the total number of pages of RAM we think exist
in the system.

totalram_pages() is the total number of pages that have been freed to
the buddy allocator.

totalcma_pages is pages used by CMA which is probably 0 for you.

So the amount "reserved" is the memory that hasn't been freed to the
buddy allocator by memblock.

You should be able to see it in debugfs:

# cat /sys/kernel/debug/memblock/reserved 
   0: 0x0000000000000000..0x0000000002b40e57
   1: 0x0000000002b41000..0x0000000002b413ff
   2: 0x0000000002b50000..0x0000000002baffff
   3: 0x000000000a910000..0x000000000e93ffff
   4: 0x000000000fe80000..0x000000000fe9ffff
   5: 0x000000000feac000..0x000000000ffebfff
   6: 0x000000000ffed400..0x000000000ffed7ff
   7: 0x000000000ffeda80..0x000000000ffeebff
   8: 0x000000000ffeee80..0x000000000ffeffff
   9: 0x000000000fff0280..0x000000000fff13ff
   ...

> Is there a way to tune(lower it) this memory?

Some or most of those reserved regions will be things your firmware told
you to reserve, so you need to work out what each region is. They might
be firmware things, or even holes in RAM, you need to dig deeper to find
out what is what.

cheers

^ permalink raw reply

* Re: [PATCH v2 1/6] powerpc/pseries/iommu: Create defines for operations in ibm,ddw-applicable
From: Leonardo Bras @ 2020-07-02  0:46 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <0d3ee444-f528-673e-48f9-633138398543@ozlabs.ru>

On Thu, 2020-07-02 at 10:43 +1000, Alexey Kardashevskiy wrote:
> > Or should one stick to #define in this case?
> 
> imho a matter of taste but after some grepping it feels like #define is
> mostly used which does not mean it is a good idea. Keep it enum and see
> if it passed mpe's filter :)

Good idea :)

Thanks !


^ permalink raw reply

* Re: [PATCH v2 1/6] powerpc/pseries/iommu: Create defines for operations in ibm,ddw-applicable
From: Alexey Kardashevskiy @ 2020-07-02  0:43 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <f1f0563dae4c81620b53bcc258f2960a7948a583.camel@gmail.com>



On 02/07/2020 10:36, Leonardo Bras wrote:
> On Thu, 2020-07-02 at 10:21 +1000, Alexey Kardashevskiy wrote:
>>> enum {
>>>        DDW_QUERY_PE_DMA_WIN,
>>>        DDW_CREATE_PE_DMA_WIN,
>>>        DDW_REMOVE_PE_DMA_WIN,
>>>
>>>        DDW_APPLICABLE_SIZE
>>> }
>>> IMO, it looks better than all the defines before.
>>>
>>> What do you think?
>>
>> No, not really, these come from a binary interface so the reader of this
>> cares about absolute numbers and rather wants to see them explicitly.
> 
> Makes sense to me.
> I am still getting experience on where to use enum vs define. Thanks
> for the tip!
> 
> Using something like 
> enum {
> 	DDW_QUERY_PE_DMA_WIN = 0,
> 	DDW_CREATE_PE_DMA_WIN = 1,
> 	DDW_REMOVE_PE_DMA_WIN = 2,
> 
> 	DDW_APPLICABLE_SIZE
> };
> 
> would be fine too?


This is fine too.


> Or should one stick to #define in this case?

imho a matter of taste but after some grepping it feels like #define is
mostly used which does not mean it is a good idea. Keep it enum and see
if it passed mpe's filter :)



-- 
Alexey

^ permalink raw reply

* Re: [PATCH v2 1/6] powerpc/pseries/iommu: Create defines for operations in ibm,ddw-applicable
From: Leonardo Bras @ 2020-07-02  0:36 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <a884da45-7778-95cf-d65b-a6c82d2024a7@ozlabs.ru>

On Thu, 2020-07-02 at 10:21 +1000, Alexey Kardashevskiy wrote:
> > enum {
> >        DDW_QUERY_PE_DMA_WIN,
> >        DDW_CREATE_PE_DMA_WIN,
> >        DDW_REMOVE_PE_DMA_WIN,
> > 
> >        DDW_APPLICABLE_SIZE
> > }
> > IMO, it looks better than all the defines before.
> > 
> > What do you think?
> 
> No, not really, these come from a binary interface so the reader of this
> cares about absolute numbers and rather wants to see them explicitly.

Makes sense to me.
I am still getting experience on where to use enum vs define. Thanks
for the tip!

Using something like 
enum {
	DDW_QUERY_PE_DMA_WIN = 0,
	DDW_CREATE_PE_DMA_WIN = 1,
	DDW_REMOVE_PE_DMA_WIN = 2,

	DDW_APPLICABLE_SIZE
};

would be fine too?
Or should one stick to #define in this case?

Thank you,


^ permalink raw reply

* Re: [PATCH v2 5/6] powerpc/pseries/iommu: Make use of DDW even if it does not map the partition
From: Alexey Kardashevskiy @ 2020-07-02  0:31 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <0c3de45dfb612745aa2ee4126b3935303d8e8704.camel@gmail.com>



On 02/07/2020 09:48, Leonardo Bras wrote:
> On Wed, 2020-07-01 at 16:57 -0300, Leonardo Bras wrote:
>>> It is not necessarily "direct" anymore as the name suggests, you may
>>> want to change that. DMA64_PROPNAME, may be. Thanks,
>>>
>>
>> Yeah, you are right.
>> I will change this for next version, also changing the string name to
>> reflect this.
>>
>> -#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
>> +#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
>>
>> Is that ok?
>>
>> Thank you for helping!
> 
> In fact, there is a lot of places in this file where it's called direct
> window. Should I replace everything?
> Should it be in a separated patch?

If it looks simple and you write a nice commit log explaining all that
and why you are not reusing the existing ibm,dma-window property (to
provide a clue what "reset" will reset to? is there any other reason?)
for that - sure, do it :)



-- 
Alexey

^ permalink raw reply

* Re: [PATCH v2 2/6] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows
From: Leonardo Bras @ 2020-07-02  0:28 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <47ad4907-5f63-56e1-2985-a7a7f4d0ba35@ozlabs.ru>

On Thu, 2020-07-02 at 10:18 +1000, Alexey Kardashevskiy wrote:
> 
> On 02/07/2020 00:04, Leonardo Bras wrote:
> > On Wed, 2020-07-01 at 18:17 +1000, Alexey Kardashevskiy wrote:
> > > > +#define DDW_EXT_SIZE		0
> > > > +#define DDW_EXT_RESET_DMA_WIN	1
> > > > +#define DDW_EXT_QUERY_OUT_SIZE	2
> > > 
> > > #define DDW_EXT_LAST (DDW_EXT_QUERY_OUT_SIZE + 1)
> > > ...
> > > 
> > > 
> > > > +
> > > >  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
> > > >  {
> > > >  	struct iommu_table_group *table_group;
> > > > @@ -339,7 +343,7 @@ struct direct_window {
> > > >  /* Dynamic DMA Window support */
> > > >  struct ddw_query_response {
> > > >  	u32 windows_available;
> > > > -	u32 largest_available_block;
> > > > +	u64 largest_available_block;
> > > >  	u32 page_size;
> > > >  	u32 migration_capable;
> > > >  };
> > > > @@ -875,13 +879,29 @@ static int find_existing_ddw_windows(void)
> > > >  machine_arch_initcall(pseries, find_existing_ddw_windows);
> > > >  
> > > >  static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> > > > -			struct ddw_query_response *query)
> > > > +		     struct ddw_query_response *query,
> > > > +		     struct device_node *parent)
> > > >  {
> > > >  	struct device_node *dn;
> > > >  	struct pci_dn *pdn;
> > > > -	u32 cfg_addr;
> > > > +	u32 cfg_addr, query_out[5], ddw_ext[DDW_EXT_QUERY_OUT_SIZE + 1];
> > > 
> > > ... and use DDW_EXT_LAST here.
> > 
> > Because of the growing nature of ddw-extensions, I intentionally let
> > this be (DDW_EXT_QUERY_OUT_SIZE + 1). If we create a DDW_EXT_LAST, it
> > will be incremented in the future if more extensions come to exist.
> > 
> > I mean, I previously saw no reason for allocating space for extensions
> > after the desired one, as they won't be used here.
> 
> Ah, my bad, you're right.
> 
> 
> > > 
> > > >  	u64 buid;
> > > > -	int ret;
> > > > +	int ret, out_sz;
> > > > +
> > > > +	/*
> > > > +	 * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many
> > > > +	 * output parameters ibm,query-pe-dma-windows will have, ranging from
> > > > +	 * 5 to 6.
> > > > +	 */
> > > > +
> > > > +	ret = of_property_read_u32_array(parent, "ibm,ddw-extensions",
> > > > +					 &ddw_ext[0],
> > > > +					 DDW_EXT_QUERY_OUT_SIZE + 1);
> > 
> > In this case, I made sure not to cross (DDW_EXT_QUERY_OUT_SIZE + 1)
> > while reading the extensions from the property.
> > 
> > What do you think about it? 
> 
> I think you want something like:
> 
> static inline int ddw_read_ext(const struct device_node *np, int extnum,
> u32 *ret)
> {
> retun of_property_read_u32_index(np, "ibm,ddw-extensions", extnum + 1, ret);
> }
> 
> These "+1"'s all over the place are confusing.

That's a great idea!

I was not aware it was possible to read a single value[index] directly
from the property, but it makes total sense to use it.

Thank you!


^ permalink raw reply

* Re: [PATCH v2 4/6] powerpc/pseries/iommu: Remove default DMA window before creating DDW
From: Alexey Kardashevskiy @ 2020-07-02  0:25 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <42e7174bf60227caee4d1c353235e42b90305632.camel@gmail.com>



On 02/07/2020 05:48, Leonardo Bras wrote:
> On Wed, 2020-07-01 at 18:17 +1000, Alexey Kardashevskiy wrote:
>>
>> On 24/06/2020 16:24, Leonardo Bras wrote:
>>> On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
>>> default DMA window for the device, before attempting to configure a DDW,
>>> in order to make the maximum resources available for the next DDW to be
>>> created.
>>>
>>> This is a requirement for some devices to use DDW, given they only
>>> allow one DMA window.
>>
>> Devices never know about these windows, it is purely PHB's side of
>> things. A device can access any address on the bus, the bus can generate
>> an exception if there is no window behind the address OR some other
>> device's MMIO. We could actually create a second window in addition to
>> the first one and allocate bus addresses from both, we just simplifying
>> this by merging two separate non-adjacent windows into one.
> 
> That's interesting, I was not aware of this. 
> I will try to improve this commit message with this info.
> Thanks for sharing!
> 
>>>>> If setting up a new DDW fails anywhere after the removal of this
>>> default DMA window, it's needed to restore the default DMA window.
>>> For this, an implementation of ibm,reset-pe-dma-windows rtas call is
>>> needed:
>>>
>>> Platforms supporting the DDW option starting with LoPAR level 2.7 implement
>>> ibm,ddw-extensions. The first extension available (index 2) carries the
>>> token for ibm,reset-pe-dma-windows rtas call, which is used to restore
>>> the default DMA window for a device, if it has been deleted.
>>>
>>> It does so by resetting the TCE table allocation for the PE to it's
>>> boot time value, available in "ibm,dma-window" device tree node.
>>>
>>> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
>>> ---
>>>  arch/powerpc/platforms/pseries/iommu.c | 70 ++++++++++++++++++++++----
>>>  1 file changed, 61 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>> index a8840d9e1c35..4fcf00016fb1 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -1029,6 +1029,39 @@ static phys_addr_t ddw_memory_hotplug_max(void)
>>>  	return max_addr;
>>>  }
>>>  
>>> +/*
>>> + * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
>>> + * ibm,ddw-extensions, which carries the rtas token for
>>> + * ibm,reset-pe-dma-windows.
>>> + * That rtas-call can be used to restore the default DMA window for the device.
>>> + */
>>> +static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
>>> +{
>>> +	int ret;
>>> +	u32 cfg_addr, ddw_ext[DDW_EXT_RESET_DMA_WIN + 1];
>>> +	u64 buid;
>>> +	struct device_node *dn;
>>> +	struct pci_dn *pdn;
>>> +
>>> +	ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
>>> +					 &ddw_ext[0], DDW_EXT_RESET_DMA_WIN + 1);
>>> +	if (ret)
>>> +		return;
>>> +
>>> +	dn = pci_device_to_OF_node(dev);
>>> +	pdn = PCI_DN(dn);
>>> +	buid = pdn->phb->buid;
>>> +	cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
>>> +
>>> +	ret = rtas_call(ddw_ext[DDW_EXT_RESET_DMA_WIN], 3, 1, NULL, cfg_addr,
>>> +			BUID_HI(buid), BUID_LO(buid));
>>> +	if (ret)
>>> +		dev_info(&dev->dev,
>>> +			 "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
>>> +			 ddw_ext[1], cfg_addr, BUID_HI(buid), BUID_LO(buid),
>>
>> s/ddw_ext[1]/ddw_ext[DDW_EXT_RESET_DMA_WIN]/
> 
> Good catch! I missed this one.
> 
>>
>>
>>> +			 ret);
>>> +}
>>> +
>>>  /*
>>>   * If the PE supports dynamic dma windows, and there is space for a table
>>>   * that can map all pages in a linear offset, then setup such a table,
>>> @@ -1049,8 +1082,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>>>  	u64 dma_addr, max_addr;
>>>  	struct device_node *dn;
>>>  	u32 ddw_avail[DDW_APPLICABLE_SIZE];
>>> +
>>
>> Unrelated new empty line.
> 
> Fixed!
> 
>>
>>
>>>  	struct direct_window *window;
>>> -	struct property *win64;
>>> +	struct property *win64, *default_win = NULL, *ddw_ext = NULL;
>>>  	struct dynamic_dma_window_prop *ddwprop;
>>>  	struct failed_ddw_pdn *fpdn;
>>>  
>>> @@ -1085,7 +1119,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>>>  	if (ret)
>>>  		goto out_failed;
>>>  
>>> -       /*
>>> +	/*
>>>  	 * Query if there is a second window of size to map the
>>>  	 * whole partition.  Query returns number of windows, largest
>>>  	 * block assigned to PE (partition endpoint), and two bitmasks
>>> @@ -1096,15 +1130,31 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>>>  	if (ret != 0)
>>>  		goto out_failed;
>>>  
>>> +	/*
>>> +	 * If there is no window available, remove the default DMA window,
>>> +	 * if it's present. This will make all the resources available to the
>>> +	 * new DDW window.
>>> +	 * If anything fails after this, we need to restore it, so also check
>>> +	 * for extensions presence.
>>> +	 */
>>>  	if (query.windows_available == 0) {
>>
>> Does phyp really always advertise 0 windows for these VFs? What is in
>> the largest_available_block when windows_available==0?
> 
> For this VF, it always advertise 0 windows before removing the default
> DMA window. The largest available block size is the same as after the
> removal (256GB). The only value that changes after removal is the
> number of available windows. Here some debug prints:

> 
> [    3.473149] mlx5_core 4005:01:00.0: ibm,query-pe-dma-windows(53)
> 10000 8000000 29004005 returned 0
> [    3.473162] mlx5_core 4005:01:00.0: windows_available = 0,
> largest_block = 400000, page_size = 3, migration_capable = 3
> [    3.473332] mlx5_core 4005:01:00.0: ibm,query-pe-dma-windows(53)
> 10000 8000000 29004005 returned 0
> [    3.473345] mlx5_core 4005:01:00.0: windows_available = 1,
> largest_block = 400000, page_size = 3, migration_capable = 3



Ah, I see, thanks for the info. Ok, they really do not want us to have 2
windows. Oh well.



> 
>>
>>
>>> -		/*
>>> -		 * no additional windows are available for this device.
>>> -		 * We might be able to reallocate the existing window,
>>> -		 * trading in for a larger page size.
>>> -		 */
>>> -		dev_dbg(&dev->dev, "no free dynamic windows");
>>> -		goto out_failed;
>>> +		default_win = of_find_property(pdn, "ibm,dma-window", NULL);
>>> +		ddw_ext = of_find_property(pdn, "ibm,ddw-extensions", NULL);
>>> +		if (default_win && ddw_ext)
>>> +			remove_dma_window(pdn, ddw_avail, default_win);
>>> +
>>> +		/* Query again, to check if the window is available */
>>> +		ret = query_ddw(dev, ddw_avail, &query, pdn);
>>> +		if (ret != 0)
>>> +			goto out_failed;
>>> +
>>> +		if (query.windows_available == 0) {
>>> +			/* no windows are available for this device. */
>>> +			dev_dbg(&dev->dev, "no free dynamic windows");
>>> +			goto out_failed;
>>> +		}
>>>  	}
>>> +
>>
>> Unrelated new empty line. Thanks,
> Fixed!
> Thank you!
> 
>>
>>>  	if (query.page_size & 4) {
>>>  		page_shift = 24; /* 16MB */
>>>  	} else if (query.page_size & 2) {
>>> @@ -1194,6 +1244,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>>>  	kfree(win64);
>>>  
>>>  out_failed:
>>> +	if (default_win && ddw_ext)
>>> +		reset_dma_window(dev, pdn);
>>>  
>>>  	fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
>>>  	if (!fpdn)
>>>
> 

-- 
Alexey

^ permalink raw reply

* Re: [PATCH v2 1/6] powerpc/pseries/iommu: Create defines for operations in ibm,ddw-applicable
From: Alexey Kardashevskiy @ 2020-07-02  0:21 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <01443a2f1d58a595ddff03fd14fd56f4c26171bf.camel@gmail.com>



On 01/07/2020 23:28, Leonardo Bras wrote:
> On Wed, 2020-07-01 at 18:16 +1000, Alexey Kardashevskiy wrote:
>>
>> On 24/06/2020 16:24, Leonardo Bras wrote:
>>> Create defines to help handling ibm,ddw-applicable values, avoiding
>>> confusion about the index of given operations.
>>>
>>> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
>>> ---
>>>  arch/powerpc/platforms/pseries/iommu.c | 40 +++++++++++++++-----------
>>>  1 file changed, 23 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>> index 6d47b4a3ce39..68d2aa9c71a8 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -39,6 +39,11 @@
>>>  
>>>  #include "pseries.h"
>>>  
>>> +#define DDW_QUERY_PE_DMA_WIN	0
>>> +#define DDW_CREATE_PE_DMA_WIN	1
>>> +#define DDW_REMOVE_PE_DMA_WIN	2
>>> +#define DDW_APPLICABLE_SIZE	3
>>
>> #define DDW_APPLICABLE_SIZE  (DDW_REMOVE_PE_DMA_WIN + 1)
>>
>> thanks,
> 
> Thanks for the feedback!
> About this (and patch #2), would it be better to use enum ?
> enum {
> 	DDW_QUERY_PE_DMA_WIN,
> 	DDW_CREATE_PE_DMA_WIN,
> 	DDW_REMOVE_PE_DMA_WIN,
> 
> 	DDW_APPLICABLE_SIZE
> }
> IMO, it looks better than all the defines before.
> 
> What do you think?

No, not really, these come from a binary interface so the reader of this
cares about absolute numbers and rather wants to see them explicitly.


-- 
Alexey

^ permalink raw reply

* Re: [PATCH v2 2/6] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows
From: Alexey Kardashevskiy @ 2020-07-02  0:18 UTC (permalink / raw)
  To: Leonardo Bras, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <fbcbc7eb298930195b7146221dc1eded6bf556e4.camel@gmail.com>



On 02/07/2020 00:04, Leonardo Bras wrote:
> On Wed, 2020-07-01 at 18:17 +1000, Alexey Kardashevskiy wrote:
>>
>>> +#define DDW_EXT_SIZE		0
>>> +#define DDW_EXT_RESET_DMA_WIN	1
>>> +#define DDW_EXT_QUERY_OUT_SIZE	2
>>
>> #define DDW_EXT_LAST (DDW_EXT_QUERY_OUT_SIZE + 1)
>> ...
>>
>>
>>> +
>>>  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
>>>  {
>>>  	struct iommu_table_group *table_group;
>>> @@ -339,7 +343,7 @@ struct direct_window {
>>>  /* Dynamic DMA Window support */
>>>  struct ddw_query_response {
>>>  	u32 windows_available;
>>> -	u32 largest_available_block;
>>> +	u64 largest_available_block;
>>>  	u32 page_size;
>>>  	u32 migration_capable;
>>>  };
>>> @@ -875,13 +879,29 @@ static int find_existing_ddw_windows(void)
>>>  machine_arch_initcall(pseries, find_existing_ddw_windows);
>>>  
>>>  static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
>>> -			struct ddw_query_response *query)
>>> +		     struct ddw_query_response *query,
>>> +		     struct device_node *parent)
>>>  {
>>>  	struct device_node *dn;
>>>  	struct pci_dn *pdn;
>>> -	u32 cfg_addr;
>>> +	u32 cfg_addr, query_out[5], ddw_ext[DDW_EXT_QUERY_OUT_SIZE + 1];
>>
>> ... and use DDW_EXT_LAST here.
> 
> Because of the growing nature of ddw-extensions, I intentionally let
> this be (DDW_EXT_QUERY_OUT_SIZE + 1). If we create a DDW_EXT_LAST, it
> will be incremented in the future if more extensions come to exist.
> 
> I mean, I previously saw no reason for allocating space for extensions
> after the desired one, as they won't be used here.

Ah, my bad, you're right.


> 
>>
>>
>>>  	u64 buid;
>>> -	int ret;
>>> +	int ret, out_sz;
>>> +
>>> +	/*
>>> +	 * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many
>>> +	 * output parameters ibm,query-pe-dma-windows will have, ranging from
>>> +	 * 5 to 6.
>>> +	 */
>>> +
>>> +	ret = of_property_read_u32_array(parent, "ibm,ddw-extensions",
>>> +					 &ddw_ext[0],
>>> +					 DDW_EXT_QUERY_OUT_SIZE + 1);
> 
> In this case, I made sure not to cross (DDW_EXT_QUERY_OUT_SIZE + 1)
> while reading the extensions from the property.
> 
> What do you think about it? 

I think you want something like:

static inline int ddw_read_ext(const struct device_node *np, int extnum,
u32 *ret)
{
retun of_property_read_u32_index(np, "ibm,ddw-extensions", extnum + 1, ret);
}

These "+1"'s all over the place are confusing.


-- 
Alexey

^ permalink raw reply

* Re: [PATCH v2 5/6] powerpc/pseries/iommu: Make use of DDW even if it does not map the partition
From: Leonardo Bras @ 2020-07-01 23:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <0e8bcc38614ec80c7816c07dd4dc70854c2b901d.camel@gmail.com>

On Wed, 2020-07-01 at 16:57 -0300, Leonardo Bras wrote:
> > It is not necessarily "direct" anymore as the name suggests, you may
> > want to change that. DMA64_PROPNAME, may be. Thanks,
> > 
> 
> Yeah, you are right.
> I will change this for next version, also changing the string name to
> reflect this.
> 
> -#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
> +#define DMA64_PROPNAME "linux,dma64-ddr-window-info"
> 
> Is that ok?
> 
> Thank you for helping!

In fact, there is a lot of places in this file where it's called direct
window. Should I replace everything?
Should it be in a separated patch?

Best regards,
Leonardo


^ permalink raw reply

* Re: [PATCH] ASoC: fsl_sai: Refine regcache usage with pm runtime
From: Mark Brown @ 2020-07-01 22:22 UTC (permalink / raw)
  To: perex, alsa-devel, tiwai, Xiubo.Lee, timur, festevam,
	nicoleotsuka, Shengjiu Wang
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1593412953-10897-1-git-send-email-shengjiu.wang@nxp.com>

On Mon, 29 Jun 2020 14:42:33 +0800, Shengjiu Wang wrote:
> When there is dedicated power domain bound with device, after probing
> the power will be disabled, then registers are not accessible in
> fsl_sai_dai_probe(), so regcache only need to be enabled in end of
> probe() and regcache_mark_dirty should be moved to pm runtime resume
> callback function.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl_sai: Refine regcache usage with pm runtime
      commit: d8d702e19e997cf3f172487e0659d0e68aa5ede5

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

^ permalink raw reply

* Re: [PATCH v3] ASoC: fsl_asrc: Add an option to select internal ratio mode
From: Mark Brown @ 2020-07-01 22:22 UTC (permalink / raw)
  To: perex, alsa-devel, tiwai, Xiubo.Lee, timur, lgirdwood, festevam,
	nicoleotsuka, Shengjiu Wang
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1593525367-23221-1-git-send-email-shengjiu.wang@nxp.com>

On Tue, 30 Jun 2020 21:56:07 +0800, Shengjiu Wang wrote:
> The ASRC not only supports ideal ratio mode, but also supports
> internal ratio mode.
> 
> For internal rato mode, the rate of clock source should be divided
> with no remainder by sample rate, otherwise there is sound
> distortion.
> 
> [...]

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/1] ASoC: fsl_asrc: Add an option to select internal ratio mode
      commit: d0250cf4f2abfbea64ed247230f08f5ae23979f0

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

^ permalink raw reply

* Re: [PATCH v2 6/6] powerpc/pseries/iommu: Avoid errors when DDW starts at 0x00
From: Leonardo Bras @ 2020-07-01 19:59 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <76047748-9ddd-5ba3-fe4d-85c7c08bd521@ozlabs.ru>

On Wed, 2020-07-01 at 18:04 +1000, Alexey Kardashevskiy wrote:
> 
> On 27/06/2020 03:46, Leonardo Bras wrote:
> > On Wed, 2020-06-24 at 03:24 -0300, Leonardo Bras wrote:
> > > As of today, enable_ddw() will return a non-null DMA address if the
> > > created DDW maps the whole partition. If the address is valid,
> > > iommu_bypass_supported_pSeriesLP() will consider iommu bypass enabled.
> > > 
> > > This can cause some trouble if the DDW happens to start at 0x00.
> > > 
> > > Instead if checking if the address is non-null, check directly if
> > > the DDW maps the whole partition, so it can bypass iommu.
> > > 
> > > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > 
> > This patch has a bug in it. I will rework it soon.
> 
> I'd rather suggest this:
> 
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20180725095032.2196-2-aik@ozlabs.ru/
> 
> Although it does not look like you are actually going to have windows
> starting at 0. Thanks,

Yeah, agree. 
I am thinking of dropping this one, as I don't see much good to be done
here.

Thank you!


^ permalink raw reply

* Re: [PATCH v2 5/6] powerpc/pseries/iommu: Make use of DDW even if it does not map the partition
From: Leonardo Bras @ 2020-07-01 19:57 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <18df09c0-ef83-a0d8-1143-1cb4d50bf6b7@ozlabs.ru>

On Wed, 2020-07-01 at 18:16 +1000, Alexey Kardashevskiy wrote:
> 
> On 24/06/2020 16:24, Leonardo Bras wrote:
> > As of today, if a DDW is created and can't map the whole partition, it's
> > removed and the default DMA window "ibm,dma-window" is used instead.
> > 
> > Usually this DDW is bigger than the default DMA window, so it would be
> > better to make use of it instead.
> > 
> > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > ---
> >  arch/powerpc/platforms/pseries/iommu.c | 28 +++++++++++++++++---------
> >  1 file changed, 19 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index 4fcf00016fb1..2d217cda4075 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -685,7 +685,7 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
> >  	struct iommu_table *tbl;
> >  	struct device_node *dn, *pdn;
> >  	struct pci_dn *ppci;
> > -	const __be32 *dma_window = NULL;
> > +	const __be32 *dma_window = NULL, *alt_dma_window = NULL;
> >  
> >  	dn = pci_bus_to_OF_node(bus);
> >  
> > @@ -699,8 +699,13 @@ static void pci_dma_bus_setup_pSeriesLP(struct pci_bus *bus)
> >  			break;
> >  	}
> >  
> > +	/* If there is a DDW available, use it instead */
> > +	alt_dma_window = of_get_property(pdn, DIRECT64_PROPNAME, NULL);
> 
> It is not necessarily "direct" anymore as the name suggests, you may
> want to change that. DMA64_PROPNAME, may be. Thanks,
> 

Yeah, you are right.
I will change this for next version, also changing the string name to
reflect this.

-#define DIRECT64_PROPNAME "linux,direct64-ddr-window-info"
+#define DMA64_PROPNAME "linux,dma64-ddr-window-info"

Is that ok?

Thank you for helping!



^ permalink raw reply

* Re: [PATCH v2 4/6] powerpc/pseries/iommu: Remove default DMA window before creating DDW
From: Leonardo Bras @ 2020-07-01 19:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <e00340a3-1070-a787-5acc-0bfc37f73dff@ozlabs.ru>

On Wed, 2020-07-01 at 18:17 +1000, Alexey Kardashevskiy wrote:
> 
> On 24/06/2020 16:24, Leonardo Bras wrote:
> > On LoPAR "DMA Window Manipulation Calls", it's recommended to remove the
> > default DMA window for the device, before attempting to configure a DDW,
> > in order to make the maximum resources available for the next DDW to be
> > created.
> > 
> > This is a requirement for some devices to use DDW, given they only
> > allow one DMA window.
> 
> Devices never know about these windows, it is purely PHB's side of
> things. A device can access any address on the bus, the bus can generate
> an exception if there is no window behind the address OR some other
> device's MMIO. We could actually create a second window in addition to
> the first one and allocate bus addresses from both, we just simplifying
> this by merging two separate non-adjacent windows into one.

That's interesting, I was not aware of this. 
I will try to improve this commit message with this info.
Thanks for sharing!

> > > > If setting up a new DDW fails anywhere after the removal of this
> > default DMA window, it's needed to restore the default DMA window.
> > For this, an implementation of ibm,reset-pe-dma-windows rtas call is
> > needed:
> > 
> > Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> > ibm,ddw-extensions. The first extension available (index 2) carries the
> > token for ibm,reset-pe-dma-windows rtas call, which is used to restore
> > the default DMA window for a device, if it has been deleted.
> > 
> > It does so by resetting the TCE table allocation for the PE to it's
> > boot time value, available in "ibm,dma-window" device tree node.
> > 
> > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > ---
> >  arch/powerpc/platforms/pseries/iommu.c | 70 ++++++++++++++++++++++----
> >  1 file changed, 61 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > index a8840d9e1c35..4fcf00016fb1 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -1029,6 +1029,39 @@ static phys_addr_t ddw_memory_hotplug_max(void)
> >  	return max_addr;
> >  }
> >  
> > +/*
> > + * Platforms supporting the DDW option starting with LoPAR level 2.7 implement
> > + * ibm,ddw-extensions, which carries the rtas token for
> > + * ibm,reset-pe-dma-windows.
> > + * That rtas-call can be used to restore the default DMA window for the device.
> > + */
> > +static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
> > +{
> > +	int ret;
> > +	u32 cfg_addr, ddw_ext[DDW_EXT_RESET_DMA_WIN + 1];
> > +	u64 buid;
> > +	struct device_node *dn;
> > +	struct pci_dn *pdn;
> > +
> > +	ret = of_property_read_u32_array(par_dn, "ibm,ddw-extensions",
> > +					 &ddw_ext[0], DDW_EXT_RESET_DMA_WIN + 1);
> > +	if (ret)
> > +		return;
> > +
> > +	dn = pci_device_to_OF_node(dev);
> > +	pdn = PCI_DN(dn);
> > +	buid = pdn->phb->buid;
> > +	cfg_addr = ((pdn->busno << 16) | (pdn->devfn << 8));
> > +
> > +	ret = rtas_call(ddw_ext[DDW_EXT_RESET_DMA_WIN], 3, 1, NULL, cfg_addr,
> > +			BUID_HI(buid), BUID_LO(buid));
> > +	if (ret)
> > +		dev_info(&dev->dev,
> > +			 "ibm,reset-pe-dma-windows(%x) %x %x %x returned %d ",
> > +			 ddw_ext[1], cfg_addr, BUID_HI(buid), BUID_LO(buid),
> 
> s/ddw_ext[1]/ddw_ext[DDW_EXT_RESET_DMA_WIN]/

Good catch! I missed this one.

> 
> 
> > +			 ret);
> > +}
> > +
> >  /*
> >   * If the PE supports dynamic dma windows, and there is space for a table
> >   * that can map all pages in a linear offset, then setup such a table,
> > @@ -1049,8 +1082,9 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> >  	u64 dma_addr, max_addr;
> >  	struct device_node *dn;
> >  	u32 ddw_avail[DDW_APPLICABLE_SIZE];
> > +
> 
> Unrelated new empty line.

Fixed!

> 
> 
> >  	struct direct_window *window;
> > -	struct property *win64;
> > +	struct property *win64, *default_win = NULL, *ddw_ext = NULL;
> >  	struct dynamic_dma_window_prop *ddwprop;
> >  	struct failed_ddw_pdn *fpdn;
> >  
> > @@ -1085,7 +1119,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> >  	if (ret)
> >  		goto out_failed;
> >  
> > -       /*
> > +	/*
> >  	 * Query if there is a second window of size to map the
> >  	 * whole partition.  Query returns number of windows, largest
> >  	 * block assigned to PE (partition endpoint), and two bitmasks
> > @@ -1096,15 +1130,31 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> >  	if (ret != 0)
> >  		goto out_failed;
> >  
> > +	/*
> > +	 * If there is no window available, remove the default DMA window,
> > +	 * if it's present. This will make all the resources available to the
> > +	 * new DDW window.
> > +	 * If anything fails after this, we need to restore it, so also check
> > +	 * for extensions presence.
> > +	 */
> >  	if (query.windows_available == 0) {
> 
> Does phyp really always advertise 0 windows for these VFs? What is in
> the largest_available_block when windows_available==0?

For this VF, it always advertise 0 windows before removing the default
DMA window. The largest available block size is the same as after the
removal (256GB). The only value that changes after removal is the
number of available windows. Here some debug prints:

[    3.473149] mlx5_core 4005:01:00.0: ibm,query-pe-dma-windows(53)
10000 8000000 29004005 returned 0
[    3.473162] mlx5_core 4005:01:00.0: windows_available = 0,
largest_block = 400000, page_size = 3, migration_capable = 3
[    3.473332] mlx5_core 4005:01:00.0: ibm,query-pe-dma-windows(53)
10000 8000000 29004005 returned 0
[    3.473345] mlx5_core 4005:01:00.0: windows_available = 1,
largest_block = 400000, page_size = 3, migration_capable = 3

> 
> 
> > -		/*
> > -		 * no additional windows are available for this device.
> > -		 * We might be able to reallocate the existing window,
> > -		 * trading in for a larger page size.
> > -		 */
> > -		dev_dbg(&dev->dev, "no free dynamic windows");
> > -		goto out_failed;
> > +		default_win = of_find_property(pdn, "ibm,dma-window", NULL);
> > +		ddw_ext = of_find_property(pdn, "ibm,ddw-extensions", NULL);
> > +		if (default_win && ddw_ext)
> > +			remove_dma_window(pdn, ddw_avail, default_win);
> > +
> > +		/* Query again, to check if the window is available */
> > +		ret = query_ddw(dev, ddw_avail, &query, pdn);
> > +		if (ret != 0)
> > +			goto out_failed;
> > +
> > +		if (query.windows_available == 0) {
> > +			/* no windows are available for this device. */
> > +			dev_dbg(&dev->dev, "no free dynamic windows");
> > +			goto out_failed;
> > +		}
> >  	}
> > +
> 
> Unrelated new empty line. Thanks,
Fixed!
Thank you!

> 
> >  	if (query.page_size & 4) {
> >  		page_shift = 24; /* 16MB */
> >  	} else if (query.page_size & 2) {
> > @@ -1194,6 +1244,8 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> >  	kfree(win64);
> >  
> >  out_failed:
> > +	if (default_win && ddw_ext)
> > +		reset_dma_window(dev, pdn);
> >  
> >  	fpdn = kzalloc(sizeof(*fpdn), GFP_KERNEL);
> >  	if (!fpdn)
> > 


^ permalink raw reply

* Memory:  880608K/983040K  .... 36896K reserved ?
From: Joakim Tjernlund @ 2020-07-01 19:00 UTC (permalink / raw)
  To: linuxppc-dev@ozlabs.org

I cannot figure out how the xxxK reserved item works in:
 Memory: 880608K/983040K available (9532K kernel code, 1104K rwdata, 3348K rodata, 1088K init, 1201K bss, 36896K reserved ...

Is there a way to tune(lower it) this memory?

 Jocke

^ permalink raw reply

* Re: [PATCH 01/11] kexec_file: allow archs to handle special regions while locating memory hole
From: Hari Bathini @ 2020-07-01 18:31 UTC (permalink / raw)
  To: Dave Young
  Cc: Thiago Jung Bauermann, Pingfan Liu, Petr Tesarik, Kexec-ml,
	Mahesh J Salgaonkar, Mimi Zohar, lkml, linuxppc-dev, Sourabh Jain,
	Andrew Morton, Vivek Goyal, Eric Biederman
In-Reply-To: <20200701074659.GA3878@dhcp-128-65.nay.redhat.com>



On 01/07/20 1:16 pm, Dave Young wrote:
> On 06/29/20 at 05:26pm, Hari Bathini wrote:
>> Hi Petr,
>>
>> On 29/06/20 5:09 pm, Petr Tesarik wrote:
>>> Hi Hari,
>>>
>>> is there any good reason to add two more functions with a very similar
>>> name to an existing function? AFAICS all you need is a way to call a
>>> PPC64-specific function from within kexec_add_buffer (PATCH 4/11), so
>>> you could add something like this:
>>>
>>> int __weak arch_kexec_locate_mem_hole(struct kexec_buf *kbuf)
>>> {
>>> 	return 0;
>>> }
>>>
>>> Call this function from kexec_add_buffer where appropriate and then
>>> override it for PPC64 (it roughly corresponds to your
>>> kexec_locate_mem_hole_ppc64() from PATCH 4/11).
>>>
>>> FWIW it would make it easier for me to follow the resulting code.
>>
>> Right, Petr.
>>
>> I was trying out a few things before I ended up with what I sent here.
>> Bu yeah.. I did realize arch_kexec_locate_mem_hole() would have been better
>> after sending out v1. Will take care of that in v2.
> 
> Another way is use arch private function to locate mem hole, then set
> kbuf->mem, and then call kexec_add_buf, it will skip the common locate
> hole function.

Dave, I did think about it. But there are a couple of places this can get
tricky. One is ima_add_kexec_buffer() and the other is kexec_elf_load().
These call sites could be updated to set kbuf->mem before kexec_add_buffer().
But the current approach seemed like the better option for it creates a
single point of control in setting up segment buffers and also, makes adding
any new segments simpler, arch-specific segments or otherwise.

Thanks
Hari

^ permalink raw reply

* Re: [PATCH 10/20] dm: stop using ->queuedata
From: Mike Snitzer @ 2020-07-01 17:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-xtensa, linux-s390, linux-m68k, linux-nvdimm,
	dm-devel, linux-nvme, linux-kernel, linux-raid, linux-bcache,
	linuxppc-dev, drbd-dev
In-Reply-To: <20200701085947.3354405-11-hch@lst.de>

On Wed, Jul 01 2020 at  4:59am -0400,
Christoph Hellwig <hch@lst.de> wrote:

> Instead of setting up the queuedata as well just use one private data
> field.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Acked-by: Mike Snitzer <snitzer@redhat.com>


^ permalink raw reply

* Re: [PATCH 04/11] ppc64/kexec_file: avoid stomping memory used by special regions
From: Hari Bathini @ 2020-07-01 18:18 UTC (permalink / raw)
  To: Dave Young
  Cc: Pingfan Liu, Kexec-ml, Petr Tesarik, lkml, Sourabh Jain,
	Mahesh J Salgaonkar, linuxppc-dev, Vivek Goyal, Andrew Morton,
	Mimi Zohar, Thiago Jung Bauermann, Eric Biederman
In-Reply-To: <20200701074012.GA4496@dhcp-128-65.nay.redhat.com>



On 01/07/20 1:10 pm, Dave Young wrote:
> Hi Hari,
> On 06/27/20 at 12:35am, Hari Bathini wrote:
>> crashkernel region could have an overlap with special memory regions
>> like  opal, rtas, tce-table & such. These regions are referred to as
>> exclude memory ranges. Setup this ranges during image probe in order
>> to avoid them while finding the buffer for different kdump segments.
>> Implement kexec_locate_mem_hole_ppc64() that locates a memory hole
>> accounting for these ranges. Also, override arch_kexec_add_buffer()
>> to locate a memory hole & later call __kexec_add_buffer() function
>> with kbuf->mem set to skip the generic locate memory hole lookup.
>>
>> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
>> ---
>>  arch/powerpc/include/asm/crashdump-ppc64.h |   10 +
>>  arch/powerpc/include/asm/kexec.h           |    7 -
>>  arch/powerpc/kexec/elf_64.c                |    7 +
>>  arch/powerpc/kexec/file_load_64.c          |  292 ++++++++++++++++++++++++++++
>>  4 files changed, 312 insertions(+), 4 deletions(-)
>>  create mode 100644 arch/powerpc/include/asm/crashdump-ppc64.h
>>
> [snip]
>>  /**
>> + * get_exclude_memory_ranges - Get exclude memory ranges. This list includes
>> + *                             regions like opal/rtas, tce-table, initrd,
>> + *                             kernel, htab which should be avoided while
>> + *                             setting up kexec load segments.
>> + * @mem_ranges:                Range list to add the memory ranges to.
>> + *
>> + * Returns 0 on success, negative errno on error.
>> + */
>> +static int get_exclude_memory_ranges(struct crash_mem **mem_ranges)
>> +{
>> +	int ret;
>> +
>> +	ret = add_tce_mem_ranges(mem_ranges);
>> +	if (ret)
>> +		goto out;
>> +
>> +	ret = add_initrd_mem_range(mem_ranges);
>> +	if (ret)
>> +		goto out;
>> +
>> +	ret = add_htab_mem_range(mem_ranges);
>> +	if (ret)
>> +		goto out;
>> +
>> +	ret = add_kernel_mem_range(mem_ranges);
>> +	if (ret)
>> +		goto out;
>> +
>> +	ret = add_rtas_mem_range(mem_ranges, false);
>> +	if (ret)
>> +		goto out;
>> +
>> +	ret = add_opal_mem_range(mem_ranges, false);
>> +	if (ret)
>> +		goto out;
>> +
>> +	ret = add_reserved_ranges(mem_ranges);
>> +	if (ret)
>> +		goto out;
>> +
>> +	/* exclude memory ranges should be sorted for easy lookup */
>> +	sort_memory_ranges(*mem_ranges);
>> +out:
>> +	if (ret)
>> +		pr_err("Failed to setup exclude memory ranges\n");
>> +	return ret;
>> +}
> 
> I'm confused about the "overlap with crashkernel memory", does that mean
> those normal kernel used memory could be put in crashkernel reserved

There are regions that could overlap with crashkernel region but they are
not normal kernel used memory though. These are regions that kernel and/or
f/w chose to place at a particular address for real mode accessibility
and/or memory layout between kernel & f/w kind of thing.

> memory range?  If so why can't just skip those areas while crashkernel
> doing the reservation?

crashkernel region has a dependency to be in the first memory block for it
to be accessible in real mode. Accommodating this requirement while addressing
other requirements would mean something like what we have now. A list of
possible special memory regions in crashkernel region to take care of.

I have plans to split crashkernel region into low & high to have exclusive
regions for crashkernel, even if that means to have two of them. But that
is for another day with its own set of complexities to deal with...

Thanks
Hari

^ permalink raw reply

* Re: [PATCH 16/20] block: move ->make_request_fn to struct block_device_operations
From: Dan Williams @ 2020-07-01 16:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-xtensa, linux-nvdimm, linux-s390, linux-m68k,
	linux-nvme, Linux Kernel Mailing List, linux-raid,
	device-mapper development, linux-bcache, linuxppc-dev, drbd-dev
In-Reply-To: <20200701085947.3354405-17-hch@lst.de>

On Wed, Jul 1, 2020 at 2:01 AM Christoph Hellwig <hch@lst.de> wrote:
>
> The make_request_fn is a little weird in that it sits directly in
> struct request_queue instead of an operation vector.  Replace it with
> a block_device_operations method called submit_bio (which describes much
> better what it does).  Also remove the request_queue argument to it, as
> the queue can be derived pretty trivially from the bio.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
[..]
>  drivers/nvdimm/blk.c                          |  5 +-
>  drivers/nvdimm/btt.c                          |  5 +-
>  drivers/nvdimm/pmem.c                         |  5 +-

For drivers/nvdimm

Acked-by: Dan Williams <dan.j.williams@intel.com>

^ permalink raw reply

* Re: [PATCH v2 2/6] powerpc/pseries/iommu: Update call to ibm,query-pe-dma-windows
From: Leonardo Bras @ 2020-07-01 14:04 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael Ellerman, Benjamin Herrenschmidt,
	Paul Mackerras, Thiago Jung Bauermann, Ram Pai
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <5411e8a1-02a3-1287-40bf-ccc9db7a4f88@ozlabs.ru>

On Wed, 2020-07-01 at 18:17 +1000, Alexey Kardashevskiy wrote:
> 
> > +#define DDW_EXT_SIZE		0
> > +#define DDW_EXT_RESET_DMA_WIN	1
> > +#define DDW_EXT_QUERY_OUT_SIZE	2
> 
> #define DDW_EXT_LAST (DDW_EXT_QUERY_OUT_SIZE + 1)
> ...
> 
> 
> > +
> >  static struct iommu_table_group *iommu_pseries_alloc_group(int node)
> >  {
> >  	struct iommu_table_group *table_group;
> > @@ -339,7 +343,7 @@ struct direct_window {
> >  /* Dynamic DMA Window support */
> >  struct ddw_query_response {
> >  	u32 windows_available;
> > -	u32 largest_available_block;
> > +	u64 largest_available_block;
> >  	u32 page_size;
> >  	u32 migration_capable;
> >  };
> > @@ -875,13 +879,29 @@ static int find_existing_ddw_windows(void)
> >  machine_arch_initcall(pseries, find_existing_ddw_windows);
> >  
> >  static int query_ddw(struct pci_dev *dev, const u32 *ddw_avail,
> > -			struct ddw_query_response *query)
> > +		     struct ddw_query_response *query,
> > +		     struct device_node *parent)
> >  {
> >  	struct device_node *dn;
> >  	struct pci_dn *pdn;
> > -	u32 cfg_addr;
> > +	u32 cfg_addr, query_out[5], ddw_ext[DDW_EXT_QUERY_OUT_SIZE + 1];
> 
> ... and use DDW_EXT_LAST here.

Because of the growing nature of ddw-extensions, I intentionally let
this be (DDW_EXT_QUERY_OUT_SIZE + 1). If we create a DDW_EXT_LAST, it
will be incremented in the future if more extensions come to exist.

I mean, I previously saw no reason for allocating space for extensions
after the desired one, as they won't be used here.

> 
> 
> >  	u64 buid;
> > -	int ret;
> > +	int ret, out_sz;
> > +
> > +	/*
> > +	 * From LoPAR level 2.8, "ibm,ddw-extensions" index 3 can rule how many
> > +	 * output parameters ibm,query-pe-dma-windows will have, ranging from
> > +	 * 5 to 6.
> > +	 */
> > +
> > +	ret = of_property_read_u32_array(parent, "ibm,ddw-extensions",
> > +					 &ddw_ext[0],
> > +					 DDW_EXT_QUERY_OUT_SIZE + 1);

In this case, I made sure not to cross (DDW_EXT_QUERY_OUT_SIZE + 1)
while reading the extensions from the property.

What do you think about it? 

Best regards,
Leonardo 


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox