linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory
@ 2023-06-13 17:16 Gaurav Batra
  2023-06-13 17:17 ` Gaurav Batra
  2023-07-03  5:26 ` Michael Ellerman
  0 siblings, 2 replies; 6+ messages in thread
From: Gaurav Batra @ 2023-06-13 17:16 UTC (permalink / raw)
  To: mpe; +Cc: Brian King, linuxppc-dev, Gaurav Batra

When memory is dynamically added/removed, iommu_mem_notifier() is invoked. This
routine traverses through all the DMA windows (DDW only, not default windows)
to add/remove "direct" TCE mappings. The routines for this purpose are
tce_clearrange_multi_pSeriesLP() and tce_clearrange_multi_pSeriesLP().

Both these routines are designed for Direct mapped DMA windows only.

The issue is that there could be some DMA windows in the list which are not
"direct" mapped. Calling these routines will either,

1) remove some dynamically mapped TCEs, Or
2) try to add TCEs which are out of bounds and HCALL returns H_PARAMETER

Here are the side affects when these routines are incorrectly invoked for
"dynamically" mapped DMA windows.

tce_setrange_multi_pSeriesLP()

This adds direct mapped TCEs. Now, this could invoke HCALL to add TCEs with
out-of-bound range. In this scenario, HCALL will return H_PARAMETER and DLAR
ADD of memory will fail.

tce_clearrange_multi_pSeriesLP()

This will remove range of TCEs. The TCE range that is calculated, depending on
the memory range being added, could infact be mapping some other memory
address (for dynamic DMA window scenario). This will wipe out those TCEs.

The solution is for iommu_mem_notifier() to only invoke these routines for
"direct" mapped DMA windows.

Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com>
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/iommu.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 918f511837db..24dd61636400 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -363,6 +363,7 @@ struct dynamic_dma_window_prop {
 struct dma_win {
 	struct device_node *device;
 	const struct dynamic_dma_window_prop *prop;
+	bool    direct;
 	struct list_head list;
 };
 
@@ -1409,6 +1410,8 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		goto out_del_prop;
 
 	if (direct_mapping) {
+		window->direct = true;
+
 		/* DDW maps the whole partition, so enable direct DMA mapping */
 		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
 					    win64->value, tce_setrange_multi_pSeriesLP_walk);
@@ -1425,6 +1428,8 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 		int i;
 		unsigned long start = 0, end = 0;
 
+		window->direct = false;
+
 		for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources); i++) {
 			const unsigned long mask = IORESOURCE_MEM_64 | IORESOURCE_MEM;
 
@@ -1587,8 +1592,10 @@ static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
 	case MEM_GOING_ONLINE:
 		spin_lock(&dma_win_list_lock);
 		list_for_each_entry(window, &dma_win_list, list) {
-			ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
-					arg->nr_pages, window->prop);
+			if (window->direct) {
+				ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
+						arg->nr_pages, window->prop);
+			}
 			/* XXX log error */
 		}
 		spin_unlock(&dma_win_list_lock);
@@ -1597,8 +1604,10 @@ static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
 	case MEM_OFFLINE:
 		spin_lock(&dma_win_list_lock);
 		list_for_each_entry(window, &dma_win_list, list) {
-			ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
-					arg->nr_pages, window->prop);
+			if (window->direct) {
+				ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
+						arg->nr_pages, window->prop);
+			}
 			/* XXX log error */
 		}
 		spin_unlock(&dma_win_list_lock);
-- 
2.39.2 (Apple Git-143)


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory
  2023-06-13 17:16 [PATCH] powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory Gaurav Batra
@ 2023-06-13 17:17 ` Gaurav Batra
  2023-06-23 14:55   ` Gaurav Batra
  2023-07-03  5:26 ` Michael Ellerman
  1 sibling, 1 reply; 6+ messages in thread
From: Gaurav Batra @ 2023-06-13 17:17 UTC (permalink / raw)
  To: mpe; +Cc: Brian King, linuxppc-dev

Hello Michael,

I found this bug while going though the code. This bug is exposed when 
DDW is smaller than the max memory of the LPAR. This will result in 
creating DDW which will have Dynamically mapped TCEs (no direct mapping).

I would like to stress that this  bug is exposed only in Upstream 
kernel. Current kernel level in Distros are not exposed to this since 
they don't have the  concept of "dynamically mapped" DDW.

I didn't have access to any of the P10 boxes with large amount of memory 
to  re-create the scenario. On P10 we have 2MB TCEs, which results in 
DDW large enough to be able to cover  max memory I could have for the 
LPAR. As a result,  IO Bus Addresses generated were always within DDW 
limits and no H_PARAMETER was returned by HCALL.

So, I hacked the kernel to force the use of 64K TCEs. This resulted in 
DDW smaller than max memory.

When I tried to DLPAR ADD memory, it failed with error code of -4 
(H_PARAMETER) from HCALL (H_PUT_TCE/H_PUT_TCE_INDIRECT), when 
iommu_mem_notifier() invoked tce_setrange_multi_pSeriesLP().

I didn't test the DLPAR REMOVE path, to verify if incorrect TCEs are 
removed by tce_clearrange_multi_pSeriesLP(), since I would need to hack 
kernel to force dynamically added TCEs to the high IO Bus Addresses. 
But, the concept is  same.

Thanks,

Gaurav

On 6/13/23 12:16 PM, Gaurav Batra wrote:
> When memory is dynamically added/removed, iommu_mem_notifier() is invoked. This
> routine traverses through all the DMA windows (DDW only, not default windows)
> to add/remove "direct" TCE mappings. The routines for this purpose are
> tce_clearrange_multi_pSeriesLP() and tce_clearrange_multi_pSeriesLP().
>
> Both these routines are designed for Direct mapped DMA windows only.
>
> The issue is that there could be some DMA windows in the list which are not
> "direct" mapped. Calling these routines will either,
>
> 1) remove some dynamically mapped TCEs, Or
> 2) try to add TCEs which are out of bounds and HCALL returns H_PARAMETER
>
> Here are the side affects when these routines are incorrectly invoked for
> "dynamically" mapped DMA windows.
>
> tce_setrange_multi_pSeriesLP()
>
> This adds direct mapped TCEs. Now, this could invoke HCALL to add TCEs with
> out-of-bound range. In this scenario, HCALL will return H_PARAMETER and DLAR
> ADD of memory will fail.
>
> tce_clearrange_multi_pSeriesLP()
>
> This will remove range of TCEs. The TCE range that is calculated, depending on
> the memory range being added, could infact be mapping some other memory
> address (for dynamic DMA window scenario). This will wipe out those TCEs.
>
> The solution is for iommu_mem_notifier() to only invoke these routines for
> "direct" mapped DMA windows.
>
> Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com>
> Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
> ---
>   arch/powerpc/platforms/pseries/iommu.c | 17 +++++++++++++----
>   1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index 918f511837db..24dd61636400 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -363,6 +363,7 @@ struct dynamic_dma_window_prop {
>   struct dma_win {
>   	struct device_node *device;
>   	const struct dynamic_dma_window_prop *prop;
> +	bool    direct;
>   	struct list_head list;
>   };
>
> @@ -1409,6 +1410,8 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		goto out_del_prop;
>
>   	if (direct_mapping) {
> +		window->direct = true;
> +
>   		/* DDW maps the whole partition, so enable direct DMA mapping */
>   		ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> PAGE_SHIFT,
>   					    win64->value, tce_setrange_multi_pSeriesLP_walk);
> @@ -1425,6 +1428,8 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   		int i;
>   		unsigned long start = 0, end = 0;
>
> +		window->direct = false;
> +
>   		for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources); i++) {
>   			const unsigned long mask = IORESOURCE_MEM_64 | IORESOURCE_MEM;
>
> @@ -1587,8 +1592,10 @@ static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
>   	case MEM_GOING_ONLINE:
>   		spin_lock(&dma_win_list_lock);
>   		list_for_each_entry(window, &dma_win_list, list) {
> -			ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
> -					arg->nr_pages, window->prop);
> +			if (window->direct) {
> +				ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
> +						arg->nr_pages, window->prop);
> +			}
>   			/* XXX log error */
>   		}
>   		spin_unlock(&dma_win_list_lock);
> @@ -1597,8 +1604,10 @@ static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
>   	case MEM_OFFLINE:
>   		spin_lock(&dma_win_list_lock);
>   		list_for_each_entry(window, &dma_win_list, list) {
> -			ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
> -					arg->nr_pages, window->prop);
> +			if (window->direct) {
> +				ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
> +						arg->nr_pages, window->prop);
> +			}
>   			/* XXX log error */
>   		}
>   		spin_unlock(&dma_win_list_lock);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory
  2023-06-13 17:17 ` Gaurav Batra
@ 2023-06-23 14:55   ` Gaurav Batra
  2023-06-26  4:54     ` Michael Ellerman
  0 siblings, 1 reply; 6+ messages in thread
From: Gaurav Batra @ 2023-06-23 14:55 UTC (permalink / raw)
  To: mpe; +Cc: Brian King, linuxppc-dev

Hello Michael,

Did you get a chance to look into this patch? I don't mean to rush you. 
Just wondering if there is anything I can do to help make the patch to 
Upstream.

Thanks,

Gaurav

On 6/13/23 12:17 PM, Gaurav Batra wrote:
> Hello Michael,
>
> I found this bug while going though the code. This bug is exposed when 
> DDW is smaller than the max memory of the LPAR. This will result in 
> creating DDW which will have Dynamically mapped TCEs (no direct mapping).
>
> I would like to stress that this  bug is exposed only in Upstream 
> kernel. Current kernel level in Distros are not exposed to this since 
> they don't have the  concept of "dynamically mapped" DDW.
>
> I didn't have access to any of the P10 boxes with large amount of 
> memory to  re-create the scenario. On P10 we have 2MB TCEs, which 
> results in DDW large enough to be able to cover  max memory I could 
> have for the LPAR. As a result,  IO Bus Addresses generated were 
> always within DDW limits and no H_PARAMETER was returned by HCALL.
>
> So, I hacked the kernel to force the use of 64K TCEs. This resulted in 
> DDW smaller than max memory.
>
> When I tried to DLPAR ADD memory, it failed with error code of -4 
> (H_PARAMETER) from HCALL (H_PUT_TCE/H_PUT_TCE_INDIRECT), when 
> iommu_mem_notifier() invoked tce_setrange_multi_pSeriesLP().
>
> I didn't test the DLPAR REMOVE path, to verify if incorrect TCEs are 
> removed by tce_clearrange_multi_pSeriesLP(), since I would need to 
> hack kernel to force dynamically added TCEs to the high IO Bus 
> Addresses. But, the concept is  same.
>
> Thanks,
>
> Gaurav
>
> On 6/13/23 12:16 PM, Gaurav Batra wrote:
>> When memory is dynamically added/removed, iommu_mem_notifier() is 
>> invoked. This
>> routine traverses through all the DMA windows (DDW only, not default 
>> windows)
>> to add/remove "direct" TCE mappings. The routines for this purpose are
>> tce_clearrange_multi_pSeriesLP() and tce_clearrange_multi_pSeriesLP().
>>
>> Both these routines are designed for Direct mapped DMA windows only.
>>
>> The issue is that there could be some DMA windows in the list which 
>> are not
>> "direct" mapped. Calling these routines will either,
>>
>> 1) remove some dynamically mapped TCEs, Or
>> 2) try to add TCEs which are out of bounds and HCALL returns H_PARAMETER
>>
>> Here are the side affects when these routines are incorrectly invoked 
>> for
>> "dynamically" mapped DMA windows.
>>
>> tce_setrange_multi_pSeriesLP()
>>
>> This adds direct mapped TCEs. Now, this could invoke HCALL to add 
>> TCEs with
>> out-of-bound range. In this scenario, HCALL will return H_PARAMETER 
>> and DLAR
>> ADD of memory will fail.
>>
>> tce_clearrange_multi_pSeriesLP()
>>
>> This will remove range of TCEs. The TCE range that is calculated, 
>> depending on
>> the memory range being added, could infact be mapping some other memory
>> address (for dynamic DMA window scenario). This will wipe out those 
>> TCEs.
>>
>> The solution is for iommu_mem_notifier() to only invoke these 
>> routines for
>> "direct" mapped DMA windows.
>>
>> Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com>
>> Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/platforms/pseries/iommu.c | 17 +++++++++++++----
>>   1 file changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>> b/arch/powerpc/platforms/pseries/iommu.c
>> index 918f511837db..24dd61636400 100644
>> --- a/arch/powerpc/platforms/pseries/iommu.c
>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>> @@ -363,6 +363,7 @@ struct dynamic_dma_window_prop {
>>   struct dma_win {
>>       struct device_node *device;
>>       const struct dynamic_dma_window_prop *prop;
>> +    bool    direct;
>>       struct list_head list;
>>   };
>>
>> @@ -1409,6 +1410,8 @@ static bool enable_ddw(struct pci_dev *dev, 
>> struct device_node *pdn)
>>           goto out_del_prop;
>>
>>       if (direct_mapping) {
>> +        window->direct = true;
>> +
>>           /* DDW maps the whole partition, so enable direct DMA 
>> mapping */
>>           ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> 
>> PAGE_SHIFT,
>>                           win64->value, 
>> tce_setrange_multi_pSeriesLP_walk);
>> @@ -1425,6 +1428,8 @@ static bool enable_ddw(struct pci_dev *dev, 
>> struct device_node *pdn)
>>           int i;
>>           unsigned long start = 0, end = 0;
>>
>> +        window->direct = false;
>> +
>>           for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources); i++) {
>>               const unsigned long mask = IORESOURCE_MEM_64 | 
>> IORESOURCE_MEM;
>>
>> @@ -1587,8 +1592,10 @@ static int iommu_mem_notifier(struct 
>> notifier_block *nb, unsigned long action,
>>       case MEM_GOING_ONLINE:
>>           spin_lock(&dma_win_list_lock);
>>           list_for_each_entry(window, &dma_win_list, list) {
>> -            ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
>> -                    arg->nr_pages, window->prop);
>> +            if (window->direct) {
>> +                ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
>> +                        arg->nr_pages, window->prop);
>> +            }
>>               /* XXX log error */
>>           }
>>           spin_unlock(&dma_win_list_lock);
>> @@ -1597,8 +1604,10 @@ static int iommu_mem_notifier(struct 
>> notifier_block *nb, unsigned long action,
>>       case MEM_OFFLINE:
>>           spin_lock(&dma_win_list_lock);
>>           list_for_each_entry(window, &dma_win_list, list) {
>> -            ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
>> -                    arg->nr_pages, window->prop);
>> +            if (window->direct) {
>> +                ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
>> +                        arg->nr_pages, window->prop);
>> +            }
>>               /* XXX log error */
>>           }
>>           spin_unlock(&dma_win_list_lock);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory
  2023-06-23 14:55   ` Gaurav Batra
@ 2023-06-26  4:54     ` Michael Ellerman
  2023-06-26 13:12       ` Gaurav Batra
  0 siblings, 1 reply; 6+ messages in thread
From: Michael Ellerman @ 2023-06-26  4:54 UTC (permalink / raw)
  To: Gaurav Batra; +Cc: Brian King, linuxppc-dev

Gaurav Batra <gbatra@linux.vnet.ibm.com> writes:
> Hello Michael,
>
> Did you get a chance to look into this patch? I don't mean to rush you. 
> Just wondering if there is anything I can do to help make the patch to 
> Upstream.

I skimmed it and decided it wasn't a critical bug fix, and hoped someone
else would review it - silly me :D

But the patch looks simple enough, and the explanation is very good so I
think it looks good to merge.

I'll apply it for v6.5.

cheers

> On 6/13/23 12:17 PM, Gaurav Batra wrote:
>> Hello Michael,
>>
>> I found this bug while going though the code. This bug is exposed when 
>> DDW is smaller than the max memory of the LPAR. This will result in 
>> creating DDW which will have Dynamically mapped TCEs (no direct mapping).
>>
>> I would like to stress that this  bug is exposed only in Upstream 
>> kernel. Current kernel level in Distros are not exposed to this since 
>> they don't have the  concept of "dynamically mapped" DDW.
>>
>> I didn't have access to any of the P10 boxes with large amount of 
>> memory to  re-create the scenario. On P10 we have 2MB TCEs, which 
>> results in DDW large enough to be able to cover  max memory I could 
>> have for the LPAR. As a result,  IO Bus Addresses generated were 
>> always within DDW limits and no H_PARAMETER was returned by HCALL.
>>
>> So, I hacked the kernel to force the use of 64K TCEs. This resulted in 
>> DDW smaller than max memory.
>>
>> When I tried to DLPAR ADD memory, it failed with error code of -4 
>> (H_PARAMETER) from HCALL (H_PUT_TCE/H_PUT_TCE_INDIRECT), when 
>> iommu_mem_notifier() invoked tce_setrange_multi_pSeriesLP().
>>
>> I didn't test the DLPAR REMOVE path, to verify if incorrect TCEs are 
>> removed by tce_clearrange_multi_pSeriesLP(), since I would need to 
>> hack kernel to force dynamically added TCEs to the high IO Bus 
>> Addresses. But, the concept is  same.
>>
>> Thanks,
>>
>> Gaurav
>>
>> On 6/13/23 12:16 PM, Gaurav Batra wrote:
>>> When memory is dynamically added/removed, iommu_mem_notifier() is 
>>> invoked. This
>>> routine traverses through all the DMA windows (DDW only, not default 
>>> windows)
>>> to add/remove "direct" TCE mappings. The routines for this purpose are
>>> tce_clearrange_multi_pSeriesLP() and tce_clearrange_multi_pSeriesLP().
>>>
>>> Both these routines are designed for Direct mapped DMA windows only.
>>>
>>> The issue is that there could be some DMA windows in the list which 
>>> are not
>>> "direct" mapped. Calling these routines will either,
>>>
>>> 1) remove some dynamically mapped TCEs, Or
>>> 2) try to add TCEs which are out of bounds and HCALL returns H_PARAMETER
>>>
>>> Here are the side affects when these routines are incorrectly invoked 
>>> for
>>> "dynamically" mapped DMA windows.
>>>
>>> tce_setrange_multi_pSeriesLP()
>>>
>>> This adds direct mapped TCEs. Now, this could invoke HCALL to add 
>>> TCEs with
>>> out-of-bound range. In this scenario, HCALL will return H_PARAMETER 
>>> and DLAR
>>> ADD of memory will fail.
>>>
>>> tce_clearrange_multi_pSeriesLP()
>>>
>>> This will remove range of TCEs. The TCE range that is calculated, 
>>> depending on
>>> the memory range being added, could infact be mapping some other memory
>>> address (for dynamic DMA window scenario). This will wipe out those 
>>> TCEs.
>>>
>>> The solution is for iommu_mem_notifier() to only invoke these 
>>> routines for
>>> "direct" mapped DMA windows.
>>>
>>> Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com>
>>> Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
>>> ---
>>>   arch/powerpc/platforms/pseries/iommu.c | 17 +++++++++++++----
>>>   1 file changed, 13 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c 
>>> b/arch/powerpc/platforms/pseries/iommu.c
>>> index 918f511837db..24dd61636400 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -363,6 +363,7 @@ struct dynamic_dma_window_prop {
>>>   struct dma_win {
>>>       struct device_node *device;
>>>       const struct dynamic_dma_window_prop *prop;
>>> +    bool    direct;
>>>       struct list_head list;
>>>   };
>>>
>>> @@ -1409,6 +1410,8 @@ static bool enable_ddw(struct pci_dev *dev, 
>>> struct device_node *pdn)
>>>           goto out_del_prop;
>>>
>>>       if (direct_mapping) {
>>> +        window->direct = true;
>>> +
>>>           /* DDW maps the whole partition, so enable direct DMA 
>>> mapping */
>>>           ret = walk_system_ram_range(0, memblock_end_of_DRAM() >> 
>>> PAGE_SHIFT,
>>>                           win64->value, 
>>> tce_setrange_multi_pSeriesLP_walk);
>>> @@ -1425,6 +1428,8 @@ static bool enable_ddw(struct pci_dev *dev, 
>>> struct device_node *pdn)
>>>           int i;
>>>           unsigned long start = 0, end = 0;
>>>
>>> +        window->direct = false;
>>> +
>>>           for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources); i++) {
>>>               const unsigned long mask = IORESOURCE_MEM_64 | 
>>> IORESOURCE_MEM;
>>>
>>> @@ -1587,8 +1592,10 @@ static int iommu_mem_notifier(struct 
>>> notifier_block *nb, unsigned long action,
>>>       case MEM_GOING_ONLINE:
>>>           spin_lock(&dma_win_list_lock);
>>>           list_for_each_entry(window, &dma_win_list, list) {
>>> -            ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
>>> -                    arg->nr_pages, window->prop);
>>> +            if (window->direct) {
>>> +                ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
>>> +                        arg->nr_pages, window->prop);
>>> +            }
>>>               /* XXX log error */
>>>           }
>>>           spin_unlock(&dma_win_list_lock);
>>> @@ -1597,8 +1604,10 @@ static int iommu_mem_notifier(struct 
>>> notifier_block *nb, unsigned long action,
>>>       case MEM_OFFLINE:
>>>           spin_lock(&dma_win_list_lock);
>>>           list_for_each_entry(window, &dma_win_list, list) {
>>> -            ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
>>> -                    arg->nr_pages, window->prop);
>>> +            if (window->direct) {
>>> +                ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
>>> +                        arg->nr_pages, window->prop);
>>> +            }
>>>               /* XXX log error */
>>>           }
>>>           spin_unlock(&dma_win_list_lock);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory
  2023-06-26  4:54     ` Michael Ellerman
@ 2023-06-26 13:12       ` Gaurav Batra
  0 siblings, 0 replies; 6+ messages in thread
From: Gaurav Batra @ 2023-06-26 13:12 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Brian King, linuxppc-dev

Thanks a lot


On 6/25/23 11:54 PM, Michael Ellerman wrote:
> Gaurav Batra <gbatra@linux.vnet.ibm.com> writes:
>> Hello Michael,
>>
>> Did you get a chance to look into this patch? I don't mean to rush you.
>> Just wondering if there is anything I can do to help make the patch to
>> Upstream.
> I skimmed it and decided it wasn't a critical bug fix, and hoped someone
> else would review it - silly me :D
>
> But the patch looks simple enough, and the explanation is very good so I
> think it looks good to merge.
>
> I'll apply it for v6.5.
>
> cheers
>
>> On 6/13/23 12:17 PM, Gaurav Batra wrote:
>>> Hello Michael,
>>>
>>> I found this bug while going though the code. This bug is exposed when
>>> DDW is smaller than the max memory of the LPAR. This will result in
>>> creating DDW which will have Dynamically mapped TCEs (no direct mapping).
>>>
>>> I would like to stress that this  bug is exposed only in Upstream
>>> kernel. Current kernel level in Distros are not exposed to this since
>>> they don't have the  concept of "dynamically mapped" DDW.
>>>
>>> I didn't have access to any of the P10 boxes with large amount of
>>> memory to  re-create the scenario. On P10 we have 2MB TCEs, which
>>> results in DDW large enough to be able to cover  max memory I could
>>> have for the LPAR. As a result,  IO Bus Addresses generated were
>>> always within DDW limits and no H_PARAMETER was returned by HCALL.
>>>
>>> So, I hacked the kernel to force the use of 64K TCEs. This resulted in
>>> DDW smaller than max memory.
>>>
>>> When I tried to DLPAR ADD memory, it failed with error code of -4
>>> (H_PARAMETER) from HCALL (H_PUT_TCE/H_PUT_TCE_INDIRECT), when
>>> iommu_mem_notifier() invoked tce_setrange_multi_pSeriesLP().
>>>
>>> I didn't test the DLPAR REMOVE path, to verify if incorrect TCEs are
>>> removed by tce_clearrange_multi_pSeriesLP(), since I would need to
>>> hack kernel to force dynamically added TCEs to the high IO Bus
>>> Addresses. But, the concept is  same.
>>>
>>> Thanks,
>>>
>>> Gaurav
>>>
>>> On 6/13/23 12:16 PM, Gaurav Batra wrote:
>>>> When memory is dynamically added/removed, iommu_mem_notifier() is
>>>> invoked. This
>>>> routine traverses through all the DMA windows (DDW only, not default
>>>> windows)
>>>> to add/remove "direct" TCE mappings. The routines for this purpose are
>>>> tce_clearrange_multi_pSeriesLP() and tce_clearrange_multi_pSeriesLP().
>>>>
>>>> Both these routines are designed for Direct mapped DMA windows only.
>>>>
>>>> The issue is that there could be some DMA windows in the list which
>>>> are not
>>>> "direct" mapped. Calling these routines will either,
>>>>
>>>> 1) remove some dynamically mapped TCEs, Or
>>>> 2) try to add TCEs which are out of bounds and HCALL returns H_PARAMETER
>>>>
>>>> Here are the side affects when these routines are incorrectly invoked
>>>> for
>>>> "dynamically" mapped DMA windows.
>>>>
>>>> tce_setrange_multi_pSeriesLP()
>>>>
>>>> This adds direct mapped TCEs. Now, this could invoke HCALL to add
>>>> TCEs with
>>>> out-of-bound range. In this scenario, HCALL will return H_PARAMETER
>>>> and DLAR
>>>> ADD of memory will fail.
>>>>
>>>> tce_clearrange_multi_pSeriesLP()
>>>>
>>>> This will remove range of TCEs. The TCE range that is calculated,
>>>> depending on
>>>> the memory range being added, could infact be mapping some other memory
>>>> address (for dynamic DMA window scenario). This will wipe out those
>>>> TCEs.
>>>>
>>>> The solution is for iommu_mem_notifier() to only invoke these
>>>> routines for
>>>> "direct" mapped DMA windows.
>>>>
>>>> Signed-off-by: Gaurav Batra <gbatra@linux.vnet.ibm.com>
>>>> Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
>>>> ---
>>>>    arch/powerpc/platforms/pseries/iommu.c | 17 +++++++++++++----
>>>>    1 file changed, 13 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c
>>>> b/arch/powerpc/platforms/pseries/iommu.c
>>>> index 918f511837db..24dd61636400 100644
>>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>>> @@ -363,6 +363,7 @@ struct dynamic_dma_window_prop {
>>>>    struct dma_win {
>>>>        struct device_node *device;
>>>>        const struct dynamic_dma_window_prop *prop;
>>>> +    bool    direct;
>>>>        struct list_head list;
>>>>    };
>>>>
>>>> @@ -1409,6 +1410,8 @@ static bool enable_ddw(struct pci_dev *dev,
>>>> struct device_node *pdn)
>>>>            goto out_del_prop;
>>>>
>>>>        if (direct_mapping) {
>>>> +        window->direct = true;
>>>> +
>>>>            /* DDW maps the whole partition, so enable direct DMA
>>>> mapping */
>>>>            ret = walk_system_ram_range(0, memblock_end_of_DRAM() >>
>>>> PAGE_SHIFT,
>>>>                            win64->value,
>>>> tce_setrange_multi_pSeriesLP_walk);
>>>> @@ -1425,6 +1428,8 @@ static bool enable_ddw(struct pci_dev *dev,
>>>> struct device_node *pdn)
>>>>            int i;
>>>>            unsigned long start = 0, end = 0;
>>>>
>>>> +        window->direct = false;
>>>> +
>>>>            for (i = 0; i < ARRAY_SIZE(pci->phb->mem_resources); i++) {
>>>>                const unsigned long mask = IORESOURCE_MEM_64 |
>>>> IORESOURCE_MEM;
>>>>
>>>> @@ -1587,8 +1592,10 @@ static int iommu_mem_notifier(struct
>>>> notifier_block *nb, unsigned long action,
>>>>        case MEM_GOING_ONLINE:
>>>>            spin_lock(&dma_win_list_lock);
>>>>            list_for_each_entry(window, &dma_win_list, list) {
>>>> -            ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
>>>> -                    arg->nr_pages, window->prop);
>>>> +            if (window->direct) {
>>>> +                ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
>>>> +                        arg->nr_pages, window->prop);
>>>> +            }
>>>>                /* XXX log error */
>>>>            }
>>>>            spin_unlock(&dma_win_list_lock);
>>>> @@ -1597,8 +1604,10 @@ static int iommu_mem_notifier(struct
>>>> notifier_block *nb, unsigned long action,
>>>>        case MEM_OFFLINE:
>>>>            spin_lock(&dma_win_list_lock);
>>>>            list_for_each_entry(window, &dma_win_list, list) {
>>>> -            ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
>>>> -                    arg->nr_pages, window->prop);
>>>> +            if (window->direct) {
>>>> +                ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
>>>> +                        arg->nr_pages, window->prop);
>>>> +            }
>>>>                /* XXX log error */
>>>>            }
>>>>            spin_unlock(&dma_win_list_lock);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory
  2023-06-13 17:16 [PATCH] powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory Gaurav Batra
  2023-06-13 17:17 ` Gaurav Batra
@ 2023-07-03  5:26 ` Michael Ellerman
  1 sibling, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2023-07-03  5:26 UTC (permalink / raw)
  To: Gaurav Batra; +Cc: Brian King, linuxppc-dev

On Tue, 13 Jun 2023 12:16:41 -0500, Gaurav Batra wrote:
> When memory is dynamically added/removed, iommu_mem_notifier() is invoked. This
> routine traverses through all the DMA windows (DDW only, not default windows)
> to add/remove "direct" TCE mappings. The routines for this purpose are
> tce_clearrange_multi_pSeriesLP() and tce_clearrange_multi_pSeriesLP().
> 
> Both these routines are designed for Direct mapped DMA windows only.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory
      https://git.kernel.org/powerpc/c/d61cd13e732c0eaa7d66b45edb2d0de8eab65a1e

cheers

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-07-03  5:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-06-13 17:16 [PATCH] powerpc/iommu: TCEs are incorrectly manipulated with DLPAR add/remove of memory Gaurav Batra
2023-06-13 17:17 ` Gaurav Batra
2023-06-23 14:55   ` Gaurav Batra
2023-06-26  4:54     ` Michael Ellerman
2023-06-26 13:12       ` Gaurav Batra
2023-07-03  5:26 ` Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).