LinuxPPC-Dev Archive on lore.kernel.org
From: Gaurav Batra <gbatra@linux.ibm.com>
To: Harsh Prateek Bora <harshpb@linux.ibm.com>, maddy@linux.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org, ritesh.list@gmail.com,
	sbhat@linux.ibm.com, vaibhav@linux.ibm.com,
	donettom@linux.ibm.com
Subject: Re: [PATCH v2] powerpc/pseries/iommu: Add TCEs for 16GB pages when RAM is pre-mapped
Date: Fri, 15 May 2026 09:23:50 -0500
Message-ID: <ed054a1c-47cc-4930-9d97-c3cf7a3e4bc1@linux.ibm.com>
In-Reply-To: <f6e0be64-fc54-4c36-b871-991771549b29@linux.ibm.com>


On 5/15/26 4:06 AM, Harsh Prateek Bora wrote:
>
>
> On 15/05/26 12:24 am, Gaurav Batra wrote:
>> On PowerPC, if the Dynamic DMA Window (DDW) is big enough, RAM is
>> pre-mapped. To determine the size of RAM, the PAPR+ property
>> "ibm,lrdr-capacity" is used. This OF property specifies the maximum
>> size of RAM an LPAR can have, including DR-added memory.
>>
>> On PowerPC, 16GB pages can be allocated at the machine level and then
>> assigned to LPARs. These 16GB pages are added to LPAR memory at boot
>> time. The address range for these 16GB pages is above the maximum RAM
>> an LPAR can have (ibm,lrdr-capacity). In the current implementation,
>> these 16GB pages are excluded from the pre-mapped TCEs. A driver can
>> have DMA buffers allocated from 16GB pages. This causes the platform to
>> raise an EEH when DMA is attempted on buffers in the 16GB memory range.
>>
>> commit 6aa989ab2bd0 ("powerpc/pseries/iommu: memory notifier incorrectly
>> adds TCEs for pmemory")
>>
>> Prior to the above patch, memblock_end_of_DRAM() was being used to
>> determine the MAX memory of an LPAR. This included 16GB pages as well.
>> The issue with using memblock_end_of_DRAM() is that when pmemory is
>> converted to RAM via daxctl command, the DDW engine will incorrectly try
>> to add TCEs for pmemory as well.
>>
>> Below is the address distribution of RAM, 16GB pages and pmemory for an
>> LPAR with max memory of 256GB, memory allocated 64GB, 2 16GB pages and
>> assigned pmemory of 8GB.
>>
>> RANGE                                 SIZE  STATE REMOVABLE BLOCK
>> 0x0000000000000000-0x0000000fffffffff  64G online       yes 0-255
>> 0x0000004000000000-0x00000047ffffffff  32G online       yes 1024-1151
>>
>> cat /sys/bus/nd/devices/region0/resource
>> 0x40100000000
>> cat /sys/bus/nd/devices/region0/size
>> 8589934592
>>
>> The fix is to revert the code changes introduced by the above patch and
>> to stash away the MAX memory of an LPAR, including 16GB pages, at LPAR
>> boot time. This value is then used whenever TCEs need to be pre-mapped,
>> in enable_ddw() or iommu_mem_notifier().
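A minimal user-space sketch of the boot-time calculation described above, not the kernel code itself. The names `struct mem_range`, `ddw_max_ram()` and `order_base_2_u64()` are hypothetical stand-ins for the memory-node walk in ddw_memory_hotplug_max() and for the kernel's order_base_2(); the addresses come from the example layout in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one "memory" node range; end is inclusive. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Start from the hotplug maximum (ibm,lrdr-capacity) and extend it to
 * cover every memory range present at boot, such as 16GB pages.
 */
static uint64_t ddw_max_ram(uint64_t hotplug_max,
			    const struct mem_range *ranges, size_t n)
{
	uint64_t max_addr = hotplug_max;

	assert(ranges != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (ranges[i].end + 1 > max_addr)
			max_addr = ranges[i].end + 1;
	}
	return max_addr;
}

/* Smallest n such that 2^n >= x, mirroring what order_base_2() computes. */
static unsigned int order_base_2_u64(uint64_t x)
{
	unsigned int n = 0;

	while (n < 64 && ((uint64_t)1 << n) < x)
		n++;
	return n;
}
```

With the layout above (hotplug maximum 0x4000000000, RAM at 0x0-0xfffffffff, 16GB pages at 0x4000000000-0x47ffffffff), this yields 0x4800000000, so the 16GB pages fall inside the pre-mapped range while pmemory at 0x40100000000 stays above it.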
>>
>> Fixes: 6aa989ab2bd0 ("powerpc/pseries/iommu: memory notifier incorrectly adds TCEs for pmemory")
>> Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
>> ---
>>
>> Change log:
>>
>> V1 -> V2
>>
>> 1. Harsh: Not only start_pfn, but end_pfn also needs to be within the
>>     allowed range, which may require clamping arg->nr_pages if crossing
>>     the limits.
>>
>>     Response: Incorporated changes.
>>
>> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>
> I think I mentioned it before also. Please avoid using tags unless 
> explicitly provided by the reviewer.
My apologies, I thought you meant to move it to the "review comments"
section. I will remove the tag in the next version of the patch.
>
>>
>>   arch/powerpc/platforms/pseries/iommu.c | 56 ++++++++++++++++++--------
>>   1 file changed, 40 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>> index 3e1f915fe4f6..fdb160b72938 100644
>> --- a/arch/powerpc/platforms/pseries/iommu.c
>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>> @@ -69,6 +69,8 @@ static struct iommu_table *iommu_pseries_alloc_table(int node)
>>       return tbl;
>>   }
>>   +static phys_addr_t pseries_ddw_max_ram;
>> +
>>   #ifdef CONFIG_IOMMU_API
>>   static struct iommu_table_group_ops spapr_tce_table_group_ops;
>>   #endif
>> @@ -1285,15 +1287,19 @@ static LIST_HEAD(failed_ddw_pdn_list);
>>     static phys_addr_t ddw_memory_hotplug_max(void)
>>   {
>> -    resource_size_t max_addr;
>> +    resource_size_t max_addr = memory_hotplug_max();
>> +    struct device_node *memory;
>>   -#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
>> -    max_addr = hot_add_drconf_memory_max();
>> -#else
>> -    max_addr = memblock_end_of_DRAM();
>> -#endif
>> +    for_each_node_by_type(memory, "memory") {
>> +        struct resource res;
>> +
>> +        if (of_address_to_resource(memory, 0, &res))
>> +            continue;
>> +
>> +        max_addr = max_t(resource_size_t, max_addr, res.end + 1);
>> +        }
>
> Indentation needs to be corrected above and below.
>
>>   -    return max_addr;
>> +        return max_addr;
>>   }
>>     /*
>> @@ -1446,7 +1452,7 @@ static struct property *ddw_property_create(const char *propname, u32 liobn, u64
>>   static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn, u64 dma_mask)
>>   {
>>       int len = 0, ret;
>> -    int max_ram_len = order_base_2(ddw_memory_hotplug_max());
>> +    int max_ram_len = order_base_2(pseries_ddw_max_ram);
>>       struct ddw_query_response query;
>>       struct ddw_create_response create;
>>       int page_shift;
>> @@ -1668,7 +1674,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn, u64 dma_mas
>>         if (direct_mapping) {
>>           /* DDW maps the whole partition, so enable direct DMA mapping */
>> -        ret = walk_system_ram_range(0, ddw_memory_hotplug_max() >> PAGE_SHIFT,
>> +        ret = walk_system_ram_range(0, pseries_ddw_max_ram >> PAGE_SHIFT,
>>                           win64->value, tce_setrange_multi_pSeriesLP_walk);
>>           if (ret) {
>>               dev_info(&dev->dev, "failed to map DMA window for %pOF: %d\n",
>> @@ -2419,21 +2425,32 @@ static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
>>   {
>>       struct dma_win *window;
>>       struct memory_notify *arg = data;
>> +    unsigned long limit = arg->nr_pages;
>> +    unsigned long max_ram_pages = pseries_ddw_max_ram >> PAGE_SHIFT;
>>       int ret = 0;
>>         /* This notifier can get called when onlining persistent memory as well.
>>        * TCEs are not pre-mapped for persistent memory. Persistent memory will
>> -     * always be above ddw_memory_hotplug_max()
>> +     * always be above pseries_ddw_max_ram
>>        */
>> +    if (arg->start_pfn >= max_ram_pages)
>> +        return NOTIFY_OK;
>> +
>> +    /* RAM is being DLPAR'ed. The range should never exceed max ram.
>> +     * Just in case, clamp the range and throw a warning.
>> +     */
>> +    if (arg->start_pfn + limit > max_ram_pages) {
>> +        limit = max_ram_pages - arg->start_pfn;
>> +        WARN_ON(1);
>
> WARN_ONCE with an appropriate warning message may be a better choice.
>
>> +    }
>>         switch (action) {
>>       case MEM_GOING_ONLINE:
>>           spin_lock(&dma_win_list_lock);
>>           list_for_each_entry(window, &dma_win_list, list) {
>> -            if (window->direct && (arg->start_pfn << PAGE_SHIFT) <
>> -                ddw_memory_hotplug_max()) {
>> +            if (window->direct) {
>>                   ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
>> -                        arg->nr_pages, window->prop);
>> +                        limit, window->prop);
>>               }
>>               /* XXX log error */
>
> Replace comment with a log if limit < arg->nr_pages ?
> Similarly below as well.
>
>>           }
>> @@ -2443,10 +2460,9 @@ static int iommu_mem_notifier(struct notifier_block *nb, unsigned long action,
>>       case MEM_OFFLINE:
>>           spin_lock(&dma_win_list_lock);
>>           list_for_each_entry(window, &dma_win_list, list) {
>> -            if (window->direct && (arg->start_pfn << PAGE_SHIFT) <
>> -                ddw_memory_hotplug_max()) {
>> +            if (window->direct) {
>>                   ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
>> -                        arg->nr_pages, window->prop);
>> +                        limit, window->prop);
>>               }
>>               /* XXX log error */
>
> ^^^ Ditto.
>
> Thanks
> Harsh
>
>>           }
>> @@ -2532,6 +2548,14 @@ void __init iommu_init_early_pSeries(void)
>>       register_memory_notifier(&iommu_mem_nb);
>>         set_pci_dma_ops(&dma_iommu_ops);
>> +
>> +    /* During init determine the max memory an LPAR can have and set it. This
>> +     * will be used for pre-mapping RAM in DDW. memblock_end_of_DRAM() can
>> +     * change during the running of LPAR - daxctl can add pmemory as
>> +     * "system-ram". This memory range should not be pre-mapped in DDW since
>> +     * the address of pmemory can be much higher than the DDW size.
>> +     */
>> +    pseries_ddw_max_ram = ddw_memory_hotplug_max();
>>   }
>>     static int __init disable_multitce(char *str)
>>
>> base-commit: 6d35786de28116ecf78797a62b84e6bf3c45aa5a
>

