From: Gaurav Batra <gbatra@linux.ibm.com>
To: Harsh Prateek Bora <harshpb@linux.ibm.com>, maddy@linux.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org, ritesh.list@gmail.com,
sbhat@linux.ibm.com, vaibhav@linux.ibm.com,
donettom@linux.ibm.com
Subject: Re: [PATCH v2] powerpc/pseries/iommu: Add TCEs for 16GB pages when RAM is pre-mapped
Date: Fri, 15 May 2026 09:23:50 -0500
Message-ID: <ed054a1c-47cc-4930-9d97-c3cf7a3e4bc1@linux.ibm.com>
In-Reply-To: <f6e0be64-fc54-4c36-b871-991771549b29@linux.ibm.com>
On 5/15/26 4:06 AM, Harsh Prateek Bora wrote:
>
>
> On 15/05/26 12:24 am, Gaurav Batra wrote:
>> In powerPC, if Dynamic DMA Window is big enough, RAM is pre-mapped. To
>> determine the size of RAM, a PAPR+ property "ibm,lrdr-capacity" is used.
>> This OF property dictates what is the max size of RAM an LPAR can have,
>> including DR added memory.
>>
>> In PowerPC, 16GB pages can be allocated at machine level and then
>> assigned to LPARs. These 16GB pages are added to LPAR memory at the time
>> of boot. The address range for these 16GB pages is above MAX RAM an LPAR
>> can have (ibm,lrdr-capacity). In the current implementation, these 16GB
>> pages are being excluded from pre-mapped TCEs. A driver can have DMA
>> buffers allocated from 16GB pages. This results in platform to raise an
>> EEH when DMA is attempted on buffers in 16GB memory range.
>>
>> commit 6aa989ab2bd0 ("powerpc/pseries/iommu: memory notifier incorrectly
>> adds TCEs for pmemory")
>>
>> Prior to the above patch, memblock_end_of_DRAM() was being used to
>> determine the MAX memory of an LPAR. This included 16GB pages as well.
>> The issue with using memblock_end_of_DRAM() is that when pmemory is
>> converted to RAM via daxctl command, the DDW engine will incorrectly try
>> to add TCEs for pmemory as well.
>>
>> Below is the address distribution of RAM, 16GB pages and pmemory for an
>> LPAR with max memory of 256GB, memory allocated 64GB, 2 16GB pages and
>> assigned pmemory of 8GB.
>>
>> RANGE SIZE STATE REMOVABLE BLOCK
>> 0x0000000000000000-0x0000000fffffffff 64G online yes 0-255
>> 0x0000004000000000-0x00000047ffffffff 32G online yes 1024-1151
>>
>> cat /sys/bus/nd/devices/region0/resource
>> 0x40100000000
>> cat /sys/bus/nd/devices/region0/size
>> 8589934592
>>
>> The approach to fix this problem is to revert back the code changes
>> introduced by the above patch and to stash away the MAX memory of an
>> LPAR, including 16GB pages, at the LPAR boot time. This value is then
>> used whenever TCEs are needed to be pre-mapped - enable_DDW() or,
>> iommu_mem_notifier()
>>
>> Fixes: 6aa989ab2bd0 ("powerpc/pseries/iommu: memory notifier
>> incorrectly adds TCEs for pmemory")
>> Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
>> ---
>>
>> Change log:
>>
>> V1 -> V2
>>
>> 1. Harsh: Not only start_pfn, but end_pfn also needs to be within
>> allowed
>> range, which may require clamping arg->nr_pages if crossing the
>> limits.
>>
>> Response: Incorporated changes.
>>
>> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>
> I think I mentioned it before also. Please avoid using tags unless
> explicitly provided by the reviewer.
My apologies, I thought you meant to move it to the "review comments"
section. I will remove the tags in the next version of the patch.
>
>>
>> arch/powerpc/platforms/pseries/iommu.c | 56 ++++++++++++++++++--------
>> 1 file changed, 40 insertions(+), 16 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/pseries/iommu.c
>> b/arch/powerpc/platforms/pseries/iommu.c
>> index 3e1f915fe4f6..fdb160b72938 100644
>> --- a/arch/powerpc/platforms/pseries/iommu.c
>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>> @@ -69,6 +69,8 @@ static struct iommu_table
>> *iommu_pseries_alloc_table(int node)
>> return tbl;
>> }
>> +static phys_addr_t pseries_ddw_max_ram;
>> +
>> #ifdef CONFIG_IOMMU_API
>> static struct iommu_table_group_ops spapr_tce_table_group_ops;
>> #endif
>> @@ -1285,15 +1287,19 @@ static LIST_HEAD(failed_ddw_pdn_list);
>> static phys_addr_t ddw_memory_hotplug_max(void)
>> {
>> - resource_size_t max_addr;
>> + resource_size_t max_addr = memory_hotplug_max();
>> + struct device_node *memory;
>> -#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
>> - max_addr = hot_add_drconf_memory_max();
>> -#else
>> - max_addr = memblock_end_of_DRAM();
>> -#endif
>> + for_each_node_by_type(memory, "memory") {
>> + struct resource res;
>> +
>> + if (of_address_to_resource(memory, 0, &res))
>> + continue;
>> +
>> + max_addr = max_t(resource_size_t, max_addr, res.end + 1);
>> + }
>
> Indentation needs to be corrected above and below.
>
>> - return max_addr;
>> + return max_addr;
>> }
>> /*
>> @@ -1446,7 +1452,7 @@ static struct property
>> *ddw_property_create(const char *propname, u32 liobn, u64
>> static bool enable_ddw(struct pci_dev *dev, struct device_node
>> *pdn, u64 dma_mask)
>> {
>> int len = 0, ret;
>> - int max_ram_len = order_base_2(ddw_memory_hotplug_max());
>> + int max_ram_len = order_base_2(pseries_ddw_max_ram);
>> struct ddw_query_response query;
>> struct ddw_create_response create;
>> int page_shift;
>> @@ -1668,7 +1674,7 @@ static bool enable_ddw(struct pci_dev *dev,
>> struct device_node *pdn, u64 dma_mas
>> if (direct_mapping) {
>> /* DDW maps the whole partition, so enable direct DMA
>> mapping */
>> - ret = walk_system_ram_range(0, ddw_memory_hotplug_max() >>
>> PAGE_SHIFT,
>> + ret = walk_system_ram_range(0, pseries_ddw_max_ram >>
>> PAGE_SHIFT,
>> win64->value,
>> tce_setrange_multi_pSeriesLP_walk);
>> if (ret) {
>> dev_info(&dev->dev, "failed to map DMA window for %pOF:
>> %d\n",
>> @@ -2419,21 +2425,32 @@ static int iommu_mem_notifier(struct
>> notifier_block *nb, unsigned long action,
>> {
>> struct dma_win *window;
>> struct memory_notify *arg = data;
>> + unsigned long limit = arg->nr_pages;
>> + unsigned long max_ram_pages = pseries_ddw_max_ram >> PAGE_SHIFT;
>> int ret = 0;
>> /* This notifier can get called when onlining persistent
>> memory as well.
>> * TCEs are not pre-mapped for persistent memory. Persistent
>> memory will
>> - * always be above ddw_memory_hotplug_max()
>> + * always be above pseries_ddw_max_ram
>> */
>> + if (arg->start_pfn >= max_ram_pages)
>> + return NOTIFY_OK;
>> +
>> + /* RAM is being DLPAR'ed. The range should never exceed max ram.
>> + * Just in case, clamp the range and throw a warning.
>> + */
>> + if (arg->start_pfn + limit > max_ram_pages) {
>> + limit = max_ram_pages - arg->start_pfn;
>> + WARN_ON(1);
>
> WARN_ONCE with an appropriate warning message may be a better choice.
>
>> + }
>> switch (action) {
>> case MEM_GOING_ONLINE:
>> spin_lock(&dma_win_list_lock);
>> list_for_each_entry(window, &dma_win_list, list) {
>> - if (window->direct && (arg->start_pfn << PAGE_SHIFT) <
>> - ddw_memory_hotplug_max()) {
>> + if (window->direct) {
>> ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
>> - arg->nr_pages, window->prop);
>> + limit, window->prop);
>> }
>> /* XXX log error */
>
> Replace comment with a log if limit < arg->nr_pages ?
> Similarly below as well.
>
>> }
>> @@ -2443,10 +2460,9 @@ static int iommu_mem_notifier(struct
>> notifier_block *nb, unsigned long action,
>> case MEM_OFFLINE:
>> spin_lock(&dma_win_list_lock);
>> list_for_each_entry(window, &dma_win_list, list) {
>> - if (window->direct && (arg->start_pfn << PAGE_SHIFT) <
>> - ddw_memory_hotplug_max()) {
>> + if (window->direct) {
>> ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
>> - arg->nr_pages, window->prop);
>> + limit, window->prop);
>> }
>> /* XXX log error */
>
> ^^^ Ditto.
>
> Thanks
> Harsh
>
>> }
>> @@ -2532,6 +2548,14 @@ void __init iommu_init_early_pSeries(void)
>> register_memory_notifier(&iommu_mem_nb);
>> set_pci_dma_ops(&dma_iommu_ops);
>> +
>> + /* During init determine the max memory an LPAR can have and set
>> it. This
>> + * will be used for pre-mapping RAM in DDW.
>> memblock_end_of_DRAM() can
>> + * change during the running of LPAR - daxctl can add pmemory as
>> + * "system-ram". This memory range should not be pre-mapped in
>> DDW since
>> + * the address of pmemory can be much higher than the DDW size.
>> + */
>> + pseries_ddw_max_ram = ddw_memory_hotplug_max();
>> }
>> static int __init disable_multitce(char *str)
>>
>> base-commit: 6d35786de28116ecf78797a62b84e6bf3c45aa5a
>
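
For reference, the range clamping discussed above for iommu_mem_notifier()
can be modelled in plain userspace C. This is only a rough sketch under
stated assumptions, not the kernel code: clamp_nr_pages() is a hypothetical
helper name, the one-time fprintf() merely stands in for a WARN_ONCE() with
a message as suggested in the review, and a return value of 0 stands in for
the early NOTIFY_OK return when the range starts at or above max RAM.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical userspace model of the clamp in iommu_mem_notifier().
 * Returns how many pages starting at start_pfn fall below max_ram_pages.
 * 0 means the whole range lies at or above max RAM (e.g. pmemory) and
 * no TCEs should be pre-mapped for it.
 */
static unsigned long clamp_nr_pages(unsigned long start_pfn,
				    unsigned long nr_pages,
				    unsigned long max_ram_pages)
{
	static bool warned;	/* emulates the "once" part of WARN_ONCE() */

	if (start_pfn >= max_ram_pages)
		return 0;

	/* Compare against the remaining room to avoid overflowing start + nr. */
	if (nr_pages > max_ram_pages - start_pfn) {
		if (!warned) {
			warned = true;
			fprintf(stderr,
				"range [%lu, +%lu) crosses max RAM pfn %lu, clamping\n",
				start_pfn, nr_pages, max_ram_pages);
		}
		return max_ram_pages - start_pfn;
	}

	return nr_pages;
}
```

With max_ram_pages = 100, a range starting at pfn 90 with 16 pages is
clamped to 10 pages, while a range starting at pfn 100 or above is
skipped entirely.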