From: Harsh Prateek Bora <harshpb@linux.ibm.com>
To: Gaurav Batra <gbatra@linux.ibm.com>,
maddy@linux.ibm.com,
Venkat Rao Bagalkote <venkat88@linux.ibm.com>,
sbhat@linux.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org, ritesh.list@gmail.com,
vaibhav@linux.ibm.com, donettom@linux.ibm.com
Subject: Re: [PATCH v3] powerpc/pseries/iommu: Add TCEs for 16GB pages when RAM is pre-mapped
Date: Tue, 2 Jun 2026 09:27:56 +0530
Message-ID: <b2232da2-25e0-4436-86ad-4cb6d57dcab1@linux.ibm.com>
In-Reply-To: <2a1694c2-d3ed-475b-a4a6-2383d6229c3b@linux.ibm.com>
On 01/06/26 11:33 pm, Gaurav Batra wrote:
> Hello Harsh,
>
>
> response is inline
>
>
> On 5/31/26 12:48 PM, Harsh Prateek Bora wrote:
>> + Venkat
>>
>> Hi Gaurav,
>> Would just like to confirm if it is tested with multiple iterations of
>> hotplug of RAM (DLPAR) as well?
> I tested the patch with both DLPAR of RAM and adapter, for 100
> iterations each.
Thanks for confirming! Feel free to add:
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
>>
>> Hi Venkat,
>> Could you please help validate the patch for above-mentioned scenario
>> as well?
>>
>> Hi Shivaprasad,
>> Please share your review feedback or any additional testing scenarios
>> needed?
>>
>> Thanks
>> Harsh
>>
>> On 15/05/26 9:21 pm, Gaurav Batra wrote:
>>> On PowerPC, if the Dynamic DMA Window is big enough, RAM is pre-mapped.
>>> To determine the size of RAM, the PAPR+ property "ibm,lrdr-capacity" is
>>> used. This OF property dictates the max size of RAM an LPAR can have,
>>> including DR-added memory.
>>>
>>> On PowerPC, 16GB pages can be allocated at machine level and then
>>> assigned to LPARs. These 16GB pages are added to LPAR memory at the time
>>> of boot. The address range for these 16GB pages is above the MAX RAM an
>>> LPAR can have (ibm,lrdr-capacity). In the current implementation, these
>>> 16GB pages are excluded from pre-mapped TCEs. A driver can have DMA
>>> buffers allocated from 16GB pages. This causes the platform to raise an
>>> EEH when DMA is attempted on buffers in the 16GB memory range.
>>>
>>> commit 6aa989ab2bd0 ("powerpc/pseries/iommu: memory notifier incorrectly
>>> adds TCEs for pmemory")
>>>
>>> Prior to the above patch, memblock_end_of_DRAM() was being used to
>>> determine the MAX memory of an LPAR. This included 16GB pages as well.
>>> The issue with using memblock_end_of_DRAM() is that when pmemory is
>>> converted to RAM via daxctl command, the DDW engine will incorrectly try
>>> to add TCEs for pmemory as well.
>>>
>>> Below is the address distribution of RAM, 16GB pages and pmemory for an
>>> LPAR with max memory of 256GB, memory allocated 64GB, 2 16GB pages and
>>> assigned pmemory of 8GB.
>>>
>>> RANGE                                  SIZE  STATE REMOVABLE BLOCK
>>> 0x0000000000000000-0x0000000fffffffff   64G online       yes 0-255
>>> 0x0000004000000000-0x00000047ffffffff   32G online       yes 1024-1151
>>>
>>> cat /sys/bus/nd/devices/region0/resource
>>> 0x40100000000
>>> cat /sys/bus/nd/devices/region0/size
>>> 8589934592
>>>
>>> The approach to fix this problem is to revert the code changes
>>> introduced by the above patch and to stash away the MAX memory of an
>>> LPAR, including 16GB pages, at LPAR boot time. This value is then
>>> used whenever TCEs need to be pre-mapped: in enable_ddw() or
>>> iommu_mem_notifier().
>>>
>>> Fixes: 6aa989ab2bd0 ("powerpc/pseries/iommu: memory notifier
>>> incorrectly adds TCEs for pmemory")
>>> Signed-off-by: Gaurav Batra <gbatra@linux.ibm.com>
>>> ---
>>>
>>> Change log:
>>>
>>> V2 -> V3
>>>
>>> 1. Harsh: Remove R-b tags from the change log
>>>
>>> Response: Incorporated changes
>>>
>>> 2. Harsh: Change WARN_ON() to WARN_ONCE()
>>>
>>> Response: Incorporated changes
>>>
>>> 3. Harsh: Fix indentation
>>>
>>> Response: Incorporated changes
>>>
>>> 4. Harsh: Replace comment with a log if limit < arg->nr_pages ?
>>>
>>> Response: Doesn't seem to be needed since the WARN_ONCE() will log
>>> this scenario. I removed the comment instead.
>>>
>>> V1 -> V2
>>>
>>> 1. Harsh: Not only start_pfn, but end_pfn also needs to be within
>>> allowed
>>> range, which may require clamping arg->nr_pages if crossing the
>>> limits.
>>>
>>> Response: Incorporated changes.
>>>
>>> arch/powerpc/platforms/pseries/iommu.c | 58 ++++++++++++++++++--------
>>> 1 file changed, 41 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/
>>> platforms/pseries/iommu.c
>>> index 3e1f915fe4f6..7bbe070006fa 100644
>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>> @@ -69,6 +69,8 @@ static struct iommu_table
>>> *iommu_pseries_alloc_table(int node)
>>> return tbl;
>>> }
>>> +static phys_addr_t pseries_ddw_max_ram;
>>> +
>>> #ifdef CONFIG_IOMMU_API
>>> static struct iommu_table_group_ops spapr_tce_table_group_ops;
>>> #endif
>>> @@ -1285,13 +1287,17 @@ static LIST_HEAD(failed_ddw_pdn_list);
>>> static phys_addr_t ddw_memory_hotplug_max(void)
>>> {
>>> - resource_size_t max_addr;
>>> + resource_size_t max_addr = memory_hotplug_max();
>>> + struct device_node *memory;
>>> -#if defined(CONFIG_NUMA) && defined(CONFIG_MEMORY_HOTPLUG)
>>> - max_addr = hot_add_drconf_memory_max();
>>> -#else
>>> - max_addr = memblock_end_of_DRAM();
>>> -#endif
>>> + for_each_node_by_type(memory, "memory") {
>>> + struct resource res;
>>> +
>>> + if (of_address_to_resource(memory, 0, &res))
>>> + continue;
>>> +
>>> + max_addr = max_t(resource_size_t, max_addr, res.end + 1);
>>> + }
>>> return max_addr;
>>> }
>>> @@ -1446,7 +1452,7 @@ static struct property
>>> *ddw_property_create(const char *propname, u32 liobn, u64
>>> static bool enable_ddw(struct pci_dev *dev, struct device_node
>>> *pdn, u64 dma_mask)
>>> {
>>> int len = 0, ret;
>>> - int max_ram_len = order_base_2(ddw_memory_hotplug_max());
>>> + int max_ram_len = order_base_2(pseries_ddw_max_ram);
>>> struct ddw_query_response query;
>>> struct ddw_create_response create;
>>> int page_shift;
>>> @@ -1668,7 +1674,7 @@ static bool enable_ddw(struct pci_dev *dev,
>>> struct device_node *pdn, u64 dma_mas
>>> if (direct_mapping) {
>>> /* DDW maps the whole partition, so enable direct DMA
>>> mapping */
>>> - ret = walk_system_ram_range(0, ddw_memory_hotplug_max() >>
>>> PAGE_SHIFT,
>>> + ret = walk_system_ram_range(0, pseries_ddw_max_ram >>
>>> PAGE_SHIFT,
>>> win64->value,
>>> tce_setrange_multi_pSeriesLP_walk);
>>> if (ret) {
>>> dev_info(&dev->dev, "failed to map DMA window for %pOF:
>>> %d\n",
>>> @@ -2419,23 +2425,35 @@ static int iommu_mem_notifier(struct
>>> notifier_block *nb, unsigned long action,
>>> {
>>> struct dma_win *window;
>>> struct memory_notify *arg = data;
>>> + unsigned long limit = arg->nr_pages;
>>> + unsigned long max_ram_pages = pseries_ddw_max_ram >> PAGE_SHIFT;
>>> int ret = 0;
>>> /* This notifier can get called when onlining persistent
>>> memory as well.
>>> * TCEs are not pre-mapped for persistent memory. Persistent
>>> memory will
>>> - * always be above ddw_memory_hotplug_max()
>>> + * always be above pseries_ddw_max_ram
>>> */
>>> + if (arg->start_pfn >= max_ram_pages)
>>> + return NOTIFY_OK;
>>> +
>>> + /* RAM is being DLPAR'ed. The range should never exceed max ram.
>>> + * Just in case, clamp the range and throw a warning.
>>> + */
>>> + if (arg->start_pfn + limit > max_ram_pages) {
>>> + limit = max_ram_pages - arg->start_pfn;
>>> + WARN_ONCE(1, "Limiting Page Range %lx - %lx to Max Mem
>>> Pages: %lx\n",
>>> + arg->start_pfn, arg->start_pfn + arg->nr_pages,
>>> + max_ram_pages);
>>> + }
>>> switch (action) {
>>> case MEM_GOING_ONLINE:
>>> spin_lock(&dma_win_list_lock);
>>> list_for_each_entry(window, &dma_win_list, list) {
>>> - if (window->direct && (arg->start_pfn << PAGE_SHIFT) <
>>> - ddw_memory_hotplug_max()) {
>>> + if (window->direct) {
>>> ret |= tce_setrange_multi_pSeriesLP(arg->start_pfn,
>>> - arg->nr_pages, window->prop);
>>> + limit, window->prop);
>>> }
>>> - /* XXX log error */
>>> }
>>> spin_unlock(&dma_win_list_lock);
>>> break;
>>> @@ -2443,12 +2461,10 @@ static int iommu_mem_notifier(struct
>>> notifier_block *nb, unsigned long action,
>>> case MEM_OFFLINE:
>>> spin_lock(&dma_win_list_lock);
>>> list_for_each_entry(window, &dma_win_list, list) {
>>> - if (window->direct && (arg->start_pfn << PAGE_SHIFT) <
>>> - ddw_memory_hotplug_max()) {
>>> + if (window->direct) {
>>> ret |= tce_clearrange_multi_pSeriesLP(arg->start_pfn,
>>> - arg->nr_pages, window->prop);
>>> + limit, window->prop);
>>> }
>>> - /* XXX log error */
>>> }
>>> spin_unlock(&dma_win_list_lock);
>>> break;
>>> @@ -2532,6 +2548,14 @@ void __init iommu_init_early_pSeries(void)
>>> register_memory_notifier(&iommu_mem_nb);
>>> set_pci_dma_ops(&dma_iommu_ops);
>>> +
>>> + /* During init determine the max memory an LPAR can have and set
>>> it. This
>>> + * will be used for pre-mapping RAM in DDW.
>>> memblock_end_of_DRAM() can
>>> + * change during the running of LPAR - daxctl can add pmemory as
>>> + * "system-ram". This memory range should not be pre-mapped in
>>> DDW since
>>> + * the address of pmemory can be much higher than the DDW size.
>>> + */
>>> + pseries_ddw_max_ram = ddw_memory_hotplug_max();
>>> }
>>> static int __init disable_multitce(char *str)
>>>
>>> base-commit: 6d35786de28116ecf78797a62b84e6bf3c45aa5a
>>
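
For readers following the thread, the two calculations in the patch (the
boot-time maximum and the range clamp in iommu_mem_notifier()) can be
sketched in user space as below. This is only an illustration under stated
assumptions, not the kernel code: the sketch_* names and SKETCH_PAGE_SHIFT
are hypothetical, a 64K page size is assumed, and the addresses are taken
from the example layout in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64K pages. The kernel code uses PAGE_SHIFT instead. */
#define SKETCH_PAGE_SHIFT 16

/*
 * Hypothetical stand-in for the loop in ddw_memory_hotplug_max():
 * start from the hotplug maximum and raise it to cover every memory
 * node whose last byte address (inclusive end) lies above it.
 */
static uint64_t sketch_max_ram(uint64_t hotplug_max,
                               const uint64_t *node_end, int n)
{
    uint64_t max_addr = hotplug_max;

    for (int i = 0; i < n; i++) {
        if (node_end[i] + 1 > max_addr)
            max_addr = node_end[i] + 1;
    }
    return max_addr;
}

/*
 * Hypothetical stand-in for the range check in iommu_mem_notifier().
 * Returns false when the range starts at or above max_ram_pages (for
 * example pmemory onlined via daxctl), so no TCEs are touched.
 * Otherwise stores the page count, clamped to max_ram_pages, in *limit.
 */
static bool sketch_clamp_range(uint64_t start_pfn, uint64_t nr_pages,
                               uint64_t max_ram_pages, uint64_t *limit)
{
    if (start_pfn >= max_ram_pages)
        return false;

    *limit = nr_pages;
    if (start_pfn + nr_pages > max_ram_pages)
        *limit = max_ram_pages - start_pfn;

    return true;
}
```

With the layout from the commit message (hotplug maximum of 256GB, 16GB
pages ending at 0x47ffffffff), sketch_max_ram() yields 0x4800000000, and a
range starting at the pmemory address 0x40100000000 is rejected by
sketch_clamp_range() because it lies above that value.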
Thread overview: 5+ messages
2026-05-15 15:51 [PATCH v3] powerpc/pseries/iommu: Add TCEs for 16GB pages when RAM is pre-mapped Gaurav Batra
2026-05-31 17:48 ` Harsh Prateek Bora
2026-06-01 18:03 ` Gaurav Batra
2026-06-02 3:57 ` Harsh Prateek Bora [this message]
2026-06-09 6:38 ` Venkat Rao Bagalkote