From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: "Dan Horák" <dan@danny.cz>,
linuxppc-dev@lists.ozlabs.org,
"Gaurav Batra" <gbatra@linux.ibm.com>
Cc: amd-gfx@lists.freedesktop.org, Donet Tom <donettom@linux.ibm.com>
Subject: Re: amdgpu driver fails to initialize on ppc64le in 7.0-rc1 and newer
Date: Sun, 15 Mar 2026 09:55:11 +0530
Message-ID: <1phlu3bs.ritesh.list@gmail.com>
In-Reply-To: <20260313142351.609bc4c3efe1184f64ca5f44@danny.cz>
Dan Horák <dan@danny.cz> writes:
+cc Gaurav,
> Hi,
>
> starting with 7.0-rc1 (meaning 6.19 is OK) the amdgpu driver fails to
> initialize on my Linux/ppc64le Power9 based system (with Radeon Pro WX4100)
> with the following in the log
>
> ...
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
^^^^
So it looks like this is a PowerNV (Power9) machine.
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] Detected VRAM RAM=4096M, BAR=4096M
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] RAM width 128bits GDDR5
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: iommu: 64-bit OK but direct DMA is limited by 0
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: 4096M of VRAM memory ready
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: 32570M of GTT memory ready.
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: (-12) failed to allocate kernel bo
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] Debug VRAM access will use slowpath MM access
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] GART: num cpu pages 4096, num gpu pages 65536
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] PCIE GART of 256M enabled (table at 0x000000F4FFF80000).
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: (-12) failed to allocate kernel bo
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: (-12) create WB bo failed
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu_device_wb_init failed -12
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu_device_ip_init failed
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: Fatal error during GPU init
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: finishing device.
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: probe with driver amdgpu failed with error -12
> bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: ttm finalized
> ...
>
> After some hints from Alex and bisecting and other investigation I have
> found that https://github.com/torvalds/linux/commit/1471c517cf7dae1a6342fb821d8ed501af956dd0
> is the culprit and reverting it makes amdgpu load (and work) again.
Thanks for confirming this. Yes, this was recently added [1]
[1]: https://lore.kernel.org/linuxppc-dev/20251107161105.85999-1-gbatra@linux.ibm.com/
@Gaurav,
I am not too familiar with this area; however, looking at the logs shared
by Dan, it looks like we might always be taking the DMA direct allocation
path, and maybe the device doesn't support this address limit.
bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: iommu: 64-bit OK but direct DMA is limited by 0
bře 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff
Looking at the code..
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index fe7472f13b10..d5743b3c3ab3 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -654,7 +654,7 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
 	/* let the implementation decide on the zone to allocate from: */
 	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
 
-	if (dma_alloc_direct(dev, ops)) {
+	if (dma_alloc_direct(dev, ops) || arch_dma_alloc_direct(dev)) {
 		cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
 	} else if (use_dma_iommu(dev)) {
 		cpu_addr = iommu_dma_alloc(dev, size, dma_handle, flag, attrs);
Now, do we need arch_dma_alloc_direct() here? It always returns true if
dev->dma_ops_bypass is set to true, without performing the checks that
dma_go_direct() has.
whereas...
/*
* Check if the devices uses a direct mapping for streaming DMA operations.
* This allows IOMMU drivers to set a bypass mode if the DMA mask is large
* enough.
*/
static inline bool
dma_alloc_direct(struct device *dev, const struct dma_map_ops *ops)
  ..dma_go_direct(dev, dev->coherent_dma_mask, ops);
    .... ...
    #ifdef CONFIG_DMA_OPS_BYPASS
        if (dev->dma_ops_bypass)
            return min_not_zero(mask, dev->bus_dma_limit) >=
                   dma_direct_get_required_mask(dev);
    #endif
dma_alloc_direct() already checks for dma_ops_bypass and also whether
dev->coherent_dma_mask >= dma_direct_get_required_mask(). So...
.... Do we really need the machinery of arch_dma_{alloc|free}_direct()?
Aren't the dma_alloc_direct() checks sufficient?
Thoughts?
-ritesh
>
> for the record, I have originally opened https://gitlab.freedesktop.org/drm/amd/-/issues/5039
>
>
> With regards,
>
> Dan