public inbox for linuxppc-dev@ozlabs.org
From: "Dan Horák" <dan@danny.cz>
To: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org,
	Gaurav Batra <gbatra@linux.ibm.com>,
	amd-gfx@lists.freedesktop.org, Donet Tom <donettom@linux.ibm.com>
Subject: Re: amdgpu driver fails to initialize on ppc64le in 7.0-rc1 and newer
Date: Sun, 15 Mar 2026 10:50:21 +0100
Message-ID: <20260315105021.667e52d4a99b154ef1e6aa34@danny.cz>
In-Reply-To: <1phlu3bs.ritesh.list@gmail.com>

Hi Ritesh,

On Sun, 15 Mar 2026 09:55:11 +0530
Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote:

> Dan Horák <dan@danny.cz> writes:
> 
> +cc Gaurav,
> 
> > Hi,
> >
> > starting with 7.0-rc1 (meaning 6.19 is OK) the amdgpu driver fails to
> > initialize on my Linux/ppc64le Power9 based system (with Radeon Pro WX4100)
> > with the following in the log
> >
> > ...
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
> 
>                   ^^^^
> So looks like this is a PowerNV (Power9) machine.

correct :-)
 
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] Detected VRAM RAM=4096M, BAR=4096M
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] RAM width 128bits GDDR5
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: iommu: 64-bit OK but direct DMA is limited by 0
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0:  4096M of VRAM memory ready
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0:  32570M of GTT memory ready.
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: (-12) failed to allocate kernel bo
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] Debug VRAM access will use slowpath MM access
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] GART: num cpu pages 4096, num gpu pages 65536
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: [drm] PCIE GART of 256M enabled (table at 0x000000F4FFF80000).
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: (-12) failed to allocate kernel bo
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: (-12) create WB bo failed
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu_device_wb_init failed -12
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: amdgpu_device_ip_init failed
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: Fatal error during GPU init
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: finishing device.
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: probe with driver amdgpu failed with error -12
> > Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0:  ttm finalized
> > ...
> >
> > After some hints from Alex and bisecting and other investigation I have
> > found that https://github.com/torvalds/linux/commit/1471c517cf7dae1a6342fb821d8ed501af956dd0
> > is the culprit and reverting it makes amdgpu load (and work) again.
> 
> Thanks for confirming this. Yes, this was recently added [1]
> 
> [1]: https://lore.kernel.org/linuxppc-dev/20251107161105.85999-1-gbatra@linux.ibm.com/ 
> 
> 
> @Gaurav,
> 
> I am not too familiar with the area, however looking at the logs shared
> by Dan, it looks like we might always be taking the DMA direct allocation
> path, and maybe the device doesn't support this address limit.
> 
>  Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: iommu: 64-bit OK but direct DMA is limited by 0
>  Mar 05 08:35:40 talos.danny.cz kernel: amdgpu 0000:01:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff

a complete kernel log is at
https://gitlab.freedesktop.org/-/project/4522/uploads/c4935bca6f37bbd06bb4045c07d00b5b/kernel.log

Please let me know if you need more info.


		Dan

 
> Looking at the code..
> 
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index fe7472f13b10..d5743b3c3ab3 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -654,7 +654,7 @@ void *dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *dma_handle,
>  	/* let the implementation decide on the zone to allocate from: */
>  	flag &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
>  
> -	if (dma_alloc_direct(dev, ops)) {
> +	if (dma_alloc_direct(dev, ops) || arch_dma_alloc_direct(dev)) {
>  		cpu_addr = dma_direct_alloc(dev, size, dma_handle, flag, attrs);
>  	} else if (use_dma_iommu(dev)) {
>  		cpu_addr = iommu_dma_alloc(dev, size, dma_handle, flag, attrs);
> 
> Now, do we need arch_dma_alloc_direct() here? It always returns true if
> dev->dma_ops_bypass is set, without performing the checks that
> dma_go_direct() does.
> 
> whereas...
> 
> /*
>  * Check if the devices uses a direct mapping for streaming DMA operations.
>  * This allows IOMMU drivers to set a bypass mode if the DMA mask is large
>  * enough.
>  */
> static inline bool
> dma_alloc_direct(struct device *dev, const struct dma_map_ops *ops)
> ..dma_go_direct(dev, dev->coherent_dma_mask, ops);
> ....  ...
>       #ifdef CONFIG_DMA_OPS_BYPASS
>           if (dev->dma_ops_bypass)
>               return min_not_zero(mask, dev->bus_dma_limit) >=
>                       dma_direct_get_required_mask(dev);
>       #endif
> 
> dma_alloc_direct() already checks for dma_ops_bypass and also if
> dev->coherent_dma_mask >= dma_direct_get_required_mask(). So...
> 
> .... Do we really need the machinery of arch_dma_{alloc|free}_direct()?
> Aren't the dma_alloc_direct() checks sufficient?
> 
> Thoughts?
> 
> -ritesh
> 
> 
> >
> > for the record, I have originally opened https://gitlab.freedesktop.org/drm/amd/-/issues/5039
> >
> >
> > 	With regards,
> >
> > 		Dan
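
To make the difference between the two checks discussed above concrete, here is a minimal user-space sketch in C. It is not kernel code: `struct toy_dev`, the `toy_*` helpers and the mask values used with them are hypothetical stand-ins, and `required_mask` simply plays the role of the value returned by dma_direct_get_required_mask(). It models only the CONFIG_DMA_OPS_BYPASS branch quoted from dma_go_direct(), and it models arch_dma_alloc_direct() as described in the message (true whenever dma_ops_bypass is set).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields of struct device used here. */
struct toy_dev {
	bool dma_ops_bypass;
	uint64_t coherent_dma_mask;
	uint64_t bus_dma_limit;  /* 0 means "no limit", as in the log above */
	uint64_t required_mask;  /* stand-in for dma_direct_get_required_mask() */
};

/* Smaller of the two values, ignoring a value of zero. */
static uint64_t toy_min_not_zero(uint64_t a, uint64_t b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return a < b ? a : b;
}

/* Models only the dma_ops_bypass branch of dma_go_direct(), with the
 * coherent mask passed in the way dma_alloc_direct() does. */
static bool toy_dma_alloc_direct(const struct toy_dev *dev)
{
	if (dev->dma_ops_bypass)
		return toy_min_not_zero(dev->coherent_dma_mask,
					dev->bus_dma_limit) >= dev->required_mask;
	return false;
}

/* Models arch_dma_alloc_direct() as described in the message: no mask check. */
static bool toy_arch_dma_alloc_direct(const struct toy_dev *dev)
{
	return dev->dma_ops_bypass;
}

/* Models the condition in the quoted dma_alloc_attrs() hunk. */
static bool toy_takes_direct_path(const struct toy_dev *dev)
{
	return toy_dma_alloc_direct(dev) || toy_arch_dma_alloc_direct(dev);
}
```

With made-up values such as a 40-bit coherent mask, a bus_dma_limit of 0 and a 44-bit required mask, `toy_dma_alloc_direct()` returns false while `toy_takes_direct_path()` still returns true, because the arch hook skips the mask comparison. Whether this matches what happens on the affected system is not established by the sketch; it only illustrates the question raised in the quoted analysis.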


