Re: [PATCH] drm/radeon: Fix screen corruption (v2)

AMD-GFX Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Luben Tuikov <luben.tuikov@amd.com>
To: "Robin Murphy" <robin.murphy@arm.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Alex Deucher" <alexdeucher@gmail.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>,
	Mikhail Krylov <sqarert@gmail.com>,
	AMD Graphics <amd-gfx@lists.freedesktop.org>,
	Direct Rendering Infrastructure - Development
	<dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH] drm/radeon: Fix screen corruption (v2)
Date: Thu, 15 Dec 2022 07:07:33 -0500	[thread overview]
Message-ID: <658a9226-98df-fd09-957b-14fa7fbb9f87@amd.com> (raw)
In-Reply-To: <e15133af-d3d9-de47-b01a-bca9053b0d8f@arm.com>

On 2022-12-15 06:53, Robin Murphy wrote:
> On 2022-12-15 11:40, Luben Tuikov wrote:
>> On 2022-12-15 06:27, Christian König wrote:
>>> Am 15.12.22 um 11:19 schrieb Luben Tuikov:
>>>> On 2022-12-15 04:46, Christian König wrote:
>>>>> Am 15.12.22 um 10:08 schrieb Luben Tuikov:
>>>>>> On 2022-12-15 03:07, Christian König wrote:
>>>>>>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>>>>>>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>>>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@arm.com>
>>>>>>>>> wrote:
>>>>>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>>>>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>>>>>>>>
>>>>>>>>>>> On older systems with little memory, for instance 1.5 GiB, using an
>>>>>>>>>>> AGP chip,
>>>>>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is
>>>>>>>>>>> 0x7FFFFFF, and
>>>>>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
>>>>>>>>>>> false. As such the result of this static inline isn't suitable for
>>>>>>>>>>> the last
>>>>>>>>>>> argument to ttm_device_init()--it simply needs to now whether to
>>>>>>>>>>> use GFP_DMA32
>>>>>>>>>>> when allocating DMA buffers.
>>>>>>>>>> This sounds wrong to me. If the issues happen on systems without PAE it
>>>>>>>>>> clearly can't have anything to with the actual DMA address size. Not to
>>>>>>>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>>>>>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>>>>>>>>> reported symptoms initially sounded like they could be caused by DMA
>>>>>>>>>> going to the wrong place, that is also equally consistent with a
>>>>>>>>>> loss of
>>>>>>>>>> cache coherency.
>>>>>>>>>>
>>>>>>>>>> My (limited) understanding of AGP is that the GART can effectively
>>>>>>>>>> alias
>>>>>>>>>> memory to a second physical address, so I could well believe that
>>>>>>>>>> something somewhere in the driver stack needs to perform some cache
>>>>>>>>>> maintenance to avoid coherency issues, and that in these particular
>>>>>>>>>> setups whatever that is might be assuming the memory is direct-mapped
>>>>>>>>>> and thus going wrong for highmem pages.
>>>>>>>>>>
>>>>>>>>>> So as I said before, I really think this is not about using
>>>>>>>>>> GFP_DMA32 at
>>>>>>>>>> all, but about *not* using GFP_HIGHUSER.
>>>>>>>>> One of the wonderful features of AGP is that it has to be used with
>>>>>>>>> uncached memory.  The aperture basically just provides a remapping of
>>>>>>>>> physical pages into a linear aperture that you point the GPU at.  TTM
>>>>>>>>> has to jump through quite a few hoops to get uncached memory in the
>>>>>>>>> first place, so it's likely that that somehow isn't compatible with
>>>>>>>>> HIGHMEM.  Can you get uncached HIGHMEM?
>>>>>>>> I guess in principle yes, if you're careful not to use regular
>>>>>>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for
>>>>>>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for
>>>>>>>> slipping up.
>>>>>>> I theory we should do exactly that in TTM, but we have very few users
>>>>>>> who actually still exercise that functionality.
>>>>>>>
>>>>>>>> Working backwards from primitives like set_memory_uc(), I see various
>>>>>>>> paths in TTM where manipulating the caching state is skipped for
>>>>>>>> highmem pages, but I wouldn't even know where to start looking for
>>>>>>>> whether the right state is propagated to all the places where they
>>>>>>>> might eventually be mapped somewhere.
>>>>>>> The tt object has the caching state for the pages and
>>>>>>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the
>>>>>>> userspace/vmalloc mappings.
>>>>>>>
>>>>>> The point of this patch is that dma_addressing_limited() is unsuitable as
>>>>>> the last parameter to ttm_pool_init(), since if it is "false"--as it is in this
>>>>>> particular case--then TTM ends up using HIGHUSER, and we get the screen corruption.
>>>>>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())
>>>>> Well I would rather say that dma_addressing_limited() works, but the
>>>>> default value from dma_get_required_mask() is broken.
>>>>>
>>>> dma_get_required_mask() for his setup of 1.5 GiB of memory returns 0x7FFFFFF.
>>>
>>> This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB
>>> addressable memory (27 bits set)? Or is there another F missing?
>>
>> Yeah, I'm missing an F--it is correctly described at the top of the thread above,
>> i.e. in the commit of v2 patch.
>>
>> 0x7FFF_FFFF, which seems correct, no?
>>
>>>> While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in radeon_device_init().
>>>>
>>>>> 32 bits only work with bounce buffers and we can't use those on graphics
>>>>> hardware.
>>>>>
>>>>>> Is there an objection to this patch, if it fixes the screen corruption?
>>>>> Not from my side, but fixing the underlying issues would be better I think.
>>>>>
>>>> Have they been identified?
>>>
>>> I'm not 100% sure. I think by using GFP_DMA32 we just work around the
>>> issue somehow.
>>
>> Right. Using GFP_DMA32, we don't touch high-mem. I was looking at the DRM
>> code trying to understand what we do when GFP_DMA32 is not set, and the immediate
>> thing I see is that we set GFP_HIGHUSER when use_dma32 is unset in the device struct.
>> (Then I got down to the caching attributes...)
>>
>> It's be nice if we can find the actual issue--what else would it show us that needs fixing...?
>>
>> So what do we do with this patch?
>>
>> Shouldn't leave it in a limbo--some OSes ship their kernel
>> with 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations") wholly
>> reverted.
> 
> Removing dma_addressing_limited() is still wrong, for the reasons given 
> in that commit. What we need is an *additional* condition that 
> encapsulates "also pass use_dma32 for AGP devices because it avoids some 
> weird coherency issue with 32-bit highmem that isn't worth trying to 
> debug further".

Yes, you had a patch earlier which did exactly that--why not push that patch?

Q: If host memory is 1.5 GiB, i.e. mask of 0x7FFF_FFFF, but the device's mask is 0xFFFF_FFFF,
shouldn't we use GFP_DMA32, instead of GFP_HIGHUSER?

Regards,
Luben

next prev parent reply	other threads:[~2022-12-15 12:07 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-23 16:31 Screen corruption using radeon kernel driver Krylov Michael
2022-04-25 17:22 ` Alex Deucher
2022-05-16 13:01   ` Mikhail Krylov
2022-11-28 14:31   ` Mikhail Krylov
2022-11-28 14:50     ` Alex Deucher
2022-11-28 20:48       ` Mikhail Krylov
2022-11-29 14:44         ` Alex Deucher
2022-11-29 15:59           ` Mikhail Krylov
2022-11-29 16:05             ` Alex Deucher
2022-11-29 17:11               ` Mikhail Krylov
2022-11-30 12:54                 ` Robin Murphy
2022-11-30 14:28                   ` Alex Deucher
2022-11-30 15:42                     ` Robin Murphy
2022-11-30 16:07                       ` Alex Deucher
2022-11-30 19:59                         ` Mikhail Krylov
2022-12-01 14:00                           ` Robin Murphy
2022-12-01 14:06                             ` Alex Deucher
2022-12-01 15:28                             ` Mikhail Krylov
2022-12-10 15:32                         ` Mikhail Krylov
2022-12-11  5:52                           ` Luben Tuikov
2022-12-11 11:42                             ` [PATCH] drm/radeon: Fix screen corruption Luben Tuikov
2022-12-12  2:08                               ` [PATCH] drm/radeon: Fix screen corruption (v2) Luben Tuikov
2022-12-14 21:53                                 ` Robin Murphy
2022-12-14 22:02                                   ` Alex Deucher
2022-12-14 23:08                                     ` Robin Murphy
2022-12-15  8:07                                       ` Christian König
2022-12-15  9:08                                         ` Luben Tuikov
2022-12-15  9:46                                           ` Christian König
2022-12-15 10:19                                             ` Luben Tuikov
2022-12-15 11:27                                               ` Christian König
2022-12-15 11:40                                                 ` Luben Tuikov
2022-12-15 11:53                                                   ` Robin Murphy
2022-12-15 12:07                                                     ` Luben Tuikov [this message]
2023-01-19 16:56                                                       ` Krylov Michael
2023-01-20  4:31                                                         ` Luben Tuikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=658a9226-98df-fd09-957b-14fa7fbb9f87@amd.com \
    --to=luben.tuikov@amd.com \
    --cc=Alexander.Deucher@amd.com \
    --cc=alexdeucher@gmail.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=robin.murphy@arm.com \
    --cc=sqarert@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox