From: okaya-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org
To: "Christian König" <christian.koenig-5C7GfCeVMHo@public.gmane.org>
Cc: alexander.deucher-5C7GfCeVMHo@public.gmane.org,
Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Lucas Stach <l.stach-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
Subject: Re: [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately
Date: Thu, 12 Apr 2018 07:48:16 -0400 [thread overview]
Message-ID: <b223f968900bc439c757565ffe2d4a59@codeaurora.org> (raw)
In-Reply-To: <9e55076e-56dd-d553-63d6-2ce018199f84-5C7GfCeVMHo@public.gmane.org>
On 2018-04-12 06:33, Christian König wrote:
> Am 12.04.2018 um 11:49 schrieb Lucas Stach:
>> Am Donnerstag, den 12.04.2018, 11:35 +0200 schrieb Christian König:
>>> Am 12.04.2018 um 11:11 schrieb Lucas Stach:
>>>> Am Mittwoch, den 11.04.2018, 20:26 +0200 schrieb Christian König:
>>>>> Am 11.04.2018 um 19:11 schrieb Robin Murphy:
>>>>>> For dma_map_sg(), DMA API implementations are free to merge
>>>>>> consecutive
>>>>>> segments into a single DMA mapping if conditions are suitable,
>>>>>> thus the
>>>>>> resulting DMA addresses may be packed into fewer entries than
>>>>>> ttm->sg->nents implies.
>>>>>>
>>>>>> drm_prime_sg_to_page_addr_arrays() does not account for this,
>>>>>> meaning
>>>>>> its callers either have to reject the 0 < count < nents case or
>>>>>> risk
>>>>>> getting bogus addresses back later. Fortunately this is relatively
>>>>>> easy
>>>>>> to deal with having to rejig structures to also store the mapped
>>>>>> count,
>>>>>> since the total DMA length should still be equal to the total
>>>>>> buffer
>>>>>> length. All we need is a separate scatterlist cursor to iterate
>>>>>> the DMA
>>>>>> addresses separately from the CPU addresses.
>>>>> Mhm, I think I like Sinas approach better.
>>>>>
>>>>> See the hardware actually needs the dma_address on a page by page
>>>>> basis.
>>>>>
>>>>> Joining multiple consecutive pages into one entry is just
>>>>> additional
>>>>> overhead which we don't need.
>>>> But setting MAX_SEGMENT_SIZE will probably prevent an IOMMU that
>>>> might
>>>> be in front of your GPU to map large pages as such, causing
>>>> additional
>>>> overhead by means of additional TLB misses and page walks on the
>>>> IOMMU
>>>> side.
>>>>
>>>> And wouldn't you like to use huge pages at the GPU side, if the
>>>> IOMMU
>>>> already provides you the service of producing one large consecutive
>>>> address range, rather than mapping them via a number of small pages?
>>> No, I wouldn't like to use that. We're already using that :)
>>>
>>> But we use huge pages by allocating consecutive chunks of memory so
>>> that
>>> both the CPU as well as the GPU can increase their TLB hit rate.
>> I'm not convinced that this is a universal win. Many GPU buffers
>> aren't
>> accessed by the CPU and allocating huge pages puts much more strain on
>> the kernel MM subsystem.
>
> Yeah, we indeed see the extra overhead during allocation.
>
>>> What currently happens is that the DMA subsystem tries to coalesce
>>> multiple pages into on SG entry and we de-coalesce that inside the
>>> driver again to create our random access array.
>> I guess the right thing would be to have a flag that tells the the DMA
>> implementation to not coalesce the entries. But (ab-)using max segment
>> size tells the DMA API to split the segments if they are larger than
>> the given size, which is probably not what you want either as you now
>> need to coalesce the segments again when you are mapping real huge
>> pages.
>
> No, even with huge pages I need an array with every single dma address
> for a 4K pages (or whatever the normal page size is).
>
> The problem is that I need a random access array for the DMA
> addresses, even when they are continuously.
>
> I agree that max segment size is a bit ugly here, but at least for
> radeon, amdgpu and pretty much TTM in general it is exactly what I
> need.
>
> I could fix TTM to not need that, but for radeon and amdgpu it is the
> hardware which needs this.
I can implement i915 approach as Robin suggested as an alternative. Is
that better?
>
> Christian.
>
>>
>>> That is a huge waste of memory and CPU cycles and I actually wanted
>>> to
>>> get rid of it for quite some time now. Sinas approach seems to be a
>>> good
>>> step into the right direction to me to actually clean that up.
>>>
>>>> Detecting such a condition is much easier if the DMA map
>>>> implementation
>>>> gives you the coalesced scatter entries.
>>> A way which preserves both path would be indeed nice to have, but
>>> that
>>> only allows for the TLB optimization for the GPU and not the CPU any
>>> more. So I actually see that as really minor use case.
>> Some of the Tegras have much larger TLBs and better page-walk
>> performance on the system IOMMU compared to the GPU MMU, so they would
>> probably benefit a good deal even if the hugepage optimization only
>> targets the GPU side.
>>
>> Regards,
>> Lucas
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
next prev parent reply other threads:[~2018-04-12 11:48 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-04-11 17:11 [PATCH 1/2] drm/prime: Iterate SG DMA addresses separately Robin Murphy
2018-04-11 17:11 ` [PATCH 2/2] drm/amdgpu: Allow dma_map_sg() coalescing Robin Murphy
2018-04-11 18:28 ` Christian König
2018-04-12 17:53 ` Robin Murphy
[not found] ` <24948fc1-ff01-1276-e3fa-c2b0e5713b5b-5wv7dgnIgG8@public.gmane.org>
2018-04-12 19:08 ` Christian König
2018-04-11 17:45 ` [1/2] drm/prime: Iterate SG DMA addresses separately Robin Murphy
[not found] ` <0901c8c3b9adbcb851ba58dfca6b16d12ccbcb0f.1523465719.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2018-04-11 18:26 ` [PATCH 1/2] " Christian König
[not found] ` <67b1875d-9f77-5fb8-bfc6-53d34c15ab16-5C7GfCeVMHo@public.gmane.org>
2018-04-12 9:11 ` Lucas Stach
[not found] ` <1523524317.4981.24.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
2018-04-12 9:35 ` Christian König
[not found] ` <58ef1aab-6c3a-8e69-0a7e-98218ba9fe96-5C7GfCeVMHo@public.gmane.org>
2018-04-12 9:49 ` Lucas Stach
[not found] ` <1523526540.4981.26.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
2018-04-12 10:33 ` Christian König
[not found] ` <9e55076e-56dd-d553-63d6-2ce018199f84-5C7GfCeVMHo@public.gmane.org>
2018-04-12 11:48 ` okaya-sgV2jX0FEOL9JmXXK+q4OQ [this message]
2018-04-12 13:18 ` Robin Murphy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b223f968900bc439c757565ffe2d4a59@codeaurora.org \
--to=okaya-sgv2jx0feol9jmxxk+q4oq@public.gmane.org \
--cc=alexander.deucher-5C7GfCeVMHo@public.gmane.org \
--cc=amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org \
--cc=christian.koenig-5C7GfCeVMHo@public.gmane.org \
--cc=dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org \
--cc=l.stach-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org \
--cc=robin.murphy-5wv7dgnIgG8@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.