* Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
2014-12-08 7:14 ` Alexandre Courbot
@ 2014-12-08 7:36 ` Alexandre Courbot
2014-12-08 8:11 ` Alexandre Courbot
2014-12-15 8:04 ` Alexandre Courbot
2 siblings, 0 replies; 9+ messages in thread
From: Alexandre Courbot @ 2014-12-08 7:36 UTC (permalink / raw)
To: Thierry Reding, dri-devel
Cc: Alexandre Courbot, Thomas Hellstrom, Allen Martin, Arnd Bergmann,
Konrad Rzeszutek Wilk
On 12/08/2014 04:14 PM, Alexandre Courbot wrote:
> On 11/12/2014 09:39 PM, Thierry Reding wrote:
>> From: Thierry Reding <treding@nvidia.com>
>>
>> dma_alloc_coherent() returns a kernel virtual address that is part of
>> the linear range. Passing such an address to virt_to_page() is illegal
>> on non-coherent architectures. This causes the kernel to oops on 64-bit
>> ARM because the struct page * obtained from virt_to_page() points to
>> unmapped memory.
>>
>> This commit fixes this by using phys_to_page() since we get a physical
>> address from dma_alloc_coherent(). Note that this is not a proper fix
>> because if an IOMMU is set up to translate addresses for the GPU this
>> address will be an I/O virtual address rather than a physical one. The
>> proper fix probably involves not getting a pointer to the struct page
>> in the first place, but that would be a much more intrusive change, if
>> at all possible.
>>
>> Until that time, this temporary fix will allow TTM to work on 32-bit
>> and 64-bit ARM as well, provided that no IOMMU translations are enabled
>> for the GPU.
>>
>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>> ---
>> Arnd, I realize that this isn't a proper fix according to what we
>> discussed on
>> IRC yesterday, but I can't see a way to remove access to the pages
>> array that
>> would be as simple as this. I've marked this as RFC in the hope that
>> it will
>> trigger some discussion that will lead to a proper solution.
>>
>> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> index c96db433f8af..d7993985752c 100644
>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> @@ -343,7 +343,11 @@ static struct dma_page
>> *__ttm_dma_alloc_page(struct dma_pool *pool)
>> &d_page->dma,
>> pool->gfp_flags);
>> if (d_page->vaddr)
>> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>> + d_page->p = phys_to_page(d_page->dma);
>> +#else
>> d_page->p = virt_to_page(d_page->vaddr);
>> +#endif
>
> Since I am messing with the IOMMU I just happened to have hit the issue
> you are mentioning. Wouldn't the following work:
>
> - if (d_page->vaddr)
> -#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> - d_page->p = phys_to_page(d_page->dma);
> -#else
> - d_page->p = virt_to_page(d_page->vaddr);
> -#endif
> - else {
> + if (d_page->vaddr) {
> + if (is_vmalloc_addr(d_page->vaddr)) {
> + d_page->p = vmalloc_to_page(d_page->vaddr);
> + } else {
> + d_page->p = virt_to_page(d_page->vaddr);
> + }
> + } else {
>
> A remapped page will end up in the vmalloc range of the address space,
> and in this case we can use vmalloc_to_page() to get the right page.
> Pages outside of this range are part of the linear mapping and can be
> resolved using virt_to_page().
>
> Jetson seems to be mostly happy with this, although I sometimes get the
> following trace:
>
> [ 13.174763] kernel BUG at ../mm/slab.c:2593!
> [ 13.174767] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> [ 13.174790] Modules linked in: nouveau_platform(O+) nouveau(O)
> cfbfillrect cfbimgblt cfbcopyarea ttm
> ...
> [ 13.175234] [<c00de238>] (cache_alloc_refill) from [<c00de528>]
> (__kmalloc+0x100/0x13c)
> [ 13.175247] [<c00de528>] (__kmalloc) from [<c001d564>]
> (arm_iommu_alloc_attrs+0x94/0x3a8)
> [ 13.175269] [<c001d564>] (arm_iommu_alloc_attrs) from [<bf008f4c>]
> (ttm_dma_populate+0x498/0x76c [ttm])
> [ 13.175294] [<bf008f4c>] (ttm_dma_populate [ttm]) from [<bf000bb8>]
> (ttm_tt_bind+0x38/0x68 [ttm])
> [ 13.175315] [<bf000bb8>] (ttm_tt_bind [ttm]) from [<bf00298c>]
> (ttm_bo_handle_move_mem+0x408/0x47c [ttm])
> [ 13.175337] [<bf00298c>] (ttm_bo_handle_move_mem [ttm]) from
> [<bf003758>] (ttm_bo_validate+0x220/0x22c [ttm])
> [ 13.175359] [<bf003758>] (ttm_bo_validate [ttm]) from [<bf003984>]
> (ttm_bo_init+0x220/0x338 [ttm])
> [ 13.175480] [<bf003984>] (ttm_bo_init [ttm]) from [<bf0c70a0>]
> (nouveau_bo_new+0x1c0/0x294 [nouveau])
> [ 13.175688] [<bf0c70a0>] (nouveau_bo_new [nouveau]) from [<bf0ce88c>]
> (nv84_fence_create+0x1cc/0x240 [nouveau])
> [ 13.175891] [<bf0ce88c>] (nv84_fence_create [nouveau]) from
> [<bf0cec90>] (nvc0_fence_create+0xc/0x24 [nouveau])
> [ 13.176094] [<bf0cec90>] (nvc0_fence_create [nouveau]) from
> [<bf0c1480>] (nouveau_accel_init+0xec/0x450 [nouveau])
>
> I suspect this is related to this change, but it might also be the
> side-effect of another bug in my code.
FWIW and after some more testing, I noticed that without IOMMU
vmalloc_to_page() and phys_to_page() both return the same valid page.
With the IOMMU enabled vmalloc_to_page() still returns what seems to be
valid pages (phys_to_page() of course doesn't make sense anymore).
So AFAICT the change I proposed seems valid. I am not sure what causes
the BUG() in the slab allocator.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
2014-12-08 7:14 ` Alexandre Courbot
2014-12-08 7:36 ` Alexandre Courbot
@ 2014-12-08 8:11 ` Alexandre Courbot
2014-12-08 8:19 ` Alexandre Courbot
2014-12-15 8:04 ` Alexandre Courbot
2 siblings, 1 reply; 9+ messages in thread
From: Alexandre Courbot @ 2014-12-08 8:11 UTC (permalink / raw)
To: Thierry Reding, dri-devel
Cc: Alexandre Courbot, Thomas Hellstrom, Allen Martin, Arnd Bergmann,
Konrad Rzeszutek Wilk
On 12/08/2014 04:14 PM, Alexandre Courbot wrote:
> On 11/12/2014 09:39 PM, Thierry Reding wrote:
>> From: Thierry Reding <treding@nvidia.com>
>>
>> dma_alloc_coherent() returns a kernel virtual address that is part of
>> the linear range. Passing such an address to virt_to_page() is illegal
>> on non-coherent architectures. This causes the kernel to oops on 64-bit
>> ARM because the struct page * obtained from virt_to_page() points to
>> unmapped memory.
>>
>> This commit fixes this by using phys_to_page() since we get a physical
>> address from dma_alloc_coherent(). Note that this is not a proper fix
>> because if an IOMMU is set up to translate addresses for the GPU this
>> address will be an I/O virtual address rather than a physical one. The
>> proper fix probably involves not getting a pointer to the struct page
>> in the first place, but that would be a much more intrusive change, if
>> at all possible.
>>
>> Until that time, this temporary fix will allow TTM to work on 32-bit
>> and 64-bit ARM as well, provided that no IOMMU translations are enabled
>> for the GPU.
>>
>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>> ---
>> Arnd, I realize that this isn't a proper fix according to what we
>> discussed on
>> IRC yesterday, but I can't see a way to remove access to the pages
>> array that
>> would be as simple as this. I've marked this as RFC in the hope that
>> it will
>> trigger some discussion that will lead to a proper solution.
>>
>> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> index c96db433f8af..d7993985752c 100644
>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> @@ -343,7 +343,11 @@ static struct dma_page
>> *__ttm_dma_alloc_page(struct dma_pool *pool)
>> &d_page->dma,
>> pool->gfp_flags);
>> if (d_page->vaddr)
>> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>> + d_page->p = phys_to_page(d_page->dma);
>> +#else
>> d_page->p = virt_to_page(d_page->vaddr);
>> +#endif
>
> Since I am messing with the IOMMU I just happened to have hit the issue
> you are mentioning. Wouldn't the following work:
>
> - if (d_page->vaddr)
> -#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> - d_page->p = phys_to_page(d_page->dma);
> -#else
> - d_page->p = virt_to_page(d_page->vaddr);
> -#endif
> - else {
> + if (d_page->vaddr) {
> + if (is_vmalloc_addr(d_page->vaddr)) {
> + d_page->p = vmalloc_to_page(d_page->vaddr);
> + } else {
> + d_page->p = virt_to_page(d_page->vaddr);
> + }
> + } else {
>
> A remapped page will end up in the vmalloc range of the address space,
> and in this case we can use vmalloc_to_page() to get the right page.
> Pages outside of this range are part of the linear mapping and can be
> resolved using virt_to_page().
>
> Jetson seems to be mostly happy with this, although I sometimes get the
> following trace:
>
> [ 13.174763] kernel BUG at ../mm/slab.c:2593!
> [ 13.174767] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> [ 13.174790] Modules linked in: nouveau_platform(O+) nouveau(O)
> cfbfillrect cfbimgblt cfbcopyarea ttm
> ...
> [ 13.175234] [<c00de238>] (cache_alloc_refill) from [<c00de528>]
> (__kmalloc+0x100/0x13c)
> [ 13.175247] [<c00de528>] (__kmalloc) from [<c001d564>]
> (arm_iommu_alloc_attrs+0x94/0x3a8)
> [ 13.175269] [<c001d564>] (arm_iommu_alloc_attrs) from [<bf008f4c>]
> (ttm_dma_populate+0x498/0x76c [ttm])
> [ 13.175294] [<bf008f4c>] (ttm_dma_populate [ttm]) from [<bf000bb8>]
> (ttm_tt_bind+0x38/0x68 [ttm])
> [ 13.175315] [<bf000bb8>] (ttm_tt_bind [ttm]) from [<bf00298c>]
> (ttm_bo_handle_move_mem+0x408/0x47c [ttm])
> [ 13.175337] [<bf00298c>] (ttm_bo_handle_move_mem [ttm]) from
> [<bf003758>] (ttm_bo_validate+0x220/0x22c [ttm])
> [ 13.175359] [<bf003758>] (ttm_bo_validate [ttm]) from [<bf003984>]
> (ttm_bo_init+0x220/0x338 [ttm])
> [ 13.175480] [<bf003984>] (ttm_bo_init [ttm]) from [<bf0c70a0>]
> (nouveau_bo_new+0x1c0/0x294 [nouveau])
> [ 13.175688] [<bf0c70a0>] (nouveau_bo_new [nouveau]) from [<bf0ce88c>]
> (nv84_fence_create+0x1cc/0x240 [nouveau])
> [ 13.175891] [<bf0ce88c>] (nv84_fence_create [nouveau]) from
> [<bf0cec90>] (nvc0_fence_create+0xc/0x24 [nouveau])
> [ 13.176094] [<bf0cec90>] (nvc0_fence_create [nouveau]) from
> [<bf0c1480>] (nouveau_accel_init+0xec/0x450 [nouveau])
>
> I suspect this is related to this change, but it might also be the
> side-effect of another bug in my code.
Ok, I finally understand where this trace comes from.
slab.c:2593:
BUG_ON(flags & GFP_SLAB_BUG_MASK);
gfp.h:
#define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
and ttm_dma_populate() calls ttm_dma_pool_init() with gfp_flags ==
GFP_HIGHUSER, which includes __GFP_HIGHMEM. Somehow these flags are
propagated through arm_iommu_alloc_attrs() up to the slab allocator,
which meets this BUG_ON() when it needs to grow its cache.
Note that ttm_dma_pool_init() can also be called with GFP_DMA32, which
will trigger the same error.
So although I am still not sure how this should be fixed, it seems like
this has nothing to do with the change I am proposing.
Temporary workaround:
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -882,9 +882,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma,
struct device *dev)
type = ttm_to_type(ttm->page_flags, ttm->caching_state);
if (ttm->page_flags & TTM_PAGE_FLAG_DMA32)
- gfp_flags = GFP_USER | GFP_DMA32;
+ gfp_flags = GFP_USER;
else
- gfp_flags = GFP_HIGHUSER;
+ gfp_flags = GFP_USER;
if (ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC)
gfp_flags |= __GFP_ZERO;
... which obviously doesn't look right.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
2014-12-08 8:11 ` Alexandre Courbot
@ 2014-12-08 8:19 ` Alexandre Courbot
0 siblings, 0 replies; 9+ messages in thread
From: Alexandre Courbot @ 2014-12-08 8:19 UTC (permalink / raw)
To: Thierry Reding, dri-devel
Cc: Alexandre Courbot, Thomas Hellstrom, Allen Martin, Arnd Bergmann,
Konrad Rzeszutek Wilk
On 12/08/2014 05:11 PM, Alexandre Courbot wrote:
> On 12/08/2014 04:14 PM, Alexandre Courbot wrote:
>> On 11/12/2014 09:39 PM, Thierry Reding wrote:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> dma_alloc_coherent() returns a kernel virtual address that is part of
>>> the linear range. Passing such an address to virt_to_page() is illegal
>>> on non-coherent architectures. This causes the kernel to oops on 64-bit
>>> ARM because the struct page * obtained from virt_to_page() points to
>>> unmapped memory.
>>>
>>> This commit fixes this by using phys_to_page() since we get a physical
>>> address from dma_alloc_coherent(). Note that this is not a proper fix
>>> because if an IOMMU is set up to translate addresses for the GPU this
>>> address will be an I/O virtual address rather than a physical one. The
>>> proper fix probably involves not getting a pointer to the struct page
>>> in the first place, but that would be a much more intrusive change, if
>>> at all possible.
>>>
>>> Until that time, this temporary fix will allow TTM to work on 32-bit
>>> and 64-bit ARM as well, provided that no IOMMU translations are enabled
>>> for the GPU.
>>>
>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>> ---
>>> Arnd, I realize that this isn't a proper fix according to what we
>>> discussed on
>>> IRC yesterday, but I can't see a way to remove access to the pages
>>> array that
>>> would be as simple as this. I've marked this as RFC in the hope that
>>> it will
>>> trigger some discussion that will lead to a proper solution.
>>>
>>> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>>> b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>>> index c96db433f8af..d7993985752c 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>>> @@ -343,7 +343,11 @@ static struct dma_page
>>> *__ttm_dma_alloc_page(struct dma_pool *pool)
>>> &d_page->dma,
>>> pool->gfp_flags);
>>> if (d_page->vaddr)
>>> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>>> + d_page->p = phys_to_page(d_page->dma);
>>> +#else
>>> d_page->p = virt_to_page(d_page->vaddr);
>>> +#endif
>>
>> Since I am messing with the IOMMU I just happened to have hit the issue
>> you are mentioning. Wouldn't the following work:
>>
>> - if (d_page->vaddr)
>> -#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>> - d_page->p = phys_to_page(d_page->dma);
>> -#else
>> - d_page->p = virt_to_page(d_page->vaddr);
>> -#endif
>> - else {
>> + if (d_page->vaddr) {
>> + if (is_vmalloc_addr(d_page->vaddr)) {
>> + d_page->p = vmalloc_to_page(d_page->vaddr);
>> + } else {
>> + d_page->p = virt_to_page(d_page->vaddr);
>> + }
>> + } else {
>>
>> A remapped page will end up in the vmalloc range of the address space,
>> and in this case we can use vmalloc_to_page() to get the right page.
>> Pages outside of this range are part of the linear mapping and can be
>> resolved using virt_to_page().
>>
>> Jetson seems to be mostly happy with this, although I sometimes get the
>> following trace:
>>
>> [ 13.174763] kernel BUG at ../mm/slab.c:2593!
>> [ 13.174767] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
>> [ 13.174790] Modules linked in: nouveau_platform(O+) nouveau(O)
>> cfbfillrect cfbimgblt cfbcopyarea ttm
>> ...
>> [ 13.175234] [<c00de238>] (cache_alloc_refill) from [<c00de528>]
>> (__kmalloc+0x100/0x13c)
>> [ 13.175247] [<c00de528>] (__kmalloc) from [<c001d564>]
>> (arm_iommu_alloc_attrs+0x94/0x3a8)
>> [ 13.175269] [<c001d564>] (arm_iommu_alloc_attrs) from [<bf008f4c>]
>> (ttm_dma_populate+0x498/0x76c [ttm])
>> [ 13.175294] [<bf008f4c>] (ttm_dma_populate [ttm]) from [<bf000bb8>]
>> (ttm_tt_bind+0x38/0x68 [ttm])
>> [ 13.175315] [<bf000bb8>] (ttm_tt_bind [ttm]) from [<bf00298c>]
>> (ttm_bo_handle_move_mem+0x408/0x47c [ttm])
>> [ 13.175337] [<bf00298c>] (ttm_bo_handle_move_mem [ttm]) from
>> [<bf003758>] (ttm_bo_validate+0x220/0x22c [ttm])
>> [ 13.175359] [<bf003758>] (ttm_bo_validate [ttm]) from [<bf003984>]
>> (ttm_bo_init+0x220/0x338 [ttm])
>> [ 13.175480] [<bf003984>] (ttm_bo_init [ttm]) from [<bf0c70a0>]
>> (nouveau_bo_new+0x1c0/0x294 [nouveau])
>> [ 13.175688] [<bf0c70a0>] (nouveau_bo_new [nouveau]) from [<bf0ce88c>]
>> (nv84_fence_create+0x1cc/0x240 [nouveau])
>> [ 13.175891] [<bf0ce88c>] (nv84_fence_create [nouveau]) from
>> [<bf0cec90>] (nvc0_fence_create+0xc/0x24 [nouveau])
>> [ 13.176094] [<bf0cec90>] (nvc0_fence_create [nouveau]) from
>> [<bf0c1480>] (nouveau_accel_init+0xec/0x450 [nouveau])
>>
>> I suspect this is related to this change, but it might also be the
>> side-effect of another bug in my code.
>
> Ok, I finally understand where this trace comes from.
>
> slab.c:2593:
> BUG_ON(flags & GFP_SLAB_BUG_MASK);
>
> gfp.h:
> #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
>
> and ttm_dma_populate() calls ttm_dma_pool_init() with gfp_flags ==
> GFP_HIGHUSER, which includes __GFP_HIGHMEM. Somehow these flags are
> propagated through arm_iommu_alloc_attrs() up to the slab allocator,
> which meets this BUG_ON() when it needs to grow its cache.
>
> Note that ttm_dma_pool_init() can also be called with GFP_DMA32, which
> will trigger the same error.
>
> So although I am still not sure how this should be fixed, it seems like
> this has nothing to do with the change I am proposing.
>
> Temporary workaround:
>
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> @@ -882,9 +882,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma,
> struct device *dev)
>
> type = ttm_to_type(ttm->page_flags, ttm->caching_state);
> if (ttm->page_flags & TTM_PAGE_FLAG_DMA32)
> - gfp_flags = GFP_USER | GFP_DMA32;
> + gfp_flags = GFP_USER;
> else
> - gfp_flags = GFP_HIGHUSER;
> + gfp_flags = GFP_USER;
> if (ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC)
> gfp_flags |= __GFP_ZERO;
>
> ... which obviously doesn't look right.
This, however, looks better:
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e8907117861e..bc495354c802 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1106,7 +1106,7 @@ static struct page **__iommu_alloc_buffer(struct
device *dev, size_t size,
int i = 0;
if (array_size <= PAGE_SIZE)
- pages = kzalloc(array_size, gfp);
+ pages = kzalloc(array_size, GFP_KERNEL);
else
pages = vzalloc(array_size);
if (!pages)
I don't see any reason why the array of pages should be allocated with
the same properties as the buffer itself. Looks like this is a DMA
allocator bug. Sorry about the noise.
Alex.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
2014-12-08 7:14 ` Alexandre Courbot
2014-12-08 7:36 ` Alexandre Courbot
2014-12-08 8:11 ` Alexandre Courbot
@ 2014-12-15 8:04 ` Alexandre Courbot
2 siblings, 0 replies; 9+ messages in thread
From: Alexandre Courbot @ 2014-12-15 8:04 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Thomas Hellstrom, Arnd Bergmann, Konrad Rzeszutek Wilk,
dri-devel@lists.freedesktop.org, Allen Martin
On Mon, Dec 8, 2014 at 4:14 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> On 11/12/2014 09:39 PM, Thierry Reding wrote:
>>
>> From: Thierry Reding <treding@nvidia.com>
>>
>> dma_alloc_coherent() returns a kernel virtual address that is part of
>> the linear range. Passing such an address to virt_to_page() is illegal
>> on non-coherent architectures. This causes the kernel to oops on 64-bit
>> ARM because the struct page * obtained from virt_to_page() points to
>> unmapped memory.
>>
>> This commit fixes this by using phys_to_page() since we get a physical
>> address from dma_alloc_coherent(). Note that this is not a proper fix
>> because if an IOMMU is set up to translate addresses for the GPU this
>> address will be an I/O virtual address rather than a physical one. The
>> proper fix probably involves not getting a pointer to the struct page
>> in the first place, but that would be a much more intrusive change, if
>> at all possible.
>>
>> Until that time, this temporary fix will allow TTM to work on 32-bit
>> and 64-bit ARM as well, provided that no IOMMU translations are enabled
>> for the GPU.
>>
>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>> ---
>> Arnd, I realize that this isn't a proper fix according to what we
>> discussed on
>> IRC yesterday, but I can't see a way to remove access to the pages array
>> that
>> would be as simple as this. I've marked this as RFC in the hope that it
>> will
>> trigger some discussion that will lead to a proper solution.
>>
>> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> index c96db433f8af..d7993985752c 100644
>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> @@ -343,7 +343,11 @@ static struct dma_page *__ttm_dma_alloc_page(struct
>> dma_pool *pool)
>> &d_page->dma,
>> pool->gfp_flags);
>> if (d_page->vaddr)
>> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>> + d_page->p = phys_to_page(d_page->dma);
>> +#else
>> d_page->p = virt_to_page(d_page->vaddr);
>> +#endif
>
>
> Since I am messing with the IOMMU I just happened to have hit the issue you
> are mentioning. Wouldn't the following work:
>
> - if (d_page->vaddr)
> -#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> - d_page->p = phys_to_page(d_page->dma);
> -#else
> - d_page->p = virt_to_page(d_page->vaddr);
> -#endif
> - else {
> + if (d_page->vaddr) {
> + if (is_vmalloc_addr(d_page->vaddr)) {
> + d_page->p = vmalloc_to_page(d_page->vaddr);
> + } else {
> + d_page->p = virt_to_page(d_page->vaddr);
> + }
> + } else {
>
> A remapped page will end up in the vmalloc range of the address space, and
> in this case we can use vmalloc_to_page() to get the right page. Pages
> outside of this range are part of the linear mapping and can be resolved
> using virt_to_page().
Thierry, have you had a chance to try this? If not, do you want to to
try to push this patch? It seems to solve the issue AFAICT, but needs
more testing.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 9+ messages in thread