* [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
@ 2014-11-12 12:39 Thierry Reding
2014-11-12 14:18 ` Konrad Rzeszutek Wilk
2014-12-08 7:14 ` Alexandre Courbot
0 siblings, 2 replies; 9+ messages in thread
From: Thierry Reding @ 2014-11-12 12:39 UTC (permalink / raw)
To: dri-devel
Cc: Alexandre Courbot, Thomas Hellstrom, Allen Martin, Arnd Bergmann,
Konrad Rzeszutek Wilk
From: Thierry Reding <treding@nvidia.com>
dma_alloc_coherent() returns a kernel virtual address that is part of
the linear range. Passing such an address to virt_to_page() is illegal
on non-coherent architectures. This causes the kernel to oops on 64-bit
ARM because the struct page * obtained from virt_to_page() points to
unmapped memory.
This commit fixes this by using phys_to_page() since we get a physical
address from dma_alloc_coherent(). Note that this is not a proper fix
because if an IOMMU is set up to translate addresses for the GPU this
address will be an I/O virtual address rather than a physical one. The
proper fix probably involves not getting a pointer to the struct page
in the first place, but that would be a much more intrusive change, if
at all possible.
Until that time, this temporary fix will allow TTM to work on 32-bit
and 64-bit ARM as well, provided that no IOMMU translations are enabled
for the GPU.
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
Arnd, I realize that this isn't a proper fix according to what we discussed on
IRC yesterday, but I can't see a way to remove access to the pages array that
would be as simple as this. I've marked this as RFC in the hope that it will
trigger some discussion that will lead to a proper solution.
drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
index c96db433f8af..d7993985752c 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -343,7 +343,11 @@ static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool)
&d_page->dma,
pool->gfp_flags);
if (d_page->vaddr)
+#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+ d_page->p = phys_to_page(d_page->dma);
+#else
d_page->p = virt_to_page(d_page->vaddr);
+#endif
else {
kfree(d_page);
d_page = NULL;
--
2.1.3
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
2014-11-12 12:39 [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM Thierry Reding
@ 2014-11-12 14:18 ` Konrad Rzeszutek Wilk
2014-11-12 17:03 ` Arnd Bergmann
2014-12-08 7:14 ` Alexandre Courbot
1 sibling, 1 reply; 9+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-12 14:18 UTC (permalink / raw)
To: Thierry Reding
Cc: Allen Martin, Alexandre Courbot, Thomas Hellstrom, Arnd Bergmann,
dri-devel
On Wed, Nov 12, 2014 at 01:39:05PM +0100, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
>
> dma_alloc_coherent() returns a kernel virtual address that is part of
> the linear range. Passing such an address to virt_to_page() is illegal
> on non-coherent architectures. This causes the kernel to oops on 64-bit
> ARM because the struct page * obtained from virt_to_page() points to
> unmapped memory.
Oh! That is not good!
>
> This commit fixes this by using phys_to_page() since we get a physical
> address from dma_alloc_coherent(). Note that this is not a proper fix
> because if an IOMMU is set up to translate addresses for the GPU this
> address will be an I/O virtual address rather than a physical one. The
> proper fix probably involves not getting a pointer to the struct page
> in the first place, but that would be a much more intrusive change, if
> at all possible.
What type of caching types are there on ARM? We use the 'struct page'
on the set_pages_to_[wc|uc|wb] but all of those are X86 specfic.
But I think you could by passing the 'struct dma_page' instead
of 'struct page' (and the array uses) around. That should solve
the touching of 'struct page' and we can treat it as an opaque
type.
>
> Until that time, this temporary fix will allow TTM to work on 32-bit
> and 64-bit ARM as well, provided that no IOMMU translations are enabled
> for the GPU.
Is there a way to query the 'struct device' to see if the IOMMU translation
is enabled/disabled for said device?
Now your patch looks to get the 'struct page' by doing some form of
translation. Could you explain to me which type of memory have a 'struct page'
and which ones do not ?
It is OK if you explain this in nauseating details :-)
>
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
> Arnd, I realize that this isn't a proper fix according to what we discussed on
> IRC yesterday, but I can't see a way to remove access to the pages array that
> would be as simple as this. I've marked this as RFC in the hope that it will
> trigger some discussion that will lead to a proper solution.
>
> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> index c96db433f8af..d7993985752c 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> @@ -343,7 +343,11 @@ static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool)
> &d_page->dma,
> pool->gfp_flags);
> if (d_page->vaddr)
> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> + d_page->p = phys_to_page(d_page->dma);
> +#else
> d_page->p = virt_to_page(d_page->vaddr);
> +#endif
> else {
> kfree(d_page);
> d_page = NULL;
> --
> 2.1.3
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
2014-11-12 14:18 ` Konrad Rzeszutek Wilk
@ 2014-11-12 17:03 ` Arnd Bergmann
2014-12-01 16:43 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 9+ messages in thread
From: Arnd Bergmann @ 2014-11-12 17:03 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Alexandre Courbot, Thomas Hellstrom, Allen Martin, dri-devel
On Wednesday 12 November 2014 09:18:59 Konrad Rzeszutek Wilk wrote:
> On Wed, Nov 12, 2014 at 01:39:05PM +0100, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> >
> > dma_alloc_coherent() returns a kernel virtual address that is part of
> > the linear range. Passing such an address to virt_to_page() is illegal
> > on non-coherent architectures. This causes the kernel to oops on 64-bit
> > ARM because the struct page * obtained from virt_to_page() points to
> > unmapped memory.
>
> Oh! That is not good!
>
I think what Thierry meant is that the returned pointer is /not/ in the
linear range.
> > Until that time, this temporary fix will allow TTM to work on 32-bit
> > and 64-bit ARM as well, provided that no IOMMU translations are enabled
> > for the GPU.
>
> Is there a way to query the 'struct device' to see if the IOMMU translation
> is enabled/disabled for said device?
>
> Now your patch looks to get the 'struct page' by doing some form of
> translation. Could you explain to me which type of memory have a 'struct page'
> and which ones do not ?
>
> It is OK if you explain this in nauseating details
Basically there are two types of memory that have a struct page:
- directly mapped cacheable memory, i.e. anything that can be accessed
through a kernel pointer without having to go though ioremap/vmalloc/...
- highmem pages on 32-bit system.
On noncoherent ARM systems, dma_alloc_coherent will return memory that
is was unmapped from the linear range to avoid having both cacheable and
noncachable mappings for the same page.
Arnd
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
2014-11-12 17:03 ` Arnd Bergmann
@ 2014-12-01 16:43 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 9+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-12-01 16:43 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Alexandre Courbot, Thomas Hellstrom, Allen Martin, dri-devel
On Wed, Nov 12, 2014 at 06:03:49PM +0100, Arnd Bergmann wrote:
> On Wednesday 12 November 2014 09:18:59 Konrad Rzeszutek Wilk wrote:
> > On Wed, Nov 12, 2014 at 01:39:05PM +0100, Thierry Reding wrote:
> > > From: Thierry Reding <treding@nvidia.com>
> > >
> > > dma_alloc_coherent() returns a kernel virtual address that is part of
> > > the linear range. Passing such an address to virt_to_page() is illegal
> > > on non-coherent architectures. This causes the kernel to oops on 64-bit
> > > ARM because the struct page * obtained from virt_to_page() points to
> > > unmapped memory.
> >
> > Oh! That is not good!
> >
> I think what Thierry meant is that the returned pointer is /not/ in the
> linear range.
>
> > > Until that time, this temporary fix will allow TTM to work on 32-bit
> > > and 64-bit ARM as well, provided that no IOMMU translations are enabled
> > > for the GPU.
> >
> > Is there a way to query the 'struct device' to see if the IOMMU translation
> > is enabled/disabled for said device?
?
> >
> > Now your patch looks to get the 'struct page' by doing some form of
> > translation. Could you explain to me which type of memory have a 'struct page'
> > and which ones do not ?
> >
> > It is OK if you explain this in nauseating details
>
> Basically there are two types of memory that have a struct page:
>
> - directly mapped cacheable memory, i.e. anything that can be accessed
> through a kernel pointer without having to go though ioremap/vmalloc/...
>
> - highmem pages on 32-bit system.
>
> On noncoherent ARM systems, dma_alloc_coherent will return memory that
> is was unmapped from the linear range to avoid having both cacheable and
> noncachable mappings for the same page.
>
> Arnd
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
2014-11-12 12:39 [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM Thierry Reding
2014-11-12 14:18 ` Konrad Rzeszutek Wilk
@ 2014-12-08 7:14 ` Alexandre Courbot
2014-12-08 7:36 ` Alexandre Courbot
` (2 more replies)
1 sibling, 3 replies; 9+ messages in thread
From: Alexandre Courbot @ 2014-12-08 7:14 UTC (permalink / raw)
To: Thierry Reding, dri-devel
Cc: Alexandre Courbot, Thomas Hellstrom, Allen Martin, Arnd Bergmann,
Konrad Rzeszutek Wilk
On 11/12/2014 09:39 PM, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
>
> dma_alloc_coherent() returns a kernel virtual address that is part of
> the linear range. Passing such an address to virt_to_page() is illegal
> on non-coherent architectures. This causes the kernel to oops on 64-bit
> ARM because the struct page * obtained from virt_to_page() points to
> unmapped memory.
>
> This commit fixes this by using phys_to_page() since we get a physical
> address from dma_alloc_coherent(). Note that this is not a proper fix
> because if an IOMMU is set up to translate addresses for the GPU this
> address will be an I/O virtual address rather than a physical one. The
> proper fix probably involves not getting a pointer to the struct page
> in the first place, but that would be a much more intrusive change, if
> at all possible.
>
> Until that time, this temporary fix will allow TTM to work on 32-bit
> and 64-bit ARM as well, provided that no IOMMU translations are enabled
> for the GPU.
>
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
> Arnd, I realize that this isn't a proper fix according to what we discussed on
> IRC yesterday, but I can't see a way to remove access to the pages array that
> would be as simple as this. I've marked this as RFC in the hope that it will
> trigger some discussion that will lead to a proper solution.
>
> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> index c96db433f8af..d7993985752c 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> @@ -343,7 +343,11 @@ static struct dma_page *__ttm_dma_alloc_page(struct dma_pool *pool)
> &d_page->dma,
> pool->gfp_flags);
> if (d_page->vaddr)
> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> + d_page->p = phys_to_page(d_page->dma);
> +#else
> d_page->p = virt_to_page(d_page->vaddr);
> +#endif
Since I am messing with the IOMMU I just happened to have hit the issue
you are mentioning. Wouldn't the following work:
- if (d_page->vaddr)
-#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
- d_page->p = phys_to_page(d_page->dma);
-#else
- d_page->p = virt_to_page(d_page->vaddr);
-#endif
- else {
+ if (d_page->vaddr) {
+ if (is_vmalloc_addr(d_page->vaddr)) {
+ d_page->p = vmalloc_to_page(d_page->vaddr);
+ } else {
+ d_page->p = virt_to_page(d_page->vaddr);
+ }
+ } else {
A remapped page will end up in the vmalloc range of the address space,
and in this case we can use vmalloc_to_page() to get the right page.
Pages outside of this range are part of the linear mapping and can be
resolved using virt_to_page().
Jetson seems to be mostly happy with this, although I sometimes get the
following trace:
[ 13.174763] kernel BUG at ../mm/slab.c:2593!
[ 13.174767] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[ 13.174790] Modules linked in: nouveau_platform(O+) nouveau(O)
cfbfillrect cfbimgblt cfbcopyarea ttm
...
[ 13.175234] [<c00de238>] (cache_alloc_refill) from [<c00de528>]
(__kmalloc+0x100/0x13c)
[ 13.175247] [<c00de528>] (__kmalloc) from [<c001d564>]
(arm_iommu_alloc_attrs+0x94/0x3a8)
[ 13.175269] [<c001d564>] (arm_iommu_alloc_attrs) from [<bf008f4c>]
(ttm_dma_populate+0x498/0x76c [ttm])
[ 13.175294] [<bf008f4c>] (ttm_dma_populate [ttm]) from [<bf000bb8>]
(ttm_tt_bind+0x38/0x68 [ttm])
[ 13.175315] [<bf000bb8>] (ttm_tt_bind [ttm]) from [<bf00298c>]
(ttm_bo_handle_move_mem+0x408/0x47c [ttm])
[ 13.175337] [<bf00298c>] (ttm_bo_handle_move_mem [ttm]) from
[<bf003758>] (ttm_bo_validate+0x220/0x22c [ttm])
[ 13.175359] [<bf003758>] (ttm_bo_validate [ttm]) from [<bf003984>]
(ttm_bo_init+0x220/0x338 [ttm])
[ 13.175480] [<bf003984>] (ttm_bo_init [ttm]) from [<bf0c70a0>]
(nouveau_bo_new+0x1c0/0x294 [nouveau])
[ 13.175688] [<bf0c70a0>] (nouveau_bo_new [nouveau]) from [<bf0ce88c>]
(nv84_fence_create+0x1cc/0x240 [nouveau])
[ 13.175891] [<bf0ce88c>] (nv84_fence_create [nouveau]) from
[<bf0cec90>] (nvc0_fence_create+0xc/0x24 [nouveau])
[ 13.176094] [<bf0cec90>] (nvc0_fence_create [nouveau]) from
[<bf0c1480>] (nouveau_accel_init+0xec/0x450 [nouveau])
I suspect this is related to this change, but it might also be the
side-effect of another bug in my code.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
2014-12-08 7:14 ` Alexandre Courbot
@ 2014-12-08 7:36 ` Alexandre Courbot
2014-12-08 8:11 ` Alexandre Courbot
2014-12-15 8:04 ` Alexandre Courbot
2 siblings, 0 replies; 9+ messages in thread
From: Alexandre Courbot @ 2014-12-08 7:36 UTC (permalink / raw)
To: Thierry Reding, dri-devel
Cc: Alexandre Courbot, Thomas Hellstrom, Allen Martin, Arnd Bergmann,
Konrad Rzeszutek Wilk
On 12/08/2014 04:14 PM, Alexandre Courbot wrote:
> On 11/12/2014 09:39 PM, Thierry Reding wrote:
>> From: Thierry Reding <treding@nvidia.com>
>>
>> dma_alloc_coherent() returns a kernel virtual address that is part of
>> the linear range. Passing such an address to virt_to_page() is illegal
>> on non-coherent architectures. This causes the kernel to oops on 64-bit
>> ARM because the struct page * obtained from virt_to_page() points to
>> unmapped memory.
>>
>> This commit fixes this by using phys_to_page() since we get a physical
>> address from dma_alloc_coherent(). Note that this is not a proper fix
>> because if an IOMMU is set up to translate addresses for the GPU this
>> address will be an I/O virtual address rather than a physical one. The
>> proper fix probably involves not getting a pointer to the struct page
>> in the first place, but that would be a much more intrusive change, if
>> at all possible.
>>
>> Until that time, this temporary fix will allow TTM to work on 32-bit
>> and 64-bit ARM as well, provided that no IOMMU translations are enabled
>> for the GPU.
>>
>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>> ---
>> Arnd, I realize that this isn't a proper fix according to what we
>> discussed on
>> IRC yesterday, but I can't see a way to remove access to the pages
>> array that
>> would be as simple as this. I've marked this as RFC in the hope that
>> it will
>> trigger some discussion that will lead to a proper solution.
>>
>> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> index c96db433f8af..d7993985752c 100644
>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> @@ -343,7 +343,11 @@ static struct dma_page
>> *__ttm_dma_alloc_page(struct dma_pool *pool)
>> &d_page->dma,
>> pool->gfp_flags);
>> if (d_page->vaddr)
>> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>> + d_page->p = phys_to_page(d_page->dma);
>> +#else
>> d_page->p = virt_to_page(d_page->vaddr);
>> +#endif
>
> Since I am messing with the IOMMU I just happened to have hit the issue
> you are mentioning. Wouldn't the following work:
>
> - if (d_page->vaddr)
> -#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> - d_page->p = phys_to_page(d_page->dma);
> -#else
> - d_page->p = virt_to_page(d_page->vaddr);
> -#endif
> - else {
> + if (d_page->vaddr) {
> + if (is_vmalloc_addr(d_page->vaddr)) {
> + d_page->p = vmalloc_to_page(d_page->vaddr);
> + } else {
> + d_page->p = virt_to_page(d_page->vaddr);
> + }
> + } else {
>
> A remapped page will end up in the vmalloc range of the address space,
> and in this case we can use vmalloc_to_page() to get the right page.
> Pages outside of this range are part of the linear mapping and can be
> resolved using virt_to_page().
>
> Jetson seems to be mostly happy with this, although I sometimes get the
> following trace:
>
> [ 13.174763] kernel BUG at ../mm/slab.c:2593!
> [ 13.174767] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> [ 13.174790] Modules linked in: nouveau_platform(O+) nouveau(O)
> cfbfillrect cfbimgblt cfbcopyarea ttm
> ...
> [ 13.175234] [<c00de238>] (cache_alloc_refill) from [<c00de528>]
> (__kmalloc+0x100/0x13c)
> [ 13.175247] [<c00de528>] (__kmalloc) from [<c001d564>]
> (arm_iommu_alloc_attrs+0x94/0x3a8)
> [ 13.175269] [<c001d564>] (arm_iommu_alloc_attrs) from [<bf008f4c>]
> (ttm_dma_populate+0x498/0x76c [ttm])
> [ 13.175294] [<bf008f4c>] (ttm_dma_populate [ttm]) from [<bf000bb8>]
> (ttm_tt_bind+0x38/0x68 [ttm])
> [ 13.175315] [<bf000bb8>] (ttm_tt_bind [ttm]) from [<bf00298c>]
> (ttm_bo_handle_move_mem+0x408/0x47c [ttm])
> [ 13.175337] [<bf00298c>] (ttm_bo_handle_move_mem [ttm]) from
> [<bf003758>] (ttm_bo_validate+0x220/0x22c [ttm])
> [ 13.175359] [<bf003758>] (ttm_bo_validate [ttm]) from [<bf003984>]
> (ttm_bo_init+0x220/0x338 [ttm])
> [ 13.175480] [<bf003984>] (ttm_bo_init [ttm]) from [<bf0c70a0>]
> (nouveau_bo_new+0x1c0/0x294 [nouveau])
> [ 13.175688] [<bf0c70a0>] (nouveau_bo_new [nouveau]) from [<bf0ce88c>]
> (nv84_fence_create+0x1cc/0x240 [nouveau])
> [ 13.175891] [<bf0ce88c>] (nv84_fence_create [nouveau]) from
> [<bf0cec90>] (nvc0_fence_create+0xc/0x24 [nouveau])
> [ 13.176094] [<bf0cec90>] (nvc0_fence_create [nouveau]) from
> [<bf0c1480>] (nouveau_accel_init+0xec/0x450 [nouveau])
>
> I suspect this is related to this change, but it might also be the
> side-effect of another bug in my code.
FWIW and after some more testing, I noticed that without IOMMU
vmalloc_to_page() and phys_to_page() both return the same valid page.
With the IOMMU enabled vmalloc_to_page() still returns what seems to be
valid pages (phys_to_page() of course doesn't make sense anymore).
So AFAICT the change I proposed seems valid. I am not sure what causes
the BUG() in the slab allocator.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
2014-12-08 7:14 ` Alexandre Courbot
2014-12-08 7:36 ` Alexandre Courbot
@ 2014-12-08 8:11 ` Alexandre Courbot
2014-12-08 8:19 ` Alexandre Courbot
2014-12-15 8:04 ` Alexandre Courbot
2 siblings, 1 reply; 9+ messages in thread
From: Alexandre Courbot @ 2014-12-08 8:11 UTC (permalink / raw)
To: Thierry Reding, dri-devel
Cc: Alexandre Courbot, Thomas Hellstrom, Allen Martin, Arnd Bergmann,
Konrad Rzeszutek Wilk
On 12/08/2014 04:14 PM, Alexandre Courbot wrote:
> On 11/12/2014 09:39 PM, Thierry Reding wrote:
>> From: Thierry Reding <treding@nvidia.com>
>>
>> dma_alloc_coherent() returns a kernel virtual address that is part of
>> the linear range. Passing such an address to virt_to_page() is illegal
>> on non-coherent architectures. This causes the kernel to oops on 64-bit
>> ARM because the struct page * obtained from virt_to_page() points to
>> unmapped memory.
>>
>> This commit fixes this by using phys_to_page() since we get a physical
>> address from dma_alloc_coherent(). Note that this is not a proper fix
>> because if an IOMMU is set up to translate addresses for the GPU this
>> address will be an I/O virtual address rather than a physical one. The
>> proper fix probably involves not getting a pointer to the struct page
>> in the first place, but that would be a much more intrusive change, if
>> at all possible.
>>
>> Until that time, this temporary fix will allow TTM to work on 32-bit
>> and 64-bit ARM as well, provided that no IOMMU translations are enabled
>> for the GPU.
>>
>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>> ---
>> Arnd, I realize that this isn't a proper fix according to what we
>> discussed on
>> IRC yesterday, but I can't see a way to remove access to the pages
>> array that
>> would be as simple as this. I've marked this as RFC in the hope that
>> it will
>> trigger some discussion that will lead to a proper solution.
>>
>> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> index c96db433f8af..d7993985752c 100644
>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> @@ -343,7 +343,11 @@ static struct dma_page
>> *__ttm_dma_alloc_page(struct dma_pool *pool)
>> &d_page->dma,
>> pool->gfp_flags);
>> if (d_page->vaddr)
>> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>> + d_page->p = phys_to_page(d_page->dma);
>> +#else
>> d_page->p = virt_to_page(d_page->vaddr);
>> +#endif
>
> Since I am messing with the IOMMU I just happened to have hit the issue
> you are mentioning. Wouldn't the following work:
>
> - if (d_page->vaddr)
> -#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> - d_page->p = phys_to_page(d_page->dma);
> -#else
> - d_page->p = virt_to_page(d_page->vaddr);
> -#endif
> - else {
> + if (d_page->vaddr) {
> + if (is_vmalloc_addr(d_page->vaddr)) {
> + d_page->p = vmalloc_to_page(d_page->vaddr);
> + } else {
> + d_page->p = virt_to_page(d_page->vaddr);
> + }
> + } else {
>
> A remapped page will end up in the vmalloc range of the address space,
> and in this case we can use vmalloc_to_page() to get the right page.
> Pages outside of this range are part of the linear mapping and can be
> resolved using virt_to_page().
>
> Jetson seems to be mostly happy with this, although I sometimes get the
> following trace:
>
> [ 13.174763] kernel BUG at ../mm/slab.c:2593!
> [ 13.174767] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
> [ 13.174790] Modules linked in: nouveau_platform(O+) nouveau(O)
> cfbfillrect cfbimgblt cfbcopyarea ttm
> ...
> [ 13.175234] [<c00de238>] (cache_alloc_refill) from [<c00de528>]
> (__kmalloc+0x100/0x13c)
> [ 13.175247] [<c00de528>] (__kmalloc) from [<c001d564>]
> (arm_iommu_alloc_attrs+0x94/0x3a8)
> [ 13.175269] [<c001d564>] (arm_iommu_alloc_attrs) from [<bf008f4c>]
> (ttm_dma_populate+0x498/0x76c [ttm])
> [ 13.175294] [<bf008f4c>] (ttm_dma_populate [ttm]) from [<bf000bb8>]
> (ttm_tt_bind+0x38/0x68 [ttm])
> [ 13.175315] [<bf000bb8>] (ttm_tt_bind [ttm]) from [<bf00298c>]
> (ttm_bo_handle_move_mem+0x408/0x47c [ttm])
> [ 13.175337] [<bf00298c>] (ttm_bo_handle_move_mem [ttm]) from
> [<bf003758>] (ttm_bo_validate+0x220/0x22c [ttm])
> [ 13.175359] [<bf003758>] (ttm_bo_validate [ttm]) from [<bf003984>]
> (ttm_bo_init+0x220/0x338 [ttm])
> [ 13.175480] [<bf003984>] (ttm_bo_init [ttm]) from [<bf0c70a0>]
> (nouveau_bo_new+0x1c0/0x294 [nouveau])
> [ 13.175688] [<bf0c70a0>] (nouveau_bo_new [nouveau]) from [<bf0ce88c>]
> (nv84_fence_create+0x1cc/0x240 [nouveau])
> [ 13.175891] [<bf0ce88c>] (nv84_fence_create [nouveau]) from
> [<bf0cec90>] (nvc0_fence_create+0xc/0x24 [nouveau])
> [ 13.176094] [<bf0cec90>] (nvc0_fence_create [nouveau]) from
> [<bf0c1480>] (nouveau_accel_init+0xec/0x450 [nouveau])
>
> I suspect this is related to this change, but it might also be the
> side-effect of another bug in my code.
Ok, I finally understand where this trace comes from.
slab.c:2593:
BUG_ON(flags & GFP_SLAB_BUG_MASK);
gfp.h:
#define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
and ttm_dma_populate() calls ttm_dma_pool_init() with gfp_flags ==
GFP_HIGHUSER, which includes __GFP_HIGHMEM. Somehow these flags are
propagated through arm_iommu_alloc_attrs() up to the slab allocator,
which meets this BUG_ON() when it needs to grow its cache.
Note that ttm_dma_pool_init() can also be called with GFP_DMA32, which
will trigger the same error.
So although I am still not sure how this should be fixed, it seems like
this has nothing to do with the change I am proposing.
Temporary workaround:
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -882,9 +882,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma,
struct device *dev)
type = ttm_to_type(ttm->page_flags, ttm->caching_state);
if (ttm->page_flags & TTM_PAGE_FLAG_DMA32)
- gfp_flags = GFP_USER | GFP_DMA32;
+ gfp_flags = GFP_USER;
else
- gfp_flags = GFP_HIGHUSER;
+ gfp_flags = GFP_USER;
if (ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC)
gfp_flags |= __GFP_ZERO;
... which obviously doesn't look right.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
2014-12-08 8:11 ` Alexandre Courbot
@ 2014-12-08 8:19 ` Alexandre Courbot
0 siblings, 0 replies; 9+ messages in thread
From: Alexandre Courbot @ 2014-12-08 8:19 UTC (permalink / raw)
To: Thierry Reding, dri-devel
Cc: Alexandre Courbot, Thomas Hellstrom, Allen Martin, Arnd Bergmann,
Konrad Rzeszutek Wilk
On 12/08/2014 05:11 PM, Alexandre Courbot wrote:
> On 12/08/2014 04:14 PM, Alexandre Courbot wrote:
>> On 11/12/2014 09:39 PM, Thierry Reding wrote:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> dma_alloc_coherent() returns a kernel virtual address that is part of
>>> the linear range. Passing such an address to virt_to_page() is illegal
>>> on non-coherent architectures. This causes the kernel to oops on 64-bit
>>> ARM because the struct page * obtained from virt_to_page() points to
>>> unmapped memory.
>>>
>>> This commit fixes this by using phys_to_page() since we get a physical
>>> address from dma_alloc_coherent(). Note that this is not a proper fix
>>> because if an IOMMU is set up to translate addresses for the GPU this
>>> address will be an I/O virtual address rather than a physical one. The
>>> proper fix probably involves not getting a pointer to the struct page
>>> in the first place, but that would be a much more intrusive change, if
>>> at all possible.
>>>
>>> Until that time, this temporary fix will allow TTM to work on 32-bit
>>> and 64-bit ARM as well, provided that no IOMMU translations are enabled
>>> for the GPU.
>>>
>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>> ---
>>> Arnd, I realize that this isn't a proper fix according to what we
>>> discussed on
>>> IRC yesterday, but I can't see a way to remove access to the pages
>>> array that
>>> would be as simple as this. I've marked this as RFC in the hope that
>>> it will
>>> trigger some discussion that will lead to a proper solution.
>>>
>>> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>>> b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>>> index c96db433f8af..d7993985752c 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>>> @@ -343,7 +343,11 @@ static struct dma_page
>>> *__ttm_dma_alloc_page(struct dma_pool *pool)
>>> &d_page->dma,
>>> pool->gfp_flags);
>>> if (d_page->vaddr)
>>> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>>> + d_page->p = phys_to_page(d_page->dma);
>>> +#else
>>> d_page->p = virt_to_page(d_page->vaddr);
>>> +#endif
>>
>> Since I am messing with the IOMMU I just happened to have hit the issue
>> you are mentioning. Wouldn't the following work:
>>
>> - if (d_page->vaddr)
>> -#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>> - d_page->p = phys_to_page(d_page->dma);
>> -#else
>> - d_page->p = virt_to_page(d_page->vaddr);
>> -#endif
>> - else {
>> + if (d_page->vaddr) {
>> + if (is_vmalloc_addr(d_page->vaddr)) {
>> + d_page->p = vmalloc_to_page(d_page->vaddr);
>> + } else {
>> + d_page->p = virt_to_page(d_page->vaddr);
>> + }
>> + } else {
>>
>> A remapped page will end up in the vmalloc range of the address space,
>> and in this case we can use vmalloc_to_page() to get the right page.
>> Pages outside of this range are part of the linear mapping and can be
>> resolved using virt_to_page().
>>
>> Jetson seems to be mostly happy with this, although I sometimes get the
>> following trace:
>>
>> [ 13.174763] kernel BUG at ../mm/slab.c:2593!
>> [ 13.174767] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
>> [ 13.174790] Modules linked in: nouveau_platform(O+) nouveau(O)
>> cfbfillrect cfbimgblt cfbcopyarea ttm
>> ...
>> [ 13.175234] [<c00de238>] (cache_alloc_refill) from [<c00de528>]
>> (__kmalloc+0x100/0x13c)
>> [ 13.175247] [<c00de528>] (__kmalloc) from [<c001d564>]
>> (arm_iommu_alloc_attrs+0x94/0x3a8)
>> [ 13.175269] [<c001d564>] (arm_iommu_alloc_attrs) from [<bf008f4c>]
>> (ttm_dma_populate+0x498/0x76c [ttm])
>> [ 13.175294] [<bf008f4c>] (ttm_dma_populate [ttm]) from [<bf000bb8>]
>> (ttm_tt_bind+0x38/0x68 [ttm])
>> [ 13.175315] [<bf000bb8>] (ttm_tt_bind [ttm]) from [<bf00298c>]
>> (ttm_bo_handle_move_mem+0x408/0x47c [ttm])
>> [ 13.175337] [<bf00298c>] (ttm_bo_handle_move_mem [ttm]) from
>> [<bf003758>] (ttm_bo_validate+0x220/0x22c [ttm])
>> [ 13.175359] [<bf003758>] (ttm_bo_validate [ttm]) from [<bf003984>]
>> (ttm_bo_init+0x220/0x338 [ttm])
>> [ 13.175480] [<bf003984>] (ttm_bo_init [ttm]) from [<bf0c70a0>]
>> (nouveau_bo_new+0x1c0/0x294 [nouveau])
>> [ 13.175688] [<bf0c70a0>] (nouveau_bo_new [nouveau]) from [<bf0ce88c>]
>> (nv84_fence_create+0x1cc/0x240 [nouveau])
>> [ 13.175891] [<bf0ce88c>] (nv84_fence_create [nouveau]) from
>> [<bf0cec90>] (nvc0_fence_create+0xc/0x24 [nouveau])
>> [ 13.176094] [<bf0cec90>] (nvc0_fence_create [nouveau]) from
>> [<bf0c1480>] (nouveau_accel_init+0xec/0x450 [nouveau])
>>
>> I suspect this is related to this change, but it might also be the
>> side-effect of another bug in my code.
>
> Ok, I finally understand where this trace comes from.
>
> slab.c:2593:
> BUG_ON(flags & GFP_SLAB_BUG_MASK);
>
> gfp.h:
> #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
>
> and ttm_dma_populate() calls ttm_dma_pool_init() with gfp_flags ==
> GFP_HIGHUSER, which includes __GFP_HIGHMEM. Somehow these flags are
> propagated through arm_iommu_alloc_attrs() up to the slab allocator,
> which meets this BUG_ON() when it needs to grow its cache.
>
> Note that ttm_dma_pool_init() can also be called with GFP_DMA32, which
> will trigger the same error.
>
> So although I am still not sure how this should be fixed, it seems like
> this has nothing to do with the change I am proposing.
>
> Temporary workaround:
>
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
> @@ -882,9 +882,9 @@ int ttm_dma_populate(struct ttm_dma_tt *ttm_dma,
> struct device *dev)
>
> type = ttm_to_type(ttm->page_flags, ttm->caching_state);
> if (ttm->page_flags & TTM_PAGE_FLAG_DMA32)
> - gfp_flags = GFP_USER | GFP_DMA32;
> + gfp_flags = GFP_USER;
> else
> - gfp_flags = GFP_HIGHUSER;
> + gfp_flags = GFP_USER;
> if (ttm->page_flags & TTM_PAGE_FLAG_ZERO_ALLOC)
> gfp_flags |= __GFP_ZERO;
>
> ... which obviously doesn't look right.
This, however, looks better:
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e8907117861e..bc495354c802 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1106,7 +1106,7 @@ static struct page **__iommu_alloc_buffer(struct
device *dev, size_t size,
int i = 0;
if (array_size <= PAGE_SIZE)
- pages = kzalloc(array_size, gfp);
+ pages = kzalloc(array_size, GFP_KERNEL);
else
pages = vzalloc(array_size);
if (!pages)
I don't see any reason why the array of pages should be allocated with
the same properties as the buffer itself. Looks like this is a DMA
allocator bug. Sorry about the noise.
Alex.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM
2014-12-08 7:14 ` Alexandre Courbot
2014-12-08 7:36 ` Alexandre Courbot
2014-12-08 8:11 ` Alexandre Courbot
@ 2014-12-15 8:04 ` Alexandre Courbot
2 siblings, 0 replies; 9+ messages in thread
From: Alexandre Courbot @ 2014-12-15 8:04 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Thomas Hellstrom, Arnd Bergmann, Konrad Rzeszutek Wilk,
dri-devel@lists.freedesktop.org, Allen Martin
On Mon, Dec 8, 2014 at 4:14 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> On 11/12/2014 09:39 PM, Thierry Reding wrote:
>>
>> From: Thierry Reding <treding@nvidia.com>
>>
>> dma_alloc_coherent() returns a kernel virtual address that is part of
>> the linear range. Passing such an address to virt_to_page() is illegal
>> on non-coherent architectures. This causes the kernel to oops on 64-bit
>> ARM because the struct page * obtained from virt_to_page() points to
>> unmapped memory.
>>
>> This commit fixes this by using phys_to_page() since we get a physical
>> address from dma_alloc_coherent(). Note that this is not a proper fix
>> because if an IOMMU is set up to translate addresses for the GPU this
>> address will be an I/O virtual address rather than a physical one. The
>> proper fix probably involves not getting a pointer to the struct page
>> in the first place, but that would be a much more intrusive change, if
>> at all possible.
>>
>> Until that time, this temporary fix will allow TTM to work on 32-bit
>> and 64-bit ARM as well, provided that no IOMMU translations are enabled
>> for the GPU.
>>
>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>> ---
>> Arnd, I realize that this isn't a proper fix according to what we
>> discussed on
>> IRC yesterday, but I can't see a way to remove access to the pages array
>> that
>> would be as simple as this. I've marked this as RFC in the hope that it
>> will
>> trigger some discussion that will lead to a proper solution.
>>
>> drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> index c96db433f8af..d7993985752c 100644
>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
>> @@ -343,7 +343,11 @@ static struct dma_page *__ttm_dma_alloc_page(struct
>> dma_pool *pool)
>> &d_page->dma,
>> pool->gfp_flags);
>> if (d_page->vaddr)
>> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
>> + d_page->p = phys_to_page(d_page->dma);
>> +#else
>> d_page->p = virt_to_page(d_page->vaddr);
>> +#endif
>
>
> Since I am messing with the IOMMU I just happened to have hit the issue you
> are mentioning. Wouldn't the following work:
>
> - if (d_page->vaddr)
> -#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> - d_page->p = phys_to_page(d_page->dma);
> -#else
> - d_page->p = virt_to_page(d_page->vaddr);
> -#endif
> - else {
> + if (d_page->vaddr) {
> + if (is_vmalloc_addr(d_page->vaddr)) {
> + d_page->p = vmalloc_to_page(d_page->vaddr);
> + } else {
> + d_page->p = virt_to_page(d_page->vaddr);
> + }
> + } else {
>
> A remapped page will end up in the vmalloc range of the address space, and
> in this case we can use vmalloc_to_page() to get the right page. Pages
> outside of this range are part of the linear mapping and can be resolved
> using virt_to_page().
Thierry, have you had a chance to try this? If not, do you want to to
try to push this patch? It seems to solve the issue AFAICT, but needs
more testing.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-12-15 8:04 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-12 12:39 [RFC] drm/ttm: dma: Fixes for 32-bit and 64-bit ARM Thierry Reding
2014-11-12 14:18 ` Konrad Rzeszutek Wilk
2014-11-12 17:03 ` Arnd Bergmann
2014-12-01 16:43 ` Konrad Rzeszutek Wilk
2014-12-08 7:14 ` Alexandre Courbot
2014-12-08 7:36 ` Alexandre Courbot
2014-12-08 8:11 ` Alexandre Courbot
2014-12-08 8:19 ` Alexandre Courbot
2014-12-15 8:04 ` Alexandre Courbot
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.