Converting OMAP's custom vram allocator

* Converting OMAP's custom vram allocator
@ 2012-09-05 10:08 Tomi Valkeinen
  2012-09-06 20:55 ` Tony Lindgren
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Tomi Valkeinen @ 2012-09-05 10:08 UTC (permalink / raw)
  To: linux-arm-kernel, linux-omap

Hi,

OMAP has a custom video ram allocator, which I'd like to remove in favor of
the standard dma allocation functions.

There are two problems for which I'd like to hear suggestions or
comments:

First one is that the dma_alloc_* functions map the allocated memory for
cpu use. In many cases with OMAP DSS (display subsystem) this is not
needed: the memory may be written only by the SGX or the DSP, and it's
only read by the DSS, so it's never touched by the CPU.

This is even more true when using VRFB on omap3 (and probably TILER on
omap4) for rotation, as VRFB hides the actual memory and offers rotated
views. In this case the backend memory is never accessed by anything other
than VRFB.

Is there a way to allocate the memory without creating a mapping? While
it won't break anything as such, the allocated areas can be quite large
thus causing large areas of the kernel's memory space to be needlessly
reserved.

The second case is passing a framebuffer address from the bootloader to
the kernel. Often with mobile devices the bootloader will initialize the
display hardware, showing a company logo or such. To keep the image on
the screen when the kernel starts, we need to reserve the same physical
memory area early at boot, and use that for the framebuffer.

I'm not sure if there's any actual problem with this one, presuming
there is a solution for the first case. Somehow the memory is reserved
at early boot time, and this is passed to the fb driver. But can the
memory be managed the same way as in normal case (for example freeing
it), or does it need to be handled as a special case?

 Tomi


* Re: Converting OMAP's custom vram allocator
  2012-09-05 10:08 Converting OMAP's custom vram allocator Tomi Valkeinen
@ 2012-09-06 20:55 ` Tony Lindgren
  2012-09-06 21:35 ` Rob Clark
  2012-09-07  5:55 ` Marek Szyprowski
  2 siblings, 0 replies; 6+ messages in thread
From: Tony Lindgren @ 2012-09-06 20:55 UTC (permalink / raw)
  To: Tomi Valkeinen; +Cc: linux-arm-kernel, linux-omap

* Tomi Valkeinen <tomi.valkeinen@ti.com> [120905 03:09]:
> Hi,
> 
> OMAP has a custom video ram allocator, which I'd like to remove and use
> the standard dma allocation functions.
> 
> There are two problems for which I'd like to hear suggestions or
> comments:
> 
> First one is that the dma_alloc_* functions map the allocated memory for
> cpu use. In many cases with OMAP DSS (display subsystem) this is not
> needed: the memory may be written only by the SGX or the DSP, and it's
> only read by the DSS, so it's never touched by the CPU.
> 
> This is even more true when using VRFB on omap3 (and probably TILER on
> omap4) for rotation, as VRFB hides the actual memory and offers rotated
> views. In this case the backend memory is never accessed by anyone else
> than VRFB.
> 
> Is there a way to allocate the memory without creating a mapping? While
> it won't break anything as such, the allocated areas can be quite large
> thus causing large areas of the kernel's memory space to be needlessly
> reserved.
> 
> The second case is passing a framebuffer address from the bootloader to
> the kernel. Often with mobile devices the bootloader will initialize the
> display hardware, showing a company logo or such. To keep the image on
> the screen when kernel starts we need to reserve the same physical
> memory area early at boot, and use that for the framebuffer.
> 
> I'm not sure if there's any actual problem with this one, presuming
> there is a solution for the first case. Somehow the memory is reserved
> at early boot time, and this is passed to the fb driver. But can the
> memory be managed the same way as in normal case (for example freeing
> it), or does it need to be handled as a special case?

Sounds like you might be able to do this all with CMA with some
additional patches?

Tony

* Re: Converting OMAP's custom vram allocator
  2012-09-05 10:08 Converting OMAP's custom vram allocator Tomi Valkeinen
  2012-09-06 20:55 ` Tony Lindgren
@ 2012-09-06 21:35 ` Rob Clark
  2012-09-07  5:55 ` Marek Szyprowski
  2 siblings, 0 replies; 6+ messages in thread
From: Rob Clark @ 2012-09-06 21:35 UTC (permalink / raw)
  To: Tomi Valkeinen; +Cc: linux-arm-kernel, linux-omap, linaro-mm-sig

On Wed, Sep 5, 2012 at 5:08 AM, Tomi Valkeinen <tomi.valkeinen@ti.com> wrote:
> Hi,
>
> OMAP has a custom video ram allocator, which I'd like to remove and use
> the standard dma allocation functions.
>
> There are two problems for which I'd like to hear suggestions or
> comments:
>
> First one is that the dma_alloc_* functions map the allocated memory for
> cpu use. In many cases with OMAP DSS (display subsystem) this is not
> needed: the memory may be written only by the SGX or the DSP, and it's
> only read by the DSS, so it's never touched by the CPU.

see dma_alloc_attrs() and DMA_ATTR_NO_KERNEL_MAPPING

> This is even more true when using VRFB on omap3 (and probably TILER on
> omap4) for rotation, as VRFB hides the actual memory and offers rotated
> views. In this case the backend memory is never accessed by anyone else
> than VRFB.

just fwiw, we don't actually need contiguous memory on o4/tiler :-)

(well, at least if you ignore things like secure playback)

> Is there a way to allocate the memory without creating a mapping? While
> it won't break anything as such, the allocated areas can be quite large
> thus causing large areas of the kernel's memory space to be needlessly
> reserved.
>
> The second case is passing a framebuffer address from the bootloader to
> the kernel. Often with mobile devices the bootloader will initialize the
> display hardware, showing a company logo or such. To keep the image on
> the screen when kernel starts we need to reserve the same physical
> memory area early at boot, and use that for the framebuffer.

with a bit of handwaving, this is possible.  You can pass a base
address to dma_declare_contiguous() when you set up your device's CMA
pool.  Although that doesn't really guarantee your allocation from
that pool is at offset zero, I suppose.

> I'm not sure if there's any actual problem with this one, presuming
> there is a solution for the first case. Somehow the memory is reserved
> at early boot time, and this is passed to the fb driver. But can the
> memory be managed the same way as in normal case (for example freeing
> it), or does it need to be handled as a special case?

special-casing it might be better.. although possibly a dma attr could
be added for this to tell dma_alloc_from_contiguous() that we need a
particular address within the CMA pool.  It seems a bit like a hack,
but OTOH I guess pretty much every consumer device would need a hack
like this.
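
To make the "particular address within the pool" idea concrete, here is a
minimal userspace sketch, not kernel code. It models a reserved area as a
page bitmap and claims an exact physical range from it. The names toy_area
and toy_area_claim() are hypothetical and exist only for this illustration;
no such interface is part of the dma-mapping API discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u
#define TOY_AREA_PAGES	64u

/* Hypothetical model of a reserved area: one flag per page. */
struct toy_area {
	uint64_t base;			/* physical start of the area */
	bool used[TOY_AREA_PAGES];	/* true = page already allocated */
};

/*
 * Claim 'pages' pages starting exactly at physical address 'phys'.
 * Returns false if the range is misaligned, outside the area, or overlaps
 * an existing allocation. Nothing is modified on failure.
 */
static bool toy_area_claim(struct toy_area *area, uint64_t phys, size_t pages)
{
	uint64_t off;
	size_t first, i;

	assert(area != NULL);
	if (pages == 0 || phys < area->base)
		return false;

	off = phys - area->base;
	if (off % TOY_PAGE_SIZE || off / TOY_PAGE_SIZE >= TOY_AREA_PAGES)
		return false;

	first = (size_t)(off / TOY_PAGE_SIZE);
	if (pages > TOY_AREA_PAGES - first)
		return false;

	/* Check the whole range before touching the bitmap. */
	for (i = 0; i < pages; i++)
		if (area->used[first + i])
			return false;

	for (i = 0; i < pages; i++)
		area->used[first + i] = true;
	return true;
}
```

In this model a framebuffer driver would call toy_area_claim() once, early,
with the address handed over by the bootloader, and fall back to a normal
allocation if the claim fails.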

BR,
-R

>  Tomi
>

* RE: Converting OMAP's custom vram allocator
  2012-09-05 10:08 Converting OMAP's custom vram allocator Tomi Valkeinen
  2012-09-06 20:55 ` Tony Lindgren
  2012-09-06 21:35 ` Rob Clark
@ 2012-09-07  5:55 ` Marek Szyprowski
  2012-09-07 10:54   ` Tomi Valkeinen
  2 siblings, 1 reply; 6+ messages in thread
From: Marek Szyprowski @ 2012-09-07  5:55 UTC (permalink / raw)
  To: 'Tomi Valkeinen', linux-arm-kernel, 'linux-omap'
  Cc: Marek Szyprowski

Hello,

On Wednesday, September 05, 2012 12:09 PM Tomi Valkeinen wrote:

> OMAP has a custom video ram allocator, which I'd like to remove and use
> the standard dma allocation functions.
> 
> There are two problems for which I'd like to hear suggestions or
> comments:
> 
> First one is that the dma_alloc_* functions map the allocated memory for
> cpu use. In many cases with OMAP DSS (display subsystem) this is not
> needed: the memory may be written only by the SGX or the DSP, and it's
> only read by the DSS, so it's never touched by the CPU.
> 
> This is even more true when using VRFB on omap3 (and probably TILER on
> omap4) for rotation, as VRFB hides the actual memory and offers rotated
> views. In this case the backend memory is never accessed by anyone else
> than VRFB.
> 
> Is there a way to allocate the memory without creating a mapping? While
> it won't break anything as such, the allocated areas can be quite large
> thus causing large areas of the kernel's memory space to be needlessly
> reserved.

Please check commits d5724f172fd1 and 955c757e090, merged in v3.6-rc1.
Support for this attribute is currently available only in the IOMMU-aware
dma-mapping implementation, but I plan to also add it to the standard linear
ARM dma-mapping implementation based on alloc_pages_exact().

A not-well-documented example can be found here:
https://patchwork.kernel.org/patch/1323591/ (at the bottom).

You will probably need to add your own custom dma_map_ops set of functions
for the TILER device, but I'm not really sure I understand correctly what that
device does and what the use cases for it will be.

 
> The second case is passing a framebuffer address from the bootloader to
> the kernel. Often with mobile devices the bootloader will initialize the
> display hardware, showing a company logo or such. To keep the image on
> the screen when kernel starts we need to reserve the same physical
> memory area early at boot, and use that for the framebuffer.
> 
> I'm not sure if there's any actual problem with this one, presuming
> there is a solution for the first case. Somehow the memory is reserved
> at early boot time, and this is passed to the fb driver. But can the
> memory be managed the same way as in normal case (for example freeing
> it), or does it need to be handled as a special case?

The only solution I see here is to use a custom coherent memory pool for the
framebuffer device and set it up starting from the physical address of the
framebuffer configured by the bootloader. See the dma_declare_coherent_memory()
function. A usage example on the ARM architecture can be found in
arch/arm/plat-samsung/s5p-dev-mfc.c

The other possibility is to enable the Contiguous Memory Allocator and define
a custom contiguous memory area for the framebuffer device at the same
physical address as configured by the bootloader:
http://git.linaro.org/gitweb?p=people/mszyprowski/linux-archive.git;a=commitdiff;h=f8ff4f99cfa4f67e09a3c948e007e82a0c21434a

Feel free to comment on both possibilities, maybe we can work out something
better for solving this quite common use case.

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center



* RE: Converting OMAP's custom vram allocator
  2012-09-07  5:55 ` Marek Szyprowski
@ 2012-09-07 10:54   ` Tomi Valkeinen
  2012-09-14 14:10     ` Marek Szyprowski
  0 siblings, 1 reply; 6+ messages in thread
From: Tomi Valkeinen @ 2012-09-07 10:54 UTC (permalink / raw)
  To: Marek Szyprowski, Rob Clark; +Cc: linux-arm-kernel, 'linux-omap'

On Fri, 2012-09-07 at 07:55 +0200, Marek Szyprowski wrote:
> Hello,
> 
> On Wednesday, September 05, 2012 12:09 PM Tomi Valkeinen wrote:
> 
> > OMAP has a custom video ram allocator, which I'd like to remove and use
> > the standard dma allocation functions.
> > 
> > There are two problems for which I'd like to hear suggestions or
> > comments:
> > 
> > First one is that the dma_alloc_* functions map the allocated memory for
> > cpu use. In many cases with OMAP DSS (display subsystem) this is not
> > needed: the memory may be written only by the SGX or the DSP, and it's
> > only read by the DSS, so it's never touched by the CPU.
> > 
> > This is even more true when using VRFB on omap3 (and probably TILER on
> > omap4) for rotation, as VRFB hides the actual memory and offers rotated
> > views. In this case the backend memory is never accessed by anyone else
> > than VRFB.
> > 
> > Is there a way to allocate the memory without creating a mapping? While
> > it won't break anything as such, the allocated areas can be quite large
> > thus causing large areas of the kernel's memory space to be needlessly
> > reserved.
> 
> Please check commits d5724f172fd1 and 955c757e090 merged to v3.6-rc1. 
> Support for this attribute is now only available in IOMMU-aware 
> dma-mapping implementation, but I plan to add it also to standard linear
> ARM dma-mapping implementation based on alloc_pages_exact().

Ok, good to know. Do you have any guesstimate when the non-iommu version
could end up in the mainline? Any chance for 3.7? I volunteer for
testing if needed =).

> Some not-well-documented example can be found here: 
> https://patchwork.kernel.org/patch/1323591/ (at the bottom).
> 
> You probably might need to add your own custom dma_map_ops set of functions
> for TILER device, but I'm not really sure if I get right what does that 
> device do and what will be the use cases for it.

I think we have three different cases for how we need to manage the memory
used for video on OMAP.

1) Conventional case, without VRFB/TILER. We need large contiguous
areas. I think we usually want both normal kernel and userspace mappings
in this case, although some use cases might not need those.

2) VRFB (omap3). In this case we need a large contiguous area, which is
given to the VRFB hardware to be used as storage. This area is never
mapped. VRFB offers four rotated "views" (i.e. memory areas), which give
a 0/90/180/270 degree view of the same image, and we will create mappings
of these views with ioremap. The actual data is stored in the memory by
VRFB in a proprietary format.

3) TILER (omap4). I'm not too familiar with TILER, but afaik it's kinda
like a better version of VRFB. In this case we don't need contiguous
memory, but like VRFB, we never create a mapping for the memory. (Rob,
correct me if I'm wrong).

I think we can manage all of those with dma_alloc_attrs(), even though
contiguous area is not really needed for TILER.

So, if I define DMA_ATTR_NO_KERNEL_MAPPING, there's no point in defining
DMA_ATTR_WRITE_COMBINE at the same time, right?

Can I still create the kernel mapping for the allocated memory later,
yielding the same result as if I would've omitted
DMA_ATTR_NO_KERNEL_MAPPING?

> > The second case is passing a framebuffer address from the bootloader to
> > the kernel. Often with mobile devices the bootloader will initialize the
> > display hardware, showing a company logo or such. To keep the image on
> > the screen when kernel starts we need to reserve the same physical
> > memory area early at boot, and use that for the framebuffer.
> > 
> > I'm not sure if there's any actual problem with this one, presuming
> > there is a solution for the first case. Somehow the memory is reserved
> > at early boot time, and this is passed to the fb driver. But can the
> > memory be managed the same way as in normal case (for example freeing
> > it), or does it need to be handled as a special case?
> 
> The only solution I see here is to use custom coherent memory pool for the
> framebuffer device and setup it starting from the physical address of the
> framebuffer configured by bootloader. See dma_declare_coherent_memory() function.
> Some usage example on ARM architecture can be found in 
> arch/arm/plat-samsung/s5p-dev-mfc.c
> 
> The other possibility is to enable Contiguous Memory Allocator and define
> a custom contiguous memory area for framebuffer device at the same 
> physical address as configured by bootloader:
> http://git.linaro.org/gitweb?p=people/mszyprowski/linux-archive.git;a=commitdiff;h=f8ff4f99cfa4f67e09a3c948e007e82a0c21434a
> 
> Feel free to comment both possibilities, maybe we can work out something
> better for solving this quite common use case.

I think CMA is definitely the way to go.

But I'm not quite sure how it should be used in this case. I understand
how to reserve the memory area at boot time, as the patch in your link
shows, but how should the driver get the memory?

Normally the driver would just use dma_alloc_*, and the reserved CMA
area would be used automatically, right? But in this case we want to get
the allocation from a particular physical address of the private area.

 Tomi


* RE: Converting OMAP's custom vram allocator
  2012-09-07 10:54   ` Tomi Valkeinen
@ 2012-09-14 14:10     ` Marek Szyprowski
  0 siblings, 0 replies; 6+ messages in thread
From: Marek Szyprowski @ 2012-09-14 14:10 UTC (permalink / raw)
  To: 'Tomi Valkeinen', 'Rob Clark'
  Cc: linux-arm-kernel, 'linux-omap'

Hello,

On Friday, September 07, 2012 12:55 PM Tomi Valkeinen wrote:

> On Fri, 2012-09-07 at 07:55 +0200, Marek Szyprowski wrote:
> > Hello,
> >
> > On Wednesday, September 05, 2012 12:09 PM Tomi Valkeinen wrote:
> >
> > > OMAP has a custom video ram allocator, which I'd like to remove and use
> > > the standard dma allocation functions.
> > >
> > > There are two problems for which I'd like to hear suggestions or
> > > comments:
> > >
> > > First one is that the dma_alloc_* functions map the allocated memory for
> > > cpu use. In many cases with OMAP DSS (display subsystem) this is not
> > > needed: the memory may be written only by the SGX or the DSP, and it's
> > > only read by the DSS, so it's never touched by the CPU.
> > >
> > > This is even more true when using VRFB on omap3 (and probably TILER on
> > > omap4) for rotation, as VRFB hides the actual memory and offers rotated
> > > views. In this case the backend memory is never accessed by anyone else
> > > than VRFB.
> > >
> > > Is there a way to allocate the memory without creating a mapping? While
> > > it won't break anything as such, the allocated areas can be quite large
> > > thus causing large areas of the kernel's memory space to be needlessly
> > > reserved.
> >
> > Please check commits d5724f172fd1 and 955c757e090 merged to v3.6-rc1.
> > Support for this attribute is now only available in IOMMU-aware
> > dma-mapping implementation, but I plan to add it also to standard linear
> > ARM dma-mapping implementation based on alloc_pages_exact().
> 
> Ok, good to know. Do you have any guesstimate when the non-iommu version
> could end up in the mainline? Any chance for 3.7? I volunteer for
> testing if needed =).

Well, I'm not sure I can have it ready for 3.7. I was very busy this
week, and I'm just now leaving the office for my vacation, so I don't know
whether I'll manage to work on it right after getting back. Feel free to
provide a patch which adds such a feature; I will then schedule it for
inclusion in mainline.

> > Some not-well-documented example can be found here:
> > https://patchwork.kernel.org/patch/1323591/ (at the bottom).
> >
> > You probably might need to add your own custom dma_map_ops set of functions
> > for TILER device, but I'm not really sure if I get right what does that
> > device do and what will be the use cases for it.
> 
> I think we have three different cases how we need to manage the memory
> used for video on OMAP.
> 
> 1) Conventional case, without VRFB/TILER. We need large contiguous
> areas. I think we usually want both normal kernel and userspace mapping
> in this case, although some use cases could not need those.
> 
> 2) VRFB (omap3). In this case we need a large contiguous area, which is
> given to the VRFB hardware to be used as a storage. This area is never
> mapped. VRFB offers four rotated "views" (i.e. memory areas), which give
> a 0/90/180/270 degree view of the same image, and we will create mapping
> of these views with ioremap. The actual data is stored in the memory by
> VRFB in a proprietary format.
> 
> 3) TILER (omap4). I'm not too familiar with TILER, but afaik it's kinda
> like a better version of VRFB. In this case we don't need contiguous
> memory, but like VRFB, we never create mapping for the memory. (Rob,
> correct me if I'm wrong).
> 
> I think we can manage all of those with dma_alloc_attrs(), even though
> contiguous area is not really needed for TILER.

dma_alloc_attrs()/dma_alloc_coherent() work with memory which is
contiguous in the DMA (I/O) address space. It doesn't need to be contiguous
in physical memory if the device has an IOMMU (or an IOMMU-like physical
memory interface).
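
The distinction between the two kinds of contiguity can be sketched with a
small userspace model (not kernel code). The toy "IOMMU" below is just a
lookup table from I/O page number to physical page address; toy_iommu,
toy_iommu_map() and toy_iommu_translate() are hypothetical names made up for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u
#define TOY_IOMMU_PAGES	16u

/* Hypothetical toy IOMMU: entry i holds the physical address of I/O page i. */
struct toy_iommu {
	uint64_t phys[TOY_IOMMU_PAGES];
	size_t used;	/* number of I/O pages mapped so far */
};

/*
 * Map 'count' arbitrary physical pages to consecutive I/O pages.
 * Returns the I/O address of the first page, or UINT64_MAX if the table
 * has no room left.
 */
static uint64_t toy_iommu_map(struct toy_iommu *mmu, const uint64_t *pages,
			      size_t count)
{
	size_t first, i;

	assert(mmu != NULL && pages != NULL);
	first = mmu->used;
	if (count > TOY_IOMMU_PAGES - first)
		return UINT64_MAX;

	for (i = 0; i < count; i++)
		mmu->phys[first + i] = pages[i];
	mmu->used += count;
	return (uint64_t)first * TOY_PAGE_SIZE;
}

/* Translate an I/O address the way the device's accesses would be. */
static uint64_t toy_iommu_translate(const struct toy_iommu *mmu,
				    uint64_t io_addr)
{
	uint64_t page = io_addr / TOY_PAGE_SIZE;

	assert(mmu != NULL);
	if (page >= mmu->used)
		return UINT64_MAX;
	return mmu->phys[page] + io_addr % TOY_PAGE_SIZE;
}
```

Mapping three scattered physical pages gives the device one linear
three-page window, even though consecutive I/O pages translate to unrelated
physical addresses.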

> So, if I define DMA_ATTR_NO_KERNEL_MAPPING, there's no point in defining
> DMA_ATTR_WRITE_COMBINE at the same time, right?

Yes and no. It might be useful for creating userspace mappings on systems
which support write-combining. Please note that attributes which are not
supported by a given system are simply ignored. So if a driver specifies both,
some systems might benefit from NO_KERNEL_MAPPING, while others will
benefit from WRITE_COMBINE mappings. Both can coexist without a single
change to the device driver.
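
The "unsupported attributes are simply ignored" behaviour can be modelled in
a few lines of userspace C. This is only a sketch: the TOY_ATTR_* bits,
toy_alloc_result and toy_alloc() are hypothetical stand-ins, not the
kernel's definitions, and "write_combine" here just means that mappings
created for the buffer would use write-combining.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical attribute bits for this model only. */
#define TOY_ATTR_WRITE_COMBINE		(1u << 0)
#define TOY_ATTR_NO_KERNEL_MAPPING	(1u << 1)
#define TOY_ATTR_ALL \
	(TOY_ATTR_WRITE_COMBINE | TOY_ATTR_NO_KERNEL_MAPPING)

/* What a given backend actually did for one allocation. */
struct toy_alloc_result {
	bool kernel_mapped;
	bool write_combine;
};

/*
 * 'supported' is the set of attributes the backend implements and
 * 'requested' is what the driver asked for. Requested attributes that the
 * backend does not implement are dropped silently.
 */
static struct toy_alloc_result toy_alloc(unsigned int supported,
					 unsigned int requested)
{
	unsigned int effective;
	struct toy_alloc_result res;

	/* Bits outside the known set would be a caller bug in this model. */
	assert((requested & ~TOY_ATTR_ALL) == 0);

	effective = requested & supported;
	res.kernel_mapped = !(effective & TOY_ATTR_NO_KERNEL_MAPPING);
	res.write_combine = (effective & TOY_ATTR_WRITE_COMBINE) != 0;
	return res;
}
```

A driver that always requests both bits gets an unmapped buffer from a
backend that only knows NO_KERNEL_MAPPING, and a mapped, write-combined
buffer from a backend that only knows WRITE_COMBINE, with the same driver
code in both cases.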

> Can I still create the kernel mapping for the allocated memory later,
> yielding the same result as if I would've omitted
> DMA_ATTR_NO_KERNEL_MAPPING?

Well, this will probably work, but it is not yet officially supported by the
dma-mapping API. I'm aware of such use cases, and specifying how to do it right
is also on my todo list.

> > > The second case is passing a framebuffer address from the bootloader to
> > > the kernel. Often with mobile devices the bootloader will initialize the
> > > display hardware, showing a company logo or such. To keep the image on
> > > the screen when kernel starts we need to reserve the same physical
> > > memory area early at boot, and use that for the framebuffer.
> > >
> > > I'm not sure if there's any actual problem with this one, presuming
> > > there is a solution for the first case. Somehow the memory is reserved
> > > at early boot time, and this is passed to the fb driver. But can the
> > > memory be managed the same way as in normal case (for example freeing
> > > it), or does it need to be handled as a special case?
> >
> > The only solution I see here is to use custom coherent memory pool for the
> > framebuffer device and setup it starting from the physical address of the
> > framebuffer configured by bootloader. See dma_declare_coherent_memory() function.
> > Some usage example on ARM architecture can be found in
> > arch/arm/plat-samsung/s5p-dev-mfc.c
> >
> > The other possibility is to enable Contiguous Memory Allocator and define
> > a custom contiguous memory area for framebuffer device at the same
> > physical address as configured by bootloader:
> > > http://git.linaro.org/gitweb?p=people/mszyprowski/linux-archive.git;a=commitdiff;h=f8ff4f99cfa4f67e09a3c948e007e82a0c21434a
> >
> > Feel free to comment both possibilities, maybe we can work out something
> > better for solving this quite common use case.
> 
> I think CMA is definitely the way to go.
> 
> But I'm not quite sure how it should be used in this case. I understand
> how to reserve the memory area at boot time, as the patch in your link
> shows, but how should the driver get the memory?

The driver allocates in the standard way: dma_alloc_{coherent,writecombine,attrs}().
It is up to the dma-mapping framework to use the right memory regions based on
the passed device pointer. Exactly the same driver interface is used for
dma_declare_coherent_memory() regions, which are not shared with the system.

> Normally the driver would just use dma_alloc_*, and the reserved CMA
> area would be used automatically, right?

Right.

> But in this case we want to get
> the allocation from a particular physical address of the private area.

The idea was to start the reserved area exactly at the address which the
bootloader uses for the initial framebuffer. This way the first allocation
will come from the beginning of that region, fitting exactly onto the initial
framebuffer set up by the bootloader. I know that this is hacky, but so far I
haven't found anything better that would fit into the existing dma-mapping
API.
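
Why the first allocation lands on the bootloader's framebuffer can be shown
with a small userspace model (not kernel code). It assumes a first-fit
policy over a page bitmap and ignores alignment; toy_pool and
toy_pool_alloc() are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE	4096u
#define TOY_POOL_PAGES	64u

/* Hypothetical per-device pool: a page bitmap over a reserved range. */
struct toy_pool {
	uint64_t base;			/* e.g. the bootloader framebuffer address */
	bool used[TOY_POOL_PAGES];	/* true = page already allocated */
};

/*
 * First-fit: return the physical address of the lowest free run of
 * 'pages' pages, or UINT64_MAX if no such run exists.
 */
static uint64_t toy_pool_alloc(struct toy_pool *pool, size_t pages)
{
	size_t start, i;

	assert(pool != NULL);
	if (pages == 0 || pages > TOY_POOL_PAGES)
		return UINT64_MAX;

	for (start = 0; start + pages <= TOY_POOL_PAGES; start++) {
		/* Count how many pages from 'start' are free. */
		for (i = 0; i < pages && !pool->used[start + i]; i++)
			;
		if (i < pages)
			continue;	/* run too short, try the next start */

		for (i = 0; i < pages; i++)
			pool->used[start + i] = true;
		return pool->base + (uint64_t)start * TOY_PAGE_SIZE;
	}
	return UINT64_MAX;
}
```

With a fresh pool the first call returns exactly pool->base. If any other
allocation is made from the same pool first, the framebuffer request lands
at a higher address, which is why the approach only works when the
framebuffer driver allocates before anyone else.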

Best regards
-- 
Marek Szyprowski
Samsung Poland R&D Center


