Linux Framebuffer Layer development

Linux Framebuffer Layer development
 help / color / mirror / Atom feed

* [RFC 0/1] Adding DT support to video/da8xx-fb.c
From: Darren Etheridge @ 2013-08-08 20:15 UTC (permalink / raw)
  To: devicetree, tomi.valkeinen, plagnioj, linux-fbdev, detheridge; +Cc: afzal

This is part of a larger series of patches to upgrade the da8xx-fb.c driver
to support the Texas Instruments AM335x device.  As part of this upgrade
we also want to add devicetree support for both the original da8xx and the
am335x.  Tomi Valkeinen has reviewed the fbdev changes but he suggested
that it was prudent to extract the dt pieces and run it through the
devicetree mailing list for review.

Thanks,
Darren

Darren Etheridge (1):
  video: da8xx-fb: adding dt support

 .../devicetree/bindings/video/fb-da8xx.txt         |   37 +++++++++++
 drivers/video/da8xx-fb.c                           |   67 +++++++++++++++++++-
 2 files changed, 101 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/video/fb-da8xx.txt

^ permalink raw reply

* Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Rob Clark @ 2013-08-07 18:12 UTC (permalink / raw)
  To: Tom Cooksey
  Cc: dri-devel, linux-fbdev, Pawel Moll, linux-arm-kernel, linux-media,
	linaro-mm-sig, linux-kernel
In-Reply-To: <520284d6.07300f0a.72a4.1623SMTPIN_ADDED_BROKEN@mx.google.com>

On Wed, Aug 7, 2013 at 1:33 PM, Tom Cooksey <tom.cooksey@arm.com> wrote:
>
>> >> > Didn't you say that programmatically describing device placement
>> >> > constraints was an unbounded problem? I guess we would have to
>> >> > accept that it's not possible to describe all possible constraints
>> >> > and instead find a way to describe the common ones?
>> >>
>> >> well, the point I'm trying to make, is by dividing your constraints
>> >> into two groups, one that impacts and is handled by userspace, and
>> >> one that is in the kernel (ie. where the pages go), you cut down
>> >> the number of permutations that the kernel has to care about
>> >>  considerably. And kernel already cares about, for example, what
>> >> range of addresses that a device can dma to/from.  I think really
>> >> the only thing missing is the max # of sglist entries (contiguous
>> >> or not)
>> >
>> > I think it's more than physically contiguous or not.
>> >
>> > For example, it can be more efficient to use large page sizes on
>> > devices with IOMMUs to reduce TLB traffic. I think the size and even
>> > the availability of large pages varies between different IOMMUs.
>>
>> sure.. but I suppose if we can spiff out dma_params to express "I need
>> contiguous", perhaps we can add some way to express "I prefer
>> as-contiguous-as-possible".. either way, this is about where the pages
>> are placed, and not about the layout of pixels within the page, so
>> should be in kernel.  It's something that is missing, but I believe
>> that it belongs in dma_params and hidden behind dma_alloc_*() for
>> simple drivers.
>
> Thinking about it, isn't this more a property of the IOMMU? I mean,
> are there any cases where an IOMMU had a large page mode but you
> wouldn't want to use it? So when allocating the memory, you'd have to
> take into account not just the constraints of the devices themselves,
> but also of any IOMMUs any of the device sit behind?
>

perhaps yes.  But the device is associated w/ the iommu it is attached
to, so this shouldn't be a problem

>
>> > There's also the issue of buffer stride alignment. As I say, if the
>> > buffer is to be written by a tile-based GPU like Mali, it's more
>> > efficient if the buffer's stride is aligned to the max AXI bus burst
>> > length. Though I guess a buffer stride only makes sense as a concept
>> > when interpreting the data as a linear-layout 2D image, so perhaps
>> > belongs in user-space along with format negotiation?
>> >
>>
>> Yeah.. this isn't about where the pages go, but about the arrangement
>> within a page.
>>
>> And, well, except for hw that supports the same tiling (or
>> compressed-fb) in display+gpu, you probably aren't sharing tiled
>> buffers.
>
> You'd only want to share a buffer between devices if those devices can
> understand the same pixel format. That pixel format can't be device-
> specific or opaque, it has to be explicit. I think drm_fourcc.h is
> what defines all the possible pixel formats. This is the enum I used
> in EGL_EXT_image_dma_buf_import at least. So if we get to the point
> where multiple devices can understand a tiled or compressed format, I
> assume we could just add that format to drm_fourcc.h and possibly
> v4l2's v4l2_mbus_pixelcode enum in v4l2-mediabus.h.
>
> For user-space to negotiate a common pixel format and now stride
> alignment, I guess it will obviously need a way to query what pixel
> formats a device supports and what its stride alignment requirements
> are.
>
> I don't know v4l2 very well, but it certainly seems the pixel format
> can be queried using V4L2_SUBDEV_FORMAT_TRY when attempting to set
> a particular format. I couldn't however find a way to retrieve a list
> of supported formats - it seems the mechanism is to try out each
> format in turn to determine if it is supported. Is that right?

it is exposed for drm plane's.  What is missing is to expose the
primary-plane associated with the crtc.

> There doesn't however seem a way to query what stride constraints a
> V4l2 device might have. Does HW abstracted by v4l2 typically have
> such constraints? If so, how can we query them such that a buffer
> allocated by a DRM driver can be imported into v4l2 and used with
> that HW?
>
> Turning to DRM/KMS, it seems the supported formats of a plane can be
> queried using drm_mode_get_plane. However, there doesn't seem to be a
> way to query the supported formats of a crtc? If display HW only
> supports scanning out from a single buffer (like pl111 does), I think
> it won't have any planes and a fb can only be set on the crtc. In
> which case, how should user-space query which pixel formats that crtc
> supports?
>
> Assuming user-space can query the supported formats and find a common
> one, it will need to allocate a buffer. Looks like
> drm_mode_create_dumb can do that, but it only takes a bpp parameter,
> there's no format parameter. I assume then that user-space defines
> the format and tells the DRM driver which format the buffer is in
> when creating the fb with drm_mode_fb_cmd2, which does take a format
> parameter? Is that right?

Right, the gem object has no inherent format, it is just some bytes.
The format/width/height/pitch are all attributes of the fb.

> As with v4l2, DRM doesn't appear to have a way to query the stride
> constraints? Assuming there is a way to query the stride constraints,
> there also isn't a way to specify them when creating a buffer with
> DRM, though perhaps the existing pitch parameter of
> drm_mode_create_dumb could be used to allow user-space to pass in a
> minimum stride as well as receive the allocated stride?
>

well, you really shouldn't be using create_dumb..  you should have a
userspace piece that is specific to the drm driver, and knows how to
use that driver's gem allocate ioctl.

>
>> >> > One problem with this is it duplicates a lot of logic in each
>> >> > driver which can export a dma_buf buffer. Each exporter will
>> >> > need to do pretty much the same thing: iterate over all the
>> >> > attachments, determine of all the constraints (assuming that
>> >> > can be done) and allocate pages such that the lowest-common-
>> >> > denominator is satisfied. Perhaps rather than duplicating that
>> >> > logic in every driver, we could instead move allocation of the
>> >> > backing pages into dma_buf itself?
>> >>
>> >> I tend to think it is better to add helpers as we see common
>>
>> >> patterns emerge, which drivers can opt-in to using.  I don't
>> >> think that we should move allocation into dma_buf itself, but
>> >> it would perhaps be useful to have dma_alloc_*() variants that
>> >> could allocate for multiple devices.
>> >
>> > A helper could work I guess, though I quite like the idea of
>> > having dma_alloc_*() variants which take a list of devices to
>> > allocate memory for.
>> >
>> >
>> >> That would help for simple stuff, although I'd suspect
>> >> eventually a GPU driver will move away from that.  (Since you
>> >> probably want to play tricks w/ pools of pages that are
>> >> pre-zero'd and in the correct cache state, use spare cycles on
>> >> the gpu or dma engine to pre-zero uncached pages, and games
>> >> like that.)
>> >
>> > So presumably you're talking about a GPU driver being the exporter
>> > here? If so, how could the GPU driver do these kind of tricks on
>> > memory shared with another device?
>>
>> Yes, that is gpu-as-exporter.  If someone else is allocating buffers,
>> it is up to them to do these tricks or not.  Probably there is a
>> pretty good chance that if you aren't a GPU you don't need those sort
>> of tricks for fast allocation of transient upload buffers, staging
>> textures, temporary pixmaps, etc.  Ie. I don't really think a v4l
>> camera or video decoder would benefit from that sort of optimization.
>
> Right - but none of those are really buffers you'd want to export with
> dma_buf to share with another device are they? In which case, why not
> just have dma_buf figure out the constraints and allocate the memory?

maybe not.. but (a) you don't necessarily know at creation time if it
is going to be exported (maybe you know if it is definitely not going
to be exported, but the converse is not true), and (b) there isn't
really any reason to special case the allocation in the driver because
it is going to be exported.

helpers that can be used by simple drivers, yes.  Forcing the way the
buffer is allocated, for sure not.  Currently, for example, there is
no issue to export a buffer allocated from stolen-mem.  If we put the
page allocation in dma-buf, this would not be possible.  That is just
one quick example off the top of my head, I'm sure there are plenty
more.  But we definitely do not want the allocate in dma_buf itself.

> If a driver needs to allocate memory in a special way for a particular
> device, I can't really imagine how it would be able to share that
> buffer with another device using dma_buf? I guess a driver is likely
> to need some magic voodoo to configure access to the buffer for its
> device, but surely that would be done by the dma_mapping framework
> when dma_buf_map happens?
>

if, what it has to configure actually manages to fit in the
dma-mapping framework

anyways, where the pages come from has nothing to do with whether a
buffer can be shared or not

>
>
>> >> You probably want to get out of the SoC mindset, otherwise you are
>> >> going to make bad assumptions that come back to bite you later on.
>> >
>> > Sure - there are always going to be PC-like devices where the
>> > hardware configuration isn't fixed like it is on a traditional SoC.
>> > But I'd rather have a simple solution which works on traditional SoCs
>> > than no solution at all. Today our solution is to over-load the dumb
>> > buffer alloc functions of the display's DRM driver - For now I'm just
>> > looking for the next step up from that! ;-)
>>
>> True.. the original intention, which is perhaps a bit desktop-centric,
>> really was for there to be a userspace component talking to the drm
>> driver for allocation, ie. xf86-video-foo and/or
>> src/gallium/drivers/foo (for example) ;-)
>>
>> Which means for x11 having a SoC vendor specific xf86-video-foo for
>> x11..  or vendor specific gbm implementation for wayland.  (Although
>> at least in the latter case it is a pretty small piece of code.)  But
>> that is probably what you are trying to avoid.
>
> I've been trying to get my head around how PRIME relates to DDX
> drivers. As I understand it (which is likely wrong), you have a laptop
> with both an Intel & an nVidia GPU. You have both the i915 & nouveau
> kernel drivers loaded. What I'm not sure about is which GPU's display
> controller is actually hooked up to the physical connector? Perhaps
> there is a MUX like there is on Versatile Express?

afaiu it can be a, b, or c (ie. either gpu can have the display or
there can be a mux)..

> What I also don't understand is what DDX driver is loaded? Is it
> xf86-video-intel, xf86-video-nouveau or both? I get the impression
> that there's a "master" DDX which implements 2D operations but can
> import buffers using PRIME from the other driver and draw to them.
> Or is it more that it's able to export rendered buffers to the
> "slave" DRM for scanout? Either way, it's pretty similar to an ARM
> SoC setup which has the GPU and the display as two totally
> independent devices.
>
>
>
>> At any rate, for both xorg and wayland/gbm, you know when a buffer is
>> going to be a scanout buffer.  What I'd recommend is define a small
>> userspace API that your customers (the SoC vendors) implement to
>> allocate a scanout buffer and hand you back a dmabuf fd.  That could
>> be used both for x11 and for gbm.  Inputs should be requested
>> width/height and format.  And outputs pitch plus dmabuf fd.
>>
>> (Actually you might even just want to use gbm as your starting point.
>> You could probably just use gbm from xf86-video-armsoc for allocation,
>> to have one thing that works for both wayland and x11.  Scanout and
>> cursor buffers should go to vendor/SoC specific fxn, rest can be
>> allocated from mali kernel driver.)
>
> What does that buy us over just using drm_mode_create_dumb on the
> display's DRM driver?
>

well, for example, if there was actually some hw w/ omap's dss + mali,
you could actually have mali render transparently to tiled buffers
which could be scanned out rotated.  Which would not be possible w/
dumb buffers.

>
>
>> >> > wouldn't need a way to programmatically describe the constraints
>> >> > either: As you say, if userspace sets the "SCANOUT" flag, it would
>> >> > just "know" that on this SoC, that buffer needs to be physically
>> >> > contiguous for example.
>> >>
>> >> not really.. it just knows it wants to scanout the buffer, and tells
>> >> this as a hint to the kernel.
>> >>
>> >> For example, on omapdrm, the SCANOUT flag does nothing on omap4+
>> >> (where phys contig is not required for scanout), but causes CMA
>> >> (dma_alloc_*()) to be used on omap3.  Userspace doesn't care.
>> >> It just knows that it wants to be able to scanout that particular
>> >> buffer.
>> >
>> > I think that's the idea? The omap3's allocator driver would use
>> > contiguous memory when it detects the SCANOUT flag whereas the omap4
>> > allocator driver wouldn't have to. No complex negotiation of
>> > constraints - it just "knows".
>> >
>>
>> well, it is same allocating driver in both cases (although maybe that
>> is unimportant).  The "it" that just knows it wants to scanout is
>> userspace.  The "it" that just knows that scanout translates to
>> contiguous (or not) is the kernel.  Perhaps we are saying the same
>> thing ;-)
>
> Yeah - I think we are... so what's the issue with having a per-SoC
> allocation driver again?
>

In a way the display driver is a per-SoC allocator.  But not
necessarily the *central* allocator for everything.  Ie. no need for
display driver to allocate vertex buffers for a separate gpu driver,
and that sort of thing.

BR,
-R

>
>
> Cheers,
>
> Tom
>
>
>
>
>

^ permalink raw reply

* Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Alex Deucher @ 2013-08-07 17:56 UTC (permalink / raw)
  To: Tom Cooksey
  Cc: Rob Clark, linux-fbdev, Pawel Moll, linux-kernel, dri-devel,
	linaro-mm-sig, linux-arm-kernel, linux-media
In-Reply-To: <520284fe.a16ec20a.2d3c.6e19SMTPIN_ADDED_BROKEN@mx.google.com>

On Wed, Aug 7, 2013 at 1:33 PM, Tom Cooksey <tom.cooksey@arm.com> wrote:
>
>> >> > Didn't you say that programmatically describing device placement
>> >> > constraints was an unbounded problem? I guess we would have to
>> >> > accept that it's not possible to describe all possible constraints
>> >> > and instead find a way to describe the common ones?
>> >>
>> >> well, the point I'm trying to make, is by dividing your constraints
>> >> into two groups, one that impacts and is handled by userspace, and
>> >> one that is in the kernel (ie. where the pages go), you cut down
>> >> the number of permutations that the kernel has to care about
>> >>  considerably. And kernel already cares about, for example, what
>> >> range of addresses that a device can dma to/from.  I think really
>> >> the only thing missing is the max # of sglist entries (contiguous
>> >> or not)
>> >
>> > I think it's more than physically contiguous or not.
>> >
>> > For example, it can be more efficient to use large page sizes on
>> > devices with IOMMUs to reduce TLB traffic. I think the size and even
>> > the availability of large pages varies between different IOMMUs.
>>
>> sure.. but I suppose if we can spiff out dma_params to express "I need
>> contiguous", perhaps we can add some way to express "I prefer
>> as-contiguous-as-possible".. either way, this is about where the pages
>> are placed, and not about the layout of pixels within the page, so
>> should be in kernel.  It's something that is missing, but I believe
>> that it belongs in dma_params and hidden behind dma_alloc_*() for
>> simple drivers.
>
> Thinking about it, isn't this more a property of the IOMMU? I mean,
> are there any cases where an IOMMU had a large page mode but you
> wouldn't want to use it? So when allocating the memory, you'd have to
> take into account not just the constraints of the devices themselves,
> but also of any IOMMUs any of the device sit behind?
>
>
>> > There's also the issue of buffer stride alignment. As I say, if the
>> > buffer is to be written by a tile-based GPU like Mali, it's more
>> > efficient if the buffer's stride is aligned to the max AXI bus burst
>> > length. Though I guess a buffer stride only makes sense as a concept
>> > when interpreting the data as a linear-layout 2D image, so perhaps
>> > belongs in user-space along with format negotiation?
>> >
>>
>> Yeah.. this isn't about where the pages go, but about the arrangement
>> within a page.
>>
>> And, well, except for hw that supports the same tiling (or
>> compressed-fb) in display+gpu, you probably aren't sharing tiled
>> buffers.
>
> You'd only want to share a buffer between devices if those devices can
> understand the same pixel format. That pixel format can't be device-
> specific or opaque, it has to be explicit. I think drm_fourcc.h is
> what defines all the possible pixel formats. This is the enum I used
> in EGL_EXT_image_dma_buf_import at least. So if we get to the point
> where multiple devices can understand a tiled or compressed format, I
> assume we could just add that format to drm_fourcc.h and possibly
> v4l2's v4l2_mbus_pixelcode enum in v4l2-mediabus.h.
>
> For user-space to negotiate a common pixel format and now stride
> alignment, I guess it will obviously need a way to query what pixel
> formats a device supports and what its stride alignment requirements
> are.
>
> I don't know v4l2 very well, but it certainly seems the pixel format
> can be queried using V4L2_SUBDEV_FORMAT_TRY when attempting to set
> a particular format. I couldn't however find a way to retrieve a list
> of supported formats - it seems the mechanism is to try out each
> format in turn to determine if it is supported. Is that right?
>
> There doesn't however seem a way to query what stride constraints a
> V4l2 device might have. Does HW abstracted by v4l2 typically have
> such constraints? If so, how can we query them such that a buffer
> allocated by a DRM driver can be imported into v4l2 and used with
> that HW?
>
> Turning to DRM/KMS, it seems the supported formats of a plane can be
> queried using drm_mode_get_plane. However, there doesn't seem to be a
> way to query the supported formats of a crtc? If display HW only
> supports scanning out from a single buffer (like pl111 does), I think
> it won't have any planes and a fb can only be set on the crtc. In
> which case, how should user-space query which pixel formats that crtc
> supports?
>
> Assuming user-space can query the supported formats and find a common
> one, it will need to allocate a buffer. Looks like
> drm_mode_create_dumb can do that, but it only takes a bpp parameter,
> there's no format parameter. I assume then that user-space defines
> the format and tells the DRM driver which format the buffer is in
> when creating the fb with drm_mode_fb_cmd2, which does take a format
> parameter? Is that right?
>
> As with v4l2, DRM doesn't appear to have a way to query the stride
> constraints? Assuming there is a way to query the stride constraints,
> there also isn't a way to specify them when creating a buffer with
> DRM, though perhaps the existing pitch parameter of
> drm_mode_create_dumb could be used to allow user-space to pass in a
> minimum stride as well as receive the allocated stride?
>
>
>> >> > One problem with this is it duplicates a lot of logic in each
>> >> > driver which can export a dma_buf buffer. Each exporter will
>> >> > need to do pretty much the same thing: iterate over all the
>> >> > attachments, determine of all the constraints (assuming that
>> >> > can be done) and allocate pages such that the lowest-common-
>> >> > denominator is satisfied. Perhaps rather than duplicating that
>> >> > logic in every driver, we could instead move allocation of the
>> >> > backing pages into dma_buf itself?
>> >>
>> >> I tend to think it is better to add helpers as we see common
>>
>> >> patterns emerge, which drivers can opt-in to using.  I don't
>> >> think that we should move allocation into dma_buf itself, but
>> >> it would perhaps be useful to have dma_alloc_*() variants that
>> >> could allocate for multiple devices.
>> >
>> > A helper could work I guess, though I quite like the idea of
>> > having dma_alloc_*() variants which take a list of devices to
>> > allocate memory for.
>> >
>> >
>> >> That would help for simple stuff, although I'd suspect
>> >> eventually a GPU driver will move away from that.  (Since you
>> >> probably want to play tricks w/ pools of pages that are
>> >> pre-zero'd and in the correct cache state, use spare cycles on
>> >> the gpu or dma engine to pre-zero uncached pages, and games
>> >> like that.)
>> >
>> > So presumably you're talking about a GPU driver being the exporter
>> > here? If so, how could the GPU driver do these kind of tricks on
>> > memory shared with another device?
>>
>> Yes, that is gpu-as-exporter.  If someone else is allocating buffers,
>> it is up to them to do these tricks or not.  Probably there is a
>> pretty good chance that if you aren't a GPU you don't need those sort
>> of tricks for fast allocation of transient upload buffers, staging
>> textures, temporary pixmaps, etc.  Ie. I don't really think a v4l
>> camera or video decoder would benefit from that sort of optimization.
>
> Right - but none of those are really buffers you'd want to export with
> dma_buf to share with another device are they? In which case, why not
> just have dma_buf figure out the constraints and allocate the memory?
>
> If a driver needs to allocate memory in a special way for a particular
> device, I can't really imagine how it would be able to share that
> buffer with another device using dma_buf? I guess a driver is likely
> to need some magic voodoo to configure access to the buffer for its
> device, but surely that would be done by the dma_mapping framework
> when dma_buf_map happens?
>
>
>
>> >> You probably want to get out of the SoC mindset, otherwise you are
>> >> going to make bad assumptions that come back to bite you later on.
>> >
>> > Sure - there are always going to be PC-like devices where the
>> > hardware configuration isn't fixed like it is on a traditional SoC.
>> > But I'd rather have a simple solution which works on traditional SoCs
>> > than no solution at all. Today our solution is to over-load the dumb
>> > buffer alloc functions of the display's DRM driver - For now I'm just
>> > looking for the next step up from that! ;-)
>>
>> True.. the original intention, which is perhaps a bit desktop-centric,
>> really was for there to be a userspace component talking to the drm
>> driver for allocation, ie. xf86-video-foo and/or
>> src/gallium/drivers/foo (for example) ;-)
>>
>> Which means for x11 having a SoC vendor specific xf86-video-foo for
>> x11..  or vendor specific gbm implementation for wayland.  (Although
>> at least in the latter case it is a pretty small piece of code.)  But
>> that is probably what you are trying to avoid.
>
> I've been trying to get my head around how PRIME relates to DDX
> drivers. As I understand it (which is likely wrong), you have a laptop
> with both an Intel & an nVidia GPU. You have both the i915 & nouveau
> kernel drivers loaded. What I'm not sure about is which GPU's display
> controller is actually hooked up to the physical connector? Perhaps
> there is a MUX like there is on Versatile Express?
>
> What I also don't understand is what DDX driver is loaded? Is it
> xf86-video-intel, xf86-video-nouveau or both? I get the impression
> that there's a "master" DDX which implements 2D operations but can
> import buffers using PRIME from the other driver and draw to them.
> Or is it more that it's able to export rendered buffers to the
> "slave" DRM for scanout? Either way, it's pretty similar to an ARM
> SoC setup which has the GPU and the display as two totally
> independent devices.
>

In the early days, there was a MUX to switch the displays between the
two GPUs, but most modern systems are MUX-less and the dGPU is either
connected to no displays or in some cases the local panel is attached
to the integrated GPU and the external displays are connected to the
dGPU.

In the MUX-less case, the dGPU can be used to render, and then the
result is copied to the integrated GPU for display.

Alex

^ permalink raw reply

* Re: [RFC 1/1] drm/pl111: Initial drm/kms driver for pl111
From: Daniel Vetter @ 2013-08-07 16:46 UTC (permalink / raw)
  To: Rob Clark; +Cc: dri-devel, linux-fbdev, pawel.moll, linux-arm-kernel
In-Reply-To: <CAF6AEGvGG1-4k-3_YHQ2ES6JEb-V-Xuicc8gfw9rPWze5JUEDg@mail.gmail.com>

Just comment a bit on Rob's review with my own opinion.

On Wed, Aug 07, 2013 at 12:17:21PM -0400, Rob Clark wrote:
> On Thu, Jul 25, 2013 at 1:17 PM,  <tom.cooksey@arm.com> wrote:
> > From: Tom Cooksey <tom.cooksey@arm.com>
> >
> > This is a mode-setting driver for the pl111 CLCD display controller
> > found on various ARM reference platforms such as the Versatile
> > Express. The driver supports setting of a single mode (640x480) and
> > has only been tested on Versatile Express with a Cortex-A9 core tile.
> >
> > Known issues:
> >  * It still includes code to use KDS, which is not going upstream.
> 
> review's on http://lists.freedesktop.org/archives/dri-devel/2013-July/042462.html
> can't hurt
> 
> although you might consider submitting a reduced functionality driver
> w/ KDS bits removed in the mean time.. then when the fence stuff is
> merged it is just an incremental patch rather than a whole driver ;-)

Yeah, I think the KDS bits and comments need to go first before merginge.


> > +/*
> > + * Number of flips allowed in flight at any one time. Any more flips requested
> > + * beyond this value will cause the caller to block until earlier flips have
> > + * completed.
> > + *
> > + * For performance reasons, this must be greater than the number of buffers
> > + * used in the rendering pipeline. Note that the rendering pipeline can contain
> > + * different types of buffer, e.g.:
> > + * - 2 final framebuffers
> > + * - >2 geometry buffers for GPU use-cases
> > + * - >2 vertex buffers for GPU use-cases
> > + *
> > + * For example, a system using 5 geometry buffers could have 5 flips in flight,
> > + * and so NR_FLIPS_IN_FLIGHT_THRESHOLD must be 5 or greater.
> > + *
> > + * Whilst there may be more intermediate buffers (such as vertex/geometry) than
> > + * final framebuffers, KDS is used to ensure that GPU rendering waits for the
> > + * next off-screen buffer, so it doesn't overwrite an on-screen buffer and
> > + * produce tearing.
> > + */
> > +
> 
> fwiw, this is at least different from how other drivers do triple (or
> > double) buffering.  In other drivers (intel, omap, and
> msm/freedreno, that I know of, maybe others too) the xorg driver dri2
> bits implement the double buffering (ie. send flip event back to
> client immediately and queue up the flip and call page-flip after the
> pageflip event back from kernel.
> 
> I'm not saying not to do it this way, I guess I'd like to hear what
> other folks think.  I kinda prefer doing this in userspace as it keeps
> the kernel bits simpler (plus it would then work properly on exynosdrm
> or other kms drivers).

Yeah, if this is just a sw queue then I don't think it makes sense to have
it in the kernel. Afaik the current pageflip interface drm exposes allows
one oustanding flip only, and you _must_ wait for the flip complete event
before you can submit the second one.

Ofc if your hardware as a hw-based flip queue (maybe even with frame
targets) that's a different matter, but currently we don't have a drm
interface to expose this. I'd say for merging the basic driver first we
should go with the existing simple pageflip semantics.

And tbh I don't understand why the amount of buffers you keep in the
render pipeline side of things matters here at all. But I also haven't
read the details of your driver code.

> 
> > +/*
> > + * Here, we choose a conservative value. A lower value is most likely
> > + * suitable for GPU use-cases.
> > + */
> > +#define NR_FLIPS_IN_FLIGHT_THRESHOLD 16
> > +
> > +#define CLCD_IRQ_NEXTBASE_UPDATE (1u<<2)
> > +
> > +struct pl111_drm_flip_resource;
> > +struct pl111_drm_cursor_plane;
> > +
> > +enum pl111_bo_type {
> > +       PL111_BOT_DMA,
> > +       PL111_BOT_SHM
> > +};
> > +
> > +struct pl111_gem_bo_dma {
> > +       dma_addr_t fb_dev_addr;
> > +       void *fb_cpu_addr;
> > +};
> > +
> > +struct pl111_gem_bo_shm {
> > +       struct page **pages;
> > +       dma_addr_t *dma_addrs;
> > +};
> > +
> > +struct pl111_gem_bo {
> > +       struct drm_gem_object gem_object;
> > +       enum pl111_bo_type type;
> > +       union {
> > +               struct pl111_gem_bo_dma dma;
> > +               struct pl111_gem_bo_shm shm;
> > +       } backing_data;
> > +       struct drm_framebuffer *fb;
> 
> this is at least a bit odd.. normally the fb has ref to the bo(s) and
> not the other way around.  And the same bo could be referenced by
> multiple fb's which would kinda fall down with this approach.

I'd say that's just backwards, framebuffers are created from backing
storage objects (which for a gem based driver is a gem object), not the
other way round. What's this exactly used for?

[snip]

> > +
> > +       /*
> > +        * Used to prevent race between pl111_dma_buf_release and
> > +        * drm_gem_prime_handle_to_fd
> > +        */
> > +       struct mutex export_dma_buf_lock;
> 
> hmm, seems a bit suspicious.. the handle reference should keep the
> object live.  Ie. either drm_gem_object_lookup() will fail because the
> object is gone (userspace has closed it's handle ref and
> dmabuf->release() already dropped it's ref) or it will succeed and
> you'll have a reference to the bo keeping it from going away if the
> release() comes after.

The race is real, I have an evil testcase here which Oopses my kernel. I'm
working on a fix (v1 of my patches is submitted a few weeks back, awaiting
review), but I need to rework a few things since now I've also spotted a
leak or two ;-)

[snip]

> > +static void vsync_worker(struct work_struct *work)
> > +{
> > +       struct pl111_drm_flip_resource *flip_res;
> > +       struct pl111_gem_bo *bo;
> > +       struct pl111_drm_crtc *pl111_crtc;
> > +       struct drm_device *dev;
> > +       int flips_in_flight;
> > +       flip_res > > +               container_of(work, struct pl111_drm_flip_resource, vsync_work);
> > +
> > +       pl111_crtc = to_pl111_crtc(flip_res->crtc);
> > +       dev = pl111_crtc->crtc.dev;
> > +
> > +       DRM_DEBUG_KMS("DRM Finalizing flip_res=%p\n", flip_res);
> > +
> > +       bo = PL111_BO_FROM_FRAMEBUFFER(flip_res->fb);
> > +#ifdef CONFIG_DMA_SHARED_BUFFER_USES_KDS
> > +       if (flip_res->worker_release_kds = true) {
> > +               spin_lock(&pl111_crtc->current_displaying_lock);
> > +               release_kds_resource_and_display(flip_res);
> > +               spin_unlock(&pl111_crtc->current_displaying_lock);
> > +       }
> > +#endif
> > +       /* Release DMA buffer on this flip */
> > +       if (bo->gem_object.export_dma_buf != NULL)
> > +               dma_buf_put(bo->gem_object.export_dma_buf);
> 
> I think you just want to unref the outgoing bo, and let it drop the
> dmabuf ref when the file ref of the imported bo goes.  Or actually, it
> would be better to hold/drop ref's to the fb, rather than the bo.  At
> least this will make things simpler if you ever have multi-planar
> support.

Drivers have no business frobbing around the dma-buf refcount of imported
objects imo, at least if they use all the standard drm prime
infrastructure. And if they're bugs they need to be fixed there, not in
drivers.

[snip]

> > +struct drm_framebuffer *pl111_fb_create(struct drm_device *dev,
> > +                                       struct drm_file *file_priv,
> > +                                       struct drm_mode_fb_cmd2 *mode_cmd)
> > +{
> > +       struct pl111_drm_framebuffer *pl111_fb = NULL;
> > +       struct drm_framebuffer *fb = NULL;
> > +       struct drm_gem_object *gem_obj;
> > +       struct pl111_gem_bo *bo;
> > +
> > +       pr_info("DRM %s\n", __func__);
> > +       gem_obj = drm_gem_object_lookup(dev, file_priv, mode_cmd->handles[0]);
> > +       if (gem_obj = NULL) {
> > +               DRM_ERROR("Could not get gem obj from handle to create fb\n");
> > +               goto out;
> > +       }
> > +
> > +       bo = PL111_BO_FROM_GEM(gem_obj);
> > +       /* Don't even attempt PL111_BOT_SHM, it's not contiguous */
> > +       BUG_ON(bo->type != PL111_BOT_DMA);
> 
> umm, no BUG_ON() is not really a good way to validate userspace input..
> 
>   if (bo->type != ...)
>     return ERR_PTR(-EINVAL);

Yep.

> > +
> > +       switch ((char)(mode_cmd->pixel_format & 0xFF)) {
> > +       case 'Y':
> > +       case 'U':
> > +       case 'V':
> > +       case 'N':
> > +       case 'T':
> 
> perhaps we should instead add a drm_format_is_yuv().. or you could
> (ab)use drm_fb_get_bpp_depth()..

Yeah, I think a new drm_format_is_yuv is asked-for here. Now the bigger
question is why you need this, since the drm core should filter out
formats not in your list of supported ones. Or at least it should ...

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply

* Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Rob Clark @ 2013-08-07 13:27 UTC (permalink / raw)
  To: John Stultz
  Cc: Tom Cooksey, linux-fbdev, Pawel Moll, lkml, dri-devel,
	linaro-mm-sig, linux-arm-kernel, linux-media,
	Rebecca Schultz Zavin, Erik Gilling, Ross Oldfield
In-Reply-To: <CALAqxLW_rjS_bbDXDrPrnBRLbegs9TVmPDnpNVYuoQjaVv3tPw@mail.gmail.com>

On Wed, Aug 7, 2013 at 12:23 AM, John Stultz <john.stultz@linaro.org> wrote:
> On Tue, Aug 6, 2013 at 5:15 AM, Rob Clark <robdclark@gmail.com> wrote:
>> well, let's divide things up into two categories:
>>
>> 1) the arrangement and format of pixels.. ie. what userspace would
>> need to know if it mmap's a buffer.  This includes pixel format,
>> stride, etc.  This should be negotiated in userspace, it would be
>> crazy to try to do this in the kernel.
>>
>> 2) the physical placement of the pages.  Ie. whether it is contiguous
>> or not.  Which bank the pages in the buffer are placed in, etc.  This
>> is not visible to userspace.  This is the purpose of the attach step,
>> so you know all the devices involved in sharing up front before
>> allocating the backing pages.  (Or in the worst case, if you have a
>> "late attacher" you at least know when no device is doing dma access
>> to a buffer and can reallocate and move the buffer.)  A long time
>
> One concern I know the Android folks have expressed previously (and
> correct me if its no longer an objection), is that this attach time
> in-kernel constraint solving / moving or reallocating buffers is
> likely to hurt determinism.  If I understood, their perspective was
> that userland knows the device path the buffers will travel through,
> so why not leverage that knowledge, rather then having the kernel have
> to sort it out for itself after the fact.

If you know the device path, then attach the buffer at all the devices
before you start using it.  Problem solved.. kernel knows all devices
before pages need be allocated ;-)

BR,
-R

^ permalink raw reply

* RE: [PATCH 4/5] fb: Add DCU framebuffer driver for Vybrid VF610 platform
From: Wang Huan-B18965 @ 2013-08-07  8:07 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1375778859.4276.25.camel@weser.hi.pengutronix.de>

DQo+IEFtIEZyZWl0YWcsIGRlbiAxMi4wNy4yMDEzLCAxNDowNyArMDgwMCBzY2hyaWViIEFsaXNv
biBXYW5nOg0KPiA+IFRoZSBEaXNwbGF5IENvbnRyb2xsZXIgVW5pdCAoRENVKSBtb2R1bGUgaXMg
YSBzeXN0ZW0gbWFzdGVyIHRoYXQNCj4gPiBmZXRjaGVzIGdyYXBoaWNzIHN0b3JlZCBpbiBpbnRl
cm5hbCBvciBleHRlcm5hbCBtZW1vcnkgYW5kIGRpc3BsYXlzDQo+ID4gdGhlbSBvbiBhIFRGVCBM
Q0QgcGFuZWwuIEEgd2lkZSByYW5nZSBvZiBwYW5lbCBzaXplcyBpcyBzdXBwb3J0ZWQgYW5kDQo+
ID4gdGhlIHRpbWluZyBvZiB0aGUgaW50ZXJmYWNlIHNpZ25hbHMgaXMgaGlnaGx5IGNvbmZpZ3Vy
YWJsZS4NCj4gPiBHcmFwaGljcyBhcmUgcmVhZCBkaXJlY3RseSBmcm9tIG1lbW9yeSBhbmQgdGhl
biBibGVuZGVkIGluIHJlYWwtdGltZSwNCj4gPiB3aGljaCBhbGxvd3MgZm9yIGR5bmFtaWMgY29u
dGVudCBjcmVhdGlvbiB3aXRoIG1pbmltYWwgQ1BVDQo+IGludGVydmVudGlvbi4NCj4gPg0KPiA+
IFRoZSBmZWF0dXJlczoNCj4gPg0KPiA+ICgxKSBGdWxsIFJHQjg4OCBvdXRwdXQgdG8gVEZUIExD
RCBwYW5lbC4NCj4gPiAoMikgRm9yIHRoZSBjdXJyZW50IExDRCBwYW5lbCwgV1FWR0EgIjQ4MHgy
NzIiIGlzIHRlc3RlZC4NCj4gPiAoMykgQmxlbmRpbmcgb2YgZWFjaCBwaXhlbCB1c2luZyB1cCB0
byA0IHNvdXJjZSBsYXllcnMgZGVwZW5kZW50IG9uDQo+IHNpemUgb2YgcGFuZWwuDQo+ID4gKDQp
IEVhY2ggZ3JhcGhpYyBsYXllciBjYW4gYmUgcGxhY2VkIHdpdGggb25lIHBpeGVsIHJlc29sdXRp
b24gaW4NCj4gZWl0aGVyIGF4aXMuDQo+ID4gKDUpIEVhY2ggZ3JhcGhpYyBsYXllciBzdXBwb3J0
IFJHQjU2NSBhbmQgUkdCODg4IGRpcmVjdCBjb2xvcnMNCj4gd2l0aG91dA0KPiA+IGFscGhhIGNo
YW5uZWwgYW5kIEJHUkE4ODg4IGRpcmVjdCBjb2xvcnMgd2l0aCBhbiBhbHBoYSBjaGFubmVsLg0K
PiA+ICg2KSBFYWNoIGdyYXBoaWMgbGF5ZXIgc3VwcG9ydCBhbHBoYSBibGVuZGluZyB3aXRoIDgt
Yml0IHJlc29sdXRpb24uDQo+ID4NCj4gPiBUaGlzIGRyaXZlciBoYXMgYmVlbiB0ZXN0ZWQgb24g
VnlicmlkIFZGNjEwIFRPV0VSIGJvYXJkLg0KPiA+DQo+ID4gU2lnbmVkLW9mZi1ieTogQWxpc29u
IFdhbmcgPGIxODk2NUBmcmVlc2NhbGUuY29tPg0KPiA+IC0tLQ0KPiA+IENoYW5nZXMgaW4gdjI6
IE5vbmUNCj4gPg0KPiA+ICBkcml2ZXJzL3ZpZGVvL0tjb25maWcgICAgICB8ICAgIDkgKw0KPiA+
ICBkcml2ZXJzL3ZpZGVvL01ha2VmaWxlICAgICB8ICAgIDEgKw0KPiA+ICBkcml2ZXJzL3ZpZGVv
L2ZzbC1kY3UtZmIuYyB8IDEwOTENCj4gPiArKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr
KysrKysrKysrKysrKw0KPiA+ICAzIGZpbGVzIGNoYW5nZWQsIDExMDEgaW5zZXJ0aW9ucygrKQ0K
PiA+ICBjcmVhdGUgbW9kZSAxMDA2NDQgZHJpdmVycy92aWRlby9mc2wtZGN1LWZiLmMNCj4gPg0K
PiBbLi4uXQ0KPiA+ICtlbnVtIG1mYl9pbmRleCB7DQo+ID4gKwlMQVlFUjAgPSAwLA0KPiA+ICsJ
TEFZRVIxLA0KPiA+ICsJTEFZRVIyLA0KPiA+ICsJTEFZRVIzLA0KPiA+ICt9Ow0KPiBXaHkgYXJl
IHRoZXJlIG9ubHkgNCBsYXllcnMgaGVyZT8gSSB0aG91Z2h0IHRoZSBjb250cm9sbGVyIHN1cHBv
cnRzIGF0DQo+IGxlYXN0IDYgc2ltdWx0YW5lb3VzIGxheWVycz8NCltBbGlzb24gV2FuZ10gSSB1
c2VkIDQgbGF5ZXJzIGZvciB0aGUgY3VzdG9tZXIncyByZXF1aXJlbWVudCBiZWZvcmUuIEFueXdh
eSwgSSB3aWxsIGNoYW5nZSBpdCB0byA2IGluIG5leHQgdmVyc2lvbi4NCkJUVywgYWNjb3JkaW5n
IHRvIHRoZSBSTSwgaXQgc2FpZCAiQmxlbmRpbmcgb2YgZWFjaCBwaXhlbCB1c2luZyB1cCB0byA2
IHNvdXJjZSBsYXllcnMgZGVwZW5kZW50IG9uIHNpemUgb2YgcGFuZWwiLiBTbyBJIHRoaW5rIHRo
ZSBtb3N0IHNpbXVsdGFuZW91cyBsYXllcnMgaXMgNi4NCj4gDQo+ID4gKw0KPiA+ICtzdGF0aWMg
c3RydWN0IG1mYl9pbmZvIG1mYl90ZW1wbGF0ZVtdID0gew0KPiA+ICsJew0KPiA+ICsJLmluZGV4
ID0gTEFZRVIwLA0KPiA+ICsJLmlkID0gIkxheWVyMCIsDQo+ID4gKwkuYWxwaGEgPSAweGZmLA0K
PiA+ICsJLmJsZW5kID0gMCwNCj4gPiArCS5jb3VudCA9IDAsDQo+ID4gKwkueF9sYXllcl9kID0g
MCwNCj4gPiArCS55X2xheWVyX2QgPSAwLA0KPiA+ICsJfSwNCj4gPiArCXsNCj4gPiArCS5pbmRl
eCA9IExBWUVSMSwNCj4gPiArCS5pZCA9ICJMYXllcjEiLA0KPiA+ICsJLmFscGhhID0gMHhmZiwN
Cj4gPiArCS5ibGVuZCA9IDAsDQo+ID4gKwkuY291bnQgPSAwLA0KPiA+ICsJLnhfbGF5ZXJfZCA9
IDUwLA0KPiA+ICsJLnlfbGF5ZXJfZCA9IDUwLA0KPiA+ICsJfSwNCj4gPiArCXsNCj4gPiArCS5p
bmRleCA9IExBWUVSMiwNCj4gPiArCS5pZCA9ICJMYXllcjIiLA0KPiA+ICsJLmFscGhhID0gMHhm
ZiwNCj4gPiArCS5ibGVuZCA9IDAsDQo+ID4gKwkuY291bnQgPSAwLA0KPiA+ICsJLnhfbGF5ZXJf
ZCA9IDEwMCwNCj4gPiArCS55X2xheWVyX2QgPSAxMDAsDQo+ID4gKwl9LA0KPiA+ICsJew0KPiA+
ICsJLmluZGV4ID0gTEFZRVIzLA0KPiA+ICsJLmlkID0gIkxheWVyMyIsDQo+ID4gKwkuYWxwaGEg
PSAweGZmLA0KPiA+ICsJLmJsZW5kID0gMCwNCj4gPiArCS5jb3VudCA9IDAsDQo+ID4gKwkueF9s
YXllcl9kID0gMTUwLA0KPiA+ICsJLnlfbGF5ZXJfZCA9IDE1MCwNCj4gPiArCX0sDQo+ID4gK307
DQo+IFsuLi5dDQo+ID4gK3N0YXRpYyBpbnQgZnNsX2RjdV9yZWxlYXNlKHN0cnVjdCBmYl9pbmZv
ICppbmZvLCBpbnQgdXNlcikgew0KPiA+ICsJc3RydWN0IG1mYl9pbmZvICptZmJpID0gaW5mby0+
cGFyOw0KPiA+ICsJaW50IHJldCA9IDA7DQo+ID4gKw0KPiA+ICsJbWZiaS0+Y291bnQtLTsNCj4g
PiArCWlmIChtZmJpLT5jb3VudCA9PSAwKSB7DQo+ID4gKwkJcmV0ID0gZGlzYWJsZV9wYW5lbChp
bmZvKTsNCj4gPiArCQlpZiAocmV0IDwgMCkNCj4gPiArCQkJbWZiaS0+Y291bnQrKzsNCj4gPiAr
CX0NCj4gPiArDQo+ID4gKwlyZXR1cm4gcmV0Ow0KPiA+ICt9DQo+IENvdWxkIHRoaXMgYmUgcmVw
bGFjZWQgYnkgcnVudGltZSBwbT8NCltBbGlzb24gV2FuZ10gWWVzLCBJIHdpbGwgdXNlIHJ1bnRp
bWUgcG0uDQo+IFsuLi5dDQo+ID4gK3N0YXRpYyBpbnQgYnlwYXNzX3Rjb24oc3RydWN0IGRldmlj
ZV9ub2RlICpucCkgew0KPiA+ICsJc3RydWN0IGRldmljZV9ub2RlICp0Y29uX25wOw0KPiA+ICsJ
c3RydWN0IHBsYXRmb3JtX2RldmljZSAqdGNvbl9wZGV2Ow0KPiA+ICsJc3RydWN0IGNsayAqdGNv
bl9jbGs7DQo+ID4gKwl2b2lkIF9faW9tZW0gKnRjb25fcmVnOw0KPiA+ICsJaW50IHJldCA9IDA7
DQo+ID4gKw0KPiA+ICsJdGNvbl9ucCA9IG9mX3BhcnNlX3BoYW5kbGUobnAsICJ0Y29uLWNvbnRy
b2xsZXIiLCAwKTsNCj4gPiArCWlmICghdGNvbl9ucCkNCj4gPiArCQlyZXR1cm4gLUVJTlZBTDsN
Cj4gPiArDQo+ID4gKwl0Y29uX3BkZXYgPSBvZl9maW5kX2RldmljZV9ieV9ub2RlKHRjb25fbnAp
Ow0KPiA+ICsJaWYgKCF0Y29uX3BkZXYpDQo+ID4gKwkJcmV0dXJuIC1FSU5WQUw7DQo+ID4gKw0K
PiA+ICsJdGNvbl9jbGsgPSBkZXZtX2Nsa19nZXQoJnRjb25fcGRldi0+ZGV2LCAidGNvbiIpOw0K
PiA+ICsJaWYgKElTX0VSUih0Y29uX2NsaykpIHsNCj4gPiArCQlyZXQgPSBQVFJfRVJSKHRjb25f
Y2xrKTsNCj4gPiArCQlnb3RvIGZhaWxlZF9nZXRjbG9jazsNCj4gPiArCX0NCj4gPiArCWNsa19w
cmVwYXJlX2VuYWJsZSh0Y29uX2Nsayk7DQo+ID4gKw0KPiA+ICsJdGNvbl9yZWcgPSBvZl9pb21h
cCh0Y29uX25wLCAwKTsNCj4gPiArCWlmICghdGNvbl9yZWcpIHsNCj4gPiArCQlyZXQgPSAtRU5P
TUVNOw0KPiA+ICsJCWdvdG8gZmFpbGVkX2lvcmVtYXA7DQo+ID4gKwl9DQo+ID4gKwl3cml0ZWwo
VENPTl9CWVBBU1NfRU5BQkxFLCB0Y29uX3JlZyArIFRDT05fQ1RSTDEpOw0KPiA+ICsNCj4gPiAr
CXJldHVybiAwOw0KPiA+ICsNCj4gPiArZmFpbGVkX2lvcmVtYXA6DQo+ID4gKwljbGtfZGlzYWJs
ZV91bnByZXBhcmUodGNvbl9jbGspOw0KPiA+ICtmYWlsZWRfZ2V0Y2xvY2s6DQo+ID4gKwlvZl9u
b2RlX3B1dCh0Y29uX25wKTsNCj4gPiArCXJldHVybiByZXQ7DQo+ID4gK30NCj4gPiArDQo+IElz
IHRoZSBmcmFtZWJ1ZmZlciBkcml2ZXIgdGhlIG9ubHkgdXNlciBvZiB0Y29uPyBJZiBub3QgeW91
IHNob3VsZCBub3QNCj4gbWFwIHRoZSBtZW1vcnkgaGVyZSwgYnV0IHJhdGhlciB1c2Ugc29tZXRo
aW5nIGxpa2UgcmVnbWFwIHN5c2Nvbi4NCltBbGlzb24gV2FuZ10gWWVzLCB0aGUgZmIgZHJpdmVy
IGlzIHRoZSBvbmx5IHVzZXIgb2YgdGNvbi4NCg0KVGhhbmtzIGZvciB5b3VyIGNvbW1lbnRzIQ0K
DQoNCkJlc3QgUmVnYXJkcywNCkFsaXNvbiBXYW5nDQoNCg0KDQo

^ permalink raw reply

* Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: John Stultz @ 2013-08-07  4:23 UTC (permalink / raw)
  To: Rob Clark
  Cc: linux-fbdev, Erik Gilling, Pawel Moll, lkml, dri-devel,
	linaro-mm-sig, Ross Oldfield, Tom Cooksey, Rebecca Schultz Zavin,
	linux-arm-kernel, linux-media
In-Reply-To: <CAF6AEGvXcpTKrTjhvrycLqab6F9QP5fAk0ZEWxJ-WvE==PiPsA@mail.gmail.com>

On Tue, Aug 6, 2013 at 5:15 AM, Rob Clark <robdclark@gmail.com> wrote:
> On Tue, Aug 6, 2013 at 7:31 AM, Tom Cooksey <tom.cooksey@arm.com> wrote:
>>
>>> > So in some respects, there is a constraint on how buffers which will
>>> > be drawn to using the GPU are allocated. I don't really like the idea
>>> > of teaching the display controller DRM driver about the GPU buffer
>>> > constraints, even if they are fairly trivial like this. If the same
>>> > display HW IP is being used on several SoCs, it seems wrong somehow
>>> > to enforce those GPU constraints if some of those SoCs don't have a
>>> > GPU.
>>>
>>> Well, I suppose you could get min_pitch_alignment from devicetree, or
>>> something like this..
>>>
>>> In the end, the easy solution is just to make the display allocate to
>>> the worst-case pitch alignment.  In the early days of dma-buf
>>> discussions, we kicked around the idea of negotiating or
>>> programatically describing the constraints, but that didn't really
>>> seem like a bounded problem.
>>
>> Yeah - I was around for some of those discussions and agree it's not
>> really an easy problem to solve.
>>
>>
>>
>>> > We may also then have additional constraints when sharing buffers
>>> > between the display HW and video decode or even camera ISP HW.
>>> > Programmatically describing buffer allocation constraints is very
>>> > difficult and I'm not sure you can actually do it - there's some
>>> > pretty complex constraints out there! E.g. I believe there's a
>>> > platform where Y and UV planes of the reference frame need to be in
>>> > separate DRAM banks for real-time 1080p decode, or something like
>>> > that?
>>>
>>> yes, this was discussed.  This is different from pitch/format/size
>>> constraints.. it is really just a placement constraint (ie. where do
>>> the physical pages go).  IIRC the conclusion was to use a dummy
>>> devices with it's own CMA pool for attaching the Y vs UV buffers.
>>>
>>> > Anyway, I guess my point is that even if we solve how to allocate
>>> > buffers which will be shared between the GPU and display HW such that
>>> > both sets of constraints are satisfied, that may not be the end of
>>> > the story.
>>> >
>>>
>>> that was part of the reason to punt this problem to userspace ;-)
>>>
>>> In practice, the kernel drivers doesn't usually know too much about
>>> the dimensions/format/etc.. that is really userspace level knowledge.
>>> There are a few exceptions when the kernel needs to know how to setup
>>> GTT/etc for tiled buffers, but normally this sort of information is up
>>> at the next level up (userspace, and drm_framebuffer in case of
>>> scanout).  Userspace media frameworks like GStreamer already have a
>>> concept of format/caps negotiation.  For non-display<->gpu sharing, I
>>> think this is probably where this sort of constraint negotiation
>>> should be handled.
>>
>> I agree that user-space will know which devices will access the buffer
>> and thus can figure out at least a common pixel format. Though I'm not
>> so sure userspace can figure out more low-level details like alignment
>> and placement in physical memory, etc.
>
> well, let's divide things up into two categories:
>
> 1) the arrangement and format of pixels.. ie. what userspace would
> need to know if it mmap's a buffer.  This includes pixel format,
> stride, etc.  This should be negotiated in userspace, it would be
> crazy to try to do this in the kernel.
>
> 2) the physical placement of the pages.  Ie. whether it is contiguous
> or not.  Which bank the pages in the buffer are placed in, etc.  This
> is not visible to userspace.  This is the purpose of the attach step,
> so you know all the devices involved in sharing up front before
> allocating the backing pages.  (Or in the worst case, if you have a
> "late attacher" you at least know when no device is doing dma access
> to a buffer and can reallocate and move the buffer.)  A long time

One concern I know the Android folks have expressed previously (and
correct me if its no longer an objection), is that this attach time
in-kernel constraint solving / moving or reallocating buffers is
likely to hurt determinism.  If I understood, their perspective was
that userland knows the device path the buffers will travel through,
so why not leverage that knowledge, rather then having the kernel have
to sort it out for itself after the fact.

The concern about determinism even makes them hesitant about CMA, over
things like carveout, as they don't want to be moving pages around at
allocation time (which could hurt reasonable use cases like the time
it takes to launch a camera app - which is quite important). Though
maybe this concern will lessen as more CMA solutions ship.

I worry some of this split between fully general solutions vs
hard-coded known constraints is somewhat intractable. But what might
make it easier to get android folks interested in approaches like the
attach-time allocation / relocating on late-attach you're proposing is
if there is maybe some way, as you suggested, to hint the allocation
when they do know the device paths, so they can provide that insight
and can avoid *all* reallocations.

Then of course, there's the question of how to consistently hint
things for all the different driver allocation interfaces (and that,
maybe naively, leads to the central allocator approach).

thanks
-john

^ permalink raw reply

* Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: John Stultz @ 2013-08-07  3:57 UTC (permalink / raw)
  To: Rob Clark
  Cc: linux-fbdev, Erik Gilling, Pawel Moll, dri-devel, Ross Oldfield,
	Tom Cooksey, Rebecca Schultz Zavin, linux-arm-kernel
In-Reply-To: <CAF6AEGvFPGueM_LHVij9KFzM6NJySHCzmaLstuzZkK5GwP+6gQ@mail.gmail.com>

On Mon, Jul 29, 2013 at 7:58 AM, Rob Clark <robdclark@gmail.com> wrote:
> On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey <tom.cooksey@arm.com> wrote:
>> Hi Rob,
>>
>>> >  * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to also
>>> >    allocate buffers for the GPU. Still not sure how to resolve this
>>> >    as we don't use DRM for our GPU driver.
>>>
>>> any thoughts/plans about a DRM GPU driver?  Ideally long term (esp.
>>> once the dma-fence stuff is in place), we'd have gpu-specific drm
>>> (gpu-only, no kms) driver, and SoC/display specific drm/kms driver,
>>> using prime/dmabuf to share between the two.
>>
>> The "extra" buffers we were allocating from armsoc DDX were really
>> being allocated through DRM/GEM so we could get an flink name
>> for them and pass a reference to them back to our GPU driver on
>> the client side. If it weren't for our need to access those
>> extra off-screen buffers with the GPU we wouldn't need to
>> allocate them with DRM at all. So, given they are really "GPU"
>> buffers, it does absolutely make sense to allocate them in a
>> different driver to the display driver.
>>
>> However, to avoid unnecessary memcpys & related cache
>> maintenance ops, we'd also like the GPU to render into buffers
>> which are scanned out by the display controller. So let's say
>> we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
>> out buffers with the display's DRM driver but a custom ioctl
>> on the GPU's DRM driver to allocate non scanout, off-screen
>> buffers. Sounds great, but I don't think that really works
>> with DRI2. If we used two drivers to allocate buffers, which
>> of those drivers do we return in DRI2ConnectReply? Even if we
>> solve that somehow, GEM flink names are name-spaced to a
>> single device node (AFAIK). So when we do a DRI2GetBuffers,
>> how does the EGL in the client know which DRM device owns GEM
>> flink name "1234"? We'd need some pretty dirty hacks.
>
> You would return the name of the display driver allocating the
> buffers.  On the client side you can use generic ioctls to go from
> flink -> handle -> dmabuf.  So the client side would end up opening
> both the display drm device and the gpu, but without needing to know
> too much about the display.
>
> (Probably in (for example) mesa there needs to be a bit of work to
> handle this better, but I think that would be needed as well for
> sharing between, say, nouveau and udl displaylink driver.. which is
> really the same scenario.)
>
>> So then we looked at allocating _all_ buffers with the GPU's
>> DRM driver. That solves the DRI2 single-device-name and single
>> name-space issue. It also means the GPU would _never_ render
>> into buffers allocated through DRM_IOCTL_MODE_CREATE_DUMB.
>
> Well, I think we can differentiate between shared buffers and internal
> buffers (textures, vertex upload, etc, etc).
>
> For example, in mesa/gallium driver there are two paths to getting a
> buffer...  ->resource_create() and ->resource_from_handle(), the
> latter is the path you go for dri2 buffers shared w/ x11.  The former
> is buffers that are just internal to gpu (if !(bind &
> PIPE_BIND_SHARED)).  I guess you must have something similar in mali
> driver.
>
>> One thing I wasn't sure about is if there was an objection
>> to using PRIME to export scanout buffers allocated with
>> DRM_IOCTL_MODE_CREATE_DUMB and then importing them into a GPU
>> driver to be rendered into? Is that a concern?
>
> well.. it wasn't quite the original intention of the 'dumb' ioctls.
> But I guess in the end if you hand the gpu a buffer with a
> layout/format that it can't grok, then gpu does a staging texture plus
> copy.  If you had, for example, some setup w/ gpu that could only
> render to tiled format, plus a display that could scanout that format,
> then your DDX driver would need to allocate the dri2 buffers with
> something other than dumb ioctl.  (But at that point, you probably
> need to implement your own EXA so you could hook in your own custom
> upload-to/download-from screen fxns for sw fallbacks, so you're not
> talking about using generic DDX anyways.)
>
>> Anyway, that latter case also gets quite difficult. The "GPU"
>> DRM driver would need to know the constraints of the display
>> controller when allocating buffers intended to be scanned out.
>> For example, pl111 typically isn't behind an IOMMU and so
>> requires physically contiguous memory. We'd have to teach the
>> GPU's DRM driver about the constraints of the display HW. Not
>> exactly a clean driver model. :-(
>>
>> I'm still a little stuck on how to proceed, so any ideas
>> would greatly appreciated! My current train of thought is
>> having a kind of SoC-specific DRM driver which allocates
>> buffers for both display and GPU within a single GEM
>> namespace. That SoC-specific DRM driver could then know the
>> constraints of both the GPU and the display HW. We could then
>> use PRIME to export buffers allocated with the SoC DRM driver
>> and import them into the GPU and/or display DRM driver.
>
> Usually if the display drm driver is allocating the buffers that might
> be scanned out, it just needs to have minimal knowledge of the GPU
> (pitch alignment constraints).  I don't think we need a 3rd device
> just to allocate buffers.
>
> Really I don't think the separate display drm and gpu drm device is
> much different from desktop udl/displaylink case.  Or desktop
> integrated-gpu + external gpu.

Although on many SoCs, aren't the pipelines a bit more complicated?
(ie: camera data -> doing face detection, -> applying filters w/ the
gpu -> and displaying it).

While ION does have interface issues (in that it requires userland to
hard-code particular constraints for that specific SoC) just the idea
of having the centralized allocator that manages the constraints does
seem to make sense for these larger pipelines (and also the fact that
there's different allocation APIs for various devices).

One idea that was discussed in some of the hallway discussions at
connect was if there was some way for the hardware to expose a
"constraint cookie" via sysfs (basically a bit field where the meaning
of each bit is opaque to userland - so each bit can mean a different
constraint on different devices), userland could determine which
combination of devices it wanted to pipe a buffer through, then AND
the device constraint cookies together,  and pass that to a
centralized allocator. This allows the constraint solving to be pushed
out to userland, but also avoids what I see as the biggest negatives
in ION (ie: the heap mask is defined to userland, making it
inflexible, as well as the lack of a method for userland to discover
device constraints, so they're instead hard-coded in userland for each
device).

>> Note: While it doesn't use the DRM framework, the Mali T6xx
>> kernel driver has supported importing buffers through dma_buf
>> for some time. I've even written an EGL extension :-):
>>
>> <http://www.khronos.org/registry/egl/extensions/EXT/EGL_EXT_image_dma_buf_im
>> port.txt>
>>
>>
>>> I'm not entirely sure that the directions that the current CDF
>>> proposals are headed is necessarily the right way forward.  I'd prefer
>>> to see small/incremental evolution of KMS (ie. add drm_bridge and
>>> drm_panel, and refactor the existing encoder-slave).  Keeping it
>>> inside drm means that we can evolve it more easily, and avoid layers
>>> of glue code for no good reason.
>>
>> I think CDF could allow vendors to re-use code they've written
>> for their Android driver stack in DRM drivers more easily. Though
>> I guess ideally KMS would evolve to a point where it could be used
>> by an Android driver stack. I.e. Support explicit fences.
>>
>
> yeah, the best would be evolving DRM/KMS to the point that it meets
> android requirements.  Because I really don't think android has any
> special requirements compared to, say, a wayland compositor.  (Other
> than the way they do synchronization.  But that doesn't really *need*
> to be different, as far as I can tell.)  It would certainly be easier
> if android dev's actually participated in upstream linux graphics, and
> talked about what they needed, and maybe even contributed a patch or
> two.  But that is a whole different topic.

It also couldn't hurt to cc them on these discussions (Rebecca and Erik cc'ed :)

I think Ross covered much of the KMS issues at Linaro Connect (Video
here if you want to spend the time: http://youtu.be/AuoFABKS_Dk), and
indeed, the explicit syncing requirement is probably the biggest
barrier. My understanding is the additional parallelism/performance
that the explicit method provides is the key requirement (though maybe
Erik or Ross can chime in with further details? - I've been told Erik
has a long list as to why explicit is preferred over implicit and it
would be good to get that documented somewhere :)

I understand the implicit model is easier to program for, and Daniel
mentioned dynamic buffer eviction as having a requirement on the
implicit model, so I'm currently in the camp of assuming the
synchronization needs are different enough that one method alone (of
the two currently presented) likely won't work. My current hope is
there's some hybrid approach that can be used (implicitly maybe for
the majority of cases, but with the option of doing explicit syncing
when necessary).

thanks
-john (who's unfortunately heading off on vacation soon, so might not
be fast to reply)

^ permalink raw reply

* [PATCH] efifb: prevent null dereferences by removing unused array  indices from dmi_list
From: James Bates @ 2013-08-06 23:15 UTC (permalink / raw)
  To: Peter Jones, Jean-Christophe Plagniol-Villard, Tomi Valkeinen,
	linux-fbdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2368 bytes --]

Hi all,

The dmi_list array is initialized using gnu designated initializers, and therefore
contains fewer explicitly defined entries as there are elements in it. This
is because the enum above with M_blabla constants contains more items than the designated
initializer. Those elements not explicitly initialized are implicitly set to 0.

Now efifb_setup(), L.322 & L.323, loops through all these array elements, and performs
a strcmp o a field (optname) in each item. For non explicitly initialized elements this
will be a null pointer:

                        for (i = 0; i < M_UNKNOWN; i++) {
                                if (!strcmp(this_opt, dmi_list[i].optname) &&

On my macbook6,1 the predefined values are for some reason incorrect, and most parameters
are preset correctly by my efi bootloader (elilo). but stride/line_length is not detected
correctly and so I wish to set it explicitly using a "video=efifb:stride:2048" command-line
argument. Because of the above null dereference, an exception (presumably) occurs before
the parsing code (L.333) is ever reached. I say presumably since the mac hangs on boot
without a console, and I can therefore not see any output.

By removing the unused values from the enum, and thus preventing implicitly initialized items
in the dmi_list array, the null dereference does not occur, my customer command-line arg is
parsed correctly, and my console displays correctly.

Signed-off-by: James Bates <james.h.bates@gmail.com>
---
 drivers/video/efifb.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/video/efifb.c b/drivers/video/efifb.c
index 50fe668..52d1d88 100644
--- a/drivers/video/efifb.c
+++ b/drivers/video/efifb.c
@@ -50,12 +50,9 @@ enum {
        M_MINI_3_1,     /* Mac Mini, 3,1th gen */
        M_MINI_4_1,     /* Mac Mini, 4,1th gen */
        M_MB,           /* MacBook */
-       M_MB_2,         /* MacBook, 2nd rev. */
-       M_MB_3,         /* MacBook, 3rd rev. */
        M_MB_5_1,       /* MacBook, 5th rev. */
        M_MB_6_1,       /* MacBook, 6th rev. */
        M_MB_7_1,       /* MacBook, 7th rev. */
-       M_MB_SR,        /* MacBook, 2nd gen, (Santa Rosa) */
        M_MBA,          /* MacBook Air */
        M_MBA_3,        /* Macbook Air, 3rd rev */
        M_MBP,          /* MacBook Pro */
--
1.7.12.4 (Apple Git-37)

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 4151 bytes --]

^ permalink raw reply related

* Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Rob Clark @ 2013-08-06 19:06 UTC (permalink / raw)
  To: Tom Cooksey
  Cc: dri-devel, linux-fbdev, Pawel Moll, linux-arm-kernel, linux-media,
	linaro-mm-sig, linux-kernel
In-Reply-To: <52013482.e107c20a.27f9.ffffa718SMTPIN_ADDED_BROKEN@mx.google.com>

On Tue, Aug 6, 2013 at 1:38 PM, Tom Cooksey <tom.cooksey@arm.com> wrote:
>
>> >> ... This is the purpose of the attach step,
>> >> so you know all the devices involved in sharing up front before
>> >> allocating the backing pages. (Or in the worst case, if you have a
>> >> "late attacher" you at least know when no device is doing dma access
>> >> to a buffer and can reallocate and move the buffer.)  A long time
>> >> back, I had a patch that added a field or two to 'struct
>> >> device_dma_parameters' so that it could be known if a device
>> >> required contiguous buffers.. looks like that never got merged, so
>> >> I'd need to dig that back up and resend it.  But the idea was to
>> >> have the 'struct device' encapsulate all the information that would
>> >> be needed to do-the-right-thing when it comes to placement.
>> >
>> > As I understand it, it's up to the exporting device to allocate the
>> > memory backing the dma_buf buffer. I guess the latest possible point
>> > you can allocate the backing pages is when map_dma_buf is first
>> > called? At that point the exporter can iterate over the current set
>> > of attachments, programmatically determine the all the constraints of
>> > all the attached drivers and attempt to allocate the backing pages
>> > in such a way as to satisfy all those constraints?
>>
>> yes, this is the idea..  possibly some room for some helpers to help
>> out with this, but that is all under the hood from userspace
>> perspective
>>
>> > Didn't you say that programmatically describing device placement
>> > constraints was an unbounded problem? I guess we would have to
>> > accept that it's not possible to describe all possible constraints
>> > and instead find a way to describe the common ones?
>>
>> well, the point I'm trying to make, is by dividing your constraints
>> into two groups, one that impacts and is handled by userspace, and one
>> that is in the kernel (ie. where the pages go), you cut down the
>> number of permutations that the kernel has to care about considerably.
>>  And kernel already cares about, for example, what range of addresses
>> that a device can dma to/from.  I think really the only thing missing
>> is the max # of sglist entries (contiguous or not)
>
> I think it's more than physically contiguous or not.
>
> For example, it can be more efficient to use large page sizes on
> devices with IOMMUs to reduce TLB traffic. I think the size and even
> the availability of large pages varies between different IOMMUs.

sure.. but I suppose if we can spiff out dma_params to express "I need
contiguous", perhaps we can add some way to express "I prefer
as-contiguous-as-possible".. either way, this is about where the pages
are placed, and not about the layout of pixels within the page, so
should be in kernel.  It's something that is missing, but I believe
that it belongs in dma_params and hidden behind dma_alloc_*() for
simple drivers.

> There's also the issue of buffer stride alignment. As I say, if the
> buffer is to be written by a tile-based GPU like Mali, it's more
> efficient if the buffer's stride is aligned to the max AXI bus burst
> length. Though I guess a buffer stride only makes sense as a concept
> when interpreting the data as a linear-layout 2D image, so perhaps
> belongs in user-space along with format negotiation?
>

Yeah.. this isn't about where the pages go, but about the arrangement
within a page.

And, well, except for hw that supports the same tiling (or
compressed-fb) in display+gpu, you probably aren't sharing tiled
buffers.

>
>> > One problem with this is it duplicates a lot of logic in each
>> > driver which can export a dma_buf buffer. Each exporter will need to
>> > do pretty much the same thing: iterate over all the attachments,
>> > determine of all the constraints (assuming that can be done) and
>> > allocate pages such that the lowest-common-denominator is satisfied.
>> >
>> > Perhaps rather than duplicating that logic in every driver, we could
>> > Instead move allocation of the backing pages into dma_buf itself?
>> >
>>
>> I tend to think it is better to add helpers as we see common patterns
>> emerge, which drivers can opt-in to using.  I don't think that we
>> should move allocation into dma_buf itself, but it would perhaps be
>> useful to have dma_alloc_*() variants that could allocate for multiple
>> devices.
>
> A helper could work I guess, though I quite like the idea of having
> dma_alloc_*() variants which take a list of devices to allocate memory
> for.
>
>
>> That would help for simple stuff, although I'd suspect
>> eventually a GPU driver will move away from that.  (Since you probably
>> want to play tricks w/ pools of pages that are pre-zero'd and in the
>> correct cache state, use spare cycles on the gpu or dma engine to
>> pre-zero uncached pages, and games like that.)
>
> So presumably you're talking about a GPU driver being the exporter
> here? If so, how could the GPU driver do these kind of tricks on
> memory shared with another device?
>

Yes, that is gpu-as-exporter.  If someone else is allocating buffers,
it is up to them to do these tricks or not.  Probably there is a
pretty good chance that if you aren't a GPU you don't need those sort
of tricks for fast allocation of transient upload buffers, staging
textures, temporary pixmaps, etc.  Ie. I don't really think a v4l
camera or video decoder would benefit from that sort of optimization.

>
>> >> > Anyway, assuming user-space can figure out how a buffer should be
>> >> > stored in memory, how does it indicate this to a kernel driver and
>> >> > actually allocate it? Which ioctl on which device does user-space
>> >> > call, with what parameters? Are you suggesting using something
>> >> > like ION which exposes the low-level details of how buffers are
>> > >> laid out in physical memory to userspace? If not, what?
>> >> >
>> >>
>> >> no, userspace should not need to know this.  And having a central
>> >> driver that knows this for all the other drivers in the system
>> >> doesn't really solve anything and isn't really scalable.  At best
>> >> you might want, in some cases, a flag you can pass when allocating.
>> >> For example, some of the drivers have a 'SCANOUT' flag that can be
>> >> passed when allocating a GEM buffer, as a hint to the kernel that
>> >> 'if this hw requires contig memory for scanout, allocate this
>> >> buffer contig'. But really, when it comes to sharing buffers
>> >> between devices, we want this sort of information in dev->dma_params
>> >> of the importing device(s).
>> >
>> > If you had a single driver which knew the constraints of all devices
>> > on that particular SoC and the interface allowed user-space to
>> > specify which devices a buffer is intended to be used with, I guess
>> > it could pretty trivially allocate pages which satisfy those
> constraints?
>>
>> keep in mind, even a number of SoC's come with pcie these days.  You
>> already have things like
>>
>>   https://developer.nvidia.com/content/kayla-platform
>>
>> You probably want to get out of the SoC mindset, otherwise you are
>> going to make bad assumptions that come back to bite you later on.
>
> Sure - there are always going to be PC-like devices where the
> hardware configuration isn't fixed like it is on a traditional SoC.
> But I'd rather have a simple solution which works on traditional SoCs
> than no solution at all. Today our solution is to over-load the dumb
> buffer alloc functions of the display's DRM driver - For now I'm just
> looking for the next step up from that! ;-)
>

True.. the original intention, which is perhaps a bit desktop-centric,
really was for there to be a userspace component talking to the drm
driver for allocation, ie. xf86-video-foo and/or
src/gallium/drivers/foo (for example) ;-)

Which means for x11 having a SoC vendor specific xf86-video-foo for
x11..  or vendor specific gbm implementation for wayland.  (Although
at least in the latter case it is a pretty small piece of code.)  But
that is probably what you are trying to avoid.

At any rate, for both xorg and wayland/gbm, you know when a buffer is
going to be a scanout buffer.  What I'd recommend is define a small
userspace API that your customers (the SoC vendors) implement to
allocate a scanout buffer and hand you back a dmabuf fd.  That could
be used both for x11 and for gbm.  Inputs should be requested
width/height and format.  And outputs pitch plus dmabuf fd.

(Actually you might even just want to use gbm as your starting point.
You could probably just use gbm from xf86-video-armsoc for allocation,
to have one thing that works for both wayland and x11.  Scanout and
cursor buffers should go to vendor/SoC specific fxn, rest can be
allocated from mali kernel driver.)

>
>> > wouldn't need a way to programmatically describe the constraints
>> > either: As you say, if userspace sets the "SCANOUT" flag, it would
>> > just "know" that on this SoC, that buffer needs to be physically
>> > contiguous for example.
>>
>> not really.. it just knows it wants to scanout the buffer, and tells
>> this as a hint to the kernel.
>>
>> For example, on omapdrm, the SCANOUT flag does nothing on omap4+
>> (where phys contig is not required for scanout), but causes CMA
>> (dma_alloc_*()) to be used on omap3.  Userspace doesn't care.  It just
>> knows that it wants to be able to scanout that particular buffer.
>
> I think that's the idea? The omap3's allocator driver would use
> contiguous memory when it detects the SCANOUT flag whereas the omap4
> allocator driver wouldn't have to. No complex negotiation of
> constraints - it just "knows".
>

well, it is same allocating driver in both cases (although maybe that
is unimportant).  The "it" that just knows it wants to scanout is
userspace.  The "it" that just knows that scanout translates to
contiguous (or not) is the kernel.  Perhaps we are saying the same
thing ;-)

BR,
-R

>
> Cheers,
>
> Tom
>
>
>
>

^ permalink raw reply

* kernel mode splash screen
From: Cliff Brake @ 2013-08-06 16:15 UTC (permalink / raw)
  To: linux-fbdev

Hello,

We are working on an embedded system where we need immediate feedback
on the screen in response to power button events, etc.  We also need
to be able to display the splash screen whenever kernel is running.
User space splash screens get killed too early on shutdown, etc.

Does anyone know of a kernel mode splash screen that will:

1) display an image
2) display status messages
3) display progress bar (nice to have, but not required)
4) can be accessed from kernel and user space
5) when enabled, will disable writes from user space
6) FB based

Thanks,
Cliff

-- 
========http://bec-systems.com

^ permalink raw reply

* Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Daniel Vetter @ 2013-08-06 15:28 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Tom Cooksey, Linux Fbdev development list, Pawel Moll,
	Linux Kernel Mailing List, dri-devel,
	linaro-mm-sig@lists.linaro.org,
	linux-arm-kernel@lists.infradead.org, linux-media@vger.kernel.org
In-Reply-To: <1375791502.4276.38.camel@weser.hi.pengutronix.de>

On Tue, Aug 6, 2013 at 2:18 PM, Lucas Stach <l.stach@pengutronix.de> wrote:
> I strongly disagree with exposing low-level hardware details like tiling
> to userspace. If we have to do the negotiation of those things in
> userspace we will end up with having to pipe those information through
> things like the wayland protocol. I don't see how this could ever be
> considered a good idea.
>
> I would rather see kernel drivers negotiating those things at dmabuf
> attach time in way invisible to userspace. I agree that this negotiation
> thing isn't easy to get right for the plethora of different hardware
> constraints we see today, but I would rather see this in-kernel, where
> we have the chance to fix things up if needed, than in a fixed userspace
> interface.

The problem with negotiating tiling in the kernel for at least
drm/i915 is that userspace needs to know the tiling and for allocating
actually be in charge of it. The approach used by prime thus far is to
simply use linear buffers between different gpus. If you want
something better I guess you need to have a fairly tightly integrate
userspace driver, so I don't see how passing around the tiling
information should be an issue for those cases.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply

* Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Rob Clark @ 2013-08-06 14:59 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Tom Cooksey, linux-fbdev, Pawel Moll, linux-kernel, dri-devel,
	linaro-mm-sig, linux-arm-kernel, linux-media
In-Reply-To: <1375799770.4276.47.camel@weser.hi.pengutronix.de>

On Tue, Aug 6, 2013 at 10:36 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
> Am Dienstag, den 06.08.2013, 10:14 -0400 schrieb Rob Clark:
>> On Tue, Aug 6, 2013 at 8:18 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
>> > Am Dienstag, den 06.08.2013, 12:31 +0100 schrieb Tom Cooksey:
>> >> Hi Rob,
>> >>
>> >> +lkml
>> >>
>> >> > >> On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey <tom.cooksey@arm.com>
>> >> > >> wrote:
>> >> > >> >> >  * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to
>> >> > >> >> >    also allocate buffers for the GPU. Still not sure how to
>> >> > >> >> >    resolve this as we don't use DRM for our GPU driver.
>> >> > >> >>
>> >> > >> >> any thoughts/plans about a DRM GPU driver?  Ideally long term
>> >> > >> >> (esp. once the dma-fence stuff is in place), we'd have
>> >> > >> >> gpu-specific drm (gpu-only, no kms) driver, and SoC/display
>> >> > >> >> specific drm/kms driver, using prime/dmabuf to share between
>> >> > >> >> the two.
>> >> > >> >
>> >> > >> > The "extra" buffers we were allocating from armsoc DDX were really
>> >> > >> > being allocated through DRM/GEM so we could get an flink name
>> >> > >> > for them and pass a reference to them back to our GPU driver on
>> >> > >> > the client side. If it weren't for our need to access those
>> >> > >> > extra off-screen buffers with the GPU we wouldn't need to
>> >> > >> > allocate them with DRM at all. So, given they are really "GPU"
>> >> > >> > buffers, it does absolutely make sense to allocate them in a
>> >> > >> > different driver to the display driver.
>> >> > >> >
>> >> > >> > However, to avoid unnecessary memcpys & related cache
>> >> > >> > maintenance ops, we'd also like the GPU to render into buffers
>> >> > >> > which are scanned out by the display controller. So let's say
>> >> > >> > we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
>> >> > >> > out buffers with the display's DRM driver but a custom ioctl
>> >> > >> > on the GPU's DRM driver to allocate non scanout, off-screen
>> >> > >> > buffers. Sounds great, but I don't think that really works
>> >> > >> > with DRI2. If we used two drivers to allocate buffers, which
>> >> > >> > of those drivers do we return in DRI2ConnectReply? Even if we
>> >> > >> > solve that somehow, GEM flink names are name-spaced to a
>> >> > >> > single device node (AFAIK). So when we do a DRI2GetBuffers,
>> >> > >> > how does the EGL in the client know which DRM device owns GEM
>> >> > >> > flink name "1234"? We'd need some pretty dirty hacks.
>> >> > >>
>> >> > >> You would return the name of the display driver allocating the
>> >> > >> buffers.  On the client side you can use generic ioctls to go from
>> >> > >> flink -> handle -> dmabuf.  So the client side would end up opening
>> >> > >> both the display drm device and the gpu, but without needing to know
>> >> > >> too much about the display.
>> >> > >
>> >> > > I think the bit I was missing was that a GEM bo for a buffer imported
>> >> > > using dma_buf/PRIME can still be flink'd. So the display controller's
>> >> > > DRM driver allocates scan-out buffers via the DUMB buffer allocate
>> >> > > ioctl. Those scan-out buffers than then be exported from the
>> >> > > dispaly's DRM driver and imported into the GPU's DRM driver using
>> >> > > PRIME. Once imported into the GPU's driver, we can use flink to get a
>> >> > > name for that buffer within the GPU DRM driver's name-space to return
>> >> > > to the DRI2 client. That same namespace is also what DRI2 back-
>> >> > > buffers are allocated from, so I think that could work... Except...
>> >> >
>> >> > (and.. the general direction is that things will move more to just use
>> >> > dmabuf directly, ie. wayland or dri3)
>> >>
>> >> I agree, DRI2 is the only reason why we need a system-wide ID. I also
>> >> prefer buffers to be passed around by dma_buf fd, but we still need to
>> >> support DRI2 and will do for some time I expect.
>> >>
>> >>
>> >>
>> >> > >> > Anyway, that latter case also gets quite difficult. The "GPU"
>> >> > >> > DRM driver would need to know the constraints of the display
>> >> > >> > controller when allocating buffers intended to be scanned out.
>> >> > >> > For example, pl111 typically isn't behind an IOMMU and so
>> >> > >> > requires physically contiguous memory. We'd have to teach the
>> >> > >> > GPU's DRM driver about the constraints of the display HW. Not
>> >> > >> > exactly a clean driver model. :-(
>> >> > >> >
>> >> > >> > I'm still a little stuck on how to proceed, so any ideas
>> >> > >> > would greatly appreciated! My current train of thought is
>> >> > >> > having a kind of SoC-specific DRM driver which allocates
>> >> > >> > buffers for both display and GPU within a single GEM
>> >> > >> > namespace. That SoC-specific DRM driver could then know the
>> >> > >> > constraints of both the GPU and the display HW. We could then
>> >> > >> > use PRIME to export buffers allocated with the SoC DRM driver
>> >> > >> > and import them into the GPU and/or display DRM driver.
>> >> > >>
>> >> > >> Usually if the display drm driver is allocating the buffers that
>> >> > >> might be scanned out, it just needs to have minimal knowledge of
>> >> > >> the GPU (pitch alignment constraints).  I don't think we need a
>> >> > >> 3rd device just to allocate buffers.
>> >> > >
>> >> > > While Mali can render to pretty much any buffer, there is a mild
>> >> > > performance improvement to be had if the buffer stride is aligned to
>> >> > > the AXI bus's max burst length when drawing to the buffer.
>> >> >
>> >> > I suspect the display controllers might frequently benefit if the
>> >> > pitch is aligned to AXI burst length too..
>> >>
>> >> If the display controller is going to be reading from linear memory
>> >> I don't think it will make much difference - you'll just get an extra
>> >> 1-2 bus transactions per scanline. With a tile-based GPU like Mali,
>> >> you get those extra transactions per _tile_ scan-line and as such,
>> >> the overhead is more pronounced.
>> >>
>> >>
>> >>
>> >> > > So in some respects, there is a constraint on how buffers which will
>> >> > > be drawn to using the GPU are allocated. I don't really like the idea
>> >> > > of teaching the display controller DRM driver about the GPU buffer
>> >> > > constraints, even if they are fairly trivial like this. If the same
>> >> > > display HW IP is being used on several SoCs, it seems wrong somehow
>> >> > > to enforce those GPU constraints if some of those SoCs don't have a
>> >> > > GPU.
>> >> >
>> >> > Well, I suppose you could get min_pitch_alignment from devicetree, or
>> >> > something like this..
>> >> >
>> >> > In the end, the easy solution is just to make the display allocate to
>> >> > the worst-case pitch alignment.  In the early days of dma-buf
>> >> > discussions, we kicked around the idea of negotiating or
>> >> > programatically describing the constraints, but that didn't really
>> >> > seem like a bounded problem.
>> >>
>> >> Yeah - I was around for some of those discussions and agree it's not
>> >> really an easy problem to solve.
>> >>
>> >>
>> >>
>> >> > > We may also then have additional constraints when sharing buffers
>> >> > > between the display HW and video decode or even camera ISP HW.
>> >> > > Programmatically describing buffer allocation constraints is very
>> >> > > difficult and I'm not sure you can actually do it - there's some
>> >> > > pretty complex constraints out there! E.g. I believe there's a
>> >> > > platform where Y and UV planes of the reference frame need to be in
>> >> > > separate DRAM banks for real-time 1080p decode, or something like
>> >> > > that?
>> >> >
>> >> > yes, this was discussed.  This is different from pitch/format/size
>> >> > constraints.. it is really just a placement constraint (ie. where do
>> >> > the physical pages go).  IIRC the conclusion was to use a dummy
>> >> > devices with it's own CMA pool for attaching the Y vs UV buffers.
>> >> >
>> >> > > Anyway, I guess my point is that even if we solve how to allocate
>> >> > > buffers which will be shared between the GPU and display HW such that
>> >> > > both sets of constraints are satisfied, that may not be the end of
>> >> > > the story.
>> >> > >
>> >> >
>> >> > that was part of the reason to punt this problem to userspace ;-)
>> >> >
>> >> > In practice, the kernel drivers doesn't usually know too much about
>> >> > the dimensions/format/etc.. that is really userspace level knowledge.
>> >> > There are a few exceptions when the kernel needs to know how to setup
>> >> > GTT/etc for tiled buffers, but normally this sort of information is up
>> >> > at the next level up (userspace, and drm_framebuffer in case of
>> >> > scanout).  Userspace media frameworks like GStreamer already have a
>> >> > concept of format/caps negotiation.  For non-display<->gpu sharing, I
>> >> > think this is probably where this sort of constraint negotiation
>> >> > should be handled.
>> >>
>> >> I agree that user-space will know which devices will access the buffer
>> >> and thus can figure out at least a common pixel format. Though I'm not
>> >> so sure userspace can figure out more low-level details like alignment
>> >> and placement in physical memory, etc.
>> >>
>> >> Anyway, assuming user-space can figure out how a buffer should be
>> >> stored in memory, how does it indicate this to a kernel driver and
>> >> actually allocate it? Which ioctl on which device does user-space
>> >> call, with what parameters? Are you suggesting using something like
>> >> ION which exposes the low-level details of how buffers are laid out in
>> >> physical memory to userspace? If not, what?
>> >>
>> >
>> > I strongly disagree with exposing low-level hardware details like tiling
>> > to userspace. If we have to do the negotiation of those things in
>> > userspace we will end up with having to pipe those information through
>> > things like the wayland protocol. I don't see how this could ever be
>> > considered a good idea.
>>
>> well, unless userspace mmap's via a de-tiling gart type thing, I don't
>> think tiling can be invisible to userspace.
>>
> Why is mmap considered to be such a strong use-case for DMABUFs? After
> all we are trying to _avoid_ mmapping shared buffers where ever
> possible.

I don't know if I'd call it a strong use-case, as much as a good
worst-case to consider

>> But if two GPU's have some overlap in supportable tiled formats, and
>> you have one gpu doing app and one doing compositor, and you want to
>> use tiling for the shared buffers, you need something in the wayland
>> protocol to figure out what the common supported formats are between
>> the two sides.  I suppose it shouldn't be too hard to add a
>> standardized (cross-driver) format-negotiation protocol, and in the
>> absence of that fallback to non tiled format for shared buffers.
>>
> I don't see how tiling format negotiation would be easier in userspace
> than in the kernel. If we can come up with a scheme for that, we can as
> well do it in the kernel.

well.. I guess I don't see how it would be easier in the kernel than
in userspace ;-)

But if you can come up with a good way to add it to dev->dma_params
that everyone can agree on, I don't mind.  I suppose I'm starting with
the assumption that it will be difficult/impossible to get everyone to
agree on this, but I don't mind being proved wrong.

>> > I would rather see kernel drivers negotiating those things at dmabuf
>> > attach time in way invisible to userspace. I agree that this negotiation
>> > thing isn't easy to get right for the plethora of different hardware
>> > constraints we see today, but I would rather see this in-kernel, where
>> > we have the chance to fix things up if needed, than in a fixed userspace
>> > interface.
>>
>> Well, if you can think of a sane way to add that to dev->dma_params,
>> and if it isn't visible if userspace mmap's a buffer, then we could
>> handle that in the kernel.  But I don't think that will be the case.
>>
> I'm sure we can come up with something sane to put it in there. The
> userspace mmap thing is a bit complicated to deal with, but I have the
> feeling that going through a slow path if you really need mmap is a
> reasonable thing to do.
> For example we could just make mmap some form of attach that forces
> linear layout if the exporter isn't able to give userspace a linear
> mapping from a tiled buffer by using GART or VM.

I guess I don't object to mmap being a slow path (as long is it
doesn't end up requiring a slow path on hw where this otherwise
wouldn't be needed).

BR,
-R

> Regards,
> Lucas
> --
> Pengutronix e.K.                           | Lucas Stach                 |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
>

^ permalink raw reply

* Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Rob Clark @ 2013-08-06 14:40 UTC (permalink / raw)
  To: Tom Cooksey
  Cc: dri-devel, linux-fbdev, Pawel Moll, linux-arm-kernel, linux-media,
	linaro-mm-sig, linux-kernel
In-Reply-To: <52010257.245fc20a.6ff8.1cfdSMTPIN_ADDED_BROKEN@mx.google.com>

On Tue, Aug 6, 2013 at 10:03 AM, Tom Cooksey <tom.cooksey@arm.com> wrote:
> Hi Rob,
>
>> >> > We may also then have additional constraints when sharing buffers
>> >> > between the display HW and video decode or even camera ISP HW.
>> >> > Programmatically describing buffer allocation constraints is very
>> >> > difficult and I'm not sure you can actually do it - there's some
>> >> > pretty complex constraints out there! E.g. I believe there's a
>> >> > platform where Y and UV planes of the reference frame need to be
>> >> > in separate DRAM banks for real-time 1080p decode, or something
>> >> > like that?
>> >>
>> >> yes, this was discussed.  This is different from pitch/format/size
>> >> constraints.. it is really just a placement constraint (ie. where
>> >> do the physical pages go).  IIRC the conclusion was to use a dummy
>> >> devices with it's own CMA pool for attaching the Y vs UV buffers.
>> >>
>> >> > Anyway, I guess my point is that even if we solve how to allocate
>> >> > buffers which will be shared between the GPU and display HW such
>> >> > that both sets of constraints are satisfied, that may not be the
>> >> > end of the story.
>> >> >
>> >>
>> >> that was part of the reason to punt this problem to userspace ;-)
>> >>
>> >> In practice, the kernel drivers doesn't usually know too much about
>> >> the dimensions/format/etc.. that is really userspace level
>> >> knowledge. There are a few exceptions when the kernel needs to know
>> >> how to setup GTT/etc for tiled buffers, but normally this sort of
>> >> information is up at the next level up (userspace, and
>> >> drm_framebuffer in case of scanout).  Userspace media frameworks
>> >> like GStreamer already have a concept of format/caps negotiation.
>> >> For non-display<->gpu sharing, I think this is probably where this
>> >> sort of constraint negotiation should be handled.
>> >
>> > I agree that user-space will know which devices will access the
>> > buffer and thus can figure out at least a common pixel format.
>> > Though I'm not so sure userspace can figure out more low-level
>> > details like alignment and placement in physical memory, etc.
>> >
>>
>> well, let's divide things up into two categories:
>>
>> 1) the arrangement and format of pixels.. ie. what userspace would
>> need to know if it mmap's a buffer.  This includes pixel format,
>> stride, etc.  This should be negotiated in userspace, it would be
>> crazy to try to do this in the kernel.
>
> Absolutely. Pixel format has to be negotiated by user-space as in
> most cases, user-space can map the buffer and thus will need to
> know how to interpret the data.
>
>
>
>> 2) the physical placement of the pages.  Ie. whether it is contiguous
>> or not.  Which bank the pages in the buffer are placed in, etc.  This
>> is not visible to userspace.
>
> Seems sensible to me.
>
>
>> ... This is the purpose of the attach step,
>> so you know all the devices involved in sharing up front before
>> allocating the backing pages. (Or in the worst case, if you have a
>> "late attacher" you at least know when no device is doing dma access
>> to a buffer and can reallocate and move the buffer.)  A long time
>> back, I had a patch that added a field or two to 'struct
>> device_dma_parameters' so that it could be known if a device required
>> contiguous buffers.. looks like that never got merged, so I'd need to
>> dig that back up and resend it.  But the idea was to have the 'struct
>> device' encapsulate all the information that would be needed to
>> do-the-right-thing when it comes to placement.
>
> As I understand it, it's up to the exporting device to allocate the
> memory backing the dma_buf buffer. I guess the latest possible point
> you can allocate the backing pages is when map_dma_buf is first
> called? At that point the exporter can iterate over the current set
> of attachments, programmatically determine the all the constraints of
> all the attached drivers and attempt to allocate the backing pages
> in such a way as to satisfy all those constraints?

yes, this is the idea..  possibly some room for some helpers to help
out with this, but that is all under the hood from userspace
perspective

> Didn't you say that programmatically describing device placement
> constraints was an unbounded problem? I guess we would have to
> accept that it's not possible to describe all possible constraints
> and instead find a way to describe the common ones?

well, the point I'm trying to make, is by dividing your constraints
into two groups, one that impacts and is handled by userspace, and one
that is in the kernel (ie. where the pages go), you cut down the
number of permutations that the kernel has to care about considerably.
 And kernel already cares about, for example, what range of addresses
that a device can dma to/from.  I think really the only thing missing
is the max # of sglist entries (contiguous or not)

> One problem with this is it duplicates a lot of logic in each
> driver which can export a dma_buf buffer. Each exporter will need to
> do pretty much the same thing: iterate over all the attachments,
> determine of all the constraints (assuming that can be done) and
> allocate pages such that the lowest-common-denominator is satisfied.
>
> Perhaps rather than duplicating that logic in every driver, we could
> Instead move allocation of the backing pages into dma_buf itself?
>

I tend to think it is better to add helpers as we see common patterns
emerge, which drivers can opt-in to using.  I don't think that we
should move allocation into dma_buf itself, but it would perhaps be
useful to have dma_alloc_*() variants that could allocate for multiple
devices.  That would help for simple stuff, although I'd suspect
eventually a GPU driver will move away from that.  (Since you probably
want to play tricks w/ pools of pages that are pre-zero'd and in the
correct cache state, use spare cycles on the gpu or dma engine to
pre-zero uncached pages, and games like that.)

>
>> > Anyway, assuming user-space can figure out how a buffer should be
>> > stored in memory, how does it indicate this to a kernel driver and
>> > actually allocate it? Which ioctl on which device does user-space
>> > call, with what parameters? Are you suggesting using something like
>> > ION which exposes the low-level details of how buffers are laid out
>> in
>> > physical memory to userspace? If not, what?
>>
>> no, userspace should not need to know this.  And having a central
>> driver that knows this for all the other drivers in the system doesn't
>> really solve anything and isn't really scalable.  At best you might
>> want, in some cases, a flag you can pass when allocating.  For
>> example, some of the drivers have a 'SCANOUT' flag that can be passed
>> when allocating a GEM buffer, as a hint to the kernel that 'if this hw
>> requires contig memory for scanout, allocate this buffer contig'.  But
>> really, when it comes to sharing buffers between devices, we want this
>> sort of information in dev->dma_params of the importing device(s).
>
> If you had a single driver which knew the constraints of all devices
> on that particular SoC and the interface allowed user-space to specify
> which devices a buffer is intended to be used with, I guess it could
> pretty trivially allocate pages which satisfy those constraints? It

keep in mind, even a number of SoC's come with pcie these days.  You
already have things like

  https://developer.nvidia.com/content/kayla-platform

You probably want to get out of the SoC mindset, otherwise you are
going to make bad assumptions that come back to bite you later on.

> wouldn't need a way to programmatically describe the constraints
> either: As you say, if userspace sets the "SCANOUT" flag, it would
> just "know" that on this SoC, that buffer needs to be physically
> contiguous for example.

not really.. it just knows it wants to scanout the buffer, and tells
this as a hint to the kernel.

For example, on omapdrm, the SCANOUT flag does nothing on omap4+
(where phys contig is not required for scanout), but causes CMA
(dma_alloc_*()) to be used on omap3.  Userspace doesn't care.  It just
knows that it wants to be able to scanout that particular buffer.

> Though It would effectively mean you'd need an "allocation" driver per
> SoC, which as you say may not be scalable?

Right.. and not actually even possible in the general sense (see SoC +
external pcie gfx card)

BR,
-R

>
>
> Cheers,
>
> Tom
>
>
>
>
>

^ permalink raw reply

* Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Lucas Stach @ 2013-08-06 14:36 UTC (permalink / raw)
  To: Rob Clark
  Cc: Tom Cooksey, linux-fbdev, Pawel Moll, linux-kernel, dri-devel,
	linaro-mm-sig, linux-arm-kernel, linux-media
In-Reply-To: <CAF6AEGvgqsS4Ht6xuACbywTorRiCwHcUaYPji9dRP6AM3Ag-ww@mail.gmail.com>

Am Dienstag, den 06.08.2013, 10:14 -0400 schrieb Rob Clark:
> On Tue, Aug 6, 2013 at 8:18 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
> > Am Dienstag, den 06.08.2013, 12:31 +0100 schrieb Tom Cooksey:
> >> Hi Rob,
> >>
> >> +lkml
> >>
> >> > >> On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey <tom.cooksey@arm.com>
> >> > >> wrote:
> >> > >> >> >  * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to
> >> > >> >> >    also allocate buffers for the GPU. Still not sure how to
> >> > >> >> >    resolve this as we don't use DRM for our GPU driver.
> >> > >> >>
> >> > >> >> any thoughts/plans about a DRM GPU driver?  Ideally long term
> >> > >> >> (esp. once the dma-fence stuff is in place), we'd have
> >> > >> >> gpu-specific drm (gpu-only, no kms) driver, and SoC/display
> >> > >> >> specific drm/kms driver, using prime/dmabuf to share between
> >> > >> >> the two.
> >> > >> >
> >> > >> > The "extra" buffers we were allocating from armsoc DDX were really
> >> > >> > being allocated through DRM/GEM so we could get an flink name
> >> > >> > for them and pass a reference to them back to our GPU driver on
> >> > >> > the client side. If it weren't for our need to access those
> >> > >> > extra off-screen buffers with the GPU we wouldn't need to
> >> > >> > allocate them with DRM at all. So, given they are really "GPU"
> >> > >> > buffers, it does absolutely make sense to allocate them in a
> >> > >> > different driver to the display driver.
> >> > >> >
> >> > >> > However, to avoid unnecessary memcpys & related cache
> >> > >> > maintenance ops, we'd also like the GPU to render into buffers
> >> > >> > which are scanned out by the display controller. So let's say
> >> > >> > we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
> >> > >> > out buffers with the display's DRM driver but a custom ioctl
> >> > >> > on the GPU's DRM driver to allocate non scanout, off-screen
> >> > >> > buffers. Sounds great, but I don't think that really works
> >> > >> > with DRI2. If we used two drivers to allocate buffers, which
> >> > >> > of those drivers do we return in DRI2ConnectReply? Even if we
> >> > >> > solve that somehow, GEM flink names are name-spaced to a
> >> > >> > single device node (AFAIK). So when we do a DRI2GetBuffers,
> >> > >> > how does the EGL in the client know which DRM device owns GEM
> >> > >> > flink name "1234"? We'd need some pretty dirty hacks.
> >> > >>
> >> > >> You would return the name of the display driver allocating the
> >> > >> buffers.  On the client side you can use generic ioctls to go from
> >> > >> flink -> handle -> dmabuf.  So the client side would end up opening
> >> > >> both the display drm device and the gpu, but without needing to know
> >> > >> too much about the display.
> >> > >
> >> > > I think the bit I was missing was that a GEM bo for a buffer imported
> >> > > using dma_buf/PRIME can still be flink'd. So the display controller's
> >> > > DRM driver allocates scan-out buffers via the DUMB buffer allocate
> >> > > ioctl. Those scan-out buffers than then be exported from the
> >> > > dispaly's DRM driver and imported into the GPU's DRM driver using
> >> > > PRIME. Once imported into the GPU's driver, we can use flink to get a
> >> > > name for that buffer within the GPU DRM driver's name-space to return
> >> > > to the DRI2 client. That same namespace is also what DRI2 back-
> >> > > buffers are allocated from, so I think that could work... Except...
> >> >
> >> > (and.. the general direction is that things will move more to just use
> >> > dmabuf directly, ie. wayland or dri3)
> >>
> >> I agree, DRI2 is the only reason why we need a system-wide ID. I also
> >> prefer buffers to be passed around by dma_buf fd, but we still need to
> >> support DRI2 and will do for some time I expect.
> >>
> >>
> >>
> >> > >> > Anyway, that latter case also gets quite difficult. The "GPU"
> >> > >> > DRM driver would need to know the constraints of the display
> >> > >> > controller when allocating buffers intended to be scanned out.
> >> > >> > For example, pl111 typically isn't behind an IOMMU and so
> >> > >> > requires physically contiguous memory. We'd have to teach the
> >> > >> > GPU's DRM driver about the constraints of the display HW. Not
> >> > >> > exactly a clean driver model. :-(
> >> > >> >
> >> > >> > I'm still a little stuck on how to proceed, so any ideas
> >> > >> > would greatly appreciated! My current train of thought is
> >> > >> > having a kind of SoC-specific DRM driver which allocates
> >> > >> > buffers for both display and GPU within a single GEM
> >> > >> > namespace. That SoC-specific DRM driver could then know the
> >> > >> > constraints of both the GPU and the display HW. We could then
> >> > >> > use PRIME to export buffers allocated with the SoC DRM driver
> >> > >> > and import them into the GPU and/or display DRM driver.
> >> > >>
> >> > >> Usually if the display drm driver is allocating the buffers that
> >> > >> might be scanned out, it just needs to have minimal knowledge of
> >> > >> the GPU (pitch alignment constraints).  I don't think we need a
> >> > >> 3rd device just to allocate buffers.
> >> > >
> >> > > While Mali can render to pretty much any buffer, there is a mild
> >> > > performance improvement to be had if the buffer stride is aligned to
> >> > > the AXI bus's max burst length when drawing to the buffer.
> >> >
> >> > I suspect the display controllers might frequently benefit if the
> >> > pitch is aligned to AXI burst length too..
> >>
> >> If the display controller is going to be reading from linear memory
> >> I don't think it will make much difference - you'll just get an extra
> >> 1-2 bus transactions per scanline. With a tile-based GPU like Mali,
> >> you get those extra transactions per _tile_ scan-line and as such,
> >> the overhead is more pronounced.
> >>
> >>
> >>
> >> > > So in some respects, there is a constraint on how buffers which will
> >> > > be drawn to using the GPU are allocated. I don't really like the idea
> >> > > of teaching the display controller DRM driver about the GPU buffer
> >> > > constraints, even if they are fairly trivial like this. If the same
> >> > > display HW IP is being used on several SoCs, it seems wrong somehow
> >> > > to enforce those GPU constraints if some of those SoCs don't have a
> >> > > GPU.
> >> >
> >> > Well, I suppose you could get min_pitch_alignment from devicetree, or
> >> > something like this..
> >> >
> >> > In the end, the easy solution is just to make the display allocate to
> >> > the worst-case pitch alignment.  In the early days of dma-buf
> >> > discussions, we kicked around the idea of negotiating or
> >> > programatically describing the constraints, but that didn't really
> >> > seem like a bounded problem.
> >>
> >> Yeah - I was around for some of those discussions and agree it's not
> >> really an easy problem to solve.
> >>
> >>
> >>
> >> > > We may also then have additional constraints when sharing buffers
> >> > > between the display HW and video decode or even camera ISP HW.
> >> > > Programmatically describing buffer allocation constraints is very
> >> > > difficult and I'm not sure you can actually do it - there's some
> >> > > pretty complex constraints out there! E.g. I believe there's a
> >> > > platform where Y and UV planes of the reference frame need to be in
> >> > > separate DRAM banks for real-time 1080p decode, or something like
> >> > > that?
> >> >
> >> > yes, this was discussed.  This is different from pitch/format/size
> >> > constraints.. it is really just a placement constraint (ie. where do
> >> > the physical pages go).  IIRC the conclusion was to use a dummy
> >> > devices with it's own CMA pool for attaching the Y vs UV buffers.
> >> >
> >> > > Anyway, I guess my point is that even if we solve how to allocate
> >> > > buffers which will be shared between the GPU and display HW such that
> >> > > both sets of constraints are satisfied, that may not be the end of
> >> > > the story.
> >> > >
> >> >
> >> > that was part of the reason to punt this problem to userspace ;-)
> >> >
> >> > In practice, the kernel drivers doesn't usually know too much about
> >> > the dimensions/format/etc.. that is really userspace level knowledge.
> >> > There are a few exceptions when the kernel needs to know how to setup
> >> > GTT/etc for tiled buffers, but normally this sort of information is up
> >> > at the next level up (userspace, and drm_framebuffer in case of
> >> > scanout).  Userspace media frameworks like GStreamer already have a
> >> > concept of format/caps negotiation.  For non-display<->gpu sharing, I
> >> > think this is probably where this sort of constraint negotiation
> >> > should be handled.
> >>
> >> I agree that user-space will know which devices will access the buffer
> >> and thus can figure out at least a common pixel format. Though I'm not
> >> so sure userspace can figure out more low-level details like alignment
> >> and placement in physical memory, etc.
> >>
> >> Anyway, assuming user-space can figure out how a buffer should be
> >> stored in memory, how does it indicate this to a kernel driver and
> >> actually allocate it? Which ioctl on which device does user-space
> >> call, with what parameters? Are you suggesting using something like
> >> ION which exposes the low-level details of how buffers are laid out in
> >> physical memory to userspace? If not, what?
> >>
> >
> > I strongly disagree with exposing low-level hardware details like tiling
> > to userspace. If we have to do the negotiation of those things in
> > userspace we will end up with having to pipe those information through
> > things like the wayland protocol. I don't see how this could ever be
> > considered a good idea.
> 
> well, unless userspace mmap's via a de-tiling gart type thing, I don't
> think tiling can be invisible to userspace.
> 
Why is mmap considered to be such a strong use-case for DMABUFs? After
all we are trying to _avoid_ mmapping shared buffers where ever
possible.

> But if two GPU's have some overlap in supportable tiled formats, and
> you have one gpu doing app and one doing compositor, and you want to
> use tiling for the shared buffers, you need something in the wayland
> protocol to figure out what the common supported formats are between
> the two sides.  I suppose it shouldn't be too hard to add a
> standardized (cross-driver) format-negotiation protocol, and in the
> absence of that fallback to non tiled format for shared buffers.
> 
I don't see how tiling format negotiation would be easier in userspace
than in the kernel. If we can come up with a scheme for that, we can as
well do it in the kernel.

> > I would rather see kernel drivers negotiating those things at dmabuf
> > attach time in way invisible to userspace. I agree that this negotiation
> > thing isn't easy to get right for the plethora of different hardware
> > constraints we see today, but I would rather see this in-kernel, where
> > we have the chance to fix things up if needed, than in a fixed userspace
> > interface.
> 
> Well, if you can think of a sane way to add that to dev->dma_params,
> and if it isn't visible if userspace mmap's a buffer, then we could
> handle that in the kernel.  But I don't think that will be the case.
> 
I'm sure we can come up with something sane to put it in there. The
userspace mmap thing is a bit complicated to deal with, but I have the
feeling that going through a slow path if you really need mmap is a
reasonable thing to do.
For example we could just make mmap some form of attach that forces
linear layout if the exporter isn't able to give userspace a linear
mapping from a tiled buffer by using GART or VM. 

Regards,
Lucas
-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


^ permalink raw reply

* Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Rob Clark @ 2013-08-06 14:14 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Tom Cooksey, linux-fbdev, Pawel Moll, linux-kernel, dri-devel,
	linaro-mm-sig, linux-arm-kernel, linux-media
In-Reply-To: <1375791502.4276.38.camel@weser.hi.pengutronix.de>

On Tue, Aug 6, 2013 at 8:18 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
> Am Dienstag, den 06.08.2013, 12:31 +0100 schrieb Tom Cooksey:
>> Hi Rob,
>>
>> +lkml
>>
>> > >> On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey <tom.cooksey@arm.com>
>> > >> wrote:
>> > >> >> >  * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to
>> > >> >> >    also allocate buffers for the GPU. Still not sure how to
>> > >> >> >    resolve this as we don't use DRM for our GPU driver.
>> > >> >>
>> > >> >> any thoughts/plans about a DRM GPU driver?  Ideally long term
>> > >> >> (esp. once the dma-fence stuff is in place), we'd have
>> > >> >> gpu-specific drm (gpu-only, no kms) driver, and SoC/display
>> > >> >> specific drm/kms driver, using prime/dmabuf to share between
>> > >> >> the two.
>> > >> >
>> > >> > The "extra" buffers we were allocating from armsoc DDX were really
>> > >> > being allocated through DRM/GEM so we could get an flink name
>> > >> > for them and pass a reference to them back to our GPU driver on
>> > >> > the client side. If it weren't for our need to access those
>> > >> > extra off-screen buffers with the GPU we wouldn't need to
>> > >> > allocate them with DRM at all. So, given they are really "GPU"
>> > >> > buffers, it does absolutely make sense to allocate them in a
>> > >> > different driver to the display driver.
>> > >> >
>> > >> > However, to avoid unnecessary memcpys & related cache
>> > >> > maintenance ops, we'd also like the GPU to render into buffers
>> > >> > which are scanned out by the display controller. So let's say
>> > >> > we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
>> > >> > out buffers with the display's DRM driver but a custom ioctl
>> > >> > on the GPU's DRM driver to allocate non scanout, off-screen
>> > >> > buffers. Sounds great, but I don't think that really works
>> > >> > with DRI2. If we used two drivers to allocate buffers, which
>> > >> > of those drivers do we return in DRI2ConnectReply? Even if we
>> > >> > solve that somehow, GEM flink names are name-spaced to a
>> > >> > single device node (AFAIK). So when we do a DRI2GetBuffers,
>> > >> > how does the EGL in the client know which DRM device owns GEM
>> > >> > flink name "1234"? We'd need some pretty dirty hacks.
>> > >>
>> > >> You would return the name of the display driver allocating the
>> > >> buffers.  On the client side you can use generic ioctls to go from
>> > >> flink -> handle -> dmabuf.  So the client side would end up opening
>> > >> both the display drm device and the gpu, but without needing to know
>> > >> too much about the display.
>> > >
>> > > I think the bit I was missing was that a GEM bo for a buffer imported
>> > > using dma_buf/PRIME can still be flink'd. So the display controller's
>> > > DRM driver allocates scan-out buffers via the DUMB buffer allocate
>> > > ioctl. Those scan-out buffers than then be exported from the
>> > > dispaly's DRM driver and imported into the GPU's DRM driver using
>> > > PRIME. Once imported into the GPU's driver, we can use flink to get a
>> > > name for that buffer within the GPU DRM driver's name-space to return
>> > > to the DRI2 client. That same namespace is also what DRI2 back-
>> > > buffers are allocated from, so I think that could work... Except...
>> >
>> > (and.. the general direction is that things will move more to just use
>> > dmabuf directly, ie. wayland or dri3)
>>
>> I agree, DRI2 is the only reason why we need a system-wide ID. I also
>> prefer buffers to be passed around by dma_buf fd, but we still need to
>> support DRI2 and will do for some time I expect.
>>
>>
>>
>> > >> > Anyway, that latter case also gets quite difficult. The "GPU"
>> > >> > DRM driver would need to know the constraints of the display
>> > >> > controller when allocating buffers intended to be scanned out.
>> > >> > For example, pl111 typically isn't behind an IOMMU and so
>> > >> > requires physically contiguous memory. We'd have to teach the
>> > >> > GPU's DRM driver about the constraints of the display HW. Not
>> > >> > exactly a clean driver model. :-(
>> > >> >
>> > >> > I'm still a little stuck on how to proceed, so any ideas
>> > >> > would greatly appreciated! My current train of thought is
>> > >> > having a kind of SoC-specific DRM driver which allocates
>> > >> > buffers for both display and GPU within a single GEM
>> > >> > namespace. That SoC-specific DRM driver could then know the
>> > >> > constraints of both the GPU and the display HW. We could then
>> > >> > use PRIME to export buffers allocated with the SoC DRM driver
>> > >> > and import them into the GPU and/or display DRM driver.
>> > >>
>> > >> Usually if the display drm driver is allocating the buffers that
>> > >> might be scanned out, it just needs to have minimal knowledge of
>> > >> the GPU (pitch alignment constraints).  I don't think we need a
>> > >> 3rd device just to allocate buffers.
>> > >
>> > > While Mali can render to pretty much any buffer, there is a mild
>> > > performance improvement to be had if the buffer stride is aligned to
>> > > the AXI bus's max burst length when drawing to the buffer.
>> >
>> > I suspect the display controllers might frequently benefit if the
>> > pitch is aligned to AXI burst length too..
>>
>> If the display controller is going to be reading from linear memory
>> I don't think it will make much difference - you'll just get an extra
>> 1-2 bus transactions per scanline. With a tile-based GPU like Mali,
>> you get those extra transactions per _tile_ scan-line and as such,
>> the overhead is more pronounced.
>>
>>
>>
>> > > So in some respects, there is a constraint on how buffers which will
>> > > be drawn to using the GPU are allocated. I don't really like the idea
>> > > of teaching the display controller DRM driver about the GPU buffer
>> > > constraints, even if they are fairly trivial like this. If the same
>> > > display HW IP is being used on several SoCs, it seems wrong somehow
>> > > to enforce those GPU constraints if some of those SoCs don't have a
>> > > GPU.
>> >
>> > Well, I suppose you could get min_pitch_alignment from devicetree, or
>> > something like this..
>> >
>> > In the end, the easy solution is just to make the display allocate to
>> > the worst-case pitch alignment.  In the early days of dma-buf
>> > discussions, we kicked around the idea of negotiating or
>> > programatically describing the constraints, but that didn't really
>> > seem like a bounded problem.
>>
>> Yeah - I was around for some of those discussions and agree it's not
>> really an easy problem to solve.
>>
>>
>>
>> > > We may also then have additional constraints when sharing buffers
>> > > between the display HW and video decode or even camera ISP HW.
>> > > Programmatically describing buffer allocation constraints is very
>> > > difficult and I'm not sure you can actually do it - there's some
>> > > pretty complex constraints out there! E.g. I believe there's a
>> > > platform where Y and UV planes of the reference frame need to be in
>> > > separate DRAM banks for real-time 1080p decode, or something like
>> > > that?
>> >
>> > yes, this was discussed.  This is different from pitch/format/size
>> > constraints.. it is really just a placement constraint (ie. where do
>> > the physical pages go).  IIRC the conclusion was to use a dummy
>> > devices with it's own CMA pool for attaching the Y vs UV buffers.
>> >
>> > > Anyway, I guess my point is that even if we solve how to allocate
>> > > buffers which will be shared between the GPU and display HW such that
>> > > both sets of constraints are satisfied, that may not be the end of
>> > > the story.
>> > >
>> >
>> > that was part of the reason to punt this problem to userspace ;-)
>> >
>> > In practice, the kernel drivers doesn't usually know too much about
>> > the dimensions/format/etc.. that is really userspace level knowledge.
>> > There are a few exceptions when the kernel needs to know how to setup
>> > GTT/etc for tiled buffers, but normally this sort of information is up
>> > at the next level up (userspace, and drm_framebuffer in case of
>> > scanout).  Userspace media frameworks like GStreamer already have a
>> > concept of format/caps negotiation.  For non-display<->gpu sharing, I
>> > think this is probably where this sort of constraint negotiation
>> > should be handled.
>>
>> I agree that user-space will know which devices will access the buffer
>> and thus can figure out at least a common pixel format. Though I'm not
>> so sure userspace can figure out more low-level details like alignment
>> and placement in physical memory, etc.
>>
>> Anyway, assuming user-space can figure out how a buffer should be
>> stored in memory, how does it indicate this to a kernel driver and
>> actually allocate it? Which ioctl on which device does user-space
>> call, with what parameters? Are you suggesting using something like
>> ION which exposes the low-level details of how buffers are laid out in
>> physical memory to userspace? If not, what?
>>
>
> I strongly disagree with exposing low-level hardware details like tiling
> to userspace. If we have to do the negotiation of those things in
> userspace we will end up with having to pipe those information through
> things like the wayland protocol. I don't see how this could ever be
> considered a good idea.

well, unless userspace mmap's via a de-tiling gart type thing, I don't
think tiling can be invisible to userspace.

But if two GPU's have some overlap in supportable tiled formats, and
you have one gpu doing app and one doing compositor, and you want to
use tiling for the shared buffers, you need something in the wayland
protocol to figure out what the common supported formats are between
the two sides.  I suppose it shouldn't be too hard to add a
standardized (cross-driver) format-negotiation protocol, and in the
absence of that fallback to non tiled format for shared buffers.

> I would rather see kernel drivers negotiating those things at dmabuf
> attach time in way invisible to userspace. I agree that this negotiation
> thing isn't easy to get right for the plethora of different hardware
> constraints we see today, but I would rather see this in-kernel, where
> we have the chance to fix things up if needed, than in a fixed userspace
> interface.

Well, if you can think of a sane way to add that to dev->dma_params,
and if it isn't visible if userspace mmap's a buffer, then we could
handle that in the kernel.  But I don't think that will be the case.

BR,
-R

>
> Regards,
> Lucas
> --
> Pengutronix e.K.                           | Lucas Stach                 |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
>

^ permalink raw reply

* Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Lucas Stach @ 2013-08-06 12:18 UTC (permalink / raw)
  To: Tom Cooksey
  Cc: 'Rob Clark', linux-fbdev, Pawel Moll, linux-kernel,
	dri-devel, linaro-mm-sig, linux-arm-kernel, linux-media
In-Reply-To: <000101ce9298$8ce44ee0$a6aceca0$@cooksey@arm.com>

Am Dienstag, den 06.08.2013, 12:31 +0100 schrieb Tom Cooksey:
> Hi Rob,
> 
> +lkml
> 
> > >> On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey <tom.cooksey@arm.com>
> > >> wrote:
> > >> >> >  * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to
> > >> >> >    also allocate buffers for the GPU. Still not sure how to 
> > >> >> >    resolve this as we don't use DRM for our GPU driver.
> > >> >>
> > >> >> any thoughts/plans about a DRM GPU driver?  Ideally long term
> > >> >> (esp. once the dma-fence stuff is in place), we'd have 
> > >> >> gpu-specific drm (gpu-only, no kms) driver, and SoC/display
> > >> >> specific drm/kms driver, using prime/dmabuf to share between
> > >> >> the two.
> > >> >
> > >> > The "extra" buffers we were allocating from armsoc DDX were really
> > >> > being allocated through DRM/GEM so we could get an flink name
> > >> > for them and pass a reference to them back to our GPU driver on
> > >> > the client side. If it weren't for our need to access those
> > >> > extra off-screen buffers with the GPU we wouldn't need to
> > >> > allocate them with DRM at all. So, given they are really "GPU"
> > >> > buffers, it does absolutely make sense to allocate them in a
> > >> > different driver to the display driver.
> > >> >
> > >> > However, to avoid unnecessary memcpys & related cache
> > >> > maintenance ops, we'd also like the GPU to render into buffers
> > >> > which are scanned out by the display controller. So let's say
> > >> > we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
> > >> > out buffers with the display's DRM driver but a custom ioctl
> > >> > on the GPU's DRM driver to allocate non scanout, off-screen
> > >> > buffers. Sounds great, but I don't think that really works
> > >> > with DRI2. If we used two drivers to allocate buffers, which
> > >> > of those drivers do we return in DRI2ConnectReply? Even if we
> > >> > solve that somehow, GEM flink names are name-spaced to a
> > >> > single device node (AFAIK). So when we do a DRI2GetBuffers,
> > >> > how does the EGL in the client know which DRM device owns GEM
> > >> > flink name "1234"? We'd need some pretty dirty hacks.
> > >>
> > >> You would return the name of the display driver allocating the
> > >> buffers.  On the client side you can use generic ioctls to go from
> > >> flink -> handle -> dmabuf.  So the client side would end up opening
> > >> both the display drm device and the gpu, but without needing to know
> > >> too much about the display.
> > >
> > > I think the bit I was missing was that a GEM bo for a buffer imported
> > > using dma_buf/PRIME can still be flink'd. So the display controller's
> > > DRM driver allocates scan-out buffers via the DUMB buffer allocate
> > > ioctl. Those scan-out buffers than then be exported from the
> > > dispaly's DRM driver and imported into the GPU's DRM driver using
> > > PRIME. Once imported into the GPU's driver, we can use flink to get a
> > > name for that buffer within the GPU DRM driver's name-space to return
> > > to the DRI2 client. That same namespace is also what DRI2 back-
> > > buffers are allocated from, so I think that could work... Except...
> > 
> > (and.. the general direction is that things will move more to just use
> > dmabuf directly, ie. wayland or dri3)
> 
> I agree, DRI2 is the only reason why we need a system-wide ID. I also
> prefer buffers to be passed around by dma_buf fd, but we still need to
> support DRI2 and will do for some time I expect.
> 
> 
> 
> > >> > Anyway, that latter case also gets quite difficult. The "GPU"
> > >> > DRM driver would need to know the constraints of the display
> > >> > controller when allocating buffers intended to be scanned out.
> > >> > For example, pl111 typically isn't behind an IOMMU and so
> > >> > requires physically contiguous memory. We'd have to teach the
> > >> > GPU's DRM driver about the constraints of the display HW. Not
> > >> > exactly a clean driver model. :-(
> > >> >
> > >> > I'm still a little stuck on how to proceed, so any ideas
> > >> > would greatly appreciated! My current train of thought is
> > >> > having a kind of SoC-specific DRM driver which allocates
> > >> > buffers for both display and GPU within a single GEM
> > >> > namespace. That SoC-specific DRM driver could then know the
> > >> > constraints of both the GPU and the display HW. We could then
> > >> > use PRIME to export buffers allocated with the SoC DRM driver
> > >> > and import them into the GPU and/or display DRM driver.
> > >>
> > >> Usually if the display drm driver is allocating the buffers that
> > >> might be scanned out, it just needs to have minimal knowledge of 
> > >> the GPU (pitch alignment constraints).  I don't think we need a 
> > >> 3rd device just to allocate buffers.
> > >
> > > While Mali can render to pretty much any buffer, there is a mild
> > > performance improvement to be had if the buffer stride is aligned to
> > > the AXI bus's max burst length when drawing to the buffer.
> > 
> > I suspect the display controllers might frequently benefit if the
> > pitch is aligned to AXI burst length too..
> 
> If the display controller is going to be reading from linear memory
> I don't think it will make much difference - you'll just get an extra
> 1-2 bus transactions per scanline. With a tile-based GPU like Mali,
> you get those extra transactions per _tile_ scan-line and as such,
> the overhead is more pronounced.
> 
> 
> 
> > > So in some respects, there is a constraint on how buffers which will
> > > be drawn to using the GPU are allocated. I don't really like the idea
> > > of teaching the display controller DRM driver about the GPU buffer
> > > constraints, even if they are fairly trivial like this. If the same
> > > display HW IP is being used on several SoCs, it seems wrong somehow
> > > to enforce those GPU constraints if some of those SoCs don't have a
> > > GPU.
> > 
> > Well, I suppose you could get min_pitch_alignment from devicetree, or
> > something like this..
> > 
> > In the end, the easy solution is just to make the display allocate to
> > the worst-case pitch alignment.  In the early days of dma-buf
> > discussions, we kicked around the idea of negotiating or
> > programatically describing the constraints, but that didn't really
> > seem like a bounded problem.
> 
> Yeah - I was around for some of those discussions and agree it's not
> really an easy problem to solve.
> 
> 
> 
> > > We may also then have additional constraints when sharing buffers
> > > between the display HW and video decode or even camera ISP HW.
> > > Programmatically describing buffer allocation constraints is very
> > > difficult and I'm not sure you can actually do it - there's some
> > > pretty complex constraints out there! E.g. I believe there's a
> > > platform where Y and UV planes of the reference frame need to be in
> > > separate DRAM banks for real-time 1080p decode, or something like
> > > that?
> > 
> > yes, this was discussed.  This is different from pitch/format/size
> > constraints.. it is really just a placement constraint (ie. where do
> > the physical pages go).  IIRC the conclusion was to use a dummy
> > devices with it's own CMA pool for attaching the Y vs UV buffers.
> > 
> > > Anyway, I guess my point is that even if we solve how to allocate
> > > buffers which will be shared between the GPU and display HW such that
> > > both sets of constraints are satisfied, that may not be the end of
> > > the story.
> > >
> > 
> > that was part of the reason to punt this problem to userspace ;-)
> >
> > In practice, the kernel drivers doesn't usually know too much about
> > the dimensions/format/etc.. that is really userspace level knowledge.
> > There are a few exceptions when the kernel needs to know how to setup
> > GTT/etc for tiled buffers, but normally this sort of information is up
> > at the next level up (userspace, and drm_framebuffer in case of
> > scanout).  Userspace media frameworks like GStreamer already have a
> > concept of format/caps negotiation.  For non-display<->gpu sharing, I
> > think this is probably where this sort of constraint negotiation
> > should be handled.
> 
> I agree that user-space will know which devices will access the buffer
> and thus can figure out at least a common pixel format. Though I'm not
> so sure userspace can figure out more low-level details like alignment
> and placement in physical memory, etc.
> 
> Anyway, assuming user-space can figure out how a buffer should be 
> stored in memory, how does it indicate this to a kernel driver and 
> actually allocate it? Which ioctl on which device does user-space
> call, with what parameters? Are you suggesting using something like
> ION which exposes the low-level details of how buffers are laid out in
> physical memory to userspace? If not, what?
> 

I strongly disagree with exposing low-level hardware details like tiling
to userspace. If we have to do the negotiation of those things in
userspace we will end up with having to pipe those information through
things like the wayland protocol. I don't see how this could ever be
considered a good idea.

I would rather see kernel drivers negotiating those things at dmabuf
attach time in way invisible to userspace. I agree that this negotiation
thing isn't easy to get right for the plethora of different hardware
constraints we see today, but I would rather see this in-kernel, where
we have the chance to fix things up if needed, than in a fixed userspace
interface.

Regards,
Lucas
-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


^ permalink raw reply

* Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Rob Clark @ 2013-08-06 12:15 UTC (permalink / raw)
  To: Tom Cooksey
  Cc: dri-devel, linux-fbdev, Pawel Moll, linux-arm-kernel, linux-media,
	linaro-mm-sig, linux-kernel
In-Reply-To: <5200deb3.0b24b40a.3b26.ffffbadeSMTPIN_ADDED_BROKEN@mx.google.com>

On Tue, Aug 6, 2013 at 7:31 AM, Tom Cooksey <tom.cooksey@arm.com> wrote:
>
>> > So in some respects, there is a constraint on how buffers which will
>> > be drawn to using the GPU are allocated. I don't really like the idea
>> > of teaching the display controller DRM driver about the GPU buffer
>> > constraints, even if they are fairly trivial like this. If the same
>> > display HW IP is being used on several SoCs, it seems wrong somehow
>> > to enforce those GPU constraints if some of those SoCs don't have a
>> > GPU.
>>
>> Well, I suppose you could get min_pitch_alignment from devicetree, or
>> something like this..
>>
>> In the end, the easy solution is just to make the display allocate to
>> the worst-case pitch alignment.  In the early days of dma-buf
>> discussions, we kicked around the idea of negotiating or
>> programatically describing the constraints, but that didn't really
>> seem like a bounded problem.
>
> Yeah - I was around for some of those discussions and agree it's not
> really an easy problem to solve.
>
>
>
>> > We may also then have additional constraints when sharing buffers
>> > between the display HW and video decode or even camera ISP HW.
>> > Programmatically describing buffer allocation constraints is very
>> > difficult and I'm not sure you can actually do it - there's some
>> > pretty complex constraints out there! E.g. I believe there's a
>> > platform where Y and UV planes of the reference frame need to be in
>> > separate DRAM banks for real-time 1080p decode, or something like
>> > that?
>>
>> yes, this was discussed.  This is different from pitch/format/size
>> constraints.. it is really just a placement constraint (ie. where do
>> the physical pages go).  IIRC the conclusion was to use a dummy
>> devices with it's own CMA pool for attaching the Y vs UV buffers.
>>
>> > Anyway, I guess my point is that even if we solve how to allocate
>> > buffers which will be shared between the GPU and display HW such that
>> > both sets of constraints are satisfied, that may not be the end of
>> > the story.
>> >
>>
>> that was part of the reason to punt this problem to userspace ;-)
>>
>> In practice, the kernel drivers doesn't usually know too much about
>> the dimensions/format/etc.. that is really userspace level knowledge.
>> There are a few exceptions when the kernel needs to know how to setup
>> GTT/etc for tiled buffers, but normally this sort of information is up
>> at the next level up (userspace, and drm_framebuffer in case of
>> scanout).  Userspace media frameworks like GStreamer already have a
>> concept of format/caps negotiation.  For non-display<->gpu sharing, I
>> think this is probably where this sort of constraint negotiation
>> should be handled.
>
> I agree that user-space will know which devices will access the buffer
> and thus can figure out at least a common pixel format. Though I'm not
> so sure userspace can figure out more low-level details like alignment
> and placement in physical memory, etc.

well, let's divide things up into two categories:

1) the arrangement and format of pixels.. ie. what userspace would
need to know if it mmap's a buffer.  This includes pixel format,
stride, etc.  This should be negotiated in userspace, it would be
crazy to try to do this in the kernel.

2) the physical placement of the pages.  Ie. whether it is contiguous
or not.  Which bank the pages in the buffer are placed in, etc.  This
is not visible to userspace.  This is the purpose of the attach step,
so you know all the devices involved in sharing up front before
allocating the backing pages.  (Or in the worst case, if you have a
"late attacher" you at least know when no device is doing dma access
to a buffer and can reallocate and move the buffer.)  A long time
back, I had a patch that added a field or two to 'struct
device_dma_parameters' so that it could be known if a device required
contiguous buffers.. looks like that never got merged, so I'd need to
dig that back up and resend it.  But the idea was to have the 'struct
device' encapsulate all the information that would be needed to
do-the-right-thing when it comes to placement.

> Anyway, assuming user-space can figure out how a buffer should be
> stored in memory, how does it indicate this to a kernel driver and
> actually allocate it? Which ioctl on which device does user-space
> call, with what parameters? Are you suggesting using something like
> ION which exposes the low-level details of how buffers are laid out in
> physical memory to userspace? If not, what?

no, userspace should not need to know this.  And having a central
driver that knows this for all the other drivers in the system doesn't
really solve anything and isn't really scalable.  At best you might
want, in some cases, a flag you can pass when allocating.  For
example, some of the drivers have a 'SCANOUT' flag that can be passed
when allocating a GEM buffer, as a hint to the kernel that 'if this hw
requires contig memory for scanout, allocate this buffer contig'.  But
really, when it comes to sharing buffers between devices, we want this
sort of information in dev->dma_params of the importing device(s).

BR,
-R

^ permalink raw reply

* Re: [PATCH 4/5] fb: Add DCU framebuffer driver for Vybrid VF610 platform
From: Lucas Stach @ 2013-08-06  8:47 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1373609276-14566-5-git-send-email-b18965@freescale.com>

Am Freitag, den 12.07.2013, 14:07 +0800 schrieb Alison Wang:
> The Display Controller Unit (DCU) module is a system master that
> fetches graphics stored in internal or external memory and displays
> them on a TFT LCD panel. A wide range of panel sizes is supported
> and the timing of the interface signals is highly configurable.
> Graphics are read directly from memory and then blended in real-time,
> which allows for dynamic content creation with minimal CPU intervention.
> 
> The features:
> 
> (1) Full RGB888 output to TFT LCD panel.
> (2) For the current LCD panel, WQVGA "480x272" is tested.
> (3) Blending of each pixel using up to 4 source layers dependent on size of panel.
> (4) Each graphic layer can be placed with one pixel resolution in either axis.
> (5) Each graphic layer support RGB565 and RGB888 direct colors without alpha channel
> and BGRA8888 direct colors with an alpha channel.
> (6) Each graphic layer support alpha blending with 8-bit resolution.
> 
> This driver has been tested on Vybrid VF610 TOWER board.
> 
> Signed-off-by: Alison Wang <b18965@freescale.com>
> ---
> Changes in v2: None
> 
>  drivers/video/Kconfig      |    9 +
>  drivers/video/Makefile     |    1 +
>  drivers/video/fsl-dcu-fb.c | 1091 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 1101 insertions(+)
>  create mode 100644 drivers/video/fsl-dcu-fb.c
> 
[...]
> +
> +static struct fb_videomode dcu_mode_db[] = {
> +	{
> +		.name		= "480x272",
> +		.refresh	= 75,
> +		.xres		= 480,
> +		.yres		= 272,
> +		.pixclock	= 91996,
> +		.left_margin	= 2,
> +		.right_margin	= 2,
> +		.upper_margin	= 1,
> +		.lower_margin	= 1,
> +		.hsync_len	= 41,
> +		.vsync_len	= 2,
> +		.sync		= FB_SYNC_COMP_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT,
> +		.vmode		= FB_VMODE_NONINTERLACED,
> +	},
> +};
Don't hardcode panel data in the driver. Use the videomode helpers to
get the relevant data from devicetree.

> +
> +/* DCU framebuffer data structure */
> +struct dcu_fb_data {
> +	struct fb_info *fsl_dcu_info[DCU_LAYER_NUM];
> +	void __iomem *reg_base;
> +	unsigned int irq;
> +	struct clk *clk;
> +};
> +
> +struct layer_display_offset {
> +	int x_layer_d;
> +	int y_layer_d;
> +};
> +
> +struct mfb_info {
> +	int index;
> +	char *id;
> +	unsigned long pseudo_palette[16];
> +	unsigned char alpha;
> +	unsigned char blend;
> +	unsigned int count;
> +	int x_layer_d;	/* layer display x offset to physical screen */
> +	int y_layer_d;	/* layer display y offset to physical screen */
> +	struct dcu_fb_data *parent;
> +};
> +
> +enum mfb_index {
> +	LAYER0 = 0,
> +	LAYER1,
> +	LAYER2,
> +	LAYER3,
> +};
Why are there only 4 layers here? I thought the controller supports at
least 6 simultaneous layers?

> +
> +static struct mfb_info mfb_template[] = {
> +	{
> +	.index = LAYER0,
> +	.id = "Layer0",
> +	.alpha = 0xff,
> +	.blend = 0,
> +	.count = 0,
> +	.x_layer_d = 0,
> +	.y_layer_d = 0,
> +	},
> +	{
> +	.index = LAYER1,
> +	.id = "Layer1",
> +	.alpha = 0xff,
> +	.blend = 0,
> +	.count = 0,
> +	.x_layer_d = 50,
> +	.y_layer_d = 50,
> +	},
> +	{
> +	.index = LAYER2,
> +	.id = "Layer2",
> +	.alpha = 0xff,
> +	.blend = 0,
> +	.count = 0,
> +	.x_layer_d = 100,
> +	.y_layer_d = 100,
> +	},
> +	{
> +	.index = LAYER3,
> +	.id = "Layer3",
> +	.alpha = 0xff,
> +	.blend = 0,
> +	.count = 0,
> +	.x_layer_d = 150,
> +	.y_layer_d = 150,
> +	},
> +};
[...]

> +
> +static int calc_div_ratio(struct fb_info *info)
> +{
> +	struct mfb_info *mfbi = info->par;
> +	struct dcu_fb_data *dcufb = mfbi->parent;
> +	unsigned long dcu_clk;
> +	unsigned long long tmp;
> +
> +	dcu_clk = clk_get_rate(dcufb->clk);
> +	tmp = info->var.pixclock * (unsigned long long)dcu_clk;
> +
> +	do_div(tmp, 1000000);
> +
> +	if (do_div(tmp, 1000000) > 500000)
> +		tmp++;
Urgh, you are changing the value of tmp inside the if clause. This isn't
nice. This function as a a whole looks really odd.

> +
> +	tmp = tmp - 1;
> +	return tmp;
> +}
> +
> +static void update_controller(struct fb_info *info)
> +{
> +	struct fb_var_screeninfo *var = &info->var;
> +	struct mfb_info *mfbi = info->par;
> +	struct dcu_fb_data *dcufb = mfbi->parent;
> +	unsigned int ratio;
> +
> +	ratio = calc_div_ratio(info);
> +	writel(ratio, dcufb->reg_base + DCU_DIV_RATIO);
> +
> +	writel(DCU_DISP_SIZE_DELTA_Y(var->yres) |
> +		DCU_DISP_SIZE_DELTA_X(var->xres / 16),
> +		dcufb->reg_base + DCU_DISP_SIZE);
> +
> +	/* Horizontal and vertical sync parameter */
> +	writel(DCU_HSYN_PARA_BP(var->left_margin) |
> +		DCU_HSYN_PARA_PW(var->hsync_len) |
> +		DCU_HSYN_PARA_FP(var->right_margin),
> +		dcufb->reg_base + DCU_HSYN_PARA);
> +
> +	writel(DCU_VSYN_PARA_BP(var->upper_margin) |
> +		DCU_VSYN_PARA_PW(var->vsync_len) |
> +		DCU_VSYN_PARA_FP(var->lower_margin),
> +		dcufb->reg_base + DCU_VSYN_PARA);
> +
> +	writel(DCU_SYN_POL_INV_PXCK_FALL | DCU_SYN_POL_NEG_REMAIN |
> +		DCU_SYN_POL_INV_VS_LOW | DCU_SYN_POL_INV_HS_LOW,
> +		dcufb->reg_base + DCU_SYN_POL);
> +
> +	writel(DCU_BGND_R(0) | DCU_BGND_G(0) | DCU_BGND_B(0),
> +		dcufb->reg_base + DCU_BGND);
> +
> +	writel(DCU_MODE_BLEND_ITER(DCU_LAYER_NUM_MAX) | DCU_MODE_RASTER_EN,
> +			dcufb->reg_base + DCU_DCU_MODE);
> +
> +	writel(DCU_THRESHOLD_LS_BF_VS(0x3) | DCU_THRESHOLD_OUT_BUF_HIGH(0x78) |
> +		DCU_THRESHOLD_OUT_BUF_LOW(0), dcufb->reg_base + DCU_THRESHOLD);
> +
> +	enable_controller(info);
> +}
> +
> +static int map_video_memory(struct fb_info *info)
> +{
> +	u32 smem_len = info->fix.line_length * info->var.yres_virtual;
> +
> +	info->fix.smem_len = smem_len;
> +
> +	info->screen_base = dma_alloc_coherent(info->device, info->fix.smem_len,
You are setting up an uncached mapping here. Use a writecombined mapping
to help performance. It's still coherent, but allows at least write to
be fast.

> +		(dma_addr_t *)&info->fix.smem_start, GFP_KERNEL);
> +	if (!info->screen_base) {
> +		printk(KERN_ERR "unable to allocate fb memory\n");
> +		return -ENOMEM;
> +	}
> +
> +	memset(info->screen_base, 0, info->fix.smem_len);
> +
> +	return 0;
> +}
> +
> +static void unmap_video_memory(struct fb_info *info)
> +{
> +	if (!info->screen_base)
> +		return;
> +
> +	dma_free_coherent(info->device, info->fix.smem_len,
> +		info->screen_base, info->fix.smem_start);
> +
> +	info->screen_base = NULL;
> +	info->fix.smem_start = 0;
> +	info->fix.smem_len = 0;
> +}
[...]

> +
> +static int fsl_dcu_open(struct fb_info *info, int user)
> +{
> +	struct mfb_info *mfbi = info->par;
> +	int ret = 0;
> +
> +	mfbi->index = info->node;
> +
> +	mfbi->count++;
> +	if (mfbi->count = 1) {
> +		fsl_dcu_check_var(&info->var, info);
> +		ret = fsl_dcu_set_par(info);
> +		if (ret < 0)
> +			mfbi->count--;
> +		else
> +			enable_interrupts(mfbi->parent);
> +	}
> +
> +	return ret;
> +}
> +
> +static int fsl_dcu_release(struct fb_info *info, int user)
> +{
> +	struct mfb_info *mfbi = info->par;
> +	int ret = 0;
> +
> +	mfbi->count--;
> +	if (mfbi->count = 0) {
> +		ret = disable_panel(info);
> +		if (ret < 0)
> +			mfbi->count++;
> +	}
> +
> +	return ret;
> +}
Could this be replaced by runtime pm?

> +
> +static struct fb_ops fsl_dcu_ops = {
> +	.owner = THIS_MODULE,
> +	.fb_check_var = fsl_dcu_check_var,
> +	.fb_set_par = fsl_dcu_set_par,
> +	.fb_setcolreg = fsl_dcu_setcolreg,
> +	.fb_blank = fsl_dcu_blank,
> +	.fb_pan_display = fsl_dcu_pan_display,
> +	.fb_fillrect = cfb_fillrect,
> +	.fb_copyarea = cfb_copyarea,
> +	.fb_imageblit = cfb_imageblit,
> +	.fb_ioctl = fsl_dcu_ioctl,
> +	.fb_open = fsl_dcu_open,
> +	.fb_release = fsl_dcu_release,
> +};
> +
> +static int install_framebuffer(struct fb_info *info)
> +{
> +	struct mfb_info *mfbi = info->par;
> +	struct fb_videomode *db = dcu_mode_db;
> +	unsigned int dbsize = ARRAY_SIZE(dcu_mode_db);
> +	int ret;
> +
> +	info->var.activate = FB_ACTIVATE_NOW;
> +	info->fbops = &fsl_dcu_ops;
> +	info->flags = FBINFO_FLAG_DEFAULT;
> +	info->pseudo_palette = &mfbi->pseudo_palette;
> +
> +	fb_alloc_cmap(&info->cmap, 16, 0);
> +
> +	ret = fb_find_mode(&info->var, info, fb_mode, db, dbsize,
> +			NULL, default_bpp);
> +
> +	if (fsl_dcu_check_var(&info->var, info)) {
> +		ret = -EINVAL;
> +		goto failed_checkvar;
> +	}
> +
> +	if (register_framebuffer(info) < 0) {
> +		ret = -EINVAL;
> +		goto failed_register_framebuffer;
> +	}
> +
> +	printk(KERN_INFO "fb%d: %s fb device registered successfully.\n",
> +		info->node, info->fix.id);
> +	return 0;
> +
> +failed_checkvar:
> +	fb_dealloc_cmap(&info->cmap);
> +failed_register_framebuffer:
> +	unmap_video_memory(info);
> +	fb_dealloc_cmap(&info->cmap);
> +	return ret;
> +}
> +
> +static void uninstall_framebuffer(struct fb_info *info)
> +{
> +	unregister_framebuffer(info);
> +	unmap_video_memory(info);
> +
> +	if (&info->cmap)
> +		fb_dealloc_cmap(&info->cmap);
> +}
> +
> +static irqreturn_t fsl_dcu_irq(int irq, void *dev_id)
> +{
> +	struct dcu_fb_data *dcufb = dev_id;
> +	unsigned int status = readl(dcufb->reg_base + DCU_INT_STATUS);
> +	u32 dcu_mode;
> +
> +	if (status) {
> +		if (status & DCU_INT_STATUS_UNDRUN) {
> +			dcu_mode = readl(dcufb->reg_base + DCU_DCU_MODE);
> +			dcu_mode &= ~DCU_MODE_DCU_MODE_MASK;
> +			writel(dcu_mode | DCU_MODE_DCU_MODE(DCU_MODE_OFF),
> +				dcufb->reg_base + DCU_DCU_MODE);
> +			udelay(1);
> +			writel(dcu_mode | DCU_MODE_DCU_MODE(DCU_MODE_NORMAL),
> +				dcufb->reg_base + DCU_DCU_MODE);
> +		}
> +		writel(status, dcufb->reg_base + DCU_INT_STATUS);
> +		return IRQ_HANDLED;
> +	}
> +	return IRQ_NONE;
> +}
> +
> +#ifdef CONFIG_PM
> +static int fsl_dcu_suspend(struct platform_device *pdev,
> +		pm_message_t state)
> +{
> +	struct dcu_fb_data *dcufb = dev_get_drvdata(&pdev->dev);
> +
> +	clk_disable_unprepare(dcufb->clk);
> +	return 0;
> +}
> +
> +static int fsl_dcu_resume(struct platform_device *pdev)
> +{
> +	struct dcu_fb_data *dcufb = dev_get_drvdata(&pdev->dev);
> +
> +	clk_prepare_enable(dcufb->clk);
> +	return 0;
> +}
> +#else
> +#define fsl_dcu_suspend	NULL
> +#define fsl_dcu_resume	NULL
> +#endif
> +
Could this be replaced by runtime pm?

> +static int bypass_tcon(struct device_node *np)
> +{
> +	struct device_node *tcon_np;
> +	struct platform_device *tcon_pdev;
> +	struct clk *tcon_clk;
> +	void __iomem *tcon_reg;
> +	int ret = 0;
> +
> +	tcon_np = of_parse_phandle(np, "tcon-controller", 0);
> +	if (!tcon_np)
> +		return -EINVAL;
> +
> +	tcon_pdev = of_find_device_by_node(tcon_np);
> +	if (!tcon_pdev)
> +		return -EINVAL;
> +
> +	tcon_clk = devm_clk_get(&tcon_pdev->dev, "tcon");
> +	if (IS_ERR(tcon_clk)) {
> +		ret = PTR_ERR(tcon_clk);
> +		goto failed_getclock;
> +	}
> +	clk_prepare_enable(tcon_clk);
> +
> +	tcon_reg = of_iomap(tcon_np, 0);
> +	if (!tcon_reg) {
> +		ret = -ENOMEM;
> +		goto failed_ioremap;
> +	}
> +	writel(TCON_BYPASS_ENABLE, tcon_reg + TCON_CTRL1);
> +
> +	return 0;
> +
> +failed_ioremap:
> +	clk_disable_unprepare(tcon_clk);
> +failed_getclock:
> +	of_node_put(tcon_np);
> +	return ret;
> +}
> +
Is the framebuffer driver the only user of tcon? If not you should not
map the memory here, but rather use something like regmap syscon.

[...]

Regards,
Lucas

-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


^ permalink raw reply

* RE: [PATCH 4/5] fb: Add DCU framebuffer driver for Vybrid VF610 platform
From: Wang Huan-B18965 @ 2013-08-06  7:20 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1375712322.4276.12.camel@weser.hi.pengutronix.de>

SGksDQoNCj4gQW0gTW9udGFnLCBkZW4gMDUuMDguMjAxMywgMTU6MDkgKzAyMDAgc2NocmllYiBS
b2JlcnQgU2Nod2ViZWw6DQo+ID4gT24gTW9uLCBBdWcgMDUsIDIwMTMgYXQgMDk6NTE6NDBBTSAr
MDAwMCwgV2FuZyBIdWFuLUIxODk2NSB3cm90ZToNCj4gPiA+ID4gT24gRnJpLCBKdWwgMTIsIDIw
MTMgYXQgMDI6MDc6NTVQTSArMDgwMCwgQWxpc29uIFdhbmcgd3JvdGU6DQo+ID4gPiA+ID4gVGhl
IERpc3BsYXkgQ29udHJvbGxlciBVbml0IChEQ1UpIG1vZHVsZSBpcyBhIHN5c3RlbSBtYXN0ZXIN
Cj4gdGhhdA0KPiA+ID4gPiA+IGZldGNoZXMgZ3JhcGhpY3Mgc3RvcmVkIGluIGludGVybmFsIG9y
IGV4dGVybmFsIG1lbW9yeSBhbmQNCj4gPiA+ID4gPiBkaXNwbGF5cyB0aGVtIG9uIGEgVEZUIExD
RCBwYW5lbC4gQSB3aWRlIHJhbmdlIG9mIHBhbmVsIHNpemVzDQo+IGlzDQo+ID4gPiA+ID4gc3Vw
cG9ydGVkIGFuZCB0aGUgdGltaW5nIG9mIHRoZSBpbnRlcmZhY2Ugc2lnbmFscyBpcyBoaWdobHkN
Cj4gY29uZmlndXJhYmxlLg0KPiA+ID4gPiA+IEdyYXBoaWNzIGFyZSByZWFkIGRpcmVjdGx5IGZy
b20gbWVtb3J5IGFuZCB0aGVuIGJsZW5kZWQgaW4NCj4gPiA+ID4gPiByZWFsLXRpbWUsIHdoaWNo
IGFsbG93cyBmb3IgZHluYW1pYyBjb250ZW50IGNyZWF0aW9uIHdpdGgNCj4gPiA+ID4gPiBtaW5p
bWFsIENQVQ0KPiA+ID4gPiBpbnRlcnZlbnRpb24uDQo+ID4gPiA+DQo+ID4gPiA+IE9ubHkgYSBy
ZXZpZXcgb2YgdGhlIGNvZGUgaW5saW5lLg0KPiA+ID4gPg0KPiA+ID4gPiBNYXliZSB0aGUgcmVh
bCBxdWVzdGlvbiBpcyB3aGV0aGVyIHdlIHdhbnQgdG8gaW50cm9kdWNlIGFub3RoZXINCj4gPiA+
ID4gZnJhbWVidWZmZXIgZHJpdmVyIGF0IGFsbCBpbnN0ZWFkIG9mIG1ha2luZyBpdCBhIERSTSBk
cml2ZXIuDQo+ID4gPiBbQWxpc29uIFdhbmddIEkgdGhpbmsgRENVIG1vZHVsZSBpcyBtb3JlIHN1
aXRhYmxlIHRvIGJlIGRlc2lnbmVkIGFzDQo+IGEgZnJhbWVidWZmZXIgZHJpdmVyIHRoYW4gYSBE
Uk0gZHJpdmVyLiBKdXN0IGxpa2UgRElVIGZyYW1lYnVmZmVyDQo+IGRyaXZlciBmb3IgUG93ZXJQ
Qy4NCj4gPg0KPiA+IFdlIGxvb2tlZCBhdCB0aGUgVnlicmlkIGRhdGFzaGVldCBhbmQgaXQncyBE
Q1Ugc2VjdGlvbiBsYXN0IHdlZWssIGFuZA0KPiA+IHdpdGggaXRzIDY0IHBsYW5lcywgdGhlIGNv
bnRyb2xsZXIgcmVhbGx5IHdhbnRzIHRvIGdldCBhIERSTSBkcml2ZXIuDQo+ID4NCj4gRXhhY3Rs
eSwgd2l0aCBpdCdzIGV4dGVuc2l2ZSBoYXJkd2FyZSBjb21wb3NpdGlvbiBjYXBhYmlsaXRpZXMg
dGhlDQo+IGNvbnRyb2xsZXIgYmVncyBmb3IgYmVpbmcgZHJpdmVuIGJ5IGEgcHJvcGVyIERSTSBk
cml2ZXIuIFRoZXJlIGlzIG5vDQo+IHdheSBpbiBmYmRldiB0byBwcm9wZXJseSBzdXBwb3J0IGFs
bCB0aG9zZSBoYXJkd2FyZSBwbGFuZXMgd2l0aG91dA0KPiBpbnRyb2R1Y2luZyBhIGxvdCBvZiBu
b24gc3RhbmRhcmQgaW9jdGxzLCB3aGljaCBpbiB0dXJuIHdpbGwgZm9yY2UgeW91DQo+IGludG8g
d3JpdGluZyBhIGxvdCBvZiBjdXN0b20gdXNlcnNwYWNlIGNvZGUgaW5zdGVhZCBvZiBydW5uaW5n
IHByb3Zlbg0KPiB0ZWNobm9sb2d5IHRoYXQgc2l0cyBvbiB0b3Agb2YgS01TLg0KPiANCj4gRG9p
bmcgdGhpbmdzIGluIERSTSBtaWdodCBiZSBzbGlnaHRseSBtb3JlIHdvcmsgZm9yIHRoZSBpbml0
aWFsIGJyaW5nDQo+IHVwLCBidXQgd2lsbCBwYXkgb2ZmIGluIHRoZSBsb25nIHJ1biB3aGVuIHlv
dSBhcmUgZ29pbmcgdG8gcmVhbGx5IHVzZQ0KPiB0aGUgaGFyZHdhcmUgY2FwcyBvZiB0aGUgY29u
dHJvbGxlci4NCj4gDQpbQWxpc29uIFdhbmddIFRoYW5rcyBmb3IgeW91ciBjb21tZW50cyBhbmQg
ZXhwbGFuYXRpb25zLiBJIHVuZGVyc3RhbmQgeW91ciBjb25zaWRlcmF0aW9ucy4NCg0KSW4gRENV
IG1vZHVsZSwgdGhlcmUgYXJlIDY0IGdyYXBoaWNzIGxheWVycywgYnV0IHRoZSBtYXhpbXVtIHNv
dXJjZSBsYXllcnMgZm9yIGJsZW5kaW5nIG9mIGVhY2ggcGl4ZWwgaXMgb25seSA2LiBUaGUgZmIg
ZHJpdmVyIGNvdWxkIHNhdGlzZnkgdGhlIGN1cnJlbnQgYXBwbGljYXRpb24gcmVxdWlyZW1lbnRz
Lg0KDQpTbyBJIHdpbGwgcHVzaCBvdXQgdGhpcyBmYiBkcml2ZXIgZmlyc3QsIGFuZCB0aGVuIEkg
d2lsbCBhZGQgYSBEUk0gZHJpdmVyIGZvciBtb3JlIGFkdmFuY2VkIHJlcXVpcmVtZW50cy4NCg0K
RG8geW91IGhhdmUgbW9yZSBjb21tZW50cz8NCg0KDQpCZXN0IFJlZ2FyZHMsDQpBbGlzb24gV2Fu
Zw0K


^ permalink raw reply

* RE: [PATCH 4/5] fb: Add DCU framebuffer driver for Vybrid VF610 platform
From: Wang Huan-B18965 @ 2013-08-06  3:42 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <20130805100630.GQ26614@pengutronix.de>

PiBPbiBNb24sIEF1ZyAwNSwgMjAxMyBhdCAwOTo1MTo0MEFNICswMDAwLCBXYW5nIEh1YW4tQjE4
OTY1IHdyb3RlOg0KPiA+IEhpLCBTYXNjaGEsDQo+ID4NCj4gPiA+IE9uIEZyaSwgSnVsIDEyLCAy
MDEzIGF0IDAyOjA3OjU1UE0gKzA4MDAsIEFsaXNvbiBXYW5nIHdyb3RlOg0KPiA+ID4gPiBUaGUg
RGlzcGxheSBDb250cm9sbGVyIFVuaXQgKERDVSkgbW9kdWxlIGlzIGEgc3lzdGVtIG1hc3RlciB0
aGF0DQo+ID4gPiA+IGZldGNoZXMgZ3JhcGhpY3Mgc3RvcmVkIGluIGludGVybmFsIG9yIGV4dGVy
bmFsIG1lbW9yeSBhbmQNCj4gPiA+ID4gZGlzcGxheXMgdGhlbSBvbiBhIFRGVCBMQ0QgcGFuZWwu
IEEgd2lkZSByYW5nZSBvZiBwYW5lbCBzaXplcyBpcw0KPiA+ID4gPiBzdXBwb3J0ZWQgYW5kIHRo
ZSB0aW1pbmcgb2YgdGhlIGludGVyZmFjZSBzaWduYWxzIGlzIGhpZ2hseQ0KPiBjb25maWd1cmFi
bGUuDQo+ID4gPiA+IEdyYXBoaWNzIGFyZSByZWFkIGRpcmVjdGx5IGZyb20gbWVtb3J5IGFuZCB0
aGVuIGJsZW5kZWQgaW4NCj4gPiA+ID4gcmVhbC10aW1lLCB3aGljaCBhbGxvd3MgZm9yIGR5bmFt
aWMgY29udGVudCBjcmVhdGlvbiB3aXRoIG1pbmltYWwNCj4gPiA+ID4gQ1BVDQo+ID4gPiBpbnRl
cnZlbnRpb24uDQo+ID4gPg0KPiA+ID4gT25seSBhIHJldmlldyBvZiB0aGUgY29kZSBpbmxpbmUu
DQo+ID4gPg0KPiA+ID4gTWF5YmUgdGhlIHJlYWwgcXVlc3Rpb24gaXMgd2hldGhlciB3ZSB3YW50
IHRvIGludHJvZHVjZSBhbm90aGVyDQo+ID4gPiBmcmFtZWJ1ZmZlciBkcml2ZXIgYXQgYWxsIGlu
c3RlYWQgb2YgbWFraW5nIGl0IGEgRFJNIGRyaXZlci4NCj4gPiBbQWxpc29uIFdhbmddIEkgdGhp
bmsgRENVIG1vZHVsZSBpcyBtb3JlIHN1aXRhYmxlIHRvIGJlIGRlc2lnbmVkIGFzIGENCj4gZnJh
bWVidWZmZXIgZHJpdmVyIHRoYW4gYSBEUk0gZHJpdmVyLiBKdXN0IGxpa2UgRElVIGZyYW1lYnVm
ZmVyIGRyaXZlcg0KPiBmb3IgUG93ZXJQQy4NCj4gPiA+DQo+ID4gPiA+ICsNCj4gPiA+ID4gKyNk
ZWZpbmUgRFJJVkVSX05BTUUJCQkiZnNsLWRjdS1mYiINCj4gPiA+ID4gKw0KPiA+ID4gPiArI2Rl
ZmluZSBEQ1VfRENVX01PREUJCQkweDAwMTANCj4gPiA+ID4gKyNkZWZpbmUgRENVX01PREVfQkxF
TkRfSVRFUih4KQkJKHggPDwgMjApDQo+ID4gPg0KPiA+ID4gV2hhdCdzIHRoZSByZXN1bHQgb2Yg
RENVX01PREVfQkxFTkRfSVRFUigxICsgMSk/DQo+ID4gW0FsaXNvbiBXYW5nXSBJcyBpdCByZWFs
bHkgbmVjZXNzYXJ5PyBJIGRvbid0IHVzZSB0aGlzIG1hY3JvIGxpa2UNCj4gRENVX01PREVfQkxF
TkRfSVRFUigxICsgMSksIGp1c3QgdXNlIERDVV9NT0RFX0JMRU5EX0lURVIoMikuDQo+IA0KPiA8
aXJvbnk+DQo+IE5vIGl0J3Mgbm90IG5lY2Vzc2FyeS4gV2UgY2FuIGp1c3QgYWRkIGluY29ycmVj
dCBtYWNyb3MgZXZlcnl3aGVyZSBhbmQNCj4gZml4IHRoZW0gYXMgbmVjZXNzYXJ5IGFmdGVyIGEg
bG9uZyBidWcgc3F1YXNoaW5nIHNlc3Npb24gPC9pcm9ueT4NCj4gDQo+IFlFUyEgVGhpcyBpcyBu
ZWNlc3NhcnkuDQpbQWxpc29uIFdhbmddIE9rLCBJIHdpbGwgY2hhbmdlIGl0IGFzIGZvbGxvd3Mu
DQojZGVmaW5lIERDVV9NT0RFX0JMRU5EX0lURVIoeCkJKCh4KSA8PCAyMCkNCj4gDQo+ID4gPiA+
ICsJcmV0dXJuIDA7DQo+ID4gPiA+ICt9DQo+ID4gPiA+ICsNCj4gPiA+ID4gK3N0YXRpYyBpbnQg
Y2FsY19kaXZfcmF0aW8oc3RydWN0IGZiX2luZm8gKmluZm8pIHsNCj4gPiA+DQo+ID4gPiBVc2Ug
YSBjb25zaXN0ZW50IG5hbWVzcGFjZSBmb3IgZnVuY3Rpb24gbmFtZXMgKGZzbF9kY3VfKQ0KPiA+
IFtBbGlzb24gV2FuZ10gSXMgaXQgbmVjZXNzYXJ5IHRvIHVzZSBhIGNvbnNpc3RlbnQgbmFtZXNw
YWNlIGZvciB0aGlzDQo+IGdlbmVyaWMgZnVuY3Rpb24/IEkgdGhpbmsgaXQgaXMgbmVjZXNzYXJ5
IHRvIHVzZSBhIGNvbnNpc3RlbnQgbmFtZXNwYWNlDQo+IChmc2xfZGN1XykgZm9yIHRoZSBmdW5j
dGlvbiBuYW1lcyBpbiBzdHJ1Y3R1cmUgZnNsX2RjdV9vcHMuDQo+IA0KPiBUaGlzIGZ1bmN0aW9u
IGNhbGN1bGF0ZXMgc29tZSBkaXZpZGVyIG91dCBvZiBhIHN0cnVjdCBmYl9pbmZvLiBUaGlzIGlz
DQo+IGJ5IG5vIG1lYW5zIGdlbmVyaWMuDQpbQWxpc29uIFdhbmddIE9rLCBJIHdpbGwgdXNlIGEg
Y29uc2lzdGVudCBuYW1lc3BhY2UgZm9yIHRoaXMgZnVuY3Rpb24uDQo+IA0KPiA+ID4gPiArCQlp
ZiAoY29weV90b191c2VyKGJ1ZiwgJmFscGhhLCBzaXplb2YoYWxwaGEpKSkNCj4gPiA+ID4gKwkJ
CXJldHVybiAtRUZBVUxUOw0KPiA+ID4gPiArCQlicmVhazsNCj4gPiA+ID4gKwljYXNlIE1GQl9T
RVRfQUxQSEE6DQo+ID4gPiA+ICsJCWlmIChjb3B5X2Zyb21fdXNlcigmYWxwaGEsIGJ1Ziwgc2l6
ZW9mKGFscGhhKSkpDQo+ID4gPiA+ICsJCQlyZXR1cm4gLUVGQVVMVDsNCj4gPiA+ID4gKwkJbWZi
aS0+YmxlbmQgPSAxOw0KPiA+ID4gPiArCQltZmJpLT5hbHBoYSA9IGFscGhhOw0KPiA+ID4gPiAr
CQlmc2xfZGN1X3NldF9wYXIoaW5mbyk7DQo+ID4gPiA+ICsJCWJyZWFrOw0KPiA+ID4NCj4gPiA+
IElzIGl0IHN0aWxsIHN0YXRlIG9mIHRoZSBhcnQgdG8gaW50cm9kdWNlIGlvY3RscyBpbiB0aGUg
ZmINCj4gZnJhbWV3b3JrDQo+ID4gPiB3aXRob3V0IGFueSBraW5kIG9mIGRvY3VtZW50YXRpb24/
DQo+ID4gW0FsaXNvbiBXYW5nXSBEbyB5b3UgbWVhbiBJIG5lZWQgdG8gYWRkIGEgZG9jdW1lbnRh
dGlvbiBpbg0KPiBEb2N1bWVudGF0aW9uL2ZiLz8NCj4gDQo+IFRoaXMgd2FzIG1vcmUgYSBxdWVz
dGlvbiB0byB0aGUgZmIgbWFpbnRhaW5lcnMuDQo+IA0KPiA+ID4gPiArc3RhdGljIGlycXJldHVy
bl90IGZzbF9kY3VfaXJxKGludCBpcnEsIHZvaWQgKmRldl9pZCkgew0KPiA+ID4gPiArCXN0cnVj
dCBkY3VfZmJfZGF0YSAqZGN1ZmIgPSBkZXZfaWQ7DQo+ID4gPiA+ICsJdW5zaWduZWQgaW50IHN0
YXR1cyA9IHJlYWRsKGRjdWZiLT5yZWdfYmFzZSArDQo+IERDVV9JTlRfU1RBVFVTKTsNCj4gPiA+
ID4gKwl1MzIgZGN1X21vZGU7DQo+ID4gPiA+ICsNCj4gPiA+ID4gKwlpZiAoc3RhdHVzKSB7DQo+
ID4gPg0KPiA+ID4gU2F2ZSBpbmRlbmRhdGlvbiBsZXZlbCBieSBiYWlsaW5nIG91dCBlYXJseS4N
Cj4gPiBbQWxpc29uIFdhbmddIFdoYXQgZG8geW91IG1lYW4/DQo+IA0KPiANCj4gSW5zdGVhZCBv
ZiBkb2luZzoNCj4gDQo+IAlpZiAoYmxhKSB7DQo+IAkJZG90aGlzKCk7DQo+IAkJZG90aGF0KCk7
DQo+IAkJcmV0dXJuOw0KPiAJfQ0KPiANCj4gCXJldHVybjsNCj4gDQo+IGl0J3MgZWFzaWVyIHRv
IHJlYWQ6DQo+IA0KPiAJaWYgKCFibGEpDQo+IAkJcmV0dXJuOw0KPiANCj4gCWRvdGhpcygpOw0K
PiAJZG90aGF0KCk7DQo+IAlyZXR1cm47DQo+IA0KPiBOb3RlIHRoYXQgZG90aGlzKCkgYW5kIGRv
dGhhdCgpIGFyZSBvbmUgaW5kZW50YXRpb24gbGV2ZWwgdG8gdGhlIGxlZnQNCj4gYW5kIHRodXMg
aGF2ZSBtb3JlIHNwYWNlIHBlciBsaW5lLg0KW0FsaXNvbiBXYW5nXSBJIHNlZS4gVGhhbmtzLg0K
DQoNCkJlc3QgUmVnYXJkcywNCkFsaXNvbiBXYW5nDQo

^ permalink raw reply

* Re: [PATCH] fbdev: suppress warning when assigning vga-save/restore base
From: David Miller @ 2013-08-05 22:29 UTC (permalink / raw)
  To: santiago
  Cc: hpa, dh.herrmann, linux-fbdev, linux-kernel, plagnioj,
	tomi.valkeinen, linux
In-Reply-To: <20130805202955.GH22789@localhost>

From: Ondrej Zajicek <santiago@crfreenet.org>
Date: Mon, 5 Aug 2013 22:29:55 +0200

> On Sun, Aug 04, 2013 at 06:51:46PM -0700, David Miller wrote:
>> From: "H. Peter Anvin" <hpa@zytor.com>
>> Date: Sun, 04 Aug 2013 10:33:46 -0700
>> 
>> > Anyone who can dig backwards and summarize?  In other words:
>> > 
>> > Where in the current code do we stuff a physical address in a pointer,
>> > or a virtual address into a non-pointer?
>> 
>> The VGA register accessors try to accomodate iomem and ioport
>> accesses.
>> 
>> If they are given a non-NULL iomem pointer 'regbase' they use
>> iomem accesses, otherwise they do direct ISA port poking.
>> 
>> And yes the drivers in question are making some brash assumptions.
>> I suspect they should be using ioremap() or similar.
> 
> Well, these drivers were written without MMIO (port IO only with NULL
> instead of 'vgabase' for VGA register accessors). They were converted in
> patches 94c322c30bd14ae6cdd369cb4a1f94c5c3809ac9,
> f8645933513c65ac55f23c63b2649097289795c6 and a few others (from David
> Miller) to potentially use MMIO by using 'vgabase' instead of NULL:
> 
>  pcibios_bus_to_resource(dev, &vga_res, &bus_reg);
>  par->state.vgabase = (void __iomem *) vga_res.start;
> 
> How this could even work? AFAIK these cards have to be explicitly programmed
> to enable MMIO (which was not done in the patches). These patches claim that
> it is for multi-domain PCI. I would guess that vgabase is NULL in common
> configurations but if it is non-NULL, it probably wouldn't work, unless
> there is some hardware magic that transparently converts MMIO (from CPU PoV)
> to port IO (from card/PCI PoV).
> 
> Note that there are some later patches (86c0f043a737dadf034a4e6f29aefb074f4a1146)
> from Ondrej Zary that independently enable MMIO, but they use par->mmio
> field instead of par->state.vgabase .

Ok, so the correct fix would be to make them pass NULL again and to simply
ignore the resources.

^ permalink raw reply

* [PATCH v3 20/20] video: da8xx-fb: adding am33xx as dependency
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
  To: linux-fbdev

Updating Kconfig to allow am33xx to include lcdc fbdev driver
including some extra dependencies needed by device tree mods.

v2: must add FB_MODE_HELPERS as VIDEOMODE_HELPERS alone doesn't
	get the correct functions from fbmon.c built in.

Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
 drivers/video/Kconfig |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index 2e937bd..f36c2f7 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -2226,15 +2226,17 @@ config FB_SH7760
 	  panels <= 320 pixel horizontal resolution.
 
 config FB_DA8XX
-	tristate "DA8xx/OMAP-L1xx Framebuffer support"
-	depends on FB && ARCH_DAVINCI_DA8XX
+	tristate "DA8xx/OMAP-L1xx/AM335x Framebuffer support"
+	depends on FB && (ARCH_DAVINCI_DA8XX || SOC_AM33XX)
 	select FB_CFB_FILLRECT
 	select FB_CFB_COPYAREA
 	select FB_CFB_IMAGEBLIT
 	select FB_CFB_REV_PIXELS_IN_BYTE
+	select FB_MODE_HELPERS
+	select VIDEOMODE_HELPERS
 	---help---
 	  This is the frame buffer device driver for the TI LCD controller
-	  found on DA8xx/OMAP-L1xx SoCs.
+	  found on DA8xx/OMAP-L1xx/AM335x SoCs.
 	  If unsure, say N.
 
 config FB_VIRTUAL
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH v3 19/20] video: da8xx-fb: let compiler decide what to inline
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
  To: linux-fbdev

Remove use of explicit inline compiler directive and let the compiler
make the decision as per Documentation/CodingStyle.

Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
 drivers/video/da8xx-fb.c |   17 ++++++++---------
 1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
index 8c0ca11..ea9771f 100644
--- a/drivers/video/da8xx-fb.c
+++ b/drivers/video/da8xx-fb.c
@@ -141,12 +141,12 @@ static irq_handler_t lcdc_irq_handler;
 static wait_queue_head_t frame_done_wq;
 static int frame_done_flag;
 
-static inline unsigned int lcdc_read(unsigned int addr)
+static unsigned int lcdc_read(unsigned int addr)
 {
 	return (unsigned int)__raw_readl(da8xx_fb_reg_base + (addr));
 }
 
-static inline void lcdc_write(unsigned int val, unsigned int addr)
+static void lcdc_write(unsigned int val, unsigned int addr)
 {
 	__raw_writel(val, da8xx_fb_reg_base + (addr));
 }
@@ -245,13 +245,13 @@ static struct fb_videomode known_lcd_panels[] = {
 	},
 };
 
-static inline bool da8xx_fb_is_raster_enabled(void)
+static bool da8xx_fb_is_raster_enabled(void)
 {
 	return !!(lcdc_read(LCD_RASTER_CTRL_REG) & LCD_RASTER_ENABLE);
 }
 
 /* Enable the Raster Engine of the LCD Controller */
-static inline void lcd_enable_raster(void)
+static void lcd_enable_raster(void)
 {
 	u32 reg;
 
@@ -273,8 +273,7 @@ static inline void lcd_enable_raster(void)
 }
 
 /* Disable the Raster Engine of the LCD Controller */
-static inline void lcd_disable_raster(enum da8xx_frame_complete
-					wait_for_frame_done)
+static void lcd_disable_raster(enum da8xx_frame_complete wait_for_frame_done)
 {
 	u32 reg;
 	int ret;
@@ -750,7 +749,7 @@ static int da8xx_fb_calc_config_clk_divider(struct da8xx_fb_par *par,
 	return da8xx_fb_config_clk_divider(par, lcdc_clk_div, lcdc_clk_rate);
 }
 
-static inline unsigned da8xx_fb_round_clk(struct da8xx_fb_par *par,
+static unsigned da8xx_fb_round_clk(struct da8xx_fb_par *par,
 					  unsigned pixclock)
 {
 	unsigned lcdc_clk_div, lcdc_clk_rate;
@@ -1043,7 +1042,7 @@ static int lcd_da8xx_cpufreq_transition(struct notifier_block *nb,
 	return 0;
 }
 
-static inline int lcd_da8xx_cpufreq_register(struct da8xx_fb_par *par)
+static int lcd_da8xx_cpufreq_register(struct da8xx_fb_par *par)
 {
 	par->freq_transition.notifier_call = lcd_da8xx_cpufreq_transition;
 
@@ -1051,7 +1050,7 @@ static inline int lcd_da8xx_cpufreq_register(struct da8xx_fb_par *par)
 					 CPUFREQ_TRANSITION_NOTIFIER);
 }
 
-static inline void lcd_da8xx_cpufreq_deregister(struct da8xx_fb_par *par)
+static void lcd_da8xx_cpufreq_deregister(struct da8xx_fb_par *par)
 {
 	cpufreq_unregister_notifier(&par->freq_transition,
 				    CPUFREQ_TRANSITION_NOTIFIER);
-- 
1.7.0.4


^ permalink raw reply related

* [PATCH v3 18/20] video: da8xx-fb: make clock naming consistent
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
  To: linux-fbdev

Clean up the code, so that the names of the various clock variables
are consistent to it is clear what variable is associated with what
clock.

Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
 drivers/video/da8xx-fb.c |   24 ++++++++++++------------
 1 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
index dfe7572..8c0ca11 100644
--- a/drivers/video/da8xx-fb.c
+++ b/drivers/video/da8xx-fb.c
@@ -177,7 +177,7 @@ struct da8xx_fb_par {
 #ifdef CONFIG_CPU_FREQ
 	struct notifier_block	freq_transition;
 #endif
-	unsigned int		lcd_fck_rate;
+	unsigned int		lcdc_clk_rate;
 	void (*panel_power_ctrl)(int);
 	u32 pseudo_palette[16];
 	struct fb_videomode	mode;
@@ -693,7 +693,7 @@ static int da8xx_fb_config_clk_divider(struct da8xx_fb_par *par,
 {
 	int ret;
 
-	if (par->lcd_fck_rate != lcdc_clk_rate) {
+	if (par->lcdc_clk_rate != lcdc_clk_rate) {
 		ret = clk_set_rate(par->lcdc_clk, lcdc_clk_rate);
 		if (IS_ERR_VALUE(ret)) {
 			dev_err(par->dev,
@@ -701,7 +701,7 @@ static int da8xx_fb_config_clk_divider(struct da8xx_fb_par *par,
 				lcdc_clk_rate);
 			return ret;
 		}
-		par->lcd_fck_rate = clk_get_rate(par->lcdc_clk);
+		par->lcdc_clk_rate = clk_get_rate(par->lcdc_clk);
 	}
 
 	/* Configure the LCD clock divisor. */
@@ -723,7 +723,7 @@ static unsigned int da8xx_fb_calc_clk_divider(struct da8xx_fb_par *par,
 
 	pixclock = PICOS2KHZ(pixclock) * 1000;
 
-	*lcdc_clk_rate = par->lcd_fck_rate;
+	*lcdc_clk_rate = par->lcdc_clk_rate;
 
 	if (pixclock < (*lcdc_clk_rate / CLK_MAX_DIV)) {
 		*lcdc_clk_rate = clk_round_rate(par->lcdc_clk,
@@ -1031,8 +1031,8 @@ static int lcd_da8xx_cpufreq_transition(struct notifier_block *nb,
 
 	par = container_of(nb, struct da8xx_fb_par, freq_transition);
 	if (val = CPUFREQ_POSTCHANGE) {
-		if (par->lcd_fck_rate != clk_get_rate(par->lcdc_clk)) {
-			par->lcd_fck_rate = clk_get_rate(par->lcdc_clk);
+		if (par->lcdc_clk_rate != clk_get_rate(par->lcdc_clk)) {
+			par->lcdc_clk_rate = clk_get_rate(par->lcdc_clk);
 			lcd_disable_raster(DA8XX_FRAME_WAIT);
 			da8xx_fb_calc_config_clk_divider(par, &par->mode);
 			if (par->blank = FB_BLANK_UNBLANK)
@@ -1327,8 +1327,8 @@ static int fb_probe(struct platform_device *device)
 	struct lcd_ctrl_config *lcd_cfg;
 	struct fb_videomode *lcdc_info;
 	struct fb_info *da8xx_fb_info;
-	struct clk *fb_clk = NULL;
 	struct da8xx_fb_par *par;
+	struct clk *tmp_lcdc_clk;
 	int ret;
 	unsigned long ulcm;
 
@@ -1346,10 +1346,10 @@ static int fb_probe(struct platform_device *device)
 	if (IS_ERR(da8xx_fb_reg_base))
 		return PTR_ERR(da8xx_fb_reg_base);
 
-	fb_clk = devm_clk_get(&device->dev, "fck");
-	if (IS_ERR(fb_clk)) {
+	tmp_lcdc_clk = devm_clk_get(&device->dev, "fck");
+	if (IS_ERR(tmp_lcdc_clk)) {
 		dev_err(&device->dev, "Can not get device clock\n");
-		return PTR_ERR(fb_clk);
+		return PTR_ERR(tmp_lcdc_clk);
 	}
 
 	pm_runtime_enable(&device->dev);
@@ -1389,8 +1389,8 @@ static int fb_probe(struct platform_device *device)
 
 	par = da8xx_fb_info->par;
 	par->dev = &device->dev;
-	par->lcdc_clk = fb_clk;
-	par->lcd_fck_rate = clk_get_rate(fb_clk);
+	par->lcdc_clk = tmp_lcdc_clk;
+	par->lcdc_clk_rate = clk_get_rate(par->lcdc_clk);
 	if (fb_pdata->panel_power_ctrl) {
 		par->panel_power_ctrl = fb_pdata->panel_power_ctrl;
 		par->panel_power_ctrl(1);
-- 
1.7.0.4


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox