* Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Rob Clark @ 2013-08-06 19:06 UTC (permalink / raw)
To: Tom Cooksey
Cc: dri-devel, linux-fbdev, Pawel Moll, linux-arm-kernel, linux-media,
linaro-mm-sig, linux-kernel
In-Reply-To: <52013482.e107c20a.27f9.ffffa718SMTPIN_ADDED_BROKEN@mx.google.com>
On Tue, Aug 6, 2013 at 1:38 PM, Tom Cooksey <tom.cooksey@arm.com> wrote:
>
>> >> ... This is the purpose of the attach step,
>> >> so you know all the devices involved in sharing up front before
>> >> allocating the backing pages. (Or in the worst case, if you have a
>> >> "late attacher" you at least know when no device is doing dma access
>> >> to a buffer and can reallocate and move the buffer.) A long time
>> >> back, I had a patch that added a field or two to 'struct
>> >> device_dma_parameters' so that it could be known if a device
>> >> required contiguous buffers.. looks like that never got merged, so
>> >> I'd need to dig that back up and resend it. But the idea was to
>> >> have the 'struct device' encapsulate all the information that would
>> >> be needed to do-the-right-thing when it comes to placement.
>> >
>> > As I understand it, it's up to the exporting device to allocate the
>> > memory backing the dma_buf buffer. I guess the latest possible point
>> > you can allocate the backing pages is when map_dma_buf is first
>> > called? At that point the exporter can iterate over the current set
>> > of attachments, programmatically determine the all the constraints of
>> > all the attached drivers and attempt to allocate the backing pages
>> > in such a way as to satisfy all those constraints?
>>
>> yes, this is the idea.. possibly some room for some helpers to help
>> out with this, but that is all under the hood from userspace
>> perspective
>>
>> > Didn't you say that programmatically describing device placement
>> > constraints was an unbounded problem? I guess we would have to
>> > accept that it's not possible to describe all possible constraints
>> > and instead find a way to describe the common ones?
>>
>> well, the point I'm trying to make, is by dividing your constraints
>> into two groups, one that impacts and is handled by userspace, and one
>> that is in the kernel (ie. where the pages go), you cut down the
>> number of permutations that the kernel has to care about considerably.
>> And kernel already cares about, for example, what range of addresses
>> that a device can dma to/from. I think really the only thing missing
>> is the max # of sglist entries (contiguous or not)
>
> I think it's more than physically contiguous or not.
>
> For example, it can be more efficient to use large page sizes on
> devices with IOMMUs to reduce TLB traffic. I think the size and even
> the availability of large pages varies between different IOMMUs.
sure.. but I suppose if we can spiff out dma_params to express "I need
contiguous", perhaps we can add some way to express "I prefer
as-contiguous-as-possible".. either way, this is about where the pages
are placed, and not about the layout of pixels within the page, so
should be in kernel. It's something that is missing, but I believe
that it belongs in dma_params and hidden behind dma_alloc_*() for
simple drivers.
> There's also the issue of buffer stride alignment. As I say, if the
> buffer is to be written by a tile-based GPU like Mali, it's more
> efficient if the buffer's stride is aligned to the max AXI bus burst
> length. Though I guess a buffer stride only makes sense as a concept
> when interpreting the data as a linear-layout 2D image, so perhaps
> belongs in user-space along with format negotiation?
>
Yeah.. this isn't about where the pages go, but about the arrangement
within a page.
And, well, except for hw that supports the same tiling (or
compressed-fb) in display+gpu, you probably aren't sharing tiled
buffers.
>
>> > One problem with this is it duplicates a lot of logic in each
>> > driver which can export a dma_buf buffer. Each exporter will need to
>> > do pretty much the same thing: iterate over all the attachments,
>> > determine of all the constraints (assuming that can be done) and
>> > allocate pages such that the lowest-common-denominator is satisfied.
>> >
>> > Perhaps rather than duplicating that logic in every driver, we could
>> > Instead move allocation of the backing pages into dma_buf itself?
>> >
>>
>> I tend to think it is better to add helpers as we see common patterns
>> emerge, which drivers can opt-in to using. I don't think that we
>> should move allocation into dma_buf itself, but it would perhaps be
>> useful to have dma_alloc_*() variants that could allocate for multiple
>> devices.
>
> A helper could work I guess, though I quite like the idea of having
> dma_alloc_*() variants which take a list of devices to allocate memory
> for.
>
>
>> That would help for simple stuff, although I'd suspect
>> eventually a GPU driver will move away from that. (Since you probably
>> want to play tricks w/ pools of pages that are pre-zero'd and in the
>> correct cache state, use spare cycles on the gpu or dma engine to
>> pre-zero uncached pages, and games like that.)
>
> So presumably you're talking about a GPU driver being the exporter
> here? If so, how could the GPU driver do these kind of tricks on
> memory shared with another device?
>
Yes, that is gpu-as-exporter. If someone else is allocating buffers,
it is up to them to do these tricks or not. Probably there is a
pretty good chance that if you aren't a GPU you don't need those sort
of tricks for fast allocation of transient upload buffers, staging
textures, temporary pixmaps, etc. Ie. I don't really think a v4l
camera or video decoder would benefit from that sort of optimization.
>
>> >> > Anyway, assuming user-space can figure out how a buffer should be
>> >> > stored in memory, how does it indicate this to a kernel driver and
>> >> > actually allocate it? Which ioctl on which device does user-space
>> >> > call, with what parameters? Are you suggesting using something
>> >> > like ION which exposes the low-level details of how buffers are
>> > >> laid out in physical memory to userspace? If not, what?
>> >> >
>> >>
>> >> no, userspace should not need to know this. And having a central
>> >> driver that knows this for all the other drivers in the system
>> >> doesn't really solve anything and isn't really scalable. At best
>> >> you might want, in some cases, a flag you can pass when allocating.
>> >> For example, some of the drivers have a 'SCANOUT' flag that can be
>> >> passed when allocating a GEM buffer, as a hint to the kernel that
>> >> 'if this hw requires contig memory for scanout, allocate this
>> >> buffer contig'. But really, when it comes to sharing buffers
>> >> between devices, we want this sort of information in dev->dma_params
>> >> of the importing device(s).
>> >
>> > If you had a single driver which knew the constraints of all devices
>> > on that particular SoC and the interface allowed user-space to
>> > specify which devices a buffer is intended to be used with, I guess
>> > it could pretty trivially allocate pages which satisfy those
> constraints?
>>
>> keep in mind, even a number of SoC's come with pcie these days. You
>> already have things like
>>
>> https://developer.nvidia.com/content/kayla-platform
>>
>> You probably want to get out of the SoC mindset, otherwise you are
>> going to make bad assumptions that come back to bite you later on.
>
> Sure - there are always going to be PC-like devices where the
> hardware configuration isn't fixed like it is on a traditional SoC.
> But I'd rather have a simple solution which works on traditional SoCs
> than no solution at all. Today our solution is to over-load the dumb
> buffer alloc functions of the display's DRM driver - For now I'm just
> looking for the next step up from that! ;-)
>
True.. the original intention, which is perhaps a bit desktop-centric,
really was for there to be a userspace component talking to the drm
driver for allocation, ie. xf86-video-foo and/or
src/gallium/drivers/foo (for example) ;-)
Which means for x11 having a SoC vendor specific xf86-video-foo for
x11.. or vendor specific gbm implementation for wayland. (Although
at least in the latter case it is a pretty small piece of code.) But
that is probably what you are trying to avoid.
At any rate, for both xorg and wayland/gbm, you know when a buffer is
going to be a scanout buffer. What I'd recommend is define a small
userspace API that your customers (the SoC vendors) implement to
allocate a scanout buffer and hand you back a dmabuf fd. That could
be used both for x11 and for gbm. Inputs should be requested
width/height and format. And outputs pitch plus dmabuf fd.
(Actually you might even just want to use gbm as your starting point.
You could probably just use gbm from xf86-video-armsoc for allocation,
to have one thing that works for both wayland and x11. Scanout and
cursor buffers should go to vendor/SoC specific fxn, rest can be
allocated from mali kernel driver.)
>
>> > wouldn't need a way to programmatically describe the constraints
>> > either: As you say, if userspace sets the "SCANOUT" flag, it would
>> > just "know" that on this SoC, that buffer needs to be physically
>> > contiguous for example.
>>
>> not really.. it just knows it wants to scanout the buffer, and tells
>> this as a hint to the kernel.
>>
>> For example, on omapdrm, the SCANOUT flag does nothing on omap4+
>> (where phys contig is not required for scanout), but causes CMA
>> (dma_alloc_*()) to be used on omap3. Userspace doesn't care. It just
>> knows that it wants to be able to scanout that particular buffer.
>
> I think that's the idea? The omap3's allocator driver would use
> contiguous memory when it detects the SCANOUT flag whereas the omap4
> allocator driver wouldn't have to. No complex negotiation of
> constraints - it just "knows".
>
well, it is same allocating driver in both cases (although maybe that
is unimportant). The "it" that just knows it wants to scanout is
userspace. The "it" that just knows that scanout translates to
contiguous (or not) is the kernel. Perhaps we are saying the same
thing ;-)
BR,
-R
>
> Cheers,
>
> Tom
>
>
>
>
^ permalink raw reply
* kernel mode splash screen
From: Cliff Brake @ 2013-08-06 16:15 UTC (permalink / raw)
To: linux-fbdev
Hello,
We are working on an embedded system where we need immediate feedback
on the screen in response to power button events, etc. We also need
to be able to display the splash screen whenever kernel is running.
User space splash screens get killed too early on shutdown, etc.
Does anyone know of a kernel mode splash screen that will:
1) display an image
2) display status messages
3) display progress bar (nice to have, but not required)
4) can be accessed from kernel and user space
5) when enabled, will disable writes from user space
6) FB based
Thanks,
Cliff
--
========http://bec-systems.com
^ permalink raw reply
* Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Daniel Vetter @ 2013-08-06 15:28 UTC (permalink / raw)
To: Lucas Stach
Cc: Tom Cooksey, Linux Fbdev development list, Pawel Moll,
Linux Kernel Mailing List, dri-devel,
linaro-mm-sig@lists.linaro.org,
linux-arm-kernel@lists.infradead.org, linux-media@vger.kernel.org
In-Reply-To: <1375791502.4276.38.camel@weser.hi.pengutronix.de>
On Tue, Aug 6, 2013 at 2:18 PM, Lucas Stach <l.stach@pengutronix.de> wrote:
> I strongly disagree with exposing low-level hardware details like tiling
> to userspace. If we have to do the negotiation of those things in
> userspace we will end up with having to pipe those information through
> things like the wayland protocol. I don't see how this could ever be
> considered a good idea.
>
> I would rather see kernel drivers negotiating those things at dmabuf
> attach time in way invisible to userspace. I agree that this negotiation
> thing isn't easy to get right for the plethora of different hardware
> constraints we see today, but I would rather see this in-kernel, where
> we have the chance to fix things up if needed, than in a fixed userspace
> interface.
The problem with negotiating tiling in the kernel for at least
drm/i915 is that userspace needs to know the tiling and for allocating
actually be in charge of it. The approach used by prime thus far is to
simply use linear buffers between different gpus. If you want
something better I guess you need to have a fairly tightly integrate
userspace driver, so I don't see how passing around the tiling
information should be an issue for those cases.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
^ permalink raw reply
* Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Rob Clark @ 2013-08-06 14:59 UTC (permalink / raw)
To: Lucas Stach
Cc: Tom Cooksey, linux-fbdev, Pawel Moll, linux-kernel, dri-devel,
linaro-mm-sig, linux-arm-kernel, linux-media
In-Reply-To: <1375799770.4276.47.camel@weser.hi.pengutronix.de>
On Tue, Aug 6, 2013 at 10:36 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
> Am Dienstag, den 06.08.2013, 10:14 -0400 schrieb Rob Clark:
>> On Tue, Aug 6, 2013 at 8:18 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
>> > Am Dienstag, den 06.08.2013, 12:31 +0100 schrieb Tom Cooksey:
>> >> Hi Rob,
>> >>
>> >> +lkml
>> >>
>> >> > >> On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey <tom.cooksey@arm.com>
>> >> > >> wrote:
>> >> > >> >> > * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to
>> >> > >> >> > also allocate buffers for the GPU. Still not sure how to
>> >> > >> >> > resolve this as we don't use DRM for our GPU driver.
>> >> > >> >>
>> >> > >> >> any thoughts/plans about a DRM GPU driver? Ideally long term
>> >> > >> >> (esp. once the dma-fence stuff is in place), we'd have
>> >> > >> >> gpu-specific drm (gpu-only, no kms) driver, and SoC/display
>> >> > >> >> specific drm/kms driver, using prime/dmabuf to share between
>> >> > >> >> the two.
>> >> > >> >
>> >> > >> > The "extra" buffers we were allocating from armsoc DDX were really
>> >> > >> > being allocated through DRM/GEM so we could get an flink name
>> >> > >> > for them and pass a reference to them back to our GPU driver on
>> >> > >> > the client side. If it weren't for our need to access those
>> >> > >> > extra off-screen buffers with the GPU we wouldn't need to
>> >> > >> > allocate them with DRM at all. So, given they are really "GPU"
>> >> > >> > buffers, it does absolutely make sense to allocate them in a
>> >> > >> > different driver to the display driver.
>> >> > >> >
>> >> > >> > However, to avoid unnecessary memcpys & related cache
>> >> > >> > maintenance ops, we'd also like the GPU to render into buffers
>> >> > >> > which are scanned out by the display controller. So let's say
>> >> > >> > we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
>> >> > >> > out buffers with the display's DRM driver but a custom ioctl
>> >> > >> > on the GPU's DRM driver to allocate non scanout, off-screen
>> >> > >> > buffers. Sounds great, but I don't think that really works
>> >> > >> > with DRI2. If we used two drivers to allocate buffers, which
>> >> > >> > of those drivers do we return in DRI2ConnectReply? Even if we
>> >> > >> > solve that somehow, GEM flink names are name-spaced to a
>> >> > >> > single device node (AFAIK). So when we do a DRI2GetBuffers,
>> >> > >> > how does the EGL in the client know which DRM device owns GEM
>> >> > >> > flink name "1234"? We'd need some pretty dirty hacks.
>> >> > >>
>> >> > >> You would return the name of the display driver allocating the
>> >> > >> buffers. On the client side you can use generic ioctls to go from
>> >> > >> flink -> handle -> dmabuf. So the client side would end up opening
>> >> > >> both the display drm device and the gpu, but without needing to know
>> >> > >> too much about the display.
>> >> > >
>> >> > > I think the bit I was missing was that a GEM bo for a buffer imported
>> >> > > using dma_buf/PRIME can still be flink'd. So the display controller's
>> >> > > DRM driver allocates scan-out buffers via the DUMB buffer allocate
>> >> > > ioctl. Those scan-out buffers than then be exported from the
>> >> > > dispaly's DRM driver and imported into the GPU's DRM driver using
>> >> > > PRIME. Once imported into the GPU's driver, we can use flink to get a
>> >> > > name for that buffer within the GPU DRM driver's name-space to return
>> >> > > to the DRI2 client. That same namespace is also what DRI2 back-
>> >> > > buffers are allocated from, so I think that could work... Except...
>> >> >
>> >> > (and.. the general direction is that things will move more to just use
>> >> > dmabuf directly, ie. wayland or dri3)
>> >>
>> >> I agree, DRI2 is the only reason why we need a system-wide ID. I also
>> >> prefer buffers to be passed around by dma_buf fd, but we still need to
>> >> support DRI2 and will do for some time I expect.
>> >>
>> >>
>> >>
>> >> > >> > Anyway, that latter case also gets quite difficult. The "GPU"
>> >> > >> > DRM driver would need to know the constraints of the display
>> >> > >> > controller when allocating buffers intended to be scanned out.
>> >> > >> > For example, pl111 typically isn't behind an IOMMU and so
>> >> > >> > requires physically contiguous memory. We'd have to teach the
>> >> > >> > GPU's DRM driver about the constraints of the display HW. Not
>> >> > >> > exactly a clean driver model. :-(
>> >> > >> >
>> >> > >> > I'm still a little stuck on how to proceed, so any ideas
>> >> > >> > would greatly appreciated! My current train of thought is
>> >> > >> > having a kind of SoC-specific DRM driver which allocates
>> >> > >> > buffers for both display and GPU within a single GEM
>> >> > >> > namespace. That SoC-specific DRM driver could then know the
>> >> > >> > constraints of both the GPU and the display HW. We could then
>> >> > >> > use PRIME to export buffers allocated with the SoC DRM driver
>> >> > >> > and import them into the GPU and/or display DRM driver.
>> >> > >>
>> >> > >> Usually if the display drm driver is allocating the buffers that
>> >> > >> might be scanned out, it just needs to have minimal knowledge of
>> >> > >> the GPU (pitch alignment constraints). I don't think we need a
>> >> > >> 3rd device just to allocate buffers.
>> >> > >
>> >> > > While Mali can render to pretty much any buffer, there is a mild
>> >> > > performance improvement to be had if the buffer stride is aligned to
>> >> > > the AXI bus's max burst length when drawing to the buffer.
>> >> >
>> >> > I suspect the display controllers might frequently benefit if the
>> >> > pitch is aligned to AXI burst length too..
>> >>
>> >> If the display controller is going to be reading from linear memory
>> >> I don't think it will make much difference - you'll just get an extra
>> >> 1-2 bus transactions per scanline. With a tile-based GPU like Mali,
>> >> you get those extra transactions per _tile_ scan-line and as such,
>> >> the overhead is more pronounced.
>> >>
>> >>
>> >>
>> >> > > So in some respects, there is a constraint on how buffers which will
>> >> > > be drawn to using the GPU are allocated. I don't really like the idea
>> >> > > of teaching the display controller DRM driver about the GPU buffer
>> >> > > constraints, even if they are fairly trivial like this. If the same
>> >> > > display HW IP is being used on several SoCs, it seems wrong somehow
>> >> > > to enforce those GPU constraints if some of those SoCs don't have a
>> >> > > GPU.
>> >> >
>> >> > Well, I suppose you could get min_pitch_alignment from devicetree, or
>> >> > something like this..
>> >> >
>> >> > In the end, the easy solution is just to make the display allocate to
>> >> > the worst-case pitch alignment. In the early days of dma-buf
>> >> > discussions, we kicked around the idea of negotiating or
>> >> > programatically describing the constraints, but that didn't really
>> >> > seem like a bounded problem.
>> >>
>> >> Yeah - I was around for some of those discussions and agree it's not
>> >> really an easy problem to solve.
>> >>
>> >>
>> >>
>> >> > > We may also then have additional constraints when sharing buffers
>> >> > > between the display HW and video decode or even camera ISP HW.
>> >> > > Programmatically describing buffer allocation constraints is very
>> >> > > difficult and I'm not sure you can actually do it - there's some
>> >> > > pretty complex constraints out there! E.g. I believe there's a
>> >> > > platform where Y and UV planes of the reference frame need to be in
>> >> > > separate DRAM banks for real-time 1080p decode, or something like
>> >> > > that?
>> >> >
>> >> > yes, this was discussed. This is different from pitch/format/size
>> >> > constraints.. it is really just a placement constraint (ie. where do
>> >> > the physical pages go). IIRC the conclusion was to use a dummy
>> >> > devices with it's own CMA pool for attaching the Y vs UV buffers.
>> >> >
>> >> > > Anyway, I guess my point is that even if we solve how to allocate
>> >> > > buffers which will be shared between the GPU and display HW such that
>> >> > > both sets of constraints are satisfied, that may not be the end of
>> >> > > the story.
>> >> > >
>> >> >
>> >> > that was part of the reason to punt this problem to userspace ;-)
>> >> >
>> >> > In practice, the kernel drivers doesn't usually know too much about
>> >> > the dimensions/format/etc.. that is really userspace level knowledge.
>> >> > There are a few exceptions when the kernel needs to know how to setup
>> >> > GTT/etc for tiled buffers, but normally this sort of information is up
>> >> > at the next level up (userspace, and drm_framebuffer in case of
>> >> > scanout). Userspace media frameworks like GStreamer already have a
>> >> > concept of format/caps negotiation. For non-display<->gpu sharing, I
>> >> > think this is probably where this sort of constraint negotiation
>> >> > should be handled.
>> >>
>> >> I agree that user-space will know which devices will access the buffer
>> >> and thus can figure out at least a common pixel format. Though I'm not
>> >> so sure userspace can figure out more low-level details like alignment
>> >> and placement in physical memory, etc.
>> >>
>> >> Anyway, assuming user-space can figure out how a buffer should be
>> >> stored in memory, how does it indicate this to a kernel driver and
>> >> actually allocate it? Which ioctl on which device does user-space
>> >> call, with what parameters? Are you suggesting using something like
>> >> ION which exposes the low-level details of how buffers are laid out in
>> >> physical memory to userspace? If not, what?
>> >>
>> >
>> > I strongly disagree with exposing low-level hardware details like tiling
>> > to userspace. If we have to do the negotiation of those things in
>> > userspace we will end up with having to pipe those information through
>> > things like the wayland protocol. I don't see how this could ever be
>> > considered a good idea.
>>
>> well, unless userspace mmap's via a de-tiling gart type thing, I don't
>> think tiling can be invisible to userspace.
>>
> Why is mmap considered to be such a strong use-case for DMABUFs? After
> all we are trying to _avoid_ mmapping shared buffers where ever
> possible.
I don't know if I'd call it a strong use-case, as much as a good
worst-case to consider
>> But if two GPU's have some overlap in supportable tiled formats, and
>> you have one gpu doing app and one doing compositor, and you want to
>> use tiling for the shared buffers, you need something in the wayland
>> protocol to figure out what the common supported formats are between
>> the two sides. I suppose it shouldn't be too hard to add a
>> standardized (cross-driver) format-negotiation protocol, and in the
>> absence of that fallback to non tiled format for shared buffers.
>>
> I don't see how tiling format negotiation would be easier in userspace
> than in the kernel. If we can come up with a scheme for that, we can as
> well do it in the kernel.
well.. I guess I don't see how it would be easier in the kernel than
in userspace ;-)
But if you can come up with a good way to add it to dev->dma_params
that everyone can agree on, I don't mind. I suppose I'm starting with
the assumption that it will be difficult/impossible to get everyone to
agree on this, but I don't mind being proved wrong.
>> > I would rather see kernel drivers negotiating those things at dmabuf
>> > attach time in way invisible to userspace. I agree that this negotiation
>> > thing isn't easy to get right for the plethora of different hardware
>> > constraints we see today, but I would rather see this in-kernel, where
>> > we have the chance to fix things up if needed, than in a fixed userspace
>> > interface.
>>
>> Well, if you can think of a sane way to add that to dev->dma_params,
>> and if it isn't visible if userspace mmap's a buffer, then we could
>> handle that in the kernel. But I don't think that will be the case.
>>
> I'm sure we can come up with something sane to put it in there. The
> userspace mmap thing is a bit complicated to deal with, but I have the
> feeling that going through a slow path if you really need mmap is a
> reasonable thing to do.
> For example we could just make mmap some form of attach that forces
> linear layout if the exporter isn't able to give userspace a linear
> mapping from a tiled buffer by using GART or VM.
I guess I don't object to mmap being a slow path (as long is it
doesn't end up requiring a slow path on hw where this otherwise
wouldn't be needed).
BR,
-R
> Regards,
> Lucas
> --
> Pengutronix e.K. | Lucas Stach |
> Industrial Linux Solutions | http://www.pengutronix.de/ |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
>
^ permalink raw reply
* Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Rob Clark @ 2013-08-06 14:40 UTC (permalink / raw)
To: Tom Cooksey
Cc: dri-devel, linux-fbdev, Pawel Moll, linux-arm-kernel, linux-media,
linaro-mm-sig, linux-kernel
In-Reply-To: <52010257.245fc20a.6ff8.1cfdSMTPIN_ADDED_BROKEN@mx.google.com>
On Tue, Aug 6, 2013 at 10:03 AM, Tom Cooksey <tom.cooksey@arm.com> wrote:
> Hi Rob,
>
>> >> > We may also then have additional constraints when sharing buffers
>> >> > between the display HW and video decode or even camera ISP HW.
>> >> > Programmatically describing buffer allocation constraints is very
>> >> > difficult and I'm not sure you can actually do it - there's some
>> >> > pretty complex constraints out there! E.g. I believe there's a
>> >> > platform where Y and UV planes of the reference frame need to be
>> >> > in separate DRAM banks for real-time 1080p decode, or something
>> >> > like that?
>> >>
>> >> yes, this was discussed. This is different from pitch/format/size
>> >> constraints.. it is really just a placement constraint (ie. where
>> >> do the physical pages go). IIRC the conclusion was to use a dummy
>> >> devices with it's own CMA pool for attaching the Y vs UV buffers.
>> >>
>> >> > Anyway, I guess my point is that even if we solve how to allocate
>> >> > buffers which will be shared between the GPU and display HW such
>> >> > that both sets of constraints are satisfied, that may not be the
>> >> > end of the story.
>> >> >
>> >>
>> >> that was part of the reason to punt this problem to userspace ;-)
>> >>
>> >> In practice, the kernel drivers doesn't usually know too much about
>> >> the dimensions/format/etc.. that is really userspace level
>> >> knowledge. There are a few exceptions when the kernel needs to know
>> >> how to setup GTT/etc for tiled buffers, but normally this sort of
>> >> information is up at the next level up (userspace, and
>> >> drm_framebuffer in case of scanout). Userspace media frameworks
>> >> like GStreamer already have a concept of format/caps negotiation.
>> >> For non-display<->gpu sharing, I think this is probably where this
>> >> sort of constraint negotiation should be handled.
>> >
>> > I agree that user-space will know which devices will access the
>> > buffer and thus can figure out at least a common pixel format.
>> > Though I'm not so sure userspace can figure out more low-level
>> > details like alignment and placement in physical memory, etc.
>> >
>>
>> well, let's divide things up into two categories:
>>
>> 1) the arrangement and format of pixels.. ie. what userspace would
>> need to know if it mmap's a buffer. This includes pixel format,
>> stride, etc. This should be negotiated in userspace, it would be
>> crazy to try to do this in the kernel.
>
> Absolutely. Pixel format has to be negotiated by user-space as in
> most cases, user-space can map the buffer and thus will need to
> know how to interpret the data.
>
>
>
>> 2) the physical placement of the pages. Ie. whether it is contiguous
>> or not. Which bank the pages in the buffer are placed in, etc. This
>> is not visible to userspace.
>
> Seems sensible to me.
>
>
>> ... This is the purpose of the attach step,
>> so you know all the devices involved in sharing up front before
>> allocating the backing pages. (Or in the worst case, if you have a
>> "late attacher" you at least know when no device is doing dma access
>> to a buffer and can reallocate and move the buffer.) A long time
>> back, I had a patch that added a field or two to 'struct
>> device_dma_parameters' so that it could be known if a device required
>> contiguous buffers.. looks like that never got merged, so I'd need to
>> dig that back up and resend it. But the idea was to have the 'struct
>> device' encapsulate all the information that would be needed to
>> do-the-right-thing when it comes to placement.
>
> As I understand it, it's up to the exporting device to allocate the
> memory backing the dma_buf buffer. I guess the latest possible point
> you can allocate the backing pages is when map_dma_buf is first
> called? At that point the exporter can iterate over the current set
> of attachments, programmatically determine the all the constraints of
> all the attached drivers and attempt to allocate the backing pages
> in such a way as to satisfy all those constraints?
yes, this is the idea.. possibly some room for some helpers to help
out with this, but that is all under the hood from userspace
perspective
> Didn't you say that programmatically describing device placement
> constraints was an unbounded problem? I guess we would have to
> accept that it's not possible to describe all possible constraints
> and instead find a way to describe the common ones?
well, the point I'm trying to make, is by dividing your constraints
into two groups, one that impacts and is handled by userspace, and one
that is in the kernel (ie. where the pages go), you cut down the
number of permutations that the kernel has to care about considerably.
And kernel already cares about, for example, what range of addresses
that a device can dma to/from. I think really the only thing missing
is the max # of sglist entries (contiguous or not)
> One problem with this is it duplicates a lot of logic in each
> driver which can export a dma_buf buffer. Each exporter will need to
> do pretty much the same thing: iterate over all the attachments,
> determine of all the constraints (assuming that can be done) and
> allocate pages such that the lowest-common-denominator is satisfied.
>
> Perhaps rather than duplicating that logic in every driver, we could
> Instead move allocation of the backing pages into dma_buf itself?
>
I tend to think it is better to add helpers as we see common patterns
emerge, which drivers can opt-in to using. I don't think that we
should move allocation into dma_buf itself, but it would perhaps be
useful to have dma_alloc_*() variants that could allocate for multiple
devices. That would help for simple stuff, although I'd suspect
eventually a GPU driver will move away from that. (Since you probably
want to play tricks w/ pools of pages that are pre-zero'd and in the
correct cache state, use spare cycles on the gpu or dma engine to
pre-zero uncached pages, and games like that.)
>
>> > Anyway, assuming user-space can figure out how a buffer should be
>> > stored in memory, how does it indicate this to a kernel driver and
>> > actually allocate it? Which ioctl on which device does user-space
>> > call, with what parameters? Are you suggesting using something like
>> > ION which exposes the low-level details of how buffers are laid out
>> in
>> > physical memory to userspace? If not, what?
>>
>> no, userspace should not need to know this. And having a central
>> driver that knows this for all the other drivers in the system doesn't
>> really solve anything and isn't really scalable. At best you might
>> want, in some cases, a flag you can pass when allocating. For
>> example, some of the drivers have a 'SCANOUT' flag that can be passed
>> when allocating a GEM buffer, as a hint to the kernel that 'if this hw
>> requires contig memory for scanout, allocate this buffer contig'. But
>> really, when it comes to sharing buffers between devices, we want this
>> sort of information in dev->dma_params of the importing device(s).
>
> If you had a single driver which knew the constraints of all devices
> on that particular SoC and the interface allowed user-space to specify
> which devices a buffer is intended to be used with, I guess it could
> pretty trivially allocate pages which satisfy those constraints? It
keep in mind, even a number of SoC's come with pcie these days. You
already have things like
https://developer.nvidia.com/content/kayla-platform
You probably want to get out of the SoC mindset, otherwise you are
going to make bad assumptions that come back to bite you later on.
> wouldn't need a way to programmatically describe the constraints
> either: As you say, if userspace sets the "SCANOUT" flag, it would
> just "know" that on this SoC, that buffer needs to be physically
> contiguous for example.
not really.. it just knows it wants to scanout the buffer, and tells
this as a hint to the kernel.
For example, on omapdrm, the SCANOUT flag does nothing on omap4+
(where phys contig is not required for scanout), but causes CMA
(dma_alloc_*()) to be used on omap3. Userspace doesn't care. It just
knows that it wants to be able to scanout that particular buffer.
> Though It would effectively mean you'd need an "allocation" driver per
> SoC, which as you say may not be scalable?
Right.. and not actually even possible in the general sense (see SoC +
external pcie gfx card)
BR,
-R
>
>
> Cheers,
>
> Tom
>
>
>
>
>
^ permalink raw reply
* Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Lucas Stach @ 2013-08-06 14:36 UTC (permalink / raw)
To: Rob Clark
Cc: Tom Cooksey, linux-fbdev, Pawel Moll, linux-kernel, dri-devel,
linaro-mm-sig, linux-arm-kernel, linux-media
In-Reply-To: <CAF6AEGvgqsS4Ht6xuACbywTorRiCwHcUaYPji9dRP6AM3Ag-ww@mail.gmail.com>
Am Dienstag, den 06.08.2013, 10:14 -0400 schrieb Rob Clark:
> On Tue, Aug 6, 2013 at 8:18 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
> > Am Dienstag, den 06.08.2013, 12:31 +0100 schrieb Tom Cooksey:
> >> Hi Rob,
> >>
> >> +lkml
> >>
> >> > >> On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey <tom.cooksey@arm.com>
> >> > >> wrote:
> >> > >> >> > * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to
> >> > >> >> > also allocate buffers for the GPU. Still not sure how to
> >> > >> >> > resolve this as we don't use DRM for our GPU driver.
> >> > >> >>
> >> > >> >> any thoughts/plans about a DRM GPU driver? Ideally long term
> >> > >> >> (esp. once the dma-fence stuff is in place), we'd have
> >> > >> >> gpu-specific drm (gpu-only, no kms) driver, and SoC/display
> >> > >> >> specific drm/kms driver, using prime/dmabuf to share between
> >> > >> >> the two.
> >> > >> >
> >> > >> > The "extra" buffers we were allocating from armsoc DDX were really
> >> > >> > being allocated through DRM/GEM so we could get an flink name
> >> > >> > for them and pass a reference to them back to our GPU driver on
> >> > >> > the client side. If it weren't for our need to access those
> >> > >> > extra off-screen buffers with the GPU we wouldn't need to
> >> > >> > allocate them with DRM at all. So, given they are really "GPU"
> >> > >> > buffers, it does absolutely make sense to allocate them in a
> >> > >> > different driver to the display driver.
> >> > >> >
> >> > >> > However, to avoid unnecessary memcpys & related cache
> >> > >> > maintenance ops, we'd also like the GPU to render into buffers
> >> > >> > which are scanned out by the display controller. So let's say
> >> > >> > we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
> >> > >> > out buffers with the display's DRM driver but a custom ioctl
> >> > >> > on the GPU's DRM driver to allocate non scanout, off-screen
> >> > >> > buffers. Sounds great, but I don't think that really works
> >> > >> > with DRI2. If we used two drivers to allocate buffers, which
> >> > >> > of those drivers do we return in DRI2ConnectReply? Even if we
> >> > >> > solve that somehow, GEM flink names are name-spaced to a
> >> > >> > single device node (AFAIK). So when we do a DRI2GetBuffers,
> >> > >> > how does the EGL in the client know which DRM device owns GEM
> >> > >> > flink name "1234"? We'd need some pretty dirty hacks.
> >> > >>
> >> > >> You would return the name of the display driver allocating the
> >> > >> buffers. On the client side you can use generic ioctls to go from
> >> > >> flink -> handle -> dmabuf. So the client side would end up opening
> >> > >> both the display drm device and the gpu, but without needing to know
> >> > >> too much about the display.
> >> > >
> >> > > I think the bit I was missing was that a GEM bo for a buffer imported
> >> > > using dma_buf/PRIME can still be flink'd. So the display controller's
> >> > > DRM driver allocates scan-out buffers via the DUMB buffer allocate
> >> > > ioctl. Those scan-out buffers than then be exported from the
> >> > > dispaly's DRM driver and imported into the GPU's DRM driver using
> >> > > PRIME. Once imported into the GPU's driver, we can use flink to get a
> >> > > name for that buffer within the GPU DRM driver's name-space to return
> >> > > to the DRI2 client. That same namespace is also what DRI2 back-
> >> > > buffers are allocated from, so I think that could work... Except...
> >> >
> >> > (and.. the general direction is that things will move more to just use
> >> > dmabuf directly, ie. wayland or dri3)
> >>
> >> I agree, DRI2 is the only reason why we need a system-wide ID. I also
> >> prefer buffers to be passed around by dma_buf fd, but we still need to
> >> support DRI2 and will do for some time I expect.
> >>
> >>
> >>
> >> > >> > Anyway, that latter case also gets quite difficult. The "GPU"
> >> > >> > DRM driver would need to know the constraints of the display
> >> > >> > controller when allocating buffers intended to be scanned out.
> >> > >> > For example, pl111 typically isn't behind an IOMMU and so
> >> > >> > requires physically contiguous memory. We'd have to teach the
> >> > >> > GPU's DRM driver about the constraints of the display HW. Not
> >> > >> > exactly a clean driver model. :-(
> >> > >> >
> >> > >> > I'm still a little stuck on how to proceed, so any ideas
> >> > >> > would greatly appreciated! My current train of thought is
> >> > >> > having a kind of SoC-specific DRM driver which allocates
> >> > >> > buffers for both display and GPU within a single GEM
> >> > >> > namespace. That SoC-specific DRM driver could then know the
> >> > >> > constraints of both the GPU and the display HW. We could then
> >> > >> > use PRIME to export buffers allocated with the SoC DRM driver
> >> > >> > and import them into the GPU and/or display DRM driver.
> >> > >>
> >> > >> Usually if the display drm driver is allocating the buffers that
> >> > >> might be scanned out, it just needs to have minimal knowledge of
> >> > >> the GPU (pitch alignment constraints). I don't think we need a
> >> > >> 3rd device just to allocate buffers.
> >> > >
> >> > > While Mali can render to pretty much any buffer, there is a mild
> >> > > performance improvement to be had if the buffer stride is aligned to
> >> > > the AXI bus's max burst length when drawing to the buffer.
> >> >
> >> > I suspect the display controllers might frequently benefit if the
> >> > pitch is aligned to AXI burst length too..
> >>
> >> If the display controller is going to be reading from linear memory
> >> I don't think it will make much difference - you'll just get an extra
> >> 1-2 bus transactions per scanline. With a tile-based GPU like Mali,
> >> you get those extra transactions per _tile_ scan-line and as such,
> >> the overhead is more pronounced.
> >>
> >>
> >>
> >> > > So in some respects, there is a constraint on how buffers which will
> >> > > be drawn to using the GPU are allocated. I don't really like the idea
> >> > > of teaching the display controller DRM driver about the GPU buffer
> >> > > constraints, even if they are fairly trivial like this. If the same
> >> > > display HW IP is being used on several SoCs, it seems wrong somehow
> >> > > to enforce those GPU constraints if some of those SoCs don't have a
> >> > > GPU.
> >> >
> >> > Well, I suppose you could get min_pitch_alignment from devicetree, or
> >> > something like this..
> >> >
> >> > In the end, the easy solution is just to make the display allocate to
> >> > the worst-case pitch alignment. In the early days of dma-buf
> >> > discussions, we kicked around the idea of negotiating or
> >> > programatically describing the constraints, but that didn't really
> >> > seem like a bounded problem.
> >>
> >> Yeah - I was around for some of those discussions and agree it's not
> >> really an easy problem to solve.
> >>
> >>
> >>
> >> > > We may also then have additional constraints when sharing buffers
> >> > > between the display HW and video decode or even camera ISP HW.
> >> > > Programmatically describing buffer allocation constraints is very
> >> > > difficult and I'm not sure you can actually do it - there's some
> >> > > pretty complex constraints out there! E.g. I believe there's a
> >> > > platform where Y and UV planes of the reference frame need to be in
> >> > > separate DRAM banks for real-time 1080p decode, or something like
> >> > > that?
> >> >
> >> > yes, this was discussed. This is different from pitch/format/size
> >> > constraints.. it is really just a placement constraint (ie. where do
> >> > the physical pages go). IIRC the conclusion was to use a dummy
> >> > devices with it's own CMA pool for attaching the Y vs UV buffers.
> >> >
> >> > > Anyway, I guess my point is that even if we solve how to allocate
> >> > > buffers which will be shared between the GPU and display HW such that
> >> > > both sets of constraints are satisfied, that may not be the end of
> >> > > the story.
> >> > >
> >> >
> >> > that was part of the reason to punt this problem to userspace ;-)
> >> >
> >> > In practice, the kernel drivers doesn't usually know too much about
> >> > the dimensions/format/etc.. that is really userspace level knowledge.
> >> > There are a few exceptions when the kernel needs to know how to setup
> >> > GTT/etc for tiled buffers, but normally this sort of information is up
> >> > at the next level up (userspace, and drm_framebuffer in case of
> >> > scanout). Userspace media frameworks like GStreamer already have a
> >> > concept of format/caps negotiation. For non-display<->gpu sharing, I
> >> > think this is probably where this sort of constraint negotiation
> >> > should be handled.
> >>
> >> I agree that user-space will know which devices will access the buffer
> >> and thus can figure out at least a common pixel format. Though I'm not
> >> so sure userspace can figure out more low-level details like alignment
> >> and placement in physical memory, etc.
> >>
> >> Anyway, assuming user-space can figure out how a buffer should be
> >> stored in memory, how does it indicate this to a kernel driver and
> >> actually allocate it? Which ioctl on which device does user-space
> >> call, with what parameters? Are you suggesting using something like
> >> ION which exposes the low-level details of how buffers are laid out in
> >> physical memory to userspace? If not, what?
> >>
> >
> > I strongly disagree with exposing low-level hardware details like tiling
> > to userspace. If we have to do the negotiation of those things in
> > userspace we will end up with having to pipe those information through
> > things like the wayland protocol. I don't see how this could ever be
> > considered a good idea.
>
> well, unless userspace mmap's via a de-tiling gart type thing, I don't
> think tiling can be invisible to userspace.
>
Why is mmap considered to be such a strong use-case for DMABUFs? After
all we are trying to _avoid_ mmapping shared buffers where ever
possible.
> But if two GPU's have some overlap in supportable tiled formats, and
> you have one gpu doing app and one doing compositor, and you want to
> use tiling for the shared buffers, you need something in the wayland
> protocol to figure out what the common supported formats are between
> the two sides. I suppose it shouldn't be too hard to add a
> standardized (cross-driver) format-negotiation protocol, and in the
> absence of that fallback to non tiled format for shared buffers.
>
I don't see how tiling format negotiation would be easier in userspace
than in the kernel. If we can come up with a scheme for that, we can as
well do it in the kernel.
> > I would rather see kernel drivers negotiating those things at dmabuf
> > attach time in way invisible to userspace. I agree that this negotiation
> > thing isn't easy to get right for the plethora of different hardware
> > constraints we see today, but I would rather see this in-kernel, where
> > we have the chance to fix things up if needed, than in a fixed userspace
> > interface.
>
> Well, if you can think of a sane way to add that to dev->dma_params,
> and if it isn't visible if userspace mmap's a buffer, then we could
> handle that in the kernel. But I don't think that will be the case.
>
I'm sure we can come up with something sane to put it in there. The
userspace mmap thing is a bit complicated to deal with, but I have the
feeling that going through a slow path if you really need mmap is a
reasonable thing to do.
For example we could just make mmap some form of attach that forces
linear layout if the exporter isn't able to give userspace a linear
mapping from a tiled buffer by using GART or VM.
Regards,
Lucas
--
Pengutronix e.K. | Lucas Stach |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
^ permalink raw reply
* Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Rob Clark @ 2013-08-06 14:14 UTC (permalink / raw)
To: Lucas Stach
Cc: Tom Cooksey, linux-fbdev, Pawel Moll, linux-kernel, dri-devel,
linaro-mm-sig, linux-arm-kernel, linux-media
In-Reply-To: <1375791502.4276.38.camel@weser.hi.pengutronix.de>
On Tue, Aug 6, 2013 at 8:18 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
> Am Dienstag, den 06.08.2013, 12:31 +0100 schrieb Tom Cooksey:
>> Hi Rob,
>>
>> +lkml
>>
>> > >> On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey <tom.cooksey@arm.com>
>> > >> wrote:
>> > >> >> > * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to
>> > >> >> > also allocate buffers for the GPU. Still not sure how to
>> > >> >> > resolve this as we don't use DRM for our GPU driver.
>> > >> >>
>> > >> >> any thoughts/plans about a DRM GPU driver? Ideally long term
>> > >> >> (esp. once the dma-fence stuff is in place), we'd have
>> > >> >> gpu-specific drm (gpu-only, no kms) driver, and SoC/display
>> > >> >> specific drm/kms driver, using prime/dmabuf to share between
>> > >> >> the two.
>> > >> >
>> > >> > The "extra" buffers we were allocating from armsoc DDX were really
>> > >> > being allocated through DRM/GEM so we could get an flink name
>> > >> > for them and pass a reference to them back to our GPU driver on
>> > >> > the client side. If it weren't for our need to access those
>> > >> > extra off-screen buffers with the GPU we wouldn't need to
>> > >> > allocate them with DRM at all. So, given they are really "GPU"
>> > >> > buffers, it does absolutely make sense to allocate them in a
>> > >> > different driver to the display driver.
>> > >> >
>> > >> > However, to avoid unnecessary memcpys & related cache
>> > >> > maintenance ops, we'd also like the GPU to render into buffers
>> > >> > which are scanned out by the display controller. So let's say
>> > >> > we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
>> > >> > out buffers with the display's DRM driver but a custom ioctl
>> > >> > on the GPU's DRM driver to allocate non scanout, off-screen
>> > >> > buffers. Sounds great, but I don't think that really works
>> > >> > with DRI2. If we used two drivers to allocate buffers, which
>> > >> > of those drivers do we return in DRI2ConnectReply? Even if we
>> > >> > solve that somehow, GEM flink names are name-spaced to a
>> > >> > single device node (AFAIK). So when we do a DRI2GetBuffers,
>> > >> > how does the EGL in the client know which DRM device owns GEM
>> > >> > flink name "1234"? We'd need some pretty dirty hacks.
>> > >>
>> > >> You would return the name of the display driver allocating the
>> > >> buffers. On the client side you can use generic ioctls to go from
>> > >> flink -> handle -> dmabuf. So the client side would end up opening
>> > >> both the display drm device and the gpu, but without needing to know
>> > >> too much about the display.
>> > >
>> > > I think the bit I was missing was that a GEM bo for a buffer imported
>> > > using dma_buf/PRIME can still be flink'd. So the display controller's
>> > > DRM driver allocates scan-out buffers via the DUMB buffer allocate
>> > > ioctl. Those scan-out buffers than then be exported from the
>> > > dispaly's DRM driver and imported into the GPU's DRM driver using
>> > > PRIME. Once imported into the GPU's driver, we can use flink to get a
>> > > name for that buffer within the GPU DRM driver's name-space to return
>> > > to the DRI2 client. That same namespace is also what DRI2 back-
>> > > buffers are allocated from, so I think that could work... Except...
>> >
>> > (and.. the general direction is that things will move more to just use
>> > dmabuf directly, ie. wayland or dri3)
>>
>> I agree, DRI2 is the only reason why we need a system-wide ID. I also
>> prefer buffers to be passed around by dma_buf fd, but we still need to
>> support DRI2 and will do for some time I expect.
>>
>>
>>
>> > >> > Anyway, that latter case also gets quite difficult. The "GPU"
>> > >> > DRM driver would need to know the constraints of the display
>> > >> > controller when allocating buffers intended to be scanned out.
>> > >> > For example, pl111 typically isn't behind an IOMMU and so
>> > >> > requires physically contiguous memory. We'd have to teach the
>> > >> > GPU's DRM driver about the constraints of the display HW. Not
>> > >> > exactly a clean driver model. :-(
>> > >> >
>> > >> > I'm still a little stuck on how to proceed, so any ideas
>> > >> > would greatly appreciated! My current train of thought is
>> > >> > having a kind of SoC-specific DRM driver which allocates
>> > >> > buffers for both display and GPU within a single GEM
>> > >> > namespace. That SoC-specific DRM driver could then know the
>> > >> > constraints of both the GPU and the display HW. We could then
>> > >> > use PRIME to export buffers allocated with the SoC DRM driver
>> > >> > and import them into the GPU and/or display DRM driver.
>> > >>
>> > >> Usually if the display drm driver is allocating the buffers that
>> > >> might be scanned out, it just needs to have minimal knowledge of
>> > >> the GPU (pitch alignment constraints). I don't think we need a
>> > >> 3rd device just to allocate buffers.
>> > >
>> > > While Mali can render to pretty much any buffer, there is a mild
>> > > performance improvement to be had if the buffer stride is aligned to
>> > > the AXI bus's max burst length when drawing to the buffer.
>> >
>> > I suspect the display controllers might frequently benefit if the
>> > pitch is aligned to AXI burst length too..
>>
>> If the display controller is going to be reading from linear memory
>> I don't think it will make much difference - you'll just get an extra
>> 1-2 bus transactions per scanline. With a tile-based GPU like Mali,
>> you get those extra transactions per _tile_ scan-line and as such,
>> the overhead is more pronounced.
>>
>>
>>
>> > > So in some respects, there is a constraint on how buffers which will
>> > > be drawn to using the GPU are allocated. I don't really like the idea
>> > > of teaching the display controller DRM driver about the GPU buffer
>> > > constraints, even if they are fairly trivial like this. If the same
>> > > display HW IP is being used on several SoCs, it seems wrong somehow
>> > > to enforce those GPU constraints if some of those SoCs don't have a
>> > > GPU.
>> >
>> > Well, I suppose you could get min_pitch_alignment from devicetree, or
>> > something like this..
>> >
>> > In the end, the easy solution is just to make the display allocate to
>> > the worst-case pitch alignment. In the early days of dma-buf
>> > discussions, we kicked around the idea of negotiating or
>> > programatically describing the constraints, but that didn't really
>> > seem like a bounded problem.
>>
>> Yeah - I was around for some of those discussions and agree it's not
>> really an easy problem to solve.
>>
>>
>>
>> > > We may also then have additional constraints when sharing buffers
>> > > between the display HW and video decode or even camera ISP HW.
>> > > Programmatically describing buffer allocation constraints is very
>> > > difficult and I'm not sure you can actually do it - there's some
>> > > pretty complex constraints out there! E.g. I believe there's a
>> > > platform where Y and UV planes of the reference frame need to be in
>> > > separate DRAM banks for real-time 1080p decode, or something like
>> > > that?
>> >
>> > yes, this was discussed. This is different from pitch/format/size
>> > constraints.. it is really just a placement constraint (ie. where do
>> > the physical pages go). IIRC the conclusion was to use a dummy
>> > devices with it's own CMA pool for attaching the Y vs UV buffers.
>> >
>> > > Anyway, I guess my point is that even if we solve how to allocate
>> > > buffers which will be shared between the GPU and display HW such that
>> > > both sets of constraints are satisfied, that may not be the end of
>> > > the story.
>> > >
>> >
>> > that was part of the reason to punt this problem to userspace ;-)
>> >
>> > In practice, the kernel drivers doesn't usually know too much about
>> > the dimensions/format/etc.. that is really userspace level knowledge.
>> > There are a few exceptions when the kernel needs to know how to setup
>> > GTT/etc for tiled buffers, but normally this sort of information is up
>> > at the next level up (userspace, and drm_framebuffer in case of
>> > scanout). Userspace media frameworks like GStreamer already have a
>> > concept of format/caps negotiation. For non-display<->gpu sharing, I
>> > think this is probably where this sort of constraint negotiation
>> > should be handled.
>>
>> I agree that user-space will know which devices will access the buffer
>> and thus can figure out at least a common pixel format. Though I'm not
>> so sure userspace can figure out more low-level details like alignment
>> and placement in physical memory, etc.
>>
>> Anyway, assuming user-space can figure out how a buffer should be
>> stored in memory, how does it indicate this to a kernel driver and
>> actually allocate it? Which ioctl on which device does user-space
>> call, with what parameters? Are you suggesting using something like
>> ION which exposes the low-level details of how buffers are laid out in
>> physical memory to userspace? If not, what?
>>
>
> I strongly disagree with exposing low-level hardware details like tiling
> to userspace. If we have to do the negotiation of those things in
> userspace we will end up with having to pipe those information through
> things like the wayland protocol. I don't see how this could ever be
> considered a good idea.
well, unless userspace mmap's via a de-tiling gart type thing, I don't
think tiling can be invisible to userspace.
But if two GPU's have some overlap in supportable tiled formats, and
you have one gpu doing app and one doing compositor, and you want to
use tiling for the shared buffers, you need something in the wayland
protocol to figure out what the common supported formats are between
the two sides. I suppose it shouldn't be too hard to add a
standardized (cross-driver) format-negotiation protocol, and in the
absence of that fallback to non tiled format for shared buffers.
> I would rather see kernel drivers negotiating those things at dmabuf
> attach time in way invisible to userspace. I agree that this negotiation
> thing isn't easy to get right for the plethora of different hardware
> constraints we see today, but I would rather see this in-kernel, where
> we have the chance to fix things up if needed, than in a fixed userspace
> interface.
Well, if you can think of a sane way to add that to dev->dma_params,
and if it isn't visible if userspace mmap's a buffer, then we could
handle that in the kernel. But I don't think that will be the case.
BR,
-R
>
> Regards,
> Lucas
> --
> Pengutronix e.K. | Lucas Stach |
> Industrial Linux Solutions | http://www.pengutronix.de/ |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
>
^ permalink raw reply
* Re: [Linaro-mm-sig] [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Lucas Stach @ 2013-08-06 12:18 UTC (permalink / raw)
To: Tom Cooksey
Cc: 'Rob Clark', linux-fbdev, Pawel Moll, linux-kernel,
dri-devel, linaro-mm-sig, linux-arm-kernel, linux-media
In-Reply-To: <000101ce9298$8ce44ee0$a6aceca0$@cooksey@arm.com>
Am Dienstag, den 06.08.2013, 12:31 +0100 schrieb Tom Cooksey:
> Hi Rob,
>
> +lkml
>
> > >> On Fri, Jul 26, 2013 at 11:58 AM, Tom Cooksey <tom.cooksey@arm.com>
> > >> wrote:
> > >> >> > * It abuses flags parameter of DRM_IOCTL_MODE_CREATE_DUMB to
> > >> >> > also allocate buffers for the GPU. Still not sure how to
> > >> >> > resolve this as we don't use DRM for our GPU driver.
> > >> >>
> > >> >> any thoughts/plans about a DRM GPU driver? Ideally long term
> > >> >> (esp. once the dma-fence stuff is in place), we'd have
> > >> >> gpu-specific drm (gpu-only, no kms) driver, and SoC/display
> > >> >> specific drm/kms driver, using prime/dmabuf to share between
> > >> >> the two.
> > >> >
> > >> > The "extra" buffers we were allocating from armsoc DDX were really
> > >> > being allocated through DRM/GEM so we could get an flink name
> > >> > for them and pass a reference to them back to our GPU driver on
> > >> > the client side. If it weren't for our need to access those
> > >> > extra off-screen buffers with the GPU we wouldn't need to
> > >> > allocate them with DRM at all. So, given they are really "GPU"
> > >> > buffers, it does absolutely make sense to allocate them in a
> > >> > different driver to the display driver.
> > >> >
> > >> > However, to avoid unnecessary memcpys & related cache
> > >> > maintenance ops, we'd also like the GPU to render into buffers
> > >> > which are scanned out by the display controller. So let's say
> > >> > we continue using DRM_IOCTL_MODE_CREATE_DUMB to allocate scan
> > >> > out buffers with the display's DRM driver but a custom ioctl
> > >> > on the GPU's DRM driver to allocate non scanout, off-screen
> > >> > buffers. Sounds great, but I don't think that really works
> > >> > with DRI2. If we used two drivers to allocate buffers, which
> > >> > of those drivers do we return in DRI2ConnectReply? Even if we
> > >> > solve that somehow, GEM flink names are name-spaced to a
> > >> > single device node (AFAIK). So when we do a DRI2GetBuffers,
> > >> > how does the EGL in the client know which DRM device owns GEM
> > >> > flink name "1234"? We'd need some pretty dirty hacks.
> > >>
> > >> You would return the name of the display driver allocating the
> > >> buffers. On the client side you can use generic ioctls to go from
> > >> flink -> handle -> dmabuf. So the client side would end up opening
> > >> both the display drm device and the gpu, but without needing to know
> > >> too much about the display.
> > >
> > > I think the bit I was missing was that a GEM bo for a buffer imported
> > > using dma_buf/PRIME can still be flink'd. So the display controller's
> > > DRM driver allocates scan-out buffers via the DUMB buffer allocate
> > > ioctl. Those scan-out buffers than then be exported from the
> > > dispaly's DRM driver and imported into the GPU's DRM driver using
> > > PRIME. Once imported into the GPU's driver, we can use flink to get a
> > > name for that buffer within the GPU DRM driver's name-space to return
> > > to the DRI2 client. That same namespace is also what DRI2 back-
> > > buffers are allocated from, so I think that could work... Except...
> >
> > (and.. the general direction is that things will move more to just use
> > dmabuf directly, ie. wayland or dri3)
>
> I agree, DRI2 is the only reason why we need a system-wide ID. I also
> prefer buffers to be passed around by dma_buf fd, but we still need to
> support DRI2 and will do for some time I expect.
>
>
>
> > >> > Anyway, that latter case also gets quite difficult. The "GPU"
> > >> > DRM driver would need to know the constraints of the display
> > >> > controller when allocating buffers intended to be scanned out.
> > >> > For example, pl111 typically isn't behind an IOMMU and so
> > >> > requires physically contiguous memory. We'd have to teach the
> > >> > GPU's DRM driver about the constraints of the display HW. Not
> > >> > exactly a clean driver model. :-(
> > >> >
> > >> > I'm still a little stuck on how to proceed, so any ideas
> > >> > would greatly appreciated! My current train of thought is
> > >> > having a kind of SoC-specific DRM driver which allocates
> > >> > buffers for both display and GPU within a single GEM
> > >> > namespace. That SoC-specific DRM driver could then know the
> > >> > constraints of both the GPU and the display HW. We could then
> > >> > use PRIME to export buffers allocated with the SoC DRM driver
> > >> > and import them into the GPU and/or display DRM driver.
> > >>
> > >> Usually if the display drm driver is allocating the buffers that
> > >> might be scanned out, it just needs to have minimal knowledge of
> > >> the GPU (pitch alignment constraints). I don't think we need a
> > >> 3rd device just to allocate buffers.
> > >
> > > While Mali can render to pretty much any buffer, there is a mild
> > > performance improvement to be had if the buffer stride is aligned to
> > > the AXI bus's max burst length when drawing to the buffer.
> >
> > I suspect the display controllers might frequently benefit if the
> > pitch is aligned to AXI burst length too..
>
> If the display controller is going to be reading from linear memory
> I don't think it will make much difference - you'll just get an extra
> 1-2 bus transactions per scanline. With a tile-based GPU like Mali,
> you get those extra transactions per _tile_ scan-line and as such,
> the overhead is more pronounced.
>
>
>
> > > So in some respects, there is a constraint on how buffers which will
> > > be drawn to using the GPU are allocated. I don't really like the idea
> > > of teaching the display controller DRM driver about the GPU buffer
> > > constraints, even if they are fairly trivial like this. If the same
> > > display HW IP is being used on several SoCs, it seems wrong somehow
> > > to enforce those GPU constraints if some of those SoCs don't have a
> > > GPU.
> >
> > Well, I suppose you could get min_pitch_alignment from devicetree, or
> > something like this..
> >
> > In the end, the easy solution is just to make the display allocate to
> > the worst-case pitch alignment. In the early days of dma-buf
> > discussions, we kicked around the idea of negotiating or
> > programatically describing the constraints, but that didn't really
> > seem like a bounded problem.
>
> Yeah - I was around for some of those discussions and agree it's not
> really an easy problem to solve.
>
>
>
> > > We may also then have additional constraints when sharing buffers
> > > between the display HW and video decode or even camera ISP HW.
> > > Programmatically describing buffer allocation constraints is very
> > > difficult and I'm not sure you can actually do it - there's some
> > > pretty complex constraints out there! E.g. I believe there's a
> > > platform where Y and UV planes of the reference frame need to be in
> > > separate DRAM banks for real-time 1080p decode, or something like
> > > that?
> >
> > yes, this was discussed. This is different from pitch/format/size
> > constraints.. it is really just a placement constraint (ie. where do
> > the physical pages go). IIRC the conclusion was to use a dummy
> > devices with it's own CMA pool for attaching the Y vs UV buffers.
> >
> > > Anyway, I guess my point is that even if we solve how to allocate
> > > buffers which will be shared between the GPU and display HW such that
> > > both sets of constraints are satisfied, that may not be the end of
> > > the story.
> > >
> >
> > that was part of the reason to punt this problem to userspace ;-)
> >
> > In practice, the kernel drivers doesn't usually know too much about
> > the dimensions/format/etc.. that is really userspace level knowledge.
> > There are a few exceptions when the kernel needs to know how to setup
> > GTT/etc for tiled buffers, but normally this sort of information is up
> > at the next level up (userspace, and drm_framebuffer in case of
> > scanout). Userspace media frameworks like GStreamer already have a
> > concept of format/caps negotiation. For non-display<->gpu sharing, I
> > think this is probably where this sort of constraint negotiation
> > should be handled.
>
> I agree that user-space will know which devices will access the buffer
> and thus can figure out at least a common pixel format. Though I'm not
> so sure userspace can figure out more low-level details like alignment
> and placement in physical memory, etc.
>
> Anyway, assuming user-space can figure out how a buffer should be
> stored in memory, how does it indicate this to a kernel driver and
> actually allocate it? Which ioctl on which device does user-space
> call, with what parameters? Are you suggesting using something like
> ION which exposes the low-level details of how buffers are laid out in
> physical memory to userspace? If not, what?
>
I strongly disagree with exposing low-level hardware details like tiling
to userspace. If we have to do the negotiation of those things in
userspace we will end up with having to pipe those information through
things like the wayland protocol. I don't see how this could ever be
considered a good idea.
I would rather see kernel drivers negotiating those things at dmabuf
attach time in way invisible to userspace. I agree that this negotiation
thing isn't easy to get right for the plethora of different hardware
constraints we see today, but I would rather see this in-kernel, where
we have the chance to fix things up if needed, than in a fixed userspace
interface.
Regards,
Lucas
--
Pengutronix e.K. | Lucas Stach |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
^ permalink raw reply
* Re: [RFC 0/1] drm/pl111: Initial drm/kms driver for pl111
From: Rob Clark @ 2013-08-06 12:15 UTC (permalink / raw)
To: Tom Cooksey
Cc: dri-devel, linux-fbdev, Pawel Moll, linux-arm-kernel, linux-media,
linaro-mm-sig, linux-kernel
In-Reply-To: <5200deb3.0b24b40a.3b26.ffffbadeSMTPIN_ADDED_BROKEN@mx.google.com>
On Tue, Aug 6, 2013 at 7:31 AM, Tom Cooksey <tom.cooksey@arm.com> wrote:
>
>> > So in some respects, there is a constraint on how buffers which will
>> > be drawn to using the GPU are allocated. I don't really like the idea
>> > of teaching the display controller DRM driver about the GPU buffer
>> > constraints, even if they are fairly trivial like this. If the same
>> > display HW IP is being used on several SoCs, it seems wrong somehow
>> > to enforce those GPU constraints if some of those SoCs don't have a
>> > GPU.
>>
>> Well, I suppose you could get min_pitch_alignment from devicetree, or
>> something like this..
>>
>> In the end, the easy solution is just to make the display allocate to
>> the worst-case pitch alignment. In the early days of dma-buf
>> discussions, we kicked around the idea of negotiating or
>> programatically describing the constraints, but that didn't really
>> seem like a bounded problem.
>
> Yeah - I was around for some of those discussions and agree it's not
> really an easy problem to solve.
>
>
>
>> > We may also then have additional constraints when sharing buffers
>> > between the display HW and video decode or even camera ISP HW.
>> > Programmatically describing buffer allocation constraints is very
>> > difficult and I'm not sure you can actually do it - there's some
>> > pretty complex constraints out there! E.g. I believe there's a
>> > platform where Y and UV planes of the reference frame need to be in
>> > separate DRAM banks for real-time 1080p decode, or something like
>> > that?
>>
>> yes, this was discussed. This is different from pitch/format/size
>> constraints.. it is really just a placement constraint (ie. where do
>> the physical pages go). IIRC the conclusion was to use a dummy
>> devices with it's own CMA pool for attaching the Y vs UV buffers.
>>
>> > Anyway, I guess my point is that even if we solve how to allocate
>> > buffers which will be shared between the GPU and display HW such that
>> > both sets of constraints are satisfied, that may not be the end of
>> > the story.
>> >
>>
>> that was part of the reason to punt this problem to userspace ;-)
>>
>> In practice, the kernel drivers doesn't usually know too much about
>> the dimensions/format/etc.. that is really userspace level knowledge.
>> There are a few exceptions when the kernel needs to know how to setup
>> GTT/etc for tiled buffers, but normally this sort of information is up
>> at the next level up (userspace, and drm_framebuffer in case of
>> scanout). Userspace media frameworks like GStreamer already have a
>> concept of format/caps negotiation. For non-display<->gpu sharing, I
>> think this is probably where this sort of constraint negotiation
>> should be handled.
>
> I agree that user-space will know which devices will access the buffer
> and thus can figure out at least a common pixel format. Though I'm not
> so sure userspace can figure out more low-level details like alignment
> and placement in physical memory, etc.
well, let's divide things up into two categories:
1) the arrangement and format of pixels.. ie. what userspace would
need to know if it mmap's a buffer. This includes pixel format,
stride, etc. This should be negotiated in userspace, it would be
crazy to try to do this in the kernel.
2) the physical placement of the pages. Ie. whether it is contiguous
or not. Which bank the pages in the buffer are placed in, etc. This
is not visible to userspace. This is the purpose of the attach step,
so you know all the devices involved in sharing up front before
allocating the backing pages. (Or in the worst case, if you have a
"late attacher" you at least know when no device is doing dma access
to a buffer and can reallocate and move the buffer.) A long time
back, I had a patch that added a field or two to 'struct
device_dma_parameters' so that it could be known if a device required
contiguous buffers.. looks like that never got merged, so I'd need to
dig that back up and resend it. But the idea was to have the 'struct
device' encapsulate all the information that would be needed to
do-the-right-thing when it comes to placement.
> Anyway, assuming user-space can figure out how a buffer should be
> stored in memory, how does it indicate this to a kernel driver and
> actually allocate it? Which ioctl on which device does user-space
> call, with what parameters? Are you suggesting using something like
> ION which exposes the low-level details of how buffers are laid out in
> physical memory to userspace? If not, what?
no, userspace should not need to know this. And having a central
driver that knows this for all the other drivers in the system doesn't
really solve anything and isn't really scalable. At best you might
want, in some cases, a flag you can pass when allocating. For
example, some of the drivers have a 'SCANOUT' flag that can be passed
when allocating a GEM buffer, as a hint to the kernel that 'if this hw
requires contig memory for scanout, allocate this buffer contig'. But
really, when it comes to sharing buffers between devices, we want this
sort of information in dev->dma_params of the importing device(s).
BR,
-R
^ permalink raw reply
* Re: [PATCH 4/5] fb: Add DCU framebuffer driver for Vybrid VF610 platform
From: Lucas Stach @ 2013-08-06 8:47 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1373609276-14566-5-git-send-email-b18965@freescale.com>
Am Freitag, den 12.07.2013, 14:07 +0800 schrieb Alison Wang:
> The Display Controller Unit (DCU) module is a system master that
> fetches graphics stored in internal or external memory and displays
> them on a TFT LCD panel. A wide range of panel sizes is supported
> and the timing of the interface signals is highly configurable.
> Graphics are read directly from memory and then blended in real-time,
> which allows for dynamic content creation with minimal CPU intervention.
>
> The features:
>
> (1) Full RGB888 output to TFT LCD panel.
> (2) For the current LCD panel, WQVGA "480x272" is tested.
> (3) Blending of each pixel using up to 4 source layers dependent on size of panel.
> (4) Each graphic layer can be placed with one pixel resolution in either axis.
> (5) Each graphic layer support RGB565 and RGB888 direct colors without alpha channel
> and BGRA8888 direct colors with an alpha channel.
> (6) Each graphic layer support alpha blending with 8-bit resolution.
>
> This driver has been tested on Vybrid VF610 TOWER board.
>
> Signed-off-by: Alison Wang <b18965@freescale.com>
> ---
> Changes in v2: None
>
> drivers/video/Kconfig | 9 +
> drivers/video/Makefile | 1 +
> drivers/video/fsl-dcu-fb.c | 1091 ++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 1101 insertions(+)
> create mode 100644 drivers/video/fsl-dcu-fb.c
>
[...]
> +
> +static struct fb_videomode dcu_mode_db[] = {
> + {
> + .name = "480x272",
> + .refresh = 75,
> + .xres = 480,
> + .yres = 272,
> + .pixclock = 91996,
> + .left_margin = 2,
> + .right_margin = 2,
> + .upper_margin = 1,
> + .lower_margin = 1,
> + .hsync_len = 41,
> + .vsync_len = 2,
> + .sync = FB_SYNC_COMP_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT,
> + .vmode = FB_VMODE_NONINTERLACED,
> + },
> +};
Don't hardcode panel data in the driver. Use the videomode helpers to
get the relevant data from devicetree.
> +
> +/* DCU framebuffer data structure */
> +struct dcu_fb_data {
> + struct fb_info *fsl_dcu_info[DCU_LAYER_NUM];
> + void __iomem *reg_base;
> + unsigned int irq;
> + struct clk *clk;
> +};
> +
> +struct layer_display_offset {
> + int x_layer_d;
> + int y_layer_d;
> +};
> +
> +struct mfb_info {
> + int index;
> + char *id;
> + unsigned long pseudo_palette[16];
> + unsigned char alpha;
> + unsigned char blend;
> + unsigned int count;
> + int x_layer_d; /* layer display x offset to physical screen */
> + int y_layer_d; /* layer display y offset to physical screen */
> + struct dcu_fb_data *parent;
> +};
> +
> +enum mfb_index {
> + LAYER0 = 0,
> + LAYER1,
> + LAYER2,
> + LAYER3,
> +};
Why are there only 4 layers here? I thought the controller supports at
least 6 simultaneous layers?
> +
> +static struct mfb_info mfb_template[] = {
> + {
> + .index = LAYER0,
> + .id = "Layer0",
> + .alpha = 0xff,
> + .blend = 0,
> + .count = 0,
> + .x_layer_d = 0,
> + .y_layer_d = 0,
> + },
> + {
> + .index = LAYER1,
> + .id = "Layer1",
> + .alpha = 0xff,
> + .blend = 0,
> + .count = 0,
> + .x_layer_d = 50,
> + .y_layer_d = 50,
> + },
> + {
> + .index = LAYER2,
> + .id = "Layer2",
> + .alpha = 0xff,
> + .blend = 0,
> + .count = 0,
> + .x_layer_d = 100,
> + .y_layer_d = 100,
> + },
> + {
> + .index = LAYER3,
> + .id = "Layer3",
> + .alpha = 0xff,
> + .blend = 0,
> + .count = 0,
> + .x_layer_d = 150,
> + .y_layer_d = 150,
> + },
> +};
[...]
> +
> +static int calc_div_ratio(struct fb_info *info)
> +{
> + struct mfb_info *mfbi = info->par;
> + struct dcu_fb_data *dcufb = mfbi->parent;
> + unsigned long dcu_clk;
> + unsigned long long tmp;
> +
> + dcu_clk = clk_get_rate(dcufb->clk);
> + tmp = info->var.pixclock * (unsigned long long)dcu_clk;
> +
> + do_div(tmp, 1000000);
> +
> + if (do_div(tmp, 1000000) > 500000)
> + tmp++;
Urgh, you are changing the value of tmp inside the if clause. This isn't
nice. This function as a a whole looks really odd.
> +
> + tmp = tmp - 1;
> + return tmp;
> +}
> +
> +static void update_controller(struct fb_info *info)
> +{
> + struct fb_var_screeninfo *var = &info->var;
> + struct mfb_info *mfbi = info->par;
> + struct dcu_fb_data *dcufb = mfbi->parent;
> + unsigned int ratio;
> +
> + ratio = calc_div_ratio(info);
> + writel(ratio, dcufb->reg_base + DCU_DIV_RATIO);
> +
> + writel(DCU_DISP_SIZE_DELTA_Y(var->yres) |
> + DCU_DISP_SIZE_DELTA_X(var->xres / 16),
> + dcufb->reg_base + DCU_DISP_SIZE);
> +
> + /* Horizontal and vertical sync parameter */
> + writel(DCU_HSYN_PARA_BP(var->left_margin) |
> + DCU_HSYN_PARA_PW(var->hsync_len) |
> + DCU_HSYN_PARA_FP(var->right_margin),
> + dcufb->reg_base + DCU_HSYN_PARA);
> +
> + writel(DCU_VSYN_PARA_BP(var->upper_margin) |
> + DCU_VSYN_PARA_PW(var->vsync_len) |
> + DCU_VSYN_PARA_FP(var->lower_margin),
> + dcufb->reg_base + DCU_VSYN_PARA);
> +
> + writel(DCU_SYN_POL_INV_PXCK_FALL | DCU_SYN_POL_NEG_REMAIN |
> + DCU_SYN_POL_INV_VS_LOW | DCU_SYN_POL_INV_HS_LOW,
> + dcufb->reg_base + DCU_SYN_POL);
> +
> + writel(DCU_BGND_R(0) | DCU_BGND_G(0) | DCU_BGND_B(0),
> + dcufb->reg_base + DCU_BGND);
> +
> + writel(DCU_MODE_BLEND_ITER(DCU_LAYER_NUM_MAX) | DCU_MODE_RASTER_EN,
> + dcufb->reg_base + DCU_DCU_MODE);
> +
> + writel(DCU_THRESHOLD_LS_BF_VS(0x3) | DCU_THRESHOLD_OUT_BUF_HIGH(0x78) |
> + DCU_THRESHOLD_OUT_BUF_LOW(0), dcufb->reg_base + DCU_THRESHOLD);
> +
> + enable_controller(info);
> +}
> +
> +static int map_video_memory(struct fb_info *info)
> +{
> + u32 smem_len = info->fix.line_length * info->var.yres_virtual;
> +
> + info->fix.smem_len = smem_len;
> +
> + info->screen_base = dma_alloc_coherent(info->device, info->fix.smem_len,
You are setting up an uncached mapping here. Use a writecombined mapping
to help performance. It's still coherent, but allows at least write to
be fast.
> + (dma_addr_t *)&info->fix.smem_start, GFP_KERNEL);
> + if (!info->screen_base) {
> + printk(KERN_ERR "unable to allocate fb memory\n");
> + return -ENOMEM;
> + }
> +
> + memset(info->screen_base, 0, info->fix.smem_len);
> +
> + return 0;
> +}
> +
> +static void unmap_video_memory(struct fb_info *info)
> +{
> + if (!info->screen_base)
> + return;
> +
> + dma_free_coherent(info->device, info->fix.smem_len,
> + info->screen_base, info->fix.smem_start);
> +
> + info->screen_base = NULL;
> + info->fix.smem_start = 0;
> + info->fix.smem_len = 0;
> +}
[...]
> +
> +static int fsl_dcu_open(struct fb_info *info, int user)
> +{
> + struct mfb_info *mfbi = info->par;
> + int ret = 0;
> +
> + mfbi->index = info->node;
> +
> + mfbi->count++;
> + if (mfbi->count = 1) {
> + fsl_dcu_check_var(&info->var, info);
> + ret = fsl_dcu_set_par(info);
> + if (ret < 0)
> + mfbi->count--;
> + else
> + enable_interrupts(mfbi->parent);
> + }
> +
> + return ret;
> +}
> +
> +static int fsl_dcu_release(struct fb_info *info, int user)
> +{
> + struct mfb_info *mfbi = info->par;
> + int ret = 0;
> +
> + mfbi->count--;
> + if (mfbi->count = 0) {
> + ret = disable_panel(info);
> + if (ret < 0)
> + mfbi->count++;
> + }
> +
> + return ret;
> +}
Could this be replaced by runtime pm?
> +
> +static struct fb_ops fsl_dcu_ops = {
> + .owner = THIS_MODULE,
> + .fb_check_var = fsl_dcu_check_var,
> + .fb_set_par = fsl_dcu_set_par,
> + .fb_setcolreg = fsl_dcu_setcolreg,
> + .fb_blank = fsl_dcu_blank,
> + .fb_pan_display = fsl_dcu_pan_display,
> + .fb_fillrect = cfb_fillrect,
> + .fb_copyarea = cfb_copyarea,
> + .fb_imageblit = cfb_imageblit,
> + .fb_ioctl = fsl_dcu_ioctl,
> + .fb_open = fsl_dcu_open,
> + .fb_release = fsl_dcu_release,
> +};
> +
> +static int install_framebuffer(struct fb_info *info)
> +{
> + struct mfb_info *mfbi = info->par;
> + struct fb_videomode *db = dcu_mode_db;
> + unsigned int dbsize = ARRAY_SIZE(dcu_mode_db);
> + int ret;
> +
> + info->var.activate = FB_ACTIVATE_NOW;
> + info->fbops = &fsl_dcu_ops;
> + info->flags = FBINFO_FLAG_DEFAULT;
> + info->pseudo_palette = &mfbi->pseudo_palette;
> +
> + fb_alloc_cmap(&info->cmap, 16, 0);
> +
> + ret = fb_find_mode(&info->var, info, fb_mode, db, dbsize,
> + NULL, default_bpp);
> +
> + if (fsl_dcu_check_var(&info->var, info)) {
> + ret = -EINVAL;
> + goto failed_checkvar;
> + }
> +
> + if (register_framebuffer(info) < 0) {
> + ret = -EINVAL;
> + goto failed_register_framebuffer;
> + }
> +
> + printk(KERN_INFO "fb%d: %s fb device registered successfully.\n",
> + info->node, info->fix.id);
> + return 0;
> +
> +failed_checkvar:
> + fb_dealloc_cmap(&info->cmap);
> +failed_register_framebuffer:
> + unmap_video_memory(info);
> + fb_dealloc_cmap(&info->cmap);
> + return ret;
> +}
> +
> +static void uninstall_framebuffer(struct fb_info *info)
> +{
> + unregister_framebuffer(info);
> + unmap_video_memory(info);
> +
> + if (&info->cmap)
> + fb_dealloc_cmap(&info->cmap);
> +}
> +
> +static irqreturn_t fsl_dcu_irq(int irq, void *dev_id)
> +{
> + struct dcu_fb_data *dcufb = dev_id;
> + unsigned int status = readl(dcufb->reg_base + DCU_INT_STATUS);
> + u32 dcu_mode;
> +
> + if (status) {
> + if (status & DCU_INT_STATUS_UNDRUN) {
> + dcu_mode = readl(dcufb->reg_base + DCU_DCU_MODE);
> + dcu_mode &= ~DCU_MODE_DCU_MODE_MASK;
> + writel(dcu_mode | DCU_MODE_DCU_MODE(DCU_MODE_OFF),
> + dcufb->reg_base + DCU_DCU_MODE);
> + udelay(1);
> + writel(dcu_mode | DCU_MODE_DCU_MODE(DCU_MODE_NORMAL),
> + dcufb->reg_base + DCU_DCU_MODE);
> + }
> + writel(status, dcufb->reg_base + DCU_INT_STATUS);
> + return IRQ_HANDLED;
> + }
> + return IRQ_NONE;
> +}
> +
> +#ifdef CONFIG_PM
> +static int fsl_dcu_suspend(struct platform_device *pdev,
> + pm_message_t state)
> +{
> + struct dcu_fb_data *dcufb = dev_get_drvdata(&pdev->dev);
> +
> + clk_disable_unprepare(dcufb->clk);
> + return 0;
> +}
> +
> +static int fsl_dcu_resume(struct platform_device *pdev)
> +{
> + struct dcu_fb_data *dcufb = dev_get_drvdata(&pdev->dev);
> +
> + clk_prepare_enable(dcufb->clk);
> + return 0;
> +}
> +#else
> +#define fsl_dcu_suspend NULL
> +#define fsl_dcu_resume NULL
> +#endif
> +
Could this be replaced by runtime pm?
> +static int bypass_tcon(struct device_node *np)
> +{
> + struct device_node *tcon_np;
> + struct platform_device *tcon_pdev;
> + struct clk *tcon_clk;
> + void __iomem *tcon_reg;
> + int ret = 0;
> +
> + tcon_np = of_parse_phandle(np, "tcon-controller", 0);
> + if (!tcon_np)
> + return -EINVAL;
> +
> + tcon_pdev = of_find_device_by_node(tcon_np);
> + if (!tcon_pdev)
> + return -EINVAL;
> +
> + tcon_clk = devm_clk_get(&tcon_pdev->dev, "tcon");
> + if (IS_ERR(tcon_clk)) {
> + ret = PTR_ERR(tcon_clk);
> + goto failed_getclock;
> + }
> + clk_prepare_enable(tcon_clk);
> +
> + tcon_reg = of_iomap(tcon_np, 0);
> + if (!tcon_reg) {
> + ret = -ENOMEM;
> + goto failed_ioremap;
> + }
> + writel(TCON_BYPASS_ENABLE, tcon_reg + TCON_CTRL1);
> +
> + return 0;
> +
> +failed_ioremap:
> + clk_disable_unprepare(tcon_clk);
> +failed_getclock:
> + of_node_put(tcon_np);
> + return ret;
> +}
> +
Is the framebuffer driver the only user of tcon? If not you should not
map the memory here, but rather use something like regmap syscon.
[...]
Regards,
Lucas
--
Pengutronix e.K. | Lucas Stach |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
^ permalink raw reply
* RE: [PATCH 4/5] fb: Add DCU framebuffer driver for Vybrid VF610 platform
From: Wang Huan-B18965 @ 2013-08-06 7:20 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <1375712322.4276.12.camel@weser.hi.pengutronix.de>
SGksDQoNCj4gQW0gTW9udGFnLCBkZW4gMDUuMDguMjAxMywgMTU6MDkgKzAyMDAgc2NocmllYiBS
b2JlcnQgU2Nod2ViZWw6DQo+ID4gT24gTW9uLCBBdWcgMDUsIDIwMTMgYXQgMDk6NTE6NDBBTSAr
MDAwMCwgV2FuZyBIdWFuLUIxODk2NSB3cm90ZToNCj4gPiA+ID4gT24gRnJpLCBKdWwgMTIsIDIw
MTMgYXQgMDI6MDc6NTVQTSArMDgwMCwgQWxpc29uIFdhbmcgd3JvdGU6DQo+ID4gPiA+ID4gVGhl
IERpc3BsYXkgQ29udHJvbGxlciBVbml0IChEQ1UpIG1vZHVsZSBpcyBhIHN5c3RlbSBtYXN0ZXIN
Cj4gdGhhdA0KPiA+ID4gPiA+IGZldGNoZXMgZ3JhcGhpY3Mgc3RvcmVkIGluIGludGVybmFsIG9y
IGV4dGVybmFsIG1lbW9yeSBhbmQNCj4gPiA+ID4gPiBkaXNwbGF5cyB0aGVtIG9uIGEgVEZUIExD
RCBwYW5lbC4gQSB3aWRlIHJhbmdlIG9mIHBhbmVsIHNpemVzDQo+IGlzDQo+ID4gPiA+ID4gc3Vw
cG9ydGVkIGFuZCB0aGUgdGltaW5nIG9mIHRoZSBpbnRlcmZhY2Ugc2lnbmFscyBpcyBoaWdobHkN
Cj4gY29uZmlndXJhYmxlLg0KPiA+ID4gPiA+IEdyYXBoaWNzIGFyZSByZWFkIGRpcmVjdGx5IGZy
b20gbWVtb3J5IGFuZCB0aGVuIGJsZW5kZWQgaW4NCj4gPiA+ID4gPiByZWFsLXRpbWUsIHdoaWNo
IGFsbG93cyBmb3IgZHluYW1pYyBjb250ZW50IGNyZWF0aW9uIHdpdGgNCj4gPiA+ID4gPiBtaW5p
bWFsIENQVQ0KPiA+ID4gPiBpbnRlcnZlbnRpb24uDQo+ID4gPiA+DQo+ID4gPiA+IE9ubHkgYSBy
ZXZpZXcgb2YgdGhlIGNvZGUgaW5saW5lLg0KPiA+ID4gPg0KPiA+ID4gPiBNYXliZSB0aGUgcmVh
bCBxdWVzdGlvbiBpcyB3aGV0aGVyIHdlIHdhbnQgdG8gaW50cm9kdWNlIGFub3RoZXINCj4gPiA+
ID4gZnJhbWVidWZmZXIgZHJpdmVyIGF0IGFsbCBpbnN0ZWFkIG9mIG1ha2luZyBpdCBhIERSTSBk
cml2ZXIuDQo+ID4gPiBbQWxpc29uIFdhbmddIEkgdGhpbmsgRENVIG1vZHVsZSBpcyBtb3JlIHN1
aXRhYmxlIHRvIGJlIGRlc2lnbmVkIGFzDQo+IGEgZnJhbWVidWZmZXIgZHJpdmVyIHRoYW4gYSBE
Uk0gZHJpdmVyLiBKdXN0IGxpa2UgRElVIGZyYW1lYnVmZmVyDQo+IGRyaXZlciBmb3IgUG93ZXJQ
Qy4NCj4gPg0KPiA+IFdlIGxvb2tlZCBhdCB0aGUgVnlicmlkIGRhdGFzaGVldCBhbmQgaXQncyBE
Q1Ugc2VjdGlvbiBsYXN0IHdlZWssIGFuZA0KPiA+IHdpdGggaXRzIDY0IHBsYW5lcywgdGhlIGNv
bnRyb2xsZXIgcmVhbGx5IHdhbnRzIHRvIGdldCBhIERSTSBkcml2ZXIuDQo+ID4NCj4gRXhhY3Rs
eSwgd2l0aCBpdCdzIGV4dGVuc2l2ZSBoYXJkd2FyZSBjb21wb3NpdGlvbiBjYXBhYmlsaXRpZXMg
dGhlDQo+IGNvbnRyb2xsZXIgYmVncyBmb3IgYmVpbmcgZHJpdmVuIGJ5IGEgcHJvcGVyIERSTSBk
cml2ZXIuIFRoZXJlIGlzIG5vDQo+IHdheSBpbiBmYmRldiB0byBwcm9wZXJseSBzdXBwb3J0IGFs
bCB0aG9zZSBoYXJkd2FyZSBwbGFuZXMgd2l0aG91dA0KPiBpbnRyb2R1Y2luZyBhIGxvdCBvZiBu
b24gc3RhbmRhcmQgaW9jdGxzLCB3aGljaCBpbiB0dXJuIHdpbGwgZm9yY2UgeW91DQo+IGludG8g
d3JpdGluZyBhIGxvdCBvZiBjdXN0b20gdXNlcnNwYWNlIGNvZGUgaW5zdGVhZCBvZiBydW5uaW5n
IHByb3Zlbg0KPiB0ZWNobm9sb2d5IHRoYXQgc2l0cyBvbiB0b3Agb2YgS01TLg0KPiANCj4gRG9p
bmcgdGhpbmdzIGluIERSTSBtaWdodCBiZSBzbGlnaHRseSBtb3JlIHdvcmsgZm9yIHRoZSBpbml0
aWFsIGJyaW5nDQo+IHVwLCBidXQgd2lsbCBwYXkgb2ZmIGluIHRoZSBsb25nIHJ1biB3aGVuIHlv
dSBhcmUgZ29pbmcgdG8gcmVhbGx5IHVzZQ0KPiB0aGUgaGFyZHdhcmUgY2FwcyBvZiB0aGUgY29u
dHJvbGxlci4NCj4gDQpbQWxpc29uIFdhbmddIFRoYW5rcyBmb3IgeW91ciBjb21tZW50cyBhbmQg
ZXhwbGFuYXRpb25zLiBJIHVuZGVyc3RhbmQgeW91ciBjb25zaWRlcmF0aW9ucy4NCg0KSW4gRENV
IG1vZHVsZSwgdGhlcmUgYXJlIDY0IGdyYXBoaWNzIGxheWVycywgYnV0IHRoZSBtYXhpbXVtIHNv
dXJjZSBsYXllcnMgZm9yIGJsZW5kaW5nIG9mIGVhY2ggcGl4ZWwgaXMgb25seSA2LiBUaGUgZmIg
ZHJpdmVyIGNvdWxkIHNhdGlzZnkgdGhlIGN1cnJlbnQgYXBwbGljYXRpb24gcmVxdWlyZW1lbnRz
Lg0KDQpTbyBJIHdpbGwgcHVzaCBvdXQgdGhpcyBmYiBkcml2ZXIgZmlyc3QsIGFuZCB0aGVuIEkg
d2lsbCBhZGQgYSBEUk0gZHJpdmVyIGZvciBtb3JlIGFkdmFuY2VkIHJlcXVpcmVtZW50cy4NCg0K
RG8geW91IGhhdmUgbW9yZSBjb21tZW50cz8NCg0KDQpCZXN0IFJlZ2FyZHMsDQpBbGlzb24gV2Fu
Zw0K
^ permalink raw reply
* RE: [PATCH 4/5] fb: Add DCU framebuffer driver for Vybrid VF610 platform
From: Wang Huan-B18965 @ 2013-08-06 3:42 UTC (permalink / raw)
To: linux-arm-kernel
In-Reply-To: <20130805100630.GQ26614@pengutronix.de>
PiBPbiBNb24sIEF1ZyAwNSwgMjAxMyBhdCAwOTo1MTo0MEFNICswMDAwLCBXYW5nIEh1YW4tQjE4
OTY1IHdyb3RlOg0KPiA+IEhpLCBTYXNjaGEsDQo+ID4NCj4gPiA+IE9uIEZyaSwgSnVsIDEyLCAy
MDEzIGF0IDAyOjA3OjU1UE0gKzA4MDAsIEFsaXNvbiBXYW5nIHdyb3RlOg0KPiA+ID4gPiBUaGUg
RGlzcGxheSBDb250cm9sbGVyIFVuaXQgKERDVSkgbW9kdWxlIGlzIGEgc3lzdGVtIG1hc3RlciB0
aGF0DQo+ID4gPiA+IGZldGNoZXMgZ3JhcGhpY3Mgc3RvcmVkIGluIGludGVybmFsIG9yIGV4dGVy
bmFsIG1lbW9yeSBhbmQNCj4gPiA+ID4gZGlzcGxheXMgdGhlbSBvbiBhIFRGVCBMQ0QgcGFuZWwu
IEEgd2lkZSByYW5nZSBvZiBwYW5lbCBzaXplcyBpcw0KPiA+ID4gPiBzdXBwb3J0ZWQgYW5kIHRo
ZSB0aW1pbmcgb2YgdGhlIGludGVyZmFjZSBzaWduYWxzIGlzIGhpZ2hseQ0KPiBjb25maWd1cmFi
bGUuDQo+ID4gPiA+IEdyYXBoaWNzIGFyZSByZWFkIGRpcmVjdGx5IGZyb20gbWVtb3J5IGFuZCB0
aGVuIGJsZW5kZWQgaW4NCj4gPiA+ID4gcmVhbC10aW1lLCB3aGljaCBhbGxvd3MgZm9yIGR5bmFt
aWMgY29udGVudCBjcmVhdGlvbiB3aXRoIG1pbmltYWwNCj4gPiA+ID4gQ1BVDQo+ID4gPiBpbnRl
cnZlbnRpb24uDQo+ID4gPg0KPiA+ID4gT25seSBhIHJldmlldyBvZiB0aGUgY29kZSBpbmxpbmUu
DQo+ID4gPg0KPiA+ID4gTWF5YmUgdGhlIHJlYWwgcXVlc3Rpb24gaXMgd2hldGhlciB3ZSB3YW50
IHRvIGludHJvZHVjZSBhbm90aGVyDQo+ID4gPiBmcmFtZWJ1ZmZlciBkcml2ZXIgYXQgYWxsIGlu
c3RlYWQgb2YgbWFraW5nIGl0IGEgRFJNIGRyaXZlci4NCj4gPiBbQWxpc29uIFdhbmddIEkgdGhp
bmsgRENVIG1vZHVsZSBpcyBtb3JlIHN1aXRhYmxlIHRvIGJlIGRlc2lnbmVkIGFzIGENCj4gZnJh
bWVidWZmZXIgZHJpdmVyIHRoYW4gYSBEUk0gZHJpdmVyLiBKdXN0IGxpa2UgRElVIGZyYW1lYnVm
ZmVyIGRyaXZlcg0KPiBmb3IgUG93ZXJQQy4NCj4gPiA+DQo+ID4gPiA+ICsNCj4gPiA+ID4gKyNk
ZWZpbmUgRFJJVkVSX05BTUUJCQkiZnNsLWRjdS1mYiINCj4gPiA+ID4gKw0KPiA+ID4gPiArI2Rl
ZmluZSBEQ1VfRENVX01PREUJCQkweDAwMTANCj4gPiA+ID4gKyNkZWZpbmUgRENVX01PREVfQkxF
TkRfSVRFUih4KQkJKHggPDwgMjApDQo+ID4gPg0KPiA+ID4gV2hhdCdzIHRoZSByZXN1bHQgb2Yg
RENVX01PREVfQkxFTkRfSVRFUigxICsgMSk/DQo+ID4gW0FsaXNvbiBXYW5nXSBJcyBpdCByZWFs
bHkgbmVjZXNzYXJ5PyBJIGRvbid0IHVzZSB0aGlzIG1hY3JvIGxpa2UNCj4gRENVX01PREVfQkxF
TkRfSVRFUigxICsgMSksIGp1c3QgdXNlIERDVV9NT0RFX0JMRU5EX0lURVIoMikuDQo+IA0KPiA8
aXJvbnk+DQo+IE5vIGl0J3Mgbm90IG5lY2Vzc2FyeS4gV2UgY2FuIGp1c3QgYWRkIGluY29ycmVj
dCBtYWNyb3MgZXZlcnl3aGVyZSBhbmQNCj4gZml4IHRoZW0gYXMgbmVjZXNzYXJ5IGFmdGVyIGEg
bG9uZyBidWcgc3F1YXNoaW5nIHNlc3Npb24gPC9pcm9ueT4NCj4gDQo+IFlFUyEgVGhpcyBpcyBu
ZWNlc3NhcnkuDQpbQWxpc29uIFdhbmddIE9rLCBJIHdpbGwgY2hhbmdlIGl0IGFzIGZvbGxvd3Mu
DQojZGVmaW5lIERDVV9NT0RFX0JMRU5EX0lURVIoeCkJKCh4KSA8PCAyMCkNCj4gDQo+ID4gPiA+
ICsJcmV0dXJuIDA7DQo+ID4gPiA+ICt9DQo+ID4gPiA+ICsNCj4gPiA+ID4gK3N0YXRpYyBpbnQg
Y2FsY19kaXZfcmF0aW8oc3RydWN0IGZiX2luZm8gKmluZm8pIHsNCj4gPiA+DQo+ID4gPiBVc2Ug
YSBjb25zaXN0ZW50IG5hbWVzcGFjZSBmb3IgZnVuY3Rpb24gbmFtZXMgKGZzbF9kY3VfKQ0KPiA+
IFtBbGlzb24gV2FuZ10gSXMgaXQgbmVjZXNzYXJ5IHRvIHVzZSBhIGNvbnNpc3RlbnQgbmFtZXNw
YWNlIGZvciB0aGlzDQo+IGdlbmVyaWMgZnVuY3Rpb24/IEkgdGhpbmsgaXQgaXMgbmVjZXNzYXJ5
IHRvIHVzZSBhIGNvbnNpc3RlbnQgbmFtZXNwYWNlDQo+IChmc2xfZGN1XykgZm9yIHRoZSBmdW5j
dGlvbiBuYW1lcyBpbiBzdHJ1Y3R1cmUgZnNsX2RjdV9vcHMuDQo+IA0KPiBUaGlzIGZ1bmN0aW9u
IGNhbGN1bGF0ZXMgc29tZSBkaXZpZGVyIG91dCBvZiBhIHN0cnVjdCBmYl9pbmZvLiBUaGlzIGlz
DQo+IGJ5IG5vIG1lYW5zIGdlbmVyaWMuDQpbQWxpc29uIFdhbmddIE9rLCBJIHdpbGwgdXNlIGEg
Y29uc2lzdGVudCBuYW1lc3BhY2UgZm9yIHRoaXMgZnVuY3Rpb24uDQo+IA0KPiA+ID4gPiArCQlp
ZiAoY29weV90b191c2VyKGJ1ZiwgJmFscGhhLCBzaXplb2YoYWxwaGEpKSkNCj4gPiA+ID4gKwkJ
CXJldHVybiAtRUZBVUxUOw0KPiA+ID4gPiArCQlicmVhazsNCj4gPiA+ID4gKwljYXNlIE1GQl9T
RVRfQUxQSEE6DQo+ID4gPiA+ICsJCWlmIChjb3B5X2Zyb21fdXNlcigmYWxwaGEsIGJ1Ziwgc2l6
ZW9mKGFscGhhKSkpDQo+ID4gPiA+ICsJCQlyZXR1cm4gLUVGQVVMVDsNCj4gPiA+ID4gKwkJbWZi
aS0+YmxlbmQgPSAxOw0KPiA+ID4gPiArCQltZmJpLT5hbHBoYSA9IGFscGhhOw0KPiA+ID4gPiAr
CQlmc2xfZGN1X3NldF9wYXIoaW5mbyk7DQo+ID4gPiA+ICsJCWJyZWFrOw0KPiA+ID4NCj4gPiA+
IElzIGl0IHN0aWxsIHN0YXRlIG9mIHRoZSBhcnQgdG8gaW50cm9kdWNlIGlvY3RscyBpbiB0aGUg
ZmINCj4gZnJhbWV3b3JrDQo+ID4gPiB3aXRob3V0IGFueSBraW5kIG9mIGRvY3VtZW50YXRpb24/
DQo+ID4gW0FsaXNvbiBXYW5nXSBEbyB5b3UgbWVhbiBJIG5lZWQgdG8gYWRkIGEgZG9jdW1lbnRh
dGlvbiBpbg0KPiBEb2N1bWVudGF0aW9uL2ZiLz8NCj4gDQo+IFRoaXMgd2FzIG1vcmUgYSBxdWVz
dGlvbiB0byB0aGUgZmIgbWFpbnRhaW5lcnMuDQo+IA0KPiA+ID4gPiArc3RhdGljIGlycXJldHVy
bl90IGZzbF9kY3VfaXJxKGludCBpcnEsIHZvaWQgKmRldl9pZCkgew0KPiA+ID4gPiArCXN0cnVj
dCBkY3VfZmJfZGF0YSAqZGN1ZmIgPSBkZXZfaWQ7DQo+ID4gPiA+ICsJdW5zaWduZWQgaW50IHN0
YXR1cyA9IHJlYWRsKGRjdWZiLT5yZWdfYmFzZSArDQo+IERDVV9JTlRfU1RBVFVTKTsNCj4gPiA+
ID4gKwl1MzIgZGN1X21vZGU7DQo+ID4gPiA+ICsNCj4gPiA+ID4gKwlpZiAoc3RhdHVzKSB7DQo+
ID4gPg0KPiA+ID4gU2F2ZSBpbmRlbmRhdGlvbiBsZXZlbCBieSBiYWlsaW5nIG91dCBlYXJseS4N
Cj4gPiBbQWxpc29uIFdhbmddIFdoYXQgZG8geW91IG1lYW4/DQo+IA0KPiANCj4gSW5zdGVhZCBv
ZiBkb2luZzoNCj4gDQo+IAlpZiAoYmxhKSB7DQo+IAkJZG90aGlzKCk7DQo+IAkJZG90aGF0KCk7
DQo+IAkJcmV0dXJuOw0KPiAJfQ0KPiANCj4gCXJldHVybjsNCj4gDQo+IGl0J3MgZWFzaWVyIHRv
IHJlYWQ6DQo+IA0KPiAJaWYgKCFibGEpDQo+IAkJcmV0dXJuOw0KPiANCj4gCWRvdGhpcygpOw0K
PiAJZG90aGF0KCk7DQo+IAlyZXR1cm47DQo+IA0KPiBOb3RlIHRoYXQgZG90aGlzKCkgYW5kIGRv
dGhhdCgpIGFyZSBvbmUgaW5kZW50YXRpb24gbGV2ZWwgdG8gdGhlIGxlZnQNCj4gYW5kIHRodXMg
aGF2ZSBtb3JlIHNwYWNlIHBlciBsaW5lLg0KW0FsaXNvbiBXYW5nXSBJIHNlZS4gVGhhbmtzLg0K
DQoNCkJlc3QgUmVnYXJkcywNCkFsaXNvbiBXYW5nDQo
^ permalink raw reply
* Re: [PATCH] fbdev: suppress warning when assigning vga-save/restore base
From: David Miller @ 2013-08-05 22:29 UTC (permalink / raw)
To: santiago
Cc: hpa, dh.herrmann, linux-fbdev, linux-kernel, plagnioj,
tomi.valkeinen, linux
In-Reply-To: <20130805202955.GH22789@localhost>
From: Ondrej Zajicek <santiago@crfreenet.org>
Date: Mon, 5 Aug 2013 22:29:55 +0200
> On Sun, Aug 04, 2013 at 06:51:46PM -0700, David Miller wrote:
>> From: "H. Peter Anvin" <hpa@zytor.com>
>> Date: Sun, 04 Aug 2013 10:33:46 -0700
>>
>> > Anyone who can dig backwards and summarize? In other words:
>> >
>> > Where in the current code do we stuff a physical address in a pointer,
>> > or a virtual address into a non-pointer?
>>
>> The VGA register accessors try to accomodate iomem and ioport
>> accesses.
>>
>> If they are given a non-NULL iomem pointer 'regbase' they use
>> iomem accesses, otherwise they do direct ISA port poking.
>>
>> And yes the drivers in question are making some brash assumptions.
>> I suspect they should be using ioremap() or similar.
>
> Well, these drivers were written without MMIO (port IO only with NULL
> instead of 'vgabase' for VGA register accessors). They were converted in
> patches 94c322c30bd14ae6cdd369cb4a1f94c5c3809ac9,
> f8645933513c65ac55f23c63b2649097289795c6 and a few others (from David
> Miller) to potentially use MMIO by using 'vgabase' instead of NULL:
>
> pcibios_bus_to_resource(dev, &vga_res, &bus_reg);
> par->state.vgabase = (void __iomem *) vga_res.start;
>
> How this could even work? AFAIK these cards have to be explicitly programmed
> to enable MMIO (which was not done in the patches). These patches claim that
> it is for multi-domain PCI. I would guess that vgabase is NULL in common
> configurations but if it is non-NULL, it probably wouldn't work, unless
> there is some hardware magic that transparently converts MMIO (from CPU PoV)
> to port IO (from card/PCI PoV).
>
> Note that there are some later patches (86c0f043a737dadf034a4e6f29aefb074f4a1146)
> from Ondrej Zary that independently enable MMIO, but they use par->mmio
> field instead of par->state.vgabase .
Ok, so the correct fix would be to make them pass NULL again and to simply
ignore the resources.
^ permalink raw reply
* [PATCH v3 20/20] video: da8xx-fb: adding am33xx as dependency
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
To: linux-fbdev
Updating Kconfig to allow am33xx to include lcdc fbdev driver
including some extra dependencies needed by device tree mods.
v2: must add FB_MODE_HELPERS as VIDEOMODE_HELPERS alone doesn't
get the correct functions from fbmon.c built in.
Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
drivers/video/Kconfig | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index 2e937bd..f36c2f7 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -2226,15 +2226,17 @@ config FB_SH7760
panels <= 320 pixel horizontal resolution.
config FB_DA8XX
- tristate "DA8xx/OMAP-L1xx Framebuffer support"
- depends on FB && ARCH_DAVINCI_DA8XX
+ tristate "DA8xx/OMAP-L1xx/AM335x Framebuffer support"
+ depends on FB && (ARCH_DAVINCI_DA8XX || SOC_AM33XX)
select FB_CFB_FILLRECT
select FB_CFB_COPYAREA
select FB_CFB_IMAGEBLIT
select FB_CFB_REV_PIXELS_IN_BYTE
+ select FB_MODE_HELPERS
+ select VIDEOMODE_HELPERS
---help---
This is the frame buffer device driver for the TI LCD controller
- found on DA8xx/OMAP-L1xx SoCs.
+ found on DA8xx/OMAP-L1xx/AM335x SoCs.
If unsure, say N.
config FB_VIRTUAL
--
1.7.0.4
^ permalink raw reply related
* [PATCH v3 19/20] video: da8xx-fb: let compiler decide what to inline
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
To: linux-fbdev
Remove use of explicit inline compiler directive and let the compiler
make the decision as per Documentation/CodingStyle.
Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
drivers/video/da8xx-fb.c | 17 ++++++++---------
1 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
index 8c0ca11..ea9771f 100644
--- a/drivers/video/da8xx-fb.c
+++ b/drivers/video/da8xx-fb.c
@@ -141,12 +141,12 @@ static irq_handler_t lcdc_irq_handler;
static wait_queue_head_t frame_done_wq;
static int frame_done_flag;
-static inline unsigned int lcdc_read(unsigned int addr)
+static unsigned int lcdc_read(unsigned int addr)
{
return (unsigned int)__raw_readl(da8xx_fb_reg_base + (addr));
}
-static inline void lcdc_write(unsigned int val, unsigned int addr)
+static void lcdc_write(unsigned int val, unsigned int addr)
{
__raw_writel(val, da8xx_fb_reg_base + (addr));
}
@@ -245,13 +245,13 @@ static struct fb_videomode known_lcd_panels[] = {
},
};
-static inline bool da8xx_fb_is_raster_enabled(void)
+static bool da8xx_fb_is_raster_enabled(void)
{
return !!(lcdc_read(LCD_RASTER_CTRL_REG) & LCD_RASTER_ENABLE);
}
/* Enable the Raster Engine of the LCD Controller */
-static inline void lcd_enable_raster(void)
+static void lcd_enable_raster(void)
{
u32 reg;
@@ -273,8 +273,7 @@ static inline void lcd_enable_raster(void)
}
/* Disable the Raster Engine of the LCD Controller */
-static inline void lcd_disable_raster(enum da8xx_frame_complete
- wait_for_frame_done)
+static void lcd_disable_raster(enum da8xx_frame_complete wait_for_frame_done)
{
u32 reg;
int ret;
@@ -750,7 +749,7 @@ static int da8xx_fb_calc_config_clk_divider(struct da8xx_fb_par *par,
return da8xx_fb_config_clk_divider(par, lcdc_clk_div, lcdc_clk_rate);
}
-static inline unsigned da8xx_fb_round_clk(struct da8xx_fb_par *par,
+static unsigned da8xx_fb_round_clk(struct da8xx_fb_par *par,
unsigned pixclock)
{
unsigned lcdc_clk_div, lcdc_clk_rate;
@@ -1043,7 +1042,7 @@ static int lcd_da8xx_cpufreq_transition(struct notifier_block *nb,
return 0;
}
-static inline int lcd_da8xx_cpufreq_register(struct da8xx_fb_par *par)
+static int lcd_da8xx_cpufreq_register(struct da8xx_fb_par *par)
{
par->freq_transition.notifier_call = lcd_da8xx_cpufreq_transition;
@@ -1051,7 +1050,7 @@ static inline int lcd_da8xx_cpufreq_register(struct da8xx_fb_par *par)
CPUFREQ_TRANSITION_NOTIFIER);
}
-static inline void lcd_da8xx_cpufreq_deregister(struct da8xx_fb_par *par)
+static void lcd_da8xx_cpufreq_deregister(struct da8xx_fb_par *par)
{
cpufreq_unregister_notifier(&par->freq_transition,
CPUFREQ_TRANSITION_NOTIFIER);
--
1.7.0.4
^ permalink raw reply related
* [PATCH v3 18/20] video: da8xx-fb: make clock naming consistent
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
To: linux-fbdev
Clean up the code, so that the names of the various clock variables
are consistent to it is clear what variable is associated with what
clock.
Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
drivers/video/da8xx-fb.c | 24 ++++++++++++------------
1 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
index dfe7572..8c0ca11 100644
--- a/drivers/video/da8xx-fb.c
+++ b/drivers/video/da8xx-fb.c
@@ -177,7 +177,7 @@ struct da8xx_fb_par {
#ifdef CONFIG_CPU_FREQ
struct notifier_block freq_transition;
#endif
- unsigned int lcd_fck_rate;
+ unsigned int lcdc_clk_rate;
void (*panel_power_ctrl)(int);
u32 pseudo_palette[16];
struct fb_videomode mode;
@@ -693,7 +693,7 @@ static int da8xx_fb_config_clk_divider(struct da8xx_fb_par *par,
{
int ret;
- if (par->lcd_fck_rate != lcdc_clk_rate) {
+ if (par->lcdc_clk_rate != lcdc_clk_rate) {
ret = clk_set_rate(par->lcdc_clk, lcdc_clk_rate);
if (IS_ERR_VALUE(ret)) {
dev_err(par->dev,
@@ -701,7 +701,7 @@ static int da8xx_fb_config_clk_divider(struct da8xx_fb_par *par,
lcdc_clk_rate);
return ret;
}
- par->lcd_fck_rate = clk_get_rate(par->lcdc_clk);
+ par->lcdc_clk_rate = clk_get_rate(par->lcdc_clk);
}
/* Configure the LCD clock divisor. */
@@ -723,7 +723,7 @@ static unsigned int da8xx_fb_calc_clk_divider(struct da8xx_fb_par *par,
pixclock = PICOS2KHZ(pixclock) * 1000;
- *lcdc_clk_rate = par->lcd_fck_rate;
+ *lcdc_clk_rate = par->lcdc_clk_rate;
if (pixclock < (*lcdc_clk_rate / CLK_MAX_DIV)) {
*lcdc_clk_rate = clk_round_rate(par->lcdc_clk,
@@ -1031,8 +1031,8 @@ static int lcd_da8xx_cpufreq_transition(struct notifier_block *nb,
par = container_of(nb, struct da8xx_fb_par, freq_transition);
if (val = CPUFREQ_POSTCHANGE) {
- if (par->lcd_fck_rate != clk_get_rate(par->lcdc_clk)) {
- par->lcd_fck_rate = clk_get_rate(par->lcdc_clk);
+ if (par->lcdc_clk_rate != clk_get_rate(par->lcdc_clk)) {
+ par->lcdc_clk_rate = clk_get_rate(par->lcdc_clk);
lcd_disable_raster(DA8XX_FRAME_WAIT);
da8xx_fb_calc_config_clk_divider(par, &par->mode);
if (par->blank = FB_BLANK_UNBLANK)
@@ -1327,8 +1327,8 @@ static int fb_probe(struct platform_device *device)
struct lcd_ctrl_config *lcd_cfg;
struct fb_videomode *lcdc_info;
struct fb_info *da8xx_fb_info;
- struct clk *fb_clk = NULL;
struct da8xx_fb_par *par;
+ struct clk *tmp_lcdc_clk;
int ret;
unsigned long ulcm;
@@ -1346,10 +1346,10 @@ static int fb_probe(struct platform_device *device)
if (IS_ERR(da8xx_fb_reg_base))
return PTR_ERR(da8xx_fb_reg_base);
- fb_clk = devm_clk_get(&device->dev, "fck");
- if (IS_ERR(fb_clk)) {
+ tmp_lcdc_clk = devm_clk_get(&device->dev, "fck");
+ if (IS_ERR(tmp_lcdc_clk)) {
dev_err(&device->dev, "Can not get device clock\n");
- return PTR_ERR(fb_clk);
+ return PTR_ERR(tmp_lcdc_clk);
}
pm_runtime_enable(&device->dev);
@@ -1389,8 +1389,8 @@ static int fb_probe(struct platform_device *device)
par = da8xx_fb_info->par;
par->dev = &device->dev;
- par->lcdc_clk = fb_clk;
- par->lcd_fck_rate = clk_get_rate(fb_clk);
+ par->lcdc_clk = tmp_lcdc_clk;
+ par->lcdc_clk_rate = clk_get_rate(par->lcdc_clk);
if (fb_pdata->panel_power_ctrl) {
par->panel_power_ctrl = fb_pdata->panel_power_ctrl;
par->panel_power_ctrl(1);
--
1.7.0.4
^ permalink raw reply related
* [PATCH v3 17/20] video: da8xx-fb: set upstream clock rate (if reqd)
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
To: linux-fbdev
Based on original patch by: Afzal Mohammed <afzal@ti.com>
LCDC IP has a clock divider to adjust pixel clock, this limits pixel
clock range to fck/255 - fck/2(fck - rate of input clock to LCDC IP).
In the case of AM335x, where this IP is present, default fck is not
sufficient to provide normal pixel clock rates, hence rendering this
driver unusable on AM335x.
If input clock too is configurable, allowable range of pixel clock
would increase. Here initially it is checked whether with present fck,
divider in IP could be configured to obtain required rate, if not,
fck is adjusted. This makes it usable on AM335x.
Note:
Another solution would be to model an inherited basic clock divider of
CCF, an advantage would be a better possible resolution for pixel clk.
And trying to instantiate a CCF clock would mean that to be consistent,
3 bits being turned on to enable clocks of LCDC IP would have to be
modeled as gate clocks. Now that would bring in a total of 4 clocks,
including necessity to create a new inherited divider clock, and that
mean a branch of clock tree would be present in LCDC driver. This
would add complexity to LCDC driver bringing in considerable amount
of clock handling code, and this would not bring in much advantage
for existing use cases other than providing a higher resolution of
pixel clock. And existing use cases work without relying on clock
modeling. Another fact is that out of the two platform's using this
driver DaVinci is not yet converted to CCF. In future if higher
resolution of pixel clock is required, and probably after DaVinci is
CCF'ed, modeling clock nodes inside driver may be considered.
v2: purely cosmetic changes to try and clarify what variables are
being used for in the clock rate / divider calculations
Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
drivers/video/da8xx-fb.c | 85 +++++++++++++++++++++++++++++++++++-----------
1 files changed, 65 insertions(+), 20 deletions(-)
diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
index d8b295a..dfe7572 100644
--- a/drivers/video/da8xx-fb.c
+++ b/drivers/video/da8xx-fb.c
@@ -132,6 +132,9 @@
#define WSI_TIMEOUT 50
#define PALETTE_SIZE 256
+#define CLK_MIN_DIV 2
+#define CLK_MAX_DIV 255
+
static void __iomem *da8xx_fb_reg_base;
static unsigned int lcd_revision;
static irq_handler_t lcdc_irq_handler;
@@ -684,38 +687,76 @@ static void da8xx_fb_lcd_reset(void)
}
}
-static inline unsigned da8xx_fb_calc_clk_divider(struct da8xx_fb_par *par,
- unsigned pixclock)
-{
- return par->lcd_fck_rate / (PICOS2KHZ(pixclock) * 1000);
-}
-
-static inline unsigned da8xx_fb_round_clk(struct da8xx_fb_par *par,
- unsigned pixclock)
+static int da8xx_fb_config_clk_divider(struct da8xx_fb_par *par,
+ unsigned lcdc_clk_div,
+ unsigned lcdc_clk_rate)
{
- unsigned div;
+ int ret;
- div = da8xx_fb_calc_clk_divider(par, pixclock);
- return KHZ2PICOS(par->lcd_fck_rate / (1000 * div));
-}
+ if (par->lcd_fck_rate != lcdc_clk_rate) {
+ ret = clk_set_rate(par->lcdc_clk, lcdc_clk_rate);
+ if (IS_ERR_VALUE(ret)) {
+ dev_err(par->dev,
+ "unable to set clock rate at %u\n",
+ lcdc_clk_rate);
+ return ret;
+ }
+ par->lcd_fck_rate = clk_get_rate(par->lcdc_clk);
+ }
-static inline void da8xx_fb_config_clk_divider(unsigned div)
-{
/* Configure the LCD clock divisor. */
- lcdc_write(LCD_CLK_DIVISOR(div) |
+ lcdc_write(LCD_CLK_DIVISOR(lcdc_clk_div) |
(LCD_RASTER_MODE & 0x1), LCD_CTRL_REG);
if (lcd_revision = LCD_VERSION_2)
lcdc_write(LCD_V2_DMA_CLK_EN | LCD_V2_LIDD_CLK_EN |
LCD_V2_CORE_CLK_EN, LCD_CLK_ENABLE_REG);
+
+ return 0;
+}
+
+static unsigned int da8xx_fb_calc_clk_divider(struct da8xx_fb_par *par,
+ unsigned pixclock,
+ unsigned *lcdc_clk_rate)
+{
+ unsigned lcdc_clk_div;
+
+ pixclock = PICOS2KHZ(pixclock) * 1000;
+
+ *lcdc_clk_rate = par->lcd_fck_rate;
+
+ if (pixclock < (*lcdc_clk_rate / CLK_MAX_DIV)) {
+ *lcdc_clk_rate = clk_round_rate(par->lcdc_clk,
+ pixclock * CLK_MAX_DIV);
+ lcdc_clk_div = CLK_MAX_DIV;
+ } else if (pixclock > (*lcdc_clk_rate / CLK_MIN_DIV)) {
+ *lcdc_clk_rate = clk_round_rate(par->lcdc_clk,
+ pixclock * CLK_MIN_DIV);
+ lcdc_clk_div = CLK_MIN_DIV;
+ } else {
+ lcdc_clk_div = *lcdc_clk_rate / pixclock;
+ }
+
+ return lcdc_clk_div;
}
-static inline void da8xx_fb_calc_config_clk_divider(struct da8xx_fb_par *par,
- struct fb_videomode *mode)
+static int da8xx_fb_calc_config_clk_divider(struct da8xx_fb_par *par,
+ struct fb_videomode *mode)
{
- unsigned div = da8xx_fb_calc_clk_divider(par, mode->pixclock);
+ unsigned lcdc_clk_rate;
+ unsigned lcdc_clk_div = da8xx_fb_calc_clk_divider(par, mode->pixclock,
+ &lcdc_clk_rate);
- da8xx_fb_config_clk_divider(div);
+ return da8xx_fb_config_clk_divider(par, lcdc_clk_div, lcdc_clk_rate);
+}
+
+static inline unsigned da8xx_fb_round_clk(struct da8xx_fb_par *par,
+ unsigned pixclock)
+{
+ unsigned lcdc_clk_div, lcdc_clk_rate;
+
+ lcdc_clk_div = da8xx_fb_calc_clk_divider(par, pixclock, &lcdc_clk_rate);
+ return KHZ2PICOS(lcdc_clk_rate / (1000 * lcdc_clk_div));
}
static int lcd_init(struct da8xx_fb_par *par, const struct lcd_ctrl_config *cfg,
@@ -724,7 +765,11 @@ static int lcd_init(struct da8xx_fb_par *par, const struct lcd_ctrl_config *cfg,
u32 bpp;
int ret = 0;
- da8xx_fb_calc_config_clk_divider(par, panel);
+ ret = da8xx_fb_calc_config_clk_divider(par, panel);
+ if (IS_ERR_VALUE(ret)) {
+ dev_err(par->dev, "unable to configure clock\n");
+ return ret;
+ }
if (panel->sync & FB_SYNC_CLK_INVERT)
lcdc_write((lcdc_read(LCD_RASTER_TIMING_2_REG) |
--
1.7.0.4
^ permalink raw reply related
* [PATCH v3 16/20] video: da8xx-fb: reorganize panel detection
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
To: linux-fbdev
From: Afzal Mohammed <afzal@ti.com>
Move panel detection to a separate function, this helps in readability
as well as makes DT support cleaner.
Signed-off-by: Afzal Mohammed <afzal@ti.com>
Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
drivers/video/da8xx-fb.c | 42 ++++++++++++++++++++++++++----------------
1 files changed, 26 insertions(+), 16 deletions(-)
diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
index dec2777..d8b295a 100644
--- a/drivers/video/da8xx-fb.c
+++ b/drivers/video/da8xx-fb.c
@@ -1253,6 +1253,27 @@ static struct fb_ops da8xx_fb_ops = {
.fb_blank = cfb_blank,
};
+static struct fb_videomode *da8xx_fb_get_videomode(struct platform_device *dev)
+{
+ struct da8xx_lcdc_platform_data *fb_pdata = dev->dev.platform_data;
+ struct fb_videomode *lcdc_info;
+ int i;
+
+ for (i = 0, lcdc_info = known_lcd_panels;
+ i < ARRAY_SIZE(known_lcd_panels); i++, lcdc_info++) {
+ if (strcmp(fb_pdata->type, lcdc_info->name) = 0)
+ break;
+ }
+
+ if (i = ARRAY_SIZE(known_lcd_panels)) {
+ dev_err(&dev->dev, "no panel found\n");
+ return NULL;
+ }
+ dev_info(&dev->dev, "found %s panel\n", lcdc_info->name);
+
+ return lcdc_info;
+}
+
static int fb_probe(struct platform_device *device)
{
struct da8xx_lcdc_platform_data *fb_pdata @@ -1263,7 +1284,7 @@ static int fb_probe(struct platform_device *device)
struct fb_info *da8xx_fb_info;
struct clk *fb_clk = NULL;
struct da8xx_fb_par *par;
- int ret, i;
+ int ret;
unsigned long ulcm;
if (fb_pdata = NULL) {
@@ -1271,6 +1292,10 @@ static int fb_probe(struct platform_device *device)
return -ENOENT;
}
+ lcdc_info = da8xx_fb_get_videomode(device);
+ if (lcdc_info = NULL)
+ return -ENODEV;
+
lcdc_regs = platform_get_resource(device, IORESOURCE_MEM, 0);
da8xx_fb_reg_base = devm_ioremap_resource(&device->dev, lcdc_regs);
if (IS_ERR(da8xx_fb_reg_base))
@@ -1302,21 +1327,6 @@ static int fb_probe(struct platform_device *device)
break;
}
- for (i = 0, lcdc_info = known_lcd_panels;
- i < ARRAY_SIZE(known_lcd_panels);
- i++, lcdc_info++) {
- if (strcmp(fb_pdata->type, lcdc_info->name) = 0)
- break;
- }
-
- if (i = ARRAY_SIZE(known_lcd_panels)) {
- dev_err(&device->dev, "GLCD: No valid panel found\n");
- ret = -ENODEV;
- goto err_pm_runtime_disable;
- } else
- dev_info(&device->dev, "GLCD: Found %s panel\n",
- fb_pdata->type);
-
lcd_cfg = (struct lcd_ctrl_config *)fb_pdata->controller_data;
if (!lcd_cfg) {
--
1.7.0.4
^ permalink raw reply related
* [PATCH v3 15/20] video: da8xx-fb: ensure non-null cfg in pdata
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
To: linux-fbdev
From: Afzal Mohammed <afzal@ti.com>
Ensure that platform data contains pointer for lcd_ctrl_config.
Signed-off-by: Afzal Mohammed <afzal@ti.com>
Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
drivers/video/da8xx-fb.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
index c620a32..dec2777 100644
--- a/drivers/video/da8xx-fb.c
+++ b/drivers/video/da8xx-fb.c
@@ -1319,6 +1319,11 @@ static int fb_probe(struct platform_device *device)
lcd_cfg = (struct lcd_ctrl_config *)fb_pdata->controller_data;
+ if (!lcd_cfg) {
+ ret = -EINVAL;
+ goto err_pm_runtime_disable;
+ }
+
da8xx_fb_info = framebuffer_alloc(sizeof(struct da8xx_fb_par),
&device->dev);
if (!da8xx_fb_info) {
--
1.7.0.4
^ permalink raw reply related
* [PATCH v3 14/20] video: da8xx-fb: use devres
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
To: linux-fbdev
Replace existing resource handling in the driver with managed device
resource.
v2: implement some changes as recommended by
Prabhakar Lad <prabhakar.csengg@gmail.com>
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
drivers/video/da8xx-fb.c | 41 ++++++++---------------------------------
1 files changed, 8 insertions(+), 33 deletions(-)
diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
index 8384455..c620a32 100644
--- a/drivers/video/da8xx-fb.c
+++ b/drivers/video/da8xx-fb.c
@@ -133,7 +133,6 @@
#define PALETTE_SIZE 256
static void __iomem *da8xx_fb_reg_base;
-static struct resource *lcdc_regs;
static unsigned int lcd_revision;
static irq_handler_t lcdc_irq_handler;
static wait_queue_head_t frame_done_wq;
@@ -1039,12 +1038,9 @@ static int fb_remove(struct platform_device *dev)
par->p_palette_base);
dma_free_coherent(NULL, par->vram_size, par->vram_virt,
par->vram_phys);
- free_irq(par->irq, par);
pm_runtime_put_sync(&dev->dev);
pm_runtime_disable(&dev->dev);
framebuffer_release(info);
- iounmap(da8xx_fb_reg_base);
- release_mem_region(lcdc_regs->start, resource_size(lcdc_regs));
}
return 0;
@@ -1261,12 +1257,12 @@ static int fb_probe(struct platform_device *device)
{
struct da8xx_lcdc_platform_data *fb_pdata device->dev.platform_data;
+ static struct resource *lcdc_regs;
struct lcd_ctrl_config *lcd_cfg;
struct fb_videomode *lcdc_info;
struct fb_info *da8xx_fb_info;
struct clk *fb_clk = NULL;
struct da8xx_fb_par *par;
- resource_size_t len;
int ret, i;
unsigned long ulcm;
@@ -1276,29 +1272,14 @@ static int fb_probe(struct platform_device *device)
}
lcdc_regs = platform_get_resource(device, IORESOURCE_MEM, 0);
- if (!lcdc_regs) {
- dev_err(&device->dev,
- "Can not get memory resource for LCD controller\n");
- return -ENOENT;
- }
-
- len = resource_size(lcdc_regs);
-
- lcdc_regs = request_mem_region(lcdc_regs->start, len, lcdc_regs->name);
- if (!lcdc_regs)
- return -EBUSY;
+ da8xx_fb_reg_base = devm_ioremap_resource(&device->dev, lcdc_regs);
+ if (IS_ERR(da8xx_fb_reg_base))
+ return PTR_ERR(da8xx_fb_reg_base);
- da8xx_fb_reg_base = ioremap(lcdc_regs->start, len);
- if (!da8xx_fb_reg_base) {
- ret = -EBUSY;
- goto err_request_mem;
- }
-
- fb_clk = clk_get(&device->dev, "fck");
+ fb_clk = devm_clk_get(&device->dev, "fck");
if (IS_ERR(fb_clk)) {
dev_err(&device->dev, "Can not get device clock\n");
- ret = -ENODEV;
- goto err_ioremap;
+ return PTR_ERR(fb_clk);
}
pm_runtime_enable(&device->dev);
@@ -1459,8 +1440,8 @@ static int fb_probe(struct platform_device *device)
lcdc_irq_handler = lcdc_irq_handler_rev02;
}
- ret = request_irq(par->irq, lcdc_irq_handler, 0,
- DRIVER_NAME, par);
+ ret = devm_request_irq(&device->dev, par->irq, lcdc_irq_handler, 0,
+ DRIVER_NAME, par);
if (ret)
goto irq_freq;
return 0;
@@ -1489,12 +1470,6 @@ err_pm_runtime_disable:
pm_runtime_put_sync(&device->dev);
pm_runtime_disable(&device->dev);
-err_ioremap:
- iounmap(da8xx_fb_reg_base);
-
-err_request_mem:
- release_mem_region(lcdc_regs->start, len);
-
return ret;
}
--
1.7.0.4
^ permalink raw reply related
* [PATCH v3 13/20] video: da8xx-fb: enable sync lost intr for v2 ip
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
To: linux-fbdev
From: Afzal Mohammed <afzal@ti.com>
The interrupt handler explicitly has code that handles the sync lost
interrupt. However the sync lost interrupt is never actually being
enabled in the LCD controller, therefore this interrupt code path is not
being exercised. This fix simply enables the generation of the sync
lost interrupt by the LCD controller so it can be dealt with
appropriately by the interrupt handler.
Signed-off-by: Afzal Mohammed <afzal@ti.com>
Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
drivers/video/da8xx-fb.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
index 893aefe..8384455 100644
--- a/drivers/video/da8xx-fb.c
+++ b/drivers/video/da8xx-fb.c
@@ -320,7 +320,7 @@ static void lcd_blit(int load_mode, struct da8xx_fb_par *par)
reg_int = lcdc_read(LCD_INT_ENABLE_SET_REG) |
LCD_V2_END_OF_FRAME0_INT_ENA |
LCD_V2_END_OF_FRAME1_INT_ENA |
- LCD_FRAME_DONE;
+ LCD_FRAME_DONE | LCD_SYNC_LOST;
lcdc_write(reg_int, LCD_INT_ENABLE_SET_REG);
}
reg_dma |= LCD_DUAL_FRAME_BUFFER_ENABLE;
--
1.7.0.4
^ permalink raw reply related
* [PATCH v3 12/20] video: da8xx-fb: fix 24bpp raster configuration
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
To: linux-fbdev
Based on original patch by: Manjunathappa, Prakash <prakash.pm@ti.com>
and Afzal Mohammed <afzal@ti.com>
Set only LCD_V2_TFT_24BPP_MODE bit for 24bpp and LCD_V2_TFT_24BPP_UNPACK
bit along with LCD_V2_TFT_24BPP_MODE for 32bpp configuration.
Patch is tested on am335x-evm for 24bpp and da850-evm for 16bpp
configurations.
v2: removes confusing fall through in case statement for pixel
depth configuration.
Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
drivers/video/da8xx-fb.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
index 9375382..893aefe 100644
--- a/drivers/video/da8xx-fb.c
+++ b/drivers/video/da8xx-fb.c
@@ -554,10 +554,11 @@ static int lcd_cfg_frame_buffer(struct da8xx_fb_par *par, u32 width, u32 height,
break;
case 24:
reg |= LCD_V2_TFT_24BPP_MODE;
+ break;
case 32:
+ reg |= LCD_V2_TFT_24BPP_MODE;
reg |= LCD_V2_TFT_24BPP_UNPACK;
break;
-
case 8:
par->palette_sz = 256 * 2;
break;
--
1.7.0.4
^ permalink raw reply related
* [PATCH v3 11/20] video: da8xx-fb: improve readability of code
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
To: linux-fbdev
Change the lcd_disable_raster funtion from using a bool to an enum
as the function is very confusing with the current api. This helps
make it clearer what the parameter is really doing.
Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
drivers/video/da8xx-fb.c | 24 +++++++++++++-----------
include/video/da8xx-fb.h | 5 +++++
2 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
index 6beb88d..9375382 100644
--- a/drivers/video/da8xx-fb.c
+++ b/drivers/video/da8xx-fb.c
@@ -271,7 +271,8 @@ static inline void lcd_enable_raster(void)
}
/* Disable the Raster Engine of the LCD Controller */
-static inline void lcd_disable_raster(bool wait_for_frame_done)
+static inline void lcd_disable_raster(enum da8xx_frame_complete
+ wait_for_frame_done)
{
u32 reg;
int ret;
@@ -283,7 +284,8 @@ static inline void lcd_disable_raster(bool wait_for_frame_done)
/* return if already disabled */
return;
- if ((wait_for_frame_done = true) && (lcd_revision = LCD_VERSION_2)) {
+ if ((wait_for_frame_done = DA8XX_FRAME_WAIT) &&
+ (lcd_revision = LCD_VERSION_2)) {
frame_done_flag = 0;
ret = wait_event_interruptible_timeout(frame_done_wq,
frame_done_flag != 0,
@@ -771,7 +773,7 @@ static irqreturn_t lcdc_irq_handler_rev02(int irq, void *arg)
u32 stat = lcdc_read(LCD_MASKED_STAT_REG);
if ((stat & LCD_SYNC_LOST) && (stat & LCD_FIFO_UNDERFLOW)) {
- lcd_disable_raster(false);
+ lcd_disable_raster(DA8XX_FRAME_NOWAIT);
lcdc_write(stat, LCD_MASKED_STAT_REG);
lcd_enable_raster();
} else if (stat & LCD_PL_LOAD_DONE) {
@@ -781,7 +783,7 @@ static irqreturn_t lcdc_irq_handler_rev02(int irq, void *arg)
* interrupt via the following write to the status register. If
* this is done after then one gets multiple PL done interrupts.
*/
- lcd_disable_raster(false);
+ lcd_disable_raster(DA8XX_FRAME_NOWAIT);
lcdc_write(stat, LCD_MASKED_STAT_REG);
@@ -834,7 +836,7 @@ static irqreturn_t lcdc_irq_handler_rev01(int irq, void *arg)
u32 reg_ras;
if ((stat & LCD_SYNC_LOST) && (stat & LCD_FIFO_UNDERFLOW)) {
- lcd_disable_raster(false);
+ lcd_disable_raster(DA8XX_FRAME_NOWAIT);
lcdc_write(stat, LCD_STAT_REG);
lcd_enable_raster();
} else if (stat & LCD_PL_LOAD_DONE) {
@@ -844,7 +846,7 @@ static irqreturn_t lcdc_irq_handler_rev01(int irq, void *arg)
* interrupt via the following write to the status register. If
* this is done after then one gets multiple PL done interrupts.
*/
- lcd_disable_raster(false);
+ lcd_disable_raster(DA8XX_FRAME_NOWAIT);
lcdc_write(stat, LCD_STAT_REG);
@@ -986,7 +988,7 @@ static int lcd_da8xx_cpufreq_transition(struct notifier_block *nb,
if (val = CPUFREQ_POSTCHANGE) {
if (par->lcd_fck_rate != clk_get_rate(par->lcdc_clk)) {
par->lcd_fck_rate = clk_get_rate(par->lcdc_clk);
- lcd_disable_raster(true);
+ lcd_disable_raster(DA8XX_FRAME_WAIT);
da8xx_fb_calc_config_clk_divider(par, &par->mode);
if (par->blank = FB_BLANK_UNBLANK)
lcd_enable_raster();
@@ -1024,7 +1026,7 @@ static int fb_remove(struct platform_device *dev)
if (par->panel_power_ctrl)
par->panel_power_ctrl(0);
- lcd_disable_raster(true);
+ lcd_disable_raster(DA8XX_FRAME_WAIT);
lcdc_write(0, LCD_RASTER_CTRL_REG);
/* disable DMA */
@@ -1140,7 +1142,7 @@ static int cfb_blank(int blank, struct fb_info *info)
if (par->panel_power_ctrl)
par->panel_power_ctrl(0);
- lcd_disable_raster(true);
+ lcd_disable_raster(DA8XX_FRAME_WAIT);
break;
default:
ret = -EINVAL;
@@ -1208,7 +1210,7 @@ static int da8xxfb_set_par(struct fb_info *info)
bool raster = da8xx_fb_is_raster_enabled();
if (raster)
- lcd_disable_raster(true);
+ lcd_disable_raster(DA8XX_FRAME_WAIT);
fb_var_to_videomode(&par->mode, &info->var);
@@ -1569,7 +1571,7 @@ static int fb_suspend(struct platform_device *dev, pm_message_t state)
par->panel_power_ctrl(0);
fb_set_suspend(info, 1);
- lcd_disable_raster(true);
+ lcd_disable_raster(DA8XX_FRAME_WAIT);
lcd_context_save();
pm_runtime_put_sync(&dev->dev);
console_unlock();
diff --git a/include/video/da8xx-fb.h b/include/video/da8xx-fb.h
index f888259..efed3c3 100644
--- a/include/video/da8xx-fb.h
+++ b/include/video/da8xx-fb.h
@@ -23,6 +23,11 @@ enum raster_load_mode {
LOAD_PALETTE,
};
+enum da8xx_frame_complete {
+ DA8XX_FRAME_WAIT,
+ DA8XX_FRAME_NOWAIT,
+};
+
struct da8xx_lcdc_platform_data {
const char manu_name[10];
void *controller_data;
--
1.7.0.4
^ permalink raw reply related
* [PATCH v3 10/20] video: da8xx-fb: fb_set_par support
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
To: linux-fbdev
v1: original from Afzal Mohammed <afzal@ti.com>
fb_set_par helps in runtime configuration of lcd controller like
changing resolution, pixel clock etc. (eg. using fbset utility)
Reconfigure lcd controller based on information passed by framework.
Enable raster back if it was already enabled.
As fb_set_par would get invoked indirectly from probe via fb_set_var,
remove existing lcdc initialization in probe and do lcdc reset in
probe so that reset happens only at the begining.
v2: changes from Darren Etheridge <detheridge@ti.com>
remove unnecessary conditional branch where we attempt to disable
something that we already checked to see if it was disabled.
Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
drivers/video/da8xx-fb.c | 58 +++++++++++++++++++++++++++++++++++++--------
1 files changed, 47 insertions(+), 11 deletions(-)
diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
index 8d73730..6beb88d 100644
--- a/drivers/video/da8xx-fb.c
+++ b/drivers/video/da8xx-fb.c
@@ -243,6 +243,11 @@ static struct fb_videomode known_lcd_panels[] = {
},
};
+static inline bool da8xx_fb_is_raster_enabled(void)
+{
+ return !!(lcdc_read(LCD_RASTER_CTRL_REG) & LCD_RASTER_ENABLE);
+}
+
/* Enable the Raster Engine of the LCD Controller */
static inline void lcd_enable_raster(void)
{
@@ -665,9 +670,6 @@ static int fb_setcolreg(unsigned regno, unsigned red, unsigned green,
static void da8xx_fb_lcd_reset(void)
{
- /* Disable the Raster if previously Enabled */
- lcd_disable_raster(false);
-
/* DMA has to be disabled */
lcdc_write(0, LCD_DMA_CTRL_REG);
lcdc_write(0, LCD_RASTER_CTRL_REG);
@@ -720,8 +722,6 @@ static int lcd_init(struct da8xx_fb_par *par, const struct lcd_ctrl_config *cfg,
u32 bpp;
int ret = 0;
- da8xx_fb_lcd_reset();
-
da8xx_fb_calc_config_clk_divider(par, panel);
if (panel->sync & FB_SYNC_CLK_INVERT)
@@ -1201,9 +1201,50 @@ static int da8xx_pan_display(struct fb_var_screeninfo *var,
return ret;
}
+static int da8xxfb_set_par(struct fb_info *info)
+{
+ struct da8xx_fb_par *par = info->par;
+ int ret;
+ bool raster = da8xx_fb_is_raster_enabled();
+
+ if (raster)
+ lcd_disable_raster(true);
+
+ fb_var_to_videomode(&par->mode, &info->var);
+
+ par->cfg.bpp = info->var.bits_per_pixel;
+
+ info->fix.visual = (par->cfg.bpp <= 8) ?
+ FB_VISUAL_PSEUDOCOLOR : FB_VISUAL_TRUECOLOR;
+ info->fix.line_length = (par->mode.xres * par->cfg.bpp) / 8;
+
+ ret = lcd_init(par, &par->cfg, &par->mode);
+ if (ret < 0) {
+ dev_err(par->dev, "lcd init failed\n");
+ return ret;
+ }
+
+ par->dma_start = info->fix.smem_start +
+ info->var.yoffset * info->fix.line_length +
+ info->var.xoffset * info->var.bits_per_pixel / 8;
+ par->dma_end = par->dma_start +
+ info->var.yres * info->fix.line_length - 1;
+
+ lcdc_write(par->dma_start, LCD_DMA_FRM_BUF_BASE_ADDR_0_REG);
+ lcdc_write(par->dma_end, LCD_DMA_FRM_BUF_CEILING_ADDR_0_REG);
+ lcdc_write(par->dma_start, LCD_DMA_FRM_BUF_BASE_ADDR_1_REG);
+ lcdc_write(par->dma_end, LCD_DMA_FRM_BUF_CEILING_ADDR_1_REG);
+
+ if (raster)
+ lcd_enable_raster();
+
+ return 0;
+}
+
static struct fb_ops da8xx_fb_ops = {
.owner = THIS_MODULE,
.fb_check_var = fb_check_var,
+ .fb_set_par = da8xxfb_set_par,
.fb_setcolreg = fb_setcolreg,
.fb_pan_display = da8xx_pan_display,
.fb_ioctl = fb_ioctl,
@@ -1312,14 +1353,9 @@ static int fb_probe(struct platform_device *device)
}
fb_videomode_to_var(&da8xx_fb_var, lcdc_info);
- fb_var_to_videomode(&par->mode, &da8xx_fb_var);
par->cfg = *lcd_cfg;
- if (lcd_init(par, lcd_cfg, lcdc_info) < 0) {
- dev_err(&device->dev, "lcd_init failed\n");
- ret = -EFAULT;
- goto err_release_fb;
- }
+ da8xx_fb_lcd_reset();
/* allocate frame buffer */
par->vram_size = lcdc_info->xres * lcdc_info->yres * lcd_cfg->bpp;
--
1.7.0.4
^ permalink raw reply related
* [PATCH v3 09/20] video: da8xx-fb: report correct pixclock
From: Darren Etheridge @ 2013-08-05 22:02 UTC (permalink / raw)
To: linux-fbdev
From: Afzal Mohammed <afzal@ti.com>
Update "var" pixclock with the value that is configurable in hardware.
This lets user know the actual pixclock.
Signed-off-by: Afzal Mohammed <afzal@ti.com>
Signed-off-by: Darren Etheridge <detheridge@ti.com>
---
drivers/video/da8xx-fb.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)
diff --git a/drivers/video/da8xx-fb.c b/drivers/video/da8xx-fb.c
index 7db1097..8d73730 100644
--- a/drivers/video/da8xx-fb.c
+++ b/drivers/video/da8xx-fb.c
@@ -686,6 +686,15 @@ static inline unsigned da8xx_fb_calc_clk_divider(struct da8xx_fb_par *par,
return par->lcd_fck_rate / (PICOS2KHZ(pixclock) * 1000);
}
+static inline unsigned da8xx_fb_round_clk(struct da8xx_fb_par *par,
+ unsigned pixclock)
+{
+ unsigned div;
+
+ div = da8xx_fb_calc_clk_divider(par, pixclock);
+ return KHZ2PICOS(par->lcd_fck_rate / (1000 * div));
+}
+
static inline void da8xx_fb_config_clk_divider(unsigned div)
{
/* Configure the LCD clock divisor. */
@@ -962,6 +971,8 @@ static int fb_check_var(struct fb_var_screeninfo *var,
if (var->yres + var->yoffset > var->yres_virtual)
var->yoffset = var->yres_virtual - var->yres;
+ var->pixclock = da8xx_fb_round_clk(par, var->pixclock);
+
return err;
}
--
1.7.0.4
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox