Linux Framebuffer Layer development
 help / color / mirror / Atom feed
* Re: Introduce a new helper framework for buffer synchronization
From: Daniel Vetter @ 2013-05-20 21:30 UTC (permalink / raw)
  To: Inki Dae
  Cc: Rob Clark, linux-fbdev, DRI mailing list, Kyungmin Park,
	myungjoo.ham, YoungJun Cho, linux-arm-kernel@lists.infradead.org,
	linux-media@vger.kernel.org
In-Reply-To: <20130520211304.GV12292@phenom.ffwll.local>

On Mon, May 20, 2013 at 11:13:04PM +0200, Daniel Vetter wrote:
> On Sat, May 18, 2013 at 03:47:43PM +0900, Inki Dae wrote:
> > 2013/5/15 Rob Clark <robdclark@gmail.com>
> > 
> > > On Wed, May 15, 2013 at 1:19 AM, Inki Dae <inki.dae@samsung.com> wrote:
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Rob Clark [mailto:robdclark@gmail.com]
> > > >> Sent: Tuesday, May 14, 2013 10:39 PM
> > > >> To: Inki Dae
> > > >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun
> > > >> Cho; linux-arm-kernel@lists.infradead.org; linux-media@vger.kernel.org
> > > >> Subject: Re: Introduce a new helper framework for buffer synchronization
> > > >>
> > > >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae <inki.dae@samsung.com>
> > > wrote:
> > > >> >> well, for cache management, I think it is a better idea.. I didn't
> > > >> >> really catch that this was the motivation from the initial patch, but
> > > >> >> maybe I read it too quickly.  But cache can be decoupled from
> > > >> >> synchronization, because CPU access is not asynchronous.  For
> > > >> >> userspace/CPU access to buffer, you should:
> > > >> >>
> > > >> >>   1) wait for buffer
> > > >> >>   2) prepare-access
> > > >> >>   3)  ... do whatever cpu access to buffer ...
> > > >> >>   4) finish-access
> > > >> >>   5) submit buffer for new dma-operation
> > > >> >>
> > > >> >
> > > >> >
> > > >> > For data flow from CPU to DMA device,
> > > >> > 1) wait for buffer
> > > >> > 2) prepare-access (dma_buf_begin_cpu_access)
> > > >> > 3) cpu access to buffer
> > > >> >
> > > >> >
> > > >> > For data flow from DMA device to CPU
> > > >> > 1) wait for buffer
> > > >>
> > > >> Right, but CPU access isn't asynchronous (from the point of view of
> > > >> the CPU), so there isn't really any wait step at this point.  And if
> > > >> you do want the CPU to be able to signal a fence from userspace for
> > > >> some reason, you probably what something file/fd based so the
> > > >> refcnting/cleanup when process dies doesn't leave some pending DMA
> > > >> action wedged.  But I don't really see the point of that complexity
> > > >> when the CPU access isn't asynchronous in the first place.
> > > >>
> > > >
> > > > There was my missing comments, please see the below sequence.
> > > >
> > > > For data flow from CPU to DMA device and then from DMA device to CPU,
> > > > 1) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...)
> > > >         - including prepare-access (dma_buf_begin_cpu_access)
> > > > 2) cpu access to buffer
> > > > 3) wait for buffer <- at device driver
> > > >         - but CPU is already accessing the buffer so blocked.
> > > > 4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...)
> > > > 5) the thread, blocked at 3), is waked up by 4).
> > > >         - and then finish-access (dma_buf_end_cpu_access)
> > >
> > > right, I understand you can have background threads, etc, in
> > > userspace.  But there are already plenty of synchronization primitives
> > > that can be used for cpu->cpu synchronization, either within the same
> > > process or between multiple processes.  For cpu access, even if it is
> > > handled by background threads/processes, I think it is better to use
> > > the traditional pthreads or unix synchronization primitives.  They
> > > have existed forever, they are well tested, and they work.
> > >
> > > So while it seems nice and orthogonal/clean to couple cache and
> > > synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the
> > > same generic way, but I think in practice we have to make things more
> > > complex than they otherwise need to be to do this.  Otherwise I think
> > > we'll be having problems with badly behaved or crashing userspace.
> > >
> > >
> > Right, this is very important. I think it's not esay to solve this
> > issue. Aand at least for ARM based embedded system, such feature is useful
> > to us; coupling cache operation and buffer synchronization. I'd like to
> > collect other opinions and advices to solve this issue.
> 
> Maybe we have a bit a misunderstanding here. The kernel really should take
> care of sync and cache coherency, and I agree that with the current
> dma_buf code (and the proposed fences) that part is still a bit hazy. But
> the kernel should not allow userspace to block access to a buffer until
> userspace is done. It should only sync with any oustanding fences and
> flush buffers before that userspace access happens.
> 
> Then the kernel would again flush caches on the next dma access (which
> hopefully is only started once userspace completed access). Atm this isn't
> possible in an efficient way since the dma_buf api only exposes map/unmap
> attachment and not a function to just sync an existing mapping like
> dma_sync_single_for_device. I guess we should add a
> dma_buf_sync_attachment interface for that - we already have range-based
> cpu sync support with the begin/end_cpu_access interfaces.

I think my mail here was a bit unclear, so let me try to rephrase.
Re-reading through part of this discussion I think you raise some good
shortcomings of the current dma_buf interfaces and the proposed fence
patches. But I think we should address the individual pieces bit-by-bit.
On a quick survey I see a few parts to what you're trying to solve:

- More efficient cache coherency management. I think we already have all
  required interfaces for efficient cpu-side access with
  begin/end_cpu_access. So I think we only need a device-side dma sync
  mechanism to be able to flush cpu caches after userspace/cpu access has
  completed (before the next dma op).

- More efficient mmap handling. The current dma_buf mmap support is
  explicitly designed as something simply, but probably dead-slow for
  last-resort fallback ops. I'm not sure whether we should fix this up and
  extend with special ioctls. But the current coherency interface should
  be good enough for you to write an efficient private mmap implementation
  for exynos drm.

- Integration of fence syncing into dma_buf. Imo we should have a
  per-attachment mode which decides whether map/unmap (and the new sync)
  should wait for fences or whether the driver takes care of syncing
  through the new fence framework itself (for direct hw sync). Imo cpu
  access should also have such a flag. I guess in both cases we should
  sync by default.

- cpu/cpu sync additions to dma_buf. Like I've said, I'm not sold at all
  on this idea. I think it would be best if we try to fix up all the other
  current shortcomings first though and then take a fresh look at this
  issue here.

Have I missed or misunderstood something?

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply

* RE: Introduce a new helper framework for buffer synchronization
From: Inki Dae @ 2013-05-21  7:03 UTC (permalink / raw)
  To: linux-fbdev
In-Reply-To: <51909DB4.2060208@canonical.com>



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Tuesday, May 21, 2013 6:31 AM
> To: Inki Dae
> Cc: Rob Clark; linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham;
> YoungJun Cho; linux-arm-kernel@lists.infradead.org; linux-
> media@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> On Mon, May 20, 2013 at 11:13:04PM +0200, Daniel Vetter wrote:
> > On Sat, May 18, 2013 at 03:47:43PM +0900, Inki Dae wrote:
> > > 2013/5/15 Rob Clark <robdclark@gmail.com>
> > >
> > > > On Wed, May 15, 2013 at 1:19 AM, Inki Dae <inki.dae@samsung.com>
> wrote:
> > > > >
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: Rob Clark [mailto:robdclark@gmail.com]
> > > > >> Sent: Tuesday, May 14, 2013 10:39 PM
> > > > >> To: Inki Dae
> > > > >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham;
> YoungJun
> > > > >> Cho; linux-arm-kernel@lists.infradead.org; linux-
> media@vger.kernel.org
> > > > >> Subject: Re: Introduce a new helper framework for buffer
> synchronization
> > > > >>
> > > > >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae <inki.dae@samsung.com>
> > > > wrote:
> > > > >> >> well, for cache management, I think it is a better idea.. I
> didn't
> > > > >> >> really catch that this was the motivation from the initial
> patch, but
> > > > >> >> maybe I read it too quickly.  But cache can be decoupled from
> > > > >> >> synchronization, because CPU access is not asynchronous.  For
> > > > >> >> userspace/CPU access to buffer, you should:
> > > > >> >>
> > > > >> >>   1) wait for buffer
> > > > >> >>   2) prepare-access
> > > > >> >>   3)  ... do whatever cpu access to buffer ...
> > > > >> >>   4) finish-access
> > > > >> >>   5) submit buffer for new dma-operation
> > > > >> >>
> > > > >> >
> > > > >> >
> > > > >> > For data flow from CPU to DMA device,
> > > > >> > 1) wait for buffer
> > > > >> > 2) prepare-access (dma_buf_begin_cpu_access)
> > > > >> > 3) cpu access to buffer
> > > > >> >
> > > > >> >
> > > > >> > For data flow from DMA device to CPU
> > > > >> > 1) wait for buffer
> > > > >>
> > > > >> Right, but CPU access isn't asynchronous (from the point of view
> of
> > > > >> the CPU), so there isn't really any wait step at this point.  And
> if
> > > > >> you do want the CPU to be able to signal a fence from userspace
> for
> > > > >> some reason, you probably what something file/fd based so the
> > > > >> refcnting/cleanup when process dies doesn't leave some pending
> DMA
> > > > >> action wedged.  But I don't really see the point of that
> complexity
> > > > >> when the CPU access isn't asynchronous in the first place.
> > > > >>
> > > > >
> > > > > There was my missing comments, please see the below sequence.
> > > > >
> > > > > For data flow from CPU to DMA device and then from DMA device to
> CPU,
> > > > > 1) wait for buffer <- at user side - ioctl(fd,
> DMA_BUF_GET_FENCE, ...)
> > > > >         - including prepare-access (dma_buf_begin_cpu_access)
> > > > > 2) cpu access to buffer
> > > > > 3) wait for buffer <- at device driver
> > > > >         - but CPU is already accessing the buffer so blocked.
> > > > > 4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...)
> > > > > 5) the thread, blocked at 3), is waked up by 4).
> > > > >         - and then finish-access (dma_buf_end_cpu_access)
> > > >
> > > > right, I understand you can have background threads, etc, in
> > > > userspace.  But there are already plenty of synchronization
> primitives
> > > > that can be used for cpu->cpu synchronization, either within the
> same
> > > > process or between multiple processes.  For cpu access, even if it
> is
> > > > handled by background threads/processes, I think it is better to use
> > > > the traditional pthreads or unix synchronization primitives.  They
> > > > have existed forever, they are well tested, and they work.
> > > >
> > > > So while it seems nice and orthogonal/clean to couple cache and
> > > > synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the
> > > > same generic way, but I think in practice we have to make things
> more
> > > > complex than they otherwise need to be to do this.  Otherwise I
> think
> > > > we'll be having problems with badly behaved or crashing userspace.
> > > >
> > > >
> > > Right, this is very important. I think it's not esay to solve this
> > > issue. Aand at least for ARM based embedded system, such feature is
> useful
> > > to us; coupling cache operation and buffer synchronization. I'd like
> to
> > > collect other opinions and advices to solve this issue.
> >
> > Maybe we have a bit a misunderstanding here. The kernel really should
> take
> > care of sync and cache coherency, and I agree that with the current
> > dma_buf code (and the proposed fences) that part is still a bit hazy.
> But
> > the kernel should not allow userspace to block access to a buffer until
> > userspace is done. It should only sync with any oustanding fences and
> > flush buffers before that userspace access happens.
> >
> > Then the kernel would again flush caches on the next dma access (which
> > hopefully is only started once userspace completed access). Atm this
> isn't
> > possible in an efficient way since the dma_buf api only exposes
> map/unmap
> > attachment and not a function to just sync an existing mapping like
> > dma_sync_single_for_device. I guess we should add a
> > dma_buf_sync_attachment interface for that - we already have range-based
> > cpu sync support with the begin/end_cpu_access interfaces.
> 
> I think my mail here was a bit unclear, so let me try to rephrase.
> Re-reading through part of this discussion I think you raise some good
> shortcomings of the current dma_buf interfaces and the proposed fence
> patches. But I think we should address the individual pieces bit-by-bit.
> On a quick survey I see a few parts to what you're trying to solve:
> 
> - More efficient cache coherency management. I think we already have all
>   required interfaces for efficient cpu-side access with
>   begin/end_cpu_access. So I think we only need a device-side dma sync
>   mechanism to be able to flush cpu caches after userspace/cpu access has
>   completed (before the next dma op).
> 
> - More efficient mmap handling. The current dma_buf mmap support is
>   explicitly designed as something simply, but probably dead-slow for
>   last-resort fallback ops. I'm not sure whether we should fix this up and
>   extend with special ioctls. But the current coherency interface should
>   be good enough for you to write an efficient private mmap implementation
>   for exynos drm.

I agree with you.

> 
> - Integration of fence syncing into dma_buf. Imo we should have a
>   per-attachment mode which decides whether map/unmap (and the new sync)
>   should wait for fences or whether the driver takes care of syncing
>   through the new fence framework itself (for direct hw sync).

I think it's a good idea to have per-attachment mode for buffer sync. But
I'd like to say we make sure what is the purpose of map
(dma_buf_map_attachment)first. This interface is used to get a sgt;
containing pages to physical memory region, and map that region with
device's iommu table. The important thing is that this interface is called
just one time when user wants to share an allocated buffer with dma. But cpu
will try to access the buffer every time as long as it wants. Therefore, we
need cache control every time cpu and dma access the shared buffer: cache
clean when cpu goes to dma and cache invalidate when dma goes to cpu. That
is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE, to
dma buf framework. Of course, Those are ugly so we need a better way: I just
wanted to show you that in such way, we can synchronize the shared buffer
between cpu and dma. By any chance, is there any way that kernel can be
aware of when cpu accesses the shared buffer or is there any point I didn't
catch up?

>   Imo cpu access should also have such a flag. I guess in both cases we
should
>   sync by default.
> 
> - cpu/cpu sync additions to dma_buf. Like I've said, I'm not sold at all

I think we should concentrate on cpu and dma sync. 

>   on this idea. I think it would be best if we try to fix up all the other
>   current shortcomings first though and then take a fresh look at this
>   issue here.

Right, agree.

> 
> Have I missed or misunderstood something?
> 

Your comments are very useful to me. Thanks again.

In Linux kernel, especially embedded system, we had tried to implement
generic interfaces for buffer management; how to allocate a buffer and how
to share a buffer. As a result, now we have CMA (Contiguous Memory
Allocator) for buffer allocation, and we have DMA-BUF for buffer sharing
between cpu and dma. And then how to synchronize a buffer between cpu and
dma? I think now it's best time to discuss buffer synchronization mechanism,
and that is very natural.

Please give me more comments and advices if there is my missing or
misunderstanding point.

Thanks,
Inki Dae

> Cheers, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


^ permalink raw reply

* Re: Introduce a new helper framework for buffer synchronization
From: Daniel Vetter @ 2013-05-21  7:44 UTC (permalink / raw)
  To: Inki Dae
  Cc: 'Daniel Vetter', 'Rob Clark',
	'linux-fbdev', 'DRI mailing list',
	'Kyungmin Park', 'myungjoo.ham',
	'YoungJun Cho', linux-arm-kernel, linux-media
In-Reply-To: <032701ce55f1$3e9ba4b0$bbd2ee10$%dae@samsung.com>

On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote:
> > - Integration of fence syncing into dma_buf. Imo we should have a
> >   per-attachment mode which decides whether map/unmap (and the new sync)
> >   should wait for fences or whether the driver takes care of syncing
> >   through the new fence framework itself (for direct hw sync).
> 
> I think it's a good idea to have per-attachment mode for buffer sync. But
> I'd like to say we make sure what is the purpose of map
> (dma_buf_map_attachment)first. This interface is used to get a sgt;
> containing pages to physical memory region, and map that region with
> device's iommu table. The important thing is that this interface is called
> just one time when user wants to share an allocated buffer with dma. But cpu
> will try to access the buffer every time as long as it wants. Therefore, we
> need cache control every time cpu and dma access the shared buffer: cache
> clean when cpu goes to dma and cache invalidate when dma goes to cpu. That
> is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE, to
> dma buf framework. Of course, Those are ugly so we need a better way: I just
> wanted to show you that in such way, we can synchronize the shared buffer
> between cpu and dma. By any chance, is there any way that kernel can be
> aware of when cpu accesses the shared buffer or is there any point I didn't
> catch up?

Well dma_buf_map/unmap_attachment should also flush/invalidate any caches,
and with the current dma_buf spec those two functions are the _only_ means
you have to do so. Which strictly means that if you interleave device dma
and cpu acccess you need to unmap/map every time.

Which is obviously horribly inefficient, but a known gap in the dma_buf
interface. Hence why I proposed to add dma_buf_sync_attachment similar to
dma_sync_single_for_device:

/**
 * dma_buf_sync_sg_attachment - sync caches for dma access
 * @attach: dma-buf attachment to sync
 * @sgt: the sg table to sync (returned by dma_buf_map_attachement)
 * @direction: dma direction to sync for
 *
 * This function synchronizes caches for device dma through the given
 * dma-buf attachment when interleaving dma from different devices and the
 * cpu. Other device dma needs to be synced also by calls to this
 * function (or a pair of dma_buf_map/unmap_attachment calls), cpu access
 * needs to be synced with dma_buf_begin/end_cpu_access.
 */
void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach,
				struct sg_table *sgt,
				enum dma_data_direction direction)

Note that "sync" here only means to synchronize caches, not wait for any
outstanding fences. This is simply to be consistent with the established
lingo of the dma api. How the dma-buf fences fit into this is imo a
different topic, but my idea is that all the cache coherency barriers
(i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and
dma_buf_begin/end_cpu_access) would automatically block for any attached
fences (i.e. block for write fences when doing read-only access, block for
all fences otherwise).

Then we could do a new dma_buf_attach_flags interface for special cases
(might also be useful for other things, similar to the recently added
flags in the dma api for wc/no-kernel-mapping/...). A new flag like
DMA_BUF_ATTACH_NO_AUTOFENCE would then mean that the driver takes care of all
fencing for this attachment and the dma-buf functions should not do the
automagic fence blocking.

> In Linux kernel, especially embedded system, we had tried to implement
> generic interfaces for buffer management; how to allocate a buffer and how
> to share a buffer. As a result, now we have CMA (Contiguous Memory
> Allocator) for buffer allocation, and we have DMA-BUF for buffer sharing
> between cpu and dma. And then how to synchronize a buffer between cpu and
> dma? I think now it's best time to discuss buffer synchronization mechanism,
> and that is very natural.

I think it's important to differentiate between the two meanings of sync:
- synchronize caches (i.e. cpu cache flushing in most cases)
- and synchronize in-flight dma with fences.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply

* RE: Introduce a new helper framework for buffer synchronization
From: Inki Dae @ 2013-05-21  9:22 UTC (permalink / raw)
  To: linux-fbdev
In-Reply-To: <51909DB4.2060208@canonical.com>



> -----Original Message-----
> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> Vetter
> Sent: Tuesday, May 21, 2013 4:45 PM
> To: Inki Dae
> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list';
> 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm-
> kernel@lists.infradead.org; linux-media@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote:
> > > - Integration of fence syncing into dma_buf. Imo we should have a
> > >   per-attachment mode which decides whether map/unmap (and the new
> sync)
> > >   should wait for fences or whether the driver takes care of syncing
> > >   through the new fence framework itself (for direct hw sync).
> >
> > I think it's a good idea to have per-attachment mode for buffer sync.
> But
> > I'd like to say we make sure what is the purpose of map
> > (dma_buf_map_attachment)first. This interface is used to get a sgt;
> > containing pages to physical memory region, and map that region with
> > device's iommu table. The important thing is that this interface is
> called
> > just one time when user wants to share an allocated buffer with dma. But
> cpu
> > will try to access the buffer every time as long as it wants. Therefore,
> we
> > need cache control every time cpu and dma access the shared buffer:
> cache
> > clean when cpu goes to dma and cache invalidate when dma goes to cpu.
> That
> > is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE,
> to
> > dma buf framework. Of course, Those are ugly so we need a better way: I
> just
> > wanted to show you that in such way, we can synchronize the shared
> buffer
> > between cpu and dma. By any chance, is there any way that kernel can be
> > aware of when cpu accesses the shared buffer or is there any point I
> didn't
> > catch up?
> 
> Well dma_buf_map/unmap_attachment should also flush/invalidate any caches,
> and with the current dma_buf spec those two functions are the _only_ means

I know that dma buf exporter should make sure that cache clean/invalidate
are done when dma_buf_map/unmap_attachement is called. For this, already we
do so. However, this function is called when dma buf import is requested by
user to map a dmabuf fd with dma. This means that dma_buf_map_attachement()
is called ONCE when user wants to share the dmabuf fd with dma. Actually, in
case of exynos drm, dma_map_sg_attrs(), performing cache operation
internally, is called when dmabuf import is requested by user.

> you have to do so. Which strictly means that if you interleave device dma
> and cpu acccess you need to unmap/map every time.
> 
> Which is obviously horribly inefficient, but a known gap in the dma_buf

Right, and also this has big overhead.

> interface. Hence why I proposed to add dma_buf_sync_attachment similar to
> dma_sync_single_for_device:
> 
> /**
>  * dma_buf_sync_sg_attachment - sync caches for dma access
>  * @attach: dma-buf attachment to sync
>  * @sgt: the sg table to sync (returned by dma_buf_map_attachement)
>  * @direction: dma direction to sync for
>  *
>  * This function synchronizes caches for device dma through the given
>  * dma-buf attachment when interleaving dma from different devices and the
>  * cpu. Other device dma needs to be synced also by calls to this
>  * function (or a pair of dma_buf_map/unmap_attachment calls), cpu access
>  * needs to be synced with dma_buf_begin/end_cpu_access.
>  */
> void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach,
> 				struct sg_table *sgt,
> 				enum dma_data_direction direction)
> 
> Note that "sync" here only means to synchronize caches, not wait for any
> outstanding fences. This is simply to be consistent with the established
> lingo of the dma api. How the dma-buf fences fit into this is imo a
> different topic, but my idea is that all the cache coherency barriers
> (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and
> dma_buf_begin/end_cpu_access) would automatically block for any attached
> fences (i.e. block for write fences when doing read-only access, block for
> all fences otherwise).

As I mentioned already, kernel can't aware of when cpu accesses a shared
buffer: user can access a shared buffer after mmap anytime and the shared
buffer should be synchronized between cpu and dma. Therefore, the above
cache coherency barriers should be called every time cpu and dma tries to
access a shared buffer, checking before and after of cpu and dma access. And
exactly, the proposed way do such thing. For this, you can refer to the
below link,
	
http://www.mail-archive.com/linux-media@vger.kernel.org/msg62124.html

My point is that how kernel can aware of when those cache coherency barriers
should be called to synchronize caches and buffer access between cpu and
dma.

> 
> Then we could do a new dma_buf_attach_flags interface for special cases
> (might also be useful for other things, similar to the recently added
> flags in the dma api for wc/no-kernel-mapping/...). A new flag like
> DMA_BUF_ATTACH_NO_AUTOFENCE would then mean that the driver takes care of
> all
> fencing for this attachment and the dma-buf functions should not do the
> automagic fence blocking.
> 
> > In Linux kernel, especially embedded system, we had tried to implement
> > generic interfaces for buffer management; how to allocate a buffer and
> how
> > to share a buffer. As a result, now we have CMA (Contiguous Memory
> > Allocator) for buffer allocation, and we have DMA-BUF for buffer sharing
> > between cpu and dma. And then how to synchronize a buffer between cpu
> and
> > dma? I think now it's best time to discuss buffer synchronization
> mechanism,
> > and that is very natural.
> 
> I think it's important to differentiate between the two meanings of sync:
> - synchronize caches (i.e. cpu cache flushing in most cases)
> - and synchronize in-flight dma with fences.
> 

Agree, and I meant that the buffer synchronization mechanism includes the
above two things. And I think the two things should be addressed together.

Thanks,
Inki Dae

> Cheers, Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


^ permalink raw reply

* [PATCH V2] drivers: video: mxsfb: clean use of devm_ioremap_resource()
From: Laurent Navet @ 2013-05-21 12:33 UTC (permalink / raw)
  To: FlorianSchandinat, jg1.han; +Cc: linux-fbdev, linux-kernel, Laurent Navet
In-Reply-To: <19650694.136921368693687337.JavaMail.weblogic@epml12>

Check of 'res' and calls to dev_err are already done in devm_ioremap_resource,
so no need to do them twice.

Signed-off-by: Laurent Navet <laurent.navet@gmail.com>
---
 drivers/video/mxsfb.c |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/video/mxsfb.c b/drivers/video/mxsfb.c
index 21223d4..6a1b338 100644
--- a/drivers/video/mxsfb.c
+++ b/drivers/video/mxsfb.c
@@ -883,12 +883,6 @@ static int mxsfb_probe(struct platform_device *pdev)
 	if (of_id)
 		pdev->id_entry = of_id->data;
 
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	if (!res) {
-		dev_err(&pdev->dev, "Cannot get memory IO resource\n");
-		return -ENODEV;
-	}
-
 	fb_info = framebuffer_alloc(sizeof(struct mxsfb_info), &pdev->dev);
 	if (!fb_info) {
 		dev_err(&pdev->dev, "Failed to allocate fbdev\n");
@@ -897,9 +891,9 @@ static int mxsfb_probe(struct platform_device *pdev)
 
 	host = to_imxfb_host(fb_info);
 
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	host->base = devm_ioremap_resource(&pdev->dev, res);
 	if (IS_ERR(host->base)) {
-		dev_err(&pdev->dev, "ioremap failed\n");
 		ret = PTR_ERR(host->base);
 		goto fb_release;
 	}
-- 
1.7.10.4


^ permalink raw reply related

* Find girls for chat and hookup now.
From: root @ 2013-05-21 13:01 UTC (permalink / raw)
  To: linux-ext4-owner-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-fbdev-u79uwXL29TY76Z2rM5mHXA,
	linux-ha-dev-cunTk1MwBs9DHfOqEeIffh2eb7JE58TQ,
	linux-help-mtg-iCYJNvnsbHs8qNwGfFDJ/Q,
	linux-india-cunTk1MwBs9qd80yoKVOQ4aCUm+kVXi4,
	linux-informix-qTB+/3op4OQ,
	linux-kbuild-owner-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-pFTr4yDReMQpLY+iQtz4oH4Z0sVMXfX4hC4ANOJQIlc,
	linux-leds-u79uwXL29TY76Z2rM5mHXA,
	linux-man-u79uwXL29TY76Z2rM5mHXA,
	linux-mips-VZNHf3L845pBDgjK7y7TUQ

Chat with lonely women in your area and hook up.

http://dir.orangefronthost.info











xrfhlofrtsj


^ permalink raw reply

* [PATCH] video/ps3fb: Fix section mismatch warning
From: Geoff Levand @ 2013-05-21 13:01 UTC (permalink / raw)
  To: linux-fbdev

Remove the __initdata attribute from the ps3fb_fix variable.  This is in
follow up to the removal of the __devinit attribute on the ps3fb_probe()
routine in commit 48c68c4f1b542444f175a9e136febcecf3e704d8 (Drivers:
video: remove __dev* attributes).

Fixes build warnings like these:

  WARNING: vmlinux.o Section mismatch in reference from the function .ps3fb_probe() to the variable .init.data:ps3fb_fix
  The function .ps3fb_probe() references the variable __initdata ps3fb_fix.
  This is often because .ps3fb_probe lacks a __initdata annotation or the annotation of ps3fb_fix is wrong.

Signed-off-by: Geoff Levand <geoff@infradead.org>
---
 drivers/video/ps3fb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/video/ps3fb.c b/drivers/video/ps3fb.c
index a397271d..4819cdf 100644
--- a/drivers/video/ps3fb.c
+++ b/drivers/video/ps3fb.c
@@ -952,7 +952,7 @@ static struct fb_ops ps3fb_ops = {
 	.fb_compat_ioctl = ps3fb_ioctl
 };
 
-static struct fb_fix_screeninfo ps3fb_fix __initdata = {
+static struct fb_fix_screeninfo ps3fb_fix = {
 	.id =		DEVICE_NAME,
 	.type =		FB_TYPE_PACKED_PIXELS,
 	.visual =	FB_VISUAL_TRUECOLOR,
-- 
1.8.1.2




^ permalink raw reply related

* Re: [PATCH] video/ps3fb: Fix section mismatch warning
From: Geert Uytterhoeven @ 2013-05-21 15:11 UTC (permalink / raw)
  To: linux-fbdev
In-Reply-To: <1369141302.3652.14.camel@clam>

On Tue, May 21, 2013 at 3:01 PM, Geoff Levand <geoff@infradead.org> wrote:
> Remove the __initdata attribute from the ps3fb_fix variable.  This is in
> follow up to the removal of the __devinit attribute on the ps3fb_probe()
> routine in commit 48c68c4f1b542444f175a9e136febcecf3e704d8 (Drivers:
> video: remove __dev* attributes).
>
> Fixes build warnings like these:
>
>   WARNING: vmlinux.o Section mismatch in reference from the function .ps3fb_probe() to the variable .init.data:ps3fb_fix
>   The function .ps3fb_probe() references the variable __initdata ps3fb_fix.
>   This is often because .ps3fb_probe lacks a __initdata annotation or the annotation of ps3fb_fix is wrong.
>
> Signed-off-by: Geoff Levand <geoff@infradead.org>

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH] build some drivers only when compile-testing
From: Greg Kroah-Hartman @ 2013-05-23  2:23 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: jirislaby, linux-kernel, Andrew Morton, Linus Torvalds,
	Jeff Mahoney, Alexander Shishkin, linux-usb,
	Florian Tobias Schandinat, linux-geode, linux-fbdev,
	Richard Cochran, netdev, Ben Hutchings, Keller, Jacob E
In-Reply-To: <1369214326-6558-1-git-send-email-jslaby@suse.cz>

On Wed, May 22, 2013 at 11:18:46AM +0200, Jiri Slaby wrote:
> Some drivers can be built on more platforms than they run on. This
> causes users and distributors packaging burden when they have to
> manually deselect some drivers from their allmodconfigs. Or sometimes
> it is even impossible to disable the drivers without patching the
> kernel.
> 
> Introduce a new config option COMPILE_TEST and make all those drivers
> to depend on the platform they run on, or on the COMPILE_TEST option.
> Now, when users/distributors choose COMPILE_TEST=n they will not have
> the drivers in their allmodconfig setups, but developers still can
> compile-test them with COMPILE_TEST=y.

I understand the urge, and it's getting hard for distros to handle these
drivers that just don't work on other architectures, but it's really
valuable to ensure that they build properly, for those of us that don't
have many/any cross compilers set up.

> Now the drivers where we use this new option:
> * PTP_1588_CLOCK_PCH: The PCH EG20T is only compatible with Intel Atom
>   processors so it should depend on x86.
> * FB_GEODE: Geode is 32-bit only so only enable it for X86_32.
> * USB_CHIPIDEA_IMX: The OF_DEVICE dependency will be met on powerpc
>   systems -- which do not actually support the hardware via that
>   method.

This seems ripe to start to get really messy, really quickly.  Shouldn't
"default configs" handle if this "should" be enabled for a platform or
not, and let the rest of us just build them with no problems?

What problems is this causing you?  Are you running out of space in
kernel packages with drivers that will never be actually used?

> +config COMPILE_TEST
> +	bool "Compile also drivers which will not load" if EXPERT

EXPERT is getting to be the "let's hide it here" option, isn't it...

I don't know, if no one else strongly objects, I can be convinced that
this is needed, but so far, I don't see why it really is, or what this
is going to help with.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH] build some drivers only when compile-testing
From: Jeff Mahoney @ 2013-05-23  3:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Slaby, jirislaby, linux-kernel, Andrew Morton,
	Linus Torvalds, Alexander Shishkin, linux-usb,
	Florian Tobias Schandinat, linux-geode, linux-fbdev,
	Richard Cochran, netdev, Ben Hutchings, Keller, Jacob E
In-Reply-To: <20130523022327.GB6159@kroah.com>

On 5/22/13 10:23 PM, Greg Kroah-Hartman wrote:
> On Wed, May 22, 2013 at 11:18:46AM +0200, Jiri Slaby wrote:
>> Some drivers can be built on more platforms than they run on. This
>> causes users and distributors packaging burden when they have to
>> manually deselect some drivers from their allmodconfigs. Or sometimes
>> it is even impossible to disable the drivers without patching the
>> kernel.
>>
>> Introduce a new config option COMPILE_TEST and make all those drivers
>> to depend on the platform they run on, or on the COMPILE_TEST option.
>> Now, when users/distributors choose COMPILE_TEST=n they will not have
>> the drivers in their allmodconfig setups, but developers still can
>> compile-test them with COMPILE_TEST=y.
> 
> I understand the urge, and it's getting hard for distros to handle these
> drivers that just don't work on other architectures, but it's really
> valuable to ensure that they build properly, for those of us that don't
> have many/any cross compilers set up.
> 
>> Now the drivers where we use this new option:
>> * PTP_1588_CLOCK_PCH: The PCH EG20T is only compatible with Intel Atom
>>   processors so it should depend on x86.
>> * FB_GEODE: Geode is 32-bit only so only enable it for X86_32.
>> * USB_CHIPIDEA_IMX: The OF_DEVICE dependency will be met on powerpc
>>   systems -- which do not actually support the hardware via that
>>   method.
> 
> This seems ripe to start to get really messy, really quickly.  Shouldn't
> "default configs" handle if this "should" be enabled for a platform or
> not, and let the rest of us just build them with no problems?

If every time a new Kconfig option is added, corresponding default
config updates come with it, sure. I just don't see that happening,
especially when it can be done much more clearly in the Kconfig while
the developer is writing the driver.

> What problems is this causing you?  Are you running out of space in
> kernel packages with drivers that will never be actually used?

Wasted build resources. Wasted disk space on /every/ system the kernel
package is installed on. We're all trying to pare down the kernel
packages to eliminate wasted space and doing it manually means a bunch
of research, sometimes with incorrect assumptions about the results,
needs to be done by someone not usually associated with that code. That
research gets repeated by people maintaining kernel packages for pretty
much every distro.

>> +config COMPILE_TEST
>> +	bool "Compile also drivers which will not load" if EXPERT
> 
> EXPERT is getting to be the "let's hide it here" option, isn't it...
> 
> I don't know, if no one else strongly objects, I can be convinced that
> this is needed, but so far, I don't see why it really is, or what this
> is going to help with.

I'm not convinced adding a || COMPILE_TEST option to every driver that
may be arch specific is the best way to go either. Perhaps adding a new
Kconfig verb called "archdepends on" or something that will evaluate as
true if COMPILE_TEST is enabled but will evaluate the conditional if
not. *waves hands*

-Jeff

-- 
Jeff Mahoney
SUSE Labs

^ permalink raw reply

* Re: [PATCH] build some drivers only when compile-testing
From: Tomi Valkeinen @ 2013-05-23  7:46 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: jirislaby, linux-kernel, Andrew Morton, Linus Torvalds,
	Jeff Mahoney, Alexander Shishkin, Greg Kroah-Hartman, linux-usb,
	Florian Tobias Schandinat, linux-geode, linux-fbdev,
	Richard Cochran, netdev, Ben Hutchings, Keller, Jacob E
In-Reply-To: <1369214326-6558-1-git-send-email-jslaby@suse.cz>

[-- Attachment #1: Type: text/plain, Size: 1355 bytes --]

Hi,

On 22/05/13 12:18, Jiri Slaby wrote:
> Some drivers can be built on more platforms than they run on. This
> causes users and distributors packaging burden when they have to
> manually deselect some drivers from their allmodconfigs. Or sometimes
> it is even impossible to disable the drivers without patching the
> kernel.
> 
> Introduce a new config option COMPILE_TEST and make all those drivers
> to depend on the platform they run on, or on the COMPILE_TEST option.
> Now, when users/distributors choose COMPILE_TEST=n they will not have
> the drivers in their allmodconfig setups, but developers still can
> compile-test them with COMPILE_TEST=y.
> 
> Now the drivers where we use this new option:
> * PTP_1588_CLOCK_PCH: The PCH EG20T is only compatible with Intel Atom
>   processors so it should depend on x86.
> * FB_GEODE: Geode is 32-bit only so only enable it for X86_32.
> * USB_CHIPIDEA_IMX: The OF_DEVICE dependency will be met on powerpc
>   systems -- which do not actually support the hardware via that
>   method.

I had this exact same idea some time ago. The mail below contains some
of my reasoning for this:

http://comments.gmane.org/gmane.linux.kbuild.devel/9829

I proposed a new Kconfig keyword, but Sam was quite against it as the
Kconfig language already does what is required.

 Tomi



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply

* [PATCH] video: simplefb: add mode parsing function
From: Alexandre Courbot @ 2013-05-23  8:03 UTC (permalink / raw)
  To: Stephen Warren, Andrew Morton
  Cc: gnurou-Re5JQEeQqe8AvxtiuMwx3w, linux-fbdev-u79uwXL29TY76Z2rM5mHXA,
	Alexandre Courbot, devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ

The naming scheme of simplefb's mode is precise enough to allow building
the mode structure from it instead of using a static list of modes. This
patch introduces a function that does this. In case exotic modes that
cannot be represented from their name alone are needed, the static list
of modes is still available as a backup.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Stephen, please note that the "r5g6b5" mode initially supported
by the driver becomes "b5g6r5" with the new function. This is because
the least significant bits are defined first in the string - this
makes parsing much easier, notably for modes which do not fill whole
bytes (e.g. 15 bit modes). I hope this is not a problem - it's probably
better to do the change now while the driver is still new.

 .../bindings/video/simple-framebuffer.txt          | 12 +++-
 drivers/video/simplefb.c                           | 66 +++++++++++++++++++++-
 2 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/video/simple-framebuffer.txt b/Documentation/devicetree/bindings/video/simple-framebuffer.txt
index 3ea4605..f933b3d 100644
--- a/Documentation/devicetree/bindings/video/simple-framebuffer.txt
+++ b/Documentation/devicetree/bindings/video/simple-framebuffer.txt
@@ -10,8 +10,16 @@ Required properties:
 - width: The width of the framebuffer in pixels.
 - height: The height of the framebuffer in pixels.
 - stride: The number of bytes in each line of the framebuffer.
-- format: The format of the framebuffer surface. Valid values are:
-  - r5g6b5 (16-bit pixels, d[15:11]=r, d[10:5]=g, d[4:0]=b).
+- format: The format of the framebuffer surface. Described as a sequence of
+	channel/nbbits pairs, where each pair describes how many bits are used
+	by a given color channel. Value channels are "r" (red), "g" (green),
+	"b" (blue) and "a" (alpha). Channels are listed starting in
+	bit-significance order. For instance, a format named "r5g6b5" describes
+	a 16-bit format where red is encoded in the 5 less significant bits,
+	green in the 6 following ones, and blue in the 5 last:
+				BBBBBGGG GGGRRRRR
+				^               ^
+			       MSB             LSB
 
 Example:
 
diff --git a/drivers/video/simplefb.c b/drivers/video/simplefb.c
index 8105c3d..1a1be71 100644
--- a/drivers/video/simplefb.c
+++ b/drivers/video/simplefb.c
@@ -21,6 +21,7 @@
  */
 
 #include <linux/errno.h>
+#include <linux/ctype.h>
 #include <linux/fb.h>
 #include <linux/io.h>
 #include <linux/module.h>
@@ -82,8 +83,64 @@ struct simplefb_format {
 	struct fb_bitfield transp;
 };
 
-struct simplefb_format simplefb_formats[] = {
-	{ "r5g6b5", 16, {11, 5}, {5, 6}, {0, 5}, {0, 0} },
+static struct simplefb_format *simplefb_parse_format(struct device *dev,
+						     const char *str)
+{
+	struct simplefb_format *format;
+	unsigned int cpt = 0;
+	unsigned int i = 0;
+
+	format = devm_kzalloc(dev, sizeof(*format), GFP_KERNEL);
+	if (!format)
+		return ERR_PTR(-ENOMEM);
+
+	while (str[i] != 0) {
+		struct fb_bitfield *field = 0;
+		char c = str[i++];
+
+		switch (c) {
+		case 'r':
+		case 'R':
+			field = &format->red;
+			break;
+		case 'g':
+		case 'G':
+			field = &format->green;
+			break;
+		case 'b':
+		case 'B':
+			field = &format->blue;
+			break;
+		case 'a':
+		case 'A':
+			field = &format->transp;
+			break;
+		default:
+			goto error;
+		}
+
+		c = str[i++];
+		if (c = 0 || !isdigit(c))
+			goto error;
+
+		field->offset = cpt;
+		field->length = c - '0';
+
+		cpt += field->length;
+	}
+
+	format->bits_per_pixel = ((cpt + 7) / 8) * 8;
+	format->name = str;
+	return format;
+
+error:
+	dev_err(dev, "Invalid format string\n");
+	return ERR_PTR(-EINVAL);
+}
+
+/* if you use exotic modes that simplefb_parse_format cannot decode, you can
+   specify them here. */
+static struct simplefb_format simplefb_formats[] = {
 };
 
 struct simplefb_params {
@@ -131,7 +188,10 @@ static int simplefb_parse_dt(struct platform_device *pdev,
 		params->format = &simplefb_formats[i];
 		break;
 	}
-	if (!params->format) {
+	if (!params->format)
+		params->format = simplefb_parse_format(&pdev->dev, format);
+
+	if (!params->format || IS_ERR(params->format)) {
 		dev_err(&pdev->dev, "Invalid format value\n");
 		return -EINVAL;
 	}
-- 
1.8.2.3


^ permalink raw reply related

* [PATCH] OMAPDSS: Fix crash with DT boot
From: Tomi Valkeinen @ 2013-05-23  9:56 UTC (permalink / raw)
  To: linux-omap, linux-fbdev; +Cc: Archit Taneja, Peter Ujfalusi, Tomi Valkeinen

When booting with DT, there's a crash when omapfb is probed. This is
caused by the fact that omapdss+DT is not yet supported, and thus
omapdss is not probed at all. On the other hand, omapfb is always
probed. When omapfb tries to use omapdss, there's a NULL pointer
dereference crash. The same error should most likely happen with omapdrm
and omap_vout also.

To fix this, add an "initialized" state to omapdss. When omapdss has
been probed, it's marked as initialized. omapfb, omapdrm and omap_vout
check this state when they are probed to see that omapdss is actually
there.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Tested-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
---
 drivers/gpu/drm/omapdrm/omap_drv.c       |  3 +++
 drivers/media/platform/omap/omap_vout.c  |  3 +++
 drivers/video/omap2/dss/core.c           | 20 +++++++++++++++++++-
 drivers/video/omap2/omapfb/omapfb-main.c |  3 +++
 include/video/omapdss.h                  |  1 +
 5 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/omapdrm/omap_drv.c b/drivers/gpu/drm/omapdrm/omap_drv.c
index 079c54c..902074b 100644
--- a/drivers/gpu/drm/omapdrm/omap_drv.c
+++ b/drivers/gpu/drm/omapdrm/omap_drv.c
@@ -548,6 +548,9 @@ static void pdev_shutdown(struct platform_device *device)
 
 static int pdev_probe(struct platform_device *device)
 {
+	if (omapdss_is_initialized() = false)
+		return -EPROBE_DEFER;
+
 	DBG("%s", device->name);
 	return drm_platform_init(&omap_drm_driver, device);
 }
diff --git a/drivers/media/platform/omap/omap_vout.c b/drivers/media/platform/omap/omap_vout.c
index 96c4a17..0a489bd 100644
--- a/drivers/media/platform/omap/omap_vout.c
+++ b/drivers/media/platform/omap/omap_vout.c
@@ -2144,6 +2144,9 @@ static int __init omap_vout_probe(struct platform_device *pdev)
 	struct omap_dss_device *def_display;
 	struct omap2video_device *vid_dev = NULL;
 
+	if (omapdss_is_initialized() = false)
+		return -EPROBE_DEFER;
+
 	ret = omapdss_compat_init();
 	if (ret) {
 		dev_err(&pdev->dev, "failed to init dss\n");
diff --git a/drivers/video/omap2/dss/core.c b/drivers/video/omap2/dss/core.c
index f8779d4..f49ddb9 100644
--- a/drivers/video/omap2/dss/core.c
+++ b/drivers/video/omap2/dss/core.c
@@ -53,6 +53,8 @@ static char *def_disp_name;
 module_param_named(def_disp, def_disp_name, charp, 0);
 MODULE_PARM_DESC(def_disp, "default display name");
 
+static bool dss_initialized;
+
 const char *omapdss_get_default_display_name(void)
 {
 	return core.default_display_name;
@@ -66,6 +68,12 @@ enum omapdss_version omapdss_get_version(void)
 }
 EXPORT_SYMBOL(omapdss_get_version);
 
+bool omapdss_is_initialized(void)
+{
+	return dss_initialized;
+}
+EXPORT_SYMBOL(omapdss_is_initialized);
+
 struct platform_device *dss_get_core_pdev(void)
 {
 	return core.pdev;
@@ -606,6 +614,8 @@ static int __init omap_dss_init(void)
 		return r;
 	}
 
+	dss_initialized = true;
+
 	return 0;
 }
 
@@ -636,7 +646,15 @@ static int __init omap_dss_init(void)
 
 static int __init omap_dss_init2(void)
 {
-	return omap_dss_register_drivers();
+	int r;
+
+	r = omap_dss_register_drivers();
+	if (r)
+		return r;
+
+	dss_initialized = true;
+
+	return 0;
 }
 
 core_initcall(omap_dss_init);
diff --git a/drivers/video/omap2/omapfb/omapfb-main.c b/drivers/video/omap2/omapfb/omapfb-main.c
index 717f13a..bb5f9fe 100644
--- a/drivers/video/omap2/omapfb/omapfb-main.c
+++ b/drivers/video/omap2/omapfb/omapfb-main.c
@@ -2416,6 +2416,9 @@ static int __init omapfb_probe(struct platform_device *pdev)
 
 	DBG("omapfb_probe\n");
 
+	if (omapdss_is_initialized() = false)
+		return -EPROBE_DEFER;
+
 	if (pdev->num_resources != 0) {
 		dev_err(&pdev->dev, "probed for an unknown device\n");
 		r = -ENODEV;
diff --git a/include/video/omapdss.h b/include/video/omapdss.h
index caefa09..9b52340 100644
--- a/include/video/omapdss.h
+++ b/include/video/omapdss.h
@@ -741,6 +741,7 @@ struct omap_dss_driver {
 };
 
 enum omapdss_version omapdss_get_version(void);
+bool omapdss_is_initialized(void);
 
 int omap_dss_register_driver(struct omap_dss_driver *);
 void omap_dss_unregister_driver(struct omap_dss_driver *);
-- 
1.8.1.2


^ permalink raw reply related

* Re: Introduce a new helper framework for buffer synchronization
From: Daniel Vetter @ 2013-05-23 11:55 UTC (permalink / raw)
  To: Inki Dae
  Cc: Rob Clark, linux-fbdev, DRI mailing list, Kyungmin Park,
	myungjoo.ham, YoungJun Cho, linux-arm-kernel@lists.infradead.org,
	linux-media@vger.kernel.org
In-Reply-To: <033a01ce5604$c32bd250$498376f0$%dae@samsung.com>

On Tue, May 21, 2013 at 11:22 AM, Inki Dae <inki.dae@samsung.com> wrote:
>> -----Original Message-----
>> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
>> Vetter
>> Sent: Tuesday, May 21, 2013 4:45 PM
>> To: Inki Dae
>> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list';
>> 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm-
>> kernel@lists.infradead.org; linux-media@vger.kernel.org
>> Subject: Re: Introduce a new helper framework for buffer synchronization
>>
>> On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote:
>> > > - Integration of fence syncing into dma_buf. Imo we should have a
>> > >   per-attachment mode which decides whether map/unmap (and the new
>> sync)
>> > >   should wait for fences or whether the driver takes care of syncing
>> > >   through the new fence framework itself (for direct hw sync).
>> >
>> > I think it's a good idea to have per-attachment mode for buffer sync.
>> But
>> > I'd like to say we make sure what is the purpose of map
>> > (dma_buf_map_attachment)first. This interface is used to get a sgt;
>> > containing pages to physical memory region, and map that region with
>> > device's iommu table. The important thing is that this interface is
>> called
>> > just one time when user wants to share an allocated buffer with dma. But
>> cpu
>> > will try to access the buffer every time as long as it wants. Therefore,
>> we
>> > need cache control every time cpu and dma access the shared buffer:
>> cache
>> > clean when cpu goes to dma and cache invalidate when dma goes to cpu.
>> That
>> > is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE,
>> to
>> > dma buf framework. Of course, Those are ugly so we need a better way: I
>> just
>> > wanted to show you that in such way, we can synchronize the shared
>> buffer
>> > between cpu and dma. By any chance, is there any way that kernel can be
>> > aware of when cpu accesses the shared buffer or is there any point I
>> didn't
>> > catch up?
>>
>> Well dma_buf_map/unmap_attachment should also flush/invalidate any caches,
>> and with the current dma_buf spec those two functions are the _only_ means
>
> I know that dma buf exporter should make sure that cache clean/invalidate
> are done when dma_buf_map/unmap_attachement is called. For this, already we
> do so. However, this function is called when dma buf import is requested by
> user to map a dmabuf fd with dma. This means that dma_buf_map_attachement()
> is called ONCE when user wants to share the dmabuf fd with dma. Actually, in
> case of exynos drm, dma_map_sg_attrs(), performing cache operation
> internally, is called when dmabuf import is requested by user.
>
>> you have to do so. Which strictly means that if you interleave device dma
>> and cpu acccess you need to unmap/map every time.
>>
>> Which is obviously horribly inefficient, but a known gap in the dma_buf
>
> Right, and also this has big overhead.
>
>> interface. Hence why I proposed to add dma_buf_sync_attachment similar to
>> dma_sync_single_for_device:
>>
>> /**
>>  * dma_buf_sync_sg_attachment - sync caches for dma access
>>  * @attach: dma-buf attachment to sync
>>  * @sgt: the sg table to sync (returned by dma_buf_map_attachement)
>>  * @direction: dma direction to sync for
>>  *
>>  * This function synchronizes caches for device dma through the given
>>  * dma-buf attachment when interleaving dma from different devices and the
>>  * cpu. Other device dma needs to be synced also by calls to this
>>  * function (or a pair of dma_buf_map/unmap_attachment calls), cpu access
>>  * needs to be synced with dma_buf_begin/end_cpu_access.
>>  */
>> void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach,
>>                               struct sg_table *sgt,
>>                               enum dma_data_direction direction)
>>
>> Note that "sync" here only means to synchronize caches, not wait for any
>> outstanding fences. This is simply to be consistent with the established
>> lingo of the dma api. How the dma-buf fences fit into this is imo a
>> different topic, but my idea is that all the cache coherency barriers
>> (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and
>> dma_buf_begin/end_cpu_access) would automatically block for any attached
>> fences (i.e. block for write fences when doing read-only access, block for
>> all fences otherwise).
>
> As I mentioned already, kernel can't aware of when cpu accesses a shared
> buffer: user can access a shared buffer after mmap anytime and the shared
> buffer should be synchronized between cpu and dma. Therefore, the above
> cache coherency barriers should be called every time cpu and dma tries to
> access a shared buffer, checking before and after of cpu and dma access. And
> exactly, the proposed way do such thing. For this, you can refer to the
> below link,
>
> http://www.mail-archive.com/linux-media@vger.kernel.org/msg62124.html
>
> My point is that how kernel can aware of when those cache coherency barriers
> should be called to synchronize caches and buffer access between cpu and
> dma.
>
>>
>> Then we could do a new dma_buf_attach_flags interface for special cases
>> (might also be useful for other things, similar to the recently added
>> flags in the dma api for wc/no-kernel-mapping/...). A new flag like
>> DMA_BUF_ATTACH_NO_AUTOFENCE would then mean that the driver takes care of
>> all
>> fencing for this attachment and the dma-buf functions should not do the
>> automagic fence blocking.
>>
>> > In Linux kernel, especially embedded system, we had tried to implement
>> > generic interfaces for buffer management; how to allocate a buffer and
>> how
>> > to share a buffer. As a result, now we have CMA (Contiguous Memory
>> > Allocator) for buffer allocation, and we have DMA-BUF for buffer sharing
>> > between cpu and dma. And then how to synchronize a buffer between cpu
>> and
>> > dma? I think now it's best time to discuss buffer synchronization
>> mechanism,
>> > and that is very natural.
>>
>> I think it's important to differentiate between the two meanings of sync:
>> - synchronize caches (i.e. cpu cache flushing in most cases)
>> - and synchronize in-flight dma with fences.
>>
>
> Agree, and I meant that the buffer synchronization mechanism includes the
> above two things. And I think the two things should be addressed together.

I'm still confused about what you're trying to achive in the big
picture here exactly. I think the key point is your statement above
that the kernel can't know when the cpu access a dma-buf. That's not
true though:

For userspace mmaps the kernel can shoot down ptes and so prevent
userspace from accessing a buffer. Since that's pretty inefficient
gem/ttm have additional ioctls for cache coherency transitions.
dma_buf currently has no support for explicit coherency transitions
from userpsace, so if pte shootdown is an issue we need to improve
that.

If I read your proposal correctly you instead suggest to _alway_ flush
cache before/after each dma. That will _completely_ kill performance
(been there with i915, trust me).

For cpu access from kernel space we already have explicit mappings all
over the place (kmap&friends), so I don't see any issues with
inserting the relevant begin/end_cpu_access calls just around the cpu
access. The big exception is the framebuffer layer, but even that has
the deferred io support. KMS otoh has a nice frontbuffer dirty
interface to correctly support dumb clients, unfortunately fbcon does
not (yet) use that.

For these reasons I don't see a need to smash cache coherency and
fencing into one problem, and hence why I've proposed to tackle each
of them individually.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply

* RE: Introduce a new helper framework for buffer synchronization
From: Inki Dae @ 2013-05-23 13:37 UTC (permalink / raw)
  To: linux-fbdev
In-Reply-To: <51909DB4.2060208@canonical.com>

> -----Original Message-----
> From: daniel.vetter@ffwll.ch [mailto:daniel.vetter@ffwll.ch] On Behalf Of
> Daniel Vetter
> Sent: Thursday, May 23, 2013 8:56 PM
> To: Inki Dae
> Cc: Rob Clark; linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham;
> YoungJun Cho; linux-arm-kernel@lists.infradead.org; linux-
> media@vger.kernel.org
> Subject: Re: Introduce a new helper framework for buffer synchronization
> 
> On Tue, May 21, 2013 at 11:22 AM, Inki Dae <inki.dae@samsung.com> wrote:
> >> -----Original Message-----
> >> From: Daniel Vetter [mailto:daniel.vetter@ffwll.ch] On Behalf Of Daniel
> >> Vetter
> >> Sent: Tuesday, May 21, 2013 4:45 PM
> >> To: Inki Dae
> >> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list';
> >> 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm-
> >> kernel@lists.infradead.org; linux-media@vger.kernel.org
> >> Subject: Re: Introduce a new helper framework for buffer
> synchronization
> >>
> >> On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote:
> >> > > - Integration of fence syncing into dma_buf. Imo we should have a
> >> > >   per-attachment mode which decides whether map/unmap (and the new
> >> sync)
> >> > >   should wait for fences or whether the driver takes care of
> syncing
> >> > >   through the new fence framework itself (for direct hw sync).
> >> >
> >> > I think it's a good idea to have per-attachment mode for buffer sync.
> >> But
> >> > I'd like to say we make sure what is the purpose of map
> >> > (dma_buf_map_attachment)first. This interface is used to get a sgt;
> >> > containing pages to physical memory region, and map that region with
> >> > device's iommu table. The important thing is that this interface is
> >> called
> >> > just one time when user wants to share an allocated buffer with dma.
> But
> >> cpu
> >> > will try to access the buffer every time as long as it wants.
> Therefore,
> >> we
> >> > need cache control every time cpu and dma access the shared buffer:
> >> cache
> >> > clean when cpu goes to dma and cache invalidate when dma goes to cpu.
> >> That
> >> > is why I added new interfaces, DMA_BUF_GET_FENCE and
> DMA_BUF_PUT_FENCE,
> >> to
> >> > dma buf framework. Of course, Those are ugly so we need a better way:
> I
> >> just
> >> > wanted to show you that in such way, we can synchronize the shared
> >> buffer
> >> > between cpu and dma. By any chance, is there any way that kernel can
> be
> >> > aware of when cpu accesses the shared buffer or is there any point I
> >> didn't
> >> > catch up?
> >>
> >> Well dma_buf_map/unmap_attachment should also flush/invalidate any
> caches,
> >> and with the current dma_buf spec those two functions are the _only_
> means
> >
> > I know that dma buf exporter should make sure that cache
> clean/invalidate
> > are done when dma_buf_map/unmap_attachement is called. For this, already
> we
> > do so. However, this function is called when dma buf import is requested
> by
> > user to map a dmabuf fd with dma. This means that
> dma_buf_map_attachement()
> > is called ONCE when user wants to share the dmabuf fd with dma.
Actually,
> in
> > case of exynos drm, dma_map_sg_attrs(), performing cache operation
> > internally, is called when dmabuf import is requested by user.
> >
> >> you have to do so. Which strictly means that if you interleave device
> dma
> >> and cpu acccess you need to unmap/map every time.
> >>
> >> Which is obviously horribly inefficient, but a known gap in the dma_buf
> >
> > Right, and also this has big overhead.
> >
> >> interface. Hence why I proposed to add dma_buf_sync_attachment similar
> to
> >> dma_sync_single_for_device:
> >>
> >> /**
> >>  * dma_buf_sync_sg_attachment - sync caches for dma access
> >>  * @attach: dma-buf attachment to sync
> >>  * @sgt: the sg table to sync (returned by dma_buf_map_attachement)
> >>  * @direction: dma direction to sync for
> >>  *
> >>  * This function synchronizes caches for device dma through the given
> >>  * dma-buf attachment when interleaving dma from different devices and
> the
> >>  * cpu. Other device dma needs to be synced also by calls to this
> >>  * function (or a pair of dma_buf_map/unmap_attachment calls), cpu
> access
> >>  * needs to be synced with dma_buf_begin/end_cpu_access.
> >>  */
> >> void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach,
> >>                               struct sg_table *sgt,
> >>                               enum dma_data_direction direction)
> >>
> >> Note that "sync" here only means to synchronize caches, not wait for
> any
> >> outstanding fences. This is simply to be consistent with the
> established
> >> lingo of the dma api. How the dma-buf fences fit into this is imo a
> >> different topic, but my idea is that all the cache coherency barriers
> >> (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and
> >> dma_buf_begin/end_cpu_access) would automatically block for any
> attached
> >> fences (i.e. block for write fences when doing read-only access, block
> for
> >> all fences otherwise).
> >
> > As I mentioned already, kernel can't aware of when cpu accesses a shared
> > buffer: user can access a shared buffer after mmap anytime and the
> shared
> > buffer should be synchronized between cpu and dma. Therefore, the above
> > cache coherency barriers should be called every time cpu and dma tries
> to
> > access a shared buffer, checking before and after of cpu and dma access.
> And
> > exactly, the proposed way do such thing. For this, you can refer to the
> > below link,
> >
> > http://www.mail-archive.com/linux-media@vger.kernel.org/msg62124.html
> >
> > My point is that how kernel can aware of when those cache coherency
> barriers
> > should be called to synchronize caches and buffer access between cpu and
> > dma.
> >
> >>
> >> Then we could do a new dma_buf_attach_flags interface for special cases
> >> (might also be useful for other things, similar to the recently added
> >> flags in the dma api for wc/no-kernel-mapping/...). A new flag like
> >> DMA_BUF_ATTACH_NO_AUTOFENCE would then mean that the driver takes care
> of
> >> all
> >> fencing for this attachment and the dma-buf functions should not do the
> >> automagic fence blocking.
> >>
> >> > In Linux kernel, especially embedded system, we had tried to
> implement
> >> > generic interfaces for buffer management; how to allocate a buffer
> and
> >> how
> >> > to share a buffer. As a result, now we have CMA (Contiguous Memory
> >> > Allocator) for buffer allocation, and we have DMA-BUF for buffer
> sharing
> >> > between cpu and dma. And then how to synchronize a buffer between cpu
> >> and
> >> > dma? I think now it's best time to discuss buffer synchronization
> >> mechanism,
> >> > and that is very natural.
> >>
> >> I think it's important to differentiate between the two meanings of
> sync:
> >> - synchronize caches (i.e. cpu cache flushing in most cases)
> >> - and synchronize in-flight dma with fences.
> >>
> >
> > Agree, and I meant that the buffer synchronization mechanism includes
> the
> > above two things. And I think the two things should be addressed
> together.
> 
> I'm still confused about what you're trying to achive in the big
> picture here exactly. I think the key point is your statement above
> that the kernel can't know when the cpu access a dma-buf. That's not
> true though:
> 
> For userspace mmaps the kernel can shoot down ptes and so prevent
> userspace from accessing a buffer.

In this case, cpu access to user space will incur page fault. However, cpu
accesses user space region after mmap in user mode and every time user
wants. And user needs something before passing a buffer accessed into device
driver and also vice versa.

> Since that's pretty inefficient
> gem/ttm have additional ioctls for cache coherency transitions.

Yes, that's the something, and that's a way now we are using if I understood
surely. User side calls those ioctls for cache coherency between cpu and
dma: cache clean after cpu access and before dma access, and cache
invalidate after dma access and before cpu access. I thought it's better to
control cache in kernel side than user because I had found some Linux based
platforms overuse cache operations. So I thought the best place for it is
the proposed fence helper or similar thing: we would need interfaces to
notify the moment, 'before and after of cpu access', into kernel side. My
proposal is to couple those two things, synchronizing caches and in-flight
dma with fences, in kernel side. With this approach, user doesn't need to
care when dma and cpu access will be started, and when dma and cpu access
will be completed anymore when user wants to access a buffer and share it
with dma. All things, cache coherency and buffer fencing between dma and
cpu, will be done in kernel side.

> dma_buf currently has no support for explicit coherency transitions
> from userpsace, so if pte shootdown is an issue we need to improve
> that.
> 
> If I read your proposal correctly you instead suggest to _alway_ flush
> cache before/after each dma. That will _completely_ kill performance
> (been there with i915, trust me).
> 

In case of using a cachable mapped buffer and sharing the buffer with dma,
we need to clean cache after cpu access and before dma access, and we need
to invalidate cache after dma access and before cpu access every time. Isn't
that we are doing so now? I have a little knowledge about Desktop world so
there might be my misunderstanding.

Could you please give me advices and more comments if there is my missing
point or misunderstanding?

> For cpu access from kernel space we already have explicit mappings all
> over the place (kmap&friends), so I don't see any issues with
> inserting the relevant begin/end_cpu_access calls just around the cpu
> access. The big exception is the framebuffer layer, but even that has
> the deferred io support. KMS otoh has a nice frontbuffer dirty
> interface to correctly support dumb clients, unfortunately fbcon does
> not (yet) use that.
> 
> For these reasons I don't see a need to smash cache coherency and
> fencing into one problem, and hence why I've proposed to tackle each
> of them individually.

I might be really missing some points :)

Thanks,
Inki Dae

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch


^ permalink raw reply

* Re: [PATCH] build some drivers only when compile-testing
From: Ben Hutchings @ 2013-05-23 14:01 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jiri Slaby, jirislaby, linux-kernel, Andrew Morton,
	Linus Torvalds, Jeff Mahoney, Alexander Shishkin, linux-usb,
	Florian Tobias Schandinat, linux-geode, linux-fbdev,
	Richard Cochran, netdev, Keller, Jacob E
In-Reply-To: <20130523022327.GB6159@kroah.com>

[-- Attachment #1: Type: text/plain, Size: 4308 bytes --]

On Wed, 2013-05-22 at 19:23 -0700, Greg Kroah-Hartman wrote:
> On Wed, May 22, 2013 at 11:18:46AM +0200, Jiri Slaby wrote:
> > Some drivers can be built on more platforms than they run on. This
> > causes users and distributors packaging burden when they have to
> > manually deselect some drivers from their allmodconfigs. Or sometimes
> > it is even impossible to disable the drivers without patching the
> > kernel.
> > 
> > Introduce a new config option COMPILE_TEST and make all those drivers
> > to depend on the platform they run on, or on the COMPILE_TEST option.
> > Now, when users/distributors choose COMPILE_TEST=n they will not have
> > the drivers in their allmodconfig setups, but developers still can
> > compile-test them with COMPILE_TEST=y.
> 
> I understand the urge, and it's getting hard for distros to handle these
> drivers that just don't work on other architectures, but it's really
> valuable to ensure that they build properly, for those of us that don't
> have many/any cross compilers set up.
> 
> > Now the drivers where we use this new option:
> > * PTP_1588_CLOCK_PCH: The PCH EG20T is only compatible with Intel Atom
> >   processors so it should depend on x86.
> > * FB_GEODE: Geode is 32-bit only so only enable it for X86_32.
> > * USB_CHIPIDEA_IMX: The OF_DEVICE dependency will be met on powerpc
> >   systems -- which do not actually support the hardware via that
> >   method.
> 
> This seems ripe to start to get really messy, really quickly.  Shouldn't
> "default configs" handle if this "should" be enabled for a platform or
> not, and let the rest of us just build them with no problems?

Debian aims to provide a consistent feature set across all supported
architectures so far as possible, and I would expect other distributions
to do the same.  I don't believe anyone is coordinating defconfigs
across architectures to ensure that.

Distributions may also have particular requirements from userland that a
distribution-agnostic defconfig obviously doesn't cover.  (Isn't that
why we had the discussion about 'configure for my distribution' some
months back?)

For the driver configuration, what we provide in Debian is closer to
allmodconfig, only with some attempt to exclude those useless drivers.
Dependencies like this would make it easier to do that.

> What problems is this causing you?  Are you running out of space in
> kernel packages with drivers that will never be actually used?

On most x86 systems this isn't going to be an issue.  But if packages
are built natively (as they are for most distributions) then building
useless drivers for some architecture which doesn't have fast processors
available is a waste of precious resources.

This is particularly bad where the architecture doesn't support generic
kernels that boot on a wide range of machines and therefore we must
build many times over.  These unfortunately tend to be among those with
slower processors (ARM[1], MIPS, SH, ...).  Currently, Debian's linux
package takes ~48 hours to build for armel (ARM processors without FPU)
- and that's without building many of the PCI drivers we could for those
platforms which have PCI support.  So that's a minimum of 2 days to
provide a security update across all architectures[2].

[1] I'm aware that ARM is getting better in this regard and most ARMv7
machines are likely to be supportable by a single kernel image soon.
That doesn't mean the whole problem is solved.

[2] We don't always wait for all builds before releasing/announcing an
update; for example <http://www.debian.org/security/2013/dsa-2669>.  But
that's no comfort for those using the slower architecture.

> > +config COMPILE_TEST
> > +     bool "Compile also drivers which will not load" if EXPERT
> 
> EXPERT is getting to be the "let's hide it here" option, isn't it...

This little detail seems likely to reduce the usefulness of randconfig
as many drivers will become dependent on both EXPERT and COMPILE_TEST
being selected.

> I don't know, if no one else strongly objects, I can be convinced that
> this is needed, but so far, I don't see why it really is, or what this
> is going to help with.

Ben.

-- 
Ben Hutchings
friends: People who know you well, but like you anyway.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* Re: [PATCH] video: simplefb: add mode parsing function
From: Stephen Warren @ 2013-05-23 16:27 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	gnurou-Re5JQEeQqe8AvxtiuMwx3w, Andrew Morton,
	linux-fbdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1369296231-26597-1-git-send-email-acourbot-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

On 05/23/2013 02:03 AM, Alexandre Courbot wrote:
> The naming scheme of simplefb's mode is precise enough to allow building
> the mode structure from it instead of using a static list of modes. This
> patch introduces a function that does this. In case exotic modes that
> cannot be represented from their name alone are needed, the static list
> of modes is still available as a backup.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> Stephen, please note that the "r5g6b5" mode initially supported
> by the driver becomes "b5g6r5" with the new function. This is because
> the least significant bits are defined first in the string - this
> makes parsing much easier, notably for modes which do not fill whole
> bytes (e.g. 15 bit modes). I hope this is not a problem - it's probably
> better to do the change now while the driver is still new.

Hmm. searchin Google (and even looking at the VDPAU spec I myself wrote)
does imply that format names like this do typically list the LSBs first.

I guess the problem is that when I decided to change from rgb565 to a
machine-parsable r5g6b5, I didn't think enough to realize that the field
order in the name "rgb565" didn't follow the same convention as when you
write out the fields in the style "r5g6b5"/"b5g6r5":-(

I guess this driver and DT binding are only in linux-next and targeted
at 3.11, and didn't make 3.10, so I'd assert it's probably still OK to
change the DT binding, if this patch also gets into 3.11.

The only two users of it I know are:

a) From U-Boot on the Raspberry Pi, and those patches haven't been
accepted into U-Boot yet.

b) From U-Boot on the Samsung ARM Chromebook yet. I know Olof built a
U-Boot that uses them and it'd linked from the chromiumos.org web pages.
I assume it's not too horrible for him to update them. I don't know how
widely the patch that enables simple-fb in that binary is distributed
though. Olof, care to comment on how much pain it'd be to change?

> diff --git a/Documentation/devicetree/bindings/video/simple-framebuffer.txt b/Documentation/devicetree/bindings/video/simple-framebuffer.txt

> -- format: The format of the framebuffer surface. Valid values are:
> -  - r5g6b5 (16-bit pixels, d[15:11]=r, d[10:5]=g, d[4:0]=b).
> +- format: The format of the framebuffer surface. Described as a sequence of
> +	channel/nbbits pairs, where each pair describes how many bits are used

Change nbbits to bitcount, num-bits, nr-bits?

> +	by a given color channel. Value channels are "r" (red), "g" (green),
> +	"b" (blue) and "a" (alpha).

I think you'll need "x" too, to represent any gaps/unused bits.

> Channels are listed starting in bit-significance order.

I would explicitly add ", starting from the LSB." to the end there.
Perhaps drop "-significance" too?

> For instance, a format named "r5g6b5" describes
> +	a 16-bit format where red is encoded in the 5 less significant bits,
> +	green in the 6 following ones, and blue in the 5 last:
> +				BBBBBGGG GGGRRRRR
> +				^               ^
> +			       MSB             LSB

> diff --git a/drivers/video/simplefb.c b/drivers/video/simplefb.c

> +static struct simplefb_format *simplefb_parse_format(struct device *dev,
> +						     const char *str)
> +{
> +	struct simplefb_format *format;
> +	unsigned int cpt = 0;

What does cpt mean? Something like bit or bitnum or bitindex might be
more descriptive.

> +	unsigned int i = 0;
> +
> +	format = devm_kzalloc(dev, sizeof(*format), GFP_KERNEL);
> +	if (!format)
> +		return ERR_PTR(-ENOMEM);

Can we just add some storage into the FB object itself for this, so we
don't need to make a special allocation? The first parameter to
framebuffer_alloc() allows allocating that, although you would have to
rejig the order of probe() a bit to do that:-( Perhaps it's fine as you
wrote it.

> +		field->offset = cpt;
> +		field->length = c - '0';

What about R10G10B10A2? Something like strtol() is needed here.

> +
> +		cpt += field->length;
> +	}
> +
> +	format->bits_per_pixel = ((cpt + 7) / 8) * 8;

Should this error-check that isn't > 32?

> +	if (!params->format || IS_ERR(params->format)) {

That's just saying IS_ERR_OR_NULL(params->format). It'd be better to
pick one thing to represent errors; either NULL or an error-pointer. I
think having simplefb_parse_format() just return NULL on error would be
best; the caller can't really do anything useful with the information
anyway.


^ permalink raw reply

* Re: [PATCH] video: simplefb: add mode parsing function
From: Geert Uytterhoeven @ 2013-05-23 16:57 UTC (permalink / raw)
  To: Stephen Warren
  Cc: gnurou-Re5JQEeQqe8AvxtiuMwx3w, Linux Fbdev development list,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, Alexandre Courbot,
	Andrew Morton
In-Reply-To: <519E438A.9030408-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>

On Thu, May 23, 2013 at 6:27 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>> +
>> +             cpt += field->length;
>> +     }
>> +
>> +     format->bits_per_pixel = ((cpt + 7) / 8) * 8;
>
> Should this error-check that isn't > 32?

So pixels can't be larger than 32 bits?
IIRC, some SGI and Sun graphics cards had e.g. 80 bit pixels (incl. Z buffer).

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH v3 0/9] Clean up write-combining MTRR addition
From: Andy Lutomirski @ 2013-05-23 18:35 UTC (permalink / raw)
  To: linux-kernel, dri-devel, linux-fbdev
  Cc: Daniel Vetter, Jerome Glisse, Alex Deucher, Dave Airlie,
	Andy Lutomirski
In-Reply-To: <cover.1368485053.git.luto@amacapital.net>

On Mon, May 13, 2013 at 4:58 PM, Andy Lutomirski <luto@amacapital.net> wrote:
> A fair number of drivers (mostly graphics) add write-combining MTRRs.
> Most ignore errors and most add the MTRR even on PAT systems which don't
> need to use MTRRs.
>
> This series adds new functions arch_phys_wc_{add,del} that, on PAT-less
> x86 systems with MTRRs, add MTRRs and report errors, and that do nothing
> otherwise.  (Other architectures, if any, with a similar mechanism could
> implement them.)

That's the path to upstream for this?  Should it go through drm-next?
(Sorry for possible noob question -- I've never sent in anything other
than trivial fixes to drm stuff before.)

--Andy

^ permalink raw reply

* Re: [PATCH] video: simplefb: add mode parsing function
From: Stephen Warren @ 2013-05-23 19:21 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: gnurou-Re5JQEeQqe8AvxtiuMwx3w, Linux Fbdev development list,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, Alexandre Courbot,
	Andrew Morton
In-Reply-To: <CAMuHMdVs4hy7dWAGxN51XP3YuWBY6rk22Z6tCU+-HujZH4jL2Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 05/23/2013 10:57 AM, Geert Uytterhoeven wrote:
> On Thu, May 23, 2013 at 6:27 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>> +
>>> +             cpt += field->length;
>>> +     }
>>> +
>>> +     format->bits_per_pixel = ((cpt + 7) / 8) * 8;
>>
>> Should this error-check that isn't > 32?
> 
> So pixels can't be larger than 32 bits?
> IIRC, some SGI and Sun graphics cards had e.g. 80 bit pixels (incl. Z buffer).

That's a good point.

Out of curiosity, how does the FB core treat these format definitions?
Are they expected to fit into a 16-/32-/64-/128- bit power-of-two
bit-size, or are they treated as a string of bytes that get serialized
into memory LSB first (or perhaps MSB first on BE systems?)

The difference would be that from a CPU perspective only, if you pack
the RGB components into a u32, then write that to RAM as a u32, then the
in-memory byte-by-byte order is different on different endian systems,
whereas if the FB core treats it as a series of bytes only, then
presumably the in-memory byte-by-byte order is identical irrespective of
host CPU endianness.

^ permalink raw reply

* Re: [PATCH] video: simplefb: add mode parsing function
From: Geert Uytterhoeven @ 2013-05-23 19:42 UTC (permalink / raw)
  To: Stephen Warren
  Cc: gnurou-Re5JQEeQqe8AvxtiuMwx3w, Linux Fbdev development list,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, Alexandre Courbot,
	Andrew Morton
In-Reply-To: <519E6C37.2010706-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>

On Thu, May 23, 2013 at 9:21 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 05/23/2013 10:57 AM, Geert Uytterhoeven wrote:
>> On Thu, May 23, 2013 at 6:27 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
>>>> +
>>>> +             cpt += field->length;
>>>> +     }
>>>> +
>>>> +     format->bits_per_pixel = ((cpt + 7) / 8) * 8;
>>>
>>> Should this error-check that isn't > 32?
>>
>> So pixels can't be larger than 32 bits?
>> IIRC, some SGI and Sun graphics cards had e.g. 80 bit pixels (incl. Z buffer).
>
> That's a good point.
>
> Out of curiosity, how does the FB core treat these format definitions?
> Are they expected to fit into a 16-/32-/64-/128- bit power-of-two
> bit-size, or are they treated as a string of bytes that get serialized
> into memory LSB first (or perhaps MSB first on BE systems?)
>
> The difference would be that from a CPU perspective only, if you pack
> the RGB components into a u32, then write that to RAM as a u32, then the
> in-memory byte-by-byte order is different on different endian systems,
> whereas if the FB core treats it as a series of bytes only, then
> presumably the in-memory byte-by-byte order is identical irrespective of
> host CPU endianness.

I don't think any of the current FB drivers support these.

Conceptually, a (packed) frame buffer is just a block of memory,
containing lines
of line_length bytes.
Each line contains xres_virtual pixels, each of bits_per_pixel bits.

As long as bits_per_pixel is an integer multiple of 8, it's
more-or-less well-defined.
We did have issues with endianness on e.g. bits_per_pixel = 4 (2 pixels in
one byte, which order?), bits_per_pixel = 16 (byteswapped RGB565), etc.
If bits_per_pixel is larger than 32, and not an integer multiple of 32,
you indeed can have more of them...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH] build some drivers only when compile-testing
From: Rob Landley @ 2013-05-24  4:50 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Greg Kroah-Hartman, Jiri Slaby, jirislaby, linux-kernel,
	Andrew Morton, Linus Torvalds, Jeff Mahoney, Alexander Shishkin,
	linux-usb, Florian Tobias Schandinat, linux-geode, linux-fbdev,
	Richard Cochran, netdev, Keller, Jacob E
In-Reply-To: <20130523022327.GB6159@kroah.com>

On 05/23/2013 09:01:40 AM, Ben Hutchings wrote:
> On Wed, 2013-05-22 at 19:23 -0700, Greg Kroah-Hartman wrote:
> > On Wed, May 22, 2013 at 11:18:46AM +0200, Jiri Slaby wrote:
> > > Some drivers can be built on more platforms than they run on. This
> > > causes users and distributors packaging burden when they have to
> > > manually deselect some drivers from their allmodconfigs. Or  
> sometimes
> > > it is even impossible to disable the drivers without patching the
> > > kernel.
> > >
> > > Introduce a new config option COMPILE_TEST and make all those  
> drivers
> > > to depend on the platform they run on, or on the COMPILE_TEST  
> option.
> > > Now, when users/distributors choose COMPILE_TEST=n they will not  
> have
> > > the drivers in their allmodconfig setups, but developers still can
> > > compile-test them with COMPILE_TEST=y.
> >
> > I understand the urge, and it's getting hard for distros to handle  
> these
> > drivers that just don't work on other architectures, but it's really
> > valuable to ensure that they build properly, for those of us that  
> don't
> > have many/any cross compilers set up.

In http://landley.net/aboriginal/bin grab the cross-compiler-*.tar.bz2  
tarballs, extract them, add the "bin" subdirectory of each to the  
$PATH. Congratulations, you have cross compilers set up. (They're  
statically linked and relocatable, so should run just about anywhere.  
If they don't, let me know and I'll fix it.)

Example build:

   make ARCH=sparc sparc32_defconfig
   PATH=/home/landley/simple-cross-compiler-sparc/bin:$PATH \
     make ARCH=sparc CROSS_COMPILE=sparc-

Rob

^ permalink raw reply

* Re: [PATCH] video: simplefb: add mode parsing function
From: Alex Courbot @ 2013-05-24  7:30 UTC (permalink / raw)
  To: Stephen Warren
  Cc: devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org,
	gnurou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, Andrew Morton,
	linux-fbdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <519E438A.9030408-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>

On 05/24/2013 01:27 AM, Stephen Warren wrote:
>> Stephen, please note that the "r5g6b5" mode initially supported
>> by the driver becomes "b5g6r5" with the new function. This is because
>> the least significant bits are defined first in the string - this
>> makes parsing much easier, notably for modes which do not fill whole
>> bytes (e.g. 15 bit modes). I hope this is not a problem - it's probably
>> better to do the change now while the driver is still new.
>
> Hmm. searchin Google (and even looking at the VDPAU spec I myself wrote)
> does imply that format names like this do typically list the LSBs first.
>
> I guess the problem is that when I decided to change from rgb565 to a
> machine-parsable r5g6b5, I didn't think enough to realize that the field
> order in the name "rgb565" didn't follow the same convention as when you
> write out the fields in the style "r5g6b5"/"b5g6r5":-(
>
> I guess this driver and DT binding are only in linux-next and targeted
> at 3.11, and didn't make 3.10, so I'd assert it's probably still OK to
> change the DT binding, if this patch also gets into 3.11.

Great, keeping it like that then.

>> -- format: The format of the framebuffer surface. Valid values are:
>> -  - r5g6b5 (16-bit pixels, d[15:11]=r, d[10:5]=g, d[4:0]=b).
>> +- format: The format of the framebuffer surface. Described as a sequence of
>> +	channel/nbbits pairs, where each pair describes how many bits are used
>
> Change nbbits to bitcount, num-bits, nr-bits?

Ok.

>> +	by a given color channel. Value channels are "r" (red), "g" (green),
>> +	"b" (blue) and "a" (alpha).
>
> I think you'll need "x" too, to represent any gaps/unused bits.

Are there such formats? 15-bit formats do exist but AFAIK they just drop 
the MSB. Anyway, this can be done easily, so I won't argue. :)

>> Channels are listed starting in bit-significance order.
>
> I would explicitly add ", starting from the LSB." to the end there.
> Perhaps drop "-significance" too?

Agreed, sounds better.

>> +static struct simplefb_format *simplefb_parse_format(struct device *dev,
>> +						     const char *str)
>> +{
>> +	struct simplefb_format *format;
>> +	unsigned int cpt = 0;
>
> What does cpt mean? Something like bit or bitnum or bitindex might be
> more descriptive.

Fixed.

>> +	unsigned int i = 0;
>> +
>> +	format = devm_kzalloc(dev, sizeof(*format), GFP_KERNEL);
>> +	if (!format)
>> +		return ERR_PTR(-ENOMEM);
>
> Can we just add some storage into the FB object itself for this, so we
> don't need to make a special allocation? The first parameter to
> framebuffer_alloc() allows allocating that, although you would have to
> rejig the order of probe() a bit to do that:-( Perhaps it's fine as you
> wrote it.

The less allocations the better - if using framebuffer_alloc does not 
make things more confusing, I'll gladly use that solution.

>> +		field->offset = cpt;
>> +		field->length = c - '0';
>
> What about R10G10B10A2? Something like strtol() is needed here.

True. That's probably more color depth than humans can perceive, but 
will make the mantis shrimp happy.

>> +
>> +		cpt += field->length;
>> +	}
>> +
>> +	format->bits_per_pixel = ((cpt + 7) / 8) * 8;
>
> Should this error-check that isn't > 32?

That would make the mantis shrimp sad.

>> +	if (!params->format || IS_ERR(params->format)) {
>
> That's just saying IS_ERR_OR_NULL(params->format). It'd be better to
> pick one thing to represent errors; either NULL or an error-pointer. I
> think having simplefb_parse_format() just return NULL on error would be
> best; the caller can't really do anything useful with the information
> anyway.

Weird - I actually removed that NULL check since the parse function can 
only return a valid pointed an error code. Probably forgot to 
format-patch again after that.

It seems more customary to let the function that fails decide the error 
code and have it propagated by its caller (even though in the current 
case there is no much variety in the returned error codes). If you see 
no problem with it, I will stick to that way of doing.

Thanks for the review - will send an update soon.
Alex.


-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain
confidential information.  Any unauthorized review, use, disclosure or distribution
is prohibited.  If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------

^ permalink raw reply

* Re: [PATCH] video: simplefb: add mode parsing function
From: Stephen Warren @ 2013-05-24 15:37 UTC (permalink / raw)
  To: Alex Courbot
  Cc: devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org,
	gnurou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org, Andrew Morton,
	linux-fbdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <519F1729.2050003-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

On 05/24/2013 01:30 AM, Alex Courbot wrote:
> On 05/24/2013 01:27 AM, Stephen Warren wrote:
>>> Stephen, please note that the "r5g6b5" mode initially supported
>>> by the driver becomes "b5g6r5" with the new function. This is because
>>> the least significant bits are defined first in the string - this
>>> makes parsing much easier, notably for modes which do not fill whole
>>> bytes (e.g. 15 bit modes). I hope this is not a problem - it's probably
>>> better to do the change now while the driver is still new.
...
>>> +    by a given color channel. Value channels are "r" (red), "g"
>>> (green),
>>> +    "b" (blue) and "a" (alpha).
>>
>> I think you'll need "x" too, to represent any gaps/unused bits.
> 
> Are there such formats? 15-bit formats do exist but AFAIK they just drop
> the MSB. Anyway, this can be done easily, so I won't argue. :)

Yes, I believe there are e.g. both a2r10g10b10 and b10g10r10a2 for
example, and it's quite common to replace a with x, especially for
scanout buffers. Google certainly has hits for that.

>>> +    if (!params->format || IS_ERR(params->format)) {
>>
>> That's just saying IS_ERR_OR_NULL(params->format). It'd be better to
>> pick one thing to represent errors; either NULL or an error-pointer. I
>> think having simplefb_parse_format() just return NULL on error would be
>> best; the caller can't really do anything useful with the information
>> anyway.
> 
> Weird - I actually removed that NULL check since the parse function can
> only return a valid pointed an error code. Probably forgot to
> format-patch again after that.
> 
> It seems more customary to let the function that fails decide the error
> code and have it propagated by its caller (even though in the current
> case there is no much variety in the returned error codes). If you see
> no problem with it, I will stick to that way of doing.

Using just an error-return is probably fine. I was going to say that the
table lookup might propagate a NULL all the way through to that check,
and hence require both checks above, but I guess that case can't happen,
since if there is no table entry, then simplefb_parse_format() will
always be called, so that's fine.

^ permalink raw reply

* [RFC] Add co-maintainer for fbdev
From: Jean-Christophe PLAGNIOL-VILLARD @ 2013-05-24 15:38 UTC (permalink / raw)
  To: linux-fbdev; +Cc: Arnd Bergmann, Linus Torvalds, linux-kernel, Tomi Valkeinen

Hi Florian,

	As you seems very busy I'd like to propose the help you to handle the
	fbdev subsystem to easier the rich of the fbdev patch to Linus

	As I'm working on fbdev on at91 and others and already Co-Maintain the
	at91 mach on ARM

	And if you are not willing to continue I could take over

Best Regards,
J.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox