Linux Framebuffer Layer development
 help / color / mirror / Atom feed
* Re: [V2 1/7] video: mmp: rb swap setting update for new LCD driver
From: Daniel Drake @ 2013-06-21 16:57 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1370879533-11017-1-git-send-email-jtzhou@marvell.com>

On Mon, Jun 10, 2013 at 9:52 AM, Jett.Zhou <jtzhou@marvell.com> wrote:
> From: Guoqing Li <ligq@marvell.com>
>
> We could set rb swap in two modules: DMA controler and interface
> output after blending.
> This patch move the panel rbswap requirement setting in later module.

link_config originates from the platform's path config. link_config is
undocumented and this patch also changes its meaning.

Previously, bit 0 triggered rbswap, and this behaviour is relied upon
by arch/arm/mach-mmp/ttc_dkb.c

Now bits 27:24 of link_config are used to enable or disable rbswap,
and link_config bit 0 is ignored. According to the specs for the panel
path register, valid values for these new bits are 0 (no swap) and 1
(swap). ttc_dkb has not been updated in this patch series for this new
behaviour.

I don't understand why rbswap is set to 1 in fmt_to_reg for certain
RGB formats. The patch description suggests that the rb swapping can
either be set in fmt_to_reg context (to be written into DMA
controller) *or* in the interface output. If we are now relying on the
interface output control to do RB swapping when appropriate, why are
there still cases when we ask the DMA controller to do the same thing?

I also do not fully understand the requirement for RB swapping. I do
understand that it changes pixel format from RGB to BGR. What is the
point? I can imagine that some panels may require such a pixel format,
however in that case, this configuration should be part of the panel
configuration, not part of the path configuration as it currently is.

Daniel

^ permalink raw reply

* Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework
From: Inki Dae @ 2013-06-21 16:55 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Russell King - ARM Linux, linux-fbdev, YoungJun Cho,
	Kyungmin Park, myungjoo.ham, DRI mailing list,
	linux-arm-kernel@lists.infradead.org, linux-media@vger.kernel.org
In-Reply-To: <1371817628.5882.13.camel@weser.hi.pengutronix.de>

2013/6/21 Lucas Stach <l.stach@pengutronix.de>:
> Hi Inki,
>
> please refrain from sending HTML Mails, it makes proper quoting without
> messing up the layout everywhere pretty hard.
>

Sorry about that. I should have used text mode.

> Am Freitag, den 21.06.2013, 20:01 +0900 schrieb Inki Dae:
> [...]
>
>>         Yeah, you'll some knowledge and understanding about the API
>>         you are
>>         working with to get things right. But I think it's not an
>>         unreasonable
>>         thing to expect the programmer working directly with kernel
>>         interfaces
>>         to read up on how things work.
>>
>>         Second thing: I'll rather have *one* consistent API for every
>>         subsystem,
>>         even if they differ from each other than having to implement
>>         this
>>         syncpoint thing in every subsystem. Remember: a single execbuf
>>         in DRM
>>         might reference both GEM objects backed by dma-buf as well
>>         native SHM or
>>         CMA backed objects. The dma-buf-mgr proposal already allows
>>         you to
>>         handle dma-bufs much the same way during validation than
>>         native GEM
>>         objects.
>>
>> Actually, at first I had implemented a fence helper framework based on
>> reservation and dma fence to provide easy-use-interface for device
>> drivers. However, that was wrong implemention: I had not only
>> customized the dma fence but also not considered dead lock issue.
>> After that, I have reimplemented it as dmabuf sync to solve dead
>> issue, and at that time, I realized that we first need to concentrate
>> on the most basic thing: the fact CPU and CPU, CPU and DMA, or DMA and
>> DMA can access a same buffer, And the fact simple is the best, and the
>> fact we need not only kernel side but also user side interfaces. After
>> that, I collected what is the common part for all subsystems, and I
>> have devised this dmabuf sync framework for it. I'm not really
>> specialist in Desktop world. So question. isn't the execbuf used only
>> for the GPU? the gpu has dedicated video memory(VRAM) so it needs
>> migration mechanism between system memory and the dedicated video
>> memory, and also to consider ordering issue while be migrated.
>>
>
> Yeah, execbuf is pretty GPU specific, but I don't see how this matters
> for this discussion. Also I don't see a big difference between embedded
> and desktop GPUs. Buffer migration is more of a detail here. Both take
> command stream that potentially reference other buffers, which might be
> native GEM or dma-buf backed objects. Both have to make sure the buffers
> are in the right domain (caches cleaned and address mappings set up) and
> are available for the desired operation, meaning you have to sync with
> other DMA engines and maybe also with CPU.

Yeah, right. Then, in case of desktop gpu, does't it need additional
something to do when a buffer/s is/are migrated from system memory to
video memory domain, or from video memory to system memory domain? I
guess the below members does similar thing, and all other DMA devices
would not need them:
        struct fence {
                  ...
                  unsigned int context, seqno;
                  ...
        };

And,
       struct seqno_fence {
                 ...
                 uint32_t seqno_ofs;
                 ...
       };

>
> The only case where sync isn't clearly defined right now by the current
> API entrypoints is when you access memory through the dma-buf fallback
> mmap support, which might happen with some software processing element
> in a video pipeline or something. I agree that we will need a userspace
> interface here, but I think this shouldn't be yet another sync object,
> but rather more a prepare/fini_cpu_access ioctl on the dma-buf which
> hooks into the existing dma-fence and reservation stuff.

I think we don't need addition ioctl commands for that. I am thinking
of using existing resources as possible. My idea also is similar in
using the reservation stuff to your idea because my approach also
should use the dma-buf resource. However, My idea is that a user
process, that wants buffer synchronization with the other, sees a sync
object as a file descriptor like dma-buf does. The below shows simple
my idea about it:

ioctl(dmabuf_fd, DMA_BUF_IOC_OPEN_SYNC, &sync);

flock(sync->fd, LOCK_SH); <- LOCK_SH means a shared lock.
CPU access for read
flock(sync->fd, LOCK_UN);

Or

flock(sync->fd, LOCK_EX); <- LOCK_EX means an exclusive lock
CPU access for write
flock(sync->fd, LOCK_UN);

close(sync->fd);

As you know, that's similar to dmabuf export feature.

In addition, a more simple idea,
flock(dmabuf_fd, LOCK_SH/EX);
CPU access for read/write
flock(dmabuf_fd, LOCK_UN);

However, I'm not sure that the above examples could be worked well,
and there are no problems yet: actually, I don't fully understand
flock mechanism, so looking into it.

>
>>
>>         And to get back to my original point: if you have more than
>>         one task
>>         operating together on a buffer you absolutely need some kind
>>         of real IPC
>>         to sync them up and do something useful. Both you syncpoints
>>         and the
>>         proposed dma-fences only protect the buffer accesses to make
>>         sure
>>         different task don't stomp on each other. There is nothing in
>>         there to
>>         make sure that the output of your pipeline is valid. You have
>>         to take
>>         care of that yourself in userspace. I'll reuse your example to
>>         make it
>>         clear what I mean:
>>
>>         Task A                                         Task B
>>         ------                                         -------
>>         dma_buf_sync_lock(buf1)
>>         CPU write buf1
>>         dma_buf_sync_unlock(buf1)
>>                   ---------schedule Task A again-------
>>         dma_buf_sync_lock(buf1)
>>         CPU write buf1
>>         dma_buf_sync_unlock(buf1)
>>                     ---------schedule Task B---------
>>                                                        qbuf(buf1)
>>
>>         dma_buf_sync_lock(buf1)
>>                                                        ....
>>
>>         This is what can happen if you don't take care of proper
>>         syncing. Task A
>>         writes something to the buffer in expectation that Task B will
>>         take care
>>         of it, but before Task B even gets scheduled Task A overwrites
>>         the
>>         buffer again. Not what you wanted, isn't it?
>>
>> Exactly wrong example. I had already mentioned about that. "In case
>> that data flow goes from A to B, it needs some kind of IPC between the
>> two tasks every time"  So again, your example would have no any
>> problem in case that *two tasks share the same buffer but these tasks
>> access the buffer(buf1) as write, and data of the buffer(buf1) isn't
>> needed to be shared*.  They just need to use the buffer as *storage*.
>> So All they want is to avoid stomping on the buffer in this case.
>>
> Sorry, but I don't see the point. If no one is interested in the data of
> the buffer, why are you sharing it in the first place?
>

Just used as a storage. i.e., Task A fills the buffer with "AAAAAA"
using CPU, And Task B fills the buffer with "BBBBBB" using DMA. They
don't share data of the buffer, but they share *memory region* of the
buffer. That would be very useful for the embedded systems with very
small size system memory.

>>
>>         So to make sure the output of a pipeline of some kind is what
>>         you expect
>>         you have to do syncing with IPC
>>
>> So not true.
>>
>>         . And once you do CPU access it is a
>>         synchronous thing in the stream of events. I see that you
>>         might want to
>>         have some kind of bracketed CPU access even for the fallback
>>         mmap case
>>         for things like V4L2 that don't provide explicit sync by their
>>         own, but
>>         in no way I can see why we would need a user/kernel shared
>>         syncpoint for
>>         this.
>>
>>         > > A more advanced way to achieve this
>>         > > would be using cross-device fences to avoid going through
>>         userspace for
>>         > > every syncpoint.
>>         > >
>>         >
>>         > Ok, maybe there is something I missed. So question. What is
>>         the
>>         > cross-device fences? dma fence?. And how we can achieve the
>>         > synchronization mechanism without going through user space
>>         for every
>>         > syncpoint; CPU and DMA share a same buffer?. And could you
>>         explain it
>>         > in detail as long as possible like I did?
>>         >
>>
>>         Yeah I'm talking about the proposed dma-fences. They would
>>         allow you to
>>         just queue things into the kernel without waiting for a device
>>         operation
>>         to finish. But you still have to make sure that your commands
>>         have the
>>         right order and don't go wild. So for example you could do
>>         something
>>         like this:
>>
>>         Userspace                                   Kernel
>>         ---------                                   ------
>>         1. build DRM command stream
>>         rendering into buf1
>>         2. queue command stream with execbuf
>>                                                     1. validate
>>         command stream
>>                                                      1.1 reference
>>         buf1 for writing
>>                                                          through
>>         dma-buf-mgr
>>                                                     2. kick off GPU
>>         processing
>>         3. qbuf buf1 into V4L2
>>                                                     3. reference buf1
>>         for reading
>>                                                      3.1 wait for
>>         fence from GPU to
>>                                                          signal
>>                                                     4. kick off V4L2
>>         processing
>>
>>
>> That seems like very specific to Desktop GPU. isn't it?
>>
> Would you mind explaining what you think is desktop specific about that?
>

Sorry. there was my misunderstanding. That's not specfic to desktop
gpu. I guess the V4L2 device and GPU device are in a same graphic card
in case of desktop pc, and also they would use same memory domain.
However, If we use other device, that uses system memory as dma
buffer, instead of V4L2 device then doesn't it need specific something
to handle memory domain migration issue? My point is that such thing
is not common portion we want to do. I think things common to entire
archtectures such as ARM, x86, and so on should be placed in
drivers/base/here.

That's just my opinion.

Thanks,
Inki Dae

> Regards,
> Lucas
>
> --
> Pengutronix e.K.                           | Lucas Stach                 |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
>

^ permalink raw reply

* Re: MMP display subsystem questions
From: Daniel Drake @ 2013-06-21 16:14 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <51C3BD65.8070206@marvell.com>

On Thu, Jun 20, 2013 at 8:41 PM, Zhou Zhu <zzhu3@marvell.com> wrote:
> I admit it's an issue of no documents. We would submit a patch to add
> description under Documentation soon later.

That would be much appreciated. Without the required background I am
still having trouble fully understanding your reply and reviewing your
patches.

>> The overlay's address is written to a register that is specific to
>> it's parent path. If we are dealing with two overlays, how does this
>> work? We write both overlay addresses to the same register and the
>> second one "wins"? I checked all of the other overlay-related
>> functions and they all defer to the parent path as well.
>>
>> As an experiment I modified platform data to suggest that my path has
>> 98 overlays and the framebuffer should run on overlay 33. Nonsense,
>> but the framebuffer booted up as normal. It seems like there is
>> something missing in this path/overlay relationship.
>>
>> What is the plan going forward for this overlay management? At the
>> moment there is only one consumer of overlays in the kernel - the
>> framebuffer driver. And the framebuffer will only ever use one. Who
>> are the other intended in-kernel users of overlays?
>
> We have already pushed patches to fix this issue and support overlays.
> Please see patches below:
> [V2 7/7] video: mmp: add video layer set win/addr operations support
> http://lists.infradead.org/pipermail/linux-arm-kernel/2013-June/175071.html

At a glance, I cannot see how this patch answers any of my questions.
I will attempt to review the patches in that series and let you know
my difficulties in understanding. Hopefully that will help us write
good documentation.

>> path_config: this value gets written to registers LCD_SCLK *and*
>> IOPAD_CONTROL. The bit definitions in these registers are totally
>> different. It seems like nonsense to share the same configuration
>> value between two diversely different registers - what is going on
>> here?
>>
>> link_config: Seems to have a dual meaning. Upper 4 bits can be used to
>> specify dumb panel mode in registers like LCD_DUMB_CTRL. Bit 0 is
>> totally unrelated and triggers "swap RB" i.e. use BGR instead of RGB,
>> when dealing with totally different registers. Other bits are ignored.
>> If "disorganised" bitfields are to be haphazardly invented in this way
>> it would at least be nice to have documentation.
>
> Actually we are using mixed configs here and it is not so clear.
> We may update these config to separated configs with more clearly meanings.
> Currently 4 fields is here:
> SCLK source: select source clock for sclk. We have plan to remove it and
> move the source selection into common clock driver.
> IOPAD CONTROL: for iopad control
> DUMB CONTROL: for dumb panel mode
> RB SWAP: for rb swaps in link. Actually it's do required as for some panels
> would swap r/b links.

Yes, separating this makes sense. I look forward to the patch, with
the kind of documentation written above.

> Actually in our usage mode, we'd rather prefer to make one fb connected to
> one path.
> In many usage mode, we may need 2 or more devices (panel, external tv...)
> turned on together so 2 or more fb devices are required to connect to
> separate path.

That makes sense.

>> 4. Which overlay to use - according to my investigation above, this
>> doesn't actually have any meaningful effect on the driver.
>
> As overlay enabled in patch above now, this config is required to make fb
> show on some overlay and other interface(v4l2, private interface) to use
> others.

I don't understand this, but it might be easier to understand once you
have documented what an overlay is and how it relates to a path. How
is v4l2 going to interact with this driver? What private interfaces
are you referring to?

> So according to comments above, fb buffer is still need to be configure in
> dts for path/overlay configs.

Or can we just auto-instantiate one framebuffer per path with good defaults?

> Also there still might be some settings in fb for some further settings like
> vsync usage or yres_virtual size in coming patches.

Wouldn't such information be represented in the modes supported by the panel?

A couple more questions:

1. What is the invert_pixclock setting in the mmp_mode structure? This
causes a currently unused bit to be set in fb_videomode->vmode to be
set (seems dangerous). And then nothing acts upon this bit, as far as
I can see.

2. What about pix_fmt_out? Why is pixel format dumped into the mode
specifier? If there is a supported panel that really can support
different pixel formats, I would expect it to support all the pixel
formats for all the resolution. This also seems unused within the
driver.

Thanks
Daniel

^ permalink raw reply

* Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework
From: Lucas Stach @ 2013-06-21 12:27 UTC (permalink / raw)
  To: Inki Dae
  Cc: Russell King - ARM Linux, linux-fbdev, YoungJun Cho,
	Kyungmin Park, myungjoo.ham, DRI mailing list,
	linux-arm-kernel@lists.infradead.org, linux-media@vger.kernel.org
In-Reply-To: <CAAQKjZOxOMuL3zh_yV7tU2LBcZ7oVryiKa+LgjTM5HLY+va8zQ@mail.gmail.com>

Hi Inki,

please refrain from sending HTML Mails, it makes proper quoting without
messing up the layout everywhere pretty hard.

Am Freitag, den 21.06.2013, 20:01 +0900 schrieb Inki Dae:
[...]

>         Yeah, you'll some knowledge and understanding about the API
>         you are
>         working with to get things right. But I think it's not an
>         unreasonable
>         thing to expect the programmer working directly with kernel
>         interfaces
>         to read up on how things work.
>         
>         Second thing: I'll rather have *one* consistent API for every
>         subsystem,
>         even if they differ from each other than having to implement
>         this
>         syncpoint thing in every subsystem. Remember: a single execbuf
>         in DRM
>         might reference both GEM objects backed by dma-buf as well
>         native SHM or
>         CMA backed objects. The dma-buf-mgr proposal already allows
>         you to
>         handle dma-bufs much the same way during validation than
>         native GEM
>         objects.
>  
> Actually, at first I had implemented a fence helper framework based on
> reservation and dma fence to provide easy-use-interface for device
> drivers. However, that was wrong implemention: I had not only
> customized the dma fence but also not considered dead lock issue.
> After that, I have reimplemented it as dmabuf sync to solve dead
> issue, and at that time, I realized that we first need to concentrate
> on the most basic thing: the fact CPU and CPU, CPU and DMA, or DMA and
> DMA can access a same buffer, And the fact simple is the best, and the
> fact we need not only kernel side but also user side interfaces. After
> that, I collected what is the common part for all subsystems, and I
> have devised this dmabuf sync framework for it. I'm not really
> specialist in Desktop world. So question. isn't the execbuf used only
> for the GPU? the gpu has dedicated video memory(VRAM) so it needs
> migration mechanism between system memory and the dedicated video
> memory, and also to consider ordering issue while be migrated.
>  

Yeah, execbuf is pretty GPU specific, but I don't see how this matters
for this discussion. Also I don't see a big difference between embedded
and desktop GPUs. Buffer migration is more of a detail here. Both take
command stream that potentially reference other buffers, which might be
native GEM or dma-buf backed objects. Both have to make sure the buffers
are in the right domain (caches cleaned and address mappings set up) and
are available for the desired operation, meaning you have to sync with
other DMA engines and maybe also with CPU.

The only case where sync isn't clearly defined right now by the current
API entrypoints is when you access memory through the dma-buf fallback
mmap support, which might happen with some software processing element
in a video pipeline or something. I agree that we will need a userspace
interface here, but I think this shouldn't be yet another sync object,
but rather more a prepare/fini_cpu_access ioctl on the dma-buf which
hooks into the existing dma-fence and reservation stuff.

>         
>         And to get back to my original point: if you have more than
>         one task
>         operating together on a buffer you absolutely need some kind
>         of real IPC
>         to sync them up and do something useful. Both you syncpoints
>         and the
>         proposed dma-fences only protect the buffer accesses to make
>         sure
>         different task don't stomp on each other. There is nothing in
>         there to
>         make sure that the output of your pipeline is valid. You have
>         to take
>         care of that yourself in userspace. I'll reuse your example to
>         make it
>         clear what I mean:
>         
>         Task A                                         Task B
>         ------                                         -------
>         dma_buf_sync_lock(buf1)
>         CPU write buf1
>         dma_buf_sync_unlock(buf1)
>                   ---------schedule Task A again-------
>         dma_buf_sync_lock(buf1)
>         CPU write buf1
>         dma_buf_sync_unlock(buf1)
>                     ---------schedule Task B---------
>                                                        qbuf(buf1)
>         
>         dma_buf_sync_lock(buf1)
>                                                        ....
>         
>         This is what can happen if you don't take care of proper
>         syncing. Task A
>         writes something to the buffer in expectation that Task B will
>         take care
>         of it, but before Task B even gets scheduled Task A overwrites
>         the
>         buffer again. Not what you wanted, isn't it?
>  
> Exactly wrong example. I had already mentioned about that. "In case
> that data flow goes from A to B, it needs some kind of IPC between the
> two tasks every time"  So again, your example would have no any
> problem in case that *two tasks share the same buffer but these tasks
> access the buffer(buf1) as write, and data of the buffer(buf1) isn't
> needed to be shared*.  They just need to use the buffer as *storage*.
> So All they want is to avoid stomping on the buffer in this case.
>  
Sorry, but I don't see the point. If no one is interested in the data of
the buffer, why are you sharing it in the first place?

>         
>         So to make sure the output of a pipeline of some kind is what
>         you expect
>         you have to do syncing with IPC
>  
> So not true.
>  
>         . And once you do CPU access it is a
>         synchronous thing in the stream of events. I see that you
>         might want to
>         have some kind of bracketed CPU access even for the fallback
>         mmap case
>         for things like V4L2 that don't provide explicit sync by their
>         own, but
>         in no way I can see why we would need a user/kernel shared
>         syncpoint for
>         this.
>         
>         > > A more advanced way to achieve this
>         > > would be using cross-device fences to avoid going through
>         userspace for
>         > > every syncpoint.
>         > >
>         >
>         > Ok, maybe there is something I missed. So question. What is
>         the
>         > cross-device fences? dma fence?. And how we can achieve the
>         > synchronization mechanism without going through user space
>         for every
>         > syncpoint; CPU and DMA share a same buffer?. And could you
>         explain it
>         > in detail as long as possible like I did?
>         >
>         
>         Yeah I'm talking about the proposed dma-fences. They would
>         allow you to
>         just queue things into the kernel without waiting for a device
>         operation
>         to finish. But you still have to make sure that your commands
>         have the
>         right order and don't go wild. So for example you could do
>         something
>         like this:
>         
>         Userspace                                   Kernel
>         ---------                                   ------
>         1. build DRM command stream
>         rendering into buf1
>         2. queue command stream with execbuf
>                                                     1. validate
>         command stream
>                                                      1.1 reference
>         buf1 for writing
>                                                          through
>         dma-buf-mgr
>                                                     2. kick off GPU
>         processing
>         3. qbuf buf1 into V4L2
>                                                     3. reference buf1
>         for reading
>                                                      3.1 wait for
>         fence from GPU to
>                                                          signal
>                                                     4. kick off V4L2
>         processing
>         
>  
> That seems like very specific to Desktop GPU. isn't it?
>  
Would you mind explaining what you think is desktop specific about that?

Regards,
Lucas

-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


^ permalink raw reply

* Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework
From: Lucas Stach @ 2013-06-21  8:54 UTC (permalink / raw)
  To: Inki Dae
  Cc: 'Russell King - ARM Linux', 'Inki Dae',
	'linux-fbdev', 'YoungJun Cho',
	'Kyungmin Park', 'myungjoo.ham',
	'DRI mailing list', linux-arm-kernel, linux-media
In-Reply-To: <010801ce6da7$896affe0$9c40ffa0$%dae@samsung.com>

Am Donnerstag, den 20.06.2013, 20:15 +0900 schrieb Inki Dae:
[...]
> > > > You already need some kind of IPC between the two tasks, as I suspect
> > > > even in your example it wouldn't make much sense to queue the buffer
> > > > over and over again in task B without task A writing anything to it.
> > So
> > > > task A has to signal task B there is new data in the buffer to be
> > > > processed.
> > > >
> > > > There is no need to share the buffer over and over again just to get
> > the
> > > > two processes to work together on the same thing. Just share the fd
> > > > between both and then do out-of-band completion signaling, as you need
> > > > this anyway. Without this you'll end up with unpredictable behavior.
> > > > Just because sync allows you to access the buffer doesn't mean it's
> > > > valid for your use-case. Without completion signaling you could easily
> > > > end up overwriting your data from task A multiple times before task B
> > > > even tries to lock the buffer for processing.
> > > >
> > > > So the valid flow is (and this already works with the current APIs):
> > > > Task A                                    Task B
> > > > ------                                    ------
> > > > CPU access buffer
> > > >          ----------completion signal--------->
> > > >                                           qbuf (dragging buffer into
> > > >                                           device domain, flush caches,
> > > >                                           reserve buffer etc.)
> > > >                                                     |
> > > >                                           wait for device operation to
> > > >                                           complete
> > > >                                                     |
> > > >                                           dqbuf (dragging buffer back
> > > >                                           into CPU domain, invalidate
> > > >                                           caches, unreserve)
> > > >         <---------completion signal------------
> > > > CPU access buffer
> > > >
> > >
> > > Correct. In case that data flow goes from A to B, it needs some kind
> > > of IPC between the two tasks every time as you said. Then, without
> > > dmabuf-sync, how do think about the case that two tasks share the same
> > > buffer but these tasks access the buffer(buf1) as write, and data of
> > > the buffer(buf1) isn't needed to be shared?
> > >
> > Sorry, I don't see the point you are trying to solve here. If you share
> > a buffer and want its content to be clearly defined at every point in
> > time you have to synchronize the tasks working with the buffer, not just
> > the buffer accesses itself.
> > 
> > Easiest way to do so is doing sync through userspace with out-of-band
> > IPC, like in the example above.
> 
> In my opinion, that's not definitely easiest way. What I try to do is
> to avoid using *the out-of-band IPC*. As I mentioned in document file,
> the conventional mechanism not only makes user application
> complicated-user process needs to understand how the device driver is
> worked-but also may incur performance overhead by using the
> out-of-band IPC. The above my example may not be enough to you but
> there would be other cases able to use my approach efficiently.
> 

Yeah, you'll some knowledge and understanding about the API you are
working with to get things right. But I think it's not an unreasonable
thing to expect the programmer working directly with kernel interfaces
to read up on how things work.

Second thing: I'll rather have *one* consistent API for every subsystem,
even if they differ from each other than having to implement this
syncpoint thing in every subsystem. Remember: a single execbuf in DRM
might reference both GEM objects backed by dma-buf as well native SHM or
CMA backed objects. The dma-buf-mgr proposal already allows you to
handle dma-bufs much the same way during validation than native GEM
objects.

And to get back to my original point: if you have more than one task
operating together on a buffer you absolutely need some kind of real IPC
to sync them up and do something useful. Both you syncpoints and the
proposed dma-fences only protect the buffer accesses to make sure
different task don't stomp on each other. There is nothing in there to
make sure that the output of your pipeline is valid. You have to take
care of that yourself in userspace. I'll reuse your example to make it
clear what I mean:

Task A                                         Task B
------                                         -------
dma_buf_sync_lock(buf1)
CPU write buf1
dma_buf_sync_unlock(buf1)
          ---------schedule Task A again-------
dma_buf_sync_lock(buf1)
CPU write buf1
dma_buf_sync_unlock(buf1)
            ---------schedule Task B---------
                                               qbuf(buf1)
                                                  dma_buf_sync_lock(buf1)
                                               ....

This is what can happen if you don't take care of proper syncing. Task A
writes something to the buffer in expectation that Task B will take care
of it, but before Task B even gets scheduled Task A overwrites the
buffer again. Not what you wanted, isn't it?

So to make sure the output of a pipeline of some kind is what you expect
you have to do syncing with IPC. And once you do CPU access it is a
synchronous thing in the stream of events. I see that you might want to
have some kind of bracketed CPU access even for the fallback mmap case
for things like V4L2 that don't provide explicit sync by their own, but
in no way I can see why we would need a user/kernel shared syncpoint for
this.

> > A more advanced way to achieve this
> > would be using cross-device fences to avoid going through userspace for
> > every syncpoint.
> > 
> 
> Ok, maybe there is something I missed. So question. What is the
> cross-device fences? dma fence?. And how we can achieve the
> synchronization mechanism without going through user space for every
> syncpoint; CPU and DMA share a same buffer?. And could you explain it
> in detail as long as possible like I did?
> 
Yeah I'm talking about the proposed dma-fences. They would allow you to
just queue things into the kernel without waiting for a device operation
to finish. But you still have to make sure that your commands have the
right order and don't go wild. So for example you could do something
like this:

Userspace                                   Kernel
---------                                   ------
1. build DRM command stream
rendering into buf1
2. queue command stream with execbuf
                                            1. validate command stream
                                             1.1 reference buf1 for writing
                                                 through dma-buf-mgr
                                            2. kick off GPU processing
3. qbuf buf1 into V4L2
                                            3. reference buf1 for reading
                                             3.1 wait for fence from GPU to
                                                 signal
                                            4. kick off V4L2 processing

So you don't need to wait in userspace and potentially avoid some
context switches, but you still have to make sure that GPU commands are
queued before you queue the V4L2 operation to make sure things get
operated on in the right order.

Regards,
Lucas

-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


^ permalink raw reply

* Re: MMP display subsystem questions
From: Zhou Zhu @ 2013-06-21  2:41 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <CAMLZHHTmSzzcL-aqk6rMjuD8Ct2zno--p7tUGfVVsocWiyhf8A@mail.gmail.com>

Hi, Daniel,

Thank you for using our soc/drivers and thank you for your suggestions.
Please see our comments below.

On 06/20/2013 01:26 AM, Daniel Drake wrote:
> Is there a design document available somewhere? There are very few
> explanatory comments in the code. I am interested in definitions and
> explanations of:
>   - Paths
>   - Overlays
>   - dmafetch (part of overlay configuration)
>   - path_config (part of path configuration)
>   - link_config (part of path configuration)
>
> Looking at the code for answers, I am a little confused.
I admit it's an issue of no documents. We would submit a patch to add 
description under Documentation soon later.
>
> A path is an output device (e.g. a panel or TV). There can be multiple
> outputs connected to the display controller. At the moment I am just
> working with a single dumb panel.
>
> A path can have many overlays, and according to the original commit
> message, an overlay is a buffer displayed on the path's output device.
> However, I simply cannot see how multiple overlays would be supported
> on a path. For example let's look at this function:
>
> static int overlay_set_addr(struct mmp_overlay *overlay, struct mmp_addr *addr)
> {
> struct lcd_regs *regs = path_regs(overlay->path);
>
> /* FIXME: assert addr supported */
> memcpy(&overlay->addr, addr, sizeof(struct mmp_win));
> writel(addr->phys[0], &regs->g_0);
>
> return overlay->addr.phys[0];
> }
>
> The overlay's address is written to a register that is specific to
> it's parent path. If we are dealing with two overlays, how does this
> work? We write both overlay addresses to the same register and the
> second one "wins"? I checked all of the other overlay-related
> functions and they all defer to the parent path as well.
>
> As an experiment I modified platform data to suggest that my path has
> 98 overlays and the framebuffer should run on overlay 33. Nonsense,
> but the framebuffer booted up as normal. It seems like there is
> something missing in this path/overlay relationship.
>
> What is the plan going forward for this overlay management? At the
> moment there is only one consumer of overlays in the kernel - the
> framebuffer driver. And the framebuffer will only ever use one. Who
> are the other intended in-kernel users of overlays?
We have already pushed patches to fix this issue and support overlays. 
Please see patches below:
[V2 7/7] video: mmp: add video layer set win/addr operations support
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-June/175071.html
>
> Moving onto dmafetch_id. Can't find any documentation here. It looks
> like the only meaningful values are 0 and 1, and the only difference
> is that overlay_is_vid() returns 1 if dmafetch_id is 1, and 0
> otherwise. This causes some minor differences in the way registers get
> programmed, but I don't understand exactly what the consequences are
> here. My framebuffer works with dmafetch_id as both 0 and 1.
Yes it's a legacy config there. Please ignore it. We would submit a new 
patch totally remove it soon later.
>
> path_config: this value gets written to registers LCD_SCLK *and*
> IOPAD_CONTROL. The bit definitions in these registers are totally
> different. It seems like nonsense to share the same configuration
> value between two diversely different registers - what is going on
> here?
>
> link_config: Seems to have a dual meaning. Upper 4 bits can be used to
> specify dumb panel mode in registers like LCD_DUMB_CTRL. Bit 0 is
> totally unrelated and triggers "swap RB" i.e. use BGR instead of RGB,
> when dealing with totally different registers. Other bits are ignored.
> If "disorganised" bitfields are to be haphazardly invented in this way
> it would at least be nice to have documentation.
Actually we are using mixed configs here and it is not so clear.
We may update these config to separated configs with more clearly meanings.
Currently 4 fields is here:
SCLK source: select source clock for sclk. We have plan to remove it and 
move the source selection into common clock driver.
IOPAD CONTROL: for iopad control
DUMB CONTROL: for dumb panel mode
RB SWAP: for rb swaps in link. Actually it's do required as for some 
panels would swap r/b links.
We have 2 swaps there, one is after dma fetch for each overlay and 
another is in output after overlay mixing.
We could not simply use bgr format as input might be yuv for which yvu 
would not simple converted to bgr.
Also for input/output swap distinguish, a fix patch is there:
[V2 1/7] video: mmp: rb swap setting update for new LCD driver
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-June/175070.html
We would push patches to make these configs more clear soon later and 
also these description would be added into adding Documentation patch.
>
> Moving onto the devicetree definition. It is clear that representing
> the display controller and the panel in the device tree makes sense;
> this can cause the appropriate probes to happen. But we additionally
> need to trigger the probe of the framebuffer driver. I am not sure if
> the framebuffer that Linux will setup is something that should be
> represented in the device tree (remember that we already have display
> controller and panel). Looking at the configuration items needed to
> setup the framebuffer:
>
> 1. Name - doesn't really matter.
>
> 2. Pixel format - surely this question applies to all framebuffer
> drivers. I guess there is a default, plus a standard framebuffer
> interface that allows it to be changed?
Yes you are right, it's just a default value and would be changed after 
system boot.
>
> 3. Which path to use - how do other framebuffer drivers deal with
> this? The question here is basically do we output to the laptop's
> inbuilt LCD panel, or to the externally connected HDMI TV? Again,
> maybe we can choose a sensible default and allow runtime
> configuration.
Actually in our usage mode, we'd rather prefer to make one fb connected 
to one path.
In many usage mode, we may need 2 or more devices (panel, external 
tv...) turned on together so 2 or more fb devices are required to 
connect to separate path.
>
> 4. Which overlay to use - according to my investigation above, this
> doesn't actually have any meaningful effect on the driver.
As overlay enabled in patch above now, this config is required to make 
fb show on some overlay and other interface(v4l2, private interface) to 
use others.
>
> 5. dmafetch_id - its not clear to me what this is.
>
>
> So, I don't think the framebuffer is something that should be defined
> by the platform (or the device tree). Instead, maybe it should just be
> initialized with good defaults when the first path becomes ready for
> use.
So according to comments above, fb buffer is still need to be configure 
in dts for path/overlay configs.
Also there still might be some settings in fb for some further settings 
like vsync usage or yres_virtual size in coming patches.
>
> Any clarifications appreciated.
>
> Thanks
> Daniel
>

-- 
Thanks,
-Zhou

^ permalink raw reply

* Re: [PATCH resend] fbdev: bfin-lq035q1-fb: Use dev_pm_ops
From: Jingoo Han @ 2013-06-21  0:02 UTC (permalink / raw)
  To: linux-fbdev
In-Reply-To: <1371746326-5087-1-git-send-email-lars@metafoo.de>

On Friday, June 21, 2013 1:39 AM, Lars-Peter Clausen wrote:
> Use dev_pm_ops instead of the legacy suspend/resume callbacks.
> 
> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
> Acked-by: Michael Hennerich <michael.hennerich@analog.com>

It looks good.

> ---
> Since Rafael said he won't take the pm_sleep_ops_ptr() macro, just a resend of
> the previous version of the patch.

Yep, other better and nicer scheme will be merged. :)

Best regards,
Jingoo Han

> ---
>  drivers/video/bfin-lq035q1-fb.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/video/bfin-lq035q1-fb.c b/drivers/video/bfin-lq035q1-fb.c
> index 29d8c04..6084c17 100644
> --- a/drivers/video/bfin-lq035q1-fb.c
> +++ b/drivers/video/bfin-lq035q1-fb.c
> @@ -170,16 +170,19 @@ static int lq035q1_spidev_remove(struct spi_device *spi)
>  	return lq035q1_control(spi, LQ035_SHUT_CTL, LQ035_SHUT);
>  }
> 
> -#ifdef CONFIG_PM
> -static int lq035q1_spidev_suspend(struct spi_device *spi, pm_message_t state)
> +#ifdef CONFIG_PM_SLEEP
> +static int lq035q1_spidev_suspend(struct device *dev)
>  {
> +	struct spi_device *spi = to_spi_device(dev);
> +
>  	return lq035q1_control(spi, LQ035_SHUT_CTL, LQ035_SHUT);
>  }
> 
> -static int lq035q1_spidev_resume(struct spi_device *spi)
> +static int lq035q1_spidev_resume(struct device *dev)
>  {
> -	int ret;
> +	struct spi_device *spi = to_spi_device(dev);
>  	struct spi_control *ctl = spi_get_drvdata(spi);
> +	int ret;
> 
>  	ret = lq035q1_control(spi, LQ035_DRIVER_OUTPUT_CTL, ctl->mode);
>  	if (ret)
> @@ -187,9 +190,13 @@ static int lq035q1_spidev_resume(struct spi_device *spi)
> 
>  	return lq035q1_control(spi, LQ035_SHUT_CTL, LQ035_ON);
>  }
> +
> +static SIMPLE_DEV_PM_OPS(lq035q1_spidev_pm_ops, lq035q1_spidev_suspend,
> +	lq035q1_spidev_resume);
> +#define LQ035Q1_SPIDEV_PM_OPS (&lq035q1_spidev_pm_ops)
> +
>  #else
> -# define lq035q1_spidev_suspend NULL
> -# define lq035q1_spidev_resume  NULL
> +#define LQ035Q1_SPIDEV_PM_OPS NULL
>  #endif
> 
>  /* Power down all displays on reboot, poweroff or halt */
> @@ -708,8 +715,7 @@ static int bfin_lq035q1_probe(struct platform_device *pdev)
>  	info->spidrv.probe    = lq035q1_spidev_probe;
>  	info->spidrv.remove   = lq035q1_spidev_remove;
>  	info->spidrv.shutdown = lq035q1_spidev_shutdown;
> -	info->spidrv.suspend  = lq035q1_spidev_suspend;
> -	info->spidrv.resume   = lq035q1_spidev_resume;
> +	info->spidrv.driver.pm = LQ035Q1_SPIDEV_PM_OPS;
> 
>  	ret = spi_register_driver(&info->spidrv);
>  	if (ret < 0) {
> --
> 1.8.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply

* [PATCH resend] fbdev: bfin-lq035q1-fb: Use dev_pm_ops
From: Lars-Peter Clausen @ 2013-06-20 16:38 UTC (permalink / raw)
  To: linux-fbdev

Use dev_pm_ops instead of the legacy suspend/resume callbacks.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Acked-by: Michael Hennerich <michael.hennerich@analog.com>
---
Since Rafael said he won't take the pm_sleep_ops_ptr() macro, just a resend of
the previous version of the patch.
---
 drivers/video/bfin-lq035q1-fb.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/video/bfin-lq035q1-fb.c b/drivers/video/bfin-lq035q1-fb.c
index 29d8c04..6084c17 100644
--- a/drivers/video/bfin-lq035q1-fb.c
+++ b/drivers/video/bfin-lq035q1-fb.c
@@ -170,16 +170,19 @@ static int lq035q1_spidev_remove(struct spi_device *spi)
 	return lq035q1_control(spi, LQ035_SHUT_CTL, LQ035_SHUT);
 }
 
-#ifdef CONFIG_PM
-static int lq035q1_spidev_suspend(struct spi_device *spi, pm_message_t state)
+#ifdef CONFIG_PM_SLEEP
+static int lq035q1_spidev_suspend(struct device *dev)
 {
+	struct spi_device *spi = to_spi_device(dev);
+
 	return lq035q1_control(spi, LQ035_SHUT_CTL, LQ035_SHUT);
 }
 
-static int lq035q1_spidev_resume(struct spi_device *spi)
+static int lq035q1_spidev_resume(struct device *dev)
 {
-	int ret;
+	struct spi_device *spi = to_spi_device(dev);
 	struct spi_control *ctl = spi_get_drvdata(spi);
+	int ret;
 
 	ret = lq035q1_control(spi, LQ035_DRIVER_OUTPUT_CTL, ctl->mode);
 	if (ret)
@@ -187,9 +190,13 @@ static int lq035q1_spidev_resume(struct spi_device *spi)
 
 	return lq035q1_control(spi, LQ035_SHUT_CTL, LQ035_ON);
 }
+
+static SIMPLE_DEV_PM_OPS(lq035q1_spidev_pm_ops, lq035q1_spidev_suspend,
+	lq035q1_spidev_resume);
+#define LQ035Q1_SPIDEV_PM_OPS (&lq035q1_spidev_pm_ops)
+
 #else
-# define lq035q1_spidev_suspend NULL
-# define lq035q1_spidev_resume  NULL
+#define LQ035Q1_SPIDEV_PM_OPS NULL
 #endif
 
 /* Power down all displays on reboot, poweroff or halt */
@@ -708,8 +715,7 @@ static int bfin_lq035q1_probe(struct platform_device *pdev)
 	info->spidrv.probe    = lq035q1_spidev_probe;
 	info->spidrv.remove   = lq035q1_spidev_remove;
 	info->spidrv.shutdown = lq035q1_spidev_shutdown;
-	info->spidrv.suspend  = lq035q1_spidev_suspend;
-	info->spidrv.resume   = lq035q1_spidev_resume;
+	info->spidrv.driver.pm = LQ035Q1_SPIDEV_PM_OPS;
 
 	ret = spi_register_driver(&info->spidrv);
 	if (ret < 0) {
-- 
1.8.0


^ permalink raw reply related

* RE: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework
From: Inki Dae @ 2013-06-20 11:15 UTC (permalink / raw)
  To: linux-fbdev
In-Reply-To: <1371467722-665-1-git-send-email-inki.dae@samsung.com>



> -----Original Message-----
> From: Lucas Stach [mailto:l.stach@pengutronix.de]
> Sent: Thursday, June 20, 2013 7:11 PM
> To: Inki Dae
> Cc: 'Russell King - ARM Linux'; 'Inki Dae'; 'linux-fbdev'; 'YoungJun Cho';
> 'Kyungmin Park'; 'myungjoo.ham'; 'DRI mailing list'; linux-arm-
> kernel@lists.infradead.org; linux-media@vger.kernel.org
> Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
> framework
> 
> Am Donnerstag, den 20.06.2013, 17:24 +0900 schrieb Inki Dae:
> [...]
> > > > In addition, please see the below more detail examples.
> > > >
> > > > The conventional way (without dmabuf-sync) is:
> > > > Task A
> > > > ----------------------------
> > > >  1. CPU accesses buf
> > > >  2. Send the buf to Task B
> > > >  3. Wait for the buf from Task B
> > > >  4. go to 1
> > > >
> > > > Task B
> > > > ---------------------------
> > > > 1. Wait for the buf from Task A
> > > > 2. qbuf the buf
> > > >     2.1 insert the buf to incoming queue
> > > > 3. stream on
> > > >     3.1 dma_map_sg if ready, and move the buf to ready queue
> > > >     3.2 get the buf from ready queue, and dma start.
> > > > 4. dqbuf
> > > >     4.1 dma_unmap_sg after dma operation completion
> > > >     4.2 move the buf to outgoing queue
> > > > 5. back the buf to Task A
> > > > 6. go to 1
> > > >
> > > > In case that two tasks share buffers, and data flow goes from Task A
> to
> > > Task
> > > > B, we would need IPC operation to send and receive buffers properly
> > > between
> > > > those two tasks every time CPU or DMA access to buffers is started
> or
> > > > completed.
> > > >
> > > >
> > > > With dmabuf-sync:
> > > >
> > > > Task A
> > > > ----------------------------
> > > >  1. dma_buf_sync_lock <- synpoint (call by user side)
> > > >  2. CPU accesses buf
> > > >  3. dma_buf_sync_unlock <- syncpoint (call by user side)
> > > >  4. Send the buf to Task B (just one time)
> > > >  5. go to 1
> > > >
> > > >
> > > > Task B
> > > > ---------------------------
> > > > 1. Wait for the buf from Task A (just one time)
> > > > 2. qbuf the buf
> > > >     1.1 insert the buf to incoming queue
> > > > 3. stream on
> > > >     3.1 dma_buf_sync_lock <- syncpoint (call by kernel side)
> > > >     3.2 dma_map_sg if ready, and move the buf to ready queue
> > > >     3.3 get the buf from ready queue, and dma start.
> > > > 4. dqbuf
> > > >     4.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
> > > >     4.2 dma_unmap_sg after dma operation completion
> > > >     4.3 move the buf to outgoing queue
> > > > 5. go to 1
> > > >
> > > > On the other hand, in case of using dmabuf-sync, as you can see the
> > > above
> > > > example, we would need IPC operation just one time. That way, I
> think we
> > > > could not only reduce performance overhead but also make user
> > > application
> > > > simplified. Of course, this approach can be used for all DMA device
> > > drivers
> > > > such as DRM. I'm not a specialist in v4l2 world so there may be
> missing
> > > > point.
> > > >
> > >
> > > You already need some kind of IPC between the two tasks, as I suspect
> > > even in your example it wouldn't make much sense to queue the buffer
> > > over and over again in task B without task A writing anything to it.
> So
> > > task A has to signal task B there is new data in the buffer to be
> > > processed.
> > >
> > > There is no need to share the buffer over and over again just to get
> the
> > > two processes to work together on the same thing. Just share the fd
> > > between both and then do out-of-band completion signaling, as you need
> > > this anyway. Without this you'll end up with unpredictable behavior.
> > > Just because sync allows you to access the buffer doesn't mean it's
> > > valid for your use-case. Without completion signaling you could easily
> > > end up overwriting your data from task A multiple times before task B
> > > even tries to lock the buffer for processing.
> > >
> > > So the valid flow is (and this already works with the current APIs):
> > > Task A                                    Task B
> > > ------                                    ------
> > > CPU access buffer
> > >          ----------completion signal--------->
> > >                                           qbuf (dragging buffer into
> > >                                           device domain, flush caches,
> > >                                           reserve buffer etc.)
> > >                                                     |
> > >                                           wait for device operation to
> > >                                           complete
> > >                                                     |
> > >                                           dqbuf (dragging buffer back
> > >                                           into CPU domain, invalidate
> > >                                           caches, unreserve)
> > >         <---------completion signal------------
> > > CPU access buffer
> > >
> >
> > Correct. In case that data flow goes from A to B, it needs some kind
> > of IPC between the two tasks every time as you said. Then, without
> > dmabuf-sync, how do think about the case that two tasks share the same
> > buffer but these tasks access the buffer(buf1) as write, and data of
> > the buffer(buf1) isn't needed to be shared?
> >
> Sorry, I don't see the point you are trying to solve here. If you share
> a buffer and want its content to be clearly defined at every point in
> time you have to synchronize the tasks working with the buffer, not just
> the buffer accesses itself.
> 
> Easiest way to do so is doing sync through userspace with out-of-band
> IPC, like in the example above.

In my opinion, that's not definitely easiest way. What I try to do is to avoid using *the out-of-band IPC*. As I mentioned in document file, the conventional mechanism not only makes user application complicated-user process needs to understand how the device driver is worked-but also may incur performance overhead by using the out-of-band IPC. The above my example may not be enough to you but there would be other cases able to use my approach efficiently.

> A more advanced way to achieve this
> would be using cross-device fences to avoid going through userspace for
> every syncpoint.
> 

Ok, maybe there is something I missed. So question. What is the cross-device fences? dma fence?. And how we can achieve the synchronization mechanism without going through user space for every syncpoint; CPU and DMA share a same buffer?. And could you explain it in detail as long as possible like I did?

> >
> > With dmabuf-sync is:
> >
> >  Task A
> >  ----------------------------
> >  1. dma_buf_sync_lock <- synpoint (call by user side)
> >  2. CPU writes something to buf1
> >  3. dma_buf_sync_unlock <- syncpoint (call by user side)
> >  4. copy buf1 to buf2
> Random contents here? What's in the buffer, content from the CPU write,
> or from V4L2 device write?
> 

Please presume that buf1 is physically non contiguous memory, and buf2 is physically contiguous memory; device A without IOMMU is seeing buf2. We would need to copy buf1 to buf2 to send the contents of the buf1 to device A because DMA of the device A cannot access the buf1 directly. And CPU and V4L2 device don't share the contents of the buf1 but share the buf1 as storage.

Thanks,
Inki Dae

> >  5. go to 1
> >
> >
> >  Task B
> >  ---------------------------
> >  1. dma_buf_sync_lock
> >  2. CPU writes something to buf3
> >  3. dma_buf_sync_unlock
> >  4. qbuf the buf3(src) and buf1(dst)
> >      4.1 insert buf3,1 to incoming queue
> >      4.2 dma_buf_sync_lock <- syncpoint (call by kernel side)
> >  5. stream on
> >      5.1 dma_map_sg if ready, and move the buf to ready queue
> >      5.2 get the buf from ready queue, and dma start.
> >  6. dqbuf
> >      6.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
> >      6.2 dma_unmap_sg after dma operation completion
> >      6.3 move the buf3,1 to outgoing queue
> > 7. go to 1
> >
> 
> Regards,
> Lucas
> --
> Pengutronix e.K.                           | Lucas Stach                 |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


^ permalink raw reply

* Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework
From: Lucas Stach @ 2013-06-20 10:11 UTC (permalink / raw)
  To: Inki Dae
  Cc: 'Russell King - ARM Linux', 'Inki Dae',
	'linux-fbdev', 'YoungJun Cho',
	'Kyungmin Park', 'myungjoo.ham',
	'DRI mailing list', linux-arm-kernel, linux-media
In-Reply-To: <00db01ce6d8f$a3c23dd0$eb46b970$%dae@samsung.com>

Am Donnerstag, den 20.06.2013, 17:24 +0900 schrieb Inki Dae:
[...]
> > > In addition, please see the below more detail examples.
> > >
> > > The conventional way (without dmabuf-sync) is:
> > > Task A
> > > ----------------------------
> > >  1. CPU accesses buf
> > >  2. Send the buf to Task B
> > >  3. Wait for the buf from Task B
> > >  4. go to 1
> > >
> > > Task B
> > > ---------------------------
> > > 1. Wait for the buf from Task A
> > > 2. qbuf the buf
> > >     2.1 insert the buf to incoming queue
> > > 3. stream on
> > >     3.1 dma_map_sg if ready, and move the buf to ready queue
> > >     3.2 get the buf from ready queue, and dma start.
> > > 4. dqbuf
> > >     4.1 dma_unmap_sg after dma operation completion
> > >     4.2 move the buf to outgoing queue
> > > 5. back the buf to Task A
> > > 6. go to 1
> > >
> > > In case that two tasks share buffers, and data flow goes from Task A to
> > Task
> > > B, we would need IPC operation to send and receive buffers properly
> > between
> > > those two tasks every time CPU or DMA access to buffers is started or
> > > completed.
> > >
> > >
> > > With dmabuf-sync:
> > >
> > > Task A
> > > ----------------------------
> > >  1. dma_buf_sync_lock <- synpoint (call by user side)
> > >  2. CPU accesses buf
> > >  3. dma_buf_sync_unlock <- syncpoint (call by user side)
> > >  4. Send the buf to Task B (just one time)
> > >  5. go to 1
> > >
> > >
> > > Task B
> > > ---------------------------
> > > 1. Wait for the buf from Task A (just one time)
> > > 2. qbuf the buf
> > >     1.1 insert the buf to incoming queue
> > > 3. stream on
> > >     3.1 dma_buf_sync_lock <- syncpoint (call by kernel side)
> > >     3.2 dma_map_sg if ready, and move the buf to ready queue
> > >     3.3 get the buf from ready queue, and dma start.
> > > 4. dqbuf
> > >     4.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
> > >     4.2 dma_unmap_sg after dma operation completion
> > >     4.3 move the buf to outgoing queue
> > > 5. go to 1
> > >
> > > On the other hand, in case of using dmabuf-sync, as you can see the
> > above
> > > example, we would need IPC operation just one time. That way, I think we
> > > could not only reduce performance overhead but also make user
> > application
> > > simplified. Of course, this approach can be used for all DMA device
> > drivers
> > > such as DRM. I'm not a specialist in v4l2 world so there may be missing
> > > point.
> > >
> > 
> > You already need some kind of IPC between the two tasks, as I suspect
> > even in your example it wouldn't make much sense to queue the buffer
> > over and over again in task B without task A writing anything to it. So
> > task A has to signal task B there is new data in the buffer to be
> > processed.
> > 
> > There is no need to share the buffer over and over again just to get the
> > two processes to work together on the same thing. Just share the fd
> > between both and then do out-of-band completion signaling, as you need
> > this anyway. Without this you'll end up with unpredictable behavior.
> > Just because sync allows you to access the buffer doesn't mean it's
> > valid for your use-case. Without completion signaling you could easily
> > end up overwriting your data from task A multiple times before task B
> > even tries to lock the buffer for processing.
> > 
> > So the valid flow is (and this already works with the current APIs):
> > Task A                                    Task B
> > ------                                    ------
> > CPU access buffer
> >          ----------completion signal--------->
> >                                           qbuf (dragging buffer into
> >                                           device domain, flush caches,
> >                                           reserve buffer etc.)
> >                                                     |
> >                                           wait for device operation to
> >                                           complete
> >                                                     |
> >                                           dqbuf (dragging buffer back
> >                                           into CPU domain, invalidate
> >                                           caches, unreserve)
> >         <---------completion signal------------
> > CPU access buffer
> > 
> 
> Correct. In case that data flow goes from A to B, it needs some kind
> of IPC between the two tasks every time as you said. Then, without
> dmabuf-sync, how do think about the case that two tasks share the same
> buffer but these tasks access the buffer(buf1) as write, and data of
> the buffer(buf1) isn't needed to be shared?
> 
Sorry, I don't see the point you are trying to solve here. If you share
a buffer and want its content to be clearly defined at every point in
time you have to synchronize the tasks working with the buffer, not just
the buffer accesses itself.

Easiest way to do so is doing sync through userspace with out-of-band
IPC, like in the example above. A more advanced way to achieve this
would be using cross-device fences to avoid going through userspace for
every syncpoint.

> 
> With dmabuf-sync is:
> 
>  Task A
>  ----------------------------
>  1. dma_buf_sync_lock <- synpoint (call by user side)
>  2. CPU writes something to buf1
>  3. dma_buf_sync_unlock <- syncpoint (call by user side)
>  4. copy buf1 to buf2
Random contents here? What's in the buffer, content from the CPU write,
or from V4L2 device write?

>  5. go to 1
> 
> 
>  Task B
>  ---------------------------
>  1. dma_buf_sync_lock
>  2. CPU writes something to buf3
>  3. dma_buf_sync_unlock
>  4. qbuf the buf3(src) and buf1(dst)
>      4.1 insert buf3,1 to incoming queue
>      4.2 dma_buf_sync_lock <- syncpoint (call by kernel side)
>  5. stream on
>      5.1 dma_map_sg if ready, and move the buf to ready queue
>      5.2 get the buf from ready queue, and dma start.
>  6. dqbuf
>      6.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
>      6.2 dma_unmap_sg after dma operation completion
>      6.3 move the buf3,1 to outgoing queue
> 7. go to 1
> 

Regards,
Lucas
-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


^ permalink raw reply

* Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework
From: Lucas Stach @ 2013-06-20  8:26 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Inki Dae, 'Inki Dae', 'linux-fbdev',
	'YoungJun Cho', 'Kyungmin Park',
	'myungjoo.ham', 'DRI mailing list',
	linux-arm-kernel, linux-media
In-Reply-To: <20130620081733.GN2718@n2100.arm.linux.org.uk>

Am Donnerstag, den 20.06.2013, 09:17 +0100 schrieb Russell King - ARM
Linux:
> On Thu, Jun 20, 2013 at 09:47:07AM +0200, Lucas Stach wrote:
> > Am Donnerstag, den 20.06.2013, 15:43 +0900 schrieb Inki Dae:
> > > 
> > > > -----Original Message-----
> > > > From: dri-devel-bounces+inki.dae=samsung.com@lists.freedesktop.org
> > > > [mailto:dri-devel-bounces+inki.dae=samsung.com@lists.freedesktop.org] On
> > > > Behalf Of Russell King - ARM Linux
> > > > Sent: Thursday, June 20, 2013 3:29 AM
> > > > To: Inki Dae
> > > > Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun
> > > > Cho; linux-media@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> > > > Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
> > > > framework
> > > > 
> > > > On Thu, Jun 20, 2013 at 12:10:04AM +0900, Inki Dae wrote:
> > > > > On the other hand, the below shows how we could enhance the conventional
> > > > > way with my approach (just example):
> > > > >
> > > > > CPU -> DMA,
> > > > >         ioctl(qbuf command)              ioctl(streamon)
> > > > >               |                                               |
> > > > >               |                                               |
> > > > >         qbuf  <- dma_buf_sync_get   start streaming <- syncpoint
> > > > >
> > > > > dma_buf_sync_get just registers a sync buffer(dmabuf) to sync object.
> > > > And
> > > > > the syncpoint is performed by calling dma_buf_sync_lock(), and then DMA
> > > > > accesses the sync buffer.
> > > > >
> > > > > And DMA -> CPU,
> > > > >         ioctl(dqbuf command)
> > > > >               |
> > > > >               |
> > > > >         dqbuf <- nothing to do
> > > > >
> > > > > Actual syncpoint is when DMA operation is completed (in interrupt
> > > > handler):
> > > > > the syncpoint is performed by calling dma_buf_sync_unlock().
> > > > > Hence,  my approach is to move the syncpoints into just before dma
> > > > access
> > > > > as long as possible.
> > > > 
> > > > What you've just described does *not* work on architectures such as
> > > > ARMv7 which do speculative cache fetches from memory at any time that
> > > > that memory is mapped with a cacheable status, and will lead to data
> > > > corruption.
> > > 
> > > I didn't explain that enough. Sorry about that. 'nothing to do' means that a
> > > dmabuf sync interface isn't called but existing functions are called. So
> > > this may be explained again:
> > >         ioctl(dqbuf command)
> > >             |
> > >             |
> > >         dqbuf <- 1. dma_unmap_sg
> > >                     2. dma_buf_sync_unlock (syncpoint)
> > > 
> > > The syncpoint I mentioned means lock mechanism; not doing cache operation.
> > > 
> > > In addition, please see the below more detail examples.
> > > 
> > > The conventional way (without dmabuf-sync) is:
> > > Task A                             
> > > ----------------------------
> > >  1. CPU accesses buf          
> > >  2. Send the buf to Task B  
> > >  3. Wait for the buf from Task B
> > >  4. go to 1
> > > 
> > > Task B
> > > ---------------------------
> > > 1. Wait for the buf from Task A
> > > 2. qbuf the buf                 
> > >     2.1 insert the buf to incoming queue
> > > 3. stream on
> > >     3.1 dma_map_sg if ready, and move the buf to ready queue
> > >     3.2 get the buf from ready queue, and dma start.
> > > 4. dqbuf
> > >     4.1 dma_unmap_sg after dma operation completion
> > >     4.2 move the buf to outgoing queue
> > > 5. back the buf to Task A
> > > 6. go to 1
> > > 
> > > In case that two tasks share buffers, and data flow goes from Task A to Task
> > > B, we would need IPC operation to send and receive buffers properly between
> > > those two tasks every time CPU or DMA access to buffers is started or
> > > completed.
> > > 
> > > 
> > > With dmabuf-sync:
> > > 
> > > Task A                             
> > > ----------------------------
> > >  1. dma_buf_sync_lock <- synpoint (call by user side)
> > >  2. CPU accesses buf          
> > >  3. dma_buf_sync_unlock <- syncpoint (call by user side)
> > >  4. Send the buf to Task B (just one time)
> > >  5. go to 1
> > > 
> > > 
> > > Task B
> > > ---------------------------
> > > 1. Wait for the buf from Task A (just one time)
> > > 2. qbuf the buf                 
> > >     1.1 insert the buf to incoming queue
> > > 3. stream on
> > >     3.1 dma_buf_sync_lock <- syncpoint (call by kernel side)
> > >     3.2 dma_map_sg if ready, and move the buf to ready queue
> > >     3.3 get the buf from ready queue, and dma start.
> > > 4. dqbuf
> > >     4.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
> > >     4.2 dma_unmap_sg after dma operation completion
> > >     4.3 move the buf to outgoing queue
> > > 5. go to 1
> > > 
> > > On the other hand, in case of using dmabuf-sync, as you can see the above
> > > example, we would need IPC operation just one time. That way, I think we
> > > could not only reduce performance overhead but also make user application
> > > simplified. Of course, this approach can be used for all DMA device drivers
> > > such as DRM. I'm not a specialist in v4l2 world so there may be missing
> > > point.
> > > 
> > 
> > You already need some kind of IPC between the two tasks, as I suspect
> > even in your example it wouldn't make much sense to queue the buffer
> > over and over again in task B without task A writing anything to it. So
> > task A has to signal task B there is new data in the buffer to be
> > processed.
> 
> Hang on.  Since when did dma_buf become another inter-process IPC
> mechanism?  That's *not* it's design goal, and there's other much
> better mechanisms already provided.
> 
That's why I said out-of-band completion signaling, particularly not
bound to the dma-buf itself.

My thinking was more along the lines of the wayland protocol, where one
process tells the compositor to use a buf as the pixel data for the next
frame and promises not to access it while the compositor uses it. When
the compositor finishes it tells the client that it's now free to reuse
the buffer. None of this is bound to the dma-buf.

Regards,
Lucas
-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


^ permalink raw reply

* RE: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework
From: Inki Dae @ 2013-06-20  8:24 UTC (permalink / raw)
  To: linux-fbdev
In-Reply-To: <1371467722-665-1-git-send-email-inki.dae@samsung.com>



> -----Original Message-----
> From: Lucas Stach [mailto:l.stach@pengutronix.de]
> Sent: Thursday, June 20, 2013 4:47 PM
> To: Inki Dae
> Cc: 'Russell King - ARM Linux'; 'Inki Dae'; 'linux-fbdev'; 'YoungJun Cho';
> 'Kyungmin Park'; 'myungjoo.ham'; 'DRI mailing list'; linux-arm-
> kernel@lists.infradead.org; linux-media@vger.kernel.org
> Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
> framework
> 
> Am Donnerstag, den 20.06.2013, 15:43 +0900 schrieb Inki Dae:
> >
> > > -----Original Message-----
> > > From: dri-devel-bounces+inki.dae=samsung.com@lists.freedesktop.org
> > > [mailto:dri-devel-bounces+inki.dae=samsung.com@lists.freedesktop.org]
> On
> > > Behalf Of Russell King - ARM Linux
> > > Sent: Thursday, June 20, 2013 3:29 AM
> > > To: Inki Dae
> > > Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham;
> YoungJun
> > > Cho; linux-media@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> > > Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer
> synchronization
> > > framework
> > >
> > > On Thu, Jun 20, 2013 at 12:10:04AM +0900, Inki Dae wrote:
> > > > On the other hand, the below shows how we could enhance the
> conventional
> > > > way with my approach (just example):
> > > >
> > > > CPU -> DMA,
> > > >         ioctl(qbuf command)              ioctl(streamon)
> > > >               |                                               |
> > > >               |                                               |
> > > >         qbuf  <- dma_buf_sync_get   start streaming <- syncpoint
> > > >
> > > > dma_buf_sync_get just registers a sync buffer(dmabuf) to sync object.
> > > And
> > > > the syncpoint is performed by calling dma_buf_sync_lock(), and then
> DMA
> > > > accesses the sync buffer.
> > > >
> > > > And DMA -> CPU,
> > > >         ioctl(dqbuf command)
> > > >               |
> > > >               |
> > > >         dqbuf <- nothing to do
> > > >
> > > > Actual syncpoint is when DMA operation is completed (in interrupt
> > > handler):
> > > > the syncpoint is performed by calling dma_buf_sync_unlock().
> > > > Hence,  my approach is to move the syncpoints into just before dma
> > > access
> > > > as long as possible.
> > >
> > > What you've just described does *not* work on architectures such as
> > > ARMv7 which do speculative cache fetches from memory at any time that
> > > that memory is mapped with a cacheable status, and will lead to data
> > > corruption.
> >
> > I didn't explain that enough. Sorry about that. 'nothing to do' means
> that a
> > dmabuf sync interface isn't called but existing functions are called. So
> > this may be explained again:
> >         ioctl(dqbuf command)
> >             |
> >             |
> >         dqbuf <- 1. dma_unmap_sg
> >                     2. dma_buf_sync_unlock (syncpoint)
> >
> > The syncpoint I mentioned means lock mechanism; not doing cache
> operation.
> >
> > In addition, please see the below more detail examples.
> >
> > The conventional way (without dmabuf-sync) is:
> > Task A
> > ----------------------------
> >  1. CPU accesses buf
> >  2. Send the buf to Task B
> >  3. Wait for the buf from Task B
> >  4. go to 1
> >
> > Task B
> > ---------------------------
> > 1. Wait for the buf from Task A
> > 2. qbuf the buf
> >     2.1 insert the buf to incoming queue
> > 3. stream on
> >     3.1 dma_map_sg if ready, and move the buf to ready queue
> >     3.2 get the buf from ready queue, and dma start.
> > 4. dqbuf
> >     4.1 dma_unmap_sg after dma operation completion
> >     4.2 move the buf to outgoing queue
> > 5. back the buf to Task A
> > 6. go to 1
> >
> > In case that two tasks share buffers, and data flow goes from Task A to
> Task
> > B, we would need IPC operation to send and receive buffers properly
> between
> > those two tasks every time CPU or DMA access to buffers is started or
> > completed.
> >
> >
> > With dmabuf-sync:
> >
> > Task A
> > ----------------------------
> >  1. dma_buf_sync_lock <- synpoint (call by user side)
> >  2. CPU accesses buf
> >  3. dma_buf_sync_unlock <- syncpoint (call by user side)
> >  4. Send the buf to Task B (just one time)
> >  5. go to 1
> >
> >
> > Task B
> > ---------------------------
> > 1. Wait for the buf from Task A (just one time)
> > 2. qbuf the buf
> >     1.1 insert the buf to incoming queue
> > 3. stream on
> >     3.1 dma_buf_sync_lock <- syncpoint (call by kernel side)
> >     3.2 dma_map_sg if ready, and move the buf to ready queue
> >     3.3 get the buf from ready queue, and dma start.
> > 4. dqbuf
> >     4.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
> >     4.2 dma_unmap_sg after dma operation completion
> >     4.3 move the buf to outgoing queue
> > 5. go to 1
> >
> > On the other hand, in case of using dmabuf-sync, as you can see the
> above
> > example, we would need IPC operation just one time. That way, I think we
> > could not only reduce performance overhead but also make user
> application
> > simplified. Of course, this approach can be used for all DMA device
> drivers
> > such as DRM. I'm not a specialist in v4l2 world so there may be missing
> > point.
> >
> 
> You already need some kind of IPC between the two tasks, as I suspect
> even in your example it wouldn't make much sense to queue the buffer
> over and over again in task B without task A writing anything to it. So
> task A has to signal task B there is new data in the buffer to be
> processed.
> 
> There is no need to share the buffer over and over again just to get the
> two processes to work together on the same thing. Just share the fd
> between both and then do out-of-band completion signaling, as you need
> this anyway. Without this you'll end up with unpredictable behavior.
> Just because sync allows you to access the buffer doesn't mean it's
> valid for your use-case. Without completion signaling you could easily
> end up overwriting your data from task A multiple times before task B
> even tries to lock the buffer for processing.
> 
> So the valid flow is (and this already works with the current APIs):
> Task A                                    Task B
> ------                                    ------
> CPU access buffer
>          ----------completion signal--------->
>                                           qbuf (dragging buffer into
>                                           device domain, flush caches,
>                                           reserve buffer etc.)
>                                                     |
>                                           wait for device operation to
>                                           complete
>                                                     |
>                                           dqbuf (dragging buffer back
>                                           into CPU domain, invalidate
>                                           caches, unreserve)
>         <---------completion signal------------
> CPU access buffer
> 

Correct. In case that data flow goes from A to B, it needs some kind of IPC between the two tasks every time as you said. Then, without dmabuf-sync, how do think about the case that two tasks share the same buffer but these tasks access the buffer(buf1) as write, and data of the buffer(buf1) isn't needed to be shared?


With dmabuf-sync is:

 Task A
 ----------------------------
 1. dma_buf_sync_lock <- synpoint (call by user side)
 2. CPU writes something to buf1
 3. dma_buf_sync_unlock <- syncpoint (call by user side)
 4. copy buf1 to buf2
 5. go to 1


 Task B
 ---------------------------
 1. dma_buf_sync_lock
 2. CPU writes something to buf3
 3. dma_buf_sync_unlock
 4. qbuf the buf3(src) and buf1(dst)
     4.1 insert buf3,1 to incoming queue
     4.2 dma_buf_sync_lock <- syncpoint (call by kernel side)
 5. stream on
     5.1 dma_map_sg if ready, and move the buf to ready queue
     5.2 get the buf from ready queue, and dma start.
 6. dqbuf
     6.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
     6.2 dma_unmap_sg after dma operation completion
     6.3 move the buf3,1 to outgoing queue
7. go to 1


Thanks,
Inki Dae

> 
> Regards,
> Lucas
> --
> Pengutronix e.K.                           | Lucas Stach                 |
> Industrial Linux Solutions                 | http://www.pengutronix.de/  |
> Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
> Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


^ permalink raw reply

* Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework
From: Russell King - ARM Linux @ 2013-06-20  8:17 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Inki Dae, 'Inki Dae', 'linux-fbdev',
	'YoungJun Cho', 'Kyungmin Park',
	'myungjoo.ham', 'DRI mailing list',
	linux-arm-kernel, linux-media
In-Reply-To: <1371714427.4230.64.camel@weser.hi.pengutronix.de>

On Thu, Jun 20, 2013 at 09:47:07AM +0200, Lucas Stach wrote:
> Am Donnerstag, den 20.06.2013, 15:43 +0900 schrieb Inki Dae:
> > 
> > > -----Original Message-----
> > > From: dri-devel-bounces+inki.dae=samsung.com@lists.freedesktop.org
> > > [mailto:dri-devel-bounces+inki.dae=samsung.com@lists.freedesktop.org] On
> > > Behalf Of Russell King - ARM Linux
> > > Sent: Thursday, June 20, 2013 3:29 AM
> > > To: Inki Dae
> > > Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun
> > > Cho; linux-media@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> > > Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
> > > framework
> > > 
> > > On Thu, Jun 20, 2013 at 12:10:04AM +0900, Inki Dae wrote:
> > > > On the other hand, the below shows how we could enhance the conventional
> > > > way with my approach (just example):
> > > >
> > > > CPU -> DMA,
> > > >         ioctl(qbuf command)              ioctl(streamon)
> > > >               |                                               |
> > > >               |                                               |
> > > >         qbuf  <- dma_buf_sync_get   start streaming <- syncpoint
> > > >
> > > > dma_buf_sync_get just registers a sync buffer(dmabuf) to sync object.
> > > And
> > > > the syncpoint is performed by calling dma_buf_sync_lock(), and then DMA
> > > > accesses the sync buffer.
> > > >
> > > > And DMA -> CPU,
> > > >         ioctl(dqbuf command)
> > > >               |
> > > >               |
> > > >         dqbuf <- nothing to do
> > > >
> > > > Actual syncpoint is when DMA operation is completed (in interrupt
> > > handler):
> > > > the syncpoint is performed by calling dma_buf_sync_unlock().
> > > > Hence,  my approach is to move the syncpoints into just before dma
> > > access
> > > > as long as possible.
> > > 
> > > What you've just described does *not* work on architectures such as
> > > ARMv7 which do speculative cache fetches from memory at any time that
> > > that memory is mapped with a cacheable status, and will lead to data
> > > corruption.
> > 
> > I didn't explain that enough. Sorry about that. 'nothing to do' means that a
> > dmabuf sync interface isn't called but existing functions are called. So
> > this may be explained again:
> >         ioctl(dqbuf command)
> >             |
> >             |
> >         dqbuf <- 1. dma_unmap_sg
> >                     2. dma_buf_sync_unlock (syncpoint)
> > 
> > The syncpoint I mentioned means lock mechanism; not doing cache operation.
> > 
> > In addition, please see the below more detail examples.
> > 
> > The conventional way (without dmabuf-sync) is:
> > Task A                             
> > ----------------------------
> >  1. CPU accesses buf          
> >  2. Send the buf to Task B  
> >  3. Wait for the buf from Task B
> >  4. go to 1
> > 
> > Task B
> > ---------------------------
> > 1. Wait for the buf from Task A
> > 2. qbuf the buf                 
> >     2.1 insert the buf to incoming queue
> > 3. stream on
> >     3.1 dma_map_sg if ready, and move the buf to ready queue
> >     3.2 get the buf from ready queue, and dma start.
> > 4. dqbuf
> >     4.1 dma_unmap_sg after dma operation completion
> >     4.2 move the buf to outgoing queue
> > 5. back the buf to Task A
> > 6. go to 1
> > 
> > In case that two tasks share buffers, and data flow goes from Task A to Task
> > B, we would need IPC operation to send and receive buffers properly between
> > those two tasks every time CPU or DMA access to buffers is started or
> > completed.
> > 
> > 
> > With dmabuf-sync:
> > 
> > Task A                             
> > ----------------------------
> >  1. dma_buf_sync_lock <- synpoint (call by user side)
> >  2. CPU accesses buf          
> >  3. dma_buf_sync_unlock <- syncpoint (call by user side)
> >  4. Send the buf to Task B (just one time)
> >  5. go to 1
> > 
> > 
> > Task B
> > ---------------------------
> > 1. Wait for the buf from Task A (just one time)
> > 2. qbuf the buf                 
> >     1.1 insert the buf to incoming queue
> > 3. stream on
> >     3.1 dma_buf_sync_lock <- syncpoint (call by kernel side)
> >     3.2 dma_map_sg if ready, and move the buf to ready queue
> >     3.3 get the buf from ready queue, and dma start.
> > 4. dqbuf
> >     4.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
> >     4.2 dma_unmap_sg after dma operation completion
> >     4.3 move the buf to outgoing queue
> > 5. go to 1
> > 
> > On the other hand, in case of using dmabuf-sync, as you can see the above
> > example, we would need IPC operation just one time. That way, I think we
> > could not only reduce performance overhead but also make user application
> > simplified. Of course, this approach can be used for all DMA device drivers
> > such as DRM. I'm not a specialist in v4l2 world so there may be missing
> > point.
> > 
> 
> You already need some kind of IPC between the two tasks, as I suspect
> even in your example it wouldn't make much sense to queue the buffer
> over and over again in task B without task A writing anything to it. So
> task A has to signal task B there is new data in the buffer to be
> processed.

Hang on.  Since when did dma_buf become another inter-process IPC
mechanism?  That's *not* it's design goal, and there's other much
better mechanisms already provided.

What dma_buf is about is passing a DMA buffer from subsystem to another
subsystem via userspace without exposing precise details of the buffer
to userspace.

It is also not about passing a DMA buffer from one subsystem to another
to gain access to the buffer via the other subsystem and bypassing the
access arrangements of the source subsystem.

I'm beginning to think that _that_ should be enforced by subsystems, by
having any buffer imported into a subsystem either not able to be mapped
into userspace, or if it is mappable into userspace, it must be a
read-only mapping.

^ permalink raw reply

* Re: [PATCH 8/8] at91/avr32/atmel_lcdfb: prepare clk before calling enable
From: Nicolas Ferre @ 2013-06-20  7:49 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1371640930-9771-1-git-send-email-b.brezillon@overkiz.com>

On 19/06/2013 13:22, Boris BREZILLON :
> Replace clk_enable/disable with clk_prepare_enable/disable_unprepare to
> avoid common clk framework warnings.
>
> Signed-off-by: Boris BREZILLON <b.brezillon@overkiz.com>

Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>

> ---
>   drivers/video/atmel_lcdfb.c |    8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/video/atmel_lcdfb.c b/drivers/video/atmel_lcdfb.c
> index 540909d..8525457 100644
> --- a/drivers/video/atmel_lcdfb.c
> +++ b/drivers/video/atmel_lcdfb.c
> @@ -893,14 +893,14 @@ static int __init atmel_lcdfb_init_fbinfo(struct atmel_lcdfb_info *sinfo)
>
>   static void atmel_lcdfb_start_clock(struct atmel_lcdfb_info *sinfo)
>   {
> -	clk_enable(sinfo->bus_clk);
> -	clk_enable(sinfo->lcdc_clk);
> +	clk_prepare_enable(sinfo->bus_clk);
> +	clk_prepare_enable(sinfo->lcdc_clk);
>   }
>
>   static void atmel_lcdfb_stop_clock(struct atmel_lcdfb_info *sinfo)
>   {
> -	clk_disable(sinfo->bus_clk);
> -	clk_disable(sinfo->lcdc_clk);
> +	clk_disable_unprepare(sinfo->bus_clk);
> +	clk_disable_unprepare(sinfo->lcdc_clk);
>   }
>
>
>


-- 
Nicolas Ferre

^ permalink raw reply

* Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework
From: Lucas Stach @ 2013-06-20  7:47 UTC (permalink / raw)
  To: Inki Dae
  Cc: 'Russell King - ARM Linux', 'Inki Dae',
	'linux-fbdev', 'YoungJun Cho',
	'Kyungmin Park', 'myungjoo.ham',
	'DRI mailing list', linux-arm-kernel, linux-media
In-Reply-To: <00da01ce6d81$76eb3d60$64c1b820$%dae@samsung.com>

Am Donnerstag, den 20.06.2013, 15:43 +0900 schrieb Inki Dae:
> 
> > -----Original Message-----
> > From: dri-devel-bounces+inki.dae=samsung.com@lists.freedesktop.org
> > [mailto:dri-devel-bounces+inki.dae=samsung.com@lists.freedesktop.org] On
> > Behalf Of Russell King - ARM Linux
> > Sent: Thursday, June 20, 2013 3:29 AM
> > To: Inki Dae
> > Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun
> > Cho; linux-media@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> > Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
> > framework
> > 
> > On Thu, Jun 20, 2013 at 12:10:04AM +0900, Inki Dae wrote:
> > > On the other hand, the below shows how we could enhance the conventional
> > > way with my approach (just example):
> > >
> > > CPU -> DMA,
> > >         ioctl(qbuf command)              ioctl(streamon)
> > >               |                                               |
> > >               |                                               |
> > >         qbuf  <- dma_buf_sync_get   start streaming <- syncpoint
> > >
> > > dma_buf_sync_get just registers a sync buffer(dmabuf) to sync object.
> > And
> > > the syncpoint is performed by calling dma_buf_sync_lock(), and then DMA
> > > accesses the sync buffer.
> > >
> > > And DMA -> CPU,
> > >         ioctl(dqbuf command)
> > >               |
> > >               |
> > >         dqbuf <- nothing to do
> > >
> > > Actual syncpoint is when DMA operation is completed (in interrupt
> > handler):
> > > the syncpoint is performed by calling dma_buf_sync_unlock().
> > > Hence,  my approach is to move the syncpoints into just before dma
> > access
> > > as long as possible.
> > 
> > What you've just described does *not* work on architectures such as
> > ARMv7 which do speculative cache fetches from memory at any time that
> > that memory is mapped with a cacheable status, and will lead to data
> > corruption.
> 
> I didn't explain that enough. Sorry about that. 'nothing to do' means that a
> dmabuf sync interface isn't called but existing functions are called. So
> this may be explained again:
>         ioctl(dqbuf command)
>             |
>             |
>         dqbuf <- 1. dma_unmap_sg
>                     2. dma_buf_sync_unlock (syncpoint)
> 
> The syncpoint I mentioned means lock mechanism; not doing cache operation.
> 
> In addition, please see the below more detail examples.
> 
> The conventional way (without dmabuf-sync) is:
> Task A                             
> ----------------------------
>  1. CPU accesses buf          
>  2. Send the buf to Task B  
>  3. Wait for the buf from Task B
>  4. go to 1
> 
> Task B
> ---------------------------
> 1. Wait for the buf from Task A
> 2. qbuf the buf                 
>     2.1 insert the buf to incoming queue
> 3. stream on
>     3.1 dma_map_sg if ready, and move the buf to ready queue
>     3.2 get the buf from ready queue, and dma start.
> 4. dqbuf
>     4.1 dma_unmap_sg after dma operation completion
>     4.2 move the buf to outgoing queue
> 5. back the buf to Task A
> 6. go to 1
> 
> In case that two tasks share buffers, and data flow goes from Task A to Task
> B, we would need IPC operation to send and receive buffers properly between
> those two tasks every time CPU or DMA access to buffers is started or
> completed.
> 
> 
> With dmabuf-sync:
> 
> Task A                             
> ----------------------------
>  1. dma_buf_sync_lock <- synpoint (call by user side)
>  2. CPU accesses buf          
>  3. dma_buf_sync_unlock <- syncpoint (call by user side)
>  4. Send the buf to Task B (just one time)
>  5. go to 1
> 
> 
> Task B
> ---------------------------
> 1. Wait for the buf from Task A (just one time)
> 2. qbuf the buf                 
>     1.1 insert the buf to incoming queue
> 3. stream on
>     3.1 dma_buf_sync_lock <- syncpoint (call by kernel side)
>     3.2 dma_map_sg if ready, and move the buf to ready queue
>     3.3 get the buf from ready queue, and dma start.
> 4. dqbuf
>     4.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
>     4.2 dma_unmap_sg after dma operation completion
>     4.3 move the buf to outgoing queue
> 5. go to 1
> 
> On the other hand, in case of using dmabuf-sync, as you can see the above
> example, we would need IPC operation just one time. That way, I think we
> could not only reduce performance overhead but also make user application
> simplified. Of course, this approach can be used for all DMA device drivers
> such as DRM. I'm not a specialist in v4l2 world so there may be missing
> point.
> 

You already need some kind of IPC between the two tasks, as I suspect
even in your example it wouldn't make much sense to queue the buffer
over and over again in task B without task A writing anything to it. So
task A has to signal task B there is new data in the buffer to be
processed.

There is no need to share the buffer over and over again just to get the
two processes to work together on the same thing. Just share the fd
between both and then do out-of-band completion signaling, as you need
this anyway. Without this you'll end up with unpredictable behavior.
Just because sync allows you to access the buffer doesn't mean it's
valid for your use-case. Without completion signaling you could easily
end up overwriting your data from task A multiple times before task B
even tries to lock the buffer for processing.

So the valid flow is (and this already works with the current APIs):
Task A                                    Task B
------                                    ------
CPU access buffer
         ----------completion signal--------->
                                          qbuf (dragging buffer into 
                                          device domain, flush caches,
                                          reserve buffer etc.)
                                                    |
                                          wait for device operation to
                                          complete
                                                    |
                                          dqbuf (dragging buffer back
                                          into CPU domain, invalidate
                                          caches, unreserve)
        <---------completion signal------------
CPU access buffer


Regards,
Lucas
-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


^ permalink raw reply

* RE: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework
From: Inki Dae @ 2013-06-20  6:43 UTC (permalink / raw)
  To: linux-fbdev
In-Reply-To: <1371467722-665-1-git-send-email-inki.dae@samsung.com>



> -----Original Message-----
> From: dri-devel-bounces+inki.dae=samsung.com@lists.freedesktop.org
> [mailto:dri-devel-bounces+inki.dae=samsung.com@lists.freedesktop.org] On
> Behalf Of Russell King - ARM Linux
> Sent: Thursday, June 20, 2013 3:29 AM
> To: Inki Dae
> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun
> Cho; linux-media@vger.kernel.org; linux-arm-kernel@lists.infradead.org
> Subject: Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization
> framework
> 
> On Thu, Jun 20, 2013 at 12:10:04AM +0900, Inki Dae wrote:
> > On the other hand, the below shows how we could enhance the conventional
> > way with my approach (just example):
> >
> > CPU -> DMA,
> >         ioctl(qbuf command)              ioctl(streamon)
> >               |                                               |
> >               |                                               |
> >         qbuf  <- dma_buf_sync_get   start streaming <- syncpoint
> >
> > dma_buf_sync_get just registers a sync buffer(dmabuf) to sync object.
> And
> > the syncpoint is performed by calling dma_buf_sync_lock(), and then DMA
> > accesses the sync buffer.
> >
> > And DMA -> CPU,
> >         ioctl(dqbuf command)
> >               |
> >               |
> >         dqbuf <- nothing to do
> >
> > Actual syncpoint is when DMA operation is completed (in interrupt
> handler):
> > the syncpoint is performed by calling dma_buf_sync_unlock().
> > Hence,  my approach is to move the syncpoints into just before dma
> access
> > as long as possible.
> 
> What you've just described does *not* work on architectures such as
> ARMv7 which do speculative cache fetches from memory at any time that
> that memory is mapped with a cacheable status, and will lead to data
> corruption.

I didn't explain that enough. Sorry about that. 'nothing to do' means that a
dmabuf sync interface isn't called but existing functions are called. So
this may be explained again:
        ioctl(dqbuf command)
            |
            |
        dqbuf <- 1. dma_unmap_sg
                    2. dma_buf_sync_unlock (syncpoint)

The syncpoint I mentioned means lock mechanism; not doing cache operation.

In addition, please see the below more detail examples.

The conventional way (without dmabuf-sync) is:
Task A                             
----------------------------
 1. CPU accesses buf          
 2. Send the buf to Task B  
 3. Wait for the buf from Task B
 4. go to 1

Task B
---------------------------
1. Wait for the buf from Task A
2. qbuf the buf                 
    2.1 insert the buf to incoming queue
3. stream on
    3.1 dma_map_sg if ready, and move the buf to ready queue
    3.2 get the buf from ready queue, and dma start.
4. dqbuf
    4.1 dma_unmap_sg after dma operation completion
    4.2 move the buf to outgoing queue
5. back the buf to Task A
6. go to 1

In case that two tasks share buffers, and data flow goes from Task A to Task
B, we would need IPC operation to send and receive buffers properly between
those two tasks every time CPU or DMA access to buffers is started or
completed.


With dmabuf-sync:

Task A                             
----------------------------
 1. dma_buf_sync_lock <- synpoint (call by user side)
 2. CPU accesses buf          
 3. dma_buf_sync_unlock <- syncpoint (call by user side)
 4. Send the buf to Task B (just one time)
 5. go to 1


Task B
---------------------------
1. Wait for the buf from Task A (just one time)
2. qbuf the buf                 
    1.1 insert the buf to incoming queue
3. stream on
    3.1 dma_buf_sync_lock <- syncpoint (call by kernel side)
    3.2 dma_map_sg if ready, and move the buf to ready queue
    3.3 get the buf from ready queue, and dma start.
4. dqbuf
    4.1 dma_buf_sync_unlock <- syncpoint (call by kernel side)
    4.2 dma_unmap_sg after dma operation completion
    4.3 move the buf to outgoing queue
5. go to 1

On the other hand, in case of using dmabuf-sync, as you can see the above
example, we would need IPC operation just one time. That way, I think we
could not only reduce performance overhead but also make user application
simplified. Of course, this approach can be used for all DMA device drivers
such as DRM. I'm not a specialist in v4l2 world so there may be missing
point.

Thanks,
Inki Dae

> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel


^ permalink raw reply

* [PATCH 2/2] fb: fix atyfb unused data warnings
From: Randy Dunlap @ 2013-06-20  2:39 UTC (permalink / raw)
  To: LKML, Linux Fbdev development list
  Cc: Andrew Morton, Paul Mackerras, Benjamin Herrenschmidt,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: Randy Dunlap <rdunlap@infradead.org>

Fix compiler warnings of data defined but not used.  They are only used
with certain kconfig settings.

drivers/video/aty/atyfb_base.c:534:13: warning: 'ram_dram' defined but not used [-Wunused-variable]
drivers/video/aty/atyfb_base.c:535:13: warning: 'ram_resv' defined but not used [-Wunused-variable]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc:	Paul Mackerras <paulus@samba.org>
Cc:	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc:	linux-fbdev@vger.kernel.org
Cc:	Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc:	Tomi Valkeinen <tomi.valkeinen@ti.com>
---
 drivers/video/aty/atyfb_base.c |    2 ++
 1 file changed, 2 insertions(+)

--- linux-next-20130619.orig/drivers/video/aty/atyfb_base.c
+++ linux-next-20130619/drivers/video/aty/atyfb_base.c
@@ -531,8 +531,10 @@ static int correct_chipset(struct atyfb_
 	return 0;
 }
 
+#if defined(CONFIG_FB_ATY_GX) || defined(CONFIG_FB_ATY_CT)
 static char ram_dram[] = "DRAM";
 static char ram_resv[] = "RESV";
+#endif
 #ifdef CONFIG_FB_ATY_GX
 static char ram_vram[] = "VRAM";
 #endif /* CONFIG_FB_ATY_GX */

^ permalink raw reply

* [PATCH 1/2] fb: fix atyfb build warning
From: Randy Dunlap @ 2013-06-20  2:38 UTC (permalink / raw)
  To: LKML, Linux Fbdev development list
  Cc: Andrew Morton, Paul Mackerras, Benjamin Herrenschmidt,
	Jean-Christophe Plagniol-Villard, Tomi Valkeinen

From: Randy Dunlap <rdunlap@infradead.org>

Fix build warning when neither of CONFIG_FB_ATY_GX or
CONFIG_FB_ATY_CT is enabled, since ARRAY_SIZE(aty_chips) is 0 in
that case.

drivers/video/aty/atyfb_base.c:437:11: warning: overflow in implicit constant conversion [-Woverflow]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc:	Paul Mackerras <paulus@samba.org>
Cc:	Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc:	linux-fbdev@vger.kernel.org
Cc:	Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
Cc:	Tomi Valkeinen <tomi.valkeinen@ti.com>
---
 drivers/video/aty/atyfb_base.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- linux-next-20130619.orig/drivers/video/aty/atyfb_base.c
+++ linux-next-20130619/drivers/video/aty/atyfb_base.c
@@ -434,8 +434,8 @@ static int correct_chipset(struct atyfb_
 	const char *name;
 	int i;
 
-	for (i = ARRAY_SIZE(aty_chips) - 1; i >= 0; i--)
-		if (par->pci_id = aty_chips[i].pci_id)
+	for (i = ARRAY_SIZE(aty_chips); i > 0; i--)
+		if (par->pci_id = aty_chips[i - 1].pci_id)
 			break;
 
 	if (i < 0)

^ permalink raw reply

* Re: [RFC PATCH v2] dmabuf-sync: Introduce buffer synchronization framework
From: Russell King - ARM Linux @ 2013-06-19 18:29 UTC (permalink / raw)
  To: Inki Dae
  Cc: Lucas Stach, linux-fbdev, YoungJun Cho, Kyungmin Park,
	myungjoo.ham, DRI mailing list,
	linux-arm-kernel@lists.infradead.org, linux-media@vger.kernel.org
In-Reply-To: <CAAQKjZNJD4HpnJQ7iE+Gez36066M6U0YQeUEdA0+UcSOKqeghg@mail.gmail.com>

On Thu, Jun 20, 2013 at 12:10:04AM +0900, Inki Dae wrote:
> On the other hand, the below shows how we could enhance the conventional
> way with my approach (just example):
> 
> CPU -> DMA,
>         ioctl(qbuf command)              ioctl(streamon)
>               |                                               |
>               |                                               |
>         qbuf  <- dma_buf_sync_get   start streaming <- syncpoint
> 
> dma_buf_sync_get just registers a sync buffer(dmabuf) to sync object. And
> the syncpoint is performed by calling dma_buf_sync_lock(), and then DMA
> accesses the sync buffer.
> 
> And DMA -> CPU,
>         ioctl(dqbuf command)
>               |
>               |
>         dqbuf <- nothing to do
> 
> Actual syncpoint is when DMA operation is completed (in interrupt handler):
> the syncpoint is performed by calling dma_buf_sync_unlock().
> Hence,  my approach is to move the syncpoints into just before dma access
> as long as possible.

What you've just described does *not* work on architectures such as
ARMv7 which do speculative cache fetches from memory at any time that
that memory is mapped with a cacheable status, and will lead to data
corruption.

^ permalink raw reply

* MMP display subsystem questions
From: Daniel Drake @ 2013-06-19 17:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

I am working on adding OLPC XO-1.75 (MMP2) and XO-4 (MMP3) support to
the new MMP framebuffer subsystem (drivers/video/mmp). This involves
writing devicetree bindings and documenting them, which calls for a
good understanding of the subsystem.

I have a few questions.

Is there a design document available somewhere? There are very few
explanatory comments in the code. I am interested in definitions and
explanations of:
 - Paths
 - Overlays
 - dmafetch (part of overlay configuration)
 - path_config (part of path configuration)
 - link_config (part of path configuration)

Looking at the code for answers, I am a little confused.

A path is an output device (e.g. a panel or TV). There can be multiple
outputs connected to the display controller. At the moment I am just
working with a single dumb panel.

A path can have many overlays, and according to the original commit
message, an overlay is a buffer displayed on the path's output device.
However, I simply cannot see how multiple overlays would be supported
on a path. For example let's look at this function:

static int overlay_set_addr(struct mmp_overlay *overlay, struct mmp_addr *addr)
{
struct lcd_regs *regs = path_regs(overlay->path);

/* FIXME: assert addr supported */
memcpy(&overlay->addr, addr, sizeof(struct mmp_win));
writel(addr->phys[0], &regs->g_0);

return overlay->addr.phys[0];
}

The overlay's address is written to a register that is specific to
it's parent path. If we are dealing with two overlays, how does this
work? We write both overlay addresses to the same register and the
second one "wins"? I checked all of the other overlay-related
functions and they all defer to the parent path as well.

As an experiment I modified platform data to suggest that my path has
98 overlays and the framebuffer should run on overlay 33. Nonsense,
but the framebuffer booted up as normal. It seems like there is
something missing in this path/overlay relationship.

What is the plan going forward for this overlay management? At the
moment there is only one consumer of overlays in the kernel - the
framebuffer driver. And the framebuffer will only ever use one. Who
are the other intended in-kernel users of overlays?

Moving onto dmafetch_id. Can't find any documentation here. It looks
like the only meaningful values are 0 and 1, and the only difference
is that overlay_is_vid() returns 1 if dmafetch_id is 1, and 0
otherwise. This causes some minor differences in the way registers get
programmed, but I don't understand exactly what the consequences are
here. My framebuffer works with dmafetch_id as both 0 and 1.

path_config: this value gets written to registers LCD_SCLK *and*
IOPAD_CONTROL. The bit definitions in these registers are totally
different. It seems like nonsense to share the same configuration
value between two diversely different registers - what is going on
here?

link_config: Seems to have a dual meaning. Upper 4 bits can be used to
specify dumb panel mode in registers like LCD_DUMB_CTRL. Bit 0 is
totally unrelated and triggers "swap RB" i.e. use BGR instead of RGB,
when dealing with totally different registers. Other bits are ignored.
If "disorganised" bitfields are to be haphazardly invented in this way
it would at least be nice to have documentation.

Moving onto the devicetree definition. It is clear that representing
the display controller and the panel in the device tree makes sense;
this can cause the appropriate probes to happen. But we additionally
need to trigger the probe of the framebuffer driver. I am not sure if
the framebuffer that Linux will setup is something that should be
represented in the device tree (remember that we already have display
controller and panel). Looking at the configuration items needed to
setup the framebuffer:

1. Name - doesn't really matter.

2. Pixel format - surely this question applies to all framebuffer
drivers. I guess there is a default, plus a standard framebuffer
interface that allows it to be changed?

3. Which path to use - how do other framebuffer drivers deal with
this? The question here is basically do we output to the laptop's
inbuilt LCD panel, or to the externally connected HDMI TV? Again,
maybe we can choose a sensible default and allow runtime
configuration.

4. Which overlay to use - according to my investigation above, this
doesn't actually have any meaningful effect on the driver.

5. dmafetch_id - its not clear to me what this is.


So, I don't think the framebuffer is something that should be defined
by the platform (or the device tree). Instead, maybe it should just be
initialized with good defaults when the first path becomes ready for
use.

Any clarifications appreciated.

Thanks
Daniel

^ permalink raw reply

* Re: [RFC PATCH 5/5] ARM: Samsung: Remove MIPI PHY setup code
From: Sylwester Nawrocki @ 2013-06-19 17:24 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <51BE25A0.6050202@samsung.com>

On 06/16/2013 10:52 PM, Kukjin Kim wrote:
> On 06/15/13 02:45, Sylwester Nawrocki wrote:
>> > Generic PHY drivers are used to handle the MIPI CSIS and MIPI DSIM
>> > DPHYs so we can remove now unused code at arch/arm/plat-samsung.
>
> If so, sounds good :)
> 
>> > In case there is any board file for S5PV210 platforms using MIPI
>> > CSIS/DSIM (not any upstream currently) it should use the generic
>> > PHY API to bind the PHYs to respective PHY consumer drivers.
>
> To be honest, I didn't test this on boards but if the working is fine, 
> please go ahead without RFC.

Thanks for review. I've tested it on Exynos4412 based board, and will
check also on Exynos4210 TRATS before posting the final version.
It seems to work fine, I just won't be able to test on any non-dt
platform (s5pv210), and there is currently no users of MIPI CSIS/DSIM
on s5pv210. Moreover this series depends on the generic PHY API,
perhaps it can be merged for 3.11.

[1] http://www.spinics.net/lists/arm-kernel/msg251232.html

Regards,
Sylwester

^ permalink raw reply

* Re: [RFC PATCH 3/5] video: exynos_dsi: Use generic PHY driver
From: Tomasz Figa @ 2013-06-19 17:20 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <51C1E61C.8030808@samsung.com>

On Wednesday 19 of June 2013 19:10:52 Sylwester Nawrocki wrote:
> On 06/16/2013 11:15 PM, Tomasz Figa wrote:
> > On Friday 14 of June 2013 19:45:49 Sylwester Nawrocki wrote:
> >> > Use the generic PHY API instead of the platform callback to control
> >> > the MIPI DSIM DPHY.
> >> > 
> >> > Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
> >> > Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> >> > ---
> >> > 
> >> >  drivers/video/display/source-exynos_dsi.c |   36
> >> > 
> >> > +++++++++-------------------- include/video/exynos_dsi.h
> >> > 
> >> > |    5 ----
> >> >  
> >> >  2 files changed, 11 insertions(+), 30 deletions(-)
> > 
> > Yes, this is what I was really missing a lot while developing this
> > driver.
> > 
> > Definitely looks good! It's a shame we don't have this driver in
> > mainline yet ;)
> 
> Yes, I should have mentioned in the cover letter this patch depends
> on modified version of this [1] patch set of yours. I'll drop this
> patch and will update the driver staying in mainline now, but I won't
> be able to test it, on a non-dt platform.
> 
> I guess even some pre-eliminary display (panel) API would be helpful.
> The CDF development seems to have been stalled for some time. I wonder
> if we could first have something that works for limited set of devices
> and be extending it gradually, rather than living with zero support
> for displays on DT based ARM platforms.

Well, the problem is that once we define a binding for displays, we will 
have to keep support for this binding even if we decide to change 
something.

But as I discussed with Laurent and Alexandre at LinuxCon Japan, we should 
be able to reuse V4L2 bindings for our purposes, so someone just needs to 
code a proof of concept implementation that doesn't necessarily provide 
full functionality yet, but allows to make something work. Probably based 
on already posted RFC versions of CDF.

CCed Laurent and Alexandre, as they might be able to shed even more light 
on this.

Best regards,
Tomasz

> 
> [1] http://www.spinics.net/lists/linux-fbdev/msg09689.html
> 
> Regards,
> Sylwester

^ permalink raw reply

* Re: [RFC PATCH 3/5] video: exynos_dsi: Use generic PHY driver
From: Sylwester Nawrocki @ 2013-06-19 17:10 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <1502179.yCdHZgQgqV@flatron>

On 06/16/2013 11:15 PM, Tomasz Figa wrote:
> On Friday 14 of June 2013 19:45:49 Sylwester Nawrocki wrote:
>> > Use the generic PHY API instead of the platform callback to control
>> > the MIPI DSIM DPHY.
>> > 
>> > Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
>> > Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>> > ---
>> >  drivers/video/display/source-exynos_dsi.c |   36
>> > +++++++++-------------------- include/video/exynos_dsi.h               
>> > |    5 ----
>> >  2 files changed, 11 insertions(+), 30 deletions(-)
>
> Yes, this is what I was really missing a lot while developing this driver.
> 
> Definitely looks good! It's a shame we don't have this driver in mainline 
> yet ;)

Yes, I should have mentioned in the cover letter this patch depends
on modified version of this [1] patch set of yours. I'll drop this
patch and will update the driver staying in mainline now, but I won't
be able to test it, on a non-dt platform.

I guess even some pre-eliminary display (panel) API would be helpful.
The CDF development seems to have been stalled for some time. I wonder
if we could first have something that works for limited set of devices
and be extending it gradually, rather than living with zero support
for displays on DT based ARM platforms.

[1] http://www.spinics.net/lists/linux-fbdev/msg09689.html

Regards,
Sylwester

^ permalink raw reply

* Re: [PATCH] build some drivers only when compile-testing
From: Mark Brown @ 2013-06-19 16:38 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Jiri Slaby, Michal Marek, Jeff Mahoney, Greg Kroah-Hartman,
	jirislaby, linux-kernel, Andrew Morton, Linus Torvalds,
	Alexander Shishkin, linux-usb, Florian Tobias Schandinat,
	linux-geode, linux-fbdev, Richard Cochran, netdev, Ben Hutchings,
	Keller, Jacob E, tomi.valkeinen
In-Reply-To: <20130618083425.GI5461@arwen.pp.htv.fi>

[-- Attachment #1: Type: text/plain, Size: 883 bytes --]

On Tue, Jun 18, 2013 at 11:34:26AM +0300, Felipe Balbi wrote:

> MUSB alone has 8 different arch choices. Before, it used to be that core
> driver was dependendent on all of them, so whenever someone wanted to
> build MUSB for another arch, they had to introdude their glue code and
> modify the dependency of the core driver.

> Also EHCI, it works on pretty much everything, so does DWC3.

> DWC3 already has three possibilities but I know of at least 3 others
> which could show up soonish.

If the number of architectures supported is getting large enough then
it's probably reasonable to just enable the driver all the time,
probably it'll also appear on some PCI cards or something anyway.  The
things people are complaining about here are things that are clearly
unlikely to appear on other architectures like a particular SoC vendor's
custom IPs that they don't license out.

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* Re: [RFC PATCH 1/5] phy: Add driver for Exynos MIPI CSIS/DSIM DPHYs
From: Sylwester Nawrocki @ 2013-06-19 16:32 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <3575249.Vg82ll65tN@flatron>

Hi Tomasz,

On 06/16/2013 11:11 PM, Tomasz Figa wrote:
> Hi Sylwester,
> 
> Looks good, but I added some nitpicks inline.
> 
> On Friday 14 of June 2013 19:45:47 Sylwester Nawrocki wrote:
>> Add a PHY provider driver for the Samsung S5P/Exynos SoC MIPI CSI-2
>> receiver and MIPI DSI transmitter DPHYs.
>>
>> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>> ---
>>  .../bindings/phy/exynos-video-mipi-phy.txt         |   16 ++
>>  drivers/phy/Kconfig                                |   10 ++
>>  drivers/phy/Makefile                               |    3 +-
>>  drivers/phy/exynos_video_mipi_phy.c                |  166
>> ++++++++++++++++++++ 4 files changed, 194 insertions(+), 1 deletion(-)
>>  create mode 100644
>> Documentation/devicetree/bindings/phy/exynos-video-mipi-phy.txt create
>> mode 100644 drivers/phy/exynos_video_mipi_phy.c
>>
>> diff --git
>> a/Documentation/devicetree/bindings/phy/exynos-video-mipi-phy.txt
>> b/Documentation/devicetree/bindings/phy/exynos-video-mipi-phy.txt new
>> file mode 100644
>> index 0000000..32311c89
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/phy/exynos-video-mipi-phy.txt
>> @@ -0,0 +1,16 @@
>> +Samsung S5P/EXYNOS SoC series MIPI CSIS/DSIM DPHY
>> +-------------------------------------------------
>> +
>> +Required properties:
>> +- compatible : "samsung,<soc_name>-video-phy", currently most SoCs can
> 
> I don't like this <soc_name> here. It sounds like any SoC name can be put 
> here. IMHO just listing all supported compatible values should be enough.

Hmm, OK, I'll simply put there the one compatible string supported now.

>> claim +  compatibility with the S5PV210 MIPI CSIS/DSIM PHY and thus
>> should use +  "samsung,s5pv210-video-phy";
>> +- reg : offset and length of the MIPI DPHY register set;
>> +- #phy-cells : from the generic phy bindings, must be 1;
>> +
>> +For "samsung,s5pv210-video-phy" compatible DPHYs the second cell in the
>> PHY +specifier identifies the DPHY and its meaning is as follows:
>> +  0 - MIPI CSIS 0,
>> +  1 - MIPI DSIM 0,
>> +  2 - MIPI CSIS 1,
>> +  3 - MIPI DSIM 1.
>> diff --git a/drivers/phy/Kconfig b/drivers/phy/Kconfig
>> index 0764a54..d234e99 100644
>> --- a/drivers/phy/Kconfig
>> +++ b/drivers/phy/Kconfig
>> @@ -11,3 +11,13 @@ menuconfig GENERIC_PHY
>>  	  devices present in the kernel. This layer will have the generic
>>  	  API by which phy drivers can create PHY using the phy framework 
> and
>>  	  phy users can obtain reference to the PHY.
>> +
>> +if GENERIC_PHY
>> +
>> +config EXYNOS_VIDEO_MIPI_PHY
>> +	bool "S5P/EXYNOS MIPI CSI-2/DSI PHY driver"
>> +	depends on OF
> 
> Hmm. Is this driver designed only for OF-enabled boards?

Yes, there seems currently to be no users of MIPI CSIS/DSIM in the mainline 
kernel among the non-dt platforms, so I initially focused on DT only. I will
rework this driver to make it usable on non-dt platforms, but I currently 
have not way to fully test it. I believe S5PV210 will get migrated to device
tree sooner than anyone needs the functionality this driver provides on 
non-dt, and S5PC100 seems to be forgotten anyway.

>> +	help
>> +	  Support for MIPI CSI-2 and MIPI DSI DPHY found on Samsung
>> +	  S5P and EXYNOS SoCs.
>> +endif
>> diff --git a/drivers/phy/Makefile b/drivers/phy/Makefile
>> index 9e9560f..b16f2c1 100644
>> --- a/drivers/phy/Makefile
>> +++ b/drivers/phy/Makefile
>> @@ -2,4 +2,5 @@
>>  # Makefile for the phy drivers.
>>  #
>>
>> -obj-$(CONFIG_GENERIC_PHY)	+= phy-core.o
>> +obj-$(CONFIG_GENERIC_PHY)		+= phy-core.o
>> +obj-$(CONFIG_EXYNOS_VIDEO_MIPI_PHY)	+= exynos_video_mipi_phy.o
>> diff --git a/drivers/phy/exynos_video_mipi_phy.c
>> b/drivers/phy/exynos_video_mipi_phy.c new file mode 100644
>> index 0000000..8d4976f
>> --- /dev/null
>> +++ b/drivers/phy/exynos_video_mipi_phy.c
>> @@ -0,0 +1,166 @@
>> +/*
>> + * Samsung S5P/EXYNOS SoC series MIPI CSIS/DSIM DPHY driver
>> + *
>> + * Copyright (C) 2013 Samsung Electronics Co., Ltd.
>> + * Author: Sylwester Nawrocki <s.nawrocki@samsung.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as +
>> * published by the Free Software Foundation.
>> + */
>> +
>> +#include <linux/delay.h>
>> +#include <linux/io.h>
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/of.h>
>> +#include <linux/of_address.h>
>> +#include <linux/phy/phy.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/spinlock.h>
>> +
>> +/* MIPI_PHYn_CONTROL register bit definitions */
>> +#define EXYNOS_MIPI_PHY_ENABLE		(1 << 0)
>> +#define EXYNOS_MIPI_PHY_SRESETN		(1 << 1)
>> +#define EXYNOS_MIPI_PHY_MRESETN		(1 << 2)
>> +#define EXYNOS_MIPI_PHY_RESET_MASK	(3 << 1)
>> +
>> +#define EXYNOS_MAX_VIDEO_PHYS		4
>> +
>> +struct exynos_video_phy {
>> +	spinlock_t slock;
>> +	struct phy *phys[EXYNOS_MAX_VIDEO_PHYS];
>> +	void __iomem *regs;
>> +};
>> +
>> +/*
>> + * The @id argument specifies MIPI CSIS or DSIM PHY as follows:
>> + *  0 - MIPI CSIS 0
>> + *  1 - MIPI DSIM 0
>> + *  2 - MIPI CSIS 1
>> + *  3 - MIPI DSIM 1
>> + */
>> +static int set_phy_state(struct exynos_video_phy *state,
>> +					unsigned int id, int on)
>> +{
>> +	void __iomem *addr = id < 2 ? state->regs : state->regs + 4;
> 
> I don't find this statement too readable. What about:
> 
> 	void __iomem *addr = state->regs;
> 
> and below:
> 
> 	/* CSIS 1 and DSIM 1 PHYs have separate register */
> 	if (id >= 2)
> 		addr += 4;

OK, thanks for the suggestion. I've addressed this in v2.

>> +	unsigned long flags;
>> +	u32 reg, reset;
>> +
>> +	pr_debug("%s(): id: %d, on: %d, addr: %#x, base: %#x\n",
>> +		 __func__, id, on, (u32)addr, (u32)state->regs);
>> +
>> +	if (WARN_ON(id > EXYNOS_MAX_VIDEO_PHYS))
>> +		return -EINVAL;
>> +
>> +	if (id & 1)
> 
> Nice trick ;), but not very readable. What about creating an enum of PHYs 
> and using those defined values here:
> 
> 	if (id = PHY_DSI0 || id = PHY_DSI1)

Yeah, good point. Fixed.

>> +		reset = EXYNOS_MIPI_PHY_MRESETN;
>> +	else
>> +		reset = EXYNOS_MIPI_PHY_SRESETN;
>> +
>> +	spin_lock_irqsave(&state->slock, flags);
>> +
>> +	reg = readl(addr);
>> +	if (on)
>> +		reg |= reset;
>> +	else
>> +		reg &= ~reset;
>> +	writel(reg, addr);
>> +
>> +	if (on)
>> +		reg |= EXYNOS_MIPI_PHY_ENABLE;
> 
> I believe this is a kind of reference counting, but a comment here would 
> be nice.

Indeed, I've added a comment.

>> +	else if (!(reg & EXYNOS_MIPI_PHY_RESET_MASK))
>> +		reg &= ~EXYNOS_MIPI_PHY_ENABLE;
>> +
>> +	writel(reg, addr);
>> +
>> +	spin_unlock_irqrestore(&state->slock, flags);
>> +	return 0;
>> +}
>> +
>> +static int exynos_video_phy_power_on(struct phy *phy)
>> +{
>> +	struct exynos_video_phy *state = dev_get_drvdata(&phy->dev);
>> +	return set_phy_state(state, phy->id, 1);
>> +}
>> +
>> +static int exynos_video_phy_power_off(struct phy *phy)
>> +{
>> +	struct exynos_video_phy *state = dev_get_drvdata(&phy->dev);
>> +	return set_phy_state(state, phy->id, 0);
>> +}
>> +
>> +static struct phy *exynos_video_phy_xlate(struct device *dev,
>> +					struct of_phandle_args *args)
>> +{
>> +	struct exynos_video_phy *state = dev_get_drvdata(dev);
>> +
>> +	if (WARN_ON(args->args[0] > EXYNOS_MAX_VIDEO_PHYS))
>> +		return NULL;
>> +
>> +	return state->phys[args->args[0]];
>> +}
>> +
>> +static struct phy_ops exynos_video_phy_ops = {
>> +	.power_on	= exynos_video_phy_power_on,
>> +	.power_off	= exynos_video_phy_power_off,
>> +	.owner		= THIS_MODULE,
>> +};
>> +
>> +static int exynos_video_phy_probe(struct platform_device *pdev)
>> +{
>> +	struct exynos_video_phy *state;
>> +	struct device *dev = &pdev->dev;
>> +	struct resource res;
>> +	struct phy_provider *phy_provider;
>> +	int ret, i;
>> +
>> +	state = devm_kzalloc(dev, sizeof(*state), GFP_KERNEL);
>> +	if (!state)
>> +		return -ENOMEM;
>> +
>> +	ret = of_address_to_resource(dev->of_node, 0, &res);
>> +	if (ret < 0)
>> +		return ret;
> 
> You can use platform_get_resource() here to get a resource generated for 
> you by of_platform_populate().
> 
> In addition you don't need to check the pointer returned by 
> platform_get_resource() because it is checked in devm_ioremap_resource().

Fixed.

>> +
>> +	state->regs = devm_ioremap_resource(dev, &res);
>> +	if (IS_ERR(state->regs))
>> +		return PTR_ERR(state->regs);
>> +
>> +	dev_set_drvdata(dev, state);
>> +
>> +	phy_provider = devm_of_phy_provider_register(dev, THIS_MODULE,
>> +					    exynos_video_phy_xlate);
>> +	if (IS_ERR(phy_provider))
>> +		return PTR_ERR(phy_provider);
>> +
>> +	for (i = 0; i < EXYNOS_MAX_VIDEO_PHYS; i++) {
>> +		state->phys[i] = devm_phy_create(dev, i, 
> &exynos_video_phy_ops,
>> +									
> state);
>> +		if (IS_ERR(state->phys[i])) {
>> +			dev_err(dev, "failed to create PHY %d\n", i);
>> +			return PTR_ERR(state->phys[i]);
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static const struct of_device_id exynos_video_phy_of_match[] = {
>> +	{ .compatible = "samsung,s5pv210-video-phy" },
>> +	{ },
>> +};
> 
> IMHO a MODULE_DEVICE_TABLE should be added here.

True, added after modifying the Kconfig entry to allow this driver 
to be built as a module.

Regards,
Sylwester

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox