All of lore.kernel.org
 help / color / mirror / Atom feed
* Introduce a new helper framework for buffer synchronization
@ 2013-05-09  7:33 Inki Dae
  2013-05-13  8:00   ` Maarten Lankhorst
  0 siblings, 1 reply; 114+ messages in thread
From: Inki Dae @ 2013-05-09  7:33 UTC (permalink / raw)
  To: Maarten Lankhorst, Rob Clark, Daniel Vetter
  Cc: linux-fbdev, DRI mailing list, Kyungmin Park, myungjoo.ham,
	YoungJun Cho, linux-arm-kernel@lists.infradead.org,
	linux-media@vger.kernel.org


[-- Attachment #1.1: Type: text/plain, Size: 7354 bytes --]

Hi all,

This post introduces a new helper framework based on dma fence. And the
purpose of this post is to collect other opinions and advices before RFC
posting.

First of all, this helper framework, called fence helper, is in progress
yet so might not have enough comments in codes and also might need to be
more cleaned up. Moreover, we might be missing some parts of the dma fence.
However, I'd like to say that all things mentioned below has been tested
with Linux platform and worked well.

The below link is fence helper codes,

https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-work&id=59096850b48ba0c5cd944dfc1dc4c2b8a92c12f5

And fence helper is based on the below patches,

http://lists.freedesktop.org/archives/dri-devel/2013-January/033430.html

Fence helper includes easy-to-use user and kernel interfaces for applying a
buffer synchronization mechanism into device driver and user space.

And the below link is example codes for device drivers,

https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-work&id=750a6bee163882e9f0cda48ce21eca7d18614e1c

And also the below link is user interface relevant codes for user process,

https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?h=dma-fence-work&id=d2e30e8d8564b42c838108c86dae23138ddae7be

The dma fence framework[1] had already been posted by Maarten and Rob. And
the dma fence is used with reservation framework.

Simple mechanism of the dma fence is as the followings,
1. one dmabuf has one reservation object when one buffer is exported into a
dmabuf.

2. a thread creates its own fence object and sets the fence object to a
reservation object of a dmabuf: a dmabuf is a buffer a thread wants
accessing.

3. dma fence provides fence_default_wait() to block a thread and
fence_signal() to wake up a thread: a thread is blocked once trying to
access a shared buffer if another fence object had already been set to a
reservation object of a shared buffer. And then a blocked thread is waked
up by a owner when the owner- the owner is a thread accessing the shared
buffer - signals the shared buffer.

PS. Know that all buffers to be shared should be exported into dmabuf fd.

However, dma fence just provides basic interfaces for buffer
synchronization between DMA and DMA. Therefore developers should consider
how interfaces of the dma fence should be called and also the resources of
the dma fence should be managed. Thus, it might not be easy to apply the
dma fence framework into their device drivers.

Moreover, we need user interfaces for buffer synchronization between CPU
and DMA. For example, we cannot aware of when and how CPU tries to access a
shared buffer. So user applications need to notify the fact to dma fence
framework so that the dma fence framework can synchronize the shared buffer
between CPU and DMA.

Fence helper includes the following features and interfaces.

Features.
1. provide optimized buffer synchronization method - we need to check if a
thread tried to access buffer with READ or WRITE, and if a thread tries to
access same buffer with READ or WRITE. The reason is that we can avoid
unnecessary read-lock by buffer synchronization and perform proper cache
operation.

2. provide buffer synchronization to READ, WRITE and DMA access - Fence
helper has three access types: READ, WRITE, and DMA. And these types can be
compounded like below, READ, WRITE, READ | DMA or WRITE | DMA.

3. provide user interfaces for buffer synchronization between CPU and DMA -
Fence helper provides user interfaces and the user interfaces has been
implemented in dma-buf framework. User application can call these
interfaces before and after cpu access.

4. provide cache control method - Fence helper performs proper cache
operation looking before and after of CPU and DMA access. Actual cache
operation is done just before committing a fence after memory access by CPU
or DMA was completed. For cache operation, dma-buf interfaces,
dma_buf_begin_cpu_access() and dma_buf_end_cpu_access(), are used.

5. provide easy-to-use user and kernel interfaces - Fence helper has four
exported interfaces: these can be called by device driver and user
application like below,

Interfaces.
fence_helper_init() - this function creates a new fence helper object
internally and returns the fence helper object.

fence_helper_init_reserve() - this function sets up a shard buffer into the
fence helper object. And mutiple shared buffers can be set to there.

fence_helper_commit_reserve() - this function synchronizes a shared buffer
checking fence helper objects already committeds and commits a fence helper
object to a shared buffer. The checking means that a caller waits for all
threads, which committed their fence helper objects, signal the shared
buffer.  And the commit means that the caller is using a shared buffer so
the shared buffer access by another thread isn't permitted: any thread
trying to access the shared buffer will be blocked until a owner signals
the shared buffer.

fence_helper_signal() - this function signals a shared buffer. This means
that the shared buffer access by a caller has been completed so other
threads can access the shared buffer. at this moment, if there are blocked
threads, they are waked up and then they commit their fences to the same
shared buffer: they was blocked at fence_helper_commit_reserve() so are
waked up at there.

And tutorial.
        when setting up dma relevant registers
                struct fence_helper *fh;

                fh = fence_helper_init(NULL, NULL);
                fence_helper_init_reserve(fh, dmabuf, WRITE or READ or DMA);

        just before dma start
                fence_helper_commit_reserve(fh);

        after the completion of dma operation
                fence_helper_signal(fh);

User interfaces.
DMA_BUF_GET_FENCE - this ioctl command performs the above 1-3 steps of
interfaces. a fence helper object is created by fence_helper_init() and a
pointer of the fence helper object is passed to user.

DMA_BUF_PUT_FENCE - this ioctl command performs the above 4 step of
interfaces. the pointer made of step 1 is passed into kernel through this
ioctl command.

And tutorial for user process.
        just before cpu access
                struct dma_buf_fence *df;

                df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE;
                ioctl(fd, DMA_BUF_GET_FENCE, &df);

        after memset or memcpy
                ioctl(fd, DMA_BUF_PUT_FENCE, &df);

Resource management.
Basically, a fence helper object is created and released internally;
created at fence_helper_init() and released at fence_helper_signal(). And
also this can be done case by case like below,
- If there was any thread blocked, its own resource is released at
fence_helper_commit_reserve() after the thread was waked up.
- If a blocked thread was timed out without fence_helper_singal() call;
anyone hasn't signaled a shared buffer, its own resource is released at
fence_helper_commit_reserve().
- if a shared buffer access by a thread was completed without blocked, its
own resource is released at fence_helper_signal().

References.
[1] http://lists.freedesktop.org/archives/dri-devel/2013-January/033436.html
[2] http://lists.freedesktop.org/archives/dri-devel/2013-January/033434.html

Thanks,
Inki Dae

[-- Attachment #1.2: Type: text/html, Size: 8785 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 114+ messages in thread
* Introduce a dmabuf sync framework for buffer synchronization
  2013-05-23 11:55       ` Daniel Vetter
@ 2013-06-07  9:02 ` Inki Dae
  -1 siblings, 0 replies; 114+ messages in thread
From: Inki Dae @ 2013-06-07  9:02 UTC (permalink / raw)
  To: linux-fbdev

Hi all,

Came back :)

Please, ignore previous fence helper. I have re-implemented buffer
synchronization mechanism, dmabuf sync, based on DMA BUF and wound/wait
style lock v5[1] and tested it with 2d gpu and drm kms drivers.

The below is dmabuf sync framework codes,
	
https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?
h=dmabuf-sync&id®6c5a0146ab72ea64d9fc91af4595aacf9a5139
	
And g2d driver with dmabuf sync framework,
	
https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?
h=dmabuf-sync&id@30bdee9bab5841ad32faade528d04cc0c5fc94

And also exynos drm kms framework,
	
https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/?
h=dmabuf-sync&idla548e9ea9e865592719ef6b1cde58366af9f5c

[1] https://patchwork-mail1.kernel.org/patch/2625321/

The purpose of this framework is not only to couple synchronizing caches and
buffer access between CPU and CPU, CPU and DMA, and DMA and DMA but also to
provide easy-to-use interfaces for device drivers: we may need
user interfaces but there is no good way for exposing the interfaces to user
side yet.

Basically, the mechanism of this framework has the following steps,
Initialize dmabuf sync object - A task gets a new sync object allocated and
initialized by dmabuf_sync_init(). This work should be done when a context
or similar thing of device driver is created or before cpu access.
    
Add a dmabuf to the sync object - A task can add a dmabuf or more that this
task wants to access to its own sync object. This work should be done just
before setting up dma buffer relevant registers or buffer cpu access.
    
Lock a sync object - A task tries to lock all dmabufs added in the sync
object. And for this, ww-mutexes is used to resolve dead lock. This work
should be done before dma start or cpu access. The lock means DMA or CPU of
current task can access dmabufs so the other cannot access dmabufs.

Unlock a sync object - A task tries to unlock all dmabufs added in the sync
object. This work should be done when after the completion of dma operation
or CPU access. The unlock means DMA or CPU access of current task has been
completed so the other can access the dmabufs.

In addition, this framework includes the following two features,
Consider read and write lock - this feature is for more performance: we
don't need to take a lock in case that a buffer was accessed for read and
current task tries to access the buffer for read. So when current task tries
to take a lock, this feature checks previous access type and desired type to
a given buffer and then decides if it has to take a lock to the buffer or
not.
    
Consider lockup and resource management - this feature considers the case
that any task never unlocks a buffer after taking a lock to the buffer. In
this case, a timer handler to this task is called sometimes later and then
the buffer is unlocked by workqueue handler to avoid lockup and release the
buffer relevant resources.
    
Tutorial.
	when allocating and initializing device context
		struct dmabuf_sync *sync;

		sync = dmabuf_sync_init(NULL, NULL);

	when setting up dma buffer relevant registers
		/* it can be called repeatly for multiple buffers. */
		dmabuf_sync_get(sync, dmabuf);

	just before dma start or cpu access
		dmabuf_sync_lock(sync);

	after the completion of dma operation or cpu access
		dmabuf_sync_unlock(sync);
		dmabuf_sync_put_all(sync);
		dmabuf_sync_fini(sync);


Deadlock reproduction with dmabuf sync.
For deadlock test, I had tried to reproduce deadlock situation like below
and the below approach had been worked well,
(Please, presume that two tasks share two dmabufs together.)

	[Task A]
	struct dmabuf_sync *sync;

	sync = dmabuf_sync_init(NULL, NULL);
	
	dmabuf_sync_get(sync, dmabuf A);
	dmabuf_sync_get(sync, dmabuf B);

	while(1) {
		dmabuf_sync_lock(sync);
		sleep(1);
		dmabuf_sync_unlock(sync);
		sleep(1);
	}

	[Task B]
	struct dmabuf_sync *sync;

	sync = dmabuf_sync_init(NULL, NULL);
	
	dmabuf_sync_get(sync, dmabuf C);
	dmabuf_sync_get(sync, dmabuf A);

	while(1) {
		dmabuf_sync_lock(sync); <-- deadlock
		sleep(1);
		dmabuf_sync_unlock(sync);
		sleep(1);
	}

With the above example codes, deadlock occurs when Task B called
dmabuf_sync_lock function: internally, Task B takes a lock to dmabuf C and
then tries to take a lock to dmabuf A. But at this time, ww_mutex_lock()
returns -EDEADLK because ctx->acquired became 1 once taking a lock to dmabuf
C. And then task B unlocks dmabuf C and takes a slowpath lock to dmabuf A.
And then once Task A unlocks dmabuf A, Task B tries to take a lock to dmabuf
C again.

And the below is my concerns and opinions,
A dma buf has a reservation object when a buffer is exported to the dma buf
- I'm not sure but it seems that reservation object is used for x86 gpu;
having vram and different domain, or similar devices. So in case of embedded
system, most dma devices and cpu share system memory so I think that
reservation object should be considered for them also because basically,
buffer synchronization mechanism should be worked based on dmabuf. For this,
I have added four members to reservation_object; shared_cnt and shared for
read lock, accessed_type for cache operation, and locked for timeout case.
Hence, some devices might need specific something for itself. So how about
remaining only common part for reservation_object structure; it seems that
fence_excl, fence_shared, and so on are not common part.

Now wound/wait mutex doesn't consider read and write lock - In case of using
ww-mutexes for buffer synchronization, it seems that we need read and write
lock for more performance; read access and then read access doesn't need to
be locked. For this, I have added three members, shared_cnt and shared to
reservation_object and this is just to show you how we can use read lock.
However, I'm sure that there is a better/more generic way.

The above all things is just quick implementation for buffer synchronization
so this should be more cleaned up and there might be my missing point.
Please give me your advices and opinions.

Thanks,
Inki Dae


^ permalink raw reply	[flat|nested] 114+ messages in thread

end of thread, other threads:[~2013-06-07  9:02 UTC | newest]

Thread overview: 114+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-09  7:33 Introduce a new helper framework for buffer synchronization Inki Dae
2013-05-13  8:00 ` Maarten Lankhorst
2013-05-13  8:00   ` Maarten Lankhorst
2013-05-13  8:00   ` Maarten Lankhorst
2013-05-13  9:21   ` Inki Dae
2013-05-13  9:21     ` Inki Dae
2013-05-13  9:21     ` Inki Dae
2013-05-13  9:52     ` Maarten Lankhorst
2013-05-13  9:52       ` Maarten Lankhorst
2013-05-13  9:52       ` Maarten Lankhorst
2013-05-13 11:24   ` Inki Dae
2013-05-13 11:24     ` Inki Dae
2013-05-13 11:24     ` Inki Dae
2013-05-13 11:40     ` Maarten Lankhorst
2013-05-13 11:40       ` Maarten Lankhorst
2013-05-13 11:40       ` Maarten Lankhorst
2013-05-13 19:29     ` Tomasz Figa
2013-05-13 19:29       ` Tomasz Figa
2013-05-13 19:29       ` Tomasz Figa
2013-05-13 12:21   ` Inki Dae
2013-05-13 12:21     ` Inki Dae
2013-05-13 12:21     ` Inki Dae
2013-05-13 13:48     ` Rob Clark
2013-05-13 13:48       ` Rob Clark
2013-05-13 13:48       ` Rob Clark
2013-05-13 17:18       ` Inki Dae
2013-05-13 17:58         ` Rob Clark
2013-05-13 17:58           ` Rob Clark
2013-05-13 17:58           ` Rob Clark
2013-05-14  2:52   ` Inki Dae
2013-05-14  2:52     ` Inki Dae
2013-05-14  2:52     ` Inki Dae
2013-05-14 13:38     ` Rob Clark
2013-05-14 13:38       ` Rob Clark
2013-05-14 13:38       ` Rob Clark
2013-05-15  5:19   ` Inki Dae
2013-05-15  5:19     ` Inki Dae
2013-05-15  5:19     ` Inki Dae
2013-05-15 14:06     ` Rob Clark
2013-05-15 14:06       ` Rob Clark
2013-05-15 14:06       ` Rob Clark
2013-05-17  4:19       ` Daniel Vetter
2013-05-17  4:19         ` Daniel Vetter
2013-05-17  4:19         ` Daniel Vetter
2013-05-18  7:43         ` Inki Dae
2013-05-18  6:47       ` Inki Dae
2013-05-20 21:13         ` Daniel Vetter
2013-05-20 21:13           ` Daniel Vetter
2013-05-20 21:13           ` Daniel Vetter
2013-05-20 21:30           ` Daniel Vetter
2013-05-20 21:30             ` Daniel Vetter
2013-05-20 21:30             ` Daniel Vetter
2013-05-21  7:03   ` Inki Dae
2013-05-21  7:03     ` Inki Dae
2013-05-21  7:03     ` Inki Dae
2013-05-21  7:44     ` Daniel Vetter
2013-05-21  7:44       ` Daniel Vetter
2013-05-21  7:44       ` Daniel Vetter
2013-05-21  9:22   ` Inki Dae
2013-05-21  9:22     ` Inki Dae
2013-05-21  9:22     ` Inki Dae
2013-05-23 11:55     ` Daniel Vetter
2013-05-23 11:55       ` Daniel Vetter
2013-05-23 11:55       ` Daniel Vetter
2013-05-23 13:37   ` Inki Dae
2013-05-23 13:37     ` Inki Dae
2013-05-23 13:37     ` Inki Dae
2013-05-27 10:38   ` Inki Dae
2013-05-27 10:38     ` Inki Dae
2013-05-27 10:38     ` Inki Dae
2013-05-27 15:23     ` Maarten Lankhorst
2013-05-27 15:23       ` Maarten Lankhorst
2013-05-27 15:23       ` Maarten Lankhorst
2013-05-27 15:23       ` Maarten Lankhorst
2013-05-27 15:47     ` Rob Clark
2013-05-27 15:47       ` Rob Clark
2013-05-27 15:47       ` Rob Clark
2013-05-27 16:02       ` Daniel Vetter
2013-05-27 16:02         ` Daniel Vetter
2013-05-27 16:02         ` Daniel Vetter
2013-05-28  2:49   ` Inki Dae
2013-05-28  2:49     ` Inki Dae
2013-05-28  2:49     ` Inki Dae
2013-05-28  7:23     ` Maarten Lankhorst
2013-05-28  7:23       ` Maarten Lankhorst
2013-05-28  7:23       ` Maarten Lankhorst
2013-05-28  7:23       ` Maarten Lankhorst
2013-05-28  3:56   ` Inki Dae
2013-05-28  3:56     ` Inki Dae
2013-05-28  3:56     ` Inki Dae
2013-05-28  3:56     ` Inki Dae
2013-05-28 10:32     ` Daniel Vetter
2013-05-28 10:32       ` Daniel Vetter
2013-05-28 10:32       ` Daniel Vetter
2013-05-28 13:48     ` Rob Clark
2013-05-28 13:48       ` Rob Clark
2013-05-28 13:48       ` Rob Clark
2013-05-28  4:04   ` Inki Dae
2013-05-28  4:04     ` Inki Dae
2013-05-28  4:04     ` Inki Dae
2013-05-28 14:43   ` Inki Dae
2013-05-28 14:43     ` Inki Dae
2013-05-28 14:43     ` Inki Dae
2013-05-28 14:50   ` Inki Dae
2013-05-28 14:50     ` Inki Dae
2013-05-28 14:50     ` Inki Dae
2013-05-28 16:49     ` Daniel Vetter
2013-05-28 16:49       ` Daniel Vetter
2013-05-28 16:49       ` Daniel Vetter
2013-05-29  2:21   ` Inki Dae
2013-05-29  2:21     ` Inki Dae
2013-05-29  2:21     ` Inki Dae
  -- strict thread matches above, loose matches on Subject: below --
2013-06-07  9:02 Introduce a dmabuf sync " Inki Dae
2013-06-07  9:02 ` Inki Dae

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.