* Introduce a new helper framework for buffer synchronization [not found] <CAAQKjZNNw4qddo6bE5OY_CahrqDtqkxdO7Pm9RCguXyj9F4cMQ@mail.gmail.com> @ 2013-05-13 8:00 ` Maarten Lankhorst 2013-05-13 9:21 ` Inki Dae 0 siblings, 1 reply; 35+ messages in thread From: Maarten Lankhorst @ 2013-05-13 8:00 UTC (permalink / raw) To: linux-arm-kernel Op 09-05-13 09:33, Inki Dae schreef: > Hi all, > > This post introduces a new helper framework based on dma fence. And the > purpose of this post is to collect other opinions and advices before RFC > posting. > > First of all, this helper framework, called fence helper, is in progress > yet so might not have enough comments in codes and also might need to be > more cleaned up. Moreover, we might be missing some parts of the dma fence. > However, I'd like to say that all things mentioned below has been tested > with Linux platform and worked well. > .... > > And tutorial for user process. > just before cpu access > struct dma_buf_fence *df; > > df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE; > ioctl(fd, DMA_BUF_GET_FENCE, &df); > > after memset or memcpy > ioctl(fd, DMA_BUF_PUT_FENCE, &df); NAK. Userspace doesn't need to trigger fences. It can do a buffer idle wait, and postpone submitting new commands until after it's done using the buffer. Kernel space doesn't need the root hole you created by giving a dereferencing a pointer passed from userspace. Your next exercise should be to write a security exploit from the api you created here. It's the only way to learn how to write safe code. Hint: df.ctx = mmap(..); ~Maarten ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-13 8:00 ` Introduce a new helper framework for buffer synchronization Maarten Lankhorst @ 2013-05-13 9:21 ` Inki Dae 2013-05-13 9:52 ` Maarten Lankhorst 0 siblings, 1 reply; 35+ messages in thread From: Inki Dae @ 2013-05-13 9:21 UTC (permalink / raw) To: linux-arm-kernel > -----Original Message----- > From: Maarten Lankhorst [mailto:maarten.lankhorst at canonical.com] > Sent: Monday, May 13, 2013 5:01 PM > To: Inki Dae > Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm- > kernel at lists.infradead.org; linux-media at vger.kernel.org; linux-fbdev; > Kyungmin Park; myungjoo.ham; YoungJun Cho > Subject: Re: Introduce a new helper framework for buffer synchronization > > Op 09-05-13 09:33, Inki Dae schreef: > > Hi all, > > > > This post introduces a new helper framework based on dma fence. And the > > purpose of this post is to collect other opinions and advices before RFC > > posting. > > > > First of all, this helper framework, called fence helper, is in progress > > yet so might not have enough comments in codes and also might need to be > > more cleaned up. Moreover, we might be missing some parts of the dma > fence. > > However, I'd like to say that all things mentioned below has been tested > > with Linux platform and worked well. > > > .... > > > > And tutorial for user process. > > just before cpu access > > struct dma_buf_fence *df; > > > > df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE; > > ioctl(fd, DMA_BUF_GET_FENCE, &df); > > > > after memset or memcpy > > ioctl(fd, DMA_BUF_PUT_FENCE, &df); > NAK. > > Userspace doesn't need to trigger fences. It can do a buffer idle wait, > and postpone submitting new commands until after it's done using the > buffer. Hi Maarten, It seems that you say user should wait for a buffer like KDS does: KDS uses select() to postpone submitting new commands. But I think this way assumes that every data flows a DMA device to a CPU. For example, a CPU should keep polling for the completion of a buffer access by a DMA device. This means that the this way isn't considered for data flow to opposite case; CPU to DMA device. > Kernel space doesn't need the root hole you created by giving a > dereferencing a pointer passed from userspace. > Your next exercise should be to write a security exploit from the api you > created here. It's the only way to learn how to write safe code. Hint: > df.ctx = mmap(..); > Also I'm not clear to use our way yet and that is why I posted. As you mentioned, it seems like that using mmap() is more safe. But there is one issue it makes me confusing. For your hint, df.ctx = mmap(..), the issue is that dmabuf mmap can be used to map a dmabuf with user space. And the dmabuf means a physical memory region allocated by some allocator such as drm gem or ion. There might be my missing point so could you please give me more comments? Thanks, Inki Dae > ~Maarten ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-13 9:21 ` Inki Dae @ 2013-05-13 9:52 ` Maarten Lankhorst 2013-05-13 11:24 ` Inki Dae 0 siblings, 1 reply; 35+ messages in thread From: Maarten Lankhorst @ 2013-05-13 9:52 UTC (permalink / raw) To: linux-arm-kernel Op 13-05-13 11:21, Inki Dae schreef: > >> -----Original Message----- >> From: Maarten Lankhorst [mailto:maarten.lankhorst at canonical.com] >> Sent: Monday, May 13, 2013 5:01 PM >> To: Inki Dae >> Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm- >> kernel at lists.infradead.org; linux-media at vger.kernel.org; linux-fbdev; >> Kyungmin Park; myungjoo.ham; YoungJun Cho >> Subject: Re: Introduce a new helper framework for buffer synchronization >> >> Op 09-05-13 09:33, Inki Dae schreef: >>> Hi all, >>> >>> This post introduces a new helper framework based on dma fence. And the >>> purpose of this post is to collect other opinions and advices before RFC >>> posting. >>> >>> First of all, this helper framework, called fence helper, is in progress >>> yet so might not have enough comments in codes and also might need to be >>> more cleaned up. Moreover, we might be missing some parts of the dma >> fence. >>> However, I'd like to say that all things mentioned below has been tested >>> with Linux platform and worked well. >>> .... >>> >>> And tutorial for user process. >>> just before cpu access >>> struct dma_buf_fence *df; >>> >>> df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE; >>> ioctl(fd, DMA_BUF_GET_FENCE, &df); >>> >>> after memset or memcpy >>> ioctl(fd, DMA_BUF_PUT_FENCE, &df); >> NAK. >> >> Userspace doesn't need to trigger fences. It can do a buffer idle wait, >> and postpone submitting new commands until after it's done using the >> buffer. > Hi Maarten, > > It seems that you say user should wait for a buffer like KDS does: KDS uses > select() to postpone submitting new commands. But I think this way assumes > that every data flows a DMA device to a CPU. For example, a CPU should keep > polling for the completion of a buffer access by a DMA device. This means > that the this way isn't considered for data flow to opposite case; CPU to > DMA device. Not really. You do both things the same way. You first wait for the bo to be idle, this could be implemented by adding poll support to the dma-buf fd. Then you either do your read or write. Since userspace is supposed to be the one controlling the bo it should stay idle at that point. If you have another thread queueing the buffer againbefore your thread is done that's a bug in the application, and can be solved with userspace locking primitives. No need for the kernel to get involved. >> Kernel space doesn't need the root hole you created by giving a >> dereferencing a pointer passed from userspace. >> Your next exercise should be to write a security exploit from the api you >> created here. It's the only way to learn how to write safe code. Hint: >> df.ctx = mmap(..); >> > Also I'm not clear to use our way yet and that is why I posted. As you > mentioned, it seems like that using mmap() is more safe. But there is one > issue it makes me confusing. For your hint, df.ctx = mmap(..), the issue is > that dmabuf mmap can be used to map a dmabuf with user space. And the dmabuf > means a physical memory region allocated by some allocator such as drm gem > or ion. > > There might be my missing point so could you please give me more comments? > My point was that userspace could change df.ctx to some mmap'd memory, forcing the kernel to execute some code prepared by userspace. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-13 9:52 ` Maarten Lankhorst @ 2013-05-13 11:24 ` Inki Dae 2013-05-13 11:40 ` Maarten Lankhorst 2013-05-13 19:29 ` Tomasz Figa 0 siblings, 2 replies; 35+ messages in thread From: Inki Dae @ 2013-05-13 11:24 UTC (permalink / raw) To: linux-arm-kernel > -----Original Message----- > From: Maarten Lankhorst [mailto:maarten.lankhorst at canonical.com] > Sent: Monday, May 13, 2013 6:52 PM > To: Inki Dae > Cc: 'Rob Clark'; 'Daniel Vetter'; 'DRI mailing list'; linux-arm- > kernel at lists.infradead.org; linux-media at vger.kernel.org; 'linux-fbdev'; > 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho' > Subject: Re: Introduce a new helper framework for buffer synchronization > > Op 13-05-13 11:21, Inki Dae schreef: > > > >> -----Original Message----- > >> From: Maarten Lankhorst [mailto:maarten.lankhorst at canonical.com] > >> Sent: Monday, May 13, 2013 5:01 PM > >> To: Inki Dae > >> Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm- > >> kernel at lists.infradead.org; linux-media at vger.kernel.org; linux-fbdev; > >> Kyungmin Park; myungjoo.ham; YoungJun Cho > >> Subject: Re: Introduce a new helper framework for buffer > synchronization > >> > >> Op 09-05-13 09:33, Inki Dae schreef: > >>> Hi all, > >>> > >>> This post introduces a new helper framework based on dma fence. And > the > >>> purpose of this post is to collect other opinions and advices before > RFC > >>> posting. > >>> > >>> First of all, this helper framework, called fence helper, is in > progress > >>> yet so might not have enough comments in codes and also might need to > be > >>> more cleaned up. Moreover, we might be missing some parts of the dma > >> fence. > >>> However, I'd like to say that all things mentioned below has been > tested > >>> with Linux platform and worked well. > >>> .... > >>> > >>> And tutorial for user process. > >>> just before cpu access > >>> struct dma_buf_fence *df; > >>> > >>> df->type = DMA_BUF_ACCESS_READ or DMA_BUF_ACCESS_WRITE; > >>> ioctl(fd, DMA_BUF_GET_FENCE, &df); > >>> > >>> after memset or memcpy > >>> ioctl(fd, DMA_BUF_PUT_FENCE, &df); > >> NAK. > >> > >> Userspace doesn't need to trigger fences. It can do a buffer idle wait, > >> and postpone submitting new commands until after it's done using the > >> buffer. > > Hi Maarten, > > > > It seems that you say user should wait for a buffer like KDS does: KDS > uses > > select() to postpone submitting new commands. But I think this way > assumes > > that every data flows a DMA device to a CPU. For example, a CPU should > keep > > polling for the completion of a buffer access by a DMA device. This > means > > that the this way isn't considered for data flow to opposite case; CPU > to > > DMA device. > Not really. You do both things the same way. You first wait for the bo to > be idle, this could be implemented by adding poll support to the dma-buf > fd. > Then you either do your read or write. Since userspace is supposed to be > the one controlling the bo it should stay idle at that point. If you have > another thread queueing > the buffer againbefore your thread is done that's a bug in the application, > and can be solved with userspace locking primitives. No need for the > kernel to get involved. > Yes, that is how we have synchronized buffer between CPU and DMA device until now without buffer synchronization mechanism. I thought that it's best to make user not considering anything: user can access a buffer regardless of any DMA device controlling and the buffer synchronization is performed in kernel level. Moreover, I think we could optimize graphics and multimedia hardware performance because hardware can do more works: one thread accesses a shared buffer and the other controls DMA device with the shared buffer in parallel. Thus, we could avoid sequential processing and that is my intention. Aren't you think about that we could improve hardware utilization with such way or other? of course, there could be better way. > >> Kernel space doesn't need the root hole you created by giving a > >> dereferencing a pointer passed from userspace. > >> Your next exercise should be to write a security exploit from the api > you > >> created here. It's the only way to learn how to write safe code. Hint: > >> df.ctx = mmap(..); > >> > > Also I'm not clear to use our way yet and that is why I posted. As you > > mentioned, it seems like that using mmap() is more safe. But there is > one > > issue it makes me confusing. For your hint, df.ctx = mmap(..), the issue > is > > that dmabuf mmap can be used to map a dmabuf with user space. And the > dmabuf > > means a physical memory region allocated by some allocator such as drm > gem > > or ion. > > > > There might be my missing point so could you please give me more > comments? > > > My point was that userspace could change df.ctx to some mmap'd memory, > forcing the kernel to execute some code prepared by userspace. Understood. I have to find a better way. And for this, I'd like to listen attentively more opinions and advices. Thanks for comments, Inki Dae ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-13 11:24 ` Inki Dae @ 2013-05-13 11:40 ` Maarten Lankhorst 2013-05-13 12:21 ` Inki Dae 2013-05-13 19:29 ` Tomasz Figa 1 sibling, 1 reply; 35+ messages in thread From: Maarten Lankhorst @ 2013-05-13 11:40 UTC (permalink / raw) To: linux-arm-kernel Op 13-05-13 13:24, Inki Dae schreef: >> and can be solved with userspace locking primitives. No need for the >> kernel to get involved. >> > Yes, that is how we have synchronized buffer between CPU and DMA device > until now without buffer synchronization mechanism. I thought that it's best > to make user not considering anything: user can access a buffer regardless > of any DMA device controlling and the buffer synchronization is performed in > kernel level. Moreover, I think we could optimize graphics and multimedia > hardware performance because hardware can do more works: one thread accesses > a shared buffer and the other controls DMA device with the shared buffer in > parallel. Thus, we could avoid sequential processing and that is my > intention. Aren't you think about that we could improve hardware utilization > with such way or other? of course, there could be better way. > If you don't want to block the hardware the only option is to allocate a temp bo and blit to/from it using the hardware. OpenGL already does that when you want to read back, because otherwise the entire pipeline can get stalled. The overhead of command submission for that shouldn't be high, if it is you should really try to optimize that first before coming up with this crazy scheme. In that case you still wouldn't give userspace control over the fences. I don't see any way that can end well. What if userspace never signals? What if userspace gets killed by oom killer. Who keeps track of that? ~Maarten ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-13 11:40 ` Maarten Lankhorst @ 2013-05-13 12:21 ` Inki Dae 2013-05-13 13:48 ` Rob Clark 0 siblings, 1 reply; 35+ messages in thread From: Inki Dae @ 2013-05-13 12:21 UTC (permalink / raw) To: linux-arm-kernel > -----Original Message----- > From: linux-fbdev-owner at vger.kernel.org [mailto:linux-fbdev- > owner at vger.kernel.org] On Behalf Of Maarten Lankhorst > Sent: Monday, May 13, 2013 8:41 PM > To: Inki Dae > Cc: 'Rob Clark'; 'Daniel Vetter'; 'DRI mailing list'; linux-arm- > kernel at lists.infradead.org; linux-media at vger.kernel.org; 'linux-fbdev'; > 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho' > Subject: Re: Introduce a new helper framework for buffer synchronization > > Op 13-05-13 13:24, Inki Dae schreef: > >> and can be solved with userspace locking primitives. No need for the > >> kernel to get involved. > >> > > Yes, that is how we have synchronized buffer between CPU and DMA device > > until now without buffer synchronization mechanism. I thought that it's > best > > to make user not considering anything: user can access a buffer > regardless > > of any DMA device controlling and the buffer synchronization is > performed in > > kernel level. Moreover, I think we could optimize graphics and > multimedia > > hardware performance because hardware can do more works: one thread > accesses > > a shared buffer and the other controls DMA device with the shared buffer > in > > parallel. Thus, we could avoid sequential processing and that is my > > intention. Aren't you think about that we could improve hardware > utilization > > with such way or other? of course, there could be better way. > > > If you don't want to block the hardware the only option is to allocate a > temp bo and blit to/from it using the hardware. > OpenGL already does that when you want to read back, because otherwise the > entire pipeline can get stalled. > The overhead of command submission for that shouldn't be high, if it is > you should really try to optimize that first > before coming up with this crazy scheme. > I have considered all devices sharing buffer with CPU; multimedia, display controller and graphics devices (including GPU). And we could provide easy-to-use user interfaces for buffer synchronization. Of course, the proposed user interfaces may be so ugly yet but at least, I think we need something else for it. > In that case you still wouldn't give userspace control over the fences. I > don't see any way that can end well. > What if userspace never signals? What if userspace gets killed by oom > killer. Who keeps track of that? > In all cases, all kernel resources to user fence will be released by kernel once the fence is timed out: never signaling and process killing by oom killer makes the fence timed out. And if we use mmap mechanism you mentioned before, I think user resource could also be freed properly. Thanks, Inki Dae > ~Maarten > -- > To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-13 12:21 ` Inki Dae @ 2013-05-13 13:48 ` Rob Clark [not found] ` <CAAQKjZP=iOmHRpHZCbZD3v_RKUFSn0eM_WVZZvhe7F9g3eTmPA@mail.gmail.com> 0 siblings, 1 reply; 35+ messages in thread From: Rob Clark @ 2013-05-13 13:48 UTC (permalink / raw) To: linux-arm-kernel On Mon, May 13, 2013 at 8:21 AM, Inki Dae <inki.dae@samsung.com> wrote: > >> In that case you still wouldn't give userspace control over the fences. I >> don't see any way that can end well. >> What if userspace never signals? What if userspace gets killed by oom >> killer. Who keeps track of that? >> > > In all cases, all kernel resources to user fence will be released by kernel > once the fence is timed out: never signaling and process killing by oom > killer makes the fence timed out. And if we use mmap mechanism you mentioned > before, I think user resource could also be freed properly. I tend to agree w/ Maarten here.. there is no good reason for userspace to be *signaling* fences. The exception might be some blob gpu drivers which don't have enough knowledge in the kernel to figure out what to do. (In which case you can add driver private ioctls for that.. still not the right thing to do but at least you don't make a public API out of it.) BR, -R ^ permalink raw reply [flat|nested] 35+ messages in thread
[parent not found: <CAAQKjZP=iOmHRpHZCbZD3v_RKUFSn0eM_WVZZvhe7F9g3eTmPA@mail.gmail.com>]
* Introduce a new helper framework for buffer synchronization [not found] ` <CAAQKjZP=iOmHRpHZCbZD3v_RKUFSn0eM_WVZZvhe7F9g3eTmPA@mail.gmail.com> @ 2013-05-13 17:58 ` Rob Clark 2013-05-14 2:52 ` Inki Dae 0 siblings, 1 reply; 35+ messages in thread From: Rob Clark @ 2013-05-13 17:58 UTC (permalink / raw) To: linux-arm-kernel On Mon, May 13, 2013 at 1:18 PM, Inki Dae <inki.dae@samsung.com> wrote: > > > 2013/5/13 Rob Clark <robdclark@gmail.com> >> >> On Mon, May 13, 2013 at 8:21 AM, Inki Dae <inki.dae@samsung.com> wrote: >> > >> >> In that case you still wouldn't give userspace control over the fences. >> >> I >> >> don't see any way that can end well. >> >> What if userspace never signals? What if userspace gets killed by oom >> >> killer. Who keeps track of that? >> >> >> > >> > In all cases, all kernel resources to user fence will be released by >> > kernel >> > once the fence is timed out: never signaling and process killing by oom >> > killer makes the fence timed out. And if we use mmap mechanism you >> > mentioned >> > before, I think user resource could also be freed properly. >> >> >> I tend to agree w/ Maarten here.. there is no good reason for >> userspace to be *signaling* fences. The exception might be some blob >> gpu drivers which don't have enough knowledge in the kernel to figure >> out what to do. (In which case you can add driver private ioctls for >> that.. still not the right thing to do but at least you don't make a >> public API out of it.) >> > > Please do not care whether those are generic or not. Let's see the following > three things. First, it's cache operation. As you know, ARM SoC has ACP > (Accelerator Coherency Port) and can be connected to DMA engine or similar > devices. And this port is used for cache coherency between CPU cache and DMA > device. However, most devices on ARM based embedded systems don't use the > ACP port. So they need proper cache operation before and after of DMA or CPU > access in case of using cachable mapping. Actually, I see many Linux based > platforms call cache control interfaces directly for that. I think the > reason, they do so, is that kernel isn't aware of when and how CPU accessed > memory. I think we had kicked around the idea of giving dmabuf's a prepare/finish ioctl quite some time back. This is probably something that should be at least a bit decoupled from fences. (Possibly 'prepare' waits for dma access to complete, but not the other way around.) And I did implement in omapdrm support for simulating coherency via page fault-in / shoot-down.. It is one option that makes it completely transparent to userspace, although there is some performance const, so I suppose it depends a bit on your use-case. > And second, user process has to do so many things in case of using shared > memory with DMA device. User process should understand how DMA device is > operated and when interfaces for controling the DMA device are called. Such > things would make user application so complicated. > > And third, it's performance optimization to multimedia and graphics devices. > As I mentioned already, we should consider sequential processing for buffer > sharing between CPU and DMA device. This means that CPU should stay with > idle until DMA device is completed and vise versa. > > That is why I proposed such user interfaces. Of course, these interfaces > might be so ugly yet: for this, Maarten pointed already out and I agree with > him. But there must be another better way. Aren't you think we need similar > thing? With such interfaces, cache control and buffer synchronization can be > performed in kernel level. Moreover, user applization doesn't need to > consider DMA device controlling anymore. Therefore, one thread can access a > shared buffer and the other can control DMA device with the shared buffer in > parallel. We can really make the best use of CPU and DMA idle time. In other > words, we can really make the best use of multi tasking OS, Linux. > > So could you please tell me about that there is any reason we don't use > public API for it? I think we can add and use public API if NECESSARY. well, for cache management, I think it is a better idea.. I didn't really catch that this was the motivation from the initial patch, but maybe I read it too quickly. But cache can be decoupled from synchronization, because CPU access is not asynchronous. For userspace/CPU access to buffer, you should: 1) wait for buffer 2) prepare-access 3) ... do whatever cpu access to buffer ... 4) finish-access 5) submit buffer for new dma-operation I suppose you could combine the syscall for #1 and #2.. not sure if that is a good idea or not. But you don't need to. And there is never really any need for userspace to signal a fence. BR, -R > Thanks, > Inki Dae > >> >> BR, >> -R >> _______________________________________________ >> dri-devel mailing list >> dri-devel at lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/dri-devel > > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-13 17:58 ` Rob Clark @ 2013-05-14 2:52 ` Inki Dae 2013-05-14 13:38 ` Rob Clark 0 siblings, 1 reply; 35+ messages in thread From: Inki Dae @ 2013-05-14 2:52 UTC (permalink / raw) To: linux-arm-kernel > -----Original Message----- > From: Rob Clark [mailto:robdclark at gmail.com] > Sent: Tuesday, May 14, 2013 2:58 AM > To: Inki Dae > Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun > Cho; linux-arm-kernel at lists.infradead.org; linux-media at vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Mon, May 13, 2013 at 1:18 PM, Inki Dae <inki.dae@samsung.com> wrote: > > > > > > 2013/5/13 Rob Clark <robdclark@gmail.com> > >> > >> On Mon, May 13, 2013 at 8:21 AM, Inki Dae <inki.dae@samsung.com> wrote: > >> > > >> >> In that case you still wouldn't give userspace control over the > fences. > >> >> I > >> >> don't see any way that can end well. > >> >> What if userspace never signals? What if userspace gets killed by > oom > >> >> killer. Who keeps track of that? > >> >> > >> > > >> > In all cases, all kernel resources to user fence will be released by > >> > kernel > >> > once the fence is timed out: never signaling and process killing by > oom > >> > killer makes the fence timed out. And if we use mmap mechanism you > >> > mentioned > >> > before, I think user resource could also be freed properly. > >> > >> > >> I tend to agree w/ Maarten here.. there is no good reason for > >> userspace to be *signaling* fences. The exception might be some blob > >> gpu drivers which don't have enough knowledge in the kernel to figure > >> out what to do. (In which case you can add driver private ioctls for > >> that.. still not the right thing to do but at least you don't make a > >> public API out of it.) > >> > > > > Please do not care whether those are generic or not. Let's see the > following > > three things. First, it's cache operation. As you know, ARM SoC has ACP > > (Accelerator Coherency Port) and can be connected to DMA engine or > similar > > devices. And this port is used for cache coherency between CPU cache and > DMA > > device. However, most devices on ARM based embedded systems don't use > the > > ACP port. So they need proper cache operation before and after of DMA or > CPU > > access in case of using cachable mapping. Actually, I see many Linux > based > > platforms call cache control interfaces directly for that. I think the > > reason, they do so, is that kernel isn't aware of when and how CPU > accessed > > memory. > > I think we had kicked around the idea of giving dmabuf's a > prepare/finish ioctl quite some time back. This is probably something > that should be at least a bit decoupled from fences. (Possibly > 'prepare' waits for dma access to complete, but not the other way > around.) > > And I did implement in omapdrm support for simulating coherency via > page fault-in / shoot-down.. It is one option that makes it > completely transparent to userspace, although there is some > performance const, so I suppose it depends a bit on your use-case. > > > And second, user process has to do so many things in case of using > shared > > memory with DMA device. User process should understand how DMA device is > > operated and when interfaces for controling the DMA device are called. > Such > > things would make user application so complicated. > > > > And third, it's performance optimization to multimedia and graphics > devices. > > As I mentioned already, we should consider sequential processing for > buffer > > sharing between CPU and DMA device. This means that CPU should stay with > > idle until DMA device is completed and vise versa. > > > > That is why I proposed such user interfaces. Of course, these interfaces > > might be so ugly yet: for this, Maarten pointed already out and I agree > with > > him. But there must be another better way. Aren't you think we need > similar > > thing? With such interfaces, cache control and buffer synchronization > can be > > performed in kernel level. Moreover, user applization doesn't need to > > consider DMA device controlling anymore. Therefore, one thread can > access a > > shared buffer and the other can control DMA device with the shared > buffer in > > parallel. We can really make the best use of CPU and DMA idle time. In > other > > words, we can really make the best use of multi tasking OS, Linux. > > > > So could you please tell me about that there is any reason we don't use > > public API for it? I think we can add and use public API if NECESSARY. > > well, for cache management, I think it is a better idea.. I didn't > really catch that this was the motivation from the initial patch, but > maybe I read it too quickly. But cache can be decoupled from > synchronization, because CPU access is not asynchronous. For > userspace/CPU access to buffer, you should: > > 1) wait for buffer > 2) prepare-access > 3) ... do whatever cpu access to buffer ... > 4) finish-access > 5) submit buffer for new dma-operation > For data flow from CPU to DMA device, 1) wait for buffer 2) prepare-access (dma_buf_begin_cpu_access) 3) cpu access to buffer For data flow from DMA device to CPU 1) wait for buffer 2) finish-access (dma_buf_end _cpu_access) 3) dma access to buffer 1) and 2) are coupled with one function: we have implemented fence_helper_commit_reserve() for it. Cache control(cache clean or cache invalidate) is performed properly checking previous access type and current access type. And the below is actual codes for it, static void fence_helper_cache_ops(struct fence_helper *fh) { struct seqno_fence_dmabuf *sfd; list_for_each_entry(sfd, &fh->sf.sync_buf_list, list) { struct dma_buf *dmabuf = sfd->sync_buf; if (WARN_ON(!dmabuf)) continue; /* first time access. */ if (!dmabuf->access_type) goto out; if (dmabuf->access_type == DMA_BUF_ACCESS_WRITE && ((fh->type == (DMA_BUF_ACCESS_READ | DMA_BUF_ACCESS_DMA)) || (fh->type == (DMA_BUF_ACCESS_WRITE | DMA_BUF_ACCESS_DMA)))) /* cache clean */ dma_buf_end_cpu_access(dmabuf, 0, dmabuf->size, DMA_TO_DEVICE); else if (dmabuf->access_type == (DMA_BUF_ACCESS_WRITE | DMA_BUF_ACCESS_DMA) && (fh->type == DMA_BUF_ACCESS_READ)) /* cache invalidate */ dma_buf_begin_cpu_access(dmabuf, 0, dmabuf->size, DMA_FROM_DEVICE); out: /* Update access type to new one. */ dmabuf->access_type = fh->type; } } The above function is called after wait for buffer. Thus, we can check who (CPU or DMA) and how (READ or WRITE) accessed and accesses buffer with this approach. In other words, now kernel is aware of buffer access by CPU also. Thanks, Inki Dae > I suppose you could combine the syscall for #1 and #2.. not sure if > that is a good idea or not. But you don't need to. And there is > never really any need for userspace to signal a fence. > > BR, > -R > > > Thanks, > > Inki Dae > > > >> > >> BR, > >> -R > >> _______________________________________________ > >> dri-devel mailing list > >> dri-devel at lists.freedesktop.org > >> http://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-14 2:52 ` Inki Dae @ 2013-05-14 13:38 ` Rob Clark 2013-05-15 5:19 ` Inki Dae 0 siblings, 1 reply; 35+ messages in thread From: Rob Clark @ 2013-05-14 13:38 UTC (permalink / raw) To: linux-arm-kernel On Mon, May 13, 2013 at 10:52 PM, Inki Dae <inki.dae@samsung.com> wrote: >> well, for cache management, I think it is a better idea.. I didn't >> really catch that this was the motivation from the initial patch, but >> maybe I read it too quickly. But cache can be decoupled from >> synchronization, because CPU access is not asynchronous. For >> userspace/CPU access to buffer, you should: >> >> 1) wait for buffer >> 2) prepare-access >> 3) ... do whatever cpu access to buffer ... >> 4) finish-access >> 5) submit buffer for new dma-operation >> > > > For data flow from CPU to DMA device, > 1) wait for buffer > 2) prepare-access (dma_buf_begin_cpu_access) > 3) cpu access to buffer > > > For data flow from DMA device to CPU > 1) wait for buffer Right, but CPU access isn't asynchronous (from the point of view of the CPU), so there isn't really any wait step at this point. And if you do want the CPU to be able to signal a fence from userspace for some reason, you probably what something file/fd based so the refcnting/cleanup when process dies doesn't leave some pending DMA action wedged. But I don't really see the point of that complexity when the CPU access isn't asynchronous in the first place. BR, -R > 2) finish-access (dma_buf_end _cpu_access) > 3) dma access to buffer > > 1) and 2) are coupled with one function: we have implemented > fence_helper_commit_reserve() for it. > > Cache control(cache clean or cache invalidate) is performed properly > checking previous access type and current access type. > And the below is actual codes for it, ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-14 13:38 ` Rob Clark @ 2013-05-15 5:19 ` Inki Dae 2013-05-15 14:06 ` Rob Clark 0 siblings, 1 reply; 35+ messages in thread From: Inki Dae @ 2013-05-15 5:19 UTC (permalink / raw) To: linux-arm-kernel > -----Original Message----- > From: Rob Clark [mailto:robdclark at gmail.com] > Sent: Tuesday, May 14, 2013 10:39 PM > To: Inki Dae > Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun > Cho; linux-arm-kernel at lists.infradead.org; linux-media at vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Mon, May 13, 2013 at 10:52 PM, Inki Dae <inki.dae@samsung.com> wrote: > >> well, for cache management, I think it is a better idea.. I didn't > >> really catch that this was the motivation from the initial patch, but > >> maybe I read it too quickly. But cache can be decoupled from > >> synchronization, because CPU access is not asynchronous. For > >> userspace/CPU access to buffer, you should: > >> > >> 1) wait for buffer > >> 2) prepare-access > >> 3) ... do whatever cpu access to buffer ... > >> 4) finish-access > >> 5) submit buffer for new dma-operation > >> > > > > > > For data flow from CPU to DMA device, > > 1) wait for buffer > > 2) prepare-access (dma_buf_begin_cpu_access) > > 3) cpu access to buffer > > > > > > For data flow from DMA device to CPU > > 1) wait for buffer > > Right, but CPU access isn't asynchronous (from the point of view of > the CPU), so there isn't really any wait step at this point. And if > you do want the CPU to be able to signal a fence from userspace for > some reason, you probably what something file/fd based so the > refcnting/cleanup when process dies doesn't leave some pending DMA > action wedged. But I don't really see the point of that complexity > when the CPU access isn't asynchronous in the first place. > There was my missing comments, please see the below sequence. For data flow from CPU to DMA device and then from DMA device to CPU, 1) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) - including prepare-access (dma_buf_begin_cpu_access) 2) cpu access to buffer 3) wait for buffer <- at device driver - but CPU is already accessing the buffer so blocked. 4) signal <- at user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...) 5) the thread, blocked at 3), is waked up by 4). - and then finish-access (dma_buf_end_cpu_access) 6) dma access to buffer 7) wait for buffer <- at user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) - but DMA is already accessing the buffer so blocked. 8) signal <- at device driver 9) the thread, blocked@7), is waked up by 8) - and then prepare-access (dma_buf_begin_cpu_access) 10 cpu access to buffer Basically, 'wait for buffer' includes buffer synchronization, committing processing, and cache operation. The buffer synchronization means that a current thread should wait for other threads accessing a shared buffer until the completion of their access. And the committing processing means that a current thread possesses the shared buffer so any trying to access the shared buffer by another thread makes the thread to be blocked. However, as I already mentioned before, it seems that these user interfaces are so ugly yet. So we need better way. Give me more comments if there is my missing point :) Thanks, Inki Dae > BR, > -R > > > > 2) finish-access (dma_buf_end _cpu_access) > > 3) dma access to buffer > > > > 1) and 2) are coupled with one function: we have implemented > > fence_helper_commit_reserve() for it. > > > > Cache control(cache clean or cache invalidate) is performed properly > > checking previous access type and current access type. > > And the below is actual codes for it, ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-15 5:19 ` Inki Dae @ 2013-05-15 14:06 ` Rob Clark 2013-05-17 4:19 ` Daniel Vetter [not found] ` <CAAQKjZP37koEPob6yqpn-WxxTh3+O=twyvRzDiEhVJTD8BxQzw@mail.gmail.com> 0 siblings, 2 replies; 35+ messages in thread From: Rob Clark @ 2013-05-15 14:06 UTC (permalink / raw) To: linux-arm-kernel On Wed, May 15, 2013 at 1:19 AM, Inki Dae <inki.dae@samsung.com> wrote: > > >> -----Original Message----- >> From: Rob Clark [mailto:robdclark at gmail.com] >> Sent: Tuesday, May 14, 2013 10:39 PM >> To: Inki Dae >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun >> Cho; linux-arm-kernel at lists.infradead.org; linux-media at vger.kernel.org >> Subject: Re: Introduce a new helper framework for buffer synchronization >> >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae <inki.dae@samsung.com> wrote: >> >> well, for cache management, I think it is a better idea.. I didn't >> >> really catch that this was the motivation from the initial patch, but >> >> maybe I read it too quickly. But cache can be decoupled from >> >> synchronization, because CPU access is not asynchronous. For >> >> userspace/CPU access to buffer, you should: >> >> >> >> 1) wait for buffer >> >> 2) prepare-access >> >> 3) ... do whatever cpu access to buffer ... >> >> 4) finish-access >> >> 5) submit buffer for new dma-operation >> >> >> > >> > >> > For data flow from CPU to DMA device, >> > 1) wait for buffer >> > 2) prepare-access (dma_buf_begin_cpu_access) >> > 3) cpu access to buffer >> > >> > >> > For data flow from DMA device to CPU >> > 1) wait for buffer >> >> Right, but CPU access isn't asynchronous (from the point of view of >> the CPU), so there isn't really any wait step at this point. And if >> you do want the CPU to be able to signal a fence from userspace for >> some reason, you probably what something file/fd based so the >> refcnting/cleanup when process dies doesn't leave some pending DMA >> action wedged. But I don't really see the point of that complexity >> when the CPU access isn't asynchronous in the first place. >> > > There was my missing comments, please see the below sequence. > > For data flow from CPU to DMA device and then from DMA device to CPU, > 1) wait for buffer <-@user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) > - including prepare-access (dma_buf_begin_cpu_access) > 2) cpu access to buffer > 3) wait for buffer <-@device driver > - but CPU is already accessing the buffer so blocked. > 4) signal <-@user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...) > 5) the thread, blocked at 3), is waked up by 4). > - and then finish-access (dma_buf_end_cpu_access) right, I understand you can have background threads, etc, in userspace. But there are already plenty of synchronization primitives that can be used for cpu->cpu synchronization, either within the same process or between multiple processes. For cpu access, even if it is handled by background threads/processes, I think it is better to use the traditional pthreads or unix synchronization primitives. They have existed forever, they are well tested, and they work. So while it seems nice and orthogonal/clean to couple cache and synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the same generic way, but I think in practice we have to make things more complex than they otherwise need to be to do this. Otherwise I think we'll be having problems with badly behaved or crashing userspace. BR, -R > 6) dma access to buffer > 7) wait for buffer <-@user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) > - but DMA is already accessing the buffer so blocked. > 8) signal <-@device driver > 9) the thread, blocked at 7), is waked up by 8) > - and then prepare-access (dma_buf_begin_cpu_access) > 10 cpu access to buffer > > Basically, 'wait for buffer' includes buffer synchronization, committing > processing, and cache operation. The buffer synchronization means that a > current thread should wait for other threads accessing a shared buffer until > the completion of their access. And the committing processing means that a > current thread possesses the shared buffer so any trying to access the > shared buffer by another thread makes the thread to be blocked. However, as > I already mentioned before, it seems that these user interfaces are so ugly > yet. So we need better way. > > Give me more comments if there is my missing point :) > > Thanks, > Inki Dae > >> BR, >> -R >> >> >> > 2) finish-access (dma_buf_end _cpu_access) >> > 3) dma access to buffer >> > >> > 1) and 2) are coupled with one function: we have implemented >> > fence_helper_commit_reserve() for it. >> > >> > Cache control(cache clean or cache invalidate) is performed properly >> > checking previous access type and current access type. >> > And the below is actual codes for it, > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-15 14:06 ` Rob Clark @ 2013-05-17 4:19 ` Daniel Vetter [not found] ` <CAAQKjZP37koEPob6yqpn-WxxTh3+O=twyvRzDiEhVJTD8BxQzw@mail.gmail.com> 1 sibling, 0 replies; 35+ messages in thread From: Daniel Vetter @ 2013-05-17 4:19 UTC (permalink / raw) To: linux-arm-kernel On Wed, May 15, 2013 at 4:06 PM, Rob Clark <robdclark@gmail.com> wrote: > So while it seems nice and orthogonal/clean to couple cache and > synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the > same generic way, but I think in practice we have to make things more > complex than they otherwise need to be to do this. Otherwise I think > we'll be having problems with badly behaved or crashing userspace. I haven't read through the entire thread careful but imo this is very important. If we add a fence interface which allows userspace to block dma this is a no-go. The only thing we need is to sync up with all outstanding dma operations and flush caches for cpu access. If broken userspace starts to issue new dma (or multiple thread stomp onto each another) that's not a problem dma fences/syncpoints should try to solve. This way we can concentrate on solving the (already challenging) device-to-device sync issues without additional complexities which cpu->cpu sync would impose. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
[parent not found: <CAAQKjZP37koEPob6yqpn-WxxTh3+O=twyvRzDiEhVJTD8BxQzw@mail.gmail.com>]
* Introduce a new helper framework for buffer synchronization [not found] ` <CAAQKjZP37koEPob6yqpn-WxxTh3+O=twyvRzDiEhVJTD8BxQzw@mail.gmail.com> @ 2013-05-20 21:13 ` Daniel Vetter 2013-05-20 21:30 ` Daniel Vetter 0 siblings, 1 reply; 35+ messages in thread From: Daniel Vetter @ 2013-05-20 21:13 UTC (permalink / raw) To: linux-arm-kernel On Sat, May 18, 2013 at 03:47:43PM +0900, Inki Dae wrote: > 2013/5/15 Rob Clark <robdclark@gmail.com> > > > On Wed, May 15, 2013 at 1:19 AM, Inki Dae <inki.dae@samsung.com> wrote: > > > > > > > > >> -----Original Message----- > > >> From: Rob Clark [mailto:robdclark at gmail.com] > > >> Sent: Tuesday, May 14, 2013 10:39 PM > > >> To: Inki Dae > > >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun > > >> Cho; linux-arm-kernel at lists.infradead.org; linux-media at vger.kernel.org > > >> Subject: Re: Introduce a new helper framework for buffer synchronization > > >> > > >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae <inki.dae@samsung.com> > > wrote: > > >> >> well, for cache management, I think it is a better idea.. I didn't > > >> >> really catch that this was the motivation from the initial patch, but > > >> >> maybe I read it too quickly. But cache can be decoupled from > > >> >> synchronization, because CPU access is not asynchronous. For > > >> >> userspace/CPU access to buffer, you should: > > >> >> > > >> >> 1) wait for buffer > > >> >> 2) prepare-access > > >> >> 3) ... do whatever cpu access to buffer ... > > >> >> 4) finish-access > > >> >> 5) submit buffer for new dma-operation > > >> >> > > >> > > > >> > > > >> > For data flow from CPU to DMA device, > > >> > 1) wait for buffer > > >> > 2) prepare-access (dma_buf_begin_cpu_access) > > >> > 3) cpu access to buffer > > >> > > > >> > > > >> > For data flow from DMA device to CPU > > >> > 1) wait for buffer > > >> > > >> Right, but CPU access isn't asynchronous (from the point of view of > > >> the CPU), so there isn't really any wait step at this point. And if > > >> you do want the CPU to be able to signal a fence from userspace for > > >> some reason, you probably what something file/fd based so the > > >> refcnting/cleanup when process dies doesn't leave some pending DMA > > >> action wedged. But I don't really see the point of that complexity > > >> when the CPU access isn't asynchronous in the first place. > > >> > > > > > > There was my missing comments, please see the below sequence. > > > > > > For data flow from CPU to DMA device and then from DMA device to CPU, > > > 1) wait for buffer <-@user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) > > > - including prepare-access (dma_buf_begin_cpu_access) > > > 2) cpu access to buffer > > > 3) wait for buffer <-@device driver > > > - but CPU is already accessing the buffer so blocked. > > > 4) signal <-@user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...) > > > 5) the thread, blocked at 3), is waked up by 4). > > > - and then finish-access (dma_buf_end_cpu_access) > > > > right, I understand you can have background threads, etc, in > > userspace. But there are already plenty of synchronization primitives > > that can be used for cpu->cpu synchronization, either within the same > > process or between multiple processes. For cpu access, even if it is > > handled by background threads/processes, I think it is better to use > > the traditional pthreads or unix synchronization primitives. They > > have existed forever, they are well tested, and they work. > > > > So while it seems nice and orthogonal/clean to couple cache and > > synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the > > same generic way, but I think in practice we have to make things more > > complex than they otherwise need to be to do this. Otherwise I think > > we'll be having problems with badly behaved or crashing userspace. > > > > > Right, this is very important. I think it's not esay to solve this > issue. Aand at least for ARM based embedded system, such feature is useful > to us; coupling cache operation and buffer synchronization. I'd like to > collect other opinions and advices to solve this issue. Maybe we have a bit a misunderstanding here. The kernel really should take care of sync and cache coherency, and I agree that with the current dma_buf code (and the proposed fences) that part is still a bit hazy. But the kernel should not allow userspace to block access to a buffer until userspace is done. It should only sync with any oustanding fences and flush buffers before that userspace access happens. Then the kernel would again flush caches on the next dma access (which hopefully is only started once userspace completed access). Atm this isn't possible in an efficient way since the dma_buf api only exposes map/unmap attachment and not a function to just sync an existing mapping like dma_sync_single_for_device. I guess we should add a dma_buf_sync_attachment interface for that - we already have range-based cpu sync support with the begin/end_cpu_access interfaces. Yours, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-20 21:13 ` Daniel Vetter @ 2013-05-20 21:30 ` Daniel Vetter 2013-05-21 7:03 ` Inki Dae 0 siblings, 1 reply; 35+ messages in thread From: Daniel Vetter @ 2013-05-20 21:30 UTC (permalink / raw) To: linux-arm-kernel On Mon, May 20, 2013 at 11:13:04PM +0200, Daniel Vetter wrote: > On Sat, May 18, 2013 at 03:47:43PM +0900, Inki Dae wrote: > > 2013/5/15 Rob Clark <robdclark@gmail.com> > > > > > On Wed, May 15, 2013 at 1:19 AM, Inki Dae <inki.dae@samsung.com> wrote: > > > > > > > > > > > >> -----Original Message----- > > > >> From: Rob Clark [mailto:robdclark at gmail.com] > > > >> Sent: Tuesday, May 14, 2013 10:39 PM > > > >> To: Inki Dae > > > >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; YoungJun > > > >> Cho; linux-arm-kernel at lists.infradead.org; linux-media at vger.kernel.org > > > >> Subject: Re: Introduce a new helper framework for buffer synchronization > > > >> > > > >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae <inki.dae@samsung.com> > > > wrote: > > > >> >> well, for cache management, I think it is a better idea.. I didn't > > > >> >> really catch that this was the motivation from the initial patch, but > > > >> >> maybe I read it too quickly. But cache can be decoupled from > > > >> >> synchronization, because CPU access is not asynchronous. For > > > >> >> userspace/CPU access to buffer, you should: > > > >> >> > > > >> >> 1) wait for buffer > > > >> >> 2) prepare-access > > > >> >> 3) ... do whatever cpu access to buffer ... > > > >> >> 4) finish-access > > > >> >> 5) submit buffer for new dma-operation > > > >> >> > > > >> > > > > >> > > > > >> > For data flow from CPU to DMA device, > > > >> > 1) wait for buffer > > > >> > 2) prepare-access (dma_buf_begin_cpu_access) > > > >> > 3) cpu access to buffer > > > >> > > > > >> > > > > >> > For data flow from DMA device to CPU > > > >> > 1) wait for buffer > > > >> > > > >> Right, but CPU access isn't asynchronous (from the point of view of > > > >> the CPU), so there isn't really any wait step at this point. And if > > > >> you do want the CPU to be able to signal a fence from userspace for > > > >> some reason, you probably what something file/fd based so the > > > >> refcnting/cleanup when process dies doesn't leave some pending DMA > > > >> action wedged. But I don't really see the point of that complexity > > > >> when the CPU access isn't asynchronous in the first place. > > > >> > > > > > > > > There was my missing comments, please see the below sequence. > > > > > > > > For data flow from CPU to DMA device and then from DMA device to CPU, > > > > 1) wait for buffer <-@user side - ioctl(fd, DMA_BUF_GET_FENCE, ...) > > > > - including prepare-access (dma_buf_begin_cpu_access) > > > > 2) cpu access to buffer > > > > 3) wait for buffer <-@device driver > > > > - but CPU is already accessing the buffer so blocked. > > > > 4) signal <-@user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...) > > > > 5) the thread, blocked at 3), is waked up by 4). > > > > - and then finish-access (dma_buf_end_cpu_access) > > > > > > right, I understand you can have background threads, etc, in > > > userspace. But there are already plenty of synchronization primitives > > > that can be used for cpu->cpu synchronization, either within the same > > > process or between multiple processes. For cpu access, even if it is > > > handled by background threads/processes, I think it is better to use > > > the traditional pthreads or unix synchronization primitives. They > > > have existed forever, they are well tested, and they work. > > > > > > So while it seems nice and orthogonal/clean to couple cache and > > > synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the > > > same generic way, but I think in practice we have to make things more > > > complex than they otherwise need to be to do this. Otherwise I think > > > we'll be having problems with badly behaved or crashing userspace. > > > > > > > > Right, this is very important. I think it's not esay to solve this > > issue. Aand at least for ARM based embedded system, such feature is useful > > to us; coupling cache operation and buffer synchronization. I'd like to > > collect other opinions and advices to solve this issue. > > Maybe we have a bit a misunderstanding here. The kernel really should take > care of sync and cache coherency, and I agree that with the current > dma_buf code (and the proposed fences) that part is still a bit hazy. But > the kernel should not allow userspace to block access to a buffer until > userspace is done. It should only sync with any oustanding fences and > flush buffers before that userspace access happens. > > Then the kernel would again flush caches on the next dma access (which > hopefully is only started once userspace completed access). Atm this isn't > possible in an efficient way since the dma_buf api only exposes map/unmap > attachment and not a function to just sync an existing mapping like > dma_sync_single_for_device. I guess we should add a > dma_buf_sync_attachment interface for that - we already have range-based > cpu sync support with the begin/end_cpu_access interfaces. I think my mail here was a bit unclear, so let me try to rephrase. Re-reading through part of this discussion I think you raise some good shortcomings of the current dma_buf interfaces and the proposed fence patches. But I think we should address the individual pieces bit-by-bit. On a quick survey I see a few parts to what you're trying to solve: - More efficient cache coherency management. I think we already have all required interfaces for efficient cpu-side access with begin/end_cpu_access. So I think we only need a device-side dma sync mechanism to be able to flush cpu caches after userspace/cpu access has completed (before the next dma op). - More efficient mmap handling. The current dma_buf mmap support is explicitly designed as something simply, but probably dead-slow for last-resort fallback ops. I'm not sure whether we should fix this up and extend with special ioctls. But the current coherency interface should be good enough for you to write an efficient private mmap implementation for exynos drm. - Integration of fence syncing into dma_buf. Imo we should have a per-attachment mode which decides whether map/unmap (and the new sync) should wait for fences or whether the driver takes care of syncing through the new fence framework itself (for direct hw sync). Imo cpu access should also have such a flag. I guess in both cases we should sync by default. - cpu/cpu sync additions to dma_buf. Like I've said, I'm not sold at all on this idea. I think it would be best if we try to fix up all the other current shortcomings first though and then take a fresh look at this issue here. Have I missed or misunderstood something? Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-20 21:30 ` Daniel Vetter @ 2013-05-21 7:03 ` Inki Dae 2013-05-21 7:44 ` Daniel Vetter 0 siblings, 1 reply; 35+ messages in thread From: Inki Dae @ 2013-05-21 7:03 UTC (permalink / raw) To: linux-arm-kernel > -----Original Message----- > From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel > Vetter > Sent: Tuesday, May 21, 2013 6:31 AM > To: Inki Dae > Cc: Rob Clark; linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; > YoungJun Cho; linux-arm-kernel at lists.infradead.org; linux- > media at vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Mon, May 20, 2013 at 11:13:04PM +0200, Daniel Vetter wrote: > > On Sat, May 18, 2013 at 03:47:43PM +0900, Inki Dae wrote: > > > 2013/5/15 Rob Clark <robdclark@gmail.com> > > > > > > > On Wed, May 15, 2013 at 1:19 AM, Inki Dae <inki.dae@samsung.com> > wrote: > > > > > > > > > > > > > > >> -----Original Message----- > > > > >> From: Rob Clark [mailto:robdclark at gmail.com] > > > > >> Sent: Tuesday, May 14, 2013 10:39 PM > > > > >> To: Inki Dae > > > > >> Cc: linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; > YoungJun > > > > >> Cho; linux-arm-kernel at lists.infradead.org; linux- > media at vger.kernel.org > > > > >> Subject: Re: Introduce a new helper framework for buffer > synchronization > > > > >> > > > > >> On Mon, May 13, 2013 at 10:52 PM, Inki Dae <inki.dae@samsung.com> > > > > wrote: > > > > >> >> well, for cache management, I think it is a better idea.. I > didn't > > > > >> >> really catch that this was the motivation from the initial > patch, but > > > > >> >> maybe I read it too quickly. But cache can be decoupled from > > > > >> >> synchronization, because CPU access is not asynchronous. For > > > > >> >> userspace/CPU access to buffer, you should: > > > > >> >> > > > > >> >> 1) wait for buffer > > > > >> >> 2) prepare-access > > > > >> >> 3) ... do whatever cpu access to buffer ... > > > > >> >> 4) finish-access > > > > >> >> 5) submit buffer for new dma-operation > > > > >> >> > > > > >> > > > > > >> > > > > > >> > For data flow from CPU to DMA device, > > > > >> > 1) wait for buffer > > > > >> > 2) prepare-access (dma_buf_begin_cpu_access) > > > > >> > 3) cpu access to buffer > > > > >> > > > > > >> > > > > > >> > For data flow from DMA device to CPU > > > > >> > 1) wait for buffer > > > > >> > > > > >> Right, but CPU access isn't asynchronous (from the point of view > of > > > > >> the CPU), so there isn't really any wait step at this point. And > if > > > > >> you do want the CPU to be able to signal a fence from userspace > for > > > > >> some reason, you probably what something file/fd based so the > > > > >> refcnting/cleanup when process dies doesn't leave some pending > DMA > > > > >> action wedged. But I don't really see the point of that > complexity > > > > >> when the CPU access isn't asynchronous in the first place. > > > > >> > > > > > > > > > > There was my missing comments, please see the below sequence. > > > > > > > > > > For data flow from CPU to DMA device and then from DMA device to > CPU, > > > > > 1) wait for buffer <-@user side - ioctl(fd, > DMA_BUF_GET_FENCE, ...) > > > > > - including prepare-access (dma_buf_begin_cpu_access) > > > > > 2) cpu access to buffer > > > > > 3) wait for buffer <-@device driver > > > > > - but CPU is already accessing the buffer so blocked. > > > > > 4) signal <-@user side - ioctl(fd, DMA_BUF_PUT_FENCE, ...) > > > > > 5) the thread, blocked at 3), is waked up by 4). > > > > > - and then finish-access (dma_buf_end_cpu_access) > > > > > > > > right, I understand you can have background threads, etc, in > > > > userspace. But there are already plenty of synchronization > primitives > > > > that can be used for cpu->cpu synchronization, either within the > same > > > > process or between multiple processes. For cpu access, even if it > is > > > > handled by background threads/processes, I think it is better to use > > > > the traditional pthreads or unix synchronization primitives. They > > > > have existed forever, they are well tested, and they work. > > > > > > > > So while it seems nice and orthogonal/clean to couple cache and > > > > synchronization and handle dma->cpu and cpu->cpu and cpu->dma in the > > > > same generic way, but I think in practice we have to make things > more > > > > complex than they otherwise need to be to do this. Otherwise I > think > > > > we'll be having problems with badly behaved or crashing userspace. > > > > > > > > > > > Right, this is very important. I think it's not esay to solve this > > > issue. Aand at least for ARM based embedded system, such feature is > useful > > > to us; coupling cache operation and buffer synchronization. I'd like > to > > > collect other opinions and advices to solve this issue. > > > > Maybe we have a bit a misunderstanding here. The kernel really should > take > > care of sync and cache coherency, and I agree that with the current > > dma_buf code (and the proposed fences) that part is still a bit hazy. > But > > the kernel should not allow userspace to block access to a buffer until > > userspace is done. It should only sync with any oustanding fences and > > flush buffers before that userspace access happens. > > > > Then the kernel would again flush caches on the next dma access (which > > hopefully is only started once userspace completed access). Atm this > isn't > > possible in an efficient way since the dma_buf api only exposes > map/unmap > > attachment and not a function to just sync an existing mapping like > > dma_sync_single_for_device. I guess we should add a > > dma_buf_sync_attachment interface for that - we already have range-based > > cpu sync support with the begin/end_cpu_access interfaces. > > I think my mail here was a bit unclear, so let me try to rephrase. > Re-reading through part of this discussion I think you raise some good > shortcomings of the current dma_buf interfaces and the proposed fence > patches. But I think we should address the individual pieces bit-by-bit. > On a quick survey I see a few parts to what you're trying to solve: > > - More efficient cache coherency management. I think we already have all > required interfaces for efficient cpu-side access with > begin/end_cpu_access. So I think we only need a device-side dma sync > mechanism to be able to flush cpu caches after userspace/cpu access has > completed (before the next dma op). > > - More efficient mmap handling. The current dma_buf mmap support is > explicitly designed as something simply, but probably dead-slow for > last-resort fallback ops. I'm not sure whether we should fix this up and > extend with special ioctls. But the current coherency interface should > be good enough for you to write an efficient private mmap implementation > for exynos drm. I agree with you. > > - Integration of fence syncing into dma_buf. Imo we should have a > per-attachment mode which decides whether map/unmap (and the new sync) > should wait for fences or whether the driver takes care of syncing > through the new fence framework itself (for direct hw sync). I think it's a good idea to have per-attachment mode for buffer sync. But I'd like to say we make sure what is the purpose of map (dma_buf_map_attachment)first. This interface is used to get a sgt; containing pages to physical memory region, and map that region with device's iommu table. The important thing is that this interface is called just one time when user wants to share an allocated buffer with dma. But cpu will try to access the buffer every time as long as it wants. Therefore, we need cache control every time cpu and dma access the shared buffer: cache clean when cpu goes to dma and cache invalidate when dma goes to cpu. That is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE, to dma buf framework. Of course, Those are ugly so we need a better way: I just wanted to show you that in such way, we can synchronize the shared buffer between cpu and dma. By any chance, is there any way that kernel can be aware of when cpu accesses the shared buffer or is there any point I didn't catch up? > Imo cpu access should also have such a flag. I guess in both cases we should > sync by default. > > - cpu/cpu sync additions to dma_buf. Like I've said, I'm not sold at all I think we should concentrate on cpu and dma sync. > on this idea. I think it would be best if we try to fix up all the other > current shortcomings first though and then take a fresh look at this > issue here. Right, agree. > > Have I missed or misunderstood something? > Your comments are very useful to me. Thanks again. In Linux kernel, especially embedded system, we had tried to implement generic interfaces for buffer management; how to allocate a buffer and how to share a buffer. As a result, now we have CMA (Contiguous Memory Allocator) for buffer allocation, and we have DMA-BUF for buffer sharing between cpu and dma. And then how to synchronize a buffer between cpu and dma? I think now it's best time to discuss buffer synchronization mechanism, and that is very natural. Please give me more comments and advices if there is my missing or misunderstanding point. Thanks, Inki Dae > Cheers, Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-21 7:03 ` Inki Dae @ 2013-05-21 7:44 ` Daniel Vetter 2013-05-21 9:22 ` Inki Dae 0 siblings, 1 reply; 35+ messages in thread From: Daniel Vetter @ 2013-05-21 7:44 UTC (permalink / raw) To: linux-arm-kernel On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote: > > - Integration of fence syncing into dma_buf. Imo we should have a > > per-attachment mode which decides whether map/unmap (and the new sync) > > should wait for fences or whether the driver takes care of syncing > > through the new fence framework itself (for direct hw sync). > > I think it's a good idea to have per-attachment mode for buffer sync. But > I'd like to say we make sure what is the purpose of map > (dma_buf_map_attachment)first. This interface is used to get a sgt; > containing pages to physical memory region, and map that region with > device's iommu table. The important thing is that this interface is called > just one time when user wants to share an allocated buffer with dma. But cpu > will try to access the buffer every time as long as it wants. Therefore, we > need cache control every time cpu and dma access the shared buffer: cache > clean when cpu goes to dma and cache invalidate when dma goes to cpu. That > is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE, to > dma buf framework. Of course, Those are ugly so we need a better way: I just > wanted to show you that in such way, we can synchronize the shared buffer > between cpu and dma. By any chance, is there any way that kernel can be > aware of when cpu accesses the shared buffer or is there any point I didn't > catch up? Well dma_buf_map/unmap_attachment should also flush/invalidate any caches, and with the current dma_buf spec those two functions are the _only_ means you have to do so. Which strictly means that if you interleave device dma and cpu acccess you need to unmap/map every time. Which is obviously horribly inefficient, but a known gap in the dma_buf interface. Hence why I proposed to add dma_buf_sync_attachment similar to dma_sync_single_for_device: /** * dma_buf_sync_sg_attachment - sync caches for dma access * @attach: dma-buf attachment to sync * @sgt: the sg table to sync (returned by dma_buf_map_attachement) * @direction: dma direction to sync for * * This function synchronizes caches for device dma through the given * dma-buf attachment when interleaving dma from different devices and the * cpu. Other device dma needs to be synced also by calls to this * function (or a pair of dma_buf_map/unmap_attachment calls), cpu access * needs to be synced with dma_buf_begin/end_cpu_access. */ void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach, struct sg_table *sgt, enum dma_data_direction direction) Note that "sync" here only means to synchronize caches, not wait for any outstanding fences. This is simply to be consistent with the established lingo of the dma api. How the dma-buf fences fit into this is imo a different topic, but my idea is that all the cache coherency barriers (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and dma_buf_begin/end_cpu_access) would automatically block for any attached fences (i.e. block for write fences when doing read-only access, block for all fences otherwise). Then we could do a new dma_buf_attach_flags interface for special cases (might also be useful for other things, similar to the recently added flags in the dma api for wc/no-kernel-mapping/...). A new flag like DMA_BUF_ATTACH_NO_AUTOFENCE would then mean that the driver takes care of all fencing for this attachment and the dma-buf functions should not do the automagic fence blocking. > In Linux kernel, especially embedded system, we had tried to implement > generic interfaces for buffer management; how to allocate a buffer and how > to share a buffer. As a result, now we have CMA (Contiguous Memory > Allocator) for buffer allocation, and we have DMA-BUF for buffer sharing > between cpu and dma. And then how to synchronize a buffer between cpu and > dma? I think now it's best time to discuss buffer synchronization mechanism, > and that is very natural. I think it's important to differentiate between the two meanings of sync: - synchronize caches (i.e. cpu cache flushing in most cases) - and synchronize in-flight dma with fences. Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-21 7:44 ` Daniel Vetter @ 2013-05-21 9:22 ` Inki Dae 2013-05-23 11:55 ` Daniel Vetter 0 siblings, 1 reply; 35+ messages in thread From: Inki Dae @ 2013-05-21 9:22 UTC (permalink / raw) To: linux-arm-kernel > -----Original Message----- > From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel > Vetter > Sent: Tuesday, May 21, 2013 4:45 PM > To: Inki Dae > Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list'; > 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm- > kernel at lists.infradead.org; linux-media at vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote: > > > - Integration of fence syncing into dma_buf. Imo we should have a > > > per-attachment mode which decides whether map/unmap (and the new > sync) > > > should wait for fences or whether the driver takes care of syncing > > > through the new fence framework itself (for direct hw sync). > > > > I think it's a good idea to have per-attachment mode for buffer sync. > But > > I'd like to say we make sure what is the purpose of map > > (dma_buf_map_attachment)first. This interface is used to get a sgt; > > containing pages to physical memory region, and map that region with > > device's iommu table. The important thing is that this interface is > called > > just one time when user wants to share an allocated buffer with dma. But > cpu > > will try to access the buffer every time as long as it wants. Therefore, > we > > need cache control every time cpu and dma access the shared buffer: > cache > > clean when cpu goes to dma and cache invalidate when dma goes to cpu. > That > > is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE, > to > > dma buf framework. Of course, Those are ugly so we need a better way: I > just > > wanted to show you that in such way, we can synchronize the shared > buffer > > between cpu and dma. By any chance, is there any way that kernel can be > > aware of when cpu accesses the shared buffer or is there any point I > didn't > > catch up? > > Well dma_buf_map/unmap_attachment should also flush/invalidate any caches, > and with the current dma_buf spec those two functions are the _only_ means I know that dma buf exporter should make sure that cache clean/invalidate are done when dma_buf_map/unmap_attachement is called. For this, already we do so. However, this function is called when dma buf import is requested by user to map a dmabuf fd with dma. This means that dma_buf_map_attachement() is called ONCE when user wants to share the dmabuf fd with dma. Actually, in case of exynos drm, dma_map_sg_attrs(), performing cache operation internally, is called when dmabuf import is requested by user. > you have to do so. Which strictly means that if you interleave device dma > and cpu acccess you need to unmap/map every time. > > Which is obviously horribly inefficient, but a known gap in the dma_buf Right, and also this has big overhead. > interface. Hence why I proposed to add dma_buf_sync_attachment similar to > dma_sync_single_for_device: > > /** > * dma_buf_sync_sg_attachment - sync caches for dma access > * @attach: dma-buf attachment to sync > * @sgt: the sg table to sync (returned by dma_buf_map_attachement) > * @direction: dma direction to sync for > * > * This function synchronizes caches for device dma through the given > * dma-buf attachment when interleaving dma from different devices and the > * cpu. Other device dma needs to be synced also by calls to this > * function (or a pair of dma_buf_map/unmap_attachment calls), cpu access > * needs to be synced with dma_buf_begin/end_cpu_access. > */ > void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach, > struct sg_table *sgt, > enum dma_data_direction direction) > > Note that "sync" here only means to synchronize caches, not wait for any > outstanding fences. This is simply to be consistent with the established > lingo of the dma api. How the dma-buf fences fit into this is imo a > different topic, but my idea is that all the cache coherency barriers > (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and > dma_buf_begin/end_cpu_access) would automatically block for any attached > fences (i.e. block for write fences when doing read-only access, block for > all fences otherwise). As I mentioned already, kernel can't aware of when cpu accesses a shared buffer: user can access a shared buffer after mmap anytime and the shared buffer should be synchronized between cpu and dma. Therefore, the above cache coherency barriers should be called every time cpu and dma tries to access a shared buffer, checking before and after of cpu and dma access. And exactly, the proposed way do such thing. For this, you can refer to the below link, http://www.mail-archive.com/linux-media at vger.kernel.org/msg62124.html My point is that how kernel can aware of when those cache coherency barriers should be called to synchronize caches and buffer access between cpu and dma. > > Then we could do a new dma_buf_attach_flags interface for special cases > (might also be useful for other things, similar to the recently added > flags in the dma api for wc/no-kernel-mapping/...). A new flag like > DMA_BUF_ATTACH_NO_AUTOFENCE would then mean that the driver takes care of > all > fencing for this attachment and the dma-buf functions should not do the > automagic fence blocking. > > > In Linux kernel, especially embedded system, we had tried to implement > > generic interfaces for buffer management; how to allocate a buffer and > how > > to share a buffer. As a result, now we have CMA (Contiguous Memory > > Allocator) for buffer allocation, and we have DMA-BUF for buffer sharing > > between cpu and dma. And then how to synchronize a buffer between cpu > and > > dma? I think now it's best time to discuss buffer synchronization > mechanism, > > and that is very natural. > > I think it's important to differentiate between the two meanings of sync: > - synchronize caches (i.e. cpu cache flushing in most cases) > - and synchronize in-flight dma with fences. > Agree, and I meant that the buffer synchronization mechanism includes the above two things. And I think the two things should be addressed together. Thanks, Inki Dae > Cheers, Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-21 9:22 ` Inki Dae @ 2013-05-23 11:55 ` Daniel Vetter 2013-05-23 13:37 ` Inki Dae 0 siblings, 1 reply; 35+ messages in thread From: Daniel Vetter @ 2013-05-23 11:55 UTC (permalink / raw) To: linux-arm-kernel On Tue, May 21, 2013 at 11:22 AM, Inki Dae <inki.dae@samsung.com> wrote: >> -----Original Message----- >> From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel >> Vetter >> Sent: Tuesday, May 21, 2013 4:45 PM >> To: Inki Dae >> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list'; >> 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm- >> kernel at lists.infradead.org; linux-media at vger.kernel.org >> Subject: Re: Introduce a new helper framework for buffer synchronization >> >> On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote: >> > > - Integration of fence syncing into dma_buf. Imo we should have a >> > > per-attachment mode which decides whether map/unmap (and the new >> sync) >> > > should wait for fences or whether the driver takes care of syncing >> > > through the new fence framework itself (for direct hw sync). >> > >> > I think it's a good idea to have per-attachment mode for buffer sync. >> But >> > I'd like to say we make sure what is the purpose of map >> > (dma_buf_map_attachment)first. This interface is used to get a sgt; >> > containing pages to physical memory region, and map that region with >> > device's iommu table. The important thing is that this interface is >> called >> > just one time when user wants to share an allocated buffer with dma. But >> cpu >> > will try to access the buffer every time as long as it wants. Therefore, >> we >> > need cache control every time cpu and dma access the shared buffer: >> cache >> > clean when cpu goes to dma and cache invalidate when dma goes to cpu. >> That >> > is why I added new interfaces, DMA_BUF_GET_FENCE and DMA_BUF_PUT_FENCE, >> to >> > dma buf framework. Of course, Those are ugly so we need a better way: I >> just >> > wanted to show you that in such way, we can synchronize the shared >> buffer >> > between cpu and dma. By any chance, is there any way that kernel can be >> > aware of when cpu accesses the shared buffer or is there any point I >> didn't >> > catch up? >> >> Well dma_buf_map/unmap_attachment should also flush/invalidate any caches, >> and with the current dma_buf spec those two functions are the _only_ means > > I know that dma buf exporter should make sure that cache clean/invalidate > are done when dma_buf_map/unmap_attachement is called. For this, already we > do so. However, this function is called when dma buf import is requested by > user to map a dmabuf fd with dma. This means that dma_buf_map_attachement() > is called ONCE when user wants to share the dmabuf fd with dma. Actually, in > case of exynos drm, dma_map_sg_attrs(), performing cache operation > internally, is called when dmabuf import is requested by user. > >> you have to do so. Which strictly means that if you interleave device dma >> and cpu acccess you need to unmap/map every time. >> >> Which is obviously horribly inefficient, but a known gap in the dma_buf > > Right, and also this has big overhead. > >> interface. Hence why I proposed to add dma_buf_sync_attachment similar to >> dma_sync_single_for_device: >> >> /** >> * dma_buf_sync_sg_attachment - sync caches for dma access >> * @attach: dma-buf attachment to sync >> * @sgt: the sg table to sync (returned by dma_buf_map_attachement) >> * @direction: dma direction to sync for >> * >> * This function synchronizes caches for device dma through the given >> * dma-buf attachment when interleaving dma from different devices and the >> * cpu. Other device dma needs to be synced also by calls to this >> * function (or a pair of dma_buf_map/unmap_attachment calls), cpu access >> * needs to be synced with dma_buf_begin/end_cpu_access. >> */ >> void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach, >> struct sg_table *sgt, >> enum dma_data_direction direction) >> >> Note that "sync" here only means to synchronize caches, not wait for any >> outstanding fences. This is simply to be consistent with the established >> lingo of the dma api. How the dma-buf fences fit into this is imo a >> different topic, but my idea is that all the cache coherency barriers >> (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and >> dma_buf_begin/end_cpu_access) would automatically block for any attached >> fences (i.e. block for write fences when doing read-only access, block for >> all fences otherwise). > > As I mentioned already, kernel can't aware of when cpu accesses a shared > buffer: user can access a shared buffer after mmap anytime and the shared > buffer should be synchronized between cpu and dma. Therefore, the above > cache coherency barriers should be called every time cpu and dma tries to > access a shared buffer, checking before and after of cpu and dma access. And > exactly, the proposed way do such thing. For this, you can refer to the > below link, > > http://www.mail-archive.com/linux-media at vger.kernel.org/msg62124.html > > My point is that how kernel can aware of when those cache coherency barriers > should be called to synchronize caches and buffer access between cpu and > dma. > >> >> Then we could do a new dma_buf_attach_flags interface for special cases >> (might also be useful for other things, similar to the recently added >> flags in the dma api for wc/no-kernel-mapping/...). A new flag like >> DMA_BUF_ATTACH_NO_AUTOFENCE would then mean that the driver takes care of >> all >> fencing for this attachment and the dma-buf functions should not do the >> automagic fence blocking. >> >> > In Linux kernel, especially embedded system, we had tried to implement >> > generic interfaces for buffer management; how to allocate a buffer and >> how >> > to share a buffer. As a result, now we have CMA (Contiguous Memory >> > Allocator) for buffer allocation, and we have DMA-BUF for buffer sharing >> > between cpu and dma. And then how to synchronize a buffer between cpu >> and >> > dma? I think now it's best time to discuss buffer synchronization >> mechanism, >> > and that is very natural. >> >> I think it's important to differentiate between the two meanings of sync: >> - synchronize caches (i.e. cpu cache flushing in most cases) >> - and synchronize in-flight dma with fences. >> > > Agree, and I meant that the buffer synchronization mechanism includes the > above two things. And I think the two things should be addressed together. I'm still confused about what you're trying to achive in the big picture here exactly. I think the key point is your statement above that the kernel can't know when the cpu access a dma-buf. That's not true though: For userspace mmaps the kernel can shoot down ptes and so prevent userspace from accessing a buffer. Since that's pretty inefficient gem/ttm have additional ioctls for cache coherency transitions. dma_buf currently has no support for explicit coherency transitions from userpsace, so if pte shootdown is an issue we need to improve that. If I read your proposal correctly you instead suggest to _alway_ flush cache before/after each dma. That will _completely_ kill performance (been there with i915, trust me). For cpu access from kernel space we already have explicit mappings all over the place (kmap&friends), so I don't see any issues with inserting the relevant begin/end_cpu_access calls just around the cpu access. The big exception is the framebuffer layer, but even that has the deferred io support. KMS otoh has a nice frontbuffer dirty interface to correctly support dumb clients, unfortunately fbcon does not (yet) use that. For these reasons I don't see a need to smash cache coherency and fencing into one problem, and hence why I've proposed to tackle each of them individually. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-23 11:55 ` Daniel Vetter @ 2013-05-23 13:37 ` Inki Dae 2013-05-27 10:38 ` Inki Dae 0 siblings, 1 reply; 35+ messages in thread From: Inki Dae @ 2013-05-23 13:37 UTC (permalink / raw) To: linux-arm-kernel > -----Original Message----- > From: daniel.vetter at ffwll.ch [mailto:daniel.vetter at ffwll.ch] On Behalf Of > Daniel Vetter > Sent: Thursday, May 23, 2013 8:56 PM > To: Inki Dae > Cc: Rob Clark; linux-fbdev; DRI mailing list; Kyungmin Park; myungjoo.ham; > YoungJun Cho; linux-arm-kernel at lists.infradead.org; linux- > media at vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Tue, May 21, 2013 at 11:22 AM, Inki Dae <inki.dae@samsung.com> wrote: > >> -----Original Message----- > >> From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel > >> Vetter > >> Sent: Tuesday, May 21, 2013 4:45 PM > >> To: Inki Dae > >> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'DRI mailing list'; > >> 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho'; linux-arm- > >> kernel at lists.infradead.org; linux-media at vger.kernel.org > >> Subject: Re: Introduce a new helper framework for buffer > synchronization > >> > >> On Tue, May 21, 2013 at 04:03:06PM +0900, Inki Dae wrote: > >> > > - Integration of fence syncing into dma_buf. Imo we should have a > >> > > per-attachment mode which decides whether map/unmap (and the new > >> sync) > >> > > should wait for fences or whether the driver takes care of > syncing > >> > > through the new fence framework itself (for direct hw sync). > >> > > >> > I think it's a good idea to have per-attachment mode for buffer sync. > >> But > >> > I'd like to say we make sure what is the purpose of map > >> > (dma_buf_map_attachment)first. This interface is used to get a sgt; > >> > containing pages to physical memory region, and map that region with > >> > device's iommu table. The important thing is that this interface is > >> called > >> > just one time when user wants to share an allocated buffer with dma. > But > >> cpu > >> > will try to access the buffer every time as long as it wants. > Therefore, > >> we > >> > need cache control every time cpu and dma access the shared buffer: > >> cache > >> > clean when cpu goes to dma and cache invalidate when dma goes to cpu. > >> That > >> > is why I added new interfaces, DMA_BUF_GET_FENCE and > DMA_BUF_PUT_FENCE, > >> to > >> > dma buf framework. Of course, Those are ugly so we need a better way: > I > >> just > >> > wanted to show you that in such way, we can synchronize the shared > >> buffer > >> > between cpu and dma. By any chance, is there any way that kernel can > be > >> > aware of when cpu accesses the shared buffer or is there any point I > >> didn't > >> > catch up? > >> > >> Well dma_buf_map/unmap_attachment should also flush/invalidate any > caches, > >> and with the current dma_buf spec those two functions are the _only_ > means > > > > I know that dma buf exporter should make sure that cache > clean/invalidate > > are done when dma_buf_map/unmap_attachement is called. For this, already > we > > do so. However, this function is called when dma buf import is requested > by > > user to map a dmabuf fd with dma. This means that > dma_buf_map_attachement() > > is called ONCE when user wants to share the dmabuf fd with dma. Actually, > in > > case of exynos drm, dma_map_sg_attrs(), performing cache operation > > internally, is called when dmabuf import is requested by user. > > > >> you have to do so. Which strictly means that if you interleave device > dma > >> and cpu acccess you need to unmap/map every time. > >> > >> Which is obviously horribly inefficient, but a known gap in the dma_buf > > > > Right, and also this has big overhead. > > > >> interface. Hence why I proposed to add dma_buf_sync_attachment similar > to > >> dma_sync_single_for_device: > >> > >> /** > >> * dma_buf_sync_sg_attachment - sync caches for dma access > >> * @attach: dma-buf attachment to sync > >> * @sgt: the sg table to sync (returned by dma_buf_map_attachement) > >> * @direction: dma direction to sync for > >> * > >> * This function synchronizes caches for device dma through the given > >> * dma-buf attachment when interleaving dma from different devices and > the > >> * cpu. Other device dma needs to be synced also by calls to this > >> * function (or a pair of dma_buf_map/unmap_attachment calls), cpu > access > >> * needs to be synced with dma_buf_begin/end_cpu_access. > >> */ > >> void dma_buf_sync_sg_attachment(struct dma_buf_attachment *attach, > >> struct sg_table *sgt, > >> enum dma_data_direction direction) > >> > >> Note that "sync" here only means to synchronize caches, not wait for > any > >> outstanding fences. This is simply to be consistent with the > established > >> lingo of the dma api. How the dma-buf fences fit into this is imo a > >> different topic, but my idea is that all the cache coherency barriers > >> (i.e. dma_buf_map/unmap_attachment, dma_buf_sync_sg_attachment and > >> dma_buf_begin/end_cpu_access) would automatically block for any > attached > >> fences (i.e. block for write fences when doing read-only access, block > for > >> all fences otherwise). > > > > As I mentioned already, kernel can't aware of when cpu accesses a shared > > buffer: user can access a shared buffer after mmap anytime and the > shared > > buffer should be synchronized between cpu and dma. Therefore, the above > > cache coherency barriers should be called every time cpu and dma tries > to > > access a shared buffer, checking before and after of cpu and dma access. > And > > exactly, the proposed way do such thing. For this, you can refer to the > > below link, > > > > http://www.mail-archive.com/linux-media at vger.kernel.org/msg62124.html > > > > My point is that how kernel can aware of when those cache coherency > barriers > > should be called to synchronize caches and buffer access between cpu and > > dma. > > > >> > >> Then we could do a new dma_buf_attach_flags interface for special cases > >> (might also be useful for other things, similar to the recently added > >> flags in the dma api for wc/no-kernel-mapping/...). A new flag like > >> DMA_BUF_ATTACH_NO_AUTOFENCE would then mean that the driver takes care > of > >> all > >> fencing for this attachment and the dma-buf functions should not do the > >> automagic fence blocking. > >> > >> > In Linux kernel, especially embedded system, we had tried to > implement > >> > generic interfaces for buffer management; how to allocate a buffer > and > >> how > >> > to share a buffer. As a result, now we have CMA (Contiguous Memory > >> > Allocator) for buffer allocation, and we have DMA-BUF for buffer > sharing > >> > between cpu and dma. And then how to synchronize a buffer between cpu > >> and > >> > dma? I think now it's best time to discuss buffer synchronization > >> mechanism, > >> > and that is very natural. > >> > >> I think it's important to differentiate between the two meanings of > sync: > >> - synchronize caches (i.e. cpu cache flushing in most cases) > >> - and synchronize in-flight dma with fences. > >> > > > > Agree, and I meant that the buffer synchronization mechanism includes > the > > above two things. And I think the two things should be addressed > together. > > I'm still confused about what you're trying to achive in the big > picture here exactly. I think the key point is your statement above > that the kernel can't know when the cpu access a dma-buf. That's not > true though: > > For userspace mmaps the kernel can shoot down ptes and so prevent > userspace from accessing a buffer. In this case, cpu access to user space will incur page fault. However, cpu accesses user space region after mmap in user mode and every time user wants. And user needs something before passing a buffer accessed into device driver and also vice versa. > Since that's pretty inefficient > gem/ttm have additional ioctls for cache coherency transitions. Yes, that's the something, and that's a way now we are using if I understood surely. User side calls those ioctls for cache coherency between cpu and dma: cache clean after cpu access and before dma access, and cache invalidate after dma access and before cpu access. I thought it's better to control cache in kernel side than user because I had found some Linux based platforms overuse cache operations. So I thought the best place for it is the proposed fence helper or similar thing: we would need interfaces to notify the moment, 'before and after of cpu access', into kernel side. My proposal is to couple those two things, synchronizing caches and in-flight dma with fences, in kernel side. With this approach, user doesn't need to care when dma and cpu access will be started, and when dma and cpu access will be completed anymore when user wants to access a buffer and share it with dma. All things, cache coherency and buffer fencing between dma and cpu, will be done in kernel side. > dma_buf currently has no support for explicit coherency transitions > from userpsace, so if pte shootdown is an issue we need to improve > that. > > If I read your proposal correctly you instead suggest to _alway_ flush > cache before/after each dma. That will _completely_ kill performance > (been there with i915, trust me). > In case of using a cachable mapped buffer and sharing the buffer with dma, we need to clean cache after cpu access and before dma access, and we need to invalidate cache after dma access and before cpu access every time. Isn't that we are doing so now? I have a little knowledge about Desktop world so there might be my misunderstanding. Could you please give me advices and more comments if there is my missing point or misunderstanding? > For cpu access from kernel space we already have explicit mappings all > over the place (kmap&friends), so I don't see any issues with > inserting the relevant begin/end_cpu_access calls just around the cpu > access. The big exception is the framebuffer layer, but even that has > the deferred io support. KMS otoh has a nice frontbuffer dirty > interface to correctly support dumb clients, unfortunately fbcon does > not (yet) use that. > > For these reasons I don't see a need to smash cache coherency and > fencing into one problem, and hence why I've proposed to tackle each > of them individually. I might be really missing some points :) Thanks, Inki Dae > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-23 13:37 ` Inki Dae @ 2013-05-27 10:38 ` Inki Dae 2013-05-27 15:23 ` Maarten Lankhorst 2013-05-27 15:47 ` Rob Clark 0 siblings, 2 replies; 35+ messages in thread From: Inki Dae @ 2013-05-27 10:38 UTC (permalink / raw) To: linux-arm-kernel Hi all, I have been removed previous branch and added new one with more cleanup. This time, the fence helper doesn't include user side interfaces and cache operation relevant codes anymore because not only we are not sure that coupling those two things, synchronizing caches and buffer access between CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is a good idea yet but also existing codes for user side have problems with badly behaved or crashing userspace. So this could be more discussed later. The below is a new branch, https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-f ence-helper And fence helper codes, https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 And example codes for device driver, https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae I think the time is not yet ripe for RFC posting: maybe existing dma fence and reservation need more review and addition work. So I'd glad for somebody giving other opinions and advices in advance before RFC posting. Thanks, Inki Dae ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-27 10:38 ` Inki Dae @ 2013-05-27 15:23 ` Maarten Lankhorst 2013-05-28 2:49 ` Inki Dae 2013-05-27 15:47 ` Rob Clark 1 sibling, 1 reply; 35+ messages in thread From: Maarten Lankhorst @ 2013-05-27 15:23 UTC (permalink / raw) To: linux-arm-kernel Hey, Op 27-05-13 12:38, Inki Dae schreef: > Hi all, > > I have been removed previous branch and added new one with more cleanup. > This time, the fence helper doesn't include user side interfaces and cache > operation relevant codes anymore because not only we are not sure that > coupling those two things, synchronizing caches and buffer access between > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is a > good idea yet but also existing codes for user side have problems with badly > behaved or crashing userspace. So this could be more discussed later. > > The below is a new branch, > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-f > ence-helper > > And fence helper codes, > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > > And example codes for device driver, > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > > I think the time is not yet ripe for RFC posting: maybe existing dma fence > and reservation need more review and addition work. So I'd glad for somebody > giving other opinions and advices in advance before RFC posting. > NAK. For examples for how to handle locking properly, see Documentation/ww-mutex-design.txt in my recent tree. I could list what I believe is wrong with your implementation, but real problem is that the approach you're taking is wrong. ~Maarten ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-27 15:23 ` Maarten Lankhorst @ 2013-05-28 2:49 ` Inki Dae 2013-05-28 7:23 ` Maarten Lankhorst 0 siblings, 1 reply; 35+ messages in thread From: Inki Dae @ 2013-05-28 2:49 UTC (permalink / raw) To: linux-arm-kernel > -----Original Message----- > From: Maarten Lankhorst [mailto:maarten.lankhorst at canonical.com] > Sent: Tuesday, May 28, 2013 12:23 AM > To: Inki Dae > Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'YoungJun Cho'; 'Kyungmin > Park'; 'myungjoo.ham'; 'DRI mailing list'; linux-arm- > kernel at lists.infradead.org; linux-media at vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > Hey, > > Op 27-05-13 12:38, Inki Dae schreef: > > Hi all, > > > > I have been removed previous branch and added new one with more cleanup. > > This time, the fence helper doesn't include user side interfaces and > cache > > operation relevant codes anymore because not only we are not sure that > > coupling those two things, synchronizing caches and buffer access > between > > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is > a > > good idea yet but also existing codes for user side have problems with > badly > > behaved or crashing userspace. So this could be more discussed later. > > > > The below is a new branch, > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/?h=dma-f > > ence-helper > > > > And fence helper codes, > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/commit/? > > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > > > > And example codes for device driver, > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/commit/? > > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > > > > I think the time is not yet ripe for RFC posting: maybe existing dma > fence > > and reservation need more review and addition work. So I'd glad for > somebody > > giving other opinions and advices in advance before RFC posting. > > > NAK. > > For examples for how to handle locking properly, see Documentation/ww- > mutex-design.txt in my recent tree. > I could list what I believe is wrong with your implementation, but real > problem is that the approach you're taking is wrong. I just removed ticket stubs to show my approach you guys as simple as possible, and I just wanted to show that we could use buffer synchronization mechanism without ticket stubs. Question, WW-Mutexes could be used for all devices? I guess this has dependence on x86 gpu: gpu has VRAM and it means different memory domain. And could you tell my why shared fence should have only eight objects? I think we could need more than eight objects for read access. Anyway I think I don't surely understand yet so there might be my missing point. Thanks, Inki Dae > > ~Maarten ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-28 2:49 ` Inki Dae @ 2013-05-28 7:23 ` Maarten Lankhorst 0 siblings, 0 replies; 35+ messages in thread From: Maarten Lankhorst @ 2013-05-28 7:23 UTC (permalink / raw) To: linux-arm-kernel Hey, Op 28-05-13 04:49, Inki Dae schreef: > >> -----Original Message----- >> From: Maarten Lankhorst [mailto:maarten.lankhorst at canonical.com] >> Sent: Tuesday, May 28, 2013 12:23 AM >> To: Inki Dae >> Cc: 'Daniel Vetter'; 'Rob Clark'; 'linux-fbdev'; 'YoungJun Cho'; 'Kyungmin >> Park'; 'myungjoo.ham'; 'DRI mailing list'; linux-arm- >> kernel at lists.infradead.org; linux-media at vger.kernel.org >> Subject: Re: Introduce a new helper framework for buffer synchronization >> >> Hey, >> >> Op 27-05-13 12:38, Inki Dae schreef: >>> Hi all, >>> >>> I have been removed previous branch and added new one with more cleanup. >>> This time, the fence helper doesn't include user side interfaces and >> cache >>> operation relevant codes anymore because not only we are not sure that >>> coupling those two things, synchronizing caches and buffer access >> between >>> CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is >> a >>> good idea yet but also existing codes for user side have problems with >> badly >>> behaved or crashing userspace. So this could be more discussed later. >>> >>> The below is a new branch, >>> >>> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- >> exynos.git/?h=dma-f >>> ence-helper >>> >>> And fence helper codes, >>> >>> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- >> exynos.git/commit/? >>> h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 >>> >>> And example codes for device driver, >>> >>> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- >> exynos.git/commit/? >>> h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae >>> >>> I think the time is not yet ripe for RFC posting: maybe existing dma >> fence >>> and reservation need more review and addition work. So I'd glad for >> somebody >>> giving other opinions and advices in advance before RFC posting. >>> >> NAK. >> >> For examples for how to handle locking properly, see Documentation/ww- >> mutex-design.txt in my recent tree. >> I could list what I believe is wrong with your implementation, but real >> problem is that the approach you're taking is wrong. > I just removed ticket stubs to show my approach you guys as simple as > possible, and I just wanted to show that we could use buffer synchronization > mechanism without ticket stubs. The tickets have been removed in favor of a ww_context. Moving it in as a base primitive allows more locking abuse to be detected, and makes some other things easier too. > Question, WW-Mutexes could be used for all devices? I guess this has > dependence on x86 gpu: gpu has VRAM and it means different memory domain. > And could you tell my why shared fence should have only eight objects? I > think we could need more than eight objects for read access. Anyway I think > I don't surely understand yet so there might be my missing point. Yes, ww mutexes are not limited in any way to x86. They're a locking mechanism. When you acquired the ww mutexes for all buffer objects, all it does is say at that point in time you have exclusively acquired the locks of all bo's. After locking everything you can read the fence pointers safely, queue waits, and set a new fence pointer on all reservation_objects. You only need a single fence on all those objects, so 8 is plenty. Nonetheless this was a limitation of my earlier design, and I'll dynamically allocate fence_shared in the future. ~Maarten ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-27 10:38 ` Inki Dae 2013-05-27 15:23 ` Maarten Lankhorst @ 2013-05-27 15:47 ` Rob Clark 2013-05-27 16:02 ` Daniel Vetter 2013-05-28 3:56 ` Inki Dae 1 sibling, 2 replies; 35+ messages in thread From: Rob Clark @ 2013-05-27 15:47 UTC (permalink / raw) To: linux-arm-kernel On Mon, May 27, 2013 at 6:38 AM, Inki Dae <inki.dae@samsung.com> wrote: > Hi all, > > I have been removed previous branch and added new one with more cleanup. > This time, the fence helper doesn't include user side interfaces and cache > operation relevant codes anymore because not only we are not sure that > coupling those two things, synchronizing caches and buffer access between > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is a > good idea yet but also existing codes for user side have problems with badly > behaved or crashing userspace. So this could be more discussed later. > > The below is a new branch, > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-f > ence-helper > > And fence helper codes, > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > > And example codes for device driver, > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > > I think the time is not yet ripe for RFC posting: maybe existing dma fence > and reservation need more review and addition work. So I'd glad for somebody > giving other opinions and advices in advance before RFC posting. thoughts from a *really* quick, pre-coffee, first look: * any sort of helper to simplify single-buffer sort of use-cases (v4l) probably wouldn't want to bake in assumption that seqno_fence is used. * I guess g2d is probably not actually a simple use case, since I expect you can submit blits involving multiple buffers :-P * otherwise, you probably don't want to depend on dmabuf, which is why reservation/fence is split out the way it is.. you want to be able to use a single reservation/fence mechanism within your driver without having to care about which buffers are exported to dmabuf's and which are not. Creating a dmabuf for every GEM bo is too heavyweight. I'm not entirely sure if reservation/fence could/should be made any simpler for multi-buffer users. Probably the best thing to do is just get reservation/fence rolled out in a few drivers and see if some common patterns emerge. BR, -R > > Thanks, > Inki Dae > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-27 15:47 ` Rob Clark @ 2013-05-27 16:02 ` Daniel Vetter 2013-05-28 4:04 ` Inki Dae 2013-05-28 3:56 ` Inki Dae 1 sibling, 1 reply; 35+ messages in thread From: Daniel Vetter @ 2013-05-27 16:02 UTC (permalink / raw) To: linux-arm-kernel On Mon, May 27, 2013 at 5:47 PM, Rob Clark <robdclark@gmail.com> wrote: > On Mon, May 27, 2013 at 6:38 AM, Inki Dae <inki.dae@samsung.com> wrote: >> Hi all, >> >> I have been removed previous branch and added new one with more cleanup. >> This time, the fence helper doesn't include user side interfaces and cache >> operation relevant codes anymore because not only we are not sure that >> coupling those two things, synchronizing caches and buffer access between >> CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is a >> good idea yet but also existing codes for user side have problems with badly >> behaved or crashing userspace. So this could be more discussed later. >> >> The below is a new branch, >> >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/?h=dma-f >> ence-helper >> >> And fence helper codes, >> >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? >> h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 >> >> And example codes for device driver, >> >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm-exynos.git/commit/? >> h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae >> >> I think the time is not yet ripe for RFC posting: maybe existing dma fence >> and reservation need more review and addition work. So I'd glad for somebody >> giving other opinions and advices in advance before RFC posting. > > thoughts from a *really* quick, pre-coffee, first look: > * any sort of helper to simplify single-buffer sort of use-cases (v4l) > probably wouldn't want to bake in assumption that seqno_fence is used. Yeah, which is why Maarten&I discussed ideas already for what needs to be improved in the current dma-buf interface code to make this Just Work. At least as long as a driver doesn't want to add new fences, which would be especially useful for all kinds of gpu access. > * I guess g2d is probably not actually a simple use case, since I > expect you can submit blits involving multiple buffers :-P Yeah, on a quick read the current fence helper code seems to be a bit limited in scope. > * otherwise, you probably don't want to depend on dmabuf, which is why > reservation/fence is split out the way it is.. you want to be able to > use a single reservation/fence mechanism within your driver without > having to care about which buffers are exported to dmabuf's and which > are not. Creating a dmabuf for every GEM bo is too heavyweight. That's pretty much the reason that reservations are free-standing from dma_bufs. The idea is to embed them into the gem/ttm/v4l buffer object. Maarten also has some helpers to keep track of multi-buffer ww_mutex locking and fence attaching in his reservation helpers, but I think we should wait with those until we have drivers using them. For now I think the priority should be to get the basic stuff in and ttm as the first user established. Then we can go nuts later on. > I'm not entirely sure if reservation/fence could/should be made any > simpler for multi-buffer users. Probably the best thing to do is just > get reservation/fence rolled out in a few drivers and see if some > common patterns emerge. I think we can make the 1 buffer per dma op (i.e. 1:1 dma_buf->reservation : fence mapping) work fairly simple in dma_buf with maybe a dma_buf_attachment_start_dma/end_dma helpers. But there's also still the open that currently there's no way to flush cpu caches for dma access without unmapping the attachement (or resorting to trick which might not work), so we have a few gaping holes in the interface already anyway. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-27 16:02 ` Daniel Vetter @ 2013-05-28 4:04 ` Inki Dae 0 siblings, 0 replies; 35+ messages in thread From: Inki Dae @ 2013-05-28 4:04 UTC (permalink / raw) To: linux-arm-kernel > -----Original Message----- > From: daniel.vetter at ffwll.ch [mailto:daniel.vetter at ffwll.ch] On Behalf Of > Daniel Vetter > Sent: Tuesday, May 28, 2013 1:02 AM > To: Rob Clark > Cc: Inki Dae; Maarten Lankhorst; linux-fbdev; YoungJun Cho; Kyungmin Park; > myungjoo.ham; DRI mailing list; linux-arm-kernel at lists.infradead.org; > linux-media at vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Mon, May 27, 2013 at 5:47 PM, Rob Clark <robdclark@gmail.com> wrote: > > On Mon, May 27, 2013 at 6:38 AM, Inki Dae <inki.dae@samsung.com> wrote: > >> Hi all, > >> > >> I have been removed previous branch and added new one with more cleanup. > >> This time, the fence helper doesn't include user side interfaces and > cache > >> operation relevant codes anymore because not only we are not sure that > >> coupling those two things, synchronizing caches and buffer access > between > >> CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side > is a > >> good idea yet but also existing codes for user side have problems with > badly > >> behaved or crashing userspace. So this could be more discussed later. > >> > >> The below is a new branch, > >> > >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/?h=dma-f > >> ence-helper > >> > >> And fence helper codes, > >> > >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/commit/? > >> h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > >> > >> And example codes for device driver, > >> > >> https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/commit/? > >> h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > >> > >> I think the time is not yet ripe for RFC posting: maybe existing dma > fence > >> and reservation need more review and addition work. So I'd glad for > somebody > >> giving other opinions and advices in advance before RFC posting. > > > > thoughts from a *really* quick, pre-coffee, first look: > > * any sort of helper to simplify single-buffer sort of use-cases (v4l) > > probably wouldn't want to bake in assumption that seqno_fence is used. > > Yeah, which is why Maarten&I discussed ideas already for what needs to > be improved in the current dma-buf interface code to make this Just > Work. At least as long as a driver doesn't want to add new fences, > which would be especially useful for all kinds of gpu access. > > > * I guess g2d is probably not actually a simple use case, since I > > expect you can submit blits involving multiple buffers :-P > > Yeah, on a quick read the current fence helper code seems to be a bit > limited in scope. > > > * otherwise, you probably don't want to depend on dmabuf, which is why > > reservation/fence is split out the way it is.. you want to be able to > > use a single reservation/fence mechanism within your driver without > > having to care about which buffers are exported to dmabuf's and which > > are not. Creating a dmabuf for every GEM bo is too heavyweight. > > That's pretty much the reason that reservations are free-standing from > dma_bufs. The idea is to embed them into the gem/ttm/v4l buffer > object. Maarten also has some helpers to keep track of multi-buffer > ww_mutex locking and fence attaching in his reservation helpers, but I > think we should wait with those until we have drivers using them. > > For now I think the priority should be to get the basic stuff in and > ttm as the first user established. Then we can go nuts later on. > > > I'm not entirely sure if reservation/fence could/should be made any > > simpler for multi-buffer users. Probably the best thing to do is just > > get reservation/fence rolled out in a few drivers and see if some > > common patterns emerge. > > I think we can make the 1 buffer per dma op (i.e. 1:1 > dma_buf->reservation : fence mapping) work fairly simple in dma_buf > with maybe a dma_buf_attachment_start_dma/end_dma helpers. But there's > also still the open that currently there's no way to flush cpu caches > for dma access without unmapping the attachement (or resorting to That was what I tried adding user interfaces to dmabuf: coupling synchronizing caches and buffer access between CPU and CPU, CPU and DMA, and DMA and DMA with fences in kernel side. We need something to do between mapping and unmapping attachment. > trick which might not work), so we have a few gaping holes in the > interface already anyway. > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-27 15:47 ` Rob Clark 2013-05-27 16:02 ` Daniel Vetter @ 2013-05-28 3:56 ` Inki Dae 2013-05-28 10:32 ` Daniel Vetter 2013-05-28 13:48 ` Rob Clark 1 sibling, 2 replies; 35+ messages in thread From: Inki Dae @ 2013-05-28 3:56 UTC (permalink / raw) To: linux-arm-kernel > -----Original Message----- > From: linux-fbdev-owner at vger.kernel.org [mailto:linux-fbdev- > owner at vger.kernel.org] On Behalf Of Rob Clark > Sent: Tuesday, May 28, 2013 12:48 AM > To: Inki Dae > Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin > Park; myungjoo.ham; DRI mailing list; linux-arm-kernel at lists.infradead.org; > linux-media at vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Mon, May 27, 2013 at 6:38 AM, Inki Dae <inki.dae@samsung.com> wrote: > > Hi all, > > > > I have been removed previous branch and added new one with more cleanup. > > This time, the fence helper doesn't include user side interfaces and > cache > > operation relevant codes anymore because not only we are not sure that > > coupling those two things, synchronizing caches and buffer access > between > > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is > a > > good idea yet but also existing codes for user side have problems with > badly > > behaved or crashing userspace. So this could be more discussed later. > > > > The below is a new branch, > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/?h=dma-f > > ence-helper > > > > And fence helper codes, > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/commit/? > > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > > > > And example codes for device driver, > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > exynos.git/commit/? > > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > > > > I think the time is not yet ripe for RFC posting: maybe existing dma > fence > > and reservation need more review and addition work. So I'd glad for > somebody > > giving other opinions and advices in advance before RFC posting. > > thoughts from a *really* quick, pre-coffee, first look: > * any sort of helper to simplify single-buffer sort of use-cases (v4l) > probably wouldn't want to bake in assumption that seqno_fence is used. > * I guess g2d is probably not actually a simple use case, since I > expect you can submit blits involving multiple buffers :-P I don't think so. One and more buffers can be used: seqno_fence also has only one buffer. Actually, we have already applied this approach to most devices; multimedia, gpu and display controller. And this approach shows more performance; reduced power consumption against traditional way. And g2d example is just to show you how to apply my approach to device driver. > * otherwise, you probably don't want to depend on dmabuf, which is why > reservation/fence is split out the way it is.. you want to be able to > use a single reservation/fence mechanism within your driver without > having to care about which buffers are exported to dmabuf's and which > are not. Creating a dmabuf for every GEM bo is too heavyweight. Right. But I think we should dealt with this separately. Actually, we are trying to use reservation for gpu pipe line synchronization such as sgx sync object and this approach is used without dmabuf. In order words, some device can use only reservation for such pipe line synchronization and at the same time, fence helper or similar thing with dmabuf for buffer synchronization. > > I'm not entirely sure if reservation/fence could/should be made any > simpler for multi-buffer users. Probably the best thing to do is just > get reservation/fence rolled out in a few drivers and see if some > common patterns emerge. > > BR, > -R > > > > > Thanks, > > Inki Dae > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-28 3:56 ` Inki Dae @ 2013-05-28 10:32 ` Daniel Vetter 2013-05-28 14:43 ` Inki Dae 2013-05-28 13:48 ` Rob Clark 1 sibling, 1 reply; 35+ messages in thread From: Daniel Vetter @ 2013-05-28 10:32 UTC (permalink / raw) To: linux-arm-kernel On Tue, May 28, 2013 at 12:56:57PM +0900, Inki Dae wrote: > > > > -----Original Message----- > > From: linux-fbdev-owner at vger.kernel.org [mailto:linux-fbdev- > > owner at vger.kernel.org] On Behalf Of Rob Clark > > Sent: Tuesday, May 28, 2013 12:48 AM > > To: Inki Dae > > Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin > > Park; myungjoo.ham; DRI mailing list; > linux-arm-kernel at lists.infradead.org; > > linux-media at vger.kernel.org > > Subject: Re: Introduce a new helper framework for buffer synchronization > > > > On Mon, May 27, 2013 at 6:38 AM, Inki Dae <inki.dae@samsung.com> wrote: > > > Hi all, > > > > > > I have been removed previous branch and added new one with more cleanup. > > > This time, the fence helper doesn't include user side interfaces and > > cache > > > operation relevant codes anymore because not only we are not sure that > > > coupling those two things, synchronizing caches and buffer access > > between > > > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is > > a > > > good idea yet but also existing codes for user side have problems with > > badly > > > behaved or crashing userspace. So this could be more discussed later. > > > > > > The below is a new branch, > > > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > > exynos.git/?h=dma-f > > > ence-helper > > > > > > And fence helper codes, > > > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > > exynos.git/commit/? > > > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > > > > > > And example codes for device driver, > > > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > > exynos.git/commit/? > > > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > > > > > > I think the time is not yet ripe for RFC posting: maybe existing dma > > fence > > > and reservation need more review and addition work. So I'd glad for > > somebody > > > giving other opinions and advices in advance before RFC posting. > > > > thoughts from a *really* quick, pre-coffee, first look: > > * any sort of helper to simplify single-buffer sort of use-cases (v4l) > > probably wouldn't want to bake in assumption that seqno_fence is used. > > * I guess g2d is probably not actually a simple use case, since I > > expect you can submit blits involving multiple buffers :-P > > I don't think so. One and more buffers can be used: seqno_fence also has > only one buffer. Actually, we have already applied this approach to most > devices; multimedia, gpu and display controller. And this approach shows > more performance; reduced power consumption against traditional way. And g2d > example is just to show you how to apply my approach to device driver. Note that seqno_fence is an implementation pattern for a certain type of direct hw->hw synchronization which uses a shared dma_buf to exchange the sync cookie. The dma_buf attached to the seqno_fence has _nothing_ to do with the dma_buf the fence actually coordinates access to. I think that confusing is a large reason for why Maarten&I don't understand what you want to achieve with your fence helpers. Currently they're using the seqno_fence, but totally not in a way the seqno_fence was meant to be used. Note that with the current code there is only a pointer from dma_bufs to the fence. The fence itself has _no_ pointer to the dma_buf it syncs. This shouldn't be a problem since the fence fastpath for already signalled fences is completely barrier&lock free (it's just a load+bit-test), and fences are meant to be embedded into whatever dma tracking structure you already have, so no overhead there. The only ugly part is the fence refcounting, but I don't think we can drop that. Note that you completely reinvent this part of Maarten's fence patches by adding new r/w_complete completions to the reservation object, which completely replaces the fence stuff. Also note that a list of reservation entries is again meant to be used only when submitting the dma to the gpu. With your patches you seem to hang onto that list until dma completes. This has the ugly side-effect that you need to allocate these reservation entries with kmalloc, whereas if you just use them in the execbuf ioctl to construct the dma you can usually embed it into something else you need already anyway. At least i915 and ttm based drivers can work that way. Furthermore fences are specifically constructed as frankenstein-monsters between completion/waitqueues and callbacks. All the different use-cases need the different aspects: - busy/idle checks and bo retiring need the completion semantics - callbacks (in interrupt context) are used for hybrid hw->irq handler->hw sync approaches > > > * otherwise, you probably don't want to depend on dmabuf, which is why > > reservation/fence is split out the way it is.. you want to be able to > > use a single reservation/fence mechanism within your driver without > > having to care about which buffers are exported to dmabuf's and which > > are not. Creating a dmabuf for every GEM bo is too heavyweight. > > Right. But I think we should dealt with this separately. Actually, we are > trying to use reservation for gpu pipe line synchronization such as sgx sync > object and this approach is used without dmabuf. In order words, some device > can use only reservation for such pipe line synchronization and at the same > time, fence helper or similar thing with dmabuf for buffer synchronization. I think a quick overview of the different pieces would be in order. - ww_mutex: This is just the locking primite which allows you to grab multiple mutexes without the need to check for ordering (and so fear deadlocks). - fence: This is just a fancy kind of completion with a bit of support for hw->hw completion events. The fences themselve have no tie-in with buffers, ww_mutexes or anything else. - reservation: This ties together an object (doesn't need to be a buffer, could be any other gpu resource - see the drm/vmwgfx driver if you want your mind blown) with fences. Note that there's no connection the other way round, i.e. with the current patches you can't see which reservations are using which fences. Also note that other objects than reservations could point at fences, e.g. when the provide shared/exclusive semantics are not good enough for your needs. The ww_mutex in the reservation is simply the (fancy) lock which protects each reservation. The reason we need something fancy is that you need to lock all objects which are synced by the same fence toghether, otherwise you can race and construct deadlocks in the hw->hw sync part of fences. - dma_buf integration: This is simply a pointer to the reservation object of the underlying buffer object. We need a pointer here since otherwise you can construct deadlocks between dma_buf objects and native buffer objects. The crux is also that dma_buf integration comes last - before you can do that you need to have your driver converted over to use ww_mutexes/fences/reservations. I hope that helps to unconfuse things a bit and help you understand the different pieces of the fence/reservation/ww_mutex patches floating around. Cheers, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-28 10:32 ` Daniel Vetter @ 2013-05-28 14:43 ` Inki Dae 0 siblings, 0 replies; 35+ messages in thread From: Inki Dae @ 2013-05-28 14:43 UTC (permalink / raw) To: linux-arm-kernel Hi Daniel, Thank you so much. And so very useful.:) Sorry but could be give me more comments to the below my comments? There are still things making me confusing.:( > -----Original Message----- > From: Daniel Vetter [mailto:daniel.vetter at ffwll.ch] On Behalf Of Daniel > Vetter > Sent: Tuesday, May 28, 2013 7:33 PM > To: Inki Dae > Cc: 'Rob Clark'; 'Maarten Lankhorst'; 'Daniel Vetter'; 'linux-fbdev'; > 'YoungJun Cho'; 'Kyungmin Park'; 'myungjoo.ham'; 'DRI mailing list'; > linux-arm-kernel at lists.infradead.org; linux-media at vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Tue, May 28, 2013 at 12:56:57PM +0900, Inki Dae wrote: > > > > > > > -----Original Message----- > > > From: linux-fbdev-owner at vger.kernel.org [mailto:linux-fbdev- > > > owner at vger.kernel.org] On Behalf Of Rob Clark > > > Sent: Tuesday, May 28, 2013 12:48 AM > > > To: Inki Dae > > > Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; > Kyungmin > > > Park; myungjoo.ham; DRI mailing list; > > linux-arm-kernel at lists.infradead.org; > > > linux-media at vger.kernel.org > > > Subject: Re: Introduce a new helper framework for buffer > synchronization > > > > > > On Mon, May 27, 2013 at 6:38 AM, Inki Dae <inki.dae@samsung.com> wrote: > > > > Hi all, > > > > > > > > I have been removed previous branch and added new one with more > cleanup. > > > > This time, the fence helper doesn't include user side interfaces and > > > cache > > > > operation relevant codes anymore because not only we are not sure > that > > > > coupling those two things, synchronizing caches and buffer access > > > between > > > > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel > side is > > > a > > > > good idea yet but also existing codes for user side have problems > with > > > badly > > > > behaved or crashing userspace. So this could be more discussed later. > > > > > > > > The below is a new branch, > > > > > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > > > exynos.git/?h=dma-f > > > > ence-helper > > > > > > > > And fence helper codes, > > > > > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > > > exynos.git/commit/? > > > > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > > > > > > > > And example codes for device driver, > > > > > > > > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > > > exynos.git/commit/? > > > > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > > > > > > > > I think the time is not yet ripe for RFC posting: maybe existing dma > > > fence > > > > and reservation need more review and addition work. So I'd glad for > > > somebody > > > > giving other opinions and advices in advance before RFC posting. > > > > > > thoughts from a *really* quick, pre-coffee, first look: > > > * any sort of helper to simplify single-buffer sort of use-cases (v4l) > > > probably wouldn't want to bake in assumption that seqno_fence is used. > > > * I guess g2d is probably not actually a simple use case, since I > > > expect you can submit blits involving multiple buffers :-P > > > > I don't think so. One and more buffers can be used: seqno_fence also has > > only one buffer. Actually, we have already applied this approach to most > > devices; multimedia, gpu and display controller. And this approach shows > > more performance; reduced power consumption against traditional way. And > g2d > > example is just to show you how to apply my approach to device driver. > > Note that seqno_fence is an implementation pattern for a certain type of > direct hw->hw synchronization which uses a shared dma_buf to exchange the > sync cookie. I'm afraid that I don't understand hw->hw synchronization. hw->hw synchronization means that device has a hardware feature which supports buffer synchronization hardware internally? And what is the sync cookie? > The dma_buf attached to the seqno_fence has _nothing_ to do > with the dma_buf the fence actually coordinates access to. > > I think that confusing is a large reason for why Maarten&I don't > understand what you want to achieve with your fence helpers. Currently > they're using the seqno_fence, but totally not in a way the seqno_fence > was meant to be used. > > Note that with the current code there is only a pointer from dma_bufs to > the fence. The fence itself has _no_ pointer to the dma_buf it syncs. This > shouldn't be a problem since the fence fastpath for already signalled > fences is completely barrier&lock free (it's just a load+bit-test), and > fences are meant to be embedded into whatever dma tracking structure you > already have, so no overhead there. The only ugly part is the fence > refcounting, but I don't think we can drop that. The below is the proposed way, dma device has to create a fence before accessing a shared buffer, and then check if there are other dma which are accessing the shared buffer; if exist then the dma device should be blocked, and then it sets the fence to reservation object of the shared buffer. And then the dma begins access to the shared buffer. And finally, the dma signals its own fence so that other blocked can be waked up. However, if there was another dma blocked before signaling then another dma can access invalid fence object after waked up because the fence object can be released after signaling. So I made another dma takes a one reference before blocked. And I think origin version also takes a reference at ticket_commit(). Is there wrong point in proposed way? > > Note that you completely reinvent this part of Maarten's fence patches by > adding new r/w_complete completions to the reservation object, which > completely replaces the fence stuff. > > Also note that a list of reservation entries is again meant to be used > only when submitting the dma to the gpu. With your patches you seem to > hang onto that list until dma completes. Yeah, that is for prevent a dma from accessing a shared buffer while the other is accessing the shared buffer. Maybe this way means software synchronization. And your saying seems that there is another mechanism for dealing with such thing. Could you give me more comments for it? is there really another mechanism? > This has the ugly side-effect > that you need to allocate these reservation entries with kmalloc, whereas > if you just use them in the execbuf ioctl to construct the dma you can > usually embed it into something else you need already anyway. At least > i915 and ttm based drivers can work that way. > Maybe there is my misunderstanding but I thought now dma fence and relevant stubs tend to depend on Desktop world, especially x86 gpu. So I thought we need a little bit customizing the dma fence for embedded system. > Furthermore fences are specifically constructed as frankenstein-monsters > between completion/waitqueues and callbacks. All the different use-cases > need the different aspects: > - busy/idle checks and bo retiring need the completion semantics > - callbacks (in interrupt context) are used for hybrid hw->irq handler->hw > sync approaches > > > > > > * otherwise, you probably don't want to depend on dmabuf, which is why > > > reservation/fence is split out the way it is.. you want to be able to > > > use a single reservation/fence mechanism within your driver without > > > having to care about which buffers are exported to dmabuf's and which > > > are not. Creating a dmabuf for every GEM bo is too heavyweight. > > > > Right. But I think we should dealt with this separately. Actually, we > are > > trying to use reservation for gpu pipe line synchronization such as sgx > sync > > object and this approach is used without dmabuf. In order words, some > device > > can use only reservation for such pipe line synchronization and at the > same > > time, fence helper or similar thing with dmabuf for buffer > synchronization. > > I think a quick overview of the different pieces would be in order. > > - ww_mutex: This is just the locking primite which allows you to grab > multiple mutexes without the need to check for ordering (and so fear > deadlocks). Should ww_mutex be necessarily used for embedded system? I'm not sure that there are any cases we have to use this feature like x86 gpu. That is also why I removed ticket stubs from existing reservation framework. > > - fence: This is just a fancy kind of completion with a bit of support for > hw->hw completion events. hw->hw completion event means completion of dma operation? Otherwise, hardware buffer synchronization feature? If the latter, how fence can be used for embedded system which has no hardware feature for buffer synchronization? Actually, that made me confusing at initial work. > The fences themselve have no tie-in with > buffers, ww_mutexes or anything else. > > - reservation: This ties together an object (doesn't need to be a buffer, > could be any other gpu resource - see the drm/vmwgfx driver if you want > your mind blown) with fences. Note that there's no connection the other > way round, i.e. with the current patches you can't see which > reservations are using which fences. Also note that other objects than > reservations could point at fences, e.g. when the provide > shared/exclusive semantics are not good enough for your needs. > > The ww_mutex in the reservation is simply the (fancy) lock which > protects each reservation. The reason we need something fancy is that > you need to lock all objects which are synced by the same fence > toghether, otherwise you can race and construct deadlocks in the hw->hw > sync part of fences. > > - dma_buf integration: This is simply a pointer to the reservation object > of the underlying buffer object. We need a pointer here since otherwise > you can construct deadlocks between dma_buf objects and native buffer > objects. > > The crux is also that dma_buf integration comes last - before you can do > that you need to have your driver converted over to use > ww_mutexes/fences/reservations. > > I hope that helps to unconfuse things a bit and help you understand the > different pieces of the fence/reservation/ww_mutex patches floating > around. Thank you again.:) Thanks, Inki Dae > > Cheers, Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-28 3:56 ` Inki Dae 2013-05-28 10:32 ` Daniel Vetter @ 2013-05-28 13:48 ` Rob Clark 2013-05-28 14:50 ` Inki Dae 1 sibling, 1 reply; 35+ messages in thread From: Rob Clark @ 2013-05-28 13:48 UTC (permalink / raw) To: linux-arm-kernel On Mon, May 27, 2013 at 11:56 PM, Inki Dae <inki.dae@samsung.com> wrote: > > >> -----Original Message----- >> From: linux-fbdev-owner at vger.kernel.org [mailto:linux-fbdev- >> owner at vger.kernel.org] On Behalf Of Rob Clark >> Sent: Tuesday, May 28, 2013 12:48 AM >> To: Inki Dae >> Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin >> Park; myungjoo.ham; DRI mailing list; > linux-arm-kernel at lists.infradead.org; >> linux-media at vger.kernel.org >> Subject: Re: Introduce a new helper framework for buffer synchronization >> >> On Mon, May 27, 2013 at 6:38 AM, Inki Dae <inki.dae@samsung.com> wrote: >> > Hi all, >> > >> > I have been removed previous branch and added new one with more cleanup. >> > This time, the fence helper doesn't include user side interfaces and >> cache >> > operation relevant codes anymore because not only we are not sure that >> > coupling those two things, synchronizing caches and buffer access >> between >> > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side is >> a >> > good idea yet but also existing codes for user side have problems with >> badly >> > behaved or crashing userspace. So this could be more discussed later. >> > >> > The below is a new branch, >> > >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- >> exynos.git/?h=dma-f >> > ence-helper >> > >> > And fence helper codes, >> > >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- >> exynos.git/commit/? >> > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 >> > >> > And example codes for device driver, >> > >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- >> exynos.git/commit/? >> > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae >> > >> > I think the time is not yet ripe for RFC posting: maybe existing dma >> fence >> > and reservation need more review and addition work. So I'd glad for >> somebody >> > giving other opinions and advices in advance before RFC posting. >> >> thoughts from a *really* quick, pre-coffee, first look: >> * any sort of helper to simplify single-buffer sort of use-cases (v4l) >> probably wouldn't want to bake in assumption that seqno_fence is used. >> * I guess g2d is probably not actually a simple use case, since I >> expect you can submit blits involving multiple buffers :-P > > I don't think so. One and more buffers can be used: seqno_fence also has > only one buffer. Actually, we have already applied this approach to most > devices; multimedia, gpu and display controller. And this approach shows > more performance; reduced power consumption against traditional way. And g2d > example is just to show you how to apply my approach to device driver. no, you need the ww-mutex / reservation stuff any time you have multiple independent devices (or rings/contexts for hw that can support multiple contexts) which can do operations with multiple buffers. So you could conceivably hit this w/ gpu + g2d if multiple buffers where shared between the two. vram migration and such 'desktop stuff' might make the problem worse, but just because you don't have vram doesn't mean you don't have a problem with multiple buffers. >> * otherwise, you probably don't want to depend on dmabuf, which is why >> reservation/fence is split out the way it is.. you want to be able to >> use a single reservation/fence mechanism within your driver without >> having to care about which buffers are exported to dmabuf's and which >> are not. Creating a dmabuf for every GEM bo is too heavyweight. > > Right. But I think we should dealt with this separately. Actually, we are > trying to use reservation for gpu pipe line synchronization such as sgx sync > object and this approach is used without dmabuf. In order words, some device > can use only reservation for such pipe line synchronization and at the same > time, fence helper or similar thing with dmabuf for buffer synchronization. it is probably easier to approach from the reverse direction.. ie, get reservation/synchronization right first, and then dmabuf. (Well, that isn't really a problem because Maarten's reservation/fence patches support dmabuf from the beginning.) BR, -R >> >> I'm not entirely sure if reservation/fence could/should be made any >> simpler for multi-buffer users. Probably the best thing to do is just >> get reservation/fence rolled out in a few drivers and see if some >> common patterns emerge. >> >> BR, >> -R >> >> > >> > Thanks, >> > Inki Dae >> > >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in >> the body of a message to majordomo at vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-28 13:48 ` Rob Clark @ 2013-05-28 14:50 ` Inki Dae 2013-05-28 16:49 ` Daniel Vetter 0 siblings, 1 reply; 35+ messages in thread From: Inki Dae @ 2013-05-28 14:50 UTC (permalink / raw) To: linux-arm-kernel > -----Original Message----- > From: linux-fbdev-owner at vger.kernel.org [mailto:linux-fbdev- > owner at vger.kernel.org] On Behalf Of Rob Clark > Sent: Tuesday, May 28, 2013 10:49 PM > To: Inki Dae > Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; Kyungmin > Park; myungjoo.ham; DRI mailing list; linux-arm-kernel at lists.infradead.org; > linux-media at vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Mon, May 27, 2013 at 11:56 PM, Inki Dae <inki.dae@samsung.com> wrote: > > > > > >> -----Original Message----- > >> From: linux-fbdev-owner at vger.kernel.org [mailto:linux-fbdev- > >> owner at vger.kernel.org] On Behalf Of Rob Clark > >> Sent: Tuesday, May 28, 2013 12:48 AM > >> To: Inki Dae > >> Cc: Maarten Lankhorst; Daniel Vetter; linux-fbdev; YoungJun Cho; > Kyungmin > >> Park; myungjoo.ham; DRI mailing list; > > linux-arm-kernel at lists.infradead.org; > >> linux-media at vger.kernel.org > >> Subject: Re: Introduce a new helper framework for buffer > synchronization > >> > >> On Mon, May 27, 2013 at 6:38 AM, Inki Dae <inki.dae@samsung.com> wrote: > >> > Hi all, > >> > > >> > I have been removed previous branch and added new one with more > cleanup. > >> > This time, the fence helper doesn't include user side interfaces and > >> cache > >> > operation relevant codes anymore because not only we are not sure > that > >> > coupling those two things, synchronizing caches and buffer access > >> between > >> > CPU and CPU, CPU and DMA, and DMA and DMA with fences, in kernel side > is > >> a > >> > good idea yet but also existing codes for user side have problems > with > >> badly > >> > behaved or crashing userspace. So this could be more discussed later. > >> > > >> > The below is a new branch, > >> > > >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > >> exynos.git/?h=dma-f > >> > ence-helper > >> > > >> > And fence helper codes, > >> > > >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > >> exynos.git/commit/? > >> > h=dma-fence-helper&id=adcbc0fe7e285ce866e5816e5e21443dcce01005 > >> > > >> > And example codes for device driver, > >> > > >> > https://git.kernel.org/cgit/linux/kernel/git/daeinki/drm- > >> exynos.git/commit/? > >> > h=dma-fence-helper&id=d2ce7af23835789602a99d0ccef1f53cdd5caaae > >> > > >> > I think the time is not yet ripe for RFC posting: maybe existing dma > >> fence > >> > and reservation need more review and addition work. So I'd glad for > >> somebody > >> > giving other opinions and advices in advance before RFC posting. > >> > >> thoughts from a *really* quick, pre-coffee, first look: > >> * any sort of helper to simplify single-buffer sort of use-cases (v4l) > >> probably wouldn't want to bake in assumption that seqno_fence is used. > >> * I guess g2d is probably not actually a simple use case, since I > >> expect you can submit blits involving multiple buffers :-P > > > > I don't think so. One and more buffers can be used: seqno_fence also has > > only one buffer. Actually, we have already applied this approach to most > > devices; multimedia, gpu and display controller. And this approach shows > > more performance; reduced power consumption against traditional way. And > g2d > > example is just to show you how to apply my approach to device driver. > > no, you need the ww-mutex / reservation stuff any time you have > multiple independent devices (or rings/contexts for hw that can > support multiple contexts) which can do operations with multiple > buffers. I think I already used reservation stuff any time in that way except ww-mutex. And I'm not sure that embedded system really needs ww-mutex. If there is any case, could you tell me the case? I really need more advice and understanding :) Thanks, Inki Dae So you could conceivably hit this w/ gpu + g2d if multiple > buffers where shared between the two. vram migration and such > 'desktop stuff' might make the problem worse, but just because you > don't have vram doesn't mean you don't have a problem with multiple > buffers. > > >> * otherwise, you probably don't want to depend on dmabuf, which is why > >> reservation/fence is split out the way it is.. you want to be able to > >> use a single reservation/fence mechanism within your driver without > >> having to care about which buffers are exported to dmabuf's and which > >> are not. Creating a dmabuf for every GEM bo is too heavyweight. > > > > Right. But I think we should dealt with this separately. Actually, we > are > > trying to use reservation for gpu pipe line synchronization such as sgx > sync > > object and this approach is used without dmabuf. In order words, some > device > > can use only reservation for such pipe line synchronization and at the > same > > time, fence helper or similar thing with dmabuf for buffer > synchronization. > > it is probably easier to approach from the reverse direction.. ie, get > reservation/synchronization right first, and then dmabuf. (Well, that > isn't really a problem because Maarten's reservation/fence patches > support dmabuf from the beginning.) > > BR, > -R > > >> > >> I'm not entirely sure if reservation/fence could/should be made any > >> simpler for multi-buffer users. Probably the best thing to do is just > >> get reservation/fence rolled out in a few drivers and see if some > >> common patterns emerge. > >> > >> BR, > >> -R > >> > >> > > >> > Thanks, > >> > Inki Dae > >> > > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-fbdev" > in > >> the body of a message to majordomo at vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-28 14:50 ` Inki Dae @ 2013-05-28 16:49 ` Daniel Vetter 2013-05-29 2:21 ` Inki Dae 0 siblings, 1 reply; 35+ messages in thread From: Daniel Vetter @ 2013-05-28 16:49 UTC (permalink / raw) To: linux-arm-kernel On Tue, May 28, 2013 at 4:50 PM, Inki Dae <inki.dae@samsung.com> wrote: > I think I already used reservation stuff any time in that way except > ww-mutex. And I'm not sure that embedded system really needs ww-mutex. If > there is any case, > could you tell me the case? I really need more advice and understanding :) If you have only one driver, you can get away without ww_mutex. drm/i915 does it, all buffer state is protected by dev->struct_mutex. But as soon as you have multiple drivers sharing buffers with dma_buf things will blow up. Yep, current prime is broken and can lead to deadlocks. In practice it doesn't (yet) matter since only the X server does the sharing dance, and that one's single-threaded. Now you can claim that since you have all buffers pinned in embedded gfx anyway, you don't care. But both in desktop gfx and embedded gfx the real fun starts once you put fences into the mix and link them up with buffers, then every command submission risks that deadlock. Furthermore you can get unlucky and construct a circle of fences waiting on each another (only though if the fence singalling fires off the next batchbuffer asynchronously). To prevent such deadlocks you _absolutely_ need to lock _all_ buffers that take part in a command submission at once. To do that you either need a global lock (ugh) or ww_mutexes. So ww_mutexes are the fundamental ingredient of all this, if you don't see why you need them then everything piled on top is broken. I think until you've understood why exactly we need ww_mutexes there's not much point in discussing the finer issues of fences, reservation objects and how to integrate it with dma_bufs exactly. I'll try to clarify the motivating example in the ww_mutex documentation a bit, but I dunno how else I could explain this ... Yours, Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-28 16:49 ` Daniel Vetter @ 2013-05-29 2:21 ` Inki Dae 0 siblings, 0 replies; 35+ messages in thread From: Inki Dae @ 2013-05-29 2:21 UTC (permalink / raw) To: linux-arm-kernel > -----Original Message----- > From: daniel.vetter at ffwll.ch [mailto:daniel.vetter at ffwll.ch] On Behalf Of > Daniel Vetter > Sent: Wednesday, May 29, 2013 1:50 AM > To: Inki Dae > Cc: Rob Clark; Maarten Lankhorst; linux-fbdev; YoungJun Cho; Kyungmin Park; > myungjoo.ham; DRI mailing list; linux-arm-kernel at lists.infradead.org; > linux-media at vger.kernel.org > Subject: Re: Introduce a new helper framework for buffer synchronization > > On Tue, May 28, 2013 at 4:50 PM, Inki Dae <inki.dae@samsung.com> wrote: > > I think I already used reservation stuff any time in that way except > > ww-mutex. And I'm not sure that embedded system really needs ww-mutex. > If > > there is any case, > > could you tell me the case? I really need more advice and > understanding :) > > If you have only one driver, you can get away without ww_mutex. > drm/i915 does it, all buffer state is protected by dev->struct_mutex. > But as soon as you have multiple drivers sharing buffers with dma_buf > things will blow up. > > Yep, current prime is broken and can lead to deadlocks. > > In practice it doesn't (yet) matter since only the X server does the > sharing dance, and that one's single-threaded. Now you can claim that > since you have all buffers pinned in embedded gfx anyway, you don't > care. But both in desktop gfx and embedded gfx the real fun starts > once you put fences into the mix and link them up with buffers, then > every command submission risks that deadlock. Furthermore you can get > unlucky and construct a circle of fences waiting on each another (only > though if the fence singalling fires off the next batchbuffer > asynchronously). In our case, we haven't ever experienced deadlock yet but there is still possible to face with deadlock in case that a process is sharing two buffer with another process like below, Process A committed buffer A and waits for buffer B, Process B committed buffer B and waits for buffer A That is deadlock and it seems that you say we can resolve deadlock issue with ww-mutexes. And it seems that we can replace our block-wakeup mechanism with mutex lock for more performance. > > To prevent such deadlocks you _absolutely_ need to lock _all_ buffers > that take part in a command submission at once. To do that you either > need a global lock (ugh) or ww_mutexes. > > So ww_mutexes are the fundamental ingredient of all this, if you don't > see why you need them then everything piled on top is broken. I think > until you've understood why exactly we need ww_mutexes there's not > much point in discussing the finer issues of fences, reservation > objects and how to integrate it with dma_bufs exactly. > > I'll try to clarify the motivating example in the ww_mutex > documentation a bit, but I dunno how else I could explain this ... > I don't really want for you waste your time on me. I will trying to apply ww-mutexes (v4) to the proposed framework for more understanding. Thanks for your advices.:) Inki Dae > Yours, Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch ^ permalink raw reply [flat|nested] 35+ messages in thread
* Introduce a new helper framework for buffer synchronization 2013-05-13 11:24 ` Inki Dae 2013-05-13 11:40 ` Maarten Lankhorst @ 2013-05-13 19:29 ` Tomasz Figa 1 sibling, 0 replies; 35+ messages in thread From: Tomasz Figa @ 2013-05-13 19:29 UTC (permalink / raw) To: linux-arm-kernel Hi, On Monday 13 of May 2013 20:24:01 Inki Dae wrote: > > -----Original Message----- > > From: Maarten Lankhorst [mailto:maarten.lankhorst at canonical.com] > > Sent: Monday, May 13, 2013 6:52 PM > > To: Inki Dae > > Cc: 'Rob Clark'; 'Daniel Vetter'; 'DRI mailing list'; linux-arm- > > kernel at lists.infradead.org; linux-media at vger.kernel.org; > > 'linux-fbdev'; > > 'Kyungmin Park'; 'myungjoo.ham'; 'YoungJun Cho' > > Subject: Re: Introduce a new helper framework for buffer > > synchronization> > > Op 13-05-13 11:21, Inki Dae schreef: > > >> -----Original Message----- > > >> From: Maarten Lankhorst [mailto:maarten.lankhorst at canonical.com] > > >> Sent: Monday, May 13, 2013 5:01 PM > > >> To: Inki Dae > > >> Cc: Rob Clark; Daniel Vetter; DRI mailing list; linux-arm- > > >> kernel at lists.infradead.org; linux-media at vger.kernel.org; > > >> linux-fbdev; > > >> Kyungmin Park; myungjoo.ham; YoungJun Cho > > >> Subject: Re: Introduce a new helper framework for buffer > > > > synchronization > > > > >> Op 09-05-13 09:33, Inki Dae schreef: > > >>> Hi all, > > >>> > > >>> This post introduces a new helper framework based on dma fence. > > >>> And > > > > the > > > > >>> purpose of this post is to collect other opinions and advices > > >>> before > > > > RFC > > > > >>> posting. > > >>> > > >>> First of all, this helper framework, called fence helper, is in > > > > progress > > > > >>> yet so might not have enough comments in codes and also might need > > >>> to > > > > be > > > > >>> more cleaned up. Moreover, we might be missing some parts of the > > >>> dma > > >> > > >> fence. > > >> > > >>> However, I'd like to say that all things mentioned below has been > > > > tested > > > > >>> with Linux platform and worked well. > > >>> .... > > >>> > > >>> And tutorial for user process. > > >>> > > >>> just before cpu access > > >>> > > >>> struct dma_buf_fence *df; > > >>> > > >>> df->type = DMA_BUF_ACCESS_READ or > > DMA_BUF_ACCESS_WRITE; > > > >>> ioctl(fd, DMA_BUF_GET_FENCE, &df); > > >>> > > >>> after memset or memcpy > > >>> > > >>> ioctl(fd, DMA_BUF_PUT_FENCE, &df); > > >> > > >> NAK. > > >> > > >> Userspace doesn't need to trigger fences. It can do a buffer idle > > >> wait, > > >> and postpone submitting new commands until after it's done using > > >> the > > >> buffer. > > > > > > Hi Maarten, > > > > > > It seems that you say user should wait for a buffer like KDS does: > > > KDS > > > > uses > > > > > select() to postpone submitting new commands. But I think this way > > > > assumes > > > > > that every data flows a DMA device to a CPU. For example, a CPU > > > should > > > > keep > > > > > polling for the completion of a buffer access by a DMA device. This > > > > means > > > > > that the this way isn't considered for data flow to opposite case; > > > CPU > > > > to > > > > > DMA device. > > > > Not really. You do both things the same way. You first wait for the bo > > to be idle, this could be implemented by adding poll support to the > > dma-buf fd. > > Then you either do your read or write. Since userspace is supposed to > > be the one controlling the bo it should stay idle at that point. If > > you have another thread queueing > > the buffer againbefore your thread is done that's a bug in the > > application, > > > and can be solved with userspace locking primitives. No need for the > > kernel to get involved. > > Yes, that is how we have synchronized buffer between CPU and DMA device > until now without buffer synchronization mechanism. I thought that it's > best to make user not considering anything: user can access a buffer > regardless of any DMA device controlling and the buffer synchronization > is performed in kernel level. Moreover, I think we could optimize > graphics and multimedia hardware performance because hardware can do > more works: one thread accesses a shared buffer and the other controls > DMA device with the shared buffer in parallel. Could you explain this point? I thought that if there is a shared buffer accessible for user and DMA device, only one of them can use it at the moment, i.e. the buffer is useless for the reading user (or read DMA) until (write) DMA (or writing user) finishes writing for it. Is it incorrect? Or this is not the point here? I'm not an expert here and I'm just trying to understand the idea, so correct me if I'm wrong. Thanks in advance. Best regards, Tomasz > Thus, we could avoid > sequential processing and that is my intention. Aren't you think about > that we could improve hardware utilization with such way or other? of > course, there could be better way. > > > >> Kernel space doesn't need the root hole you created by giving a > > >> dereferencing a pointer passed from userspace. > > >> Your next exercise should be to write a security exploit from the > > >> api > > > > you > > > > >> created here. It's the only way to learn how to write safe code. > > >> Hint: > > >> df.ctx = mmap(..); > > > > > > Also I'm not clear to use our way yet and that is why I posted. As > > > you > > > mentioned, it seems like that using mmap() is more safe. But there > > > is > > > > one > > > > > issue it makes me confusing. For your hint, df.ctx = mmap(..), the > > > issue> > > is > > > > > that dmabuf mmap can be used to map a dmabuf with user space. And > > > the > > > > dmabuf > > > > > means a physical memory region allocated by some allocator such as > > > drm > > > > gem > > > > > or ion. > > > > > > There might be my missing point so could you please give me more > > > > comments? > > > > My point was that userspace could change df.ctx to some mmap'd memory, > > forcing the kernel to execute some code prepared by userspace. > > Understood. I have to find a better way. And for this, I'd like to > listen attentively more opinions and advices. > > Thanks for comments, > Inki Dae > > _______________________________________________ > dri-devel mailing list > dri-devel at lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2013-05-29  2:21 UTC | newest]
Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CAAQKjZNNw4qddo6bE5OY_CahrqDtqkxdO7Pm9RCguXyj9F4cMQ@mail.gmail.com>
2013-05-13  8:00 ` Introduce a new helper framework for buffer synchronization Maarten Lankhorst
2013-05-13  9:21   ` Inki Dae
2013-05-13  9:52     ` Maarten Lankhorst
2013-05-13 11:24       ` Inki Dae
2013-05-13 11:40         ` Maarten Lankhorst
2013-05-13 12:21           ` Inki Dae
2013-05-13 13:48             ` Rob Clark
     [not found]               ` <CAAQKjZP=iOmHRpHZCbZD3v_RKUFSn0eM_WVZZvhe7F9g3eTmPA@mail.gmail.com>
2013-05-13 17:58                 ` Rob Clark
2013-05-14  2:52                   ` Inki Dae
2013-05-14 13:38                     ` Rob Clark
2013-05-15  5:19                       ` Inki Dae
2013-05-15 14:06                         ` Rob Clark
2013-05-17  4:19                           ` Daniel Vetter
     [not found]                           ` <CAAQKjZP37koEPob6yqpn-WxxTh3+O=twyvRzDiEhVJTD8BxQzw@mail.gmail.com>
2013-05-20 21:13                             ` Daniel Vetter
2013-05-20 21:30                               ` Daniel Vetter
2013-05-21  7:03                                 ` Inki Dae
2013-05-21  7:44                                   ` Daniel Vetter
2013-05-21  9:22                                     ` Inki Dae
2013-05-23 11:55                                       ` Daniel Vetter
2013-05-23 13:37                                         ` Inki Dae
2013-05-27 10:38                                           ` Inki Dae
2013-05-27 15:23                                             ` Maarten Lankhorst
2013-05-28  2:49                                               ` Inki Dae
2013-05-28  7:23                                                 ` Maarten Lankhorst
2013-05-27 15:47                                             ` Rob Clark
2013-05-27 16:02                                               ` Daniel Vetter
2013-05-28  4:04                                                 ` Inki Dae
2013-05-28  3:56                                               ` Inki Dae
2013-05-28 10:32                                                 ` Daniel Vetter
2013-05-28 14:43                                                   ` Inki Dae
2013-05-28 13:48                                                 ` Rob Clark
2013-05-28 14:50                                                   ` Inki Dae
2013-05-28 16:49                                                     ` Daniel Vetter
2013-05-29  2:21                                                       ` Inki Dae
2013-05-13 19:29         ` Tomasz Figa
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).