dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* dma-buf non-coherent mmap
@ 2013-10-31 17:00 Thomas Hellstrom
  2013-10-31 17:52 ` Rob Clark
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Hellstrom @ 2013-10-31 17:00 UTC (permalink / raw)
  To: dri-devel@lists.freedesktop.org, linaro-mm-sig, Daniel Vetter

Hi!

I'm just looking over what's needed to implement drm Prime / dma-buf 
exports + imports in the vmwgfx driver. It seems like most dma-bufs ops 
are quite straightforward to implement except user-space mmap().

The reason being that vmwgfx dma-bufs will be using completely 
non-coherent memory, whenever there needs to be CPU accesses.

The accelerated contents resides in an opaque structure on the device 
into which we can DMA to and from, so for mmap to work we need to zap 
ptes and DMA to the device when doing something accelerated, and on the 
first page-fault DMA data back and wait for idle if the device did a 
write to the dma-buf.

Now this shouldn't really be a problem if dma-bufs were only used for 
cross-device sharing, but since people apparently want to use dma-buf 
file handles to share CPU data between processes it really becomes a 
serious problem.

Needless to say we'd want to limit the size of the DMAs, and have mmap 
users request regions for read, and mark regions dirty for write, 
something similar to gallium's texture transfer objects.

Any ideas?

/Thomas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-10-31 17:00 dma-buf non-coherent mmap Thomas Hellstrom
@ 2013-10-31 17:52 ` Rob Clark
  2013-10-31 20:40   ` Thomas Hellstrom
  0 siblings, 1 reply; 21+ messages in thread
From: Rob Clark @ 2013-10-31 17:52 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: linaro-mm-sig@lists.linaro.org, Daniel Vetter,
	dri-devel@lists.freedesktop.org

On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> Hi!
>
> I'm just looking over what's needed to implement drm Prime / dma-buf exports
> + imports in the vmwgfx driver. It seems like most dma-bufs ops are quite
> straightforward to implement except user-space mmap().
>
> The reason being that vmwgfx dma-bufs will be using completely non-coherent
> memory, whenever there needs to be CPU accesses.
>
> The accelerated contents resides in an opaque structure on the device into
> which we can DMA to and from, so for mmap to work we need to zap ptes and
> DMA to the device when doing something accelerated, and on the first
> page-fault DMA data back and wait for idle if the device did a write to the
> dma-buf.
>
> Now this shouldn't really be a problem if dma-bufs were only used for
> cross-device sharing, but since people apparently want to use dma-buf file
> handles to share CPU data between processes it really becomes a serious
> problem.
>
> Needless to say we'd want to limit the size of the DMAs, and have mmap users
> request regions for read, and mark regions dirty for write, something
> similar to gallium's texture transfer objects.
>
> Any ideas?

well, I think vmwgfx is part of the reason we decided mmap would be
optional for dmabuf.  So perhaps it is an option to simply ignore
mmap?

BR,
-R

> /Thomas
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-10-31 17:52 ` Rob Clark
@ 2013-10-31 20:40   ` Thomas Hellstrom
  2013-10-31 20:48     ` Dave Airlie
  2013-10-31 21:10     ` Rob Clark
  0 siblings, 2 replies; 21+ messages in thread
From: Thomas Hellstrom @ 2013-10-31 20:40 UTC (permalink / raw)
  To: Rob Clark
  Cc: linaro-mm-sig@lists.linaro.org, Daniel Vetter,
	dri-devel@lists.freedesktop.org

On 10/31/2013 06:52 PM, Rob Clark wrote:
> On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
>> Hi!
>>
>> I'm just looking over what's needed to implement drm Prime / dma-buf exports
>> + imports in the vmwgfx driver. It seems like most dma-bufs ops are quite
>> straightforward to implement except user-space mmap().
>>
>> The reason being that vmwgfx dma-bufs will be using completely non-coherent
>> memory, whenever there needs to be CPU accesses.
>>
>> The accelerated contents resides in an opaque structure on the device into
>> which we can DMA to and from, so for mmap to work we need to zap ptes and
>> DMA to the device when doing something accelerated, and on the first
>> page-fault DMA data back and wait for idle if the device did a write to the
>> dma-buf.
>>
>> Now this shouldn't really be a problem if dma-bufs were only used for
>> cross-device sharing, but since people apparently want to use dma-buf file
>> handles to share CPU data between processes it really becomes a serious
>> problem.
>>
>> Needless to say we'd want to limit the size of the DMAs, and have mmap users
>> request regions for read, and mark regions dirty for write, something
>> similar to gallium's texture transfer objects.
>>
>> Any ideas?
> well, I think vmwgfx is part of the reason we decided mmap would be
> optional for dmabuf.  So perhaps it is an option to simply ignore
> mmap?
>
> BR,
> -R

Well, I'd be happy to avoid mmap, but then what does optional mean in 
this context?
That all generic user-space apps *must* implement a workaround if mmap 
isn't implemented?

It's unfortunate a bit like implicit synchronization mentioned in 
section 3) in Direct Userspace Access/mmap Support
in the kernel dma-buf doc: It should be avoided, otherwise it might be 
relied upon by userspace and exporters
not implementing it will suffer.

In reality, people will start using mmap() and won't care to implement 
workarounds if it's not supported, and drivers like
vmwgfx and non-coherent architectures will suffer.

I haven't looked closely at how DRI3 or Wayland/weston use or will use 
dma-buf, but if they rely on mmap, we're sort
of lost. MIR uses the following scheme:
1) Create a GBM buffer
2) Get a Prime handle, export to client
3) Client imports prime handle, casts into a GBM buffer which is 
typecast to a dumb buffer
4) Client uses DRM dumb mmap().   :(.

Now, step 3) and step 4) are clearly API violations, but it works by 
coincidence on major drivers.
Now this usage-pattern could easily (and IMHO should) be blocked in DRM 
by refusing to map dumb buffers
that haven't been created as dumb buffers, but If dma-bufs are to be 
shared by user-space apps to transfer
contents in this manner, we need a reliable replacement that works also 
for non-coherent situations.

Ideally I'd like to see tex[sub]image-like operations, but with dma-buf 
mmap in place, I'm afraid that's what will
be used...

Regars,
Thomas











>
>> /Thomas
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-10-31 20:40   ` Thomas Hellstrom
@ 2013-10-31 20:48     ` Dave Airlie
  2013-10-31 21:07       ` Thomas Hellstrom
  2013-10-31 21:10     ` Rob Clark
  1 sibling, 1 reply; 21+ messages in thread
From: Dave Airlie @ 2013-10-31 20:48 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: linaro-mm-sig@lists.linaro.org, Daniel Vetter,
	dri-devel@lists.freedesktop.org

On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> On 10/31/2013 06:52 PM, Rob Clark wrote:
>>
>> On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom <thellstrom@vmware.com>
>> wrote:
>>>
>>> Hi!
>>>
>>> I'm just looking over what's needed to implement drm Prime / dma-buf
>>> exports
>>> + imports in the vmwgfx driver. It seems like most dma-bufs ops are quite
>>> straightforward to implement except user-space mmap().
>>>
>>> The reason being that vmwgfx dma-bufs will be using completely
>>> non-coherent
>>> memory, whenever there needs to be CPU accesses.
>>>
>>> The accelerated contents resides in an opaque structure on the device
>>> into
>>> which we can DMA to and from, so for mmap to work we need to zap ptes and
>>> DMA to the device when doing something accelerated, and on the first
>>> page-fault DMA data back and wait for idle if the device did a write to
>>> the
>>> dma-buf.
>>>
>>> Now this shouldn't really be a problem if dma-bufs were only used for
>>> cross-device sharing, but since people apparently want to use dma-buf
>>> file
>>> handles to share CPU data between processes it really becomes a serious
>>> problem.
>>>
>>> Needless to say we'd want to limit the size of the DMAs, and have mmap
>>> users
>>> request regions for read, and mark regions dirty for write, something
>>> similar to gallium's texture transfer objects.
>>>
>>> Any ideas?
>>
>> well, I think vmwgfx is part of the reason we decided mmap would be
>> optional for dmabuf.  So perhaps it is an option to simply ignore
>> mmap?
>>
>> BR,
>> -R
>
>
> Well, I'd be happy to avoid mmap, but then what does optional mean in this
> context?
> That all generic user-space apps *must* implement a workaround if mmap isn't
> implemented?
>
> It's unfortunate a bit like implicit synchronization mentioned in section 3)
> in Direct Userspace Access/mmap Support
> in the kernel dma-buf doc: It should be avoided, otherwise it might be
> relied upon by userspace and exporters
> not implementing it will suffer.
>
> In reality, people will start using mmap() and won't care to implement
> workarounds if it's not supported, and drivers like
> vmwgfx and non-coherent architectures will suffer.
>
> I haven't looked closely at how DRI3 or Wayland/weston use or will use
> dma-buf, but if they rely on mmap, we're sort
> of lost. MIR uses the following scheme:

DRI3 and wayland won't use dma-buf mmap directly,

using dma-buf mmap directly is wrong for anything that shares objects
with itself.

I personally wish we hadn't added mmap support to dma-buf at all, but
some people
had some use cases that they'll never implement.

If you export a dma-buf to be used by a client it should be using
drivers on the client
to import the dma-buf and then it should be using mesa.

Dave.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-10-31 20:48     ` Dave Airlie
@ 2013-10-31 21:07       ` Thomas Hellstrom
  2013-10-31 23:00         ` Daniel Vetter
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Hellstrom @ 2013-10-31 21:07 UTC (permalink / raw)
  To: Dave Airlie
  Cc: linaro-mm-sig@lists.linaro.org, Daniel Vetter,
	dri-devel@lists.freedesktop.org

On 10/31/2013 09:48 PM, Dave Airlie wrote:
> On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
>> On 10/31/2013 06:52 PM, Rob Clark wrote:
>>> On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom <thellstrom@vmware.com>
>>> wrote:
>>>> Hi!
>>>>
>>>> I'm just looking over what's needed to implement drm Prime / dma-buf
>>>> exports
>>>> + imports in the vmwgfx driver. It seems like most dma-bufs ops are quite
>>>> straightforward to implement except user-space mmap().
>>>>
>>>> The reason being that vmwgfx dma-bufs will be using completely
>>>> non-coherent
>>>> memory, whenever there needs to be CPU accesses.
>>>>
>>>> The accelerated contents resides in an opaque structure on the device
>>>> into
>>>> which we can DMA to and from, so for mmap to work we need to zap ptes and
>>>> DMA to the device when doing something accelerated, and on the first
>>>> page-fault DMA data back and wait for idle if the device did a write to
>>>> the
>>>> dma-buf.
>>>>
>>>> Now this shouldn't really be a problem if dma-bufs were only used for
>>>> cross-device sharing, but since people apparently want to use dma-buf
>>>> file
>>>> handles to share CPU data between processes it really becomes a serious
>>>> problem.
>>>>
>>>> Needless to say we'd want to limit the size of the DMAs, and have mmap
>>>> users
>>>> request regions for read, and mark regions dirty for write, something
>>>> similar to gallium's texture transfer objects.
>>>>
>>>> Any ideas?
>>> well, I think vmwgfx is part of the reason we decided mmap would be
>>> optional for dmabuf.  So perhaps it is an option to simply ignore
>>> mmap?
>>>
>>> BR,
>>> -R
>>
>> Well, I'd be happy to avoid mmap, but then what does optional mean in this
>> context?
>> That all generic user-space apps *must* implement a workaround if mmap isn't
>> implemented?
>>
>> It's unfortunate a bit like implicit synchronization mentioned in section 3)
>> in Direct Userspace Access/mmap Support
>> in the kernel dma-buf doc: It should be avoided, otherwise it might be
>> relied upon by userspace and exporters
>> not implementing it will suffer.
>>
>> In reality, people will start using mmap() and won't care to implement
>> workarounds if it's not supported, and drivers like
>> vmwgfx and non-coherent architectures will suffer.
>>
>> I haven't looked closely at how DRI3 or Wayland/weston use or will use
>> dma-buf, but if they rely on mmap, we're sort
>> of lost. MIR uses the following scheme:
> DRI3 and wayland won't use dma-buf mmap directly,
>
> using dma-buf mmap directly is wrong for anything that shares objects
> with itself.

That sounds good to hear. Perhaps we should add that to the dma-buf docs.

> I personally wish we hadn't added mmap support to dma-buf at all, but
> some people
> had some use cases that they'll never implement.
>
> If you export a dma-buf to be used by a client it should be using
> drivers on the client
> to import the dma-buf and then it should be using mesa.
Agreed.

/Thomas

>
> Dave

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-10-31 20:40   ` Thomas Hellstrom
  2013-10-31 20:48     ` Dave Airlie
@ 2013-10-31 21:10     ` Rob Clark
  2013-10-31 21:38       ` Thomas Hellstrom
  1 sibling, 1 reply; 21+ messages in thread
From: Rob Clark @ 2013-10-31 21:10 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: linaro-mm-sig@lists.linaro.org, Daniel Vetter,
	dri-devel@lists.freedesktop.org

On Thu, Oct 31, 2013 at 4:40 PM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> On 10/31/2013 06:52 PM, Rob Clark wrote:
>>
>> On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom <thellstrom@vmware.com>
>> wrote:
>>>
>>> Hi!
>>>
>>> I'm just looking over what's needed to implement drm Prime / dma-buf
>>> exports
>>> + imports in the vmwgfx driver. It seems like most dma-bufs ops are quite
>>> straightforward to implement except user-space mmap().
>>>
>>> The reason being that vmwgfx dma-bufs will be using completely
>>> non-coherent
>>> memory, whenever there needs to be CPU accesses.
>>>
>>> The accelerated contents resides in an opaque structure on the device
>>> into
>>> which we can DMA to and from, so for mmap to work we need to zap ptes and
>>> DMA to the device when doing something accelerated, and on the first
>>> page-fault DMA data back and wait for idle if the device did a write to
>>> the
>>> dma-buf.
>>>
>>> Now this shouldn't really be a problem if dma-bufs were only used for
>>> cross-device sharing, but since people apparently want to use dma-buf
>>> file
>>> handles to share CPU data between processes it really becomes a serious
>>> problem.
>>>
>>> Needless to say we'd want to limit the size of the DMAs, and have mmap
>>> users
>>> request regions for read, and mark regions dirty for write, something
>>> similar to gallium's texture transfer objects.
>>>
>>> Any ideas?
>>
>> well, I think vmwgfx is part of the reason we decided mmap would be
>> optional for dmabuf.  So perhaps it is an option to simply ignore
>> mmap?
>>
>> BR,
>> -R
>
>
> Well, I'd be happy to avoid mmap, but then what does optional mean in this
> context?
> That all generic user-space apps *must* implement a workaround if mmap isn't
> implemented?

well, mmap was mostly just added because it was needed by android for
ION on dmabuf.

I think it is an option to just not use dmabuf mmap in a linux
userspace.  I mean, we could also define some ioctls to give us pwrite
and other similar sort of functionality if it is needed.

BR,
-R

> It's unfortunate a bit like implicit synchronization mentioned in section 3)
> in Direct Userspace Access/mmap Support
> in the kernel dma-buf doc: It should be avoided, otherwise it might be
> relied upon by userspace and exporters
> not implementing it will suffer.
>
> In reality, people will start using mmap() and won't care to implement
> workarounds if it's not supported, and drivers like
> vmwgfx and non-coherent architectures will suffer.
>
> I haven't looked closely at how DRI3 or Wayland/weston use or will use
> dma-buf, but if they rely on mmap, we're sort
> of lost. MIR uses the following scheme:
> 1) Create a GBM buffer
> 2) Get a Prime handle, export to client
> 3) Client imports prime handle, casts into a GBM buffer which is typecast to
> a dumb buffer
> 4) Client uses DRM dumb mmap().   :(.
>
> Now, step 3) and step 4) are clearly API violations, but it works by
> coincidence on major drivers.
> Now this usage-pattern could easily (and IMHO should) be blocked in DRM by
> refusing to map dumb buffers
> that haven't been created as dumb buffers, but If dma-bufs are to be shared
> by user-space apps to transfer
> contents in this manner, we need a reliable replacement that works also for
> non-coherent situations.
>
> Ideally I'd like to see tex[sub]image-like operations, but with dma-buf mmap
> in place, I'm afraid that's what will
> be used...
>
> Regars,
> Thomas
>
>
>
>
>
>
>
>
>
>
>
>
>>
>>> /Thomas
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-10-31 21:10     ` Rob Clark
@ 2013-10-31 21:38       ` Thomas Hellstrom
  2013-10-31 22:43         ` [Linaro-mm-sig] " Benjamin Gaignard
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Hellstrom @ 2013-10-31 21:38 UTC (permalink / raw)
  To: Rob Clark
  Cc: linaro-mm-sig@lists.linaro.org, Daniel Vetter, Thomas Hellstrom,
	dri-devel@lists.freedesktop.org

On 10/31/2013 10:10 PM, Rob Clark wrote:
> On Thu, Oct 31, 2013 at 4:40 PM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
>> On 10/31/2013 06:52 PM, Rob Clark wrote:
>>> On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom <thellstrom@vmware.com>
>>> wrote:
>>>> Hi!
>>>>
>>>> I'm just looking over what's needed to implement drm Prime / dma-buf
>>>> exports
>>>> + imports in the vmwgfx driver. It seems like most dma-bufs ops are quite
>>>> straightforward to implement except user-space mmap().
>>>>
>>>> The reason being that vmwgfx dma-bufs will be using completely
>>>> non-coherent
>>>> memory, whenever there needs to be CPU accesses.
>>>>
>>>> The accelerated contents resides in an opaque structure on the device
>>>> into
>>>> which we can DMA to and from, so for mmap to work we need to zap ptes and
>>>> DMA to the device when doing something accelerated, and on the first
>>>> page-fault DMA data back and wait for idle if the device did a write to
>>>> the
>>>> dma-buf.
>>>>
>>>> Now this shouldn't really be a problem if dma-bufs were only used for
>>>> cross-device sharing, but since people apparently want to use dma-buf
>>>> file
>>>> handles to share CPU data between processes it really becomes a serious
>>>> problem.
>>>>
>>>> Needless to say we'd want to limit the size of the DMAs, and have mmap
>>>> users
>>>> request regions for read, and mark regions dirty for write, something
>>>> similar to gallium's texture transfer objects.
>>>>
>>>> Any ideas?
>>> well, I think vmwgfx is part of the reason we decided mmap would be
>>> optional for dmabuf.  So perhaps it is an option to simply ignore
>>> mmap?
>>>
>>> BR,
>>> -R
>>
>> Well, I'd be happy to avoid mmap, but then what does optional mean in this
>> context?
>> That all generic user-space apps *must* implement a workaround if mmap isn't
>> implemented?
> well, mmap was mostly just added because it was needed by android for
> ION on dmabuf.
>
> I think it is an option to just not use dmabuf mmap in a linux
> userspace.  I mean, we could also define some ioctls to give us pwrite
> and other similar sort of functionality if it is needed.

I think that if direct user-space cpu-access to dma-buf is ever needed 
by linux,
something like that is a better option. Best IMHO would be to avoid 
user-space
cpu-access completely.

Regards,
/Thomas


>
> BR,
> -R
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Linaro-mm-sig] dma-buf non-coherent mmap
  2013-10-31 21:38       ` Thomas Hellstrom
@ 2013-10-31 22:43         ` Benjamin Gaignard
  0 siblings, 0 replies; 21+ messages in thread
From: Benjamin Gaignard @ 2013-10-31 22:43 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: linaro-mm-sig@lists.linaro.org, dri-devel@lists.freedesktop.org

user-space cpu access to dma-buf is needed for example in Gstreamer
pipeline when you have mix of harware element (a video decoder) and
software element (like colorspace converter).

Gstreamer already support dma-buf in a specific gstallocator:
http://cgit.freedesktop.org/gstreamer/gst-plugins-base/tree/gst-libs/gst/allocators/gstdmabuf.c

Regards,
Benjamin

2013/10/31 Thomas Hellstrom <thellstrom@vmware.com>:
> On 10/31/2013 10:10 PM, Rob Clark wrote:
>>
>> On Thu, Oct 31, 2013 at 4:40 PM, Thomas Hellstrom <thellstrom@vmware.com>
>> wrote:
>>>
>>> On 10/31/2013 06:52 PM, Rob Clark wrote:
>>>>
>>>> On Thu, Oct 31, 2013 at 1:00 PM, Thomas Hellstrom
>>>> <thellstrom@vmware.com>
>>>> wrote:
>>>>>
>>>>> Hi!
>>>>>
>>>>> I'm just looking over what's needed to implement drm Prime / dma-buf
>>>>> exports
>>>>> + imports in the vmwgfx driver. It seems like most dma-bufs ops are
>>>>> quite
>>>>> straightforward to implement except user-space mmap().
>>>>>
>>>>> The reason being that vmwgfx dma-bufs will be using completely
>>>>> non-coherent
>>>>> memory, whenever there needs to be CPU accesses.
>>>>>
>>>>> The accelerated contents resides in an opaque structure on the device
>>>>> into
>>>>> which we can DMA to and from, so for mmap to work we need to zap ptes
>>>>> and
>>>>> DMA to the device when doing something accelerated, and on the first
>>>>> page-fault DMA data back and wait for idle if the device did a write to
>>>>> the
>>>>> dma-buf.
>>>>>
>>>>> Now this shouldn't really be a problem if dma-bufs were only used for
>>>>> cross-device sharing, but since people apparently want to use dma-buf
>>>>> file
>>>>> handles to share CPU data between processes it really becomes a serious
>>>>> problem.
>>>>>
>>>>> Needless to say we'd want to limit the size of the DMAs, and have mmap
>>>>> users
>>>>> request regions for read, and mark regions dirty for write, something
>>>>> similar to gallium's texture transfer objects.
>>>>>
>>>>> Any ideas?
>>>>
>>>> well, I think vmwgfx is part of the reason we decided mmap would be
>>>> optional for dmabuf.  So perhaps it is an option to simply ignore
>>>> mmap?
>>>>
>>>> BR,
>>>> -R
>>>
>>>
>>> Well, I'd be happy to avoid mmap, but then what does optional mean in
>>> this
>>> context?
>>> That all generic user-space apps *must* implement a workaround if mmap
>>> isn't
>>> implemented?
>>
>> well, mmap was mostly just added because it was needed by android for
>> ION on dmabuf.
>>
>> I think it is an option to just not use dmabuf mmap in a linux
>> userspace.  I mean, we could also define some ioctls to give us pwrite
>> and other similar sort of functionality if it is needed.
>
>
> I think that if direct user-space cpu-access to dma-buf is ever needed by
> linux,
> something like that is a better option. Best IMHO would be to avoid
> user-space
> cpu-access completely.
>
> Regards,
> /Thomas
>
>
>>
>> BR,
>> -R
>>
>
> _______________________________________________
> Linaro-mm-sig mailing list
> Linaro-mm-sig@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-mm-sig



-- 
Benjamin Gaignard

Graphic Working Group

Linaro.org │ Open source software for ARM SoCs

Follow Linaro: Facebook | Twitter | Blog
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-10-31 21:07       ` Thomas Hellstrom
@ 2013-10-31 23:00         ` Daniel Vetter
  2013-11-01  0:17           ` Jakob Bornecrantz
  0 siblings, 1 reply; 21+ messages in thread
From: Daniel Vetter @ 2013-10-31 23:00 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: linaro-mm-sig@lists.linaro.org, Daniel Vetter,
	dri-devel@lists.freedesktop.org

On Thu, Oct 31, 2013 at 10:07:25PM +0100, Thomas Hellstrom wrote:
> On 10/31/2013 09:48 PM, Dave Airlie wrote:
> >On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> >>Well, I'd be happy to avoid mmap, but then what does optional mean in this
> >>context?
> >>That all generic user-space apps *must* implement a workaround if mmap isn't
> >>implemented?
> >>
> >>It's unfortunate a bit like implicit synchronization mentioned in section 3)
> >>in Direct Userspace Access/mmap Support
> >>in the kernel dma-buf doc: It should be avoided, otherwise it might be
> >>relied upon by userspace and exporters
> >>not implementing it will suffer.
> >>
> >>In reality, people will start using mmap() and won't care to implement
> >>workarounds if it's not supported, and drivers like
> >>vmwgfx and non-coherent architectures will suffer.
> >>
> >>I haven't looked closely at how DRI3 or Wayland/weston use or will use
> >>dma-buf, but if they rely on mmap, we're sort
> >>of lost. MIR uses the following scheme:
> >DRI3 and wayland won't use dma-buf mmap directly,
> >
> >using dma-buf mmap directly is wrong for anything that shares objects
> >with itself.
> 
> That sounds good to hear. Perhaps we should add that to the dma-buf docs.

Userspace mmap was essentially added as a concession to the android ion
guys since they really, really wanted it. We've tried to tell them that
it's a horrible idea (see all the fun with coherency and syncing), but
they said that they have userspace for it already and so we let it be.

Imo if you're not running Android userspace there's no need for this at
all.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-10-31 23:00         ` Daniel Vetter
@ 2013-11-01  0:17           ` Jakob Bornecrantz
  2013-11-01  0:25             ` Rob Clark
  0 siblings, 1 reply; 21+ messages in thread
From: Jakob Bornecrantz @ 2013-11-01  0:17 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linaro-mm-sig@lists.linaro.org, Daniel Vetter, Thomas Hellstrom,
	dri-devel@lists.freedesktop.org

On Fri, Nov 1, 2013 at 12:00 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Oct 31, 2013 at 10:07:25PM +0100, Thomas Hellstrom wrote:
>> On 10/31/2013 09:48 PM, Dave Airlie wrote:
>> >On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
>> >>Well, I'd be happy to avoid mmap, but then what does optional mean in this
>> >>context?
>> >>That all generic user-space apps *must* implement a workaround if mmap isn't
>> >>implemented?
>> >>
>> >>It's unfortunate a bit like implicit synchronization mentioned in section 3)
>> >>in Direct Userspace Access/mmap Support
>> >>in the kernel dma-buf doc: It should be avoided, otherwise it might be
>> >>relied upon by userspace and exporters
>> >>not implementing it will suffer.
>> >>
>> >>In reality, people will start using mmap() and won't care to implement
>> >>workarounds if it's not supported, and drivers like
>> >>vmwgfx and non-coherent architectures will suffer.
>> >>
>> >>I haven't looked closely at how DRI3 or Wayland/weston use or will use
>> >>dma-buf, but if they rely on mmap, we're sort
>> >>of lost. MIR uses the following scheme:
>> >DRI3 and wayland won't use dma-buf mmap directly,
>> >
>> >using dma-buf mmap directly is wrong for anything that shares objects
>> >with itself.
>>
>> That sounds good to hear. Perhaps we should add that to the dma-buf docs.
>
> Userspace mmap was essentially added as a concession to the android ion
> guys since they really, really wanted it. We've tried to tell them that
> it's a horrible idea (see all the fun with coherency and syncing), but
> they said that they have userspace for it already and so we let it be.
>
> Imo if you're not running Android userspace there's no need for this at
> all.

But now it turns out that gstreamer is using it and our life is hell. We should
have made it not work for _any_ driver if CONFIG_ANDRIOD wasn't set.

Cheers, Jakob.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-11-01  0:17           ` Jakob Bornecrantz
@ 2013-11-01  0:25             ` Rob Clark
  2013-11-01  0:37               ` Jakob Bornecrantz
  0 siblings, 1 reply; 21+ messages in thread
From: Rob Clark @ 2013-11-01  0:25 UTC (permalink / raw)
  To: Jakob Bornecrantz
  Cc: linaro-mm-sig@lists.linaro.org, Daniel Vetter, Thomas Hellstrom,
	dri-devel@lists.freedesktop.org

On Thu, Oct 31, 2013 at 8:17 PM, Jakob Bornecrantz <wallbraker@gmail.com> wrote:
> On Fri, Nov 1, 2013 at 12:00 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Thu, Oct 31, 2013 at 10:07:25PM +0100, Thomas Hellstrom wrote:
>>> On 10/31/2013 09:48 PM, Dave Airlie wrote:
>>> >On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
>>> >>Well, I'd be happy to avoid mmap, but then what does optional mean in this
>>> >>context?
>>> >>That all generic user-space apps *must* implement a workaround if mmap isn't
>>> >>implemented?
>>> >>
>>> >>It's unfortunate a bit like implicit synchronization mentioned in section 3)
>>> >>in Direct Userspace Access/mmap Support
>>> >>in the kernel dma-buf doc: It should be avoided, otherwise it might be
>>> >>relied upon by userspace and exporters
>>> >>not implementing it will suffer.
>>> >>
>>> >>In reality, people will start using mmap() and won't care to implement
>>> >>workarounds if it's not supported, and drivers like
>>> >>vmwgfx and non-coherent architectures will suffer.
>>> >>
>>> >>I haven't looked closely at how DRI3 or Wayland/weston use or will use
>>> >>dma-buf, but if they rely on mmap, we're sort
>>> >>of lost. MIR uses the following scheme:
>>> >DRI3 and wayland won't use dma-buf mmap directly,
>>> >
>>> >using dma-buf mmap directly is wrong for anything that shares objects
>>> >with itself.
>>>
>>> That sounds good to hear. Perhaps we should add that to the dma-buf docs.
>>
>> Userspace mmap was essentially added as a concession to the android ion
>> guys since they really, really wanted it. We've tried to tell them that
>> it's a horrible idea (see all the fun with coherency and syncing), but
>> they said that they have userspace for it already and so we let it be.
>>
>> Imo if you're not running Android userspace there's no need for this at
>> all.
>
> But now it turns out that gstreamer is using it and our life is hell. We should
> have made it not work for _any_ driver if CONFIG_ANDRIOD wasn't set.

well, at the moment mmap is only implemented by a few of the arm
drivers, from the looks of it.  So I guess in the near term it is
mostly going to be of interest to the SoC crowd.  Not sure about the
long term, perhaps we should see about turning gallium
pipe_transfer/map stuff into a kernel interface (ioctl's directly on
the dmabuf fd, perhaps?)

BR,
-R

> Cheers, Jakob.
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-11-01  0:25             ` Rob Clark
@ 2013-11-01  0:37               ` Jakob Bornecrantz
  2013-11-01  0:57                 ` Rob Clark
  0 siblings, 1 reply; 21+ messages in thread
From: Jakob Bornecrantz @ 2013-11-01  0:37 UTC (permalink / raw)
  To: Rob Clark
  Cc: linaro-mm-sig@lists.linaro.org, Daniel Vetter, Thomas Hellstrom,
	dri-devel@lists.freedesktop.org

On Fri, Nov 1, 2013 at 1:25 AM, Rob Clark <robdclark@gmail.com> wrote:
> On Thu, Oct 31, 2013 at 8:17 PM, Jakob Bornecrantz <wallbraker@gmail.com> wrote:
>> On Fri, Nov 1, 2013 at 12:00 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>>> On Thu, Oct 31, 2013 at 10:07:25PM +0100, Thomas Hellstrom wrote:
>>>> On 10/31/2013 09:48 PM, Dave Airlie wrote:
>>>> >On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
>>>> >>Well, I'd be happy to avoid mmap, but then what does optional mean in this
>>>> >>context?
>>>> >>That all generic user-space apps *must* implement a workaround if mmap isn't
>>>> >>implemented?
>>>> >>
>>>> >>It's unfortunate a bit like implicit synchronization mentioned in section 3)
>>>> >>in Direct Userspace Access/mmap Support
>>>> >>in the kernel dma-buf doc: It should be avoided, otherwise it might be
>>>> >>relied upon by userspace and exporters
>>>> >>not implementing it will suffer.
>>>> >>
>>>> >>In reality, people will start using mmap() and won't care to implement
>>>> >>workarounds if it's not supported, and drivers like
>>>> >>vmwgfx and non-coherent architectures will suffer.
>>>> >>
>>>> >>I haven't looked closely at how DRI3 or Wayland/weston use or will use
>>>> >>dma-buf, but if they rely on mmap, we're sort
>>>> >>of lost. MIR uses the following scheme:
>>>> >DRI3 and wayland won't use dma-buf mmap directly,
>>>> >
>>>> >using dma-buf mmap directly is wrong for anything that shares objects
>>>> >with itself.
>>>>
>>>> That sounds good to hear. Perhaps we should add that to the dma-buf docs.
>>>
>>> Userspace mmap was essentially added as a concession to the android ion
>>> guys since they really, really wanted it. We've tried to tell them that
>>> it's a horrible idea (see all the fun with coherency and syncing), but
>>> they said that they have userspace for it already and so we let it be.
>>>
>>> Imo if you're not running Android userspace there's no need for this at
>>> all.
>>
>> But now it turns out that gstreamer is using it and our life is hell. We should
>> have made it not work for _any_ driver if CONFIG_ANDRIOD wasn't set.
>
> well, at the moment mmap is only implemented by a few of the arm
> drivers, from the looks of it.  So I guess in the near term it is
> mostly going to be of interest to the SoC crowd.

Ah okay, I thought this was desktop gstreamer.

> Not sure about the long term, perhaps we should see about
> turning gallium pipe_transfer/map stuff into a kernel interface
> (ioctl's directly on the dmabuf fd, perhaps?)

Or they could just create a OpenGL context, I know it sounds heavy weight.
But somebody will eventually want to synchronize this with a different client
API in a non-blocking way. And OpenGL or some other Khronos API already
have that integration, no real need to reinvent the wheel. Then again I think
both GBM and DRI have hooks for mapping and unmapping buffers, so those
might suffice.

Cheers, Jakob.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-11-01  0:37               ` Jakob Bornecrantz
@ 2013-11-01  0:57                 ` Rob Clark
  2013-11-01 10:03                   ` Lucas Stach
  0 siblings, 1 reply; 21+ messages in thread
From: Rob Clark @ 2013-11-01  0:57 UTC (permalink / raw)
  To: Jakob Bornecrantz
  Cc: linaro-mm-sig@lists.linaro.org, Daniel Vetter, Thomas Hellstrom,
	dri-devel@lists.freedesktop.org

On Thu, Oct 31, 2013 at 8:37 PM, Jakob Bornecrantz <wallbraker@gmail.com> wrote:
> On Fri, Nov 1, 2013 at 1:25 AM, Rob Clark <robdclark@gmail.com> wrote:
>> On Thu, Oct 31, 2013 at 8:17 PM, Jakob Bornecrantz <wallbraker@gmail.com> wrote:
>>> On Fri, Nov 1, 2013 at 12:00 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
>>>> On Thu, Oct 31, 2013 at 10:07:25PM +0100, Thomas Hellstrom wrote:
>>>>> On 10/31/2013 09:48 PM, Dave Airlie wrote:
>>>>> >On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
>>>>> >>Well, I'd be happy to avoid mmap, but then what does optional mean in this
>>>>> >>context?
>>>>> >>That all generic user-space apps *must* implement a workaround if mmap isn't
>>>>> >>implemented?
>>>>> >>
>>>>> >>It's unfortunate a bit like implicit synchronization mentioned in section 3)
>>>>> >>in Direct Userspace Access/mmap Support
>>>>> >>in the kernel dma-buf doc: It should be avoided, otherwise it might be
>>>>> >>relied upon by userspace and exporters
>>>>> >>not implementing it will suffer.
>>>>> >>
>>>>> >>In reality, people will start using mmap() and won't care to implement
>>>>> >>workarounds if it's not supported, and drivers like
>>>>> >>vmwgfx and non-coherent architectures will suffer.
>>>>> >>
>>>>> >>I haven't looked closely at how DRI3 or Wayland/weston use or will use
>>>>> >>dma-buf, but if they rely on mmap, we're sort
>>>>> >>of lost. MIR uses the following scheme:
>>>>> >DRI3 and wayland won't use dma-buf mmap directly,
>>>>> >
>>>>> >using dma-buf mmap directly is wrong for anything that shares objects
>>>>> >with itself.
>>>>>
>>>>> That sounds good to hear. Perhaps we should add that to the dma-buf docs.
>>>>
>>>> Userspace mmap was essentially added as a concession to the android ion
>>>> guys since they really, really wanted it. We've tried to tell them that
>>>> it's a horrible idea (see all the fun with coherency and syncing), but
>>>> they said that they have userspace for it already and so we let it be.
>>>>
>>>> Imo if you're not running Android userspace there's no need for this at
>>>> all.
>>>
>>> But now it turns out that gstreamer is using it and our life is hell. We should
>>> have made it not work for _any_ driver if CONFIG_ANDRIOD wasn't set.
>>
>> well, at the moment mmap is only implemented by a few of the arm
>> drivers, from the looks of it.  So I guess in the near term it is
>> mostly going to be of interest to the SoC crowd.
>
> Ah okay, I thought this was desktop gstreamer.
>
>> Not sure about the long term, perhaps we should see about
>> turning gallium pipe_transfer/map stuff into a kernel interface
>> (ioctl's directly on the dmabuf fd, perhaps?)
>
> Or they could just create a OpenGL context, I know it sounds heavy weight.
> But somebody will eventually want to synchronize this with a different client
> API in a non-blocking way. And OpenGL or some other Khronos API already
> have that integration, no real need to reinvent the wheel. Then again I think
> both GBM and DRI have hooks for mapping and unmapping buffers, so those
> might suffice.

GL(ES) context might be a bit of a PITA, at least until open src mesa
drivers are a bit more common in arm world.  But not really against a
userspace API.. wonder how hard that would be to stuff in GBM?  At
least that is a simple enough API that some sort of null-gbm "driver"
could be made for folks who don't have GL(ES)..

BR,
-R

> Cheers, Jakob.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-11-01  0:57                 ` Rob Clark
@ 2013-11-01 10:03                   ` Lucas Stach
  2013-11-01 10:17                     ` Daniel Vetter
  0 siblings, 1 reply; 21+ messages in thread
From: Lucas Stach @ 2013-11-01 10:03 UTC (permalink / raw)
  To: Rob Clark
  Cc: linaro-mm-sig@lists.linaro.org, Daniel Vetter, Thomas Hellstrom,
	dri-devel@lists.freedesktop.org

Am Donnerstag, den 31.10.2013, 20:57 -0400 schrieb Rob Clark:
> On Thu, Oct 31, 2013 at 8:37 PM, Jakob Bornecrantz <wallbraker@gmail.com> wrote:
> > On Fri, Nov 1, 2013 at 1:25 AM, Rob Clark <robdclark@gmail.com> wrote:
> >> On Thu, Oct 31, 2013 at 8:17 PM, Jakob Bornecrantz <wallbraker@gmail.com> wrote:
> >>> On Fri, Nov 1, 2013 at 12:00 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>> On Thu, Oct 31, 2013 at 10:07:25PM +0100, Thomas Hellstrom wrote:
> >>>>> On 10/31/2013 09:48 PM, Dave Airlie wrote:
> >>>>> >On Fri, Nov 1, 2013 at 6:40 AM, Thomas Hellstrom <thellstrom@vmware.com> wrote:
> >>>>> >>Well, I'd be happy to avoid mmap, but then what does optional mean in this
> >>>>> >>context?
> >>>>> >>That all generic user-space apps *must* implement a workaround if mmap isn't
> >>>>> >>implemented?
> >>>>> >>
> >>>>> >>It's unfortunate a bit like implicit synchronization mentioned in section 3)
> >>>>> >>in Direct Userspace Access/mmap Support
> >>>>> >>in the kernel dma-buf doc: It should be avoided, otherwise it might be
> >>>>> >>relied upon by userspace and exporters
> >>>>> >>not implementing it will suffer.
> >>>>> >>
> >>>>> >>In reality, people will start using mmap() and won't care to implement
> >>>>> >>workarounds if it's not supported, and drivers like
> >>>>> >>vmwgfx and non-coherent architectures will suffer.
> >>>>> >>
> >>>>> >>I haven't looked closely at how DRI3 or Wayland/weston use or will use
> >>>>> >>dma-buf, but if they rely on mmap, we're sort
> >>>>> >>of lost. MIR uses the following scheme:
> >>>>> >DRI3 and wayland won't use dma-buf mmap directly,
> >>>>> >
> >>>>> >using dma-buf mmap directly is wrong for anything that shares objects
> >>>>> >with itself.
> >>>>>
> >>>>> That sounds good to hear. Perhaps we should add that to the dma-buf docs.
> >>>>
> >>>> Userspace mmap was essentially added as a concession to the android ion
> >>>> guys since they really, really wanted it. We've tried to tell them that
> >>>> it's a horrible idea (see all the fun with coherency and syncing), but
> >>>> they said that they have userspace for it already and so we let it be.
> >>>>
> >>>> Imo if you're not running Android userspace there's no need for this at
> >>>> all.
> >>>
> >>> But now it turns out that gstreamer is using it and our life is hell. We should
> >>> have made it not work for _any_ driver if CONFIG_ANDRIOD wasn't set.
> >>
> >> well, at the moment mmap is only implemented by a few of the arm
> >> drivers, from the looks of it.  So I guess in the near term it is
> >> mostly going to be of interest to the SoC crowd.
> >
> > Ah okay, I thought this was desktop gstreamer.
> >

GStreamer needs _some_ way of accessing the buffer contents with the
CPU, this doesn't necessarily have to be the dumb mmap we have right
now.
DMA-BUF support in GSt is not really finished (I know we have a lot of
patches internally to make it actually work with anything, which are
trickling upstream right now), so this might be a good time to hammer
out how it should be done, so we can adapt GSt before people start to
rely on the dma-buf fallback mmap.

I would think the bracketed mmap idea that was thrown around by Rob some
time ago when the mmap topic first surfaced is simple enough that we
don't need additional userspace helpers and should be enough to make the
coherency issue disappear.

> >> Not sure about the long term, perhaps we should see about
> >> turning gallium pipe_transfer/map stuff into a kernel interface
> >> (ioctl's directly on the dmabuf fd, perhaps?)
> >
> > Or they could just create a OpenGL context, I know it sounds heavy weight.
> > But somebody will eventually want to synchronize this with a different client
> > API in a non-blocking way. And OpenGL or some other Khronos API already
> > have that integration, no real need to reinvent the wheel. Then again I think
> > both GBM and DRI have hooks for mapping and unmapping buffers, so those
> > might suffice.
> 
> GL(ES) context might be a bit of a PITA, at least until open src mesa
> drivers are a bit more common in arm world.  But not really against a
> userspace API.. wonder how hard that would be to stuff in GBM?  At
> least that is a simple enough API that some sort of null-gbm "driver"
> could be made for folks who don't have GL(ES)..
> 
I already see the need for a dumb GBM driver for importing buffers into
plain KMS drivers, so we can stick them on a plane or something. The
only thing that crashes in here is GBM being a part of MESA. We
certainly don't want GStreamer to depend on MESA just to access some
buffers the right way.

Regards,
Lucas

-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-11-01 10:03                   ` Lucas Stach
@ 2013-11-01 10:17                     ` Daniel Vetter
  2013-11-01 13:22                       ` Rob Clark
  2013-11-01 13:54                       ` Thomas Hellstrom
  0 siblings, 2 replies; 21+ messages in thread
From: Daniel Vetter @ 2013-11-01 10:17 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Thomas Hellstrom, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org

On Fri, Nov 1, 2013 at 11:03 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
> GStreamer needs _some_ way of accessing the buffer contents with the
> CPU, this doesn't necessarily have to be the dumb mmap we have right
> now.
> DMA-BUF support in GSt is not really finished (I know we have a lot of
> patches internally to make it actually work with anything, which are
> trickling upstream right now), so this might be a good time to hammer
> out how it should be done, so we can adapt GSt before people start to
> rely on the dma-buf fallback mmap.
>
> I would think the bracketed mmap idea that was thrown around by Rob some
> time ago when the mmap topic first surfaced is simple enough that we
> don't need additional userspace helpers and should be enough to make the
> coherency issue disappear.

Yeah, if we add a BEGIN_MMAP/END_MMAP ioctl or so to the dma-buf mmap
stuff and simply demand that userspace calls that we'd have something
half-sane at least. Since we currently don't really have any real
users we could still do this abi change I think.

One open issue is whether the BEGIN_MMAP ioctl should also synchronize
with any outstanding gpu access. Imo it should, since that's pretty
much how all the current drm drivers work, but maybe we should reserve
space for a few flags so that this can be extended later on - Android
seems to be pretty insistent on using explicit syncpoints everywhere
instead of implicit synchronization. Or maybe we should have the flag
already, but reject implicit syncing until Maarten's dma_fence stuff
is in and enabled, dunno. Adding him to cc.

The clear thing otoh is that these two ioctls should only serve as
barriers, not as locks as has been proposed in some other RFCs
floating around (iirc the one from exynos was borked in this fashion).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-11-01 10:17                     ` Daniel Vetter
@ 2013-11-01 13:22                       ` Rob Clark
  2013-11-01 13:32                         ` Lucas Stach
  2013-11-01 13:54                       ` Thomas Hellstrom
  1 sibling, 1 reply; 21+ messages in thread
From: Rob Clark @ 2013-11-01 13:22 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Thomas Hellstrom, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org

On Fri, Nov 1, 2013 at 6:17 AM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> On Fri, Nov 1, 2013 at 11:03 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
>> GStreamer needs _some_ way of accessing the buffer contents with the
>> CPU, this doesn't necessarily have to be the dumb mmap we have right
>> now.
>> DMA-BUF support in GSt is not really finished (I know we have a lot of
>> patches internally to make it actually work with anything, which are
>> trickling upstream right now), so this might be a good time to hammer
>> out how it should be done, so we can adapt GSt before people start to
>> rely on the dma-buf fallback mmap.
>>
>> I would think the bracketed mmap idea that was thrown around by Rob some
>> time ago when the mmap topic first surfaced is simple enough that we
>> don't need additional userspace helpers and should be enough to make the
>> coherency issue disappear.
>
> Yeah, if we add a BEGIN_MMAP/END_MMAP ioctl or so to the dma-buf mmap
> stuff and simply demand that userspace calls that we'd have something
> half-sane at least. Since we currently don't really have any real
> users we could still do this abi change I think.
>
> One open issue is whether the BEGIN_MMAP ioctl should also synchronize
> with any outstanding gpu access. Imo it should, since that's pretty
> much how all the current drm drivers work, but maybe we should reserve
> space for a few flags so that this can be extended later on - Android
> seems to be pretty insistent on using explicit syncpoints everywhere
> instead of implicit synchronization. Or maybe we should have the flag
> already, but reject implicit syncing until Maarten's dma_fence stuff
> is in and enabled, dunno. Adding him to cc.

I suppose I wouldn't mind if we made BEGIN/END_MMAP as sort of clones
of the CPU_PREP/FINI ioctls I have in msm.. where the CPU_PREP takes a
flag/bitmask to indicate READ/WRITE/NOSYNC

That doesn't really help with partial buffer access, like you'd get if
you were using the buffer for vertex upload.  Although not really sure
if we'd be seeing dmabuf's for that sort of use-case.  We could make a
more elaborate kernel interface that maps better to all the different
cases in pipe_transfer.  Or we could just say that that is overkill.
I don't really have strong opinion about that (so someone who does,
send an RFC).  But I do agree with the "barriers only, not locks".

BR,
-R

> The clear thing otoh is that these two ioctls should only serve as
> barriers, not as locks as has been proposed in some other RFCs
> floating around (iirc the one from exynos was borked in this fashion).
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-11-01 13:22                       ` Rob Clark
@ 2013-11-01 13:32                         ` Lucas Stach
  2013-11-04  7:53                           ` Thomas Hellstrom
  0 siblings, 1 reply; 21+ messages in thread
From: Lucas Stach @ 2013-11-01 13:32 UTC (permalink / raw)
  To: Rob Clark
  Cc: Thomas Hellstrom, Daniel Vetter, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org

Am Freitag, den 01.11.2013, 09:22 -0400 schrieb Rob Clark:
> On Fri, Nov 1, 2013 at 6:17 AM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > On Fri, Nov 1, 2013 at 11:03 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
> >> GStreamer needs _some_ way of accessing the buffer contents with the
> >> CPU, this doesn't necessarily have to be the dumb mmap we have right
> >> now.
> >> DMA-BUF support in GSt is not really finished (I know we have a lot of
> >> patches internally to make it actually work with anything, which are
> >> trickling upstream right now), so this might be a good time to hammer
> >> out how it should be done, so we can adapt GSt before people start to
> >> rely on the dma-buf fallback mmap.
> >>
> >> I would think the bracketed mmap idea that was thrown around by Rob some
> >> time ago when the mmap topic first surfaced is simple enough that we
> >> don't need additional userspace helpers and should be enough to make the
> >> coherency issue disappear.
> >
> > Yeah, if we add a BEGIN_MMAP/END_MMAP ioctl or so to the dma-buf mmap
> > stuff and simply demand that userspace calls that we'd have something
> > half-sane at least. Since we currently don't really have any real
> > users we could still do this abi change I think.
> >
> > One open issue is whether the BEGIN_MMAP ioctl should also synchronize
> > with any outstanding gpu access. Imo it should, since that's pretty
> > much how all the current drm drivers work, but maybe we should reserve
> > space for a few flags so that this can be extended later on - Android
> > seems to be pretty insistent on using explicit syncpoints everywhere
> > instead of implicit synchronization. Or maybe we should have the flag
> > already, but reject implicit syncing until Maarten's dma_fence stuff
> > is in and enabled, dunno. Adding him to cc.
> 
I don't think we need that right now. We may prepare for implicit sync
by having the flag ready in the API, but don't force the exporters to
implement it right now.

> I suppose I wouldn't mind if we made BEGIN/END_MMAP as sort of clones
> of the CPU_PREP/FINI ioctls I have in msm.. where the CPU_PREP takes a
> flag/bitmask to indicate READ/WRITE/NOSYNC
> 
> That doesn't really help with partial buffer access, like you'd get if
> you were using the buffer for vertex upload.  Although not really sure
> if we'd be seeing dmabuf's for that sort of use-case.  We could make a
> more elaborate kernel interface that maps better to all the different
> cases in pipe_transfer.  Or we could just say that that is overkill.
> I don't really have strong opinion about that (so someone who does,
> send an RFC).  But I do agree with the "barriers only, not locks".
> 
I just checked back with our GStreamer guy and he thinks that for the
use-cases he sees today it would make a lot of sense to have some way to
indicate that userspace is only going to access a partial range of the
buffer. I think this may be a crucial optimization on all kinds of
devices where you have to ensure coherency manually.

Personally I would like to have the same kind of partial access
indicators in DRMs cpu_prep/fini to optimize all the userspace
suballocations we have today.

I think we are all on the same page with the barriers only thing.

Regards,
Lucas

-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-11-01 10:17                     ` Daniel Vetter
  2013-11-01 13:22                       ` Rob Clark
@ 2013-11-01 13:54                       ` Thomas Hellstrom
  1 sibling, 0 replies; 21+ messages in thread
From: Thomas Hellstrom @ 2013-11-01 13:54 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: dri-devel@lists.freedesktop.org, linaro-mm-sig@lists.linaro.org

On 11/01/2013 11:17 AM, Daniel Vetter wrote:
> On Fri, Nov 1, 2013 at 11:03 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
>> GStreamer needs _some_ way of accessing the buffer contents with the
>> CPU, this doesn't necessarily have to be the dumb mmap we have right
>> now.
>> DMA-BUF support in GSt is not really finished (I know we have a lot of
>> patches internally to make it actually work with anything, which are
>> trickling upstream right now), so this might be a good time to hammer
>> out how it should be done, so we can adapt GSt before people start to
>> rely on the dma-buf fallback mmap.
>>
>> I would think the bracketed mmap idea that was thrown around by Rob some
>> time ago when the mmap topic first surfaced is simple enough that we
>> don't need additional userspace helpers and should be enough to make the
>> coherency issue disappear.
> Yeah, if we add a BEGIN_MMAP/END_MMAP ioctl or so to the dma-buf mmap
> stuff and simply demand that userspace calls that we'd have something
> half-sane at least. Since we currently don't really have any real
> users we could still do this abi change I think.
>
> One open issue is whether the BEGIN_MMAP ioctl should also synchronize
> with any outstanding gpu access. Imo it should, since that's pretty
> much how all the current drm drivers work, but maybe we should reserve
> space for a few flags so that this can be extended later on - Android
> seems to be pretty insistent on using explicit syncpoints everywhere
> instead of implicit synchronization. Or maybe we should have the flag
> already, but reject implicit syncing until Maarten's dma_fence stuff
> is in and enabled, dunno. Adding him to cc.
>
> The clear thing otoh is that these two ioctls should only serve as
> barriers, not as locks as has been proposed in some other RFCs
> floating around (iirc the one from exynos was borked in this fashion).
> -Daniel
I think one of the worst use-cases we could imagine would be a client 
doing something like the old kde screen saver that drew hundreds of
thousands of small squares and in addition would flush for each square. 
Here begin/mmap end/mmap would not suffice, and _mandatory_ syncing 
would also be a bad idea.

We'd need to have a region supplied. Something like gallium texture 
transfers would work reasonably well, but they allow direct mappings of 
the underlying surface / buffer, which means that a lazy client might 
allocate texture transfers covering the whole buffer if coherent drivers 
made that a reasonably fast operation.

access using something like texsubimage (or region-type pwrites / 
preads) would at least involve a memcpy / kernel copy and would have the 
client think twice before accessing the dma-buf in this way, but I 
understand that this might sound frustrating for writers of coherent 
drivers. (I would have opposed it :))

But I'd like to stress that a single BEGIN_MMAP / END_MMAP wouldn't help 
us much, and that we need a read/ write/bidirectional indication.

Thanks,
/Thomas

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-11-01 13:32                         ` Lucas Stach
@ 2013-11-04  7:53                           ` Thomas Hellstrom
  2013-11-04 10:22                             ` Lucas Stach
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Hellstrom @ 2013-11-04  7:53 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Daniel Vetter, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org

[-- Attachment #1: Type: text/plain, Size: 4744 bytes --]

On 11/01/2013 02:32 PM, Lucas Stach wrote:
> Am Freitag, den 01.11.2013, 09:22 -0400 schrieb Rob Clark:
>> On Fri, Nov 1, 2013 at 6:17 AM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>>> On Fri, Nov 1, 2013 at 11:03 AM, Lucas Stach <l.stach@pengutronix.de> wrote:
>>>> GStreamer needs _some_ way of accessing the buffer contents with the
>>>> CPU, this doesn't necessarily have to be the dumb mmap we have right
>>>> now.
>>>> DMA-BUF support in GSt is not really finished (I know we have a lot of
>>>> patches internally to make it actually work with anything, which are
>>>> trickling upstream right now), so this might be a good time to hammer
>>>> out how it should be done, so we can adapt GSt before people start to
>>>> rely on the dma-buf fallback mmap.
>>>>
>>>> I would think the bracketed mmap idea that was thrown around by Rob some
>>>> time ago when the mmap topic first surfaced is simple enough that we
>>>> don't need additional userspace helpers and should be enough to make the
>>>> coherency issue disappear.
>>> Yeah, if we add a BEGIN_MMAP/END_MMAP ioctl or so to the dma-buf mmap
>>> stuff and simply demand that userspace calls that we'd have something
>>> half-sane at least. Since we currently don't really have any real
>>> users we could still do this abi change I think.
>>>
>>> One open issue is whether the BEGIN_MMAP ioctl should also synchronize
>>> with any outstanding gpu access. Imo it should, since that's pretty
>>> much how all the current drm drivers work, but maybe we should reserve
>>> space for a few flags so that this can be extended later on - Android
>>> seems to be pretty insistent on using explicit syncpoints everywhere
>>> instead of implicit synchronization. Or maybe we should have the flag
>>> already, but reject implicit syncing until Maarten's dma_fence stuff
>>> is in and enabled, dunno. Adding him to cc.
> I don't think we need that right now. We may prepare for implicit sync
> by having the flag ready in the API, but don't force the exporters to
> implement it right now.
>
>> I suppose I wouldn't mind if we made BEGIN/END_MMAP as sort of clones
>> of the CPU_PREP/FINI ioctls I have in msm.. where the CPU_PREP takes a
>> flag/bitmask to indicate READ/WRITE/NOSYNC
>>
>> That doesn't really help with partial buffer access, like you'd get if
>> you were using the buffer for vertex upload.  Although not really sure
>> if we'd be seeing dmabuf's for that sort of use-case.  We could make a
>> more elaborate kernel interface that maps better to all the different
>> cases in pipe_transfer.  Or we could just say that that is overkill.
>> I don't really have strong opinion about that (so someone who does,
>> send an RFC).  But I do agree with the "barriers only, not locks".
>>
> I just checked back with our GStreamer guy and he thinks that for the
> use-cases he sees today it would make a lot of sense to have some way to
> indicate that userspace is only going to access a partial range of the
> buffer. I think this may be a crucial optimization on all kinds of
> devices where you have to ensure coherency manually.
>
> Personally I would like to have the same kind of partial access
> indicators in DRMs cpu_prep/fini to optimize all the userspace
> suballocations we have today.
>
> I think we are all on the same page with the barriers only thing.
>
> Regards,
> Lucas
>

OK, so I've attached something that would work on top of the current 
mmap() interface, and that would basically be NOPs on
coherent architectures. Comment welcome. In order for this to be 
meaningful, we would need to make it mandatory if mmap is
implemented. One outstanding issue Is how this might play with cache 
line alignment. We might be able to require the implementation to handle 
this, making sure that synced regions align both to the device cache and 
the cpu cache, and that, for example, a
sync_for_cpu(DMA_TO_DEVICE) may actually result in device caches in the 
pad area being flushed...

Region pwrites / preads having basically an identical interface (except 
the linear flag) but also taking a user pointer and stride, would also 
work just fine. Linear preads / pwrites could, I guess, be implemented 
the intended way using write / read syscalls on the file-descriptor.

Still, I must say I'd prefer if we could just avoid mmap and the major 
drivers didn't implement it.
Second on my list is probably pwrites / preads, and third this proposed 
interface. My preference order is based on the way vmwgfx handles
this stuff from the Xorg driver today, doing unsynchronized region 
preads / pwrites from a shadow user-space buffer, keeping track on 
dirtied regions on both sides. (If it works for the Xorg driver, it 
should work for anything else :)).

Thanks,
/Thomas


[-- Attachment #2: dma-buf-ioctl.h --]
[-- Type: text/x-chdr, Size: 2400 bytes --]



enum dma_buf_data_direction {
	DMA_BUF_BIDIRECTIONAL,
	DMA_BUF_TO_DEVICE,
	DMA_BUF_FROM_DEVICE
};

/**
 * struct dma_buf_sync_arg - Argument to 
 *
 * @start_x:   Upper left X coordinate of area to be synced.
 * Unit is format-dependent
 * @start_y:   Upper left Y coordinate of area to be synced.
 * Unit is format-dependent.
 * @width:     Width of area to be synced. Grows to the right.
 * Unit is format-dependent.
 * @height:    Height of area to be synced. Grows downwards.
 * Unit is format-dependent.
 * @direction: Intended transfer direction of data.
 * @flags:     Flags to tune the synchronizing behaviour.
 */

struct dma_buf_sync_region_arg {
	__u32 buf_identifier;
	__u32 start_x;
	__u32 start_y;
	__u32 width;
	__u32 height;
	enum dma_buf_data_direction direction;
	__u32 flags;
	__u32 pad;
};
	
/**
 * Force sync any outstanding rendering of the exporter before returning.
 * This is strictly a performance hint. The user may omit this flag to
 * speed up execution of the call, if it is known that previous rendering
 * affecting (by read or write) the synchronized area has already finished. 
 */
#define DMA_BUF_SYNC_FLAG_SYNC  (1 << 0);

/**
 * Treat @start_x as byte offset into buffer and @width as byte
 * synchronization length. The values of @start_y and @height are ignored.
 * (Separate ioctl?)
 */
#define DMA_BUF_SYNC_FLAG_LINEAR (1 << 1);

/**
 * Allow the implementation to coalesce sync_for_device calls, until either
 * a) An explicit flush
 * b) A sync for cpu call with DMA_BUF_BIDIRECTIONAL or DMA_BUF_TO_DEVICE
 *
 * Note: An implementation may choose to ignore this flag.
 */
#define DMA_BUF_SYNC_FLAG_COALESCE (1 << 2); 

/**
 * IOCTLS-
 *
 * Kernel waits should, if possible, be performed interruptible, and the
 * ioctl may sett errno to EINTR if the ioctl needs to be restarted.
 * To be discussed: Any sync operation may not affect areas outside the
 * region indicated. (Good for vmwgfx, but plays ill with cache line alignment)
 */

/**
 * Sync for CPU.
 */
#define DMA_BUF_SYNC_REGION_FOR_CPU             \ 
_IOW(DMA_BUF_BASE, 0, struct dma_buf_sync_region_arg)

/**
 * Sync for device. This is the default state of a dma-buf. 
 */
#define DMA_BUF_SYNC_REGION_FOR_DEVICE				\
_IOW(DMA_BUF_BASE, 1, struct dma_buf_sync_region_arg)

/**
 * Flush any coalesced SYNC_REGION_FOR_DEVICE
 */
#define DMA_BUF_SYNC_REGION_FLUSH		\
_IOW(DMA_BUF_BASE, 2, __u32)

[-- Attachment #3: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-11-04  7:53                           ` Thomas Hellstrom
@ 2013-11-04 10:22                             ` Lucas Stach
  2013-11-04 10:48                               ` Thomas Hellstrom
  0 siblings, 1 reply; 21+ messages in thread
From: Lucas Stach @ 2013-11-04 10:22 UTC (permalink / raw)
  To: Thomas Hellstrom
  Cc: Daniel Vetter, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org

Hi Thomas,

I inlined the patch, to discuss more easily.

Am Montag, den 04.11.2013, 08:53 +0100 schrieb Thomas Hellstrom:
> enum dma_buf_data_direction {
>         DMA_BUF_BIDIRECTIONAL,
>         DMA_BUF_TO_DEVICE,
>         DMA_BUF_FROM_DEVICE
> };

I don't think the DMA API semantic makes much sense here. Doing a sync
from device is at least a bit complicated, as we would have to track
down the device in which domain the buffer is currently residing for
write. This may be doable by bouncing the sync op from the exporter to
the right attachment.

The other way around is even more messier: a DMA-BUF is by definition
shared between multiple devices, so which of them should you sync to?
All of them?

I think the API used for userspace access should only be concerned with
making sure the CPU gets a coherent view to the resource at the point of
sharing i.e. sysmem. All DMA-BUF users should make sure that they sync
their writes to the point of sharing strictly before signaling that they
are done with the buffer.
Maybe this is just a naming problem, but IMHO the API should make it
really clear that the sync only makes sure that the data reaches the
point of sharing.

> 
> /**
>  * struct dma_buf_sync_arg - Argument to 
>  *
>  * @start_x:   Upper left X coordinate of area to be synced.
>  * Unit is format-dependent
>  * @start_y:   Upper left Y coordinate of area to be synced.
>  * Unit is format-dependent.
>  * @width:     Width of area to be synced. Grows to the right.
>  * Unit is format-dependent.
>  * @height:    Height of area to be synced. Grows downwards.
>  * Unit is format-dependent.
While I see why you would need this I don't really like the concept of a
two dimensional resource pushed into the API. You already need the
linear flag to make this work with 1D resources and I don't see what
happens when you encounter 3D or array resources.

Don't take this as a strong rejection as I see why this is useful, but I
don't like how the interface looks like right now.

>  * @direction: Intended transfer direction of data.
>  * @flags:     Flags to tune the synchronizing behaviour.
>  */
> 
> struct dma_buf_sync_region_arg {
>         __u32 buf_identifier;
>         __u32 start_x;
>         __u32 start_y;
>         __u32 width;
>         __u32 height;
>         enum dma_buf_data_direction direction;
>         __u32 flags;
>         __u32 pad;
> };
>         
> /**
>  * Force sync any outstanding rendering of the exporter before
> returning.
>  * This is strictly a performance hint. The user may omit this flag to
>  * speed up execution of the call, if it is known that previous
> rendering
>  * affecting (by read or write) the synchronized area has already
> finished. 
>  */
> #define DMA_BUF_SYNC_FLAG_SYNC  (1 << 0);
> 
> /**
>  * Treat @start_x as byte offset into buffer and @width as byte
>  * synchronization length. The values of @start_y and @height are
> ignored.
>  * (Separate ioctl?)
>  */
> #define DMA_BUF_SYNC_FLAG_LINEAR (1 << 1);
> 
> /**
>  * Allow the implementation to coalesce sync_for_device calls, until
> either
>  * a) An explicit flush
>  * b) A sync for cpu call with DMA_BUF_BIDIRECTIONAL or
> DMA_BUF_TO_DEVICE
>  *
>  * Note: An implementation may choose to ignore this flag.
>  */
> #define DMA_BUF_SYNC_FLAG_COALESCE (1 << 2); 
> 
> /**
>  * IOCTLS-
>  *
>  * Kernel waits should, if possible, be performed interruptible, and
> the
>  * ioctl may sett errno to EINTR if the ioctl needs to be restarted.
>  * To be discussed: Any sync operation may not affect areas outside
> the
>  * region indicated. (Good for vmwgfx, but plays ill with cache line
> alignment)
>  */
> 
> /**
>  * Sync for CPU.
>  */
> #define DMA_BUF_SYNC_REGION_FOR_CPU             \ 
> _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync_region_arg)
> 
> /**
>  * Sync for device. This is the default state of a dma-buf. 
>  */
> #define DMA_BUF_SYNC_REGION_FOR_DEVICE                          \
> _IOW(DMA_BUF_BASE, 1, struct dma_buf_sync_region_arg)
> 
> /**
>  * Flush any coalesced SYNC_REGION_FOR_DEVICE
>  */
> #define DMA_BUF_SYNC_REGION_FLUSH               \
> _IOW(DMA_BUF_BASE, 2, __u32)
> 
What is the use-case for this? Why don't you think that usespace could
be smart enough to coalesce the flush on its own?

Regards,
Lucas
-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: dma-buf non-coherent mmap
  2013-11-04 10:22                             ` Lucas Stach
@ 2013-11-04 10:48                               ` Thomas Hellstrom
  0 siblings, 0 replies; 21+ messages in thread
From: Thomas Hellstrom @ 2013-11-04 10:48 UTC (permalink / raw)
  To: Lucas Stach
  Cc: Daniel Vetter, dri-devel@lists.freedesktop.org,
	linaro-mm-sig@lists.linaro.org

On 11/04/2013 11:22 AM, Lucas Stach wrote:
> Hi Thomas,
>
> I inlined the patch, to discuss more easily.
>
> Am Montag, den 04.11.2013, 08:53 +0100 schrieb Thomas Hellstrom:
>> enum dma_buf_data_direction {
>>          DMA_BUF_BIDIRECTIONAL,
>>          DMA_BUF_TO_DEVICE,
>>          DMA_BUF_FROM_DEVICE
>> };
> I don't think the DMA API semantic makes much sense here. Doing a sync
> from device is at least a bit complicated, as we would have to track
> down the device in which domain the buffer is currently residing for
> write. This may be doable by bouncing the sync op from the exporter to
> the right attachment.
>
> The other way around is even more messier: a DMA-BUF is by definition
> shared between multiple devices, so which of them should you sync to?
> All of them?
>
> I think the API used for userspace access should only be concerned with
> making sure the CPU gets a coherent view to the resource at the point of
> sharing i.e. sysmem. All DMA-BUF users should make sure that they sync
> their writes to the point of sharing strictly before signaling that they
> are done with the buffer.
> Maybe this is just a naming problem, but IMHO the API should make it
> really clear that the sync only makes sure that the data reaches the
> point of sharing.

Good point. I think the options need to remain, but we could rename
DEVICE to STORAGE, or whatever In our case,

sync_for_cpu(DMA_BUF_TO_STORAGE) would be a no-op, whereas,
sync_for_cpu(DMA_BUF_BIDIRECTIONAL) would trigger a DMA read.

>
>> /**
>>   * struct dma_buf_sync_arg - Argument to
>>   *
>>   * @start_x:   Upper left X coordinate of area to be synced.
>>   * Unit is format-dependent
>>   * @start_y:   Upper left Y coordinate of area to be synced.
>>   * Unit is format-dependent.
>>   * @width:     Width of area to be synced. Grows to the right.
>>   * Unit is format-dependent.
>>   * @height:    Height of area to be synced. Grows downwards.
>>   * Unit is format-dependent.
> While I see why you would need this I don't really like the concept of a
> two dimensional resource pushed into the API. You already need the
> linear flag to make this work with 1D resources and I don't see what
> happens when you encounter 3D or array resources.
>
> Don't take this as a strong rejection as I see why this is useful, but I
> don't like how the interface looks like right now.

I understand. I don't think 1D only syncing options would suffice 
performance-wise, as
the use-case I'm most afraid of is people trying to share 2D contents 
between user-space
processes.

What do you think about supplying descriptor objects that could either 
describe a
linear area, 2D area or whatever? Whenever (if) a new descriptor type is 
added to the interface we could
have a legacy implementation that implements it in terms of 1D and 2D syncs.

>
>>   * @direction: Intended transfer direction of data.
>>   * @flags:     Flags to tune the synchronizing behaviour.
>>   */
>>
>> struct dma_buf_sync_region_arg {
>>          __u32 buf_identifier;
>>          __u32 start_x;
>>          __u32 start_y;
>>          __u32 width;
>>          __u32 height;
>>          enum dma_buf_data_direction direction;
>>          __u32 flags;
>>          __u32 pad;
>> };
>>          
>> /**
>>   * Force sync any outstanding rendering of the exporter before
>> returning.
>>   * This is strictly a performance hint. The user may omit this flag to
>>   * speed up execution of the call, if it is known that previous
>> rendering
>>   * affecting (by read or write) the synchronized area has already
>> finished.
>>   */
>> #define DMA_BUF_SYNC_FLAG_SYNC  (1 << 0);
>>
>> /**
>>   * Treat @start_x as byte offset into buffer and @width as byte
>>   * synchronization length. The values of @start_y and @height are
>> ignored.
>>   * (Separate ioctl?)
>>   */
>> #define DMA_BUF_SYNC_FLAG_LINEAR (1 << 1);
>>
>> /**
>>   * Allow the implementation to coalesce sync_for_device calls, until
>> either
>>   * a) An explicit flush
>>   * b) A sync for cpu call with DMA_BUF_BIDIRECTIONAL or
>> DMA_BUF_TO_DEVICE
>>   *
>>   * Note: An implementation may choose to ignore this flag.
>>   */
>> #define DMA_BUF_SYNC_FLAG_COALESCE (1 << 2);
>>
>> /**
>>   * IOCTLS-
>>   *
>>   * Kernel waits should, if possible, be performed interruptible, and
>> the
>>   * ioctl may sett errno to EINTR if the ioctl needs to be restarted.
>>   * To be discussed: Any sync operation may not affect areas outside
>> the
>>   * region indicated. (Good for vmwgfx, but plays ill with cache line
>> alignment)
>>   */
>>
>> /**
>>   * Sync for CPU.
>>   */
>> #define DMA_BUF_SYNC_REGION_FOR_CPU             \
>> _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync_region_arg)
>>
>> /**
>>   * Sync for device. This is the default state of a dma-buf.
>>   */
>> #define DMA_BUF_SYNC_REGION_FOR_DEVICE                          \
>> _IOW(DMA_BUF_BASE, 1, struct dma_buf_sync_region_arg)
>>
>> /**
>>   * Flush any coalesced SYNC_REGION_FOR_DEVICE
>>   */
>> #define DMA_BUF_SYNC_REGION_FLUSH               \
>> _IOW(DMA_BUF_BASE, 2, __u32)
>>
> What is the use-case for this? Why don't you think that usespace could
> be smart enough to coalesce the flush on its own?

Right. A thinko. We could clearly leave that out.

/Thomas

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2013-11-04 10:48 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-31 17:00 dma-buf non-coherent mmap Thomas Hellstrom
2013-10-31 17:52 ` Rob Clark
2013-10-31 20:40   ` Thomas Hellstrom
2013-10-31 20:48     ` Dave Airlie
2013-10-31 21:07       ` Thomas Hellstrom
2013-10-31 23:00         ` Daniel Vetter
2013-11-01  0:17           ` Jakob Bornecrantz
2013-11-01  0:25             ` Rob Clark
2013-11-01  0:37               ` Jakob Bornecrantz
2013-11-01  0:57                 ` Rob Clark
2013-11-01 10:03                   ` Lucas Stach
2013-11-01 10:17                     ` Daniel Vetter
2013-11-01 13:22                       ` Rob Clark
2013-11-01 13:32                         ` Lucas Stach
2013-11-04  7:53                           ` Thomas Hellstrom
2013-11-04 10:22                             ` Lucas Stach
2013-11-04 10:48                               ` Thomas Hellstrom
2013-11-01 13:54                       ` Thomas Hellstrom
2013-10-31 21:10     ` Rob Clark
2013-10-31 21:38       ` Thomas Hellstrom
2013-10-31 22:43         ` [Linaro-mm-sig] " Benjamin Gaignard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).