From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jerome Glisse <j.glisse@gmail.com>
Subject: Re: Fence, timeline and android sync points
Date: Thu, 14 Aug 2014 10:23:30 -0400
Message-ID: <20140814142329.GC2000@gmail.com>
References: <20140812221340.GB5746@gmail.com>
 <20140813082822.GO10500@phenom.ffwll.local>
 <20140813133602.GA2666@gmail.com>
 <20140813155420.GG10500@phenom.ffwll.local>
 <20140813170719.GD2666@gmail.com>
 <20140814090834.GK10500@phenom.ffwll.local>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Return-path: <dri-devel-bounces@lists.freedesktop.org>
Received: from mail-qg0-f51.google.com (mail-qg0-f51.google.com
 [209.85.192.51])
 by gabe.freedesktop.org (Postfix) with ESMTP id 59B526E6C1
 for <dri-devel@lists.freedesktop.org>; Thu, 14 Aug 2014 07:23:34 -0700 (PDT)
Received: by mail-qg0-f51.google.com with SMTP id a108so1085087qge.10
 for <dri-devel@lists.freedesktop.org>; Thu, 14 Aug 2014 07:23:33 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <20140814090834.GK10500@phenom.ffwll.local>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: daniel.vetter@ffwll.ch, dri-devel@lists.freedesktop.org, bskeggs@redhat.com
List-Id: dri-devel@lists.freedesktop.org

On Thu, Aug 14, 2014 at 11:08:34AM +0200, Daniel Vetter wrote:
> On Wed, Aug 13, 2014 at 01:07:20PM -0400, Jerome Glisse wrote:
> > Let me make this crystal clear this must be a valid kernel page that have a
> > valid kernel mapping for the lifetime of the device. Hence there is no access
> > to mmio space or anything, just a regular kernel page. If can not rely on that
> > this is a sad world.
> >
> > That being said, yes i am aware that some device incapacity to write to such
> > a page. For those dumb hardware what you need to do is have the irq handler
> > write to this page on behalf of the hardware. But i would like to know any
> > hardware than can not write a single dword from its ring buffer.
> >
> > The only tricky part in this, is when device is unloaded and driver is removing
> > itself, it obviously need to synchronize itself with anyone possibly waiting on
> > it and possibly reading. But truly this is not that hard to solve.
> >
> > So tell me once the above is clear what kind of scary thing can happen when cpu
> > or a device _read_ a kernel page ?
>
> It's not reading it, it's making sense of what you read. In i915 we had
> exactly the (timeline, seqno) value pair design for fences for a long
> time, and we're switching away from it since it stops working when you
> have preemption and scheduler. Or at least it gets really interesting to
> interpret the data written into the page.
>
> So I don't want to expose that to other drivers if we decided that
> exposing this internally is a stupid idea.

I am well aware of that, but context scheduling is exactly what the timeline
I am talking about represents. A user timeline should be considered like a
single cpu thread onto which operations involving different hw are scheduled.
The hw can switch from one timeline to another, and a seqno is only meaningful
per timeline.

Preemption and scheduling are bound to happen on the gpu, and we will want to
integrate with the core scheduler to manage the time slices allocated to each
process. But for that we need the concept of a thread in which operations on
the same hw block are processed in a linear fashion, while still allowing
concurrency with other hw blocks.
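
As a rough illustration of the timeline idea, here is a minimal userspace
sketch in C. It assumes each timeline owns one dword (standing in for the
regular kernel page) that the hw, or the irq handler on its behalf, updates
with the last completed seqno. All names (struct timeline, timeline_schedule,
fence_signaled) are hypothetical and not an existing kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; not an existing kernel API. */
struct timeline {
    /* Stands in for the dword in a regular kernel page that the hw
     * (or the irq handler on its behalf) writes after each operation. */
    volatile uint32_t *signaled_seqno;
    /* Last seqno handed out on this timeline. */
    uint32_t next_seqno;
};

/* A fence is nothing more than a (timeline, seqno) pair. */
struct fence {
    const struct timeline *tl;
    uint32_t seqno;
};

/* Queue an operation on a timeline: it gets the next seqno in line. */
static struct fence timeline_schedule(struct timeline *tl)
{
    assert(tl != NULL);
    struct fence f = { .tl = tl, .seqno = ++tl->next_seqno };
    return f;
}

/* True once the fence's own timeline has reached the fence's seqno.
 * The signed difference keeps the comparison valid across wraparound. */
static bool fence_signaled(const struct fence *f)
{
    assert(f != NULL && f->tl != NULL);
    return (int32_t)(*f->tl->signaled_seqno - f->seqno) >= 0;
}
```

Checking a fence is a single read of the timeline's dword, with no callback,
and a seqno is only ever compared against the counter of its own timeline.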

>
> > >
> > > > > So from that pov (presuming I didn't miss anything) your proposal is
> > > > > identical to what we have, minor some different color choices (like where
> > > > > to place the callback queue).
> > > >
> > > > No callback is the mantra here, and instead of bolting free living fence
> > > > to buffer object, they are associated with timeline which means you do not
> > > > need to go over all buffer object to know what you need to wait for.
> > >
> > > Ok, then I guess I didn't understand that part of your proposal. Can
> > > you please elaborate a bit more how you want to synchronize multiple
> > > drivers accessing a dma-buf object and what piece of state we need to
> > > associate to the dma-buf to make this happen?
> >
> > Beauty of it you associate zilch to the buffer. So for existing cs ioctl where
> > the implicit synchronization is the rule it enforce mandatory synchronization
> > across all hw timeline on which a buffer shows up :
> >   for_each_buffer_in_cmdbuffer(buffer, cmdbuf) {
> >     if (!cmdbuf_write_to_buffer(buffer, cmdbuf))
> >       continue;
> >     for_each_process_sharing_buffer(buffer, process) {
> >       schedule_fence(process->implicit_timeline, cmdbuf->fence)
> >     }
> >   }
> >
> > Next time another process use current ioctl with implicit sync it will synch with
> > the last fence for any shared resource. This might sounds bad but truly as it is
> > right now this is already how it happens (at least for radeon).
>
> Well i915 is a lot better than that. And I'm not going to implement some
> special-case for dma-buf shared buffers just because radeon sucks and
> wants to enforce that suckage on everyone else.

I guess I am having a hard time expressing myself. What I am saying here is
that implicit synchronization sucks because it has to assume the worst case.
This is what the current code does, and I am sure intel is doing something
similar with today's code.
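
To make the worst-case behaviour concrete, the implicit sync loop quoted above
could be sketched in plain C roughly like this. It is a userspace model under
stated assumptions: a single hw timeline so that seqnos are directly
comparable, fixed-size arrays instead of real lists, and hypothetical names
(struct process, struct buffer, implicit_sync_submit) that do not correspond
to any existing driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SHARERS 4
#define MAX_ENTRIES 4

/* Hypothetical model: each process has one implicit timeline, reduced
 * here to the latest fence seqno it must wait on before running. */
struct process {
    uint32_t implicit_wait_seqno;
};

struct buffer {
    struct process *sharers[MAX_SHARERS];
    size_t nr_sharers;
};

struct cmdbuf_entry {
    struct buffer *buf;
    bool write;            /* does the cmdbuf write to this buffer? */
};

struct cmdbuf {
    struct cmdbuf_entry entries[MAX_ENTRIES];
    size_t nr_entries;
    uint32_t fence_seqno;  /* fence signaled when the cmdbuf completes */
};

/* Make the process wait on seqno if it is newer than what it already
 * waits on (wraparound-safe comparison). */
static void schedule_fence(struct process *p, uint32_t seqno)
{
    assert(p != NULL);
    if ((int32_t)(seqno - p->implicit_wait_seqno) > 0)
        p->implicit_wait_seqno = seqno;
}

/* Worst case: every process sharing a written buffer must wait on the
 * cmdbuf fence, whether or not it really depends on that write. */
static void implicit_sync_submit(const struct cmdbuf *cb)
{
    assert(cb != NULL);
    for (size_t i = 0; i < cb->nr_entries; i++) {
        const struct cmdbuf_entry *e = &cb->entries[i];

        if (!e->write)
            continue;
        for (size_t j = 0; j < e->buf->nr_sharers; j++)
            schedule_fence(e->buf->sharers[j], cb->fence_seqno);
    }
}
```

Note that nothing is attached to the buffer itself: the fence lands on the
implicit timeline of every process sharing it, and read-only buffers are
skipped.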

Explicit synchronization allows more flexibility, but the fence code as it is
designed does not allow us to go fully down that line, because it associates
fences with buffer objects, which is the biggest shortcoming of implicit sync.

>
> So let's cut this short: If you absolutely insist I guess we could ditch
> the callback stuff from fences, but I really don't see the problem with
> radeon just not using that and then being happy. We can easily implement a
> bit of insulation code _just_ for radeon so that the only thing radeon
> does is wake up a process (which then calls the callback if it's something
> special).

Like I said, feel free to ignore me. I just genuinely want to have the best
solution inside the linux kernel, and I do think that fence plus callback plus
buffer association is not that solution. I tried to explain why, but I might
be failing or missing something.

> Otoh I don't care about what ttm and radeon do, for i915 the important
> stuff is integration with android syncpts and being able to do explicit
> fencing for e.g. svm stuff. We can do that with what's merged in 3.17 and
> I expect that those patches will land in 3.18, at least the internal
> integration.
>
> It would be cool if we could get tear-free optimus working on desktop
> linux, but that flat out doesn't pay my bills here. So I think I'll let
> you guys figure this out yourself.

Sad to learn that Intel no longer has any interest in the linux desktop.

Cheers,
Jérôme