From mboxrd@z Thu Jan 1 00:00:00 1970
From: Serguei Sagalovitch
Subject: Re: [RFC] drm/radeon: userfence IOCTL
Date: Mon, 13 Apr 2015 11:37:42 -0400
Message-ID: <552BE2C6.4080700@amd.com>
References: <1428936737-19103-1-git-send-email-deathsimple@vodafone.de> <552BDFEA.6070806@amd.com> <552BE228.3020207@vodafone.de>
In-Reply-To: <552BE228.3020207@vodafone.de>
To: Christian König, dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

> Another alternative would be to use the userspace mapping to check
> the BO value

This is what I was thinking.

Sincerely yours,
Serguei Sagalovitch

On 15-04-13 11:35 AM, Christian König wrote:
> On 13.04.2015 17:25, Serguei Sagalovitch wrote:
>> > the BO to be kept in the same place while it is mapped inside the
>> > kernel page table
>> ...
>> > So this requires that we pin down the BO for the duration of the
>> > wait IOCTL.
>>
>> But my understanding is that it should be not duration of "wait"
>> IOCTL but "duration" of command buffer execution.
>>
>> BTW: I would assume that this is not the new scenario.
>>
>> This is scenario:
>>     - User allocate BO
>>     - User get CPU address for BO
>>     - User submit command buffer to write to BO
>>     - User could "poll" / "read" or "write" BO data by CPU
>>
>> So when TTM needs to move BO to another location it should also
>> update CPU "mapping" correctly so user will always read / write the
>> correct data.
>>
>> Did I miss anything?
>
> The problem is that kernel mappings are not updated when TTM moves the
> buffer around. In the case of a swapped out buffer that wouldn't even
> be possible cause kernel mappings aren't pageable.
>
> You just can't map the BO into kernel space unless you have it pinned
> down, so you can't check the current value written in the BO in your
> IOCTL.
>
> One alternative is to send all interrupts in question unfiltered to
> user space and let userspace do the check if the right value was
> written or not. But I assume that this would be rather bad for
> performance.
>
> Another alternative would be to use the userspace mapping to check the
> BO value, but this approach isn't compatible with a GPU scheduler.
> E.g. you can't really do cross process space memory access in device
> drivers.
>
> Regards,
> Christian.
>
>>
>>
>> Sincerely yours,
>> Serguei Sagalovitch
>>
>> On 15-04-13 10:52 AM, Christian König wrote:
>>> Hello everyone,
>>>
>>> we have a requirement for a bit different kind of fence handling.
>>> Currently we handle fences completely inside the kernel, but in the
>>> future we would like to emit multiple fences inside the same IB as
>>> well.
>>>
>>> This works by adding multiple fence commands into an IB which just
>>> write their value to a specific location inside a BO and trigger
>>> the appropriate hardware interrupt.
>>>
>>> The user part of the driver stack should then be able to call an
>>> IOCTL to wait for the interrupt and block for the value (or
>>> something larger) to be written to the specific location.
>>>
>>> This has the advantage that you can have multiple synchronization
>>> points in the same IB and don't need to split up your draw commands
>>> over several IBs so that the kernel can insert kernel fences in
>>> between.
>>>
>>> The following set of patches tries to implement exactly this IOCTL.
>>> The big problem with that IOCTL is that TTM needs the BO to be kept
>>> in the same place while it is mapped inside the kernel page table.
>>> So this requires that we pin down the BO for the duration of the
>>> wait IOCTL.
>>>
>>> This practically gives userspace a way of pinning down BOs for as
>>> long as it wants, without the ability for the kernel for
>>> intervention.
>>>
>>> Any ideas how to avoid those problems? Or better ideas how to
>>> handle the new requirements?
>>>
>>> Please note that the patches are only hacked together quick&dirty
>>> to demonstrate the problem and not very well tested.
>>>
>>> Best regards,
>>> Christian.
>>
>
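A minimal sketch of the wait condition discussed in this thread (block until "the value (or something larger)" has been written to a specific location in a BO), written as plain userspace C that checks the value through a CPU mapping. This is not the radeon IOCTL and is not taken from the posted patches. The names `fence_slot_t`, `fence_signaled` and `poll_user_fence`, the 64-bit fence width, and the bounded busy-poll are all assumptions made for illustration; an ordinary atomic variable stands in for the mapped BO location.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one fence location inside a CPU-mapped BO.
 * The 64-bit width is an assumption, not something the thread specifies. */
typedef _Atomic uint64_t fence_slot_t;

/* True once the stored value has reached wait_value "or something larger". */
bool fence_signaled(fence_slot_t *slot, uint64_t wait_value)
{
    assert(slot != NULL);
    return atomic_load_explicit(slot, memory_order_acquire) >= wait_value;
}

/* Bounded busy-poll over the mapped location. A real wait would sleep
 * until the hardware interrupt arrives and then recheck the value; the
 * loop here only illustrates the check itself.
 * Returns true if the fence was observed as signaled within max_polls. */
bool poll_user_fence(fence_slot_t *slot, uint64_t wait_value,
                     unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (fence_signaled(slot, wait_value))
            return true;
    }
    return false;
}
```

Because the comparison is "greater or equal", a waiter that looks only after a later fence in the same IB has been written still sees its own fence as signaled. The sketch ignores counter wraparound and says nothing about the pinning and scheduler concerns raised above.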