From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?B?Q2hyaXN0aWFuIEvDtm5pZw==?= Subject: Re: [PATCH] drm/radeon: fix TOPDOWN handling for bo_create Date: Tue, 17 Mar 2015 16:43:09 +0100 Message-ID: <55084B8D.1030306@vodafone.de> References: <1426088652-32727-1-git-send-email-alexander.deucher@amd.com> <550087B6.3090307@vodafone.de> <55015640.4020008@daenzer.net> <55015B27.2030208@vodafone.de> <550251B2.5020000@daenzer.net> <5507A41F.20003@daenzer.net> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============1986585015==" Return-path: Received: from pegasos-out.vodafone.de (pegasos-out.vodafone.de [80.84.1.38]) by gabe.freedesktop.org (Postfix) with ESMTP id 78B296E6FF for ; Tue, 17 Mar 2015 08:43:18 -0700 (PDT) Received: from localhost (localhost.localdomain [127.0.0.1]) by pegasos-out.vodafone.de (Rohrpostix1 Daemon) with ESMTP id C824B26087A for ; Tue, 17 Mar 2015 16:43:16 +0100 (CET) Received: from pegasos-out.vodafone.de ([127.0.0.1]) by localhost (rohrpostix1.prod.vfnet.de [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id k2HBbQfBmQxP for ; Tue, 17 Mar 2015 16:43:13 +0100 (CET) In-Reply-To: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" To: Alex Deucher , =?UTF-8?B?TWljaGVsIETDpG56ZXI=?= Cc: Maling list - DRI developers List-Id: dri-devel@lists.freedesktop.org This is a multi-part message in MIME format. --===============1986585015== Content-Type: multipart/alternative; boundary="------------070801020105050304050800" This is a multi-part message in MIME format. --------------070801020105050304050800 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable On 17.03.2015 16:19, Alex Deucher wrote: > On Mon, Mar 16, 2015 at 11:48 PM, Michel D=C3=A4nzer wrote: >> On 17.03.2015 07:32, Alex Deucher wrote: >>> On Thu, Mar 12, 2015 at 10:55 PM, Michel D=C3=A4nzer wrote: >>>> On 12.03.2015 22:09, Alex Deucher wrote: >>>>> On Thu, Mar 12, 2015 at 5:23 AM, Christian K=C3=B6nig >>>>> wrote: >>>>>> On 12.03.2015 10:02, Michel D=C3=A4nzer wrote: >>>>>>> On 12.03.2015 06:14, Alex Deucher wrote: >>>>>>>> On Wed, Mar 11, 2015 at 4:51 PM, Alex Deucher >>>>>>>> wrote: >>>>>>>>> On Wed, Mar 11, 2015 at 2:21 PM, Christian K=C3=B6nig >>>>>>>>> wrote: >>>>>>>>>> On 11.03.2015 16:44, Alex Deucher wrote: >>>>>>>>>>> radeon_bo_create() calls radeon_ttm_placement_from_domain() >>>>>>>>>>> before ttm_bo_init() is called. radeon_ttm_placement_from_do= main() >>>>>>>>>>> uses the ttm bo size to determine when to select top down >>>>>>>>>>> allocation but since the ttm bo is not initialized yet the >>>>>>>>>>> check is always false. >>>>>>>>>>> >>>>>>>>>>> Noticed-by: Oded Gabbay >>>>>>>>>>> Signed-off-by: Alex Deucher >>>>>>>>>>> Cc: stable@vger.kernel.org >>>>>>>>>> >>>>>>>>>> And I was already wondering why the heck the BOs always made t= his >>>>>>>>>> ping/pong >>>>>>>>>> in memory after creation. >>>>>>>>>> >>>>>>>>>> Patch is Reviewed-by: Christian K=C3=B6nig >>>>>>>>> And fixing that promptly broke VCE due to vram location require= ments. >>>>>>>>> Updated patch attached. Thoughts? >>>>>>>> And one more take to make things a bit more explicit for static = kernel >>>>>>>> driver allocations. >>>>>>> struct ttm_place::lpfn is honoured even with TTM_PL_FLAG_TOPDOWN,= so >>>>>>> latter should work with RADEON_GEM_CPU_ACCESS. It sounds like the >>>>>>> problem is really that some BOs are expected to be within a certa= in >>>>>>> range from the beginning of VRAM, but lpfn isn't set accordingly.= It >>>>>>> would be better to fix that by setting lpfn directly than indirec= tly via >>>>>>> RADEON_GEM_CPU_ACCESS. >>>>>> >>>>>> Yeah, agree. We should probably try to find the root cause of this= instead. >>>>>> >>>>>> As far as I know VCE has no documented limitation on where buffers= are >>>>>> placed (unlike UVD). So this is a bit strange. Are you sure that i= t isn't >>>>>> UVD which breaks here? >>>>> It's definitely VCE, I don't know why UVD didn't have a problem. I >>>>> considered using pin_restricted to make sure it got pinned in the C= PU >>>>> visible region, but that had two problems: 1. it would end up getti= ng >>>>> migrated when pinned, >>>> Maybe something like radeon_uvd_force_into_uvd_segment() is needed f= or >>>> VCE as well? >>>> >>>> >>>>> 2. it would end up at the top of the restricted >>>>> region since the top down flag is set which would end up fragmentin= g >>>>> vram. >>>> If that's an issue (which outweighs the supposed benefit of >>>> TTM_PL_FLAG_TOPDOWN), then again the proper solution would be not to= set >>>> TTM_PL_FLAG_TOPDOWN when rbo->placements[i].lpfn !=3D 0 and smaller = than >>>> the whole available region, instead of checking for VRAM and >>>> RADEON_GEM_CPU_ACCESS. >>>> >>> How about something like the attached patch? I'm not really sure >>> about the restrictions for the UVD and VCE fw and stack/heap buffers, >>> but this seems to work. It seems like the current UVD/VCE code works >>> by accident since the check for TOPDOWN fails. >> This patch is getting a bit messy, mixing several logically separate >> changes. Can you split it up accordingly? E.g. one patch just adding t= he >> new fpfn and lpfn function parameters but passing 0 for them (so no >> functional change), then one or several patches with the corresponding >> functional changes, and finally one patch adding the new size paramete= r >> (and thus making TTM_PL_FLAG_TOPDOWN actually used for newly allocated >> BOs). I think that would help for reviewing and generally understandin= g >> the changes. >> >> >>> @@ -105,14 +106,17 @@ void radeon_ttm_placement_from_domain(struct ra= deon_bo *rbo, u32 domain) >>> */ >>> if ((rbo->flags & RADEON_GEM_NO_CPU_ACCESS) && >>> rbo->rdev->mc.visible_vram_size < rbo->rdev->mc.re= al_vram_size) { >>> - rbo->placements[c].fpfn =3D >>> - rbo->rdev->mc.visible_vram_size >> PAGE= _SHIFT; >>> + if (fpfn > (rbo->rdev->mc.visible_vram_size >> = PAGE_SHIFT)) >>> + rbo->placements[c].fpfn =3D fpfn; >>> + else >>> + rbo->placements[c].fpfn =3D >>> + rbo->rdev->mc.visible_vram_size= >> PAGE_SHIFT; >>> rbo->placements[c++].flags =3D TTM_PL_FLAG_WC = | >>> TTM_PL_FLAG_UNCAC= HED | >>> TTM_PL_FLAG_VRAM; >>> } >> If (fpfn >=3D rbo->rdev->mc.visible_vram_size), this whole block can b= e >> skipped, since the next placement will be identical. >> >> OTOH, fpfn is currently always 0 anyway, so maybe it's better not to a= dd >> that parameter in the first place. >> >> >> Other than that, looks good to me. > Broken out patches attached. Also available here: > http://cgit.freedesktop.org/~agd5f/linux/log/?h=3Dtopdown-fixes Thinking more about it that approach is a NAK. For limiting a BO into=20 visible VRAM we want the limit it to only apply to the VRAM domain=20 entry, doing it this way it applies to GTT as well which is really bad=20 for handling page faults. I would rather say let us completely nuke=20 radeon_ttm_placement_from_domain for internal allocations and give=20 radeon_bo_create a ttm_placement pointer to use. Driver internal allocations would then have a couple of predefined=20 placements for their buffers. We might need to make a few ttm_placement=20 pointers const for this, but I think that this is the better approach. Regards, Christian. > > Alex > > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel --------------070801020105050304050800 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: quoted-printable
On 17.03.2015 16:19, Alex Deucher wrote:
On Mon, Mar 16, 2015 at 11:48 PM, Michel D=C3=A4nzer=
 &l=
t;michel@daenzer.net> wrote:
On 17.03.2015 07:32, Alex Deucher wrote:
On Thu, Mar 12, 2015 at 10:55 PM, Michel D=C3=A4=
nzer <michel@daenzer.net> wrote:
On 12.03.2015 22:09, Alex Deucher wrote:
On Thu, Mar 12, 2015 at 5:23 AM, Christian K=
=C3=B6nig
<deathsimple@vodafone.de> wrote:
On 12.03.2015 10:02, Michel D=C3=A4nzer wr=
ote:
On 12.03.2015 06:14, Alex Deucher wrote:
On Wed, Mar 11, 2015 at 4:51 PM, Alex Deucher <alexdeucher@gmail.com&g=
t;
wrote:
On Wed, Mar 11, 2015 at 2:21 PM, Christian K=C3=B6nig
<deathsimple@vodafone.de> wrote:
On 11.03.2015 16:44, Alex Deucher wrote:
radeon_bo_create() calls radeon_ttm_placement_from_domain()
before ttm_bo_init() is called.  radeon_ttm_placement_from_domain()
uses the ttm bo size to determine when to select top down
allocation but since the ttm bo is not initialized yet the
check is always false.

Noticed-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

And I was already wondering why the heck the BOs always made this
ping/pong
in memory after creation.

Patch is Reviewed-by: Christian K=C3=B6nig <christian.koenig@amd.co=
m>
And fixing that promptly broke VCE due to vram location requirements.
Updated patch attached.  Thoughts?
And one more take to make things a bit more explicit for static kernel
driver allocations.
struct ttm_place::lpfn is honoured even with TTM_PL_FLAG_TOPDOWN, so
latter should work with RADEON_GEM_CPU_ACCESS. It sounds like the
problem is really that some BOs are expected to be within a certain
range from the beginning of VRAM, but lpfn isn't set accordingly. It
would be better to fix that by setting lpfn directly than indirectly via
RADEON_GEM_CPU_ACCESS.

Yeah, agree. We should probably try to find the root cause of this instea=
d.

As far as I know VCE has no documented limitation on where buffers are
placed (unlike UVD). So this is a bit strange. Are you sure that it isn't
UVD which breaks here?
It's definitely VCE, I don't know why UVD didn't have a problem.  I
considered using pin_restricted to make sure it got pinned in the CPU
visible region, but that had two problems: 1. it would end up getting
migrated when pinned,
Maybe something like radeon_uvd_force_into_uvd_segment() is needed for
VCE as well?


2. it would end up at the top of the restric=
ted
region since the top down flag is set which would end up fragmenting
vram.
If that's an issue (which outweighs the supposed benefit of
TTM_PL_FLAG_TOPDOWN), then again the proper solution would be not to set
TTM_PL_FLAG_TOPDOWN when rbo->placements[i].lpfn !=3D 0 and smaller th=
an
the whole available region, instead of checking for VRAM and
RADEON_GEM_CPU_ACCESS.

How about something like the attached patch?  I'm not really sure
about the restrictions for the UVD and VCE fw and stack/heap buffers,
but this seems to work.  It seems like the current UVD/VCE code works
by accident since the check for TOPDOWN fails.
This patch is getting a bit messy, mixing several logically separate
changes. Can you split it up accordingly? E.g. one patch just adding the
new fpfn and lpfn function parameters but passing 0 for them (so no
functional change), then one or several patches with the corresponding
functional changes, and finally one patch adding the new size parameter
(and thus making TTM_PL_FLAG_TOPDOWN actually used for newly allocated
BOs). I think that would help for reviewing and generally understanding
the changes.


@@ -105,14 +106,17 @@ void radeon_ttm_placement_=
from_domain(struct radeon_bo *rbo, u32 domain)
               */
              if ((rbo->flags & RADEON_GEM_NO_CPU_ACCESS) &&am=
p;
                  rbo->rdev->mc.visible_vram_size < rbo->rdev=
->mc.real_vram_size) {
-                     rbo->placements[c].fpfn =3D
-                             rbo->rdev->mc.visible_vram_size >&=
gt; PAGE_SHIFT;
+                     if (fpfn > (rbo->rdev->mc.visible_vram_siz=
e >> PAGE_SHIFT))
+                             rbo->placements[c].fpfn =3D fpfn;
+                     else
+                             rbo->placements[c].fpfn =3D
+                                     rbo->rdev->mc.visible_vram_si=
ze >> PAGE_SHIFT;
                      rbo->placements[c++].flags =3D TTM_PL_FLAG_WC |
                                                   TTM_PL_FLAG_UNCACHED |
                                                   TTM_PL_FLAG_VRAM;
              }
If (fpfn >=3D rbo->rdev->mc.visible_vram_size), this whole block=
 can be
skipped, since the next placement will be identical.

OTOH, fpfn is currently always 0 anyway, so maybe it's better not to add
that parameter in the first place.


Other than that, looks good to me.
Broken out patches attached.  Also available here:
http://cgit.freedesktop.org/~agd5f/lin=
ux/log/?h=3Dtopdown-fixes
Thinking more about it that approach is a NAK. For limiting a BO into visible VRAM we want the limit it to only apply to the VRAM domain entry, doing it this way it applies to GTT as well which is really bad for handling page faults.

I would rather say let us completely nuke radeon_ttm_placement_from_domain for internal allocations and give radeon_bo_create a ttm_placement pointer to use.

Driver internal allocations would then have a couple of predefined placements for their buffers. We might need to make a few ttm_placement pointers const for this, but I think that this is the better approach.

Regards,
Christian.



Alex


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/=
dri-devel

--------------070801020105050304050800-- --===============1986585015== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: base64 Content-Disposition: inline X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHA6Ly9saXN0 cy5mcmVlZGVza3RvcC5vcmcvbWFpbG1hbi9saXN0aW5mby9kcmktZGV2ZWwK --===============1986585015==--