From mboxrd@z Thu Jan  1 00:00:00 1970
From: Mark Zhang
Subject: Re: [PATCHv4 0/8] Support for Tegra 2D hardware
Date: Thu, 03 Jan 2013 11:31:59 +0800
Message-ID: <50E4FBAF.30700@gmail.com>
References: <1356089964-5265-1-git-send-email-tbergstrom@nvidia.com> <50DD6311.9000002@gmail.com> <50E40106.4020406@nvidia.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path:
In-Reply-To: <50E40106.4020406-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
Sender: linux-tegra-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Terje Bergström
Cc: "thierry.reding-RM9K5IK7kjKj5M59NBduVrNAH6kLmebB@public.gmane.org" , "airlied-cv59FeDIM0c@public.gmane.org" , "dev-8ppwABl0HbeELgA04lAiVw@public.gmane.org" , "dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org" , "linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , "linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
List-Id: dri-devel@lists.freedesktop.org

On 01/02/2013 05:42 PM, Terje Bergström wrote:
> On 28.12.2012 11:14, Mark Zhang wrote:
>> diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
>> index a936902..c3ded60 100644
>> --- a/drivers/gpu/drm/tegra/gr2d.c
>> +++ b/drivers/gpu/drm/tegra/gr2d.c
>> @@ -131,6 +131,14 @@ static int gr2d_submit(struct tegra_drm_context
>> *context,
>>  		if (err)
>>  			goto fail;
>>
>> +		/* We define CMA as a temporary solution in the host1x driver.
>> +		   That's also why there is a CMA kernel config in it.
>> +		   But here in tegradrm we seem to hardcode CMA as well.
>> +		   So what happens when host1x switches to IOMMU?
>> +		   Do we also need a CMA config in the tegradrm driver,
>> +		   and then, after host1x switches to IOMMU, add another IOMMU
>> +		   config in tegradrm? Or should we invent a more
>> +		   generic wrapper layer here?
>> +		 */
>>  		cmdbuf.mem = handle_cma_to_host1x(drm, file_priv,
>> cmdbuf.mem);
>
> Lucas is working on the host1x allocator, so let's defer this comment
> until we have Lucas' code.
>

OK.

>> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
>> index cc8ca2f..0867b72 100644
>> --- a/drivers/gpu/host1x/job.c
>> +++ b/drivers/gpu/host1x/job.c
>> @@ -82,6 +82,26 @@ struct host1x_job *host1x_job_alloc(struct
>> host1x_channel *ch,
>>  	mem += num_unpins * sizeof(dma_addr_t);
>>  	job->pin_ids = num_unpins ? mem : NULL;
>>
>> +	/* I think this is a somewhat ugly design.
>> +	   "addr_phys" actually consists of "reloc_addr_phys" and
>> +	   "gather_addr_phys".
>> +	   Why don't we just declare "reloc_addr_phys" and
>> "gather_addr_phys" separately?
>> +	   In the current design, suppose an nvhost newbie swaps the order
>> +	   of these two blocks of code in the function "pin_job_mem":
>> +
>> +	   for (i = 0; i < job->num_relocs; i++) {
>> +		struct host1x_reloc *reloc = &job->relocarray[i];
>> +		job->pin_ids[count] = reloc->target;
>> +		count++;
>> +	   }
>> +
>> +	   for (i = 0; i < job->num_gathers; i++) {
>> +		struct host1x_job_gather *g = &job->gathers[i];
>> +		job->pin_ids[count] = g->mem_id;
>> +		count++;
>> +	   }
>> +
>> +	   Then likely something weird occurs... */
>
> We do this because this way we can implement batch pinning of memory
> handles. This way we can decrease memory handle management a lot, as we
> need to do locking only once per submit.
>

Sorry, I didn't get it. Yes, in the current design you can pin all mem
handles in one go, but I haven't found anything related to "locking only
once per submit".

My idea is:
- remove "job->addr_phys"
- assign "job->reloc_addr_phys" & "job->gather_addr_phys" separately
- in "pin_job_mem", just call "host1x_memmgr_pin_array_ids" twice to
  fill "reloc_addr_phys" & "gather_addr_phys"

Anything I misunderstood?
> Decreasing memory management overhead is really important, because in
> complex graphics cases there might be hundreds of buffer
> references per submit, and several submits per frame. Any extra
> overhead translates directly into reduced performance.
>

>>  	job->reloc_addr_phys = job->addr_phys;
>>  	job->gather_addr_phys = &job->addr_phys[num_relocs];
>>
>> @@ -252,6 +272,10 @@ static int do_relocs(struct host1x_job *job,
>>  		}
>>  	}
>>
>> +		/* I have a question here. Does this mean the address info
>> +		   which needs to be relocated (according to the libdrm code,
>> +		   this address seems to be "0xDEADBEEF") always stays at the
>> +		   beginning of a page? */
>>  		__raw_writel(
>>  			(job->reloc_addr_phys[i] +
>>  			reloc->target_offset) >> reloc->shift,
>
> No - the slot can be anywhere. That's why we have cmdbuf_offset in the
> reloc struct.
>

Yes. Sorry, I must have misread the code before.

>> @@ -565,6 +589,7 @@ int host1x_job_pin(struct host1x_job *job, struct
>> platform_device *pdev)
>>  		}
>>  	}
>>
>> +	/* if (host1x_firewall && !err) { */
>>  	if (host1x_firewall) {
>>  		err = copy_gathers(job, pdev);
>>  		if (err) {
>
> Will add.
>

>> @@ -573,6 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct
>> platform_device *pdev)
>>  		}
>>  	}
>>
>> +/* Rename this label to "out" or something else,
>> +   because the code under this label also gets executed
>> +   when everything goes right. */
>>  fail:
>>  	wmb();
>
> Will do.
>

>> diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
>> index f3954f7..bb5763d 100644
>> --- a/drivers/gpu/host1x/memmgr.c
>> +++ b/drivers/gpu/host1x/memmgr.c
>> @@ -156,6 +156,9 @@ int host1x_memmgr_pin_array_ids(struct
>> platform_device *dev,
>>  				count, &unpin_data[pin_count],
>>  				phys_addr);
>>
>> +	/* I don't think the current "host1x_cma_pin_array_ids"
>> +	   is able to return a negative value.
>> +	   So this "if" doesn't
>> +	   make sense... */
>>  	if (cma_count < 0) {
>>  		/* clean up previous handles */
>>  		while (pin_count) {
>
> It should return negative in case of error.
>

"host1x_cma_pin_array_ids" doesn't return a negative value right now, so
maybe you need to take a look at it.

> Terje
>