From: Konrad Rzeszutek Wilk
Subject: Re: [RFC PATCH v2] Utilize the PCI API in the TTM framework.
Date: Tue, 11 Jan 2011 13:28:57 -0500
Message-ID: <20110111182857.GC29223@dumpdata.com>
References: <1294420304-24811-1-git-send-email-konrad.wilk@oracle.com>
 <4D2B16F3.1070105@shipmail.org>
 <20110110152135.GA9732@dumpdata.com>
 <4D2B2CC1.2050203@shipmail.org>
 <20110110164519.GA27066@dumpdata.com>
 <4D2B70FB.3000504@shipmail.org>
 <20110111155545.GD10897@dumpdata.com>
 <20110111165953.GI10897@dumpdata.com>
To: Alex Deucher
Cc: Thomas Hellstrom, konrad@darnok.org, linux-kernel@vger.kernel.org,
 dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On Tue, Jan 11, 2011 at 01:12:57PM -0500, Alex Deucher wrote:
> On Tue, Jan 11, 2011 at 11:59 AM, Konrad Rzeszutek Wilk wrote:
> >> >> Another thing that I was thinking of is what happens if you have a
> >> >> huge gart and allocate a lot of coherent memory. Could that
> >> >> potentially exhaust IOMMU resources?
> >> >
> >> > So the GART is in the PCI space in one of the BARs of the device, right?
> >> > (We are talking about the discrete card GART, not the poor man's AMD IOMMU?)
> >> > The PCI space is under 4GB, so it would be considered coherent by
> >> > definition.
> >>
> >> GART is not a PCI BAR; it's just a remapper for system pages.  On
> >> radeon GPUs at least there is a memory controller with 3 programmable
> >> apertures: vram, internal gart, and agp gart.  You can map these
> >
> > To access it, i.e. to program it, you would need to access the PCIe card's
> > MMIO regions, right? So that would be considered in PCI BAR space?
>
> Yes, you need access to the MMIO aperture to configure the GPU. I was
> thinking you meant something akin to the framebuffer BAR, only for gart
> space, which is not the case.

Aaah, gotcha.

> >> resources wherever you want in the GPU's address space and then the
> >> memory controller takes care of the translation to off-board resources
> >> like gart pages.  On-chip memory clients (display controllers, texture
> >> blocks, render blocks, etc.) write to internal GPU addresses.  The GPU
> >> has its own direct connection to vram, so that's not an issue.  For
> >> AGP, the GPU specifies aperture base and size, and you point it to the
> >> bus address of the gart aperture provided by the northbridge's AGP
> >> controller.  For internal gart, the GPU has a page table stored in
> >
> > I think we are just talking about the GART on the GPU, not the old AGP
> > GART.
>
> Ok.  I just mentioned it for completeness.
>
> >> either vram or uncached system memory depending on the asic.  It
> >> provides a contiguous linear aperture to GPU clients and the memory
> >> controller translates the transactions to the backing pages via the
> >> pagetable.
> >
> > So I think I misunderstood what is meant by 'huge gart'. That sounds
> > like the linear address space provided by the GPU. And hooking up a lot
> > of coherent memory (so System RAM) to that linear address space would
> > be no different than what is currently being done.
> > When you allocate memory using alloc_page(GFP_DMA32) and hook up that
> > memory to the linear space, you exhaust the same amount of ZONE_DMA32
> > memory as if you were to use the PCI API. It comes from the same pool,
> > except that doing it from the PCI API gets you the bus address right
> > away.
>
> In this case 'GPU clients' refers to the hw blocks on the GPU; they are
> the ones that see the contiguous linear aperture.  From the
> application's perspective, gart memory looks like any other pages.

Those 'hw blocks' or 'gart memory' are in reality just pages obtained via
'alloc_page()' (both before and after this patchset).

Oh wait, 'hw blocks' or 'gart memory' can also refer to the VRAM memory,
right? In that case it is not memory allocated via 'alloc_page' but through
a different mechanism. Is TTM used then? If so, how do you stick those VRAM
pages under its accounting rules? Or do the drivers use some other mechanism
for that, one that is dependent on each driver?
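
Just to make sure we are comparing apples to apples, here is a minimal
sketch of the two allocation paths I had in mind for gart-backed pages.
This is not code from the patchset; 'pdev' and the helper names are made
up for illustration.

	/*
	 * Sketch only: contrast the two ways of getting a DMA-able page
	 * for the gart.  Assumes 'pdev' is the GPU's struct pci_dev.
	 */
	#include <linux/gfp.h>
	#include <linux/pci.h>
	#include <linux/dma-mapping.h>

	/* Path 1: grab a page below 4GB, then map it in a separate step. */
	static struct page *gart_page_alloc_map(struct pci_dev *pdev,
						dma_addr_t *bus_addr)
	{
		struct page *page = alloc_page(GFP_DMA32); /* ZONE_DMA32 */

		if (!page)
			return NULL;

		/* The bus address only shows up after the mapping call. */
		*bus_addr = pci_map_page(pdev, page, 0, PAGE_SIZE,
					 PCI_DMA_BIDIRECTIONAL);
		if (pci_dma_mapping_error(pdev, *bus_addr)) {
			__free_page(page);
			return NULL;
		}
		return page;
	}

	/* Path 2: the PCI/DMA API - the bus address comes back together
	 * with the memory. */
	static void *gart_page_dma_alloc(struct pci_dev *pdev,
					 dma_addr_t *bus_addr)
	{
		return dma_alloc_coherent(&pdev->dev, PAGE_SIZE, bus_addr,
					  GFP_KERNEL);
	}

Both paths draw on the same ZONE_DMA32 pool; the practical difference is
only where the bus address comes from.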