From mboxrd@z Thu Jan 1 00:00:00 1970
From: Arvind R
Subject: Re: Nouveau on dom0
Date: Wed, 3 Mar 2010 03:04:19 +0530
References: <20100225125552.GC9040@phenom.dumpdata.com> <20100225174411.GA13270@phenom.dumpdata.com> <20100301160130.GB7881@phenom.dumpdata.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
In-Reply-To: <20100301160130.GB7881@phenom.dumpdata.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Konrad Rzeszutek Wilk
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On Mon, Mar 1, 2010 at 9:31 PM, Konrad Rzeszutek Wilk wrote:
> On Fri, Feb 26, 2010 at 09:04:33PM +0530, Arvind R wrote:
>> On Thu, Feb 25, 2010 at 11:14 PM, Konrad Rzeszutek Wilk
>> wrote:
>> > On Thu, Feb 25, 2010 at 09:01:48AM -0800, Arvind R wrote:
>> >> On Thu, Feb 25, 2010 at 6:25 PM, Konrad Rzeszutek Wilk
>> >> wrote:
>> >> > On Thu, Feb 25, 2010 at 02:16:07PM +0530, Arvind R wrote:
>> >> >> Hi all,
>> >> >> I merged the drm-tree from 2.6.33-rc8 into jeremy's 2.6.31.6 master and
>> >> ======= snip =======
>> >> > is not. Would it be possible to trace down who allocates that *chan? You
>> >> > say it is 'PRAMIN' - is that allocated via pci_alloc_* call?
>> ======= snip =======
>> >> So, there must be a mmap call somewhere to map the area to user-space
>> >> for that problem write to work on non-Xen boots. Will try track down some more
>> >> and post. With mmaps and PCIGARTs - it will be some hunt!
>> ======= snip =======
>> > to the drm_radeon driver which used it as a ring buffer. Took a bit of
>> > hoping around to find who allocated it in the first place.
>> >
>> After a lot of reboots and log viewing:
>> The pushbuf (FIFO/RING) is the only means of programming the card DMA
>> activity. It is exposed to user-space by mmap of the drm_device (PCI) handle
>> with different offsets for each channel. Parameters are associated to the DMA
>> command using ioctls to bind channels/sub-channels/contexts. This mmap is
>> in the libdrm2 library. Libdrm channel/accelerator initialization and
>> setup chores
>> and the DDX driver (xf86-video-nouveau) more-or-less acts thro' libdrm.
>
> Ok, that is the DRM_NOUVEAU_CHANNEL_ALLOC ioctl, which ends up calling
> the 'ttm_bo_init'. I remember Pasi having an issue with this on Radeon
> and I provided a hack to see if it would work. Take a look at this
> e-mail:
>
> http://lists.xensource.com/archives/cgi-bin/extract-mesg.cgi?a=xen-devel&m=2010-01&i=20100115071856.GD17978%40reaktio.net
>
>>
>> My suspicion is that Xen has some problems with mmap of PCI(E) device
>> memory. How is iomem handled in a mmap?
>
> It looks to be using 'ioremap' which is Xen safe. Unless your card has
> an AGP bridge on it, at which point it would end up using
> dma_alloc_coherent in all likehood.
>
>>
>> As of now, accelerator on Xen stops right at the initialisation stage - when
>> libdrm tries to set up the accelerator-engine in the course of ScreenInit. And
>> to do that, it cannot write the command to setup the basic 2D engine.
>
> I think that the ttm_bo calls set up pages in the 4KB size, but the
> initial channel requests a 64KB one. I think it also sets up

Got that far, tried some dirty patches of mine which broke the framebuffer.
Your ttm patch using dma_alloc_coherent instead of alloc_page resulted in
the same problem as with the Radeon report - leaking pages, erroneous page count

> page-table directory so that when the GPU accesses the addresses, it
> gets the real bus address.
> I wonder if it fails at that thought -
> meaning that the addresses that are written to the page table are
> actually the guest page numbers (gpfn) instead of the machine page numbers (mfn).

No, I don't think that's how it works. The user-space write triggers an
aio-write - I got that in a trace that my patch caused - which page-faults and
leads to ttm_bo_vm_fault. I tried to alloc_pages in ttm_bo_vm_fault but I think
I got the remap_pfn_range address parameter wrong. With this patch, a bare boot
crashed the same way as Xen does with or without the patch! So it is clearly
the mmap of the pushbuf that's the block. ttm_bo_vm_fault is the pivot for the
pushbuf_bo allocation.

My patch in ttm_bo_vm_fault:

    if (io_mem) {
        /* retain the orig. speculative pre-fault code */
        ...
    } else {
        /* ttm_bo_get_pages is modified __ttm_tt_get_page using alloc_pages
           Irrespective of where fault occurs, fault-in the whole buffer */
        pages = ttm_bo_get_pages(ttm, get_order(bo->num_pages));
        pfn = page_to_pfn(page);
        remap_pfn_range(vma, bo->buffer_start, pfn, bo->num_pages << PAGE_SHIFT,
                        vma->vm_page_prot);
        /* Triggers Kernel BUG invalid opcode */
    }

BTW, ttm_bo_vm_fault is the ONLY user of vm_insert_mixed in the kernel tree!
Tried to use split_page() - resulted in undefined symbol!

> The other issue might be that your back-port broke the AGP allocation.
>
Nope - untouched and same.
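
A note on the sizes in the patch above, offered as something to check rather
than a diagnosis: the kernel's get_order() takes a size in bytes, not a page
count. If ttm_bo_get_pages hands its second argument to alloc_pages as the
order, then get_order(bo->num_pages) on a 64KB (16-page) buffer would ask for
order 0, a single page, while remap_pfn_range is asked to map
bo->num_pages << PAGE_SHIFT bytes. The userspace sketch below only mimics that
arithmetic. sketch_get_order and SKETCH_PAGE_SHIFT are hypothetical names made
up for illustration, 4KB pages are assumed, and none of this is kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: 4KB pages, as on x86. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Userspace stand-in for the kernel's get_order(): returns the smallest
 * order such that (1 << order) pages cover size_bytes.
 * The argument is a size in BYTES, not a number of pages.
 */
unsigned int sketch_get_order(size_t size_bytes)
{
    unsigned int order = 0;
    size_t page_size = (size_t)1 << SKETCH_PAGE_SHIFT;
    size_t pages;

    assert(size_bytes > 0);

    /* Round the byte size up to whole pages. */
    pages = (size_bytes + page_size - 1) >> SKETCH_PAGE_SHIFT;

    /* Find the smallest power of two that is >= pages. */
    while (((size_t)1 << order) < pages)
        order++;

    return order;
}
```

With 4KB pages, sketch_get_order(16) gives 0 (one page), while
sketch_get_order(16 << SKETCH_PAGE_SHIFT) gives 4 (16 contiguous pages), which
is what a 64KB buffer would need.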