From mboxrd@z Thu Jan 1 00:00:00 1970
From: ww-ml@gmx.de (Wolfgang Wegner)
Date: Tue, 14 Sep 2010 09:03:41 +0200
Subject: Kirkwood PCI(e) write performance and DMA engine support for copy_{to, from}_user?
In-Reply-To:
References: <20100906100244.GA6897@debian-wegner1.datadisplay.de> <20100906140347.GA24522@n2100.arm.linux.org.uk> <20100906141444.GD6897@debian-wegner1.datadisplay.de> <20100907161156.GA3625@debian-wegner1.datadisplay.de> <20100908083558.GA3393@debian-wegner1.datadisplay.de> <20100909162135.GA3524@debian-wegner1.datadisplay.de>
Message-ID: <20100914070341.GA3438@debian-wegner1.datadisplay.de>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

Hi Leon,

On Mon, Sep 13, 2010 at 07:10:59PM +0200, Leon Woestenberg wrote:
> Hello Wolfgang,
>
> On Thu, Sep 9, 2010 at 6:21 PM, Wolfgang Wegner wrote:
> > On Wed, Sep 08, 2010 at 10:35:58AM +0200, Wolfgang Wegner wrote:
> >>
> > With the FPGA evaluation board I get:
> > - around 38 MBytes/second with Nicolas' inline assembly code
> > - around 6 MBytes/second with any other C code (mmapped) as
> >   well as write() via dd
> >
> > So the main problem seems to be either our board implementation
> > of the PCIe->PCI bridge or the FPGA. However, I am still wondering
> > how a framebuffer-based application can attain reasonable performance,
> >
> Having implemented a framebuffer demo on an FPGA recently using PCI
> Express, I think the main performance gain is made by having the DMA
> done by the endpoint (FPGA) rather than by the CPU.

This is what I read everywhere; however, I do not see how it can
improve anything when using an mmap()ed frame buffer for
pixel-oriented operations...

This is why I thought about reverting to write() and simply
transferring complete frames, which would be sufficient for about 90%
of my application scenarios; for the other 10% I could live with the
lower performance.

Regards,
Wolfgang
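The write()-based idea from the reply above (draw pixels somewhere cheap, then transfer complete frames) could look roughly like the following minimal sketch. It assumes a driver, not shown and not part of this thread, whose write() handler accepts a whole frame and might hand it to a DMA engine instead of doing CPU stores across PCIe. The names `struct shadow_fb`, `shadow_set_pixel()` and `write_frame()` are hypothetical, and the 16-bit RGB565 pixel format is an assumption for illustration only.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical RAM "shadow" frame buffer: pixel-oriented drawing goes
 * here (ordinary cached memory) instead of into an mmap()ed PCI window.
 * 16-bit pixels (e.g. RGB565) are an assumption. */
struct shadow_fb {
    uint16_t *pixels;
    size_t width, height;
};

/* Set one pixel in the shadow buffer; caller guarantees x < width and
 * y < height (no bounds check here). */
static void shadow_set_pixel(struct shadow_fb *fb, size_t x, size_t y,
                             uint16_t color)
{
    fb->pixels[y * fb->width + x] = color;
}

/* Hypothetical helper: push one complete frame to fd with write(),
 * retrying on partial writes and on EINTR. Returns 0 on success,
 * -1 on error with errno set. */
static int write_frame(int fd, const void *frame, size_t len)
{
    const unsigned char *p = frame;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error, errno set by write() */
        }
        if (n == 0) {
            errno = EIO;        /* no progress: give up rather than spin */
            return -1;
        }
        p += n;                 /* advance past the bytes accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

Usage would be to draw a frame with `shadow_set_pixel()` and then call `write_frame(fd, fb.pixels, fb.width * fb.height * sizeof(uint16_t))` once per frame, with `fd` opened on whatever device node the (hypothetical) driver exposes. Whether this actually beats mmap()ed access depends entirely on how that driver moves the data to the FPGA.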