From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ruben Guerra Marin <ruben.guerra.marin@axon.tv>
To: linux-pci@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Performance issues writing to PCIe in a Zynq
Date: Fri, 3 Nov 2017 07:10:09 +0000
Message-ID: <1509693009315.39595@axon.tv>
List-Id: linux-arm-kernel.lists.infradead.org

Hi,

I have a Zynq board running PetaLinux, connected through PCIe to a Virtex UltraScale board. I configured the UltraScale for Tandem PCIe, where the second-stage bitstream is programmed from the Zynq board (I cross-compiled the mcap application that Xilinx provides).

This works perfectly, but it takes ~12 seconds to program the second-stage bitstream (~12 MB compressed), which is quite slow. We also tried debugging the mcap application and pciutils, and found the operation that takes long to execute: in pciutils, the call that actually issues the write to the driver (pwrite) takes approximately 6 µs, so if you add this up over 12 MB you can see why it takes so long. Why is this so slow? Could this be a problem with the driver?

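As a rough sanity check on "adding this up", here is a back-of-the-envelope sketch. It assumes each pwrite transfers one 32-bit word (4 bytes) and that "~12 MB" means 12 * 2^20 bytes; both are assumptions for illustration, not measurements, and the function names are hypothetical.

```python
# Back-of-the-envelope estimate of the programming time.
# Only the ~6 us per pwrite, ~12 MB and ~12 s figures come from the
# observations above; everything else is an assumption.

BYTES_PER_WRITE = 4            # assumption: one 32-bit word per pwrite
BITSTREAM_BYTES = 12 * 2**20   # assumption: "~12 MB" taken as 12 MiB
SECONDS_PER_WRITE = 6e-6       # ~6 us per pwrite, as observed


def estimate_seconds(total_bytes, bytes_per_write, seconds_per_write):
    """Total time if every write costs the same fixed latency."""
    writes = total_bytes // bytes_per_write
    return writes * seconds_per_write


def implied_latency(total_seconds, total_bytes, bytes_per_write):
    """Per-write latency implied by an observed total programming time."""
    writes = total_bytes // bytes_per_write
    return total_seconds / writes


if __name__ == "__main__":
    est = estimate_seconds(BITSTREAM_BYTES, BYTES_PER_WRITE, SECONDS_PER_WRITE)
    lat = implied_latency(12.0, BITSTREAM_BYTES, BYTES_PER_WRITE)
    print(f"estimated total at 6 us per write: {est:.1f} s")
    print(f"latency implied by a 12 s total:   {lat * 1e6:.2f} us per write")
```

Under these assumptions the estimate lands in the same order of magnitude as the observed ~12 seconds, which would suggest that the fixed cost per write, rather than the amount of data, dominates the programming time.
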
For testing, I added an ILA to the AXI bus between the Zynq GP1 and the control registers port of the PCIe IP. I triggered it halfway through programming the bitstream with the mcap program provided by Xilinx. I can see that it is writing to address 0x358, which according to the *datasheet* (https://www.xilinx.com/Attachment/Xilinx_Answer_64761__UltraScale_Devices.pdf) is the Write Data Register, which is correct (and again, I know the whole bitstream gets programmed correctly).

But I also see that it takes 245 cycles from one "awvalid" assertion to the next, and I can imagine this is why it takes 12 seconds to program a 12 MB bitstream.

Thanks a lot,

Ruben Guerra Marin
ruben.guerra.marin@axon.tv