From: Jonathan Richardson
Subject: Re: [SPDK] spdk/vpp performance
Date: Mon, 16 Sep 2019 14:03:06 -0700
To: spdk@lists.01.org

Hi Sasha,

Thank you for your feedback. It looks like making VPP/nvmeotcp zero-copy is not straightforward at this point, but at least we know where we stand.

Jon

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha Kotchubievsky
Sent: Thursday, September 12, 2019 9:25 AM
To: spdk(a)lists.01.org
Subject: Re: [SPDK] spdk/vpp performance

Hi,

I'd recommend using the VMA user-space TCP stack: https://github.com/Mellanox/libvma
With VMA we see better performance than with VPP. I think zero-copy is not feasible in the case of VPP. We plan to implement zero-copy in NVMe-oF TCP on top of VMA.

Best regards
Sasha

On 10-Sep-19 10:42 PM, Jonathan Richardson via SPDK wrote:
> Hi,
>
> I'm using VPP 19.04.2 and SPDK 19.07. I have a couple of questions about the
> architecture of the session API with nvmeotcp. I attached my perf results
> for a 4k random read. I have 3 cores for VPP workers with 3 RSS queues, and
> the other 5 cores are for SPDK. The memcpy is the bottleneck: it takes 31%
> CPU on the SPDK threads when it writes the completion into shared
> memory. The memcpy on the VPP thread takes 54% when it pulls the data out
> for tx. For a random write, the two results are swapped. The L3 cache miss
> rate is bad at 60+%; with RDMA it's ~10%. This gives rather poor
> performance. Memory is more of a bottleneck than CPU on my ARM system, so
> SPDK's zero-copy design has worked out well.
> It would be better if the copying into and out of the shmem were avoided,
> though I'm not sure that's feasible and I need to study the code further.
>
> Are there any plans to change the session API or the way it uses memory?
>
> SPDK and VPP both use DPDK independently, with different memory regions
> (--file-prefix). But the shared memory in the session API uses a libc
> malloc'd buffer. Would using a DPDK mempool be better for cache efficiency
> and synchronization with other DPDK memory usage?
>
> Any thoughts on shared DPDK mempools between VPP and SPDK instead of
> copying into the session API's shared memory? It looks like there are 2
> copies per read: the NVMe completion into shmem, and the VPP copy out
> (I haven't looked at where session_tx_fifo_peek_and_snd is copying to).
> Even getting rid of one of the memcpys could help.
>
> We are looking into possible optimizations, so any guidance is appreciated,
> since we want any changes merged into mainline.
>
> Thanks,
> Jon
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk
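To make the two-copy read path described in the quoted message concrete, here is a minimal C sketch. It is a hypothetical toy model, not VPP's svm_fifo or session API and not SPDK code; the toy_* types and the copy_path_bytes()/descriptor_path_bytes() helpers are invented for illustration. In the copy-based path a read's payload goes through a byte FIFO, so it is memcpy'd twice (in by the producer, out by the consumer). The descriptor path passes only a (pointer, length) pair, which assumes both processes can map the payload buffer, for example from a common DPDK memory region. The thread raises that idea as a question rather than as a confirmed design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model: NOT VPP's svm_fifo/session API and NOT SPDK code. */
#define TOY_FIFO_SIZE  16384u
#define TOY_RING_SLOTS 64u

/* Copy-based path: a byte ring standing in for the session shared memory. */
typedef struct {
    uint8_t data[TOY_FIFO_SIZE];
    size_t head;         /* total bytes consumed (monotonic counter) */
    size_t tail;         /* total bytes produced (monotonic counter) */
    size_t bytes_copied; /* payload bytes moved by memcpy on either side */
} toy_fifo_t;

/* Producer side (e.g. the target writing read data): copy #1. */
static size_t toy_fifo_enqueue(toy_fifo_t *f, const uint8_t *src, size_t len)
{
    size_t space = TOY_FIFO_SIZE - (f->tail - f->head);
    if (len > space)
        len = space;
    size_t off = f->tail % TOY_FIFO_SIZE;
    size_t first = len < TOY_FIFO_SIZE - off ? len : TOY_FIFO_SIZE - off;
    memcpy(&f->data[off], src, first);             /* up to end of ring */
    memcpy(&f->data[0], src + first, len - first); /* wrapped remainder */
    f->tail += len;
    f->bytes_copied += len;
    return len;
}

/* Consumer side (e.g. the TCP stack pulling data out for tx): copy #2. */
static size_t toy_fifo_dequeue(toy_fifo_t *f, uint8_t *dst, size_t len)
{
    size_t used = f->tail - f->head;
    if (len > used)
        len = used;
    size_t off = f->head % TOY_FIFO_SIZE;
    size_t first = len < TOY_FIFO_SIZE - off ? len : TOY_FIFO_SIZE - off;
    memcpy(dst, &f->data[off], first);
    memcpy(dst + first, &f->data[0], len - first);
    f->head += len;
    f->bytes_copied += len;
    return len;
}

/* Descriptor path: only (pointer, length) crosses the ring, so the payload
 * must already live in memory that both sides can map (an assumption). */
typedef struct { const uint8_t *buf; size_t len; } toy_desc_t;
typedef struct { toy_desc_t slots[TOY_RING_SLOTS]; size_t head, tail; } toy_desc_ring_t;

static int toy_ring_push(toy_desc_ring_t *r, const uint8_t *buf, size_t len)
{
    if (r->tail - r->head == TOY_RING_SLOTS)
        return -1; /* ring full */
    r->slots[r->tail++ % TOY_RING_SLOTS] = (toy_desc_t){ buf, len };
    return 0;
}

static int toy_ring_pop(toy_desc_ring_t *r, toy_desc_t *out)
{
    if (r->tail == r->head)
        return -1; /* ring empty */
    *out = r->slots[r->head++ % TOY_RING_SLOTS];
    return 0;
}

/* Payload bytes memcpy'd to move one read of `len` (<= 4096) bytes via the FIFO. */
static size_t copy_path_bytes(size_t len)
{
    static toy_fifo_t fifo; /* static: keeps the 16 KiB ring off the stack */
    static uint8_t payload[4096], tx[4096];
    if (len > sizeof(payload))
        len = sizeof(payload);
    memset(&fifo, 0, sizeof(fifo));
    memset(payload, 0xab, len);
    toy_fifo_enqueue(&fifo, payload, len);
    toy_fifo_dequeue(&fifo, tx, len);
    assert(memcmp(payload, tx, len) == 0);
    return fifo.bytes_copied;
}

/* Bytes written to the shared ring for the same read: one descriptor, no payload. */
static size_t descriptor_path_bytes(size_t len)
{
    static uint8_t payload[4096];
    toy_desc_ring_t ring;
    toy_desc_t d;
    memset(&ring, 0, sizeof(ring));
    toy_ring_push(&ring, payload, len);
    toy_ring_pop(&ring, &d);
    assert(d.buf == payload && d.len == len); /* same buffer, nothing copied */
    return sizeof(d);
}
```

In this model, copy_path_bytes(4096) reports 8192 payload bytes copied for one 4k read (two memcpys of 4096 bytes), while descriptor_path_bytes(4096) reports only the size of one descriptor. The toy model ignores buffer ownership entirely. A real zero-copy TCP path would also need the stack to transmit directly from the application's buffer and to signal when that buffer may be reused (for example after the data is acknowledged and can no longer be retransmitted).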