From mboxrd@z Thu Jan  1 00:00:00 1970
Content-Type: multipart/mixed; boundary="===============1433358023465857294=="
MIME-Version: 1.0
From: Jonathan Richardson <jonathan.richardson at broadcom.com>
Subject: Re: [SPDK] spdk/vpp performance
Date: Mon, 16 Sep 2019 14:03:06 -0700
Message-ID: <c1001429109ffbb18188ff7eb948ed2b@mail.gmail.com>
In-Reply-To: 89b24cc8-28ac-1fa3-a12e-05d2310cef6a@dev.mellanox.co.il
List-ID: <spdk@lists.01.org>
To: spdk@lists.01.org

--===============1433358023465857294==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

Hi Sasha,

Thank you for your feedback. It looks like making vpp/nvmeotcp
zero copy is not straightforward at this point, but at least we know
where we stand.

Jon

-----Original Message-----
From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Sasha
Kotchubievsky
Sent: Thursday, September 12, 2019 9:25 AM
To: spdk(a)lists.01.org
Subject: Re: [SPDK] spdk/vpp performance

Hi,

I'd recommend using the VMA user-space stack for TCP:

https://github.com/Mellanox/libvma

With VMA we see better performance than with VPP.

I think zero-copy is not feasible in the case of VPP.

We have plans to implement zero-copy in NVMe-oF TCP on top of VMA.

Best regards

Sasha

On 10-Sep-19 10:42 PM, Jonathan Richardson via SPDK wrote:
> Hi,
>
> I'm using vpp 19.04.2 and spdk 19.07. I have a couple questions about the
> architecture of the session API with nvmeotcp. I attached my perf results
> for a 4k random read. I have 3 cores for vpp workers with 3 RSS queues and
> the other 5 cores are for spdk. The memcpy is the bottleneck taking 31%
> cpu usage on the spdk threads when it writes the completion into shared
> memory. The vpp thread memcpy is 54% when it pulls the data out for tx.
> You can flip those results around for a random write. L3 cache miss is bad
> at 60+%. With RDMA it's ~10%. This gives rather poor performance. Memory
> is more of a bottleneck than cpu on my ARM system, so the spdk design
> worked out well being zero copy. It would be better if the copying in and
> out of the shmem were avoided though I'm not sure it's feasible and need
> to study the code further.
>
> Are there any plans to change the session API or the way it uses memory?
>
> Spdk and vpp both use dpdk independently with different memory regions
> (--file-prefix). But the shared memory in the session API uses a libc
> malloc'd buffer. Would using a dpdk mempool be better for cache efficiency
> and synchronization with other dpdk memory usage?
>
> Any thoughts on shared dpdk mempools between vpp and spdk instead of
> copying into the session API's shared memory? It looks like there are 2
> copies for 1 read, the nvme completion into shmem, and the vpp copy out
> (haven't looked where session_tx_fifo_peek_and_snd is copying to). Even
> getting rid of one of the memcpy's could help.
>
> We are looking into possible optimizations so any guidance is appreciated
> since we want any changes merged into the mainline.
>
> Thanks,
> Jon
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

--===============1433358023465857294==--