From: Walker, Benjamin <benjamin.walker at intel.com>
To: spdk@lists.01.org
Subject: [SPDK] Re: SPDK socket abstraction layer
Date: Wed, 30 Oct 2019 23:20:56 +0000
Message-ID: <b308bdfe242ba11689799e0da55ef84bb405e027.camel@intel.com>
In-Reply-To: d63f4372-3309-45c3-3e41-222cefca68aa@dev.mellanox.co.il
On Wed, 2019-10-30 at 23:47 +0200, Sasha Kotchubievsky wrote:
> We started from the following use case: an initiator running on ARM on top of
> a user-space stack (VMA) with the TSO (TCP segmentation offload) optimization.
> This optimization should improve sending large packets (the storage
> case). In this case, we see a benefit from zero-copy for any
> block size bigger than 512B.
>
> Evidently, you are testing x86 on the target side. We haven't tested that yet,
> so I can't say what the bottleneck is in the target (x86) case. I'd suggest
> reducing the overhead of sending big buffers in the network card by configuring
> TSO or jumbo frames, and then retesting the zero-copy solution. Maybe
> without TSO, memcpy is not the real bottleneck.
>
> memcpy in target side (on ARM) after applying TSO takes about 18% (4K
> IO). So, I believe, zero-copy should improve performance.
The copy is taking ~25% of CPU in our traces normally. I agree that it should be
a huge improvement.
>
> We can test your configuration too. It would be very interesting to understand
> what the real bottleneck is.
>
> What configuration do you use?
> - What's the network card?
> - What are the OS and kernel?
> - NULL devices, or real NVMe disks?
> - Queue depth and number of cores?
It's a Mellanox CX-5 100GbE card with Ubuntu 18.10 and kernel 5.4.0-rc4 that we
compiled ourselves. TSO and jumbo frames are enabled. The I/O is going to real
NVMe devices on the backend (there are 12 P4500 Intel NVMe SSDs attached). We're
running 8 cores on the initiator side, each sending queue depth 64 worth of 4k
I/O. The target is running just one core.
However, I just figured out what's wrong, so maybe you can help me fix it. The
zero copy is all working mechanically, except when I get the zero copy
completion notification, ee_code is set to SO_EE_CODE_ZEROCOPY_COPIED. So
something is not configured correctly in my networking stack and the kernel is
doing deferred copies instead. Any ideas what I'd need to do in order to enable
this? I'm fairly certain I have it working at my desk on Fedora 30 in loopback,
based on the CPU traces I'm seeing, so maybe it's just a matter of installing
Fedora 30 on the benchmark system instead.
>
> Is it enough to apply those two patches for zero-copy on the target?
>
> Asynchronous writev https://review.gerrithub.io/c/spdk/spdk/+/470523
>
> MSG_ZEROCOPY use in the posix implementation
> https://review.gerrithub.io/c/spdk/spdk/+/471752
Let me get the patches sorted out in a nice series and then you can grab the top
of the series for testing.
>
>
> BR,
>
> Sasha
>
> On 30-Oct-19 10:28 PM, Walker, Benjamin wrote:
> > On Wed, 2019-10-30 at 21:55 +0200, Sasha Kotchubievsky wrote:
> > > Hi Ben,
> > >
> > > Great list of patches.
> > > This work will definitely take NVMe-oF TCP to the next level.
> > >
> > > We are also working on zero-copy (TX) on the initiator side, based on
> > > MSG_ZEROCOPY. Preliminary results are great. On the target side, we are
> > > still investigating the solution.
> > I'm very interested to hear about your work here. I've been doing it primarily
> > on the target side and the preliminary results we're seeing are that it's slower
> > for 4K I/O. That's not really what I expected at all, and I feel like I must be
> > missing something with getting the page pinning to hit the fast path
> > consistently. It's early days with all of these things, so it's a safe
> > assumption that I either coded something incorrectly or the system isn't
> > configured right.
> >
> > > We ran a lot of tests and see great potential for zero-copy. The copy looks
> > > like a real bottleneck. On the send side, it can be removed with the POSIX
> > > interface (MSG_ZEROCOPY); on the receive side, it needs deep integration
> > > between the TCP stack and SPDK. Next week we will have an internal
> > > brainstorming session, and I hope to be ready for discussions at the dev
> > > meetup.
> > >
> > > We will invest in both directions: the user-space path and the Linux kernel
> > > path. But in the user-space area, at this stage, VPP is outside our interest.
> > >
> > > Best regards
> > > Sasha
> > >
> > > -----Original Message-----
> > > From: Walker, Benjamin <benjamin.walker(a)intel.com>
> > > Sent: Wednesday, October 30, 2019 8:47 PM
> > > To: spdk(a)lists.01.org
> > > Subject: [SPDK] Re: SPDK socket abstraction layer
> > >
> > > On Wed, 2019-10-30 at 17:54 +0000, Harris, James R wrote:
> > > > Hi Sasha,
> > > >
> > > > Tomek is only talking about the VPP implementation. There are no
> > > > plans to remove the socket abstraction layer. If anything, the
> > > > project needs to look at extending it in ways as you suggested.
> > > To expand on this, there's a lot of activity right now in the SPDK sock
> > > abstraction layer to begin to implement asynchronous operations, zero copy
> > > operations, etc. For example, see:
> > >
> > > Asynchronous writev
> > > https://review.gerrithub.io/c/spdk/spdk/+/470523
> > >
> > > MSG_ZEROCOPY use in the posix implementation
> > > https://review.gerrithub.io/c/spdk/spdk/+/471752
> > >
> > > A new sock implementation based on io_uring/libaio:
> > > https://review.gerrithub.io/c/spdk/spdk/+/471314
> > >
> > > And a new sock implementation based on Seastar:
> > > https://review.gerrithub.io/c/spdk/spdk/+/466629
> > >
> > > So not only is the sock abstraction layer sticking around, but it's getting a
> > > lot of focus going forward. There is a lot of innovation happening in the
> > > Linux kernel around networking at all layers that we need to keep up with.
> > >
> > > One thing I would like community feedback on is what to do about the current
> > > VPP implementation. As we make improvements and additions to the sock
> > > abstraction, it will necessarily require updates to the VPP implementation. We
> > > can of course continue to make those, but does the community see value in
> > > maintaining support here? I'd really love to see someone take up the mantle on
> > > VPP if they believe there is value that we just haven't been able to unlock
> > > yet, but absent that it's just a maintenance burden.
> > >
> > > Personally speaking, it would be easier for me, as someone trying to evolve
> > > the sock abstraction layer, to drop VPP. That's one less implementation that I
> > > then have to go update and test each time. But I'm very open to opinions and
> > > feedback here if anyone has something to say. SPDK obviously can't just drop
> > > support without strong consensus and a considerable amount of forewarning.
> > >
> > > Thanks,
> > > Ben
> > >
> > > > -Jim
> > > >
> > > >
> > > > On 10/30/19, 10:50 AM, "Sasha Kotchubievsky" <sashakot(a)dev.mellanox.co.il> wrote:
> > > >
> > > > Hi Tomek,
> > > >
> > > > Are you looking for community feedback regarding the VPP implementation
> > > > of the TCP stack, or about having a socket abstraction layer in SPDK?
> > > > I think the socket abstraction layer is critical for future integration
> > > > between SPDK and user-space stacks. At Mellanox, we're evaluating
> > > > integration between VMA (https://github.com/Mellanox/libvma) and SPDK.
> > > > Although VMA can be used as a replacement for the kernel implementation
> > > > of the POSIX socket interface, we see great potential in "deep"
> > > > integration, which definitely needs the existing abstraction layer to be
> > > > kept. For example, one potential improvement is zero-copy in the RX
> > > > (receive) flow. I don't see how that can be implemented on top of the
> > > > Linux kernel stack.
> > > >
> > > > Best regards
> > > > Sasha
> > > >
> > > > -----Original Message-----
> > > > From: Zawadzki, Tomasz <tomasz.zawadzki(a)intel.com>
> > > > Sent: Monday, October 21, 2019 3:01 PM
> > > > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > > Subject: [SPDK] SPDK socket abstraction layer
> > > >
> > > > Hello everyone,
> > > >
> > > > Summary:
> > > >
> > > > With this message I wanted to update the SPDK community on the state of
> > > > the VPP socket abstraction as of the SPDK 19.07 release.
> > > > At this time there does not seem to be a clear efficiency improvement
> > > > with VPP. There is no further work planned on SPDK and VPP integration.
> > > >
> > > > Details:
> > > >
> > > > As some of you may remember, the SPDK 18.04 release introduced support
> > > > for alternative socket types. Along with that release, Vector Packet
> > > > Processing (VPP)<https://wiki.fd.io/view/VPP> 18.01 was integrated with
> > > > SPDK by expanding the socket abstraction to use the VPP Communications
> > > > Library (VCL). The TCP/IP stack in
> > > > VPP<https://wiki.fd.io/view/VPP/HostStack> was in its early stages back
> > > > then and has seen improvements throughout the last year.
> > > >
> > > > To better use VPP capabilities, following fruitful collaboration with the
> > > > VPP team, in SPDK 19.07 this implementation was changed from VCL to the
> > > > VPP Session API from VPP 19.04.2.
> > > >
> > > > The VPP socket abstraction has met some challenges due to the inherent
> > > > design of both projects, in particular related to running separate
> > > > processes and memory copies.
> > > > Seeing improvements over the original implementation was encouraging,
> > > > yet when measured against the posix socket abstraction (taking into
> > > > consideration the entire system, i.e. both processes), the results are
> > > > comparable. In other words, at this time there does not seem to be a
> > > > clear benefit of either socket abstraction from the standpoint of CPU
> > > > efficiency or IOPS.
> > > >
> > > > With this message I just wanted to update the SPDK community on the state
> > > > of the socket abstraction layers as of the SPDK 19.07 release. Each SPDK
> > > > release brings improvements to the abstraction and its implementations,
> > > > with exciting work on more efficient use of the kernel TCP stack: changes
> > > > in SPDK 19.10 and SPDK 20.01.
> > > >
> > > > However, there is no active involvement at this point around the VPP
> > > > implementation of the socket abstraction in SPDK. Contributions in this
> > > > area are always welcome. In case you're interested in implementing
> > > > further enhancements of VPP and SPDK integration, feel free to reply or
> > > > to use one of the many SPDK community communications
> > > > channels<https://spdk.io/community/>.
> > > >
> > > > Thanks,
> > > > Tomek
> > > >
> > > > _______________________________________________
> > > > SPDK mailing list -- spdk(a)lists.01.org
> > > > To unsubscribe send an email to spdk-leave(a)lists.01.org