From: Joe Damato <jdamato@fastly.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, davem@davemloft.net,
linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: [RFC,net-next,x86 0/6] Nontemporal copies in unix socket write path
Date: Thu, 12 May 2022 15:53:05 -0700
Message-ID: <20220512225302.GA74948@fastly.com>
In-Reply-To: <20220512124608.452d3300@kernel.org>

On Thu, May 12, 2022 at 12:46:08PM -0700, Jakub Kicinski wrote:
> On Wed, 11 May 2022 18:01:54 -0700 Joe Damato wrote:
> > > Is there a practical use case?
> >
> > Yes; for us there seems to be - especially with AMD Zen2. I'll try to
> > describe such a setup and my synthetic HTTP benchmark results.
> >
> > Imagine a program, call it storageD, which is responsible for storing and
> > retrieving data from a data store. Other programs can request data from
> > storageD via communicating with it on a Unix socket.
> >
> > One such program that could request data via the Unix socket is an HTTP
> > daemon. For some client connections that the HTTP daemon receives, the
> > daemon may determine that responses can be sent in plain text.
> >
> > In this case, the HTTP daemon can use splice to move data from the unix
> > socket connection with storageD directly to the client TCP socket via a
> > pipe. splice saves CPU cycles and avoids incurring any memory access
> > latency since the data itself is not accessed.
> >
> > Because we'll use splice (instead of accessing the data and potentially
> > affecting the CPU cache) it is advantageous for storageD to use NT copies
> > when it writes to the Unix socket to avoid evicting hot data from the CPU
> > cache. After all, once the data is copied into the kernel on the unix
> > socket write path, it won't be touched again; only spliced.
> >
> > In my synthetic HTTP benchmarks for this setup, we've been able to increase
> > network throughput of the HTTP daemon by roughly 30% while reducing
> > the system time of storageD. We're still collecting data on production
> > workloads.
> >
> > The motivation, IMHO, is very similar to the motivation for
> > NETIF_F_NOCACHE_COPY, as far as I understand.
> >
> > In some cases, when an application writes to a network socket the data
> > written to the socket won't be accessed again once it is copied into the
> > kernel. In these cases, NETIF_F_NOCACHE_COPY can improve performance and
> > helps to preserve the CPU cache and avoid evicting hot data.
> >
> > We get a sizable benefit from this option, too, in situations where we
> > can't use splice and have to call write to transmit data to client
> > connections. We want to get the same benefit of NETIF_F_NOCACHE_COPY, but
> > when writing to Unix sockets as well.
> >
> > Let me know if that makes it more clear.
>
> Makes sense, thanks for the explainer.
>
> > > The patches look like a lot of extra indirect calls.
> >
> > Yup. As I mentioned in the cover letter this was mostly a PoC that seems to
> > work and increases network throughput in a real world scenario.
> >
> > If this general line of thinking (NT copies on write to a Unix socket) is
> > acceptable, I'm happy to refactor the code however you (and others) would
> > like to get it to an acceptable state.
>
> My only concern is that in a post-Spectre world the indirect calls are
> going to be more expensive than a branch would be. But I'm not really
> a micro-optimization expert :)

Makes sense; neither am I, FWIW :)

For whatever reason, on AMD Zen2 it seems that using non-temporal
instructions when copying more data than fits in L2 is a huge
performance win (compared to the kernel's normal temporal copy code) even
if that size fits in L3.

This is why both NETIF_F_NOCACHE_COPY and MSG_NTCOPY from this series seem
to have such a large, measurable impact in the contrived benchmark I
included in the cover letter and also in synthetic HTTP workloads.

I plan to include numbers from the benchmark program on a few other
CPUs I have access to in the cover letter for any follow-up RFCs or
revisions.

As a data point, there has been similar-ish work done in glibc [1] to
determine when non-temporal copies should be used on Zen2 based on the size
of the copy. I'm certainly not a micro-arch expert by any stretch, but the
glibc work plus the benchmark results I've measured seem to suggest that
NT copies can be very helpful on Zen2.
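
To make the size-based choice concrete, here is a minimal user-space
sketch of a copy routine that switches to non-temporal stores above a
threshold. It is not the code from this series and not glibc's
implementation; the function name copy_maybe_nt and the NT_COPY_THRESHOLD
value are hypothetical, and it assumes an x86 CPU with SSE2.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical cutoff for illustration only. A real value would be
 * tuned per CPU, e.g. somewhere around the L2 size.
 */
#define NT_COPY_THRESHOLD (512 * 1024)

/*
 * Copy len bytes from src to dst. Small copies use plain memcpy();
 * large copies use non-temporal (streaming) stores so the destination
 * lines do not displace hot data in the cache.
 */
static void *copy_maybe_nt(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t head;

	if (len < NT_COPY_THRESHOLD)
		return memcpy(dst, src, len);

	/* _mm_stream_si128 needs a 16-byte aligned destination. */
	head = (16 - ((uintptr_t)d & 15)) & 15;
	memcpy(d, s, head);
	d += head;
	s += head;
	len -= head;

	for (; len >= 16; d += 16, s += 16, len -= 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)s);

		_mm_stream_si128((__m128i *)d, v);
	}

	/* Order the streaming stores before any later stores. */
	_mm_sfence();

	/* Copy the remaining tail (fewer than 16 bytes). */
	memcpy(d, s, len);
	return dst;
}
```

Where the cutoff should sit is exactly the kind of per-CPU question the
benchmark numbers are meant to answer; the constant above is a
placeholder, not a recommendation.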

Two questions for you:

1. Do you have any strong opinions on the sendmsg flag vs a socket option?

2. If I can think of a way to avoid the indirect calls, do you think this
   series is ready for a v1? I'm not sure if there's anything major that
   needs to be addressed aside from the indirect calls.

I'll include some documentation and cosmetic cleanup in the v1, as well.

Thanks,
Joe

[1]: https://sourceware.org/pipermail/libc-alpha/2020-October/118895.html