netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joe Damato <jdamato@fastly.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, davem@davemloft.net,
	linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: [RFC,net-next,x86 0/6] Nontemporal copies in unix socket write path
Date: Thu, 12 May 2022 15:53:05 -0700	[thread overview]
Message-ID: <20220512225302.GA74948@fastly.com> (raw)
In-Reply-To: <20220512124608.452d3300@kernel.org>

On Thu, May 12, 2022 at 12:46:08PM -0700, Jakub Kicinski wrote:
> On Wed, 11 May 2022 18:01:54 -0700 Joe Damato wrote:
> > > Is there a practical use case?  
> > 
> > Yes; for us there seems to be - especially with AMD Zen2. I'll try to
> > describe such a setup and my synthetic HTTP benchmark results.
> > 
> > Imagine a program, call it storageD, which is responsible for storing and
> > retrieving data from a data store. Other programs can request data from
> > storageD via communicating with it on a Unix socket.
> > 
> > One such program that could request data via the Unix socket is an HTTP
> > daemon. For some client connections that the HTTP daemon receives, the
> > daemon may determine that responses can be sent in plain text.
> > 
> > In this case, the HTTP daemon can use splice to move data from the unix
> > socket connection with storageD directly to the client TCP socket via a
> > pipe. splice saves CPU cycles and avoids incurring any memory access
> > latency since the data itself is not accessed.
> > 
> > Because we'll use splice (instead of accessing the data and potentially
> > affecting the CPU cache) it is advantageous for storageD to use NT copies
> > when it writes to the Unix socket to avoid evicting hot data from the CPU
> > cache. After all, once the data is copied into the kernel on the unix
> > socket write path, it won't be touched again; only spliced.
> > 
> > In my synthetic HTTP benchmarks for this setup, we've been able to increase
> > network throughput of the the HTTP daemon by roughly 30% while reducing
> > the system time of storageD. We're still collecting data on production
> > workloads.
> > 
> > The motivation, IMHO, is very similar to the motivation for
> > NETIF_F_NOCACHE_COPY, as far I understand.
> > 
> > In some cases, when an application writes to a network socket the data
> > written to the socket won't be accessed again once it is copied into the
> > kernel. In these cases, NETIF_F_NOCACHE_COPY can improve performance and
> > helps to preserve the CPU cache and avoid evicting hot data.
> > 
> > We get a sizable benefit from this option, too, in situations where we
> > can't use splice and have to call write to transmit data to client
> > connections. We want to get the same benefit of NETIF_F_NOCACHE_COPY, but
> > when writing to Unix sockets as well.
> > 
> > Let me know if that makes it more clear.
> 
> Makes sense, thanks for the explainer.
> 
> > > The patches look like a lot of extra indirect calls.  
> > 
> > Yup. As I mentioned in the cover letter this was mostly a PoC that seems to
> > work and increases network throughput in a real world scenario.
> > 
> > If this general line of thinking (NT copies on write to a Unix socket) is
> > acceptable, I'm happy to refactor the code however you (and others) would
> > like to get it to an acceptable state.
> 
> My only concern is that in post-spectre world the indirect calls are
> going to be more expensive than an branch would be. But I'm not really
> a mirco-optimization expert :)

Makes sense; neither am I, FWIW :)

For whatever reason, on AMD Zen2 it seems that using non-temporal
instructions when copying data sizes above the L2 size is a huge
performance win (compared to the kernel's normal temporal copy code) even
if that size fits in L3.

This is why both NETIF_F_NOCACHE_COPY and MSG_NTCOPY from this series seem
to have such a large, measurable impact in the contrived benchmark I
included in the cover letter and also in synthetic HTTP workloads.

I'll plan on including numbers from the benchmark program on a few other
CPUs I have access to in the cover letter for any follow-up RFCs or
revisions.

As a data point, there has been similar-ish work done in glibc [1] to
determine when non-temporal copies should be used on Zen2 based on the size
of the copy. I'm certainly not a micro-arch expert by any stretch, but the
glibc work plus the benchmark results I've measured seem to suggest that
NT-copies can be very helpful on Zen2.

Two questions for you:

 1. Do you have any strong opinions on the sendmsg flag vs a socket option?

 2. If I can think of a way to avoid the indirect calls, do you think this
    series is ready for a v1? I'm not sure if there's anything major that
    needs to be addressed aside from the indirect calls.

I'll include some documentation and cosmetic cleanup in the v1, as well.

Thanks,
Joe

[1]: https://sourceware.org/pipermail/libc-alpha/2020-October/118895.html

  reply	other threads:[~2022-05-12 22:53 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-11  3:54 [RFC,net-next,x86 0/6] Nontemporal copies in unix socket write path Joe Damato
2022-05-11  3:54 ` [RFC,net-next,x86 1/6] arch, x86, uaccess: Add nontemporal copy functions Joe Damato
2022-05-11  3:54 ` [RFC,net-next 2/6] iov_iter: Allow custom copyin function Joe Damato
2022-05-11  3:54 ` [RFC,net-next 3/6] iov_iter: Add a nocache copy iov iterator Joe Damato
2022-05-11  3:54 ` [RFC,net-next 4/6] net: Add a struct for managing copy functions Joe Damato
2022-05-11  3:54 ` [RFC,net-next 5/6] net: Add a way to copy skbs without affect cache Joe Damato
2022-05-11  3:54 ` [RFC,net-next 6/6] net: unix: Add MSG_NTCOPY Joe Damato
2022-05-11 23:25 ` [RFC,net-next,x86 0/6] Nontemporal copies in unix socket write path Jakub Kicinski
2022-05-12  1:01   ` Joe Damato
2022-05-12 19:46     ` Jakub Kicinski
2022-05-12 22:53       ` Joe Damato [this message]
2022-05-12 23:12         ` Jakub Kicinski
2022-05-31  6:04 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220512225302.GA74948@fastly.com \
    --to=jdamato@fastly.com \
    --cc=davem@davemloft.net \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).