From: Joe Damato <jdamato@fastly.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org, davem@davemloft.net,
	linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: [RFC,net-next,x86 0/6] Nontemporal copies in unix socket write path
Date: Thu, 12 May 2022 15:53:05 -0700
Message-ID: <20220512225302.GA74948@fastly.com>
In-Reply-To: <20220512124608.452d3300@kernel.org>

On Thu, May 12, 2022 at 12:46:08PM -0700, Jakub Kicinski wrote:
> On Wed, 11 May 2022 18:01:54 -0700 Joe Damato wrote:
> > > Is there a practical use case?  
> > 
> > Yes; for us there seems to be - especially with AMD Zen2. I'll try to
> > describe such a setup and my synthetic HTTP benchmark results.
> > 
> > Imagine a program, call it storageD, which is responsible for storing and
> > retrieving data from a data store. Other programs can request data from
> > storageD via communicating with it on a Unix socket.
> > 
> > One such program that could request data via the Unix socket is an HTTP
> > daemon. For some client connections that the HTTP daemon receives, the
> > daemon may determine that responses can be sent in plain text.
> > 
> > In this case, the HTTP daemon can use splice to move data from the unix
> > socket connection with storageD directly to the client TCP socket via a
> > pipe. splice saves CPU cycles and avoids incurring any memory access
> > latency since the data itself is not accessed.
> > 
> > Because we'll use splice (instead of accessing the data and potentially
> > affecting the CPU cache) it is advantageous for storageD to use NT copies
> > when it writes to the Unix socket to avoid evicting hot data from the CPU
> > cache. After all, once the data is copied into the kernel on the unix
> > socket write path, it won't be touched again; only spliced.
> > 
> > In my synthetic HTTP benchmarks for this setup, we've been able to increase
> > network throughput of the HTTP daemon by roughly 30% while reducing
> > the system time of storageD. We're still collecting data on production
> > workloads.
> > 
> > The motivation, IMHO, is very similar to the motivation for
> > NETIF_F_NOCACHE_COPY, as far as I understand.
> > 
> > In some cases, when an application writes to a network socket the data
> > written to the socket won't be accessed again once it is copied into the
> > kernel. In these cases, NETIF_F_NOCACHE_COPY can improve performance and
> > helps to preserve the CPU cache and avoid evicting hot data.
> > 
> > We get a sizable benefit from this option, too, in situations where we
> > can't use splice and have to call write to transmit data to client
> > connections. We want to get the same benefit of NETIF_F_NOCACHE_COPY, but
> > when writing to Unix sockets as well.
> > 
> > Let me know if that makes it more clear.
> 
> Makes sense, thanks for the explainer.
> 
> > > The patches look like a lot of extra indirect calls.  
> > 
> > Yup. As I mentioned in the cover letter this was mostly a PoC that seems to
> > work and increases network throughput in a real world scenario.
> > 
> > If this general line of thinking (NT copies on write to a Unix socket) is
> > acceptable, I'm happy to refactor the code however you (and others) would
> > like to get it to an acceptable state.
> 
> My only concern is that in post-spectre world the indirect calls are
> going to be more expensive than a branch would be. But I'm not really
> a micro-optimization expert :)

Makes sense; neither am I, FWIW :)

For whatever reason, on AMD Zen2 it seems that using non-temporal
instructions when copying data sizes above the L2 size is a huge
performance win (compared to the kernel's normal temporal copy code) even
if that size fits in L3.

This is why both NETIF_F_NOCACHE_COPY and MSG_NTCOPY from this series seem
to have such a large, measurable impact in the contrived benchmark I
included in the cover letter and also in synthetic HTTP workloads.

I'll plan on including numbers from the benchmark program on a few other
CPUs I have access to in the cover letter for any follow-up RFCs or
revisions.

As a data point, there has been similar-ish work done in glibc [1] to
determine when non-temporal copies should be used on Zen2 based on the size
of the copy. I'm certainly not a micro-arch expert by any stretch, but the
glibc work plus the benchmark results I've measured seem to suggest that
NT-copies can be very helpful on Zen2.
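
For illustration only, here is a minimal userspace sketch of a size-gated
non-temporal copy using SSE2 streaming stores. It is not the code from this
series or from glibc: the copy_maybe_nt name and the NT_COPY_THRESHOLD value
are made up for the example, and a real cutoff would have to be tuned per CPU
(for instance relative to the L2 size).

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff, chosen only for this example. */
#define NT_COPY_THRESHOLD (512 * 1024)

/*
 * Copy n bytes from src to dst. Small copies use the regular (temporal)
 * memcpy; large copies use streaming stores so the destination does not
 * displace hot data from the CPU cache.
 */
static void *copy_maybe_nt(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t head;

	if (n < NT_COPY_THRESHOLD)
		return memcpy(dst, src, n);

	/* _mm_stream_si128 needs a 16-byte aligned destination, so copy
	 * the unaligned head (0..15 bytes) the normal way first. */
	head = (16 - ((uintptr_t)d & 15)) & 15;
	assert(head <= n);
	memcpy(d, s, head);
	d += head;
	s += head;
	n -= head;

	/* Main loop: unaligned load, non-temporal 16-byte store. */
	for (; n >= 16; d += 16, s += 16, n -= 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)s);
		_mm_stream_si128((__m128i *)d, v);
	}

	/* Streaming stores are weakly ordered; fence before returning. */
	_mm_sfence();

	/* Remaining tail (0..15 bytes). */
	memcpy(d, s, n);
	return dst;
}
```

The gate on size matters because streaming stores bypass the cache: for small
buffers that the reader will touch soon, the regular copy is usually the
better choice.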

Two questions for you:

 1. Do you have any strong opinions on the sendmsg flag vs a socket option?

 2. If I can think of a way to avoid the indirect calls, do you think this
    series is ready for a v1? I'm not sure if there's anything major that
    needs to be addressed aside from the indirect calls.
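
To make the indirect-call concern concrete, here is a small userspace sketch
contrasting the two dispatch styles. It is not taken from the patches: the
copy_cached/copy_nocache helpers are hypothetical stand-ins (both are plain
memcpy here), and the function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the regular and non-temporal copy routines. */
static size_t copy_cached(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return n;
}

static size_t copy_nocache(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);	/* a real version would use NT stores */
	return n;
}

typedef size_t (*copy_fn)(void *dst, const void *src, size_t n);

/* Style 1: the caller passes a function pointer, so the copy is reached
 * through an indirect call (a retpoline on mitigated kernels). */
static size_t do_copy_indirect(copy_fn fn, void *dst, const void *src,
			       size_t n)
{
	assert(fn != NULL);
	return fn(dst, src, n);
}

/* Style 2: the caller passes a flag, and a conditional branch selects
 * between two direct calls. */
static size_t do_copy_branch(bool nocache, void *dst, const void *src,
			     size_t n)
{
	if (nocache)
		return copy_nocache(dst, src, n);
	return copy_cached(dst, src, n);
}
```

With only two possible targets, the flag-and-branch form keeps every call
direct. The kernel's helpers in include/linux/indirect_call_wrapper.h take a
similar compare-then-call-directly approach for likely targets.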

I'll include some documentation and cosmetic cleanup in the v1, as well.

Thanks,
Joe

[1]: https://sourceware.org/pipermail/libc-alpha/2020-October/118895.html

Thread overview: 13+ messages
2022-05-11  3:54 [RFC,net-next,x86 0/6] Nontemporal copies in unix socket write path Joe Damato
2022-05-11  3:54 ` [RFC,net-next,x86 1/6] arch, x86, uaccess: Add nontemporal copy functions Joe Damato
2022-05-11  3:54 ` [RFC,net-next 2/6] iov_iter: Allow custom copyin function Joe Damato
2022-05-11  3:54 ` [RFC,net-next 3/6] iov_iter: Add a nocache copy iov iterator Joe Damato
2022-05-11  3:54 ` [RFC,net-next 4/6] net: Add a struct for managing copy functions Joe Damato
2022-05-11  3:54 ` [RFC,net-next 5/6] net: Add a way to copy skbs without affect cache Joe Damato
2022-05-11  3:54 ` [RFC,net-next 6/6] net: unix: Add MSG_NTCOPY Joe Damato
2022-05-11 23:25 ` [RFC,net-next,x86 0/6] Nontemporal copies in unix socket write path Jakub Kicinski
2022-05-12  1:01   ` Joe Damato
2022-05-12 19:46     ` Jakub Kicinski
2022-05-12 22:53       ` Joe Damato [this message]
2022-05-12 23:12         ` Jakub Kicinski
2022-05-31  6:04 ` Christoph Hellwig
