* [PATCH net] splice: do not checksum AF_UNIX sockets
@ 2024-11-29 23:20 Frederik Deweerdt
2024-11-30 1:43 ` Cong Wang
2024-12-03 16:30 ` David Howells
0 siblings, 2 replies; 7+ messages in thread
From: Frederik Deweerdt @ 2024-11-29 23:20 UTC (permalink / raw)
To: netdev; +Cc: dhowells
When `skb_splice_from_iter` was introduced, it inadvertently added
checksumming for AF_UNIX sockets. This resulted in significant
slowdowns, as when using sendfile over unix sockets.
Using the test code [1] in my test setup (2G, single core x86_64 qemu),
the client receives a 1000M file in:
- without the patch: 1577ms (+/- 36.1ms)
- with the patch: 725ms (+/- 28.3ms)
This commit skips addresses the issue by skipping checksumming when
splice occurs a AF_UNIX socket.
[1] https://gist.github.com/deweerdt/a3ee2477d1d87524cf08618d3c179f06
Signed-off-by: Frederik Deweerdt <deweerdt.lkml@gmail.com>
Fixes: 2e910b95329c ("net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES")
---
net/core/skbuff.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6841e61a6bd0..49e4f9ab625f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -7233,7 +7233,7 @@ ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter,
goto out;
}
- if (skb->ip_summed == CHECKSUM_NONE)
+ if (skb->ip_summed == CHECKSUM_NONE && skb->sk->sk_family != AF_UNIX)
skb_splice_csum_page(skb, page, off, part);
off = 0;
--
2.44.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net] splice: do not checksum AF_UNIX sockets
2024-11-29 23:20 [PATCH net] splice: do not checksum AF_UNIX sockets Frederik Deweerdt
@ 2024-11-30 1:43 ` Cong Wang
2024-11-30 4:05 ` Frederik Deweerdt
2024-11-30 12:08 ` David Laight
2024-12-03 16:30 ` David Howells
1 sibling, 2 replies; 7+ messages in thread
From: Cong Wang @ 2024-11-30 1:43 UTC (permalink / raw)
To: Frederik Deweerdt; +Cc: netdev, dhowells
On Fri, Nov 29, 2024 at 03:20:14PM -0800, Frederik Deweerdt wrote:
> When `skb_splice_from_iter` was introduced, it inadvertently added
> checksumming for AF_UNIX sockets. This resulted in significant
> slowdowns, as when using sendfile over unix sockets.
>
> Using the test code [1] in my test setup (2G, single core x86_64 qemu),
> the client receives a 1000M file in:
> - without the patch: 1577ms (+/- 36.1ms)
> - with the patch: 725ms (+/- 28.3ms)
>
> This commit skips addresses the issue by skipping checksumming when
> splice occurs a AF_UNIX socket.
>
> [1] https://gist.github.com/deweerdt/a3ee2477d1d87524cf08618d3c179f06
>
> Signed-off-by: Frederik Deweerdt <deweerdt.lkml@gmail.com>
> Fixes: 2e910b95329c ("net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES")
> ---
> net/core/skbuff.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 6841e61a6bd0..49e4f9ab625f 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -7233,7 +7233,7 @@ ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter,
> goto out;
> }
>
> - if (skb->ip_summed == CHECKSUM_NONE)
> + if (skb->ip_summed == CHECKSUM_NONE && skb->sk->sk_family != AF_UNIX)
Are you sure it is always safe to dereferene skb->sk here? I am not sure
about the KCM socket case.
Instead of checking skb->sk->sk_family, why not just pass an additional
boolean parameter to this function?
Thanks.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] splice: do not checksum AF_UNIX sockets
2024-11-30 1:43 ` Cong Wang
@ 2024-11-30 4:05 ` Frederik Deweerdt
2024-11-30 12:08 ` David Laight
1 sibling, 0 replies; 7+ messages in thread
From: Frederik Deweerdt @ 2024-11-30 4:05 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, dhowells
On Fri, Nov 29, 2024 at 05:43:04PM -0800, Cong Wang wrote:
> On Fri, Nov 29, 2024 at 03:20:14PM -0800, Frederik Deweerdt wrote:
> > When `skb_splice_from_iter` was introduced, it inadvertently added
> > checksumming for AF_UNIX sockets. This resulted in significant
> > slowdowns, as when using sendfile over unix sockets.
> >
> > Using the test code [1] in my test setup (2G, single core x86_64 qemu),
> > the client receives a 1000M file in:
> > - without the patch: 1577ms (+/- 36.1ms)
> > - with the patch: 725ms (+/- 28.3ms)
> >
> > This commit skips addresses the issue by skipping checksumming when
> > splice occurs a AF_UNIX socket.
> >
> > [1] https://gist.github.com/deweerdt/a3ee2477d1d87524cf08618d3c179f06
> >
> > Signed-off-by: Frederik Deweerdt <deweerdt.lkml@gmail.com>
> > Fixes: 2e910b95329c ("net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES")
> > ---
> > net/core/skbuff.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 6841e61a6bd0..49e4f9ab625f 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -7233,7 +7233,7 @@ ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter,
> > goto out;
> > }
> >
> > - if (skb->ip_summed == CHECKSUM_NONE)
> > + if (skb->ip_summed == CHECKSUM_NONE && skb->sk->sk_family != AF_UNIX)
>
> Are you sure it is always safe to dereferene skb->sk here? I am not sure
> about the KCM socket case.
>
> Instead of checking skb->sk->sk_family, why not just pass an additional
> boolean parameter to this function?
>
I saw that every caller was dereferrencing an `sk`, but you are right
that that a boolean would be safer.
I'll resend an updated version.
Thanks,
Frederik
^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: [PATCH net] splice: do not checksum AF_UNIX sockets
2024-11-30 1:43 ` Cong Wang
2024-11-30 4:05 ` Frederik Deweerdt
@ 2024-11-30 12:08 ` David Laight
1 sibling, 0 replies; 7+ messages in thread
From: David Laight @ 2024-11-30 12:08 UTC (permalink / raw)
To: 'Cong Wang', Frederik Deweerdt
Cc: netdev@vger.kernel.org, dhowells@redhat.com
From: Cong Wang <xiyou.wangcong@gmail.com>
> Sent: 30 November 2024 01:43
>
> On Fri, Nov 29, 2024 at 03:20:14PM -0800, Frederik Deweerdt wrote:
> > When `skb_splice_from_iter` was introduced, it inadvertently added
> > checksumming for AF_UNIX sockets. This resulted in significant
> > slowdowns, as when using sendfile over unix sockets.
> >
> > Using the test code [1] in my test setup (2G, single core x86_64 qemu),
> > the client receives a 1000M file in:
> > - without the patch: 1577ms (+/- 36.1ms)
> > - with the patch: 725ms (+/- 28.3ms)
> >
> > This commit skips addresses the issue by skipping checksumming when
> > splice occurs a AF_UNIX socket.
> >
> > [1] https://gist.github.com/deweerdt/a3ee2477d1d87524cf08618d3c179f06
> >
> > Signed-off-by: Frederik Deweerdt <deweerdt.lkml@gmail.com>
> > Fixes: 2e910b95329c ("net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES")
> > ---
> > net/core/skbuff.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 6841e61a6bd0..49e4f9ab625f 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -7233,7 +7233,7 @@ ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter,
> > goto out;
> > }
> >
> > - if (skb->ip_summed == CHECKSUM_NONE)
> > + if (skb->ip_summed == CHECKSUM_NONE && skb->sk->sk_family != AF_UNIX)
>
> Are you sure it is always safe to dereferene skb->sk here? I am not sure
> about the KCM socket case.
>
> Instead of checking skb->sk->sk_family, why not just pass an additional
> boolean parameter to this function?
A thought.
It is ever actually worth doing an 'early checksum' in software for either
TCP or UDP?
Most modern ethernet hardware supports transmit checksum offload, so there
is nothing to be gained by doing the checksum during a copy and everything
to be lost because the copy is a lot slower.
I think the code always does a checksum when copying data in send()
pretty much all the time it isn't needed.
The same is true for the delayed checksum of receive UDP.
I'm not sure what the rational for that was (and Linus can't remember),
my guess is a userspace NFS daemon.
But it seriously complicated the code.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] splice: do not checksum AF_UNIX sockets
2024-11-29 23:20 [PATCH net] splice: do not checksum AF_UNIX sockets Frederik Deweerdt
2024-11-30 1:43 ` Cong Wang
@ 2024-12-03 16:30 ` David Howells
2024-12-03 21:48 ` Cong Wang
1 sibling, 1 reply; 7+ messages in thread
From: David Howells @ 2024-12-03 16:30 UTC (permalink / raw)
To: Frederik Deweerdt; +Cc: dhowells, netdev
Frederik Deweerdt <deweerdt.lkml@gmail.com> wrote:
> - if (skb->ip_summed == CHECKSUM_NONE)
> + if (skb->ip_summed == CHECKSUM_NONE && skb->sk->sk_family != AF_UNIX)
> skb_splice_csum_page(skb, page, off, part);
Should AF_UNIX set some other CHECKSUM_* constant indicating that the checksum
is unnecessary?
David
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] splice: do not checksum AF_UNIX sockets
2024-12-03 16:30 ` David Howells
@ 2024-12-03 21:48 ` Cong Wang
2024-12-03 22:35 ` Frederik Deweerdt
0 siblings, 1 reply; 7+ messages in thread
From: Cong Wang @ 2024-12-03 21:48 UTC (permalink / raw)
To: David Howells; +Cc: Frederik Deweerdt, netdev
On Tue, Dec 03, 2024 at 04:30:22PM +0000, David Howells wrote:
> Frederik Deweerdt <deweerdt.lkml@gmail.com> wrote:
>
> > - if (skb->ip_summed == CHECKSUM_NONE)
> > + if (skb->ip_summed == CHECKSUM_NONE && skb->sk->sk_family != AF_UNIX)
> > skb_splice_csum_page(skb, page, off, part);
>
> Should AF_UNIX set some other CHECKSUM_* constant indicating that the checksum
> is unnecessary?
>
It already means unnecessary on TX path:
* - %CHECKSUM_NONE
*
* The skb was already checksummed by the protocol, or a checksum is not
* required.
Thanks.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net] splice: do not checksum AF_UNIX sockets
2024-12-03 21:48 ` Cong Wang
@ 2024-12-03 22:35 ` Frederik Deweerdt
0 siblings, 0 replies; 7+ messages in thread
From: Frederik Deweerdt @ 2024-12-03 22:35 UTC (permalink / raw)
To: Cong Wang; +Cc: David Howells, netdev
On Tue, Dec 03, 2024 at 01:48:56PM -0800, Cong Wang wrote:
> On Tue, Dec 03, 2024 at 04:30:22PM +0000, David Howells wrote:
> > Frederik Deweerdt <deweerdt.lkml@gmail.com> wrote:
> >
> > > - if (skb->ip_summed == CHECKSUM_NONE)
> > > + if (skb->ip_summed == CHECKSUM_NONE && skb->sk->sk_family != AF_UNIX)
> > > skb_splice_csum_page(skb, page, off, part);
> >
> > Should AF_UNIX set some other CHECKSUM_* constant indicating that the checksum
> > is unnecessary?
> >
>
> It already means unnecessary on TX path:
>
> * - %CHECKSUM_NONE
> *
> * The skb was already checksummed by the protocol, or a checksum is not
> * required.
>
Looking back at the patch series that introduced the checksumming [1],
it looks like `ip_output.c::ip_append_page()` was the only path doing
checksumming, it had similar looking logic:
-
- if (skb->ip_summed == CHECKSUM_NONE) {
- __wsum csum;
- csum = csum_page(page, offset, len);
- skb->csum = csum_block_add(skb->csum, csum, skb->len);
- }
-
That code in turn has been like this since the git import. I'm not sure
what that was for and how to test its intent.
Frederik
[1] https://patchwork.kernel.org/project/linux-mm/patch/20230522121125.2595254-15-dhowells@redhat.com/
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-12-03 22:35 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-29 23:20 [PATCH net] splice: do not checksum AF_UNIX sockets Frederik Deweerdt
2024-11-30 1:43 ` Cong Wang
2024-11-30 4:05 ` Frederik Deweerdt
2024-11-30 12:08 ` David Laight
2024-12-03 16:30 ` David Howells
2024-12-03 21:48 ` Cong Wang
2024-12-03 22:35 ` Frederik Deweerdt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).