* [PATCH] ip: use checksum options of first fragment also for subsequent fragments
@ 2010-12-14 7:51 Timo Juhani Lindfors
2010-12-17 19:58 ` David Miller
0 siblings, 1 reply; 7+ messages in thread
From: Timo Juhani Lindfors @ 2010-12-14 7:51 UTC (permalink / raw)
To: netdev; +Cc: Timo Juhani Lindfors
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=24832
Signed-off-by: Timo Juhani Lindfors <timo.lindfors@iki.fi>
---
net/ipv4/ip_output.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 439d2a3..c0743ed 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1167,7 +1167,7 @@ ssize_t ip_append_page(struct sock *sk, struct page *page,
/*
* Fill in the control structures
*/
- skb->ip_summed = CHECKSUM_NONE;
+ skb->ip_summed = skb_prev->ip_summed;
skb->csum = 0;
skb_reserve(skb, hh_len);
--
1.7.2.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] ip: use checksum options of first fragment also for subsequent fragments
From: David Miller @ 2010-12-17 19:58 UTC (permalink / raw)
To: timo.lindfors; +Cc: netdev
From: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Date: Tue, 14 Dec 2010 09:51:33 +0200
> Reference: https://bugzilla.kernel.org/show_bug.cgi?id=24832
> Signed-off-by: Timo Juhani Lindfors <timo.lindfors@iki.fi>
You need to provide a commit log message that actually describes,
in detail, why you are making this change and what the bug is.
Referencing a bugzilla entry is insufficient.
* [PATCH] ip: reuse ip_summed of first fragment for all subsequent fragments
From: Timo Juhani Lindfors @ 2010-12-20 10:38 UTC (permalink / raw)
To: netdev; +Cc: Timo Juhani Lindfors
Currently, if an outgoing UDP packet gets fragmented, the checksum of
the last fragment is always calculated regardless of SO_NO_CHECK. This
patch reuses the checksum options of the first fragment for all
subsequent fragments so that SO_NO_CHECK works for all fragments.
Tested with Chelsio T320 10GBASE-CX4 NIC (rev 3) PCI Express x8 MSI-X.
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=24832
Signed-off-by: Timo Juhani Lindfors <timo.lindfors@iki.fi>
---
I am not very familiar with networking code, so there's a good chance
that I have missed something important. However, with the test program
attached to the bug report, I see very real performance improvements
when sending UDP packets larger than the MTU with SO_NO_CHECK set:
Perf output before:
    46.81%  sendfileudp_tes  [kernel]  [k] csum_partial
     5.30%  sendfileudp_tes  [kernel]  [k] _spin_lock
            |
            |
            |--99.95%-- sendfile64
             --0.05%-- [...]
     4.61%  sendfileudp_tes  [kernel]  [k] ip_append_page
     3.68%  sendfileudp_tes  [kernel]  [k] ip_fragment
     2.04%  sendfileudp_tes  [kernel]  [k] _local_bh_enable_ip
     1.84%  sendfileudp_tes  [kernel]  [k] ip_finish_output2
and after:
     6.93%  sendfileudp_tes  [kernel]  [k] _spin_lock
     5.25%  sendfileudp_tes  [kernel]  [k] ip_append_page
     3.21%  sendfileudp_tes  [kernel]  [k] _spin_lock_bh
            |
            |--99.95%-- sendfile64
             --0.05%-- [...]
     3.16%  sendfileudp_tes  [kernel]  [k] _local_bh_enable_ip
     2.97%  sendfileudp_tes  [kernel]  [k] ip_finish_output2
---
net/ipv4/ip_output.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 439d2a3..c0743ed 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1167,7 +1167,7 @@ ssize_t ip_append_page(struct sock *sk, struct page *page,
/*
* Fill in the control structures
*/
- skb->ip_summed = CHECKSUM_NONE;
+ skb->ip_summed = skb_prev->ip_summed;
skb->csum = 0;
skb_reserve(skb, hh_len);
--
1.7.2.3
* Re: [PATCH] ip: reuse ip_summed of first fragment for all subsequent fragments
From: David Miller @ 2010-12-28 21:49 UTC (permalink / raw)
To: timo.lindfors; +Cc: netdev
From: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Date: Mon, 20 Dec 2010 12:38:45 +0200
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 439d2a3..c0743ed 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1167,7 +1167,7 @@ ssize_t ip_append_page(struct sock *sk, struct page *page,
> /*
> * Fill in the control structures
> */
> - skb->ip_summed = CHECKSUM_NONE;
> + skb->ip_summed = skb_prev->ip_summed;
> skb->csum = 0;
> skb_reserve(skb, hh_len);
You can't just assign skb_prev->ip_summed here; if it's CHECKSUM_PARTIAL,
things will go completely wrong. This is especially true since
we're about to zero out skb->csum.
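To make the objection concrete, here is a simplified userspace model; all model_* names are hypothetical. It assumes, as in struct sk_buff of this era, that csum shares storage in a union with csum_start/csum_offset, the fields CHECKSUM_PARTIAL uses to tell the device where to start summing and where to store the result. Copying CHECKSUM_PARTIAL and then zeroing csum leaves a fragment that claims hardware offload while both offsets are cleared.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the real definitions. */
enum model_ip_summed { MODEL_CHECKSUM_NONE, MODEL_CHECKSUM_PARTIAL };

struct model_skb {
	enum model_ip_summed ip_summed;
	union {
		uint32_t csum;                 /* running sum (software case) */
		struct {
			uint16_t csum_start;   /* where the device starts summing */
			uint16_t csum_offset;  /* where the device stores the result */
		};
	};
};

/* Mirrors the rejected hunk: inherit ip_summed, then zero csum. */
static void model_fill_new_fragment(struct model_skb *skb,
				    const struct model_skb *prev)
{
	skb->ip_summed = prev->ip_summed;
	skb->csum = 0;  /* the same store also clears csum_start/csum_offset */
}
```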
* Re: [PATCH] ip: reuse ip_summed of first fragment for all subsequent fragments
From: Timo Juhani Lindfors @ 2011-01-11 13:49 UTC (permalink / raw)
To: David Miller; +Cc: netdev
David Miller <davem@davemloft.net> writes:
> > - skb->ip_summed = CHECKSUM_NONE;
> > + skb->ip_summed = skb_prev->ip_summed;
> You can't just assign skb_prev->ip_summed here; if it's CHECKSUM_PARTIAL,
> things will go completely wrong. This is especially true since
> we're about to zero out skb->csum.
Thanks, I clearly didn't understand the semantics of ip_summed. For
outgoing packets, CHECKSUM_PARTIAL apparently means that the checksum
is to be calculated by hardware, and CHECKSUM_NONE that it is calculated
in software or is not required at all. I guess the ip_summed of a
new fragment is set to CHECKSUM_NONE because hardware cannot handle
checksums of fragments?
Anyway, the socket option SO_NO_CHECK sets sk->sk_no_check. Could this be
checked before calculating the checksum of each fragment? Currently
udp_push_pending_frames checks it, but by that point the checksums have
already been calculated (and the only job left is to sum them
together). Here's a patch that works for me (i.e. according to perf,
time is no longer spent calculating checksums) but it should probably
be reviewed carefully:
From 67fe193db5a2e1a2efaa0fa8e1c1c2554e108b78 Mon Sep 17 00:00:00 2001
From: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Date: Tue, 11 Jan 2011 14:36:23 +0200
Subject: [PATCH] ip: make sure disabling checksums really works
---
net/ipv4/ip_output.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 439d2a3..9bc163a 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -80,6 +80,7 @@
#include <linux/mroute.h>
#include <linux/netlink.h>
#include <linux/tcp.h>
+#include <net/udp.h>
int sysctl_ip_default_ttl __read_mostly = IPDEFTTL;
@@ -1208,7 +1209,7 @@ ssize_t ip_append_page(struct sock *sk, struct page *page,
goto error;
}
- if (skb->ip_summed == CHECKSUM_NONE) {
+ if (skb->ip_summed == CHECKSUM_NONE && sk->sk_no_check != UDP_CSUM_NOXMIT) {
__wsum csum;
csum = csum_page(page, offset, len);
skb->csum = csum_block_add(skb->csum, csum, skb->len);
--
1.7.2.3
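The csum_page()/csum_block_add() lines in the hunk above accumulate a partial sum per chunk, which, per the message above, udp_push_pending_frames later combines into the UDP checksum. Below is a standalone sketch of the RFC 1071 ones' complement arithmetic this relies on; fold16(), block_sum() and sum_add_at() are hypothetical names. Sums of separate blocks can be added afterwards, provided a block that starts at an odd byte offset has its sum byte-swapped. The kernel applies the equivalent adjustment to 32-bit __wsum values inside csum_block_add(), and the exact code differs, so treat this only as an illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator down to a 16-bit ones' complement sum. */
static uint16_t fold16(uint64_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement sum of buf as if it began at an even offset:
 * big-endian 16-bit words, odd trailing byte padded with zero. */
static uint16_t block_sum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return fold16(sum);
}

/* Add the sum of a block located at 'offset' within the packet. At an odd
 * offset its bytes pair up the other way round, which in ones' complement
 * arithmetic equals byte-swapping the block's sum. */
static uint16_t sum_add_at(uint16_t sum, uint16_t block, size_t offset)
{
	if (offset & 1)
		block = (uint16_t)(block << 8 | block >> 8);
	return fold16((uint64_t)sum + block);
}
```

Splitting a buffer at any point and combining the two block sums with sum_add_at() yields the same value as block_sum() over the whole buffer; the final transmitted checksum is the bitwise complement of that value.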
* Re: [PATCH] ip: reuse ip_summed of first fragment for all subsequent fragments
From: David Miller @ 2011-01-13 2:42 UTC (permalink / raw)
To: timo.lindfors; +Cc: netdev
From: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Date: Tue, 11 Jan 2011 15:49:19 +0200
> Anyway, the socket option SO_NO_CHECK sets sk->sk_no_check. Could this be
> checked before calculating the checksum of each fragment? Currently
> udp_push_pending_frames checks it, but by that point the checksums have
> already been calculated (and the only job left is to sum them
> together). Here's a patch that works for me (i.e. according to perf,
> time is no longer spent calculating checksums) but it should probably
> be reviewed carefully:
You're now not handling the code block above this one, guarded
by the "if (len <= 0)" check.
You seem to just be peppering checks all over the place rather
than coming up with a coherent, complete, fix for this problem.
* Re: [PATCH] ip: reuse ip_summed of first fragment for all subsequent fragments
From: Timo Juhani Lindfors @ 2011-02-16 9:10 UTC (permalink / raw)
To: David Miller; +Cc: netdev
David Miller <davem@davemloft.net> writes:
> You're now not handling the code block above this one, guarded
> by the "if (len <= 0)" check.
Yes, that's true. Should we use skb_copy_bits() when sk->sk_no_check ==
UDP_CSUM_NOXMIT?
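A minimal userspace sketch of the branch being proposed: copy_maybe_csum() is a hypothetical stand-in for choosing between skb_copy_bits() (plain copy) and skb_copy_and_csum_bits() (copy while summing). It models only the copy itself, not the rest of the "if (len <= 0)" block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy len bytes; unless no_check is set, also return
 * the 16-bit ones' complement sum of the copied bytes (even-offset pairing).
 * With no_check set it returns 0 and does no per-byte summing. */
static uint16_t copy_maybe_csum(uint8_t *dst, const uint8_t *src, size_t len,
				int no_check)
{
	uint64_t sum = 0;
	size_t i;

	memcpy(dst, src, len);
	if (no_check)
		return 0;
	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)src[i] << 8 | src[i + 1];
	if (len & 1)
		sum += (uint32_t)src[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```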
> You seem to just be peppering checks all over the place rather
> than coming up with a coherent, complete, fix for this problem.
I can understand that, but I'm afraid that I lack the expertise to do
that. I might be able to fix the above problem, but I can't be sure that
it is the only one. The bug report will remain at
https://bugzilla.kernel.org/show_bug.cgi?id=24832
in case somebody wants to continue from here.