netdev.vger.kernel.org archive mirror
* [PATCH] ip: use checksum options of first fragment also for subsequent fragments
@ 2010-12-14  7:51 Timo Juhani Lindfors
  2010-12-17 19:58 ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Timo Juhani Lindfors @ 2010-12-14  7:51 UTC (permalink / raw)
  To: netdev; +Cc: Timo Juhani Lindfors

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=24832
Signed-off-by: Timo Juhani Lindfors <timo.lindfors@iki.fi>
---
 net/ipv4/ip_output.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 439d2a3..c0743ed 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1167,7 +1167,7 @@ ssize_t	ip_append_page(struct sock *sk, struct page *page,
 			/*
 			 *	Fill in the control structures
 			 */
-			skb->ip_summed = CHECKSUM_NONE;
+			skb->ip_summed = skb_prev->ip_summed;
 			skb->csum = 0;
 			skb_reserve(skb, hh_len);
 
-- 
1.7.2.3


* Re: [PATCH] ip: use checksum options of first fragment also for subsequent fragments
  2010-12-14  7:51 [PATCH] ip: use checksum options of first fragment also for subsequent fragments Timo Juhani Lindfors
@ 2010-12-17 19:58 ` David Miller
  2010-12-20 10:38   ` [PATCH] ip: reuse ip_summed of first fragment for all " Timo Juhani Lindfors
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2010-12-17 19:58 UTC (permalink / raw)
  To: timo.lindfors; +Cc: netdev

From: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Date: Tue, 14 Dec 2010 09:51:33 +0200

> Reference: https://bugzilla.kernel.org/show_bug.cgi?id=24832
> Signed-off-by: Timo Juhani Lindfors <timo.lindfors@iki.fi>

You need to provide a commit log message that actually describes,
in detail, why you are making this change and what the bug is.

Referencing a bugzilla entry is insufficient.

* [PATCH] ip: reuse ip_summed of first fragment for all subsequent fragments
  2010-12-17 19:58 ` David Miller
@ 2010-12-20 10:38   ` Timo Juhani Lindfors
  2010-12-28 21:49     ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Timo Juhani Lindfors @ 2010-12-20 10:38 UTC (permalink / raw)
  To: netdev; +Cc: Timo Juhani Lindfors

Currently, if an outgoing UDP packet gets fragmented, the checksum of
the last fragment is always calculated regardless of SO_NO_CHECK. This
patch reuses the checksum options of the first fragment for all
subsequent fragments so that SO_NO_CHECK applies to every fragment.
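For reference, SO_NO_CHECK is requested from user space with setsockopt() at the SOL_SOCKET level. The sketch below is a minimal illustration only, not the test program attached to the bug report. The helper name open_udp_no_check() is hypothetical, and the fallback value 11 for SO_NO_CHECK is an assumption that matches the asm-generic definition but not every architecture.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* On Linux, SO_NO_CHECK normally comes in via <sys/socket.h>.
 * Fallback (assumption): 11 is the asm-generic value; some
 * architectures use a different number. */
#ifndef SO_NO_CHECK
#define SO_NO_CHECK 11
#endif

/* Hypothetical helper: open an IPv4 UDP socket and ask the kernel not
 * to compute UDP checksums for datagrams sent on it.
 * Returns the socket fd, or -1 on error. */
static int open_udp_no_check(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int one = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_NO_CHECK, &one, sizeof(one)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Setting the option only records the request on the socket (sk->sk_no_check in the kernel discussed here); whether every fragment of a large datagram actually skips the checksum is the question this thread is about.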

Tested with Chelsio T320 10GBASE-CX4 NIC (rev 3) PCI Express x8 MSI-X.

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=24832
Signed-off-by: Timo Juhani Lindfors <timo.lindfors@iki.fi>
---
I am not very familiar with networking code, so there's a good chance
that I have missed something important. However, with the test program
attached to the bug report I see very real performance improvements
when sending UDP packets larger than MTU with SO_NO_CHECK set:

Perf output before:

46.81%  sendfileudp_tes  [kernel]                   [k] csum_partial

 5.30%  sendfileudp_tes  [kernel]                   [k] _spin_lock
           |
           |--99.95%-- sendfile64
            --0.05%-- [...]

 4.61%  sendfileudp_tes  [kernel]                   [k] ip_append_page

 3.68%  sendfileudp_tes  [kernel]                   [k] ip_fragment

 2.04%  sendfileudp_tes  [kernel]                   [k] _local_bh_enable_ip

 1.84%  sendfileudp_tes  [kernel]                   [k] ip_finish_output2

and after:

 6.93%  sendfileudp_tes  [kernel]                   [k] _spin_lock

 5.25%  sendfileudp_tes  [kernel]                   [k] ip_append_page

 3.21%  sendfileudp_tes  [kernel]                   [k] _spin_lock_bh
           |
           |--99.95%-- sendfile64
            --0.05%-- [...]

 3.16%  sendfileudp_tes  [kernel]                   [k] _local_bh_enable_ip

 2.97%  sendfileudp_tes  [kernel]                   [k] ip_finish_output2
---
 net/ipv4/ip_output.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 439d2a3..c0743ed 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1167,7 +1167,7 @@ ssize_t	ip_append_page(struct sock *sk, struct page *page,
 			/*
 			 *	Fill in the control structures
 			 */
-			skb->ip_summed = CHECKSUM_NONE;
+			skb->ip_summed = skb_prev->ip_summed;
 			skb->csum = 0;
 			skb_reserve(skb, hh_len);
 
-- 
1.7.2.3


* Re: [PATCH] ip: reuse ip_summed of first fragment for all subsequent fragments
  2010-12-20 10:38   ` [PATCH] ip: reuse ip_summed of first fragment for all " Timo Juhani Lindfors
@ 2010-12-28 21:49     ` David Miller
  2011-01-11 13:49       ` Timo Juhani Lindfors
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2010-12-28 21:49 UTC (permalink / raw)
  To: timo.lindfors; +Cc: netdev

From: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Date: Mon, 20 Dec 2010 12:38:45 +0200

> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 439d2a3..c0743ed 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1167,7 +1167,7 @@ ssize_t	ip_append_page(struct sock *sk, struct page *page,
>  			/*
>  			 *	Fill in the control structures
>  			 */
> -			skb->ip_summed = CHECKSUM_NONE;
> +			skb->ip_summed = skb_prev->ip_summed;
>  			skb->csum = 0;
>  			skb_reserve(skb, hh_len);

You can't just assign skb_prev->ip_summed here, if it's CHECKSUM_PARTIAL
then things will go completely wrong.  This is especially true since
we're about to zero out skb->csum.
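One way to see the csum point: in kernels of this era, struct sk_buff keeps csum in a union with csum_start and csum_offset, the two fields that tell the device where to start summing and where to store the result when ip_summed is CHECKSUM_PARTIAL. The following is a simplified user-space model with a hypothetical toy_skb type, not kernel code; it mimics the proposed hunk (copy ip_summed from skb_prev, then zero csum).

```c
#include <stdint.h>

enum toy_ip_summed { TOY_CHECKSUM_NONE, TOY_CHECKSUM_PARTIAL };

/* Hypothetical, heavily simplified stand-in for struct sk_buff.
 * Assumption (matching kernels of that era): csum shares storage with
 * csum_start/csum_offset, which are only meaningful for CHECKSUM_PARTIAL. */
struct toy_skb {
    enum toy_ip_summed ip_summed;
    union {
        uint32_t csum;             /* running sum when CHECKSUM_NONE */
        struct {
            uint16_t csum_start;   /* where the device starts summing */
            uint16_t csum_offset;  /* where it stores the result */
        };
    };
};

/* Mimics the proposed hunk: inherit ip_summed from the previous
 * fragment, then zero csum as the surrounding code does. */
static void toy_init_fragment(struct toy_skb *skb, const struct toy_skb *prev)
{
    skb->ip_summed = prev->ip_summed;
    skb->csum = 0;   /* also clears csum_start and csum_offset */
}
```

In this model the new fragment claims CHECKSUM_PARTIAL while its start and offset are both zero. Separately, only the first IP fragment carries the UDP header, so later fragments have no checksum field for the device to fill in.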

* Re: [PATCH] ip: reuse ip_summed of first fragment for all subsequent fragments
  2010-12-28 21:49     ` David Miller
@ 2011-01-11 13:49       ` Timo Juhani Lindfors
  2011-01-13  2:42         ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Timo Juhani Lindfors @ 2011-01-11 13:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

David Miller <davem@davemloft.net> writes:
> > -			skb->ip_summed = CHECKSUM_NONE;
> > +			skb->ip_summed = skb_prev->ip_summed;
> You can't just assign skb_prev->ip_summed here, if it's CHECKSUM_PARTIAL
> then things will go completely wrong.  This is especially true since
> we're about to zero out skb->csum.

Thanks, I clearly didn't understand the semantics of ip_summed. For
outgoing packets, CHECKSUM_PARTIAL apparently means that the checksum
is completed by hardware, and CHECKSUM_NONE that it is calculated in
software or not at all. I guess the ip_summed of a new fragment is
set to CHECKSUM_NONE because hardware cannot handle checksums of
fragments?

Anyways, socket option SO_NO_CHECK sets sk->sk_no_check. Could this be
checked before calculating checksums of each fragment? Currently
udp_push_pending_frames checks this but checksums have already been
calculated at that point (and the only job left is to sum the
checksums together). Here's a patch that works for me (=according to
perf time is no longer spent calculating checksums) but probably
should be reviewed carefully:

From 67fe193db5a2e1a2efaa0fa8e1c1c2554e108b78 Mon Sep 17 00:00:00 2001
From: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Date: Tue, 11 Jan 2011 14:36:23 +0200
Subject: [PATCH] ip: make sure disabling checksums really works

---
 net/ipv4/ip_output.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 439d2a3..9bc163a 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -80,6 +80,7 @@
 #include <linux/mroute.h>
 #include <linux/netlink.h>
 #include <linux/tcp.h>
+#include <net/udp.h>

 int sysctl_ip_default_ttl __read_mostly = IPDEFTTL;

@@ -1208,7 +1209,7 @@ ssize_t   ip_append_page(struct sock *sk, struct page *page,
                        goto error;
                }

-               if (skb->ip_summed == CHECKSUM_NONE) {
+               if (skb->ip_summed == CHECKSUM_NONE && sk->sk_no_check != UDP_CSUM_NOXMIT) {
                        __wsum csum;
                        csum = csum_page(page, offset, len);
                        skb->csum = csum_block_add(skb->csum, csum, skb->len);
--
1.7.2.3

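The per-fragment sums can be added together later because the Internet checksum is a ones'-complement sum: partial sums over consecutive blocks combine, provided a block that starts at an odd byte offset is byte-swapped first, which is what csum_block_add() does with its offset argument in the hunk above. The sketch below models that together with the guard the patch adds. All toy_* names are hypothetical; toy_csum_partial() sums big-endian 16-bit words, a simplification of the kernel's architecture-specific csum_partial(), and the odd-offset fix-up uses a 32-bit rotate by 8, which gives the same folded result as a byte swap.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_ip_summed { TOY_CHECKSUM_NONE, TOY_CHECKSUM_PARTIAL };

/* 32-bit ones'-complement add with end-around carry (like csum_add). */
static uint32_t toy_csum_add(uint32_t a, uint32_t b)
{
    uint32_t s = a + b;
    return s + (s < a);
}

/* Unfolded sum of big-endian 16-bit words; a trailing odd byte is
 * padded with zero. Simplified stand-in for csum_partial(). */
static uint32_t toy_csum_partial(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum = toy_csum_add(sum, (uint32_t)p[i] << 8 | p[i + 1]);
    if (len & 1)
        sum = toy_csum_add(sum, (uint32_t)p[len - 1] << 8);
    return sum;
}

/* Add a block's sum at byte offset `offset`. An odd offset puts the
 * block's bytes in swapped lanes; rotating by 8 bits compensates. */
static uint32_t toy_csum_block_add(uint32_t csum, uint32_t csum2, size_t offset)
{
    if (offset & 1)
        csum2 = (csum2 >> 8) | (csum2 << 24);
    return toy_csum_add(csum, csum2);
}

/* Fold to 16 bits and complement (like csum_fold). */
static uint16_t toy_csum_fold(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

struct toy_frag {
    enum toy_ip_summed ip_summed;
    uint32_t csum;   /* running sum of payload appended so far */
    size_t len;      /* payload bytes appended so far */
};

/* Models the patched condition: only sum appended data when software
 * checksumming applies and the socket did not request no-check. */
static void toy_append(struct toy_frag *f, const uint8_t *data, size_t len, int no_check)
{
    if (f->ip_summed == TOY_CHECKSUM_NONE && !no_check)
        f->csum = toy_csum_block_add(f->csum, toy_csum_partial(data, len), f->len);
    f->len += len;
}
```

Appending a 3-byte chunk and then an 8-byte chunk yields the same folded checksum as summing all 11 bytes at once; with no_check set, no summing happens at all, which is the saving the perf output earlier in the thread shows.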

* Re: [PATCH] ip: reuse ip_summed of first fragment for all subsequent fragments
  2011-01-11 13:49       ` Timo Juhani Lindfors
@ 2011-01-13  2:42         ` David Miller
  2011-02-16  9:10           ` Timo Juhani Lindfors
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2011-01-13  2:42 UTC (permalink / raw)
  To: timo.lindfors; +Cc: netdev

From: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Date: Tue, 11 Jan 2011 15:49:19 +0200

> Anyways, socket option SO_NO_CHECK sets sk->sk_no_check. Could this be
> checked before calculating checksums of each fragment? Currently
> udp_push_pending_frames checks this but checksums have already been
> calculated at that point (and the only job left is to sum the
> checksums together). Here's a patch that works for me (=according to
> perf time is no longer spent calculating checksums) but probably
> should be reviewed carefully:

You're now not handling the code block above this one, guarded
by the "if (len <= 0)" check.

You seem to just be peppering checks all over the place rather
than coming up with a coherent, complete, fix for this problem.

* Re: [PATCH] ip: reuse ip_summed of first fragment for all subsequent fragments
  2011-01-13  2:42         ` David Miller
@ 2011-02-16  9:10           ` Timo Juhani Lindfors
  0 siblings, 0 replies; 7+ messages in thread
From: Timo Juhani Lindfors @ 2011-02-16  9:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

David Miller <davem@davemloft.net> writes:
> You're now not handling the code block above this one, guarded
> by the "if (len <= 0)" check.

Yes, that's true. Should we use skb_copy_bits() when sk->sk_no_check ==
UDP_CSUM_NOXMIT?
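To make the question concrete: as we read the ip_append_page() of that era, the len <= 0 branch copies the fraggap bytes that overhang maxfraglen from skb_prev into the new skb with skb_copy_and_csum_bits() and then subtracts their sum from skb_prev->csum with csum_sub(). The sketch below models that bookkeeping next to the plain-copy alternative (the skb_copy_bits() idea) for the no-check case. toy_buf and toy_move_fraggap() are hypothetical, and the model assumes the moved block begins at an even offset, since maxfraglen is 8-byte aligned relative to the fragment payload.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones'-complement helpers for this model (hypothetical). */
static uint32_t toy_csum_add(uint32_t a, uint32_t b)
{
    uint32_t s = a + b;
    return s + (s < a);
}

/* Like csum_sub(): add the ones'-complement negation. */
static uint32_t toy_csum_sub(uint32_t a, uint32_t b)
{
    return toy_csum_add(a, ~b);
}

static uint32_t toy_csum_partial(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum = toy_csum_add(sum, (uint32_t)p[i] << 8 | p[i + 1]);
    if (len & 1)
        sum = toy_csum_add(sum, (uint32_t)p[len - 1] << 8);
    return sum;
}

struct toy_buf {
    uint8_t data[64];
    size_t len;
    uint32_t csum;   /* running sum of data[0..len) */
};

/* Move the bytes past maxfraglen from prev to next (hypothetical model
 * of the fraggap handling). Assumes maxfraglen is even and the overhang
 * fits in next->data. */
static void toy_move_fraggap(struct toy_buf *prev, struct toy_buf *next,
                             size_t maxfraglen, int no_check)
{
    size_t fraggap = prev->len - maxfraglen;

    memcpy(next->data, prev->data + maxfraglen, fraggap);  /* plain copy */
    next->len = fraggap;
    if (no_check) {
        next->csum = 0;   /* skb_copy_bits()-style: no summing at all */
    } else {
        /* skb_copy_and_csum_bits() + csum_sub()-style bookkeeping */
        next->csum = toy_csum_partial(next->data, fraggap);
        prev->csum = toy_csum_sub(prev->csum, next->csum);
    }
    prev->len = maxfraglen;
}
```

Skipping the sum is only safe if nothing later relies on skb_prev->csum or the new skb's csum for that socket; the model does not check that, and that whole-path reasoning is what the review asks for.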

> You seem to just be peppering checks all over the place rather
> than coming up with a coherent, complete, fix for this problem.

I understand, but I'm afraid that I lack the expertise to do that. I
might be able to fix the above problem, but I can't be sure that it is
the only one. The bug report will remain at

https://bugzilla.kernel.org/show_bug.cgi?id=24832

in case somebody wants to continue from here.

Thread overview: 7+ messages
2010-12-14  7:51 [PATCH] ip: use checksum options of first fragment also for subsequent fragments Timo Juhani Lindfors
2010-12-17 19:58 ` David Miller
2010-12-20 10:38   ` [PATCH] ip: reuse ip_summed of first fragment for all " Timo Juhani Lindfors
2010-12-28 21:49     ` David Miller
2011-01-11 13:49       ` Timo Juhani Lindfors
2011-01-13  2:42         ` David Miller
2011-02-16  9:10           ` Timo Juhani Lindfors
