netfilter-devel.vger.kernel.org archive mirror
From: Daniel Borkmann <dborkman@redhat.com>
To: Julian Anastasov <ja@ssi.bg>
Cc: horms@verge.net.au, mohanreddykv@gmail.com, pablo@netfilter.org,
	lvs-devel@vger.kernel.org, netfilter-devel@vger.kernel.org,
	linux-sctp@vger.kernel.org
Subject: Re: [PATCH net] ipvs: sctp: fix checksumming by adding hardware offload support
Date: Tue, 05 Feb 2013 09:54:18 +0100
Message-ID: <5110C8BA.4080205@redhat.com>
In-Reply-To: <alpine.LFD.2.00.1302042349350.1869@ja.ssi.bg>

On 02/05/2013 12:27 AM, Julian Anastasov wrote:
> On Mon, 4 Feb 2013, Daniel Borkmann wrote:
>
>> In our test lab, we have a simple SCTP client connecting to an SCTP
>> server via an IPVS load balancer. On some machines, load balancing
>> works, but on others the initial handshake just fails, so no
>> SCTP connection whatsoever can be established.
>>
>> We observed that the SCTP INIT-ACK handshake reply from the IPVS
>> machine to the client had a correct IP checksum, but a corrupt SCTP
>> checksum when forwarded. Thus the client dropped the packet and
>> retriggered the initial handshake until all attempts
>> ran into the void.
>
> 	Hm, maybe the packet is received with a CHECKSUM_PARTIAL
> value? Is the forwarding problem on a virtual NIC or a
> real NIC?

Thanks for your feedback!

The setup was roughly as follows ...

  SCTP Client <--> IPVS Balancer <--> SCTP Server
  172.16.100.2    172.16.100.1(eth1)  192.168.100.11
                 192.168.100.1(eth2)

... where eth1 on IPVS uses the igb driver, which in fact seems to
support NETIF_F_SCTP_CSUM.

Interestingly, the path from the client to the server had no checksum
issues (no errors on the server in /proc/net/sctp/snmp), but on the
way back, from IPVS onwards to the client, the checksum is wrong.
With SCTP checksum validation turned off on the client side, it
obviously works.

I think your hint to set

  skb->ip_summed = CHECKSUM_UNNECESSARY;

might be correct, since this is also done in ip_vs_proto_tcp.c after a
full checksum calculation. I will get back to you on your questions, or
with a version 2 of the patch, a bit later, after doing some experiments.
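
A userspace model of the variant under discussion (a full software CRC32c over the linear part and all fragments, the result stored little-endian, and ip_summed then marked as already checksummed) may make the intended fix easier to reason about. This is a sketch under stated assumptions, not kernel code: struct mock_pkt, mock_sctp_snat() and the CSUM_* enum are hypothetical stand-ins for struct sk_buff, the IPVS SNAT handler and the CHECKSUM_* constants, and the CRC uses the standard CRC-32C parameters (reflected polynomial 0x82F63B78, initial value ~0, final inversion) that SCTP specifies in RFC 4960.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's CHECKSUM_* values; only the
 * names are borrowed, the numeric values here are arbitrary. */
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY, CSUM_COMPLETE, CSUM_PARTIAL };

/* Hypothetical, heavily simplified packet: a linear part ("head") that
 * holds the SCTP header at sctphoff, plus an optional fragment chain. */
struct mock_pkt {
	uint8_t *data;
	size_t headlen;
	struct mock_pkt *next_frag;
	enum csum_state ip_summed;
};

#define CRC32C_POLY 0x82F63B78u	/* CRC-32C polynomial, reflected form */

/* Bitwise CRC-32C update: slow, but easy to check by hand. */
static uint32_t crc32c_update(const uint8_t *buf, size_t len, uint32_t crc)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
	}
	return crc;
}

/* Rewrite the SCTP source port, recompute the CRC-32C in software over
 * the head and every fragment, store it, and mark the packet as done. */
static void mock_sctp_snat(struct mock_pkt *skb, size_t sctphoff,
			   uint16_t vport)
{
	uint8_t *sctph = skb->data + sctphoff;
	uint32_t crc = 0xFFFFFFFFu;		/* CRC-32C initial value */

	assert(skb->headlen >= sctphoff + 12);	/* 12-byte common header */
	sctph[0] = (uint8_t)(vport >> 8);	/* source port, network order */
	sctph[1] = (uint8_t)vport;
	/* SCTP computes the CRC with the checksum field set to zero
	 * (RFC 4960, section 6.8), so clear it before hashing. */
	sctph[8] = sctph[9] = sctph[10] = sctph[11] = 0;

	crc = crc32c_update(sctph, skb->headlen - sctphoff, crc);
	for (const struct mock_pkt *f = skb->next_frag; f; f = f->next_frag)
		crc = crc32c_update(f->data, f->headlen, crc);
	crc = ~crc;				/* final inversion */

	/* Store the result least-significant byte first (little-endian). */
	sctph[8] = (uint8_t)crc;
	sctph[9] = (uint8_t)(crc >> 8);
	sctph[10] = (uint8_t)(crc >> 16);
	sctph[11] = (uint8_t)(crc >> 24);

	/* The checksum is already complete, so record that instead of
	 * keeping whatever ip_summed value the packet arrived with. */
	skb->ip_summed = CSUM_UNNECESSARY;
}
```

Feeding the same bytes as one linear buffer or as a head plus a fragment yields the same checksum, which is the property the skb_walk_frags() loop in the patch relies on. Because the CRC is finished in software before ip_summed is changed, the result in this model does not depend on whether the output device advertises NETIF_F_SCTP_CSUM.
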

>> This patch fixes the problem by taking the hardware offload features
>> of the NIC into account (which were just ignored here), similar to what is
>
> 	So, if the NIC has no such support, the bug remains?
> Why does it work on some boxes? Where is the difference in
> the RX devices? GRO? skb->ip_summed?
>
>> done in the SCTP checksumming code itself. Also, while at it, fix
>> the checksum type: the checksum is in little-endian format (as changed
>> in commit 458f04c: sctp: Clean up sctp checksumming code). Tested by myself.
>
> 	I don't think such an optimization can work in all cases.
> skb->dev is NULL in the LOCAL_OUT hook (NAT via loopback).
> For dnat_handler, skb->dev can be present, but it holds a stale
> value: the new skb->dst is prepared by ip_vs_nat_xmit but is
> not set yet. For snat_handler, ip_vs_route_me_harder() can
> reroute, for example, when traffic from different VIPs uses
> its own ISP link (another device).
>
> 	Even if we provide the new dst to the IPVS NAT handlers,
> a netfilter rule can later change the path and reroute the
> packet to a different device. As a result, I'm not sure
> CHECKSUM_PARTIAL is handled properly for SCTP in
> dev_hard_start_xmit and skb_checksum_help for the
> case where we are rerouted to a device without hw csum
> support: skb_checksum_help only supports 16-bit checksums.
>
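
To illustrate why a 16-bit helper cannot stand in for SCTP: the TCP/UDP-style Internet checksum (RFC 1071) is a ones' complement sum, so partial sums over different pieces can simply be added and finished later, which is what makes seeding a pseudo-header sum and completing it from csum_start/csum_offset workable. SCTP's CRC32c cannot be combined by plain addition in arbitrary order, so a helper written for the former cannot produce the latter. A minimal sketch of the 16-bit sum, assuming even-length buffers; inet_sum() and inet_fold() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum, in two steps. inet_sum() accumulates
 * 16-bit big-endian words into a 32-bit ones' complement sum; it does
 * not handle odd lengths or very large buffers (roughly 128 KiB and
 * up, where the 32-bit accumulator could overflow). */
static uint32_t inet_sum(const uint8_t *buf, size_t len, uint32_t sum)
{
	assert(len % 2 == 0);
	for (size_t i = 0; i < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	return sum;
}

/* Fold the carries back in (end-around carry), then invert. */
static uint16_t inet_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xFFFFu) + (sum >> 16);
	return (uint16_t)~sum;
}
```

With the RFC 1071 example bytes 00 01 f2 03 f4 f5 f6 f7, summing the two halves in either order and folding once gives the same checksum, 0x220d. A CRC32c over the same bytes generally depends on their order, which is why it needs its own code path.
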
> 	Also, skb_transport_header is not updated yet: IPVS
> runs before such protocols, i.e. in LOCAL_IN, while
> ip_local_deliver_finish (where the iphdr is skipped) is
> called after all handlers in this hook. It is not
> valid in the FORWARD hook either (the snat_handler case).
> Still, the sctphoff value is available and can be used
> instead.
>
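
Using sctphoff rather than skb_transport_header() amounts to addressing the SCTP header as "network header + offset", as the handlers in the patch already do with skb_network_header(skb) + sctphoff, where sctphoff comes from iph->len. A minimal userspace sketch, assuming a raw IPv4 packet in a byte buffer (IPv6 extension headers are ignored); sctp_hdr_offset_v4() and sctp_set_source_port() are hypothetical helpers, not IPVS functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCTP_COMMON_HDR_LEN 12	/* ports, vtag, checksum (RFC 4960, 3.1) */
#define PROTO_SCTP 132		/* IANA protocol number for SCTP */

/* Hypothetical helper: offset of the SCTP header from the start of an
 * IPv4 packet, derived from the IHL field, or 0 if the packet is not a
 * usable IPv4/SCTP packet. Plays the role of iph->len / sctphoff. */
static size_t sctp_hdr_offset_v4(const uint8_t *pkt, size_t len)
{
	if (len < 20 || (pkt[0] >> 4) != 4)
		return 0;
	size_t ihl = (size_t)(pkt[0] & 0x0F) * 4;	/* bytes, incl. options */
	if (ihl < 20 || pkt[9] != PROTO_SCTP || len < ihl + SCTP_COMMON_HDR_LEN)
		return 0;
	return ihl;
}

/* Hypothetical helper: rewrite the source port found at
 * network header + sctphoff, without any transport-header pointer. */
static void sctp_set_source_port(uint8_t *pkt, size_t sctphoff, uint16_t port)
{
	assert(sctphoff >= 20);
	pkt[sctphoff] = (uint8_t)(port >> 8);	/* network byte order */
	pkt[sctphoff + 1] = (uint8_t)port;
}
```

Because the offset is recomputed from the IP header itself, the result is the same in LOCAL_IN, FORWARD or LOCAL_OUT, regardless of whether some later stage has advanced a transport-header pointer.
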
> 	Maybe you can instead test a change that adds the
> missing skb->ip_summed = CHECKSUM_UNNECESSARY in
> both handlers, together with the endianness change from
> the mentioned commit. Maybe crc32 should be changed from
> __be32 to __u32, using the
> sctph->checksum = sctp_end_cksum(crc32) variant?
> I hope this will fix the wrong checksum in the forwarding
> path if it is caused by the wrong ip_summed value.
>
>> Cc: Venkata Mohan Reddy <mohanreddykv@gmail.com>
>> Cc: Simon Horman <horms@verge.net.au>
>> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
>> ---
>>   net/netfilter/ipvs/ip_vs_proto_sctp.c | 41 ++++++++++++++++++++---------------
>>   1 file changed, 23 insertions(+), 18 deletions(-)
>>
>> diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
>> index 746048b..dc41622 100644
>> --- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
>> +++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
>> @@ -61,14 +61,33 @@ sctp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
>>   	return 1;
>>   }
>>
>> +static void sctp_nat_csum(struct sk_buff *skb, sctp_sctphdr_t *sctph,
>> +			  unsigned int sctphoff)
>> +{
>> +	struct sk_buff *iter;
>> +
>> +	/* Calculate the checksum */
>> +	if (!(skb->dev->features & NETIF_F_SCTP_CSUM)) {
>> +		__u32 crc32 = sctp_start_cksum((__u8 *)sctph,
>> +					       skb_headlen(skb) - sctphoff);
>> +		skb_walk_frags(skb, iter)
>> +			crc32 = sctp_update_cksum((u8 *) iter->data,
>> +						  skb_headlen(iter), crc32);
>> +		sctph->checksum = sctp_end_cksum(crc32);
>> +	} else {
>> +		/* no need to seed pseudo checksum for SCTP */
>> +		skb->ip_summed = CHECKSUM_PARTIAL;
>> +		skb->csum_start = (skb_transport_header(skb) - skb->head);
>> +		skb->csum_offset = offsetof(struct sctphdr, checksum);
>> +	}
>> +}
>> +
>>   static int
>>   sctp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
>>   		  struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
>>   {
>>   	sctp_sctphdr_t *sctph;
>>   	unsigned int sctphoff = iph->len;
>> -	struct sk_buff *iter;
>> -	__be32 crc32;
>>
>>   #ifdef CONFIG_IP_VS_IPV6
>>   	if (cp->af == AF_INET6 && iph->fragoffs)
>> @@ -92,13 +111,7 @@ sctp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
>>   	sctph = (void *) skb_network_header(skb) + sctphoff;
>>   	sctph->source = cp->vport;
>>
>> -	/* Calculate the checksum */
>> -	crc32 = sctp_start_cksum((u8 *) sctph, skb_headlen(skb) - sctphoff);
>> -	skb_walk_frags(skb, iter)
>> -		crc32 = sctp_update_cksum((u8 *) iter->data, skb_headlen(iter),
>> -				          crc32);
>> -	crc32 = sctp_end_cksum(crc32);
>> -	sctph->checksum = crc32;
>> +	sctp_nat_csum(skb, sctph, sctphoff);
>>
>>   	return 1;
>>   }
>> @@ -109,8 +122,6 @@ sctp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
>>   {
>>   	sctp_sctphdr_t *sctph;
>>   	unsigned int sctphoff = iph->len;
>> -	struct sk_buff *iter;
>> -	__be32 crc32;
>>
>>   #ifdef CONFIG_IP_VS_IPV6
>>   	if (cp->af == AF_INET6 && iph->fragoffs)
>> @@ -134,13 +145,7 @@ sctp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
>>   	sctph = (void *) skb_network_header(skb) + sctphoff;
>>   	sctph->dest = cp->dport;
>>
>> -	/* Calculate the checksum */
>> -	crc32 = sctp_start_cksum((u8 *) sctph, skb_headlen(skb) - sctphoff);
>> -	skb_walk_frags(skb, iter)
>> -		crc32 = sctp_update_cksum((u8 *) iter->data, skb_headlen(iter),
>> -					  crc32);
>> -	crc32 = sctp_end_cksum(crc32);
>> -	sctph->checksum = crc32;
>> +	sctp_nat_csum(skb, sctph, sctphoff);
>>
>>   	return 1;
>>   }
>> --
>> 1.7.11.7
>
> Regards
>
> --
> Julian Anastasov <ja@ssi.bg>
>

Thread overview: 3+ messages
2013-02-04 17:27 ` [PATCH net] ipvs: sctp: fix checksumming by adding hardware offload support Daniel Borkmann
2013-02-04 23:27   ` Julian Anastasov
2013-02-05  8:54     ` Daniel Borkmann [this message]
