netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH]NET: Add ECN support for TSO
@ 2006-07-08  1:01 Michael Chan
  2006-07-08 20:32 ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Chan @ 2006-07-08  1:01 UTC (permalink / raw)
  To: David Miller; +Cc: ravinandan.arakali, herbert, netdev

David Miller wrote:

> Ok, this is correct.  TSO should simply replicate the NS bit
> because the cumulative ACK is going to be the same in each
> and every packet emitted.
> 
> 

Thanks for verifying.  However, Large Receive Offload will be
a different story.  If packets are accumulated in the hardware
and presented to the stack as one large packet, the stack will
not be able to calculate the cumulative NS correctly.  Unless
the hardware calculates the partial NS over the LRO packet and
puts it in the SKB when handing over the packet.


^ permalink raw reply	[flat|nested] 25+ messages in thread
* RE: [PATCH]NET: Add ECN support for TSO
@ 2006-07-14 16:12 Dan Reader
  2006-07-26 19:40 ` Michael Chan
  0 siblings, 1 reply; 25+ messages in thread
From: Dan Reader @ 2006-07-14 16:12 UTC (permalink / raw)
  To: Michael Chan, Ravinandan Arakali, David Miller
  Cc: herbert, netdev, steve.arden, Leonid Grossman, Ananda Raju



> My take is that the ECT code point should also be replicated on
> all TSO segments.  When RFC3540 is implemented, the stack will
> randomly use ECT(0) or ECT(1) on the TSO super segments.  On
> each of these super segments, the hardware will replicate the ECT
> code point.  This will keep everything stateless.  The stack
> only needs to know the TSO factor (number of divided segements)
> for each TSO super segement to keep track of the NS.

My concern with this approach is that it goes even farther from the
spirit of RFC3540 than our other proposals.  The intent of the RFC, as I
understand it, is to protect against misbehaving receivers by using a
parity code that the sender knows and the receiver can't reliably guess.
If a receiver gets a Congestion Experienced codepoint, it shouldn't be
able to determine what the original codepoint was and thereby ignore its
ECN requirements.

If we implement the approach you suggest and all data sent uses TSO, the
receiver can easily determine the expected codepoint of almost any
isolated CE packet.  It simply has to look at the segment before and the
segment after (which it can wait for, thanks to delayed ACKs).  If they
have the same ECT value, use that one.  If they're different, then it
just has to perform a check for non-MSS (i.e. the boundary between TSOs
if send ops do not yield a multiple of MSS) and it might know.

>From my point of view, the one thing we have to do for TSO is make sure
that at least one packet per send op have a random codepoint that has no
greater chance of being "guessed" by the receiver than packets sent
without TSO.

To critique the other options, then, I see the following problems: Using
ECT(0) for all but one packet will mean that only one packet per TSO
can't be reliably guessed by the receiver.  Assuming all packets sent in
a TSO have an equal probability of being flagged with CE when it occurs,
a misbehaving receiver will have an increased ability to reliably ignore
and override CE that is in proportion to the number of packets generated
by a single TSO.  Using an adapter randomized value is better because
we're then generating a parity sum across all segments of the TSO.  It's
potential for compromise lies in the ability of the receiver to never
ACK the last received segment of a TSO.  If it ACKs other packets, the
sender wouldn't be able to check the sum...  I still prefer that
approach, though, because it yields more robustness if security
enhancements are built into the sender (e.g. occasionally send non-TSO
packets of MSS size or use TSO ops that yield an integral multiple of
MSS segments; in either case, the sender obfuscates the points where the
sum is being checked).

Yet another option would be to add a field to the TSO descriptor that
indicates a random segment on which to oppose the rule or achieve the
desired NS (e.g. when replicating, if all other composite segments are
using ECT(0) or ECT(1), set the opposite on the indicated segment).
Then, the sender will get a point per TSO where it can check the ACKed
NS, that can't be reliably guessed by the receiver...


^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH]NET: Add ECN support for TSO
@ 2006-07-13 19:35 Michael Chan
  2006-07-14  5:03 ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Chan @ 2006-07-13 19:35 UTC (permalink / raw)
  To: ravinandan.arakali, David Miller
  Cc: herbert, netdev, steve.arden, Dan.Reader, leonid.grossman,
	ananda.raju


Ravinandan Arakali wrote:
> But one of the issues is the random/pseudo-random generation of
> ECT code points on each of these segments.

My take is that the ECT code point should also be replicated on
all TSO segments.  When RFC3540 is implemented, the stack will
randomly use ECT(0) or ECT(1) on the TSO super segments.  On
each of these super segments, the hardware will replicate the ECT
code point.  This will keep everything stateless.  The stack
only needs to know the TSO factor (number of divided segements)
for each TSO super segement to keep track of the NS.


^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH]NET: Add ECN support for TSO
@ 2006-07-12  4:53 Michael Chan
  2006-07-12  6:11 ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Chan @ 2006-07-12  4:53 UTC (permalink / raw)
  To: David Miller, ravinandan.arakali; +Cc: herbert, netdev


David Miller wrote:

> On receive?  There is no reason for skb->sk to be anything other than
> NULL on receive, the networking stack hasn't even seen the packet yet.
> Only the driver has seen the skb.
> 
> 

Yeah, here's roughly how LRO should work for ECN:

If CE, ECE, or CWR is set on an incoming packet, hw should stop LRO
on that TCP stream and indicate the packet(s) to the driver right
away.

In the future if RFC 3540 is implemented, hw needs to do the above
and keep track of ECT(0) and ECT(1) on incoming packets.  It needs
to do a partial sum or parity on the ECT codes and give it to the
driver when an LRO packet is ready.

There is no reason to find out if ECN is enabled or not for any
TCP connections.  Hw just needs to watch the above bits in the
incoming packets.


^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH]NET: Add ECN support for TSO
@ 2006-07-07 20:57 Michael Chan
  2006-07-07 21:59 ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Chan @ 2006-07-07 20:57 UTC (permalink / raw)
  To: ravinandan.arakali, davem, herbert; +Cc: netdev


Ravinandan Arakali wrote:
> Michael,
> Are network cards expected to be aware-of and implement 
> RFC3540(ECN with
> nonces) ?
> 
Hi Ravi,

RFC3540 is still experimental I believe and it is not implemented
in the Linux stack.  Even if it is implemented and the NS bit is set,
I think the TSO logic can simply replicate the NS bit as each TSO
divided segment contains the same acknowledgement information from
the original super segment.

We should hear Herbert, David, or others' opinions on this.



^ permalink raw reply	[flat|nested] 25+ messages in thread
* [PATCH]NET: Add ECN support for TSO
@ 2006-06-28  3:06 Michael Chan
  2006-06-28  3:10 ` Herbert Xu
  2006-07-07 18:56 ` Ravinandan Arakali
  0 siblings, 2 replies; 25+ messages in thread
From: Michael Chan @ 2006-06-28  3:06 UTC (permalink / raw)
  To: davem, herbert; +Cc: netdev

In the current TSO implementation, NETIF_F_TSO and ECN cannot be
turned on together in a TCP connection.  The problem is that most
hardware that supports TSO does not handle CWR correctly if it is set
in the TSO packet.  Correct handling requires CWR to be set in the
first packet only if it is set in the TSO header.

This patch adds the ability to turn on NETIF_F_TSO and ECN using 
GSO if necessary to handle TSO packets with CWR set.  Hardware
that handles CWR correctly can turn on NETIF_F_TSO_ECN in the dev->
features flag.

All TSO packets with CWR set will have the SKB_GSO_TCPV4_ECN set.  If
the output device does not have the NETIF_F_TSO_ECN feature set, GSO
will split the packet up correctly with CWR only set in the first
segment.

It is further assumed that all hardware will handle ECE properly by
replicating the ECE flag in all segments.  If that is not the case, a
simple extension of the logic will be required.


Signed-off-by: Michael Chan <mchan@broadcom.com>


diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index efd1e2a..f393de2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -316,6 +316,7 @@ struct net_device
 #define NETIF_F_TSO		(SKB_GSO_TCPV4 << NETIF_F_GSO_SHIFT)
 #define NETIF_F_UFO		(SKB_GSO_UDPV4 << NETIF_F_GSO_SHIFT)
 #define NETIF_F_GSO_ROBUST	(SKB_GSO_DODGY << NETIF_F_GSO_SHIFT)
+#define NETIF_F_TSO_ECN		(SKB_GSO_TCPV4_ECN << NETIF_F_GSO_SHIFT)
 
 #define NETIF_F_GEN_CSUM	(NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
 #define NETIF_F_ALL_CSUM	(NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)
@@ -1002,6 +1003,11 @@ static inline int netif_needs_gso(struct
 	return !skb_gso_ok(skb, dev->features);
 }
 
+static inline int tso_ecn_capable(unsigned long features)
+{
+	return ((features & NETIF_F_GSO) || (features & NETIF_F_TSO_ECN));
+}
+
 #endif /* __KERNEL__ */
 
 #endif	/* _LINUX_DEV_H */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 5fb72da..e74c294 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -175,6 +175,9 @@ enum {
 
 	/* This indicates the skb is from an untrusted source. */
 	SKB_GSO_DODGY = 1 << 2,
+
+	/* This indicates the tcp segment has CWR set. */
+	SKB_GSO_TCPV4_ECN = 1 << 3,
 };
 
 /** 
diff --git a/include/net/sock.h b/include/net/sock.h
index 2d8d6ad..2c75172 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1033,7 +1033,8 @@ static inline void sk_setup_caps(struct 
 	if (sk->sk_route_caps & NETIF_F_GSO)
 		sk->sk_route_caps |= NETIF_F_TSO;
 	if (sk->sk_route_caps & NETIF_F_TSO) {
-		if (sock_flag(sk, SOCK_NO_LARGESEND) || dst->header_len)
+		if ((sock_flag(sk, SOCK_NO_LARGESEND) &&
+		    !tso_ecn_capable(sk->sk_route_caps)) || dst->header_len)
 			sk->sk_route_caps &= ~NETIF_F_TSO;
 		else 
 			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
diff --git a/include/net/tcp_ecn.h b/include/net/tcp_ecn.h
index c6b8439..871dca2 100644
--- a/include/net/tcp_ecn.h
+++ b/include/net/tcp_ecn.h
@@ -31,7 +31,8 @@ static inline void TCP_ECN_send_syn(stru
 				    struct sk_buff *skb)
 {
 	tp->ecn_flags = 0;
-	if (sysctl_tcp_ecn && !(sk->sk_route_caps & NETIF_F_TSO)) {
+	if (sysctl_tcp_ecn && (!(sk->sk_route_caps & NETIF_F_TSO) ||
+			       tso_ecn_capable(sk->sk_route_caps))) {
 		TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_ECE|TCPCB_FLAG_CWR;
 		tp->ecn_flags = TCP_ECN_OK;
 		sock_set_flag(sk, SOCK_NO_LARGESEND);
@@ -56,6 +57,9 @@ static inline void TCP_ECN_send(struct s
 			if (tp->ecn_flags&TCP_ECN_QUEUE_CWR) {
 				tp->ecn_flags &= ~TCP_ECN_QUEUE_CWR;
 				skb->h.th->cwr = 1;
+				if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+					skb_shinfo(skb)->gso_type |=
+						SKB_GSO_TCPV4_ECN;
 			}
 		} else {
 			/* ACK or retransmitted segment: clear ECT|CE */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bdd71db..c4a4dba 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2044,7 +2044,8 @@ struct sk_buff * tcp_make_synack(struct 
 	memset(th, 0, sizeof(struct tcphdr));
 	th->syn = 1;
 	th->ack = 1;
-	if (dst->dev->features&NETIF_F_TSO)
+	if ((dst->dev->features & NETIF_F_TSO) &&
+	    !tso_ecn_capable(dst->dev->features))
 		ireq->ecn_ok = 0;
 	TCP_ECN_make_synack(req, th);
 	th->source = inet_sk(sk)->sport;



^ permalink raw reply related	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2006-07-26 19:39 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-08  1:01 [PATCH]NET: Add ECN support for TSO Michael Chan
2006-07-08 20:32 ` David Miller
2006-07-12  1:45   ` Ravinandan Arakali
2006-07-12  1:51     ` David Miller
2006-07-13 17:26   ` Ravinandan Arakali
  -- strict thread matches above, loose matches on Subject: below --
2006-07-14 16:12 Dan Reader
2006-07-26 19:40 ` Michael Chan
2006-07-13 19:35 Michael Chan
2006-07-14  5:03 ` David Miller
2006-07-12  4:53 Michael Chan
2006-07-12  6:11 ` David Miller
2006-07-12 17:15   ` Ravinandan Arakali
2006-07-07 20:57 Michael Chan
2006-07-07 21:59 ` David Miller
2006-07-07 22:52   ` David Miller
2006-06-28  3:06 Michael Chan
2006-06-28  3:10 ` Herbert Xu
2006-06-28  3:40   ` Michael Chan
2006-06-28  3:48     ` Herbert Xu
2006-06-28  4:37       ` Michael Chan
2006-06-28  4:42         ` Herbert Xu
2006-06-28  4:54           ` Michael Chan
2006-06-28  4:57             ` Herbert Xu
2006-06-29 19:30         ` David Miller
2006-07-07 18:56 ` Ravinandan Arakali

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).