netdev.vger.kernel.org archive mirror
* [RFC/PATCH] "strict" ipv4 reassembly
@ 2005-05-17 16:18 Arthur Kepner
  2005-05-17 17:49 ` David S. Miller
  0 siblings, 1 reply; 80+ messages in thread
From: Arthur Kepner @ 2005-05-17 16:18 UTC (permalink / raw)
  To: netdev


The Problem
-----------

There's a well-known problem with IPv4 fragmentation/
reassembly - the 16-bit IP ID can uniquely identify
only 65536 datagrams, and a gigabit/sec source can
emit that many datagrams in a few seconds. Since
fragments can sit on a reassembly queue for a "long
time" (30 seconds is the default in Linux), there's a
real possibility of reassembling fragments from
different datagrams.
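As a rough illustration (mine, not part of the original mail), the
wrap window follows directly from the link rate and datagram size:

```c
#include <stdint.h>

/* Seconds for a sender to cycle through all 65536 IP IDs when
 * emitting back-to-back datagrams of datagram_bytes each at
 * link_bits_per_sec.  Each datagram consumes one IP ID, no
 * matter how many fragments it is later split into. */
double ip_id_wrap_seconds(double link_bits_per_sec,
                          double datagram_bytes)
{
    double datagrams_per_sec =
        link_bits_per_sec / (datagram_bytes * 8.0);
    return 65536.0 / datagrams_per_sec;
}
```

At 1 Gbit/s with 8 KB NFS/UDP datagrams this comes out to about
4.3 seconds - well inside the default 30-second reassembly
timeout, so ID reuse while stale fragments still sit in the queue
is entirely plausible.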

The Bigger Problem
------------------

This is mainly a problem for UDP, where IP fragmentation 
at the source is not uncommon. The UDP checksum is only 
16 bits, so there's a non-negligible chance (roughly 
1 in 65536 for random corruption) that it won't detect 
an incorrectly reassembled datagram. The result can be 
silent data corruption, and much unhappiness. 
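The checksum's weakness here is structural, not just probabilistic:
the RFC 1071 ones'-complement sum is invariant under reordering of
16-bit words, so fragments glued back in the wrong order produce
the same checksum. A userspace sketch (my illustration, not kernel
code):

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over a buffer of 16-bit words.
 * The ones'-complement sum is commutative, so swapping whole
 * 16-bit words - e.g. fragments reassembled in the wrong
 * order on an aligned boundary - leaves it unchanged. */
uint16_t inet_checksum(const uint16_t *words, size_t nwords)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < nwords; i++)
        sum += words[i];
    while (sum >> 16)                 /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Swap any two 16-bit words of the payload and the checksum is
identical; mixing in words from a *different* datagram is caught
only with the ~1-in-65536 probability above.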

What next?
----------

This is a fundamental problem with IP(v4), which was 
designed before people dreamed of gigabit/sec network 
links. There's no simple, completely effective fix that 
preserves compatibility. 

There has been at least a little discussion here 
before:
http://marc.theaimsgroup.com/?l=linux-netdev&m=108987122412812&w=2

A simple, obvious, partial remedy is to tune 
"sysctl_ipfrag_time" down.

Another simple change, which preserves compatibility in 
practice, is "strict ipv4 reassembly". Being "strict" about 
reassembly means that you don't allow gaps or overlaps in 
the reassembly queue. Fragments which would introduce gaps 
or overlaps are dropped. Disallowing gaps and overlaps also 
implies that:

  1) Fragments must arrive in order (or in reverse order) -
     out of order fragments are dropped. 

  2) If the first (or last) fragment of a datagram arrives 
     and there's already a reassembly queue for that (proto, 
     ipid, src, dst), then the existing reassembly queue is
     dropped, and a new one started.
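The two rules above reduce to a small edge test on a queue that is
always one contiguous byte run. A hypothetical userspace sketch
(function and parameter names are mine, not the patch's; queue
*creation*, which only first/last fragments may trigger, is a
separate check, as in the patch's ip_frag_create()):

```c
#include <stdbool.h>

/* The strict queue already covers bytes [start, end).  A new
 * fragment [off, off + len) is accepted only if it butts exactly
 * against one edge of that run - no gap, no overlap - which is
 * what forces in-order or reverse-order arrival. */
bool strict_accept(unsigned int start, unsigned int end,
                   unsigned int off, unsigned int len)
{
    if (start == end)       /* empty queue: creation policy     */
        return true;        /* (first/last only) checked earlier */
    if (off == end)         /* in-order: extends the tail        */
        return true;
    if (off + len == start) /* reverse order: extends the head   */
        return true;
    return false;           /* gap or overlap: drop the fragment */
}
```

With 1480-byte fragments, for example, a fragment at offset 1480
extends a queue covering [0, 1480), while one at offset 2960 would
leave a hole and is dropped.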

Strict reassembly has been very effective in practice at 
preventing incorrect IP reassembly. (We have tests which 
attempt to produce and then detect corruption produced 
by bad IP reassembly.)

Following is a patch which implements strict reassembly. 
Strict reassembly is off by default (so the default 
behavior is unchanged). It is enabled by doing 
"echo 1 > /proc/sys/net/ipv4/strict_reassembly".

Patch is against 2.6.12-rc4.

 include/linux/sysctl.h     |    1
 net/ipv4/ip_fragment.c     |  189 ++++++++++++++++++++++++++++++++++++++++++++-
 net/ipv4/sysctl_net_ipv4.c |    9 ++
 3 files changed, 198 insertions(+), 1 deletion(-)

Signed-off-by: Arthur Kepner <akepner@sgi.com>

diff -pur linux.old/include/linux/sysctl.h linux.new/include/linux/sysctl.h
--- linux.old/include/linux/sysctl.h	2005-05-16 16:29:55.236056917 -0700
+++ linux.new/include/linux/sysctl.h	2005-05-17 00:57:16.889588271 -0700
@@ -347,6 +347,7 @@ enum
 	NET_TCP_MODERATE_RCVBUF=106,
 	NET_TCP_TSO_WIN_DIVISOR=107,
 	NET_TCP_BIC_BETA=108,
+	NET_IPV4_STRICT_REASM=109,
 };
 
 enum {
diff -pur linux.old/net/ipv4/ip_fragment.c linux.new/net/ipv4/ip_fragment.c
--- linux.old/net/ipv4/ip_fragment.c	2005-05-16 16:30:18.610281655 -0700
+++ linux.new/net/ipv4/ip_fragment.c	2005-05-17 00:53:53.243293277 -0700
@@ -56,6 +56,9 @@
 int sysctl_ipfrag_high_thresh = 256*1024;
 int sysctl_ipfrag_low_thresh = 192*1024;
 
+/* strict reassembly is off by default */
+int sysctl_strict_reassembly = 0;
+
 /* Important NOTE! Fragment queue must be destroyed before MSL expires.
  * RFC791 is wrong proposing to prolongate timer each fragment arrival by TTL.
  */
@@ -353,6 +356,24 @@ static struct ipq *ip_frag_create(unsign
 {
 	struct ipq *qp;
 
+	if (sysctl_strict_reassembly) {
+		/* make a new reassembly queue iff this is the first or
+		 * last fragment */
+		int frag_off = ntohs(iph->frag_off);
+		int offset   = (frag_off & IP_OFFSET);
+
+		if ((offset == 0) && !(frag_off & IP_MF)) {
+		/* first fragment and no more are coming -
+			 * nonsense, drop the fragment */
+			return NULL;
+		}
+		if ((offset != 0) && (frag_off & IP_MF)) {
+			/* if this isn't either the first or last fragment,
+			 * don't make a new reassembly queue */
+			return NULL;
+		}
+	}
+
 	if ((qp = frag_alloc_queue()) == NULL)
 		goto out_nomem;
 
@@ -381,6 +402,35 @@ out_nomem:
 	return NULL;
 }
 
+/* a reassembly queue match has been found and we're being
+ * "strict" about reassembly. If this fragment is first (last)
+ * in a reassembly queue which already has a first (last)
+ * fragment, then drop the existing reassembly queue. Return
+ * 1 if the existing queue was dropped, 0 otherwise.
+ *
+ * This function is called with (and returns with) a read_lock
+ * on ipfrag_lock
+ */
+static int __ip_find_strict_check(const struct iphdr *iph, struct ipq *qp)
+{
+	int frag_off = ntohs(iph->frag_off);
+	int offset   = (frag_off & IP_OFFSET);
+
+	if (((offset == 0) && (qp->last_in & FIRST_IN)) ||
+		(!(frag_off & IP_MF) && (qp->last_in & LAST_IN))) {
+		atomic_inc(&qp->refcnt);
+		read_unlock(&ipfrag_lock);
+		spin_lock(&qp->lock);
+		if (!(qp->last_in&COMPLETE))
+			ipq_kill(qp);
+		spin_unlock(&qp->lock);
+		ipq_put(qp, NULL);
+		read_lock(&ipfrag_lock);
+		return 1;
+	}
+	return 0;
+}
+
 /* Find the correct entry in the "incomplete datagrams" queue for
  * this IP datagram, and create new one, if nothing is found.
  */
@@ -400,6 +450,10 @@ static inline struct ipq *ip_find(struct
 		   qp->daddr == daddr	&&
 		   qp->protocol == protocol &&
 		   qp->user == user) {
+			if (sysctl_strict_reassembly &&
+				__ip_find_strict_check(iph, qp)) {
+				break;
+			}
 			atomic_inc(&qp->refcnt);
 			read_unlock(&ipfrag_lock);
 			return qp;
@@ -549,6 +603,136 @@ err:
 	kfree_skb(skb);
 }
 
+/* Add new segment to existing queue using "strict" semantics. 
+ * The segment is rejected if it introduces gaps in the reassembly 
+ * queue or overlaps with any existing fragments in the reassembly 
+ * queue. 
+ */
+static void ip_frag_queue_strict(struct ipq *qp, struct sk_buff *skb)
+{
+	struct sk_buff *prev, *next;
+	int flags, offset;
+	int ihl, end;
+
+	if (qp->last_in & COMPLETE)
+		goto err;
+
+ 	offset = ntohs(skb->nh.iph->frag_off);
+	flags = offset & ~IP_OFFSET;
+	offset &= IP_OFFSET;
+	offset <<= 3;		/* offset is in 8-byte chunks */
+ 	ihl = skb->nh.iph->ihl * 4;
+
+	/* Determine the position of this fragment. */
+ 	end = offset + skb->len - ihl;
+
+	if ((!(flags & IP_MF) && (qp->last_in & LAST_IN)) ||
+		((offset == 0) && (qp->last_in & FIRST_IN))) {
+		/* This can happen only if sysctl_strict_reassembly 
+		 * was toggled on after we did ip_find() for this 
+		 * fragment.
+		 */
+		goto err;
+	}
+
+	/* Is this the final fragment? */
+	if ((flags & IP_MF) == 0) {
+		/* If we already have some bits beyond end 
+		 * drop this fragment.
+		 */
+		if (end < qp->len)
+			goto err;
+		qp->len = end;
+	} else {
+		if (end&7) {
+			end &= ~7;
+			if (skb->ip_summed != CHECKSUM_UNNECESSARY)
+				skb->ip_summed = CHECKSUM_NONE;
+		}
+		if (end > qp->len) {
+			/* Some bits beyond end -> corruption. */
+			if (qp->last_in & LAST_IN)
+				goto err;
+			qp->len = end;
+		}
+	}
+	if (end == offset)
+		goto err;
+
+	if (pskb_pull(skb, ihl) == NULL)
+		goto err;
+	if (pskb_trim(skb, end-offset))
+		goto err;
+
+	/* Find out which fragments are in front and at the back of us
+	 * in the chain of fragments so far.  We must know where to put
+	 * this fragment, right?
+	 */
+	prev = NULL;
+	for(next = qp->fragments; next != NULL; next = next->next) {
+		if (FRAG_CB(next)->offset >= offset)
+			break;	/* bingo! */
+		prev = next;
+	}
+
+	WARN_ON(prev && next);
+
+	/* We found where to put this one. Make sure there are no overlaps */
+
+	if (prev) {
+		int i = (FRAG_CB(prev)->offset + prev->len) - offset;
+
+		/* if i > 0, this fragment overlaps with the previous 
+		 * fragment. if i < 0, this fragment would introduce a 
+		 * "hole" in the reassembly chain. Drop it in either 
+		 * case. 
+		 */
+
+		if (i != 0) goto err;
+
+	}
+
+	if (next) {
+		int i = end - FRAG_CB(next)->offset;
+
+		/* if i > 0, this fragment overlaps with the next.
+		 * if i < 0,  this fragment would introduce a
+		 * "hole" in the reassembly chain. Drop it in either
+		 * case.
+		 */
+
+		if (i != 0) goto err;
+	}
+
+	FRAG_CB(skb)->offset = offset;
+
+	/* Insert this fragment in the chain of fragments. */
+	skb->next = next;
+	if (prev)
+		prev->next = skb;
+	else
+		qp->fragments = skb;
+
+ 	if (skb->dev)
+ 		qp->iif = skb->dev->ifindex;
+	skb->dev = NULL;
+	qp->stamp = skb->stamp;
+	qp->meat += skb->len;
+	atomic_add(skb->truesize, &ip_frag_mem);
+	if (offset == 0)
+		qp->last_in |= FIRST_IN;
+	if ((flags & IP_MF) == 0) 
+		qp->last_in |= LAST_IN;
+
+	write_lock(&ipfrag_lock);
+	list_move_tail(&qp->lru_list, &ipq_lru_list);
+	write_unlock(&ipfrag_lock);
+
+	return;
+
+err:
+	kfree_skb(skb);
+}
 
 /* Build a new IP datagram from all its fragments. */
 
@@ -661,7 +845,10 @@ struct sk_buff *ip_defrag(struct sk_buff
 
 		spin_lock(&qp->lock);
 
-		ip_frag_queue(qp, skb);
+		if (sysctl_strict_reassembly)
+			ip_frag_queue_strict(qp, skb);
+		else
+			ip_frag_queue(qp, skb);
 
 		if (qp->last_in == (FIRST_IN|LAST_IN) &&
 		    qp->meat == qp->len)
diff -pur linux.old/net/ipv4/sysctl_net_ipv4.c linux.new/net/ipv4/sysctl_net_ipv4.c
--- linux.old/net/ipv4/sysctl_net_ipv4.c	2005-05-16 16:30:31.773480692 -0700
+++ linux.new/net/ipv4/sysctl_net_ipv4.c	2005-05-16 17:18:07.125138363 -0700
@@ -29,6 +29,7 @@ extern int sysctl_ipfrag_low_thresh;
 extern int sysctl_ipfrag_high_thresh; 
 extern int sysctl_ipfrag_time;
 extern int sysctl_ipfrag_secret_interval;
+extern int sysctl_strict_reassembly;
 
 /* From ip_output.c */
 extern int sysctl_ip_dynaddr;
@@ -258,6 +259,14 @@ ctl_table ipv4_table[] = {
 		.strategy	= &sysctl_jiffies
 	},
 	{
+		.ctl_name	= NET_IPV4_STRICT_REASM,
+		.procname	= "strict_reassembly",
+		.data		= &sysctl_strict_reassembly,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
+	{
 		.ctl_name	= NET_IPV4_TCP_KEEPALIVE_TIME,
 		.procname	= "tcp_keepalive_time",
 		.data		= &sysctl_tcp_keepalive_time,


--
Arthur

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 16:18 [RFC/PATCH] "strict" ipv4 reassembly Arthur Kepner
@ 2005-05-17 17:49 ` David S. Miller
  2005-05-17 18:28   ` Arthur Kepner
                     ` (2 more replies)
  0 siblings, 3 replies; 80+ messages in thread
From: David S. Miller @ 2005-05-17 17:49 UTC (permalink / raw)
  To: akepner; +Cc: netdev

From: Arthur Kepner <akepner@sgi.com>
Date: Tue, 17 May 2005 09:18:26 -0700 (PDT)

>   1) Fragments must arrive in order (or in reverse order) -
>      out of order fragments are dropped. 

Even the most simplistic flow over the real internet
can get slight packet reordering.

Heck, reordering happens on SMP on any network.

IP is supposed to be resilient to side effects of network
topology, and one such common side effect is packet reordering.
It's common, it's fine, and the networking stack deals with it
gracefully.  Strict reassembly does not.

Sure it's off by default, but isn't it a better idea
to use NFS over TCP instead?

Decreasing ipfrag_time is also not an option, because then
you break fragmentation for packet radio folks :-)


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 17:49 ` David S. Miller
@ 2005-05-17 18:28   ` Arthur Kepner
  2005-05-17 18:48     ` David S. Miller
  2005-05-17 18:38   ` Andi Kleen
  2005-05-17 22:11   ` Herbert Xu
  2 siblings, 1 reply; 80+ messages in thread
From: Arthur Kepner @ 2005-05-17 18:28 UTC (permalink / raw)
  To: David S.Miller; +Cc: netdev


On Tue, 17 May 2005, David S.Miller wrote:

> ....
> IP is supposed to be resilient to side effects of network
> topology, and one such common side effect is packet reordering.
> It's common, it's fine, and the networking stack deals with it
> gracefully.  Strict reassembly does not.
> 

IP was designed a looong time ago. I think it's reasonable to 
make (or at least allow for) some accommodation now that 
networking bandwidths have gone up by several orders of 
magnitude. (And while we wait for IPv6 to catch on ;-) 

>
> Sure it's off by default, but isn't it a better idea
> to use NFS over TCP instead?
> 

This isn't limited to NFS, of course, though that's the 
application of most concern. I know that we have customers 
who, for good or bad reasons, _do_ use NFS over UDP. 

> Decreasing ipfrag_time is also not an option, because then
> you break fragmentation for packet radio folks :-)

Different sysctls for different folks....

--
Arthur


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 17:49 ` David S. Miller
  2005-05-17 18:28   ` Arthur Kepner
@ 2005-05-17 18:38   ` Andi Kleen
  2005-05-17 18:45     ` Pekka Savola
                       ` (3 more replies)
  2005-05-17 22:11   ` Herbert Xu
  2 siblings, 4 replies; 80+ messages in thread
From: Andi Kleen @ 2005-05-17 18:38 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, akepner

"David S. Miller" <davem@davemloft.net> writes:

> From: Arthur Kepner <akepner@sgi.com>
> Date: Tue, 17 May 2005 09:18:26 -0700 (PDT)
>
>>   1) Fragments must arrive in order (or in reverse order) -
>>      out of order fragments are dropped. 
>
> Even the most simplistic flow over the real internet
> can get slight packet reordering.
>
> Heck, reordering happens on SMP on any network.
>
> IP is supposed to be resilient to side effects of network
> topology, and one such common side effect is packet reordering.
> It's common, it's fine, and the networking stack deals with it
> gracefully.  Strict reassembly does not.

If anything it would be better as a per route flag.
Then you could set it only for your local network
where you know Gigabit happens and reordering might
be avoidable in some cases.

-Andi

P.S.: Arthur I think your arguments would have more
force if you published the test program that demonstrates the
corruption.


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 18:38   ` Andi Kleen
@ 2005-05-17 18:45     ` Pekka Savola
  2005-05-17 18:50     ` David S. Miller
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 80+ messages in thread
From: Pekka Savola @ 2005-05-17 18:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David S. Miller, netdev, akepner

On Tue, 17 May 2005, Andi Kleen wrote:
> P.S.: Arthur I think your arguments would have more
> force if you published the test program that demonstrates the
> corruption.

If more explicit analytical or theoretical background would be 
convincing, look no further than 
http://www.watersprings.org/pub/id/draft-mathis-frag-harmful-00.txt

-- 
Pekka Savola                 "You each name yourselves king, yet the
Netcore Oy                    kingdom bleeds."
Systems. Networks. Security. -- George R.R. Martin: A Clash of Kings


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 18:28   ` Arthur Kepner
@ 2005-05-17 18:48     ` David S. Miller
  2005-05-17 20:21       ` Arthur Kepner
  0 siblings, 1 reply; 80+ messages in thread
From: David S. Miller @ 2005-05-17 18:48 UTC (permalink / raw)
  To: akepner; +Cc: netdev

From: Arthur Kepner <akepner@sgi.com>
Date: Tue, 17 May 2005 11:28:05 -0700 (PDT)

> On Tue, 17 May 2005, David S.Miller wrote:
> 
> > Decreasing ipfrag_time is also not an option, because then
> > you break fragmentation for packet radio folks :-)
> 
> Different sysctls for different folks....

Can I tell users to call you when they enable the strict
fragmentation and they can no longer talk UDP to
remote sites outside of their subnet, or it breaks
on their heavily SMP machine due to natural system local
packet reordering?

Packet reordering happens on the local machine with SMP.
There is no way to avoid this.  And when it triggers your
patch will drop frags on the ground all the time.

If you want to fix things, do it without knowingly breaking
stuff that does currently work.


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 18:38   ` Andi Kleen
  2005-05-17 18:45     ` Pekka Savola
@ 2005-05-17 18:50     ` David S. Miller
  2005-05-17 18:56       ` Rick Jones
  2005-05-17 18:57     ` John Heffner
  2005-05-17 19:01     ` Nivedita Singhvi
  3 siblings, 1 reply; 80+ messages in thread
From: David S. Miller @ 2005-05-17 18:50 UTC (permalink / raw)
  To: ak; +Cc: netdev, akepner

From: Andi Kleen <ak@muc.de>
Date: Tue, 17 May 2005 20:38:25 +0200

> If anything it would be better as a per route flag.
> Then you could set it only for your local network
> where you know Gigabit happens and reordering might
> be avoidable in some cases.

This is still bogus, on SMP we get packet reordering all the time.

> P.S.: Arthur I think your arguments would have more
> force if you published the test program that demonstrates the
> corruption.

I already know it happens, and that IBM has gotten NFS corruption due
to this problem.  Using NFS over UDP is stupid.

I find it ironic to claim that folks are "stuck with UDP over NFS" but
were able to upgrade their networking technology to gigabit.


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 18:50     ` David S. Miller
@ 2005-05-17 18:56       ` Rick Jones
  0 siblings, 0 replies; 80+ messages in thread
From: Rick Jones @ 2005-05-17 18:56 UTC (permalink / raw)
  To: netdev

> I already know it happens, and that IBM has gotten NFS corruption due
> to this problem.  

Frankengrams can be fun :)

> Using NFS over UDP is stupid.
> 
> I find it ironic to claim that folks are "stuck with UDP over NFS" but
> were able to upgrade their networking technology to gigabit.

It would seem new NIC drivers ship more easily than new NFS/OS bits in some places.

rick  jones


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 18:38   ` Andi Kleen
  2005-05-17 18:45     ` Pekka Savola
  2005-05-17 18:50     ` David S. Miller
@ 2005-05-17 18:57     ` John Heffner
  2005-05-17 19:09       ` David S. Miller
  2005-05-17 19:17       ` Andi Kleen
  2005-05-17 19:01     ` Nivedita Singhvi
  3 siblings, 2 replies; 80+ messages in thread
From: John Heffner @ 2005-05-17 18:57 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David S. Miller, netdev, akepner

On Tuesday 17 May 2005 02:38 pm, Andi Kleen wrote:
> "David S. Miller" <davem@davemloft.net> writes:
> > From: Arthur Kepner <akepner@sgi.com>
> > Date: Tue, 17 May 2005 09:18:26 -0700 (PDT)
> >
> >>   1) Fragments must arrive in order (or in reverse order) -
> >>      out of order fragments are dropped.
> >
> > Even the most simplistic flow over the real internet
> > can get slight packet reordering.
> >
> > Heck, reordering happens on SMP on any network.
> >
> > IP is supposed to be resilient to side effects of network
> > topology, and one such common side effect is packet reordering.
> > It's common, it's fine, and the networking stack deals with it
> > gracefully.  Strict reassembly does not.
>
> If anything it would be better as a per route flag.
> Then you could set it only for your local network
> where you know Gigabit happens and reordering might
> be avoidable in some cases.

It would be better still to have a per-route packet reassembly timeout in 
milliseconds.  Expecting perfect ordering seems very fragile even for a LAN.

  -John


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 18:38   ` Andi Kleen
                       ` (2 preceding siblings ...)
  2005-05-17 18:57     ` John Heffner
@ 2005-05-17 19:01     ` Nivedita Singhvi
  2005-05-17 19:13       ` Rick Jones
  3 siblings, 1 reply; 80+ messages in thread
From: Nivedita Singhvi @ 2005-05-17 19:01 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David S. Miller, netdev, akepner

Andi Kleen wrote:

> "David S. Miller" <davem@davemloft.net> writes:
> 
> 
>>From: Arthur Kepner <akepner@sgi.com>
>>Date: Tue, 17 May 2005 09:18:26 -0700 (PDT)
>>
>>
>>>  1) Fragments must arrive in order (or in reverse order) -
>>>     out of order fragments are dropped. 
>>
>>Even the most simplistic flow over the real internet
>>can get slight packet reordering.
>>
>>Heck, reordering happens on SMP on any network.
>>
>>IP is supposed to be resilient to side effects of network
>>topology, and one such common side effect is packet reordering.
>>It's common, it's fine, and the networking stack deals with it
>>gracefully.  Strict reassembly does not.
> 
> 
> If anything it would be better as a per route flag.
> Then you could set it only for your local network
> where you know Gigabit happens and reordering might
> be avoidable in some cases.
> 
> -Andi
> 
> P.S.: Arthur I think your arguments would have more
> force if you published the test program that demonstrates the
> corruption.

When we first ran into this, the dropping of
out-of-order fragments and overlapped fragments
was considered by us, but we finally did not
employ it precisely because of the ordering
requirement.

This is a fast LAN problem - real internet latencies
don't allow for the wrapping of the id field that fast.

Reordering does happen frequently on an SMP (this was
a non-NAPI environment, NAPI reduces it quite a bit)
so even local gigabit low latency LANs tend to suffer
from it. You really need to be running on a UP to be
entirely safe.

The problem is exacerbated by NFS mount sizes of at least
4K or 8K - thus running NFS over UDP is just an
environment you have to avoid in any case. That doesn't
take care of the other apps, of course.

So you cannot deploy a solution like this over all
interfaces and all routes - perhaps, as Andi says,
a per-route flag (turned on by the sysadmin when
running on a UP or NAPI case) might help. But you'd
have to do this very carefully.

thanks,
Nivedita


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 18:57     ` John Heffner
@ 2005-05-17 19:09       ` David S. Miller
  2005-05-17 19:21         ` Rick Jones
                           ` (2 more replies)
  2005-05-17 19:17       ` Andi Kleen
  1 sibling, 3 replies; 80+ messages in thread
From: David S. Miller @ 2005-05-17 19:09 UTC (permalink / raw)
  To: jheffner; +Cc: ak, netdev, akepner

From: John Heffner <jheffner@psc.edu>
Date: Tue, 17 May 2005 14:57:38 -0400

> It would be better still to have a per-route packet reassembly timeout in 
> milliseconds.

I agree.  And if we can setup the infrastructure such that the
drivers can indicate the speed of the link they are communicating
on, then we can set sane default values on the automatically
created subnet routes.

These would need to be refreshed when link state changes, but
we have the mechanics for that kind of device event stuff
already.


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 19:01     ` Nivedita Singhvi
@ 2005-05-17 19:13       ` Rick Jones
  2005-05-17 19:25         ` Nivedita Singhvi
  0 siblings, 1 reply; 80+ messages in thread
From: Rick Jones @ 2005-05-17 19:13 UTC (permalink / raw)
  To: netdev


> This is a fast LAN problem - real internet latencies
> don't allow for the wrapping of the id field that fast.

What is a "fast" LAN these days?  We were seeing IP ID wrap with NFS traffic 
back in the time of HP-UX 8.07 (early 1990's) over 10 Mbit/s Ethernet.  Since 
I'm waxing historic, part of the problem at that time was giving the IP 
fragments to the driver one at a time rather than all at once.  When the system 
was as fast or faster than the network, and the driver queue filled, the first 
couple fragments of a datagram would get into the queue, and the rest would be 
dropped, setting the stage for those wonderful Frankengrams.  Part of the fix 
was to require a driver to take an entire IP datagram's worth of fragments or 
none of them.

And there was at least one CKO NIC from that era (the HP-PB FDDI NIC) where the 
first IP datagram fragment was sent last :)

If the WAN connected system is (still?) using a global IP ID rather than 
per-route, it could quite easily be wrapping.  And we have WAN links with 
bandwidths of 10's of Megabits, so it also comes-down to how much other traffic 
is going and the quantity of request parallelism in NFS right?

The larger NFS UDP mount sizes mean more fragments, but intriguingly, they also 
mean slower wrap of the IP ID space :)

rick jones


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 18:57     ` John Heffner
  2005-05-17 19:09       ` David S. Miller
@ 2005-05-17 19:17       ` Andi Kleen
  2005-05-17 19:56         ` David Stevens
  1 sibling, 1 reply; 80+ messages in thread
From: Andi Kleen @ 2005-05-17 19:17 UTC (permalink / raw)
  To: John Heffner; +Cc: David S. Miller, netdev, akepner

John Heffner <jheffner@psc.edu> writes:

>
> It would be better still to have a per-route packet reassembly timeout in 
> milliseconds.  Expecting perfect ordering seems very fragile even for a LAN.

In fact I implemented that a long time ago for the SUSE 2.4 kernel.
But for some reason nobody liked it very much, so I never submitted it.

I can dig out the old patches if there is interest.

-Andi


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 19:09       ` David S. Miller
@ 2005-05-17 19:21         ` Rick Jones
  2005-05-17 19:26         ` Ben Greear
  2005-05-17 20:48         ` Thomas Graf
  2 siblings, 0 replies; 80+ messages in thread
From: Rick Jones @ 2005-05-17 19:21 UTC (permalink / raw)
  To: netdev

David S. Miller wrote:
> From: John Heffner <jheffner@psc.edu>
> Date: Tue, 17 May 2005 14:57:38 -0400
> 
> 
>>It would be better still to have a per-route packet reassembly timeout in 
>>milliseconds.
> 
> 
> I agree.  And if we can setup the infrastructure such that the
> drivers can indicate the speed of the link they are communicating
> on, then we can set sane default values on the automatically
> created subnet routes.

Does the ingress link really tell us all that much about the path a given 
datagram's fragments took to get to us?  Even if the source IP is ostensibly a 
local one?

rick jones


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 19:13       ` Rick Jones
@ 2005-05-17 19:25         ` Nivedita Singhvi
  2005-05-17 19:31           ` John Heffner
  2005-05-17 19:33           ` Rick Jones
  0 siblings, 2 replies; 80+ messages in thread
From: Nivedita Singhvi @ 2005-05-17 19:25 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev

Rick Jones wrote:

> 
>> This is a fast LAN problem - real internet latencies
>> don't allow for the wrapping of the id field that fast.
> 
> 
> What is a "fast" LAN these days?  We were seeing IP ID wrap with NFS 
> traffic back in the time of HP-UX 8.07 (early 1990's) over 10 Mbit/s 
> Ethernet.  Since I'm waxing historic, part of the problem at that time 
> was giving the IP fragments to the driver one at a time rather than all 
> at once.  When the system was as fast or faster than the network, and 
> the driver queue filled, the first couple fragments of a datagram would 
> get into the queue, and the rest would be dropped, setting the stage for 
> those wonderful Frankengrams.  Part of the fix was to require a driver 
> to take an entire IP datagram's worth of fragments or none of them.

Yes, a different manifestation of the problem ;). I was talking
in the context of current Linux code and the common ethernet drivers
and typical current Internet/Intranet latencies.

> And there was at least one CKO NIC from that era (the HP-PB FDDI NIC) 
> where the first IP datagram fragment was sent last :)
> 
> If the WAN connected system is (still?) using a global IP ID rather than 
> per-route, it could quite easily be wrapping.  And we have WAN links 
> with bandwidths of 10's of Megabits, so it also comes-down to how much 
> other traffic is going and the quantity of request parallelism in NFS 
> right?

Actually, the problem is much worse now - we have virtual
partitions in the Xen environment for instance where some
packets are headed for local consumption (virtual network,
no actual network latency to speak of) and some going
out to the network. Having a global IP id generator just
won't be able to keep up - we could wrap in submilliseconds...

> The larger NFS UDP mount sizes mean more fragments, but intriguingly, 
> they also mean slower wrap of the IP ID space :)

True, but in a 32 NIC environment, see how they wrap ;)...

thanks,
Nivedita


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 19:09       ` David S. Miller
  2005-05-17 19:21         ` Rick Jones
@ 2005-05-17 19:26         ` Ben Greear
  2005-05-17 20:48         ` Thomas Graf
  2 siblings, 0 replies; 80+ messages in thread
From: Ben Greear @ 2005-05-17 19:26 UTC (permalink / raw)
  To: David S. Miller; +Cc: jheffner, ak, netdev, akepner

David S. Miller wrote:
> From: John Heffner <jheffner@psc.edu>
> Date: Tue, 17 May 2005 14:57:38 -0400
> 
> 
>>It would be better still to have a per-route packet reassembly timeout in 
>>milliseconds.
> 
> 
> I agree.  And if we can setup the infrastructure such that the
> drivers can indicate the speed of the link they are communicating
> on, then we can set sane default values on the automatically
> created subnet routes.

I assume you mean more than local physical link speed?  Imagine:

GigE server -- gige-core -- 10bt hub -- gige-core -- gige client

Now, this particular setup would be lame, but the 10bt hub could
really be a bridged WAN, wireless or other slow and not-so-easily
upgraded network link.

Local link speed is relatively easy to figure out using ethtool for the
NICs that support it at least....

Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 19:25         ` Nivedita Singhvi
@ 2005-05-17 19:31           ` John Heffner
  2005-05-17 19:52             ` Nivedita Singhvi
  2005-05-17 19:33           ` Rick Jones
  1 sibling, 1 reply; 80+ messages in thread
From: John Heffner @ 2005-05-17 19:31 UTC (permalink / raw)
  To: Nivedita Singhvi, netdev; +Cc: Rick Jones

On Tuesday 17 May 2005 03:25 pm, Nivedita Singhvi wrote:
> Yes, a different manifestation of the problem ;). I was talking
> in the context of current Linux code and the common ethernet drivers
> and typical current Internet/Intranet latencies.

The problem is latency independent.

  -John


* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 19:25         ` Nivedita Singhvi
  2005-05-17 19:31           ` John Heffner
@ 2005-05-17 19:33           ` Rick Jones
  2005-05-17 19:53             ` Andi Kleen
  1 sibling, 1 reply; 80+ messages in thread
From: Rick Jones @ 2005-05-17 19:33 UTC (permalink / raw)
  To: netdev

> Actually, the problem is much worse now - we have virtual
> partitions in the Xen environment for instance where some
> packets are headed for local consumption (virtual network,
> no actual network latency to speak of) and some going
> out to the network. Having a global IP id generator just
> won't be able to keep up - we could wrap in submilliseconds...

and the classic TCP sequence number isn't _really_ all that far behind :)

>> The larger NFS UDP mount sizes mean more fragments, but intriguingly, 
>> they also mean slower wrap of the IP ID space :)
> 
> 
> True, but in a 32 NIC environment, see how they wrap ;)...

Yep - I'd thought that just about everyone had gone to per-dest or per-route IP 
ID's by now, but even then

rick

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 19:31           ` John Heffner
@ 2005-05-17 19:52             ` Nivedita Singhvi
  2005-05-17 20:05               ` John Heffner
  0 siblings, 1 reply; 80+ messages in thread
From: Nivedita Singhvi @ 2005-05-17 19:52 UTC (permalink / raw)
  To: John Heffner; +Cc: netdev, Rick Jones

John Heffner wrote:
> On Tuesday 17 May 2005 03:25 pm, Nivedita Singhvi wrote:
> 
>>Yes, a different manifestation of the problem ;). I was talking
>>in the context of current Linux code and the common ethernet drivers
>>and typical current Internet/Intranet latencies.
> 
> 
> The problem is latency independent.

Not entirely - because our reassembly timeout expires, typically.
(If I understand you right).

thanks,
Nivedita

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 19:33           ` Rick Jones
@ 2005-05-17 19:53             ` Andi Kleen
  0 siblings, 0 replies; 80+ messages in thread
From: Andi Kleen @ 2005-05-17 19:53 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev

Rick Jones <rick.jones2@hp.com> writes:

>> Actually, the problem is much worse now - we have virtual
>> partitions in the Xen environment for instance where some
>> packets are headed for local consumption (virtual network,
>> no actual network latency to speak of) and some going
>> out to the network. Having a global IP id generator just
>> won't be able to keep up - we could wrap in submilliseconds...
>
> and the classic TCP sequence number isn't _really_ all that far behind :)

But it has PAWS at least. But I agree even IPv6 fragmentation
with 32bit IDs is not significantly safer.

-Andi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 19:17       ` Andi Kleen
@ 2005-05-17 19:56         ` David Stevens
  2005-05-17 20:17           ` Andi Kleen
  2005-05-17 20:29           ` John Heffner
  0 siblings, 2 replies; 80+ messages in thread
From: David Stevens @ 2005-05-17 19:56 UTC (permalink / raw)
  To: Andi Kleen; +Cc: akepner, David S. Miller, John Heffner, netdev, netdev-bounce

Dave,
        Shouldn't that be an estimator on the destination RTT? Or is that
what you meant? A fast link locally that traverses a slow path along the
way would time out too fast.
        Of course, getting an RTT estimate if there hasn't been TCP
traffic is a complication. :-)

                                                +-DLS

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 19:52             ` Nivedita Singhvi
@ 2005-05-17 20:05               ` John Heffner
  2005-05-17 20:12                 ` Rick Jones
  0 siblings, 1 reply; 80+ messages in thread
From: John Heffner @ 2005-05-17 20:05 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: netdev

On Tuesday 17 May 2005 03:52 pm, Nivedita Singhvi wrote:
> John Heffner wrote:
> > On Tuesday 17 May 2005 03:25 pm, Nivedita Singhvi wrote:
> >>Yes, a different manifestation of the problem ;). I was talking
> >>in the context of current Linux code and the common ethernet drivers
> >>and typical current Internet/Intranet latencies.
> >
> > The problem is latency independent.
>
> Not entirely - because our reassembly timeout expires, typically.
> (If I understand you right).

Timer expiration is not dependent on latency, nor is IP ID wrap time.

  -John

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 20:05               ` John Heffner
@ 2005-05-17 20:12                 ` Rick Jones
  0 siblings, 0 replies; 80+ messages in thread
From: Rick Jones @ 2005-05-17 20:12 UTC (permalink / raw)
  To: netdev


> Timer expiration is not dependent on latency, nor is IP ID wrap time.

IP ID wrap time depends on the rate at which IP datagrams are generated.  That 
will depend on latency if the stuff sitting above IP is latency sensitive and/or 
has not increased its "window" (classic TCP, or number of outstanding NFS 
requests etc.) to compensate for latency.

So, it is certainly true that one cannot ass-u-me that a high latency path 
cannot have IP ID wrap, but latency can indeed be a factor in how quickly IP IDs 
will wrap in a given situation.
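rick's point lends itself to a quick back-of-the-envelope sketch (the rates below are purely illustrative guesses, not measurements):

```python
# time for the 16-bit IP ID space to wrap at a given datagram rate;
# the rates used below are illustrative, not measured values.

def id_wrap_seconds(datagrams_per_sec):
    """Seconds until all 65536 IDs have been consumed at this rate."""
    return 65536.0 / datagrams_per_sec

# a sender pushing ~15000 datagrams/sec (e.g. bulk UDP over GigE)
# wraps in roughly 4.4s, well inside a 30s ipfrag_time:
print(id_wrap_seconds(15000))
# the same stack behind a window-limited, high-latency path might only
# manage ~150 datagrams/sec, wrapping in roughly 437s:
print(id_wrap_seconds(150))
```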

rick jones

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 19:56         ` David Stevens
@ 2005-05-17 20:17           ` Andi Kleen
  2005-05-17 20:22             ` David S. Miller
  2005-05-17 20:29           ` John Heffner
  1 sibling, 1 reply; 80+ messages in thread
From: Andi Kleen @ 2005-05-17 20:17 UTC (permalink / raw)
  To: David Stevens
  Cc: akepner, David S. Miller, John Heffner, netdev, netdev-bounce

David Stevens <dlstevens@us.ibm.com> writes:

> Dave,
>         Shouldn't that be an estimator on the destination RTT? Or is that
> what you meant? A fast link locally that traverses a slow path along the
> way would time out too fast.
>         Of course, getting an RTT estimate if there hasn't been TCP
> traffic is a complication. :-)

At least for a directly connected flat network you could get it from ARP.

-Andi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 18:48     ` David S. Miller
@ 2005-05-17 20:21       ` Arthur Kepner
  0 siblings, 0 replies; 80+ messages in thread
From: Arthur Kepner @ 2005-05-17 20:21 UTC (permalink / raw)
  To: David S.Miller; +Cc: netdev


On Tue, 17 May 2005, David S.Miller wrote:

> ....
> Can I tell users to call you when they enable the strict
> fragmentation and they can no longer talk UDP to
> remote sites outside of their subnet, or it breaks
> on their heavily SMP machine due to natural system local
> packet reordering?
> ....

I'd much rather take that call than one from a customer 
whose data has been quietly corrupted.

--
Arthur

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 20:17           ` Andi Kleen
@ 2005-05-17 20:22             ` David S. Miller
  2005-05-17 20:27               ` Andi Kleen
  0 siblings, 1 reply; 80+ messages in thread
From: David S. Miller @ 2005-05-17 20:22 UTC (permalink / raw)
  To: ak; +Cc: dlstevens, akepner, jheffner, netdev, netdev-bounce

From: Andi Kleen <ak@muc.de>
Date: Tue, 17 May 2005 22:17:36 +0200

> At least for a directly connected flat network you could get it from ARP.

So we can make this host based, and store that "arp RTT" thing in the
inetpeer cache.  :-)

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 20:22             ` David S. Miller
@ 2005-05-17 20:27               ` Andi Kleen
  2005-05-17 21:02                 ` David S. Miller
  0 siblings, 1 reply; 80+ messages in thread
From: Andi Kleen @ 2005-05-17 20:27 UTC (permalink / raw)
  To: David S. Miller; +Cc: dlstevens, akepner, jheffner, netdev, netdev-bounce

On Tue, May 17, 2005 at 01:22:02PM -0700, David S. Miller wrote:
> From: Andi Kleen <ak@muc.de>
> Date: Tue, 17 May 2005 22:17:36 +0200
> 
> > At least for a directly connected flat network you could get it from ARP.
> 
> So we can make this host based, and store that "arp RTT" thing in the
> inetpeer cache.  :-)

Arghl, you said the "i" word....
Anyways, I think the neighbour cache is fully appropriate, is it not?

But it's not clear such a hack would be worth it anyways. 
If anything it would be probably better to let mountd set the RTTs, e.g. 
implicitly with MSG_CONFIRM (I hope it is using it these days ...) 

-Andi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 19:56         ` David Stevens
  2005-05-17 20:17           ` Andi Kleen
@ 2005-05-17 20:29           ` John Heffner
  1 sibling, 0 replies; 80+ messages in thread
From: John Heffner @ 2005-05-17 20:29 UTC (permalink / raw)
  To: David Stevens; +Cc: Andi Kleen, akepner, David S. Miller, netdev

On Tuesday 17 May 2005 03:56 pm, David Stevens wrote:
> Dave,
>         Shouldn't that be an estimator on the destination RTT? Or is that

Unfortunately, it's not the RTT that matters, but link speed.
You could probably do some heuristics on this, but I don't know if it's worth 
the effort.

IMO, high-rate applications which rely on IP fragmentation and a 16-bit 
ones-complement checksum for data integrity are broken, and a band-aid style 
workaround for them is probably good enough.  As long as it's not inflicted 
on the rest of us. :)

  -John

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 19:09       ` David S. Miller
  2005-05-17 19:21         ` Rick Jones
  2005-05-17 19:26         ` Ben Greear
@ 2005-05-17 20:48         ` Thomas Graf
  2 siblings, 0 replies; 80+ messages in thread
From: Thomas Graf @ 2005-05-17 20:48 UTC (permalink / raw)
  To: David S. Miller; +Cc: jheffner, ak, netdev, akepner

* David S. Miller <20050517.120950.74749758.davem@davemloft.net> 2005-05-17 12:09
> From: John Heffner <jheffner@psc.edu>
> Date: Tue, 17 May 2005 14:57:38 -0400
> 
> > It would be better still to have a per-route packet reassembly timeout in 
> > milliseconds.
> 
> I agree.  And if we can setup the infrastructure such that the
> drivers can indicate the speed of the link they are communicating
> on, then we can set sane default values on the automatically
> created subnet routes.

I think we have two separate issues to resolve. First we need
to figure out a way to state the probability of an ID wrap; this
can be a very simple pps rate estimator on the driver statistics
for hardware interfaces, and something more tricky for virtual
interfaces such as those found in Xen. I think the link speed is
not reliable enough for this.

The second issue covers the actual defeat of the wraps: we have
to distinguish between the rare but possible case of long-delayed
fragments and the short-delayed reorderings caused by SMP. A
combination of both short and long delayed reorderings rarely
happens in practice, and if it does, that's the bigger problem anyway.
The above probability can be used to distinguish these cases
and be more specific. SMP-caused reorderings have a very local
distribution, and thus the window of accepted IDs can be smaller
than the one for a long-delayed fragment session. So maybe we
can use the above probability as a factor to calculate a window
of accepted IDs.
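A sketch of what such a probability-driven window might look like (every name and constant below is invented for illustration; nothing like this exists in the kernel):

```python
# Toy model: derive a window of acceptable IP IDs from a per-peer
# packets-per-second estimate. The higher the estimated rate, the more
# likely an ID wrap within the reassembly timeout, so the smaller the
# window of IDs we accept. All names and constants are hypothetical.

ID_SPACE = 65536  # 16-bit IP ID

def accepted_id_window(pps_estimate, ipfrag_time=30.0, floor=64):
    """Shrink the accepted-ID window as the estimated rate grows."""
    ids_consumed = pps_estimate * ipfrag_time
    if ids_consumed >= ID_SPACE:
        # A wrap within the reassembly timeout is likely; accept only a
        # small window around the last-seen ID.
        return floor
    return max(floor, ID_SPACE - int(ids_consumed))

def id_in_window(last_id, new_id, window):
    """Accept new_id only if it lies within `window` of last_id, mod 2^16."""
    return ((new_id - last_id) % ID_SPACE) < window
```

At a slow estimated rate nearly the whole ID space is accepted; at a rate that would consume the whole space within ipfrag_time, only a small window survives, which is roughly the SMP-reordering case Thomas describes.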

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 20:27               ` Andi Kleen
@ 2005-05-17 21:02                 ` David S. Miller
  2005-05-17 21:13                   ` Andi Kleen
  2005-05-17 21:25                   ` Rick Jones
  0 siblings, 2 replies; 80+ messages in thread
From: David S. Miller @ 2005-05-17 21:02 UTC (permalink / raw)
  To: ak; +Cc: dlstevens, akepner, jheffner, netdev, netdev-bounce

From: Andi Kleen <ak@muc.de>
Date: Tue, 17 May 2005 22:27:30 +0200

> But it's not clear such a hack would be worth it anyways. 
> If anything it would be probably better to let mountd set the RTTs, e.g. 
> implicitly with MSG_CONFIRM (I hope it is using it these days ...) 

I think we should implement a solution that goes beyond
the confines of NFS.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 21:02                 ` David S. Miller
@ 2005-05-17 21:13                   ` Andi Kleen
  2005-05-17 21:24                     ` David S. Miller
  2005-05-17 21:25                   ` Rick Jones
  1 sibling, 1 reply; 80+ messages in thread
From: Andi Kleen @ 2005-05-17 21:13 UTC (permalink / raw)
  To: David S. Miller; +Cc: dlstevens, akepner, jheffner, netdev, netdev-bounce

On Tue, May 17, 2005 at 02:02:45PM -0700, David S. Miller wrote:
> From: Andi Kleen <ak@muc.de>
> Date: Tue, 17 May 2005 22:27:30 +0200
> 
> > But it's not clear such a hack would be worth it anyways. 
> > If anything it would be probably better to let mountd set the RTTs, e.g. 
> > implicitly with MSG_CONFIRM (I hope it is using it these days ...) 
> 
> I think we should implement a solution that goes beyond
> the confines of NFS.

Nothing in MSG_CONFIRM is NFS specific.

-Andi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 21:13                   ` Andi Kleen
@ 2005-05-17 21:24                     ` David S. Miller
  0 siblings, 0 replies; 80+ messages in thread
From: David S. Miller @ 2005-05-17 21:24 UTC (permalink / raw)
  To: ak; +Cc: dlstevens, akepner, jheffner, netdev, netdev-bounce

From: Andi Kleen <ak@muc.de>
Date: Tue, 17 May 2005 23:13:01 +0200

> On Tue, May 17, 2005 at 02:02:45PM -0700, David S. Miller wrote:
> > From: Andi Kleen <ak@muc.de>
> > Date: Tue, 17 May 2005 22:27:30 +0200
> > 
> > > But it's not clear such a hack would be worth it anyways. 
> > > If anything it would be probably better to let mountd set the RTTs, e.g. 
> > > implicitly with MSG_CONFIRM (I hope it is using it these days ...) 
> > 
> > I think we should implement a solution that goes beyond
> > the confines of NFS.
> 
> Nothing in MSG_CONFIRM is NFS specific.

I'm referring to mountd.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 21:02                 ` David S. Miller
  2005-05-17 21:13                   ` Andi Kleen
@ 2005-05-17 21:25                   ` Rick Jones
  2005-05-17 22:06                     ` Arthur Kepner
  2005-05-17 22:12                     ` David S. Miller
  1 sibling, 2 replies; 80+ messages in thread
From: Rick Jones @ 2005-05-17 21:25 UTC (permalink / raw)
  To: netdev; +Cc: netdev-bounce

this may be drifting too much, but it seems the issue of deciding when to 
give up on reassembly of an IP datagram is similar to the issues that neterion 
will be going through in creating their "LRO" (Large Receive Offload) 
solution, albeit the potential consequences of a bad decision are rather different.

both seek to know when it is unlikely that any more frames/fragments will arrive.

just how much extra overhead would there be to track the interarrival time of ip 
datagram fragments, and would that allow someone to make a guess as to how long 
to reasonably wait for all the fragments to arrive? (or did I miss that being 
shot down already?)

or an added heuristic of "if we have reassembled N datagrams for the same 
source/dest/protocol tuple with ID's "larger" than 'this one' since it has 
arrived, we are probably going to wrap so might as well drop 'this one'" for 
some judicious and magical selection of N that may be a decent predictor of wrap 
on top of some existing reassembly timeout.

rick jones

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 21:25                   ` Rick Jones
@ 2005-05-17 22:06                     ` Arthur Kepner
  2005-05-17 22:18                       ` Rick Jones
  2005-05-17 22:12                     ` David S. Miller
  1 sibling, 1 reply; 80+ messages in thread
From: Arthur Kepner @ 2005-05-17 22:06 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, netdev-bounce


On Tue, 17 May 2005, Rick Jones wrote:

> ....
> or an added heuristic of "if we have reassembled N datagrams for the same
> source/dest/protocol tuple with ID's "larger" than 'this one' since it has
> arrived, we are probably going to wrap so might as well drop 'this one'"  for
> some judicious and magical selection of N that may be a decent predictor of
> wrap on top of some existing reassembly timeout.
> ....

How do you define "larger" in this case? A sender is free to choose 
any ID - they can't be assumed to be monotonic, for sure. 

--
Arthur

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 17:49 ` David S. Miller
  2005-05-17 18:28   ` Arthur Kepner
  2005-05-17 18:38   ` Andi Kleen
@ 2005-05-17 22:11   ` Herbert Xu
  2005-05-17 22:13     ` David S. Miller
  2005-05-18  0:47     ` Thomas Graf
  2 siblings, 2 replies; 80+ messages in thread
From: Herbert Xu @ 2005-05-17 22:11 UTC (permalink / raw)
  To: David S. Miller; +Cc: akepner, netdev

David S. Miller <davem@davemloft.net> wrote:
>
> Decreasing ipfrag_time is also not an option, because then

Here is a possible solution to this:

Instead of measuring the distance using time, let's measure it
in terms of packet counts.  So every time we receive a fragmented
packet, we find all waiting fragments with the same src/dst pair.
If the id is identical we perform reassembly, if it isn't we increase
a counter in that fragment.  If the counter exceeds a threshold,
we drop the fragment.
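As a toy model of the scheme (the threshold value and all names below are made up for illustration, not a proposed implementation):

```python
# Age reassembly queues by how many *other* fragmented packets from the
# same src/dst pair have been seen, rather than by wall-clock time.
# THRESHOLD and all names here are hypothetical.

THRESHOLD = 64  # max unrelated fragmented packets before a queue is dropped

class FragQueue:
    def __init__(self, frag_id):
        self.frag_id = frag_id
        self.other_packets_seen = 0

queues = {}  # (src, dst) -> list of FragQueue

def frag_received(src, dst, frag_id):
    """Find (or create) the queue for this fragment, ageing its siblings."""
    qlist = queues.setdefault((src, dst), [])
    match = None
    for q in qlist:
        if q.frag_id == frag_id:
            match = q                     # same datagram: reassemble here
        else:
            q.other_packets_seen += 1     # unrelated fragment seen
    # evict queues that have seen too many unrelated fragmented packets
    qlist[:] = [q for q in qlist if q.other_packets_seen < THRESHOLD]
    if match is None:
        match = FragQueue(frag_id)
        qlist.append(match)
    return match
```

The point of the model: a queue that has watched THRESHOLD other fragmented packets go by for the same src/dst pair is at risk of an ID collision, so it gets dropped even if the time-based ipfrag_time has not yet expired.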

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 21:25                   ` Rick Jones
  2005-05-17 22:06                     ` Arthur Kepner
@ 2005-05-17 22:12                     ` David S. Miller
  2005-05-17 22:23                       ` Rick Jones
  1 sibling, 1 reply; 80+ messages in thread
From: David S. Miller @ 2005-05-17 22:12 UTC (permalink / raw)
  To: rick.jones2; +Cc: netdev, netdev-bounce

From: Rick Jones <rick.jones2@hp.com>
Date: Tue, 17 May 2005 14:25:19 -0700

> just how much extra overhead would there be to track the interarrival time of ip 
> datagram fragments and would that allow someone to make a guess as to how long 
> to reasonably wait for all the fragments to arrive? (or did I miss that being 
> shot-down already?)

I spam you with fragments tightly interspaced matching a known
shost/dhost/ID tuple, lowering your interarrival estimate.  The
legitimate fragment source can thus never get his fragments in
before the timer expires.

Every other one of these IP fragmentation ideas tends to have
some DoS hole in it.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 22:11   ` Herbert Xu
@ 2005-05-17 22:13     ` David S. Miller
  2005-05-17 23:08       ` Herbert Xu
  2005-05-18  0:47     ` Thomas Graf
  1 sibling, 1 reply; 80+ messages in thread
From: David S. Miller @ 2005-05-17 22:13 UTC (permalink / raw)
  To: herbert; +Cc: akepner, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 18 May 2005 08:11:01 +1000

> Instead of measuring the distance using time, let's measure it
> in terms of packet counts.  So every time we receive a fragmented
> packet, we find all waiting fragments with the same src/dst pair.
> If the id is identical we perform reassembly, if it isn't we increase
> a counter in that fragment.  If the counter exceeds a threshold,
> we drop the fragment.

And you protect against purposefully built malicious fragments how?

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 22:06                     ` Arthur Kepner
@ 2005-05-17 22:18                       ` Rick Jones
  2005-05-17 22:40                         ` David Stevens
  0 siblings, 1 reply; 80+ messages in thread
From: Rick Jones @ 2005-05-17 22:18 UTC (permalink / raw)
  To: netdev; +Cc: netdev-bounce

Arthur Kepner wrote:
> On Tue, 17 May 2005, Rick Jones wrote:
> 
> 
>>....
>>or an added heuristic of "if we have reassembled N datagrams for the same
>>source/dest/protocol tuple with ID's "larger" than 'this one' since it has
>>arrived, we are probably going to wrap so might as well drop 'this one'"  for
>>some judicious and magical selection of N that may be a decent predictor of
>>wrap on top of some existing reassembly timeout.
>>....
> 
> 
> How do you define "larger" in this case? A sender is free to choose 
> any ID - they can't be assumed to be monotonic, for sure. 

Actually, I was ass-u-me-ing they would be monotonic, but thinking about it 
more, that doesn't really matter.  If N datagrams from that source/dest/prot 
tuple have been reassembled since the first/current frag of this datagram has 
arrived, chances are still good that the rest of the fragments of this datagram 
are toast and any subsequent fragments with a matching src/dst/prot/id would 
likely create a frankengram.

in broad handwaving terms, those N other datagrams (non-fragmented or fragmented 
and successfully reassembled or not I suspect) are a measure of just how far 
"out of order" the rest of the fragments of this datagram happen to be.  for 
some degree of "out of orderness" we can ass-u-me the datagram is toast.

choosing a strawman, if we've received 24000 datagrams from that source/dest/prot 
(perhaps even just that source) since we've started reassembling this datagram 
from that source/dest/prot, chances seem pretty good indeed that this datagram 
is toast.  a value of N even smaller than 24000 might suffice.

the devil seems to be in the accounting.

rick jones

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 22:12                     ` David S. Miller
@ 2005-05-17 22:23                       ` Rick Jones
  0 siblings, 0 replies; 80+ messages in thread
From: Rick Jones @ 2005-05-17 22:23 UTC (permalink / raw)
  To: netdev; +Cc: netdev-bounce

David S.Miller wrote:
> From: Rick Jones <rick.jones2@hp.com>
> Date: Tue, 17 May 2005 14:25:19 -0700
> 
> 
>>just how much extra overhead would there be to track the interarrival time of ip 
>>datagram fragments and would that allow someone to make a guess as to how long 
>>to reasonably wait for all the fragments to arrive? (or did I miss that being 
>>shot-down already?)
> 
> 
> I spam you with fragments tightly interspaced matching a known
> shost/dhost/ID tuple, lowering your interarrival estimate.  The
> legitimate fragment source can thus never get his fragments in
> before the timer expires.
> 
> Every other one of these IP fragmentation ideas tends to have
> some DoS hole in it.

Are the holes any larger than the existing ones?  I've no idea, and perhaps the 
only answer is indeed to say "Then don't do that (fragment)!"

rick jones

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 22:18                       ` Rick Jones
@ 2005-05-17 22:40                         ` David Stevens
  2005-05-17 23:11                           ` Herbert Xu
  2005-05-17 23:53                           ` Rick Jones
  0 siblings, 2 replies; 80+ messages in thread
From: David Stevens @ 2005-05-17 22:40 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev

> in broad handwaving terms, those N other datagrams (non-fragmented or
> fragmented and successfully reassembled or not I suspect) are a measure of
> just how far "out of order" the rest of the fragments of this datagram
> happen to be.  for some degree of "out of orderness" we can ass-u-me the
> datagram is toast.

> choosing a strawman, if we've received 24000 datagrams from that
> source/dest/prot (perhaps even just that source) since we've started
> reassembling this datagram from that source/dest/prot, chances seem pretty
> good indeed that this datagram is toast.  a value of N even smaller than
> 24000 might suffice.

> the devil seems to be in the accounting.

        This assumes that you have a per-destination IP ID. If it's per-route,
you can send 1 packet to host A, 65534 to host B through the same route, and
1 to host A -- wrap on the next received packet, as far as host A is concerned
(even sooner, if it's using randomized ID's or a bigger-than-1 increment).
        Since it's the other side, which need not be Linux, I think
assumptions about how ID's are generated open the possibility of breaking
something that works now.
        I think an estimator for the interarrival time of fragments for
the same packet, per destination, is what you really want here. The
problem is path changes can cause the timeout to change dramatically,
and you don't want something too short to cause you to drop packets
when there was no wrap. But if it's a conservative estimate, it beats
accidental corruption.
        I can do DoS or intentional corruption if I can generate a
fragment as soon as I know the ID (by guessing, or by getting my
fragment in after I've seen the first one sent), so I'm not
sure that can be fixed, except by using IPsec, if it concerns you. :-)
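The conservative interarrival estimator described above could be sketched along the lines of TCP's SRTT (the class name, constants, and safety factor below are all guesses for illustration, not proposed values):

```python
# EWMA of per-destination fragment interarrival gaps, inflated by a
# safety factor so the estimate stays conservative, and capped at the
# existing 30s ipfrag_time. Everything here is a hypothetical sketch.

class FragInterarrival:
    def __init__(self, alpha=0.125, safety=8.0, max_timeout=30.0):
        self.alpha = alpha            # EWMA gain, as in TCP's SRTT
        self.safety = safety          # conservative multiplier
        self.max_timeout = max_timeout
        self.srtt = None              # smoothed interarrival estimate
        self.last_arrival = None

    def fragment_arrived(self, now):
        """Feed one fragment arrival time (seconds) into the estimator."""
        if self.last_arrival is not None:
            sample = now - self.last_arrival
            if self.srtt is None:
                self.srtt = sample
            else:
                self.srtt += self.alpha * (sample - self.srtt)
        self.last_arrival = now

    def timeout(self):
        """Reassembly timeout: conservative until we have samples."""
        if self.srtt is None:
            return self.max_timeout
        return min(self.max_timeout, self.safety * self.srtt)
```

Because the estimate only ever shortens the timeout below the existing default, a path change that slows fragments down can at worst fall back to today's behaviour, which matches the "conservative estimate beats accidental corruption" trade-off.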

                                                        +-DLS

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 22:13     ` David S. Miller
@ 2005-05-17 23:08       ` Herbert Xu
  2005-05-17 23:16         ` David S. Miller
  0 siblings, 1 reply; 80+ messages in thread
From: Herbert Xu @ 2005-05-17 23:08 UTC (permalink / raw)
  To: David S. Miller; +Cc: akepner, netdev

On Tue, May 17, 2005 at 03:13:52PM -0700, David S. Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Wed, 18 May 2005 08:11:01 +1000
> 
> > Instead of measuring the distance using time, let's measure it
> > in terms of packet counts.  So every time we receive a fragmented
> > packet, we find all waiting fragments with the same src/dst pair.
> > If the id is identical we perform reassembly, if it isn't we increase
> > a counter in that fragment.  If the counter exceeds a threshold,
> > we drop the fragment.
> 
> And you protect against purposefully built malicious fragments how?

Is it any worse than what we've got now?
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 22:40                         ` David Stevens
@ 2005-05-17 23:11                           ` Herbert Xu
  2005-05-17 23:20                             ` Arthur Kepner
  2005-05-17 23:53                           ` Rick Jones
  1 sibling, 1 reply; 80+ messages in thread
From: Herbert Xu @ 2005-05-17 23:11 UTC (permalink / raw)
  To: David Stevens; +Cc: rick.jones2, netdev

David Stevens <dlstevens@us.ibm.com> wrote:
> 
>        This assumes that you have a per-destination IP ID. If it's per-route,
> you can send 1 packet to host A, 65534 to host B through the same route, and
> 1 to host A -- wrap on the next received packet, as far as host A is concerned
> (even sooner, if it's using randomized ID's or a bigger-than-1 increment).

Such systems would be violating the spirit of RFC791 which says:

    The identification field is used to distinguish the fragments of one
    datagram from those of another.  The originating protocol module of
    an internet datagram sets the identification field to a value that
    must be unique for that source-destination pair and protocol for the
    time the datagram will be active in the internet system.

Are you aware of any extant systems that do this?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 23:08       ` Herbert Xu
@ 2005-05-17 23:16         ` David S. Miller
  2005-05-17 23:28           ` Herbert Xu
  0 siblings, 1 reply; 80+ messages in thread
From: David S. Miller @ 2005-05-17 23:16 UTC (permalink / raw)
  To: herbert; +Cc: akepner, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 18 May 2005 09:08:33 +1000

> On Tue, May 17, 2005 at 03:13:52PM -0700, David S. Miller wrote:
> > And you protect against purposefully built malicious fragments how?
> 
> Is it any worse than what we've got now?

Good point, in both cases what ends up happening is that
the queue is invalidated.  In the existing case it's usually
because the final UDP or whatever checksum doesn't pass.
With your idea it'd be due to the artificially deflated timeout.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 23:11                           ` Herbert Xu
@ 2005-05-17 23:20                             ` Arthur Kepner
  2005-05-17 23:25                               ` Herbert Xu
  0 siblings, 1 reply; 80+ messages in thread
From: Arthur Kepner @ 2005-05-17 23:20 UTC (permalink / raw)
  To: Herbert Xu; +Cc: dlstevens, rick.jones2, netdev

On Wed, 18 May 2005, Herbert Xu wrote:

> ....
> Such systems would be violating the spirit of RFC791 which says:
> 
>     The identification field is used to distinguish the fragments of one
>     datagram from those of another.  The originating protocol module of
>     an internet datagram sets the identification field to a value that
>     must be unique for that source-destination pair and protocol for the
>     time the datagram will be active in the internet system.
> 
> Are you aware of any extant systems that do this?
> ....

Are you aware of any (new) systems that _don't_ violate this? I 
wouldn't want to own one of them! 

--
Arthur

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 23:20                             ` Arthur Kepner
@ 2005-05-17 23:25                               ` Herbert Xu
  2005-05-17 23:55                                 ` David Stevens
                                                   ` (3 more replies)
  0 siblings, 4 replies; 80+ messages in thread
From: Herbert Xu @ 2005-05-17 23:25 UTC (permalink / raw)
  To: Arthur Kepner; +Cc: dlstevens, rick.jones2, netdev

On Tue, May 17, 2005 at 04:20:07PM -0700, Arthur Kepner wrote:
> On Wed, 18 May 2005, Herbert Xu wrote:
> 
> > ....
> > Such systems would be violating the spirit of RFC791 which says:
> > 
> >     The identification field is used to distinguish the fragments of one
> >     datagram from those of another.  The originating protocol module of
> >     an internet datagram sets the identification field to a value that
> >     must be unique for that source-destination pair and protocol for the
> >     time the datagram will be active in the internet system.
> > 
> > Are you aware of any extant systems that do this?
> > ....
> 
> Are you aware of any (new) systems that _don't_ violate this? I 
> wouldn't want to own one of them! 

Perhaps you misunderstood what I was saying.  I meant: are there any
extant systems that would transmit one set of fragments to host A with
id x, then 65535 packets to host B, and then wrap around and send a new
set of fragments to host A with id x?

Linux will never do this thanks to inetpeer.c.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 23:16         ` David S. Miller
@ 2005-05-17 23:28           ` Herbert Xu
  2005-05-17 23:36             ` Patrick McHardy
  0 siblings, 1 reply; 80+ messages in thread
From: Herbert Xu @ 2005-05-17 23:28 UTC (permalink / raw)
  To: David S. Miller; +Cc: akepner, netdev, Alexey Kuznetsov, Patrick McHardy

On Tue, May 17, 2005 at 04:16:41PM -0700, David S. Miller wrote:
> 
> Good point, in both cases what ends up happening is that
> the queue is invalidated.  In the existing case it's usually
> because the final UDP or whatever checksum doesn't pass.
> With your idea it'd be due to the artificially deflated timeout.

It just occurred to me that the optimisation in IPv4/IPv6 that performs
fragmentation after tunnel-mode IPsec is fundamentally broken.  It
makes IPsec vulnerable to fragmentation attacks.

We have to perform fragmentation before tunnel-mode IPsec.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 23:28           ` Herbert Xu
@ 2005-05-17 23:36             ` Patrick McHardy
  2005-05-17 23:41               ` Herbert Xu
  0 siblings, 1 reply; 80+ messages in thread
From: Patrick McHardy @ 2005-05-17 23:36 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, akepner, netdev, Alexey Kuznetsov

Herbert Xu wrote:
> On Tue, May 17, 2005 at 04:16:41PM -0700, David S. Miller wrote:
> 
>>Good point, in both cases what ends up happening is that
>>the queue is invalidated.  In the existing case it's usually
>>because the final UDP or whatever checksum doesn't pass.
>>With your idea it'd be due to the artificially deflated timeout.
> 
> 
> It just occurred to me that the optimisation in IPv4/IPv6 that performs
> fragmentation after tunnel-mode IPsec is fundamentally broken.  It
> makes IPsec vulnerable to fragmentation attacks.

You mean vulnerable at reassembly time? Isn't that something reassembly
and policy checks should take care of?

Regards
Patrick

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 23:36             ` Patrick McHardy
@ 2005-05-17 23:41               ` Herbert Xu
  0 siblings, 0 replies; 80+ messages in thread
From: Herbert Xu @ 2005-05-17 23:41 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David S. Miller, akepner, netdev, Alexey Kuznetsov

On Wed, May 18, 2005 at 01:36:47AM +0200, Patrick McHardy wrote:
> 
> You mean vulnerable at reassembly time? Isn't that something reassembly
> and policy checks should take care of?

I mean that it's vulnerable to the following simple DoS attack by
someone who doesn't otherwise have the capability to drop the
packets between the source and the target.

If the IPsec packets arrive as fragments, the attacker only needs
to guess the identity to cause the entire IPsec packet to be dropped.

If it was fragmented prior to IPsec it would not be vulnerable to
this.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 22:40                         ` David Stevens
  2005-05-17 23:11                           ` Herbert Xu
@ 2005-05-17 23:53                           ` Rick Jones
  1 sibling, 0 replies; 80+ messages in thread
From: Rick Jones @ 2005-05-17 23:53 UTC (permalink / raw)
  To: netdev

David Stevens wrote:

> 
> This assumes that you have a per-destination IP ID. If it's per-route, you
> can send 1 packet to host A, 65534 to host B through the same route, and 1 to
> host A-- wrap on the next received packet, as far as host A is concerned. 
> (even sooner, if it's using randomized ID's or a bigger-than-1 increment).

If we were actually looking at the IDs themselves, rather than the count of
datagrams received, that would be correct, but someone already pointed out that
ass-u-me-ing monotonically increasing IDs was not a good thing, so simply count
datagrams completed/received on that source/dest pair instead.  Then we don't
really care about the sender's IP ID assignment policy.

If someone wants to hit that with a DoS attack, I'm still wondering whether that
is a large DoS hole (larger than existing ones with spoofed fragments), and the
extent to which it depends on whether the attacker is closer to me than the
sender or "on the other side" of the sender from me.

rick jones

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 23:25                               ` Herbert Xu
@ 2005-05-17 23:55                                 ` David Stevens
  2005-05-18  0:00                                   ` Herbert Xu
  2005-05-18  0:04                                 ` Andi Kleen
                                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 80+ messages in thread
From: David Stevens @ 2005-05-17 23:55 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Arthur Kepner, netdev, rick.jones2

BSD systems, and derivatives, at least 20 years ago, had a single,
global IP ID for all destinations. The requirement was that at least
a single src-dest pair should be unique, which didn't prohibit all IP
ID's generated by the system from being unique. And BSD systems
generated an ID for every IP packet, not just frags, if memory serves.

IP ID wrap wasn't the concern, obviously, since those two pieces make
a wrap on any one destination more likely-- a wrap is for all packets
sent, not just on one interface or to one destination.
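
To put numbers on how quickly a single global 16-bit ID space wraps (my own
back-of-the-envelope arithmetic, not from the original post), consider a
1 Gbit/s link carrying MTU-sized packets:

```python
# Back-of-the-envelope: how fast does a global 16-bit IP ID wrap?
LINK_BPS = 1_000_000_000     # 1 Gbit/s link
PKT_BYTES = 1500             # typical Ethernet-MTU-sized packet
ID_SPACE = 1 << 16           # 65536 possible IP IDs

pps = LINK_BPS / (PKT_BYTES * 8)   # ~83,333 packets/sec at line rate
wrap_seconds = ID_SPACE / pps      # time until every ID has been used once

print(f"{pps:.0f} pps -> ID space wraps every {wrap_seconds:.2f} s")
```

With smaller packets the wrap is correspondingly faster, and well inside the
default 30-second reassembly timeout either way.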

Some later systems use randomization of the IP ID to make it harder to
inject data into fragments by guessing the ID, which means the entire
IP ID space isn't necessarily consumed before a "wrap" in the
pseudo-random sequence (thus, a wrap may happen even sooner than with
a simple increment).

I don't know if any recent, common systems use a per-host global
IP ID, but yes, almost every host did 15-20 years ago
(and after RFC 791 :-) ).

                                        +-DLS

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 23:55                                 ` David Stevens
@ 2005-05-18  0:00                                   ` Herbert Xu
  0 siblings, 0 replies; 80+ messages in thread
From: Herbert Xu @ 2005-05-18  0:00 UTC (permalink / raw)
  To: David Stevens; +Cc: Arthur Kepner, netdev, rick.jones2

On Tue, May 17, 2005 at 04:55:20PM -0700, David Stevens wrote:
> 
> I don't know if any recent, common systems use a per-host global
> IP ID, but yes, almost every host did 15-20 years ago
> (and after RFC 791 :-) ).

Unless these dinosaurs have evolved enough to support gigabit, it's
not a problem :)
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 23:25                               ` Herbert Xu
  2005-05-17 23:55                                 ` David Stevens
@ 2005-05-18  0:04                                 ` Andi Kleen
  2005-05-18  0:09                                   ` Herbert Xu
  2005-05-18  0:06                                 ` Nivedita Singhvi
  2005-05-18  1:09                                 ` John Heffner
  3 siblings, 1 reply; 80+ messages in thread
From: Andi Kleen @ 2005-05-18  0:04 UTC (permalink / raw)
  To: Herbert Xu; +Cc: dlstevens, rick.jones2, netdev

Herbert Xu <herbert@gondor.apana.org.au> writes:
>
> Perhaps you misunderstood what I was saying.  I meant are there any
> extant systems that would transmit 1 set of fragments to host A with
> id x, then 65535 packets to host B, and then wrap around and send a new
> set of fragments to host A with id x.
>
> Linux will never do this thanks to inetpeer.c.

It will, you just need enough other hosts to thrash inetpeer. How many
you need depends on your available memory.

-Andi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 23:25                               ` Herbert Xu
  2005-05-17 23:55                                 ` David Stevens
  2005-05-18  0:04                                 ` Andi Kleen
@ 2005-05-18  0:06                                 ` Nivedita Singhvi
  2005-05-18  0:10                                   ` Herbert Xu
  2005-05-18  1:09                                 ` John Heffner
  3 siblings, 1 reply; 80+ messages in thread
From: Nivedita Singhvi @ 2005-05-18  0:06 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Arthur Kepner, dlstevens, rick.jones2, netdev

Herbert Xu wrote:

>>>Such systems would be violating the spirit of RFC791 which says:
>>>
>>>    The identification field is used to distinguish the fragments of one
>>>    datagram from those of another.  The originating protocol module of
>>>    an internet datagram sets the identification field to a value that
>>>    must be unique for that source-destination pair and protocol for the
>>>    time the datagram will be active in the internet system.
>>>
>>>Are you aware of any extant systems that do this?
>>>....
>>
>>Are you aware of any (new) systems that _don't_ violate this? I 
>>wouldn't want to own one of them! 
> 
> 
> Perhaps you misunderstood what I was saying.  I meant are there any
> extant systems that would transmit 1 set of fragments to host A with
> id x, then 65535 packets to host B, and then wrap around and send a new
> set of fragments to host A with id x.
> 
> Linux will never do this thanks to inetpeer.c.

Actually, it depends on which Linux you are using.

Mainline linux certainly has this (per-inetpeer ip_id) - but
at least one distro did not (use inetpeer) :). Not sure
what the current situation is.

Of course, if all the traffic is on the same connection
(which isn't out of the ordinary) it would still come down
to the same thing...

thanks,
Nivedita

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18  0:04                                 ` Andi Kleen
@ 2005-05-18  0:09                                   ` Herbert Xu
  2005-05-18  0:52                                     ` David S. Miller
  0 siblings, 1 reply; 80+ messages in thread
From: Herbert Xu @ 2005-05-18  0:09 UTC (permalink / raw)
  To: Andi Kleen; +Cc: dlstevens, rick.jones2, netdev

On Wed, May 18, 2005 at 02:04:14AM +0200, Andi Kleen wrote:
> Herbert Xu <herbert@gondor.apana.org.au> writes:
> >
> > Perhaps you misunderstood what I was saying.  I meant are there any
> > extant systems that would transmit 1 set of fragments to host A with
> > id x, then 65535 packets to host B, and then wrap around and send a new
> > set of fragments to host A with id x.
> >
> > Linux will never do this thanks to inetpeer.c.
> 
> It will, you just need enough other hosts to thrash inetpeer. How many
> you need depends on your available memory.

Even when the cache entry is deleted, Linux will allocate an ID randomly
so the chance of what was stated above occurring is very small.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18  0:06                                 ` Nivedita Singhvi
@ 2005-05-18  0:10                                   ` Herbert Xu
  2005-05-18  0:51                                     ` David S. Miller
  0 siblings, 1 reply; 80+ messages in thread
From: Herbert Xu @ 2005-05-18  0:10 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: Arthur Kepner, dlstevens, rick.jones2, netdev

On Tue, May 17, 2005 at 05:06:55PM -0700, Nivedita Singhvi wrote:
> 
> Mainline linux certainly has this (per-inetpeer ip_id) - but
> at least one distro did not (use inetpeer) :). Not sure
> what the current situation is.

What was the reason for this? Perhaps we can solve their problems
with inetpeer in a better way than disabling it?

> Of course, if all the traffic is on the same connection
> (which isn't out of the ordinary) would still come down
> to the same thing...

Please see my proposal elsewhere in this thread.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 22:11   ` Herbert Xu
  2005-05-17 22:13     ` David S. Miller
@ 2005-05-18  0:47     ` Thomas Graf
  2005-05-18  1:06       ` Arthur Kepner
  2005-05-18  1:16       ` Herbert Xu
  1 sibling, 2 replies; 80+ messages in thread
From: Thomas Graf @ 2005-05-18  0:47 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, akepner, netdev

* Herbert Xu <E1DYAHF-0006qW-00@gondolin.me.apana.org.au> 2005-05-18 08:11
> Instead of measuring the distance using time, let's measure it
> in terms of packet counts.  So every time we receive a fragmented
> packet, we find all waiting fragments with the same src/dst pair.
> If the id is identical we perform reassembly, if it isn't we increase
> a counter in that fragment.  If the counter exceeds a threshold,
> we drop the fragment.

I like this, although the problem then shifts to the definition
of the threshold. Any ideas on how to define this? A
combination of your idea together with the idea I stated in
another post, which would additionally expire fragments earlier
depending on the actual packet (or fragment) rate, might give
better results.
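
Herbert's counting scheme, quoted above, can be modeled in a few lines.
This is an illustrative sketch only, not the kernel's reassembly code:
the class, the queue layout, and the threshold value are my own
assumptions about how such a counter might work.

```python
# Sketch of Herbert's proposal: age reassembly queues by fragment count
# from the same src/dst pair, rather than by wall-clock time.
THRESHOLD = 30000  # maximum "reordering distance" we tolerate

class Reassembler:
    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self.queues = {}  # (src, dst, proto, ip_id) -> {'age': n, 'frags': [...]}

    def receive(self, src, dst, proto, ip_id, frag):
        key = (src, dst, proto, ip_id)
        # Every fragment from this src/dst pair ages all *other* queues
        # for the same pair; queues aged past the threshold are dropped.
        stale = []
        for other, q in self.queues.items():
            if other[:2] == (src, dst) and other != key:
                q['age'] += 1
                if q['age'] > self.threshold:
                    stale.append(other)
        for k in stale:
            del self.queues[k]   # datagram presumed dead: drop its fragments
        q = self.queues.setdefault(key, {'age': 0, 'frags': []})
        q['frags'].append(frag)
        return q['frags']
```

With threshold=2, for example, a queue waiting on one IP ID is dropped
once three fragments carrying a different ID arrive from the same
src/dst pair.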

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18  0:10                                   ` Herbert Xu
@ 2005-05-18  0:51                                     ` David S. Miller
  2005-05-18  1:05                                       ` Andi Kleen
  2005-05-18  1:13                                       ` Herbert Xu
  0 siblings, 2 replies; 80+ messages in thread
From: David S. Miller @ 2005-05-18  0:51 UTC (permalink / raw)
  To: herbert; +Cc: niv, akepner, dlstevens, rick.jones2, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Subject: Re: [RFC/PATCH] "strict" ipv4 reassembly
Date: Wed, 18 May 2005 10:10:54 +1000

> On Tue, May 17, 2005 at 05:06:55PM -0700, Nivedita Singhvi wrote:
> > 
> > Mainline linux certainly has this (per-inetpeer ip_id) - but
> > at least one distro did not (use inetpeer) :). Not sure
> > what the current situation is.
> 
> What was the reason for this? Perhaps we can solve their problems
> with inetpeer in a better way than disabling it?

Andi Kleen thought inetpeer was a pig, so he removed it from SUSE's
kernel and replaced it with a per-cpu salted IP ID generator.  The
initial version he wrote had serious bugs that severely decreased the
effective ID space, and thus made the NFS corruption problem happen
more frequently.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18  0:09                                   ` Herbert Xu
@ 2005-05-18  0:52                                     ` David S. Miller
  0 siblings, 0 replies; 80+ messages in thread
From: David S. Miller @ 2005-05-18  0:52 UTC (permalink / raw)
  To: herbert; +Cc: ak, dlstevens, rick.jones2, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Subject: Re: [RFC/PATCH] "strict" ipv4 reassembly
Date: Wed, 18 May 2005 10:09:55 +1000

> On Wed, May 18, 2005 at 02:04:14AM +0200, Andi Kleen wrote:
> > Herbert Xu <herbert@gondor.apana.org.au> writes:
> > It will, you just need enough other hosts to thrash inetpeer. How many
> > you need depends on your available memory.
> 
> Even when the cache entry is deleted, Linux will allocate an ID randomly
> so the chance of what was stated above occuring is very small.

Yes, that's right.  Andi just doesn't like inetpeer, so let's just
move along and accept that. :-)

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18  0:51                                     ` David S. Miller
@ 2005-05-18  1:05                                       ` Andi Kleen
  2005-05-18  1:13                                       ` Herbert Xu
  1 sibling, 0 replies; 80+ messages in thread
From: Andi Kleen @ 2005-05-18  1:05 UTC (permalink / raw)
  To: David S. Miller; +Cc: niv, akepner, dlstevens, rick.jones2, netdev

"David S. Miller" <davem@davemloft.net> writes:

> From: Herbert Xu <herbert@gondor.apana.org.au>
> Subject: Re: [RFC/PATCH] "strict" ipv4 reassembly
> Date: Wed, 18 May 2005 10:10:54 +1000
>
>> On Tue, May 17, 2005 at 05:06:55PM -0700, Nivedita Singhvi wrote:
>> > 
>> > Mainline linux certainly has this (per-inetpeer ip_id) - but
>> > at least one distro did not (use inetpeer) :). Not sure
>> > what the current situation is.
>> 
>> What was the reason for this? Perhaps we can solve their problems
>> with inetpeer in a better way than disabling it?
>
> Andi Kleen thought inetpeer was a pig, so he removed it from SUSE's
> kernel and replaced it with a per-cpu salted IP ID generator.  The
> initial version he wrote had serious bugs that severely decreased the
> effective ID space, and thus made the NFS corruption problem happen
> more frequently.

That's not true; there were no bugs in it. Or at least none
I know about.

However, any randomized IP ID scheme decreases the effective IP ID
space slightly. The only algorithm that uses a bit space perfectly
is a counter :) However I admit the constant regulating the granularity
was a bit too aggressive at the beginning, which indeed triggered
the NFS corruption problem more frequently. But since the 16-bit
space is more or less useless (as Arthur demonstrated, it cannot
even handle a single gigabit link) it did not make that much difference anyway.

The eventual workaround for the NFS IP-ID problem that went into
the vendor kernel also worked in a different way, on top
of the algorithm.

As for the background (not for you Dave, for other readers ;-) on why I
consider inetpeer useless, please read the archives. As a hint, just
look at what kind of functionality it implements and how much of it is
actually enabled by default, and think of its relationship to masquerading
(which BTW breaks most of the fancy algorithms proposed so far).

-Andi

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18  0:47     ` Thomas Graf
@ 2005-05-18  1:06       ` Arthur Kepner
  2005-05-18  1:16       ` Herbert Xu
  1 sibling, 0 replies; 80+ messages in thread
From: Arthur Kepner @ 2005-05-18  1:06 UTC (permalink / raw)
  To: Rick Jones, Herbert Xu; +Cc: David S. Miller, Thomas Graf, netdev


It sounds as if Herbert and Rick have much the same idea. And 
it sounds like a good idea, definitely worth considering...

On Wed, 18 May 2005, Thomas Graf wrote:

> * Herbert Xu <E1DYAHF-0006qW-00@gondolin.me.apana.org.au> 2005-05-18 08:11
> > Instead of measuring the distance using time, let's measure it
> > in terms of packet counts.  So every time we receive a fragmented
> > packet, we find all waiting fragments with the same src/dst pair.
> > If the id is identical we perform reassembly, if it isn't we increase
> > a counter in that fragment.  If the counter exceeds a threshold,
> > we drop the fragment.
> 
> I like this, although the problem is derived to the definition
> of the threshold. Any ideas on how to define this? A
> combination of your idea together with the idea I stated in
> another post which would additional expire fragments earlier
> depending on the actual packet (or fragment) rate might give
> better results.
> 

------------------------------ [snip] ------------------------------
> On Tue, 17 May 2005, Rick Jones wrote:
> .....
> Actually, I was ass-u-me-ing they would be monotonic, but thinking about 
> it more, that doesn't really matter.  If N datagrams from that 
> source/dest/prot tuple have been reassembled since the first/current 
> frag of this datagram has arrived, chances are still good that the rest 
> of the fragments of this datagram are toast and any subsequent 
> fragments with a matching src/dst/prot/id would likely create a 
> frankengram.
> 
> in broad handwaving terms, those N other datagrams (non-fragmented or 
> fragmented and successfully reassembled or not I suspect) are a measure 
> of just how far "out of order" the rest of the fragments of this 
> datagram happen to be.  for some degree of "out of orderness" we can 
> ass-u-me the datagram is toast.
> 
> choosing a strawman, if we've received 24000 datagrams from that 
> source/dest/prot (perhaps even just that source) since we've started 
> reassembling this datagram from that source/dest/prot, chances seem 
> pretty good indeed that this datagram is toast.  a value of N even 
> smaller than 24000 might suffice.
> 
> the devil seems to be in the accounting.

------------------------------ [snip] ------------------------------

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-17 23:25                               ` Herbert Xu
                                                   ` (2 preceding siblings ...)
  2005-05-18  0:06                                 ` Nivedita Singhvi
@ 2005-05-18  1:09                                 ` John Heffner
  3 siblings, 0 replies; 80+ messages in thread
From: John Heffner @ 2005-05-18  1:09 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, Arthur Kepner, dlstevens, rick.jones2

On May 17, 2005, at 7:25 PM, Herbert Xu wrote:
>
> Perhaps you misunderstood what I was saying.  I meant are there any
> extant systems that would transmit 1 set of fragments to host A with
> id x, then 65535 packets to host B, and then wrap around and send a new
> set of fragments to host A with id x.
>
> Linux will never do this thanks to inetpeer.c.

Of course (as usual) NATs break everything. ;-)

There is also the ugly case where fragments could be delayed in the 
network for a period of time, for example during a path change, and 
show up at exactly the wrong time.

   -John

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18  0:51                                     ` David S. Miller
  2005-05-18  1:05                                       ` Andi Kleen
@ 2005-05-18  1:13                                       ` Herbert Xu
  1 sibling, 0 replies; 80+ messages in thread
From: Herbert Xu @ 2005-05-18  1:13 UTC (permalink / raw)
  To: David S. Miller; +Cc: niv, akepner, dlstevens, rick.jones2, netdev

On Tue, May 17, 2005 at 05:51:26PM -0700, David S. Miller wrote:
> 
> Andi Kleen thought inetpeer was a pig, so he removed it from SUSE's
> kernel and replaced it with a per-cpu salted IP ID generator.  The
> initial verion he wrote had serious bugs that severely decreased the
> effective ID space, and thus made the NFS corruption problem happen
> more frequently.

Is he turning off PMTU by default as well? :) I don't see how it
can be a pig otherwise.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18  0:47     ` Thomas Graf
  2005-05-18  1:06       ` Arthur Kepner
@ 2005-05-18  1:16       ` Herbert Xu
  2005-05-18  1:37         ` Thomas Graf
  1 sibling, 1 reply; 80+ messages in thread
From: Herbert Xu @ 2005-05-18  1:16 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, akepner, netdev

On Wed, May 18, 2005 at 02:47:33AM +0200, Thomas Graf wrote:
> 
> I like this, although the problem is derived to the definition
> of the threshold. Any ideas on how to define this? A

The threshold doesn't need to be very large at all since it is
essentially the maximum reordering distance that we allow.

However, for an initial estimate we can be conservative and
use something like 30000.  If we're too conservative the SGI
guys will tell us :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18  1:16       ` Herbert Xu
@ 2005-05-18  1:37         ` Thomas Graf
  2005-05-18  1:52           ` Herbert Xu
  2005-05-18 16:21           ` Rick Jones
  0 siblings, 2 replies; 80+ messages in thread
From: Thomas Graf @ 2005-05-18  1:37 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, akepner, netdev

* Herbert Xu <20050518011632.GA27813@gondor.apana.org.au> 2005-05-18 11:16
> On Wed, May 18, 2005 at 02:47:33AM +0200, Thomas Graf wrote:
> > 
> > I like this, although the problem is derived to the definition
> > of the threshold. Any ideas on how to define this? A
> 
> The threshold doesn't need to be very large at all since it is
> essentially the maximum reordering distance that we allow.
> 
> However, for an initial estimate we can be conservative and
> use something like 30000.  If we're too conservative the SGI
> guys will tell us :)

OK, I initially thought you would head for a much larger
threshold. Not sure if 30000 is large enough for a full
scale NFS server though ;-> You convinced me that my idea
doesn't have any real advantage over yours; it essentially
gets to the same result, but yours is probably easier to
implement, and we can also adapt the threshold dynamically
depending on the type of system. A big advantage is surely
that it is quite easy to tune this parameter for certain
more difficult cases.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18  1:37         ` Thomas Graf
@ 2005-05-18  1:52           ` Herbert Xu
  2005-05-18 11:30             ` Thomas Graf
  2005-05-18 16:21           ` Rick Jones
  1 sibling, 1 reply; 80+ messages in thread
From: Herbert Xu @ 2005-05-18  1:52 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, akepner, netdev

On Wed, May 18, 2005 at 03:37:12AM +0200, Thomas Graf wrote:
> 
> OK, I initially thought you would head for a much larger
> threshold. Not sure if 30000 is large enough for a full
> scale NFS server though ;-> You convinced me that my idea

I think it's big enough.  If it isn't it means that somebody
has reordered the packets by 30000 which I find hard to
believe :)
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18  1:52           ` Herbert Xu
@ 2005-05-18 11:30             ` Thomas Graf
  2005-05-18 11:40               ` Herbert Xu
  0 siblings, 1 reply; 80+ messages in thread
From: Thomas Graf @ 2005-05-18 11:30 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, akepner, netdev

* Herbert Xu <20050518015213.GB28070@gondor.apana.org.au> 2005-05-18 11:52
> On Wed, May 18, 2005 at 03:37:12AM +0200, Thomas Graf wrote:
> > 
> > OK, I initially thought you would head for a much larger
> > threshold. Not sure if 30000 is large enough for a full
> > scale NFS server though ;-> You convinced me that my idea
> 
> I think it's big enough.  If it isn't it means that somebody
> has reordered the packets by 30000 which I find hard to
> believe :)

I was thinking about some kind of NFS server with huge receive
buffers and increased limits, receiving at 50 kpps and experiencing
a delayed fragment once in a while. Definitely a rare case,
but not impossible ;->
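
As a sanity check on that scenario (my own arithmetic, not from the
thread): at 50 kpps, a 30000-packet threshold tolerates a fragment
delayed by at most 0.6 seconds relative to the rest of its datagram:

```python
# Wall-clock reordering window implied by a packet-count threshold.
RATE_PPS = 50_000      # Thomas's hypothetical NFS server receive rate
THRESHOLD = 30_000     # Herbert's proposed reordering-distance threshold

window_s = THRESHOLD / RATE_PPS   # seconds before a waiting queue is dropped
print(f"tolerated delay at {RATE_PPS} pps: {window_s:.1f} s")
```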

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18 11:30             ` Thomas Graf
@ 2005-05-18 11:40               ` Herbert Xu
  2005-05-18 12:24                 ` Thomas Graf
  0 siblings, 1 reply; 80+ messages in thread
From: Herbert Xu @ 2005-05-18 11:40 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, akepner, netdev

On Wed, May 18, 2005 at 01:30:30PM +0200, Thomas Graf wrote:
>
> > I think it's big enough.  If it isn't it means that somebody
> > has reordered the packets by 30000 which I find hard to
> > believe :)
> 
> I was thinking about some kind of nfs server with huge recv
> buffers and increased limits receiving at 50kpps experiencing
> a delayed fragment once in a while. Definitely a rare case
> but not impossible ;->

The worst that can happen if 30000 is too low is that we drop
a packet when we shouldn't.  IMHO if a single host sends us an
incomplete packet, followed by 30000 unrelated fragments, and
finally a fragment of the original packet, then it is only
fair for us to drop that packet.

OTOH if you're arguing that 30000 is too high then you might have
a point.  However, in that respect it cannot be any worse than
what we have now which is essentially unlimited.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18 11:40               ` Herbert Xu
@ 2005-05-18 12:24                 ` Thomas Graf
  0 siblings, 0 replies; 80+ messages in thread
From: Thomas Graf @ 2005-05-18 12:24 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, akepner, netdev

* Herbert Xu <20050518114030.GA16993@gondor.apana.org.au> 2005-05-18 21:40
> On Wed, May 18, 2005 at 01:30:30PM +0200, Thomas Graf wrote:
> >
> > > I think it's big enough.  If it isn't it means that somebody
> > > has reordered the packets by 30000 which I find hard to
> > > believe :)
> > 
> > I was thinking about some kind of nfs server with huge recv
> > buffers and increased limits receiving at 50kpps experiencing
> > a delayed fragment once in a while. Definitely a rare case
> > but not impossible ;->
> 
> The worst that can happen if 30000 is too low is that we drop
> a packet when we shouldn't.  IMHO if a single host sends us an
> incomplete packet, followed by 30000 unrelated fragments, and
> finally a fragment of the original packet, then it is only
> fair for us to drop that packet.

Arguable, but I generally agree.

> OTOH if you're arguing that 30000 is too high then you might have
> a point.  However, in that respect it cannot be any worse than
> what we have now which is essentially unlimited.

It's definitely better than what we have now. If the sender
uses a per-route or even per-interface counter we're on the
losing side anyway, but if there are any gigabit-capable OSes present
which do that, then it's really their problem. Not sure if it
is small enough for randomly generated IDs though.

Anyway, I think that the assumption that at least 50% of the
ID space is actually used by the sender and seen by the receiver
is fair enough.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18  1:37         ` Thomas Graf
  2005-05-18  1:52           ` Herbert Xu
@ 2005-05-18 16:21           ` Rick Jones
  2005-05-18 17:40             ` Thomas Graf
  2005-05-18 21:45             ` Herbert Xu
  1 sibling, 2 replies; 80+ messages in thread
From: Rick Jones @ 2005-05-18 16:21 UTC (permalink / raw)
  To: netdev

If we ass-u-me that the sender is indeed using a random IP ID assignment 
mechanism, 30000 is probably too many.  There are only 65536 possible IDs, and 
if we "choose" 30000 of them there will undoubtedly be many duplicates.  Someone 
who didn't fall asleep too often in ProbStats (unlike myself) can probably tell 
us just how many.
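
As a rough answer to the ProbStats question (a back-of-envelope sketch of
my own, not from the thread): the expected number of distinct values among
k uniform draws from n is n*(1 - (1 - 1/n)^k), and the rest are repeats.

```python
# Expected duplicates when k IDs are drawn uniformly at random from a
# space of n values (n = 65536 for the 16-bit IP ID, k = 30000).
def expected_duplicates(k, n=65536):
    distinct = n * (1.0 - (1.0 - 1.0 / n) ** k)  # E[distinct values]
    return k - distinct                          # the rest collide

print(round(expected_duplicates(30000)))  # roughly 5900 repeats among 30000
```

So with 30000 random picks, "many" is on the order of six thousand
duplicated IDs.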

Also, I think the count has to be _any_ IP datagram on that src/dst pair, 
fragmented or not.  Someone else pointed out the possibility of sending us one 
fragmented datagram, then 64K to someone else - well, those 64K to someone else 
could just as easily be 64K non-fragmented IP datagrams to us, so it seems that 
for a measure of "out-of-orderness likelihood" we need to include non-fragmented 
IP datagrams.

The thought of having to do added accounting on a non-fragmented datagram seems 
unpleasant though.

rick jones

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18 16:21           ` Rick Jones
@ 2005-05-18 17:40             ` Thomas Graf
  2005-05-18 17:44               ` Thomas Graf
  2005-05-18 21:45             ` Herbert Xu
  1 sibling, 1 reply; 80+ messages in thread
From: Thomas Graf @ 2005-05-18 17:40 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev

* Rick Jones <428B6B72.5010407@hp.com> 2005-05-18 09:21
> If we ass-u-me that the sender is indeed using a random IP ID assignment 
> mechanism, 30000 is probably too many.  There are only 65536 possible ID's, 
> and if we "choose" 30000 of them there will undoubtedly be many duplicated. 
> Someone who didn't fall asleep too often in ProbStats (unlike myself) can 
> probably tell us just how many.

I can't come up with a specific number, but some thoughts: the upper
limit can be described as a fraction of the complete
id space := P_id_space * P_visible
...where
  P_id_space := size of the effective id space, 1.0 for counters, 0.9-1.0 for
                good random schemes.
  P_visible  := how many fragments we actually see of the effective id space,
                can be described as P_loss + P_scope
  P_loss     := probability of lost fragment-ids
  P_scope    := probability of a complete view over P_visible, 1.0 if we're the
                only receiver, decreases with every additional host we share
                the sender's id space with. Also depends on the ratio every
                receiver sees.

P_id_space and P_loss can be disregarded, but P_scope is hard to determine;
the value can range from nearly zero to 1.0, so we can be optimistic and
choose 0.5 or be paranoid and define it as a very small value. But what is
small? E.g. 0.1 would give us a value around 6K, which is nothing on a gbit
link at 40kpps on average. I think there isn't even a big difference between
3K and 30K; the border cases we're worrying about can happen with either
limit.

In a perfect world without any randomly generated ids we could measure
the absolute distance even without being aware of all ids. It might even
be possible to try and differentiate between random and serial id sequences
and optimize a bit there, but in the end we have to find a good
compromise for the random case anyway. Worth some experimentation, I guess.
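
To put numbers on the model above (my own sketch; it disregards P_loss as
the mail does, and 0.5 / 0.1 are the optimistic and paranoid P_scope values
mentioned):

```python
ID_SPACE = 65536  # 16-bit IP ID

# Threshold = share of the id space we expect this receiver to observe.
def threshold(p_id_space, p_scope):
    return int(ID_SPACE * p_id_space * p_scope)

print(threshold(1.0, 0.5))  # optimistic: 32768
print(threshold(1.0, 0.1))  # paranoid: 6553, the "around 6K" above
```

At 40kpps even the paranoid ~6.5K ids pass in well under a second, which is
why the choice between 3K and 30K changes little in practice.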

> Also, I think the count has to be _any_ IP datagram on that src/dst pair, 
> fragmented or not.  Someone else pointed out the possibility of sending us 
> one fragmented datagram, then 64K to someone else - well, those 64K to 
> someone else could just as easily be 64K non-fragmented IP datagrams to us, 
> so it seems that for a measure of "out-of-orderness likelihood" we need to 
> include non-fragmented IP datagrams.

Sure, I took Herbert's "fragments" as "fragment ids", i.e. the per-fragment
counter being the distance in distinct ids between the arrival of the
fragment and the current position.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18 17:40             ` Thomas Graf
@ 2005-05-18 17:44               ` Thomas Graf
  2005-05-18 21:46                 ` Herbert Xu
  0 siblings, 1 reply; 80+ messages in thread
From: Thomas Graf @ 2005-05-18 17:44 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev

* Thomas Graf <20050518174010.GC15391@postel.suug.ch> 2005-05-18 19:40
> In a perfect world without any randomly generated ids we could measure
> the absolute distance even without being aware of all ids, it might even
> be possible to try and differentiate between random and serial id sequences
> and optimize a bit there but in the end we have to find a good
> compromise for the random case anyway. Worth some experimentation I guess.

Wild thought: We could introduce a new ip option stating that the id
generator uses a serial approach which would give us the possibility
to measure the absolute distance and resolve this issue in a perfect
manner for everyone supporting this extension. ;->

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18 16:21           ` Rick Jones
  2005-05-18 17:40             ` Thomas Graf
@ 2005-05-18 21:45             ` Herbert Xu
  2005-05-19 12:23               ` Thomas Graf
  1 sibling, 1 reply; 80+ messages in thread
From: Herbert Xu @ 2005-05-18 21:45 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev

Rick Jones <rick.jones2@hp.com> wrote:
> If we ass-u-me that the sender is indeed using a random IP ID assignment 
> mechanism, 30000 is probably too many.  There are only 65536 possible ID's, and 
> if we "choose" 30000 of them there will undoubtedly be many duplicated.  Someone 
> who didn't fall asleep too often in ProbStats (unlike myself) can probably tell 
> us just how many.

IMHO hosts using purely random IDs all the time are fundamentally broken
for applications such as NFS over UDP over gigabit.  However, in order
to handle such hosts we should make this threshold configurable and
then those who need it can set it to a value like 600 which gives a
collision probability with the first fragment of just less than 1%.
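
The "just less than 1%" figure checks out (a quick back-of-envelope check of
my own):

```python
# Probability that at least one of k random 16-bit IDs collides with
# the ID of the fragment still waiting in the reassembly queue.
def collision_prob(k, n=65536):
    return 1.0 - (1.0 - 1.0 / n) ** k

print(f"{collision_prob(600):.2%}")  # about 0.91%, just under 1%
```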

> Also, I think the count has to be _any_ IP datagram on that src/dst pair, 
> fragmented or not.

You might be right there.  However, we should keep in mind that we're not
trying to come up with a perfect solution to the IP fragmentation
problem.  All we need is something that's good enough to deal with
usages similar to NFS over UDP over gigabit.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18 17:44               ` Thomas Graf
@ 2005-05-18 21:46                 ` Herbert Xu
  2005-05-18 22:24                   ` David Stevens
  0 siblings, 1 reply; 80+ messages in thread
From: Herbert Xu @ 2005-05-18 21:46 UTC (permalink / raw)
  To: Thomas Graf; +Cc: rick.jones2, netdev

Thomas Graf <tgraf@suug.ch> wrote:
> 
> Wild thought: We could introduce a new ip option stating that the id
> generator uses a serial approach which would give us the possibility
> to measure the absolute distance and resolve this issue in a perfect
> manner for everyone supporting this extension. ;->

Well Linux does that anyway (apart from Suse) so all we need to do
is to tell everyone doing NFS over gigabit to use Linux :)
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18 21:46                 ` Herbert Xu
@ 2005-05-18 22:24                   ` David Stevens
  2005-05-18 22:39                     ` Nivedita Singhvi
  2005-05-18 23:31                     ` Thomas Graf
  0 siblings, 2 replies; 80+ messages in thread
From: David Stevens @ 2005-05-18 22:24 UTC (permalink / raw)
  To: Herbert Xu; +Cc: netdev, netdev-bounce, rick.jones2, Thomas Graf

netdev-bounce@oss.sgi.com wrote on 05/18/2005 02:46:54 PM:

> Thomas Graf <tgraf@suug.ch> wrote:
> >
> > Wild thought: We could introduce a new ip option stating that the id
> > generator uses a serial approach which would give us the possibility
> > to measure the absolute distance and resolve this issue in a perfect
> > manner for everyone supporting this extension. ;->

> Well Linux does that anyway (apart from Suse) so all we need to do
> is to tell everyone doing NFS over gigabit to use Linux :)

        If you're going to add an IP option, you can eliminate the
problem entirely. Just add an "extended IP ID" IP option and give
it as many bits as you want-- make that the high order of an n+16-bit
IP ID.
        The IP timestamp option, if done per frag and required to be
the same for all frags, could be used in this way, since you
presumably won't wrap without incrementing that by at least 1. :-)
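
The extended-ID idea in miniature (a sketch of mine; the option encoding is
hypothetical):

```python
# Hypothetical "extended IP ID": a 16-bit option value supplies the
# high-order bits of a 32-bit effective ID, growing the wrap distance
# from 2^16 to 2^32 datagrams.
def effective_id(ext_hi, ip_id):
    return (ext_hi << 16) | (ip_id & 0xFFFF)

a = effective_id(0x0000, 0xFFFF)  # last ID before the 16-bit wrap
b = effective_id(0x0001, 0x0000)  # next datagram: option high bits bump
print(b - a)  # consecutive; no ambiguity at the wrap
```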

                                                        +-DLS

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18 22:24                   ` David Stevens
@ 2005-05-18 22:39                     ` Nivedita Singhvi
  2005-05-18 23:31                     ` Thomas Graf
  1 sibling, 0 replies; 80+ messages in thread
From: Nivedita Singhvi @ 2005-05-18 22:39 UTC (permalink / raw)
  To: David Stevens; +Cc: Herbert Xu, netdev, netdev-bounce, rick.jones2, Thomas Graf

David Stevens wrote:
> netdev-bounce@oss.sgi.com wrote on 05/18/2005 02:46:54 PM:
> 
> 
>>Thomas Graf <tgraf@suug.ch> wrote:
>>
>>>Wild thought: We could introduce a new ip option stating that the id
>>>generator uses a serial approach which would give us the possibility
>>>to measure the absolute distance and resolve this issue in a perfect
>>>manner for everyone supporting this extension. ;->
> 
> 
>>Well Linux does that anyway (apart from Suse) so all we need to do
>>is to tell everyone doing NFS over gigabit to use Linux :)
> 
> 
>         If you're going to add an IP option, you can eliminate the
> problem entirely. Just add an "extended IP ID" IP option and give
> it as many bits as you want-- make that the high order of an n+16-bit
> IP ID.
>         The IP timestamp option, if done per frag and required to be
> the same for all frags, could be used in this way, since you
> presumably won't wrap without incrementing that by at least 1. :-)
> 
>                                                         +-DLS

Whatever happened to UDP Path MTU? While we're at this, can't
we start kicking some path-MTU-broken butt?


thanks,
Nivedita

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18 22:24                   ` David Stevens
  2005-05-18 22:39                     ` Nivedita Singhvi
@ 2005-05-18 23:31                     ` Thomas Graf
  1 sibling, 0 replies; 80+ messages in thread
From: Thomas Graf @ 2005-05-18 23:31 UTC (permalink / raw)
  To: David Stevens; +Cc: Herbert Xu, netdev, netdev-bounce, rick.jones2

* David Stevens <OFD80D42F2.31DFC921-ON88257005.007ABBCD-88257005.007B23EA@us.ibm.com> 2005-05-18 15:24
>         If you're going to add an IP option, you can eliminate the
> problem entirely. Just add an "extended IP ID" IP option and give
> it as many bits as you want-- make that the high order of an n+16-bit
> IP ID.

I was thinking of something nastier, such as using (IP_DF|IP_MF)
as the value for this flag ;->

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-18 21:45             ` Herbert Xu
@ 2005-05-19 12:23               ` Thomas Graf
  2005-05-19 12:48                 ` Herbert Xu
  2005-05-19 17:02                 ` Rick Jones
  0 siblings, 2 replies; 80+ messages in thread
From: Thomas Graf @ 2005-05-19 12:23 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Rick Jones, netdev

* Herbert Xu <E1DYWM2-0004jM-00@gondolin.me.apana.org.au> 2005-05-19 07:45
> Rick Jones <rick.jones2@hp.com> wrote:
> > If we ass-u-me that the sender is indeed using a random IP ID assignment 
> > mechanism, 30000 is probably too many.  There are only 65536 possible ID's, and 
> > if we "choose" 30000 of them there will undoubtedly be many duplicated.  Someone 
> > who didn't fall asleep too often in ProbStats (unlike myself) can probably tell 
> > us just how many.
> 
> IMHO hosts using purely random IDs all the time are fundamentally broken
> for applications such as NFS over UDP over gigabit.  However, in order
> to handle such hosts we should make this threshold configurable and
> then those who need it can set it to a value like 600 which gives a
> collision probability with the first fragment of just less than 1%.

I agree, however defining a value of 600 system-wide is horrible for
all hosts that behave "correctly". So what we could do is take probes
of the id distribution and define the threshold on a per-peer scope.

Example: Once in a while we start a probe and set a bit in a bitmap
for every id that matches a defined window. Not sure about the size of
that bitmap yet but 2048 bits might be a good start. The first fragment
id received during the probe defines the lower bound of the window and
the probe lasts until we received 2048 fragment ids. Once the probe
is finished we calculate the ratio between set and unset bits and
calculate the threshold. Additionally we can define a threshold of bits
set to figure out if the peer is likely to use a counter for id
generation; if so, we can set a flag which switches from counting
fragment ids to calculating the delta between the fragment id subject
to expiration and the current last fragment id. So basically we
would have two different modes, the "difficult peer"-mode for peers
using random ids or if we only see a very small portion of the effective
id space and a "counters"-mode for all peers using counters.

The "counters"-mode can offer a nearly perfect solution to detect
wraps without suffering from side effects such as dropping perfectly
legal fragments.

This idea relies on the fact that the result of the probe stays
true for some time which might not be true but is probably fair
enough.
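
A toy version of the probe described above (the names and the 0.9
counter-detection cutoff are my own illustrative choices, not from the
mail):

```python
WINDOW = 2048  # bits in the probe bitmap, as suggested above

class IdProbe:
    def __init__(self):
        self.base = None   # first fragment id seen: lower window bound
        self.seen = set()  # window offsets whose bit is set
        self.count = 0

    def observe(self, frag_id):
        if self.base is None:
            self.base = frag_id
        off = (frag_id - self.base) & 0xFFFF  # distance mod 2^16
        if off < WINDOW:
            self.seen.add(off)
        self.count += 1
        return self.count >= WINDOW  # probe done after WINDOW ids

    def counter_mode(self, cutoff=0.9):
        # dense window -> peer very likely uses a serial counter
        return len(self.seen) / WINDOW >= cutoff

probe = IdProbe()
done = False
for i in range(WINDOW):  # a serial sender: ids base, base+1, ...
    done = probe.observe((1000 + i) & 0xFFFF)
print(done, probe.counter_mode())  # a counting peer fills the window
```

A purely random sender would set only a small fraction of the window's
bits, leaving the probe in "difficult peer" mode.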

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-19 12:23               ` Thomas Graf
@ 2005-05-19 12:48                 ` Herbert Xu
  2005-05-19 15:19                   ` Thomas Graf
  2005-05-19 17:02                 ` Rick Jones
  1 sibling, 1 reply; 80+ messages in thread
From: Herbert Xu @ 2005-05-19 12:48 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Rick Jones, netdev

On Thu, May 19, 2005 at 02:23:19PM +0200, Thomas Graf wrote:
> 
> I agree, however defining a value of 600 system wide is horrible for
> all hosts that behave "correctly". So what we could do is take probes
> of the id distribution and define the threshold on a per peer scope.
> 
> Example: Once in a while we start a probe and set a bit in a bitmap
> for every id that matches a defined window. Not sure about the size of
> that bitmap yet but 2048 bits might be a good start. The first fragment

Sorry, but this scheme is way too complex for a problem that only affects
a tiny section of the community.  If you really want to do this then
do it as a static route flag instead of something that the system tries
to auto-detect.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-19 12:48                 ` Herbert Xu
@ 2005-05-19 15:19                   ` Thomas Graf
  0 siblings, 0 replies; 80+ messages in thread
From: Thomas Graf @ 2005-05-19 15:19 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Rick Jones, netdev

* Herbert Xu <20050519124821.GA686@gondor.apana.org.au> 2005-05-19 22:48
> On Thu, May 19, 2005 at 02:23:19PM +0200, Thomas Graf wrote:
> > 
> > I agree, however defining a value of 600 system wide is horrible for
> > all hosts that behave "correctly". So what we could do is take probes
> > of the id distribution and define the threshold on a per peer scope.
> > 
> > Example: Once in a while we start a probe and set a bit in a bitmap
> > for every id that matches a defined window. Not sure about the size of
> > that bitmap yet but 2048 bits might be a good start. The first fragment
> 
> Sorry, but this scheme is way too complex for a problem that only affects
> a tiny section of the community.  If you really want to do this then
> do it as a static route flag instead of something that the system tries
> to auto-detect.

Yes, it's currently quite complex; I'm trying to reduce it to something
simpler. If we do the route flag thing then we should allow the same flag
for permanent neighbours as well.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [RFC/PATCH] "strict" ipv4 reassembly
  2005-05-19 12:23               ` Thomas Graf
  2005-05-19 12:48                 ` Herbert Xu
@ 2005-05-19 17:02                 ` Rick Jones
  1 sibling, 0 replies; 80+ messages in thread
From: Rick Jones @ 2005-05-19 17:02 UTC (permalink / raw)
  To: netdev

> I agree, however defining a value of 600 system wide is horrible for
> all hosts that behave "correctly". So what we could do is take probes
> of the id distribution and define the threshold on a per peer scope.

Why would 600 penalize a host behaving "correctly?"  I mean, what are the 
chances of a datagram being reassembled if 600 subsequent datagrams have 
arrived from that same host?

rick

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2005-05-19 17:02 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-05-17 16:18 [RFC/PATCH] "strict" ipv4 reassembly Arthur Kepner
2005-05-17 17:49 ` David S. Miller
2005-05-17 18:28   ` Arthur Kepner
2005-05-17 18:48     ` David S. Miller
2005-05-17 20:21       ` Arthur Kepner
2005-05-17 18:38   ` Andi Kleen
2005-05-17 18:45     ` Pekka Savola
2005-05-17 18:50     ` David S. Miller
2005-05-17 18:56       ` Rick Jones
2005-05-17 18:57     ` John Heffner
2005-05-17 19:09       ` David S. Miller
2005-05-17 19:21         ` Rick Jones
2005-05-17 19:26         ` Ben Greear
2005-05-17 20:48         ` Thomas Graf
2005-05-17 19:17       ` Andi Kleen
2005-05-17 19:56         ` David Stevens
2005-05-17 20:17           ` Andi Kleen
2005-05-17 20:22             ` David S. Miller
2005-05-17 20:27               ` Andi Kleen
2005-05-17 21:02                 ` David S. Miller
2005-05-17 21:13                   ` Andi Kleen
2005-05-17 21:24                     ` David S. Miller
2005-05-17 21:25                   ` Rick Jones
2005-05-17 22:06                     ` Arthur Kepner
2005-05-17 22:18                       ` Rick Jones
2005-05-17 22:40                         ` David Stevens
2005-05-17 23:11                           ` Herbert Xu
2005-05-17 23:20                             ` Arthur Kepner
2005-05-17 23:25                               ` Herbert Xu
2005-05-17 23:55                                 ` David Stevens
2005-05-18  0:00                                   ` Herbert Xu
2005-05-18  0:04                                 ` Andi Kleen
2005-05-18  0:09                                   ` Herbert Xu
2005-05-18  0:52                                     ` David S. Miller
2005-05-18  0:06                                 ` Nivedita Singhvi
2005-05-18  0:10                                   ` Herbert Xu
2005-05-18  0:51                                     ` David S. Miller
2005-05-18  1:05                                       ` Andi Kleen
2005-05-18  1:13                                       ` Herbert Xu
2005-05-18  1:09                                 ` John Heffner
2005-05-17 23:53                           ` Rick Jones
2005-05-17 22:12                     ` David S. Miller
2005-05-17 22:23                       ` Rick Jones
2005-05-17 20:29           ` John Heffner
2005-05-17 19:01     ` Nivedita Singhvi
2005-05-17 19:13       ` Rick Jones
2005-05-17 19:25         ` Nivedita Singhvi
2005-05-17 19:31           ` John Heffner
2005-05-17 19:52             ` Nivedita Singhvi
2005-05-17 20:05               ` John Heffner
2005-05-17 20:12                 ` Rick Jones
2005-05-17 19:33           ` Rick Jones
2005-05-17 19:53             ` Andi Kleen
2005-05-17 22:11   ` Herbert Xu
2005-05-17 22:13     ` David S. Miller
2005-05-17 23:08       ` Herbert Xu
2005-05-17 23:16         ` David S. Miller
2005-05-17 23:28           ` Herbert Xu
2005-05-17 23:36             ` Patrick McHardy
2005-05-17 23:41               ` Herbert Xu
2005-05-18  0:47     ` Thomas Graf
2005-05-18  1:06       ` Arthur Kepner
2005-05-18  1:16       ` Herbert Xu
2005-05-18  1:37         ` Thomas Graf
2005-05-18  1:52           ` Herbert Xu
2005-05-18 11:30             ` Thomas Graf
2005-05-18 11:40               ` Herbert Xu
2005-05-18 12:24                 ` Thomas Graf
2005-05-18 16:21           ` Rick Jones
2005-05-18 17:40             ` Thomas Graf
2005-05-18 17:44               ` Thomas Graf
2005-05-18 21:46                 ` Herbert Xu
2005-05-18 22:24                   ` David Stevens
2005-05-18 22:39                     ` Nivedita Singhvi
2005-05-18 23:31                     ` Thomas Graf
2005-05-18 21:45             ` Herbert Xu
2005-05-19 12:23               ` Thomas Graf
2005-05-19 12:48                 ` Herbert Xu
2005-05-19 15:19                   ` Thomas Graf
2005-05-19 17:02                 ` Rick Jones

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).