ARP table question

netdev.vger.kernel.org archive mirror

* ARP table question
       [not found] ` <491B1841.9050404@candelatech.com>
@ 2008-11-12 19:43   ` Ben Greear
  2008-11-12 22:10     ` Ben Greear
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Greear @ 2008-11-12 19:43 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: NetDev

I have 500 mac-vlans on a system talking to 500 other
mac-vlans.  My problem is that the arp-table gets extremely
huge because every time an arp-request comes in on all mac-vlans,
a stale arp entry is added for each mac-vlan.  I have filtering
turned on, but that doesn't help because the neigh_event_ns call
below will cause a stale neighbor entry to be created regardless
of whether a reply will be sent or not.

Maybe the neigh_event_ns() call should be moved below the dont_send checks,
so that it is only made when !dont_send?

This is the code I'm talking about, from net/ipv4/arp.c (kernel 2.6.25.15):

	if (arp->ar_op == htons(ARPOP_REQUEST) &&
	    ip_route_input(skb, tip, sip, 0, dev) == 0) {

		rt = (struct rtable*)skb->dst;
		addr_type = rt->rt_type;

		if (addr_type == RTN_LOCAL) {
			n = neigh_event_ns(&arp_tbl, sha, &sip, dev);
			if (n) {
				int dont_send = 0;

				if (!dont_send)
					dont_send |= arp_ignore(in_dev,sip,tip);
				if (!dont_send && IN_DEV_ARPFILTER(in_dev))
					dont_send |= arp_filter(sip,tip,dev);
				if (!dont_send)
					arp_send(ARPOP_REPLY,ETH_P_ARP,sip,dev,tip,sha,dev->dev_addr,sha);

				neigh_release(n);
			}
			goto out;
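To make the effect of the ordering concrete, here is a toy user-space model (not kernel code). It assumes all 500 mac-vlans receive the same broadcast request and that only the interface owning the target IP passes the filter. would_drop() is a hypothetical stand-in for the arp_ignore()/arp_filter() checks that set dont_send, and the two counters stand in for neigh_event_ns() and arp_send().

```c
#include <stdbool.h>

/* Toy model, not kernel code: NUM_MACVLANS interfaces all receive the
 * same broadcast ARP request; only the interface that owns the target
 * IP should answer. */
#define NUM_MACVLANS 500

struct arp_outcome {
	int neigh_entries;	/* entries neigh_event_ns() would create */
	int replies;		/* replies arp_send() would transmit */
};

/* Hypothetical stand-in for the arp_ignore()/arp_filter() checks. */
static bool would_drop(int iface, int owner)
{
	return iface != owner;
}

/* Pre-patch order: the neighbour entry is created before filtering. */
struct arp_outcome process_old_order(int owner)
{
	struct arp_outcome o = { 0, 0 };

	for (int iface = 0; iface < NUM_MACVLANS; iface++) {
		o.neigh_entries++;		/* neigh_event_ns() */
		if (!would_drop(iface, owner))
			o.replies++;		/* arp_send() */
	}
	return o;
}

/* Proposed order: filter first, create the entry only when replying. */
struct arp_outcome process_new_order(int owner)
{
	struct arp_outcome o = { 0, 0 };

	for (int iface = 0; iface < NUM_MACVLANS; iface++) {
		if (would_drop(iface, owner))
			continue;		/* dont_send: no entry created */
		o.neigh_entries++;		/* neigh_event_ns() */
		o.replies++;			/* arp_send() */
	}
	return o;
}
```

With owner = 7, the old ordering produces 500 neighbour entries for a single reply, while the new ordering produces one entry per reply.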

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: ARP table question
  2008-11-12 19:43   ` ARP table question Ben Greear
@ 2008-11-12 22:10     ` Ben Greear
  2008-11-17  3:16       ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Greear @ 2008-11-12 22:10 UTC (permalink / raw)
  To: NetDev; +Cc: Patrick McHardy

[-- Attachment #1: Type: text/plain, Size: 1025 bytes --]

Ben Greear wrote:
> I have 500 mac-vlans on a system talking to 500 other
> mac-vlans.  My problem is that the arp-table gets extremely
> huge because every time an arp-request comes in on all mac-vlans,
> a stale arp entry is added for each mac-vlan.  I have filtering
> turned on, but that doesn't help because the neigh_event_ns call
> below will cause a stale neighbor entry to be created regardless
> of whether a reply will be sent or not.
> 
> Maybe the neigh_event_ns() call should be moved below the dont_send checks,
> so that it is only made when !dont_send?

The attached patch makes it work much better for me.  The patch
will cause the code to NOT create a stale neighbor entry if we
are not going to respond to the ARP request.  The old code
*would* create a stale entry even if we are not going to respond.

This is against 2.6.25.15.

Signed-off-by: Ben Greear <greearb@candelatech.com>

Thanks,
Ben



[-- Attachment #2: patch0.patch --]
[-- Type: text/x-patch, Size: 1046 bytes --]

diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 8c16b42..dd454b7 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -872,18 +872,18 @@ static int arp_process(struct sk_buff *skb)
 		addr_type = rt->rt_type;
 
 		if (addr_type == RTN_LOCAL) {
-			n = neigh_event_ns(&arp_tbl, sha, &sip, dev);
-			if (n) {
-				int dont_send = 0;
-
-				if (!dont_send)
-					dont_send |= arp_ignore(in_dev,sip,tip);
-				if (!dont_send && IN_DEV_ARPFILTER(in_dev))
-					dont_send |= arp_filter(sip,tip,dev);
-				if (!dont_send)
-					arp_send(ARPOP_REPLY,ETH_P_ARP,sip,dev,tip,sha,dev->dev_addr,sha);
+			int dont_send = 0;
 
-				neigh_release(n);
+			if (!dont_send)
+				dont_send |= arp_ignore(in_dev,sip,tip);
+			if (!dont_send && IN_DEV_ARPFILTER(in_dev))
+				dont_send |= arp_filter(sip,tip,dev);
+			if (!dont_send) {
+				n = neigh_event_ns(&arp_tbl, sha, &sip, dev);
+				if (n) {
+					arp_send(ARPOP_REPLY,ETH_P_ARP,sip,dev,tip,sha,dev->dev_addr,sha);
+					neigh_release(n);
+				}
 			}
 			goto out;
 		} else if (IN_DEV_FORWARD(in_dev)) {


* Re: ARP table question
  2008-11-12 22:10     ` Ben Greear
@ 2008-11-17  3:16       ` David Miller
  2008-11-17 18:17         ` Ben Greear
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2008-11-17  3:16 UTC (permalink / raw)
  To: greearb; +Cc: netdev, kaber

From: Ben Greear <greearb@candelatech.com>
Date: Wed, 12 Nov 2008 14:10:26 -0800

> Ben Greear wrote:
> > I have 500 mac-vlans on a system talking to 500 other
> > mac-vlans.  My problem is that the arp-table gets extremely
> > huge because every time an arp-request comes in on all mac-vlans,
> > a stale arp entry is added for each mac-vlan.  I have filtering
> > turned on, but that doesn't help because the neigh_event_ns call
> > below will cause a stale neighbor entry to be created regardless
> > of whether a reply will be sent or not.
> > Maybe the neigh_event_ns() call should be moved below the dont_send checks,
> > so that it is only made when !dont_send?
> 
> The attached patch makes it work much better for me.  The patch
> will cause the code to NOT create a stale neighbor entry if we
> are not going to respond to the ARP request.  The old code
> *would* create a stale entry even if we are not going to respond.
> 
> This is against 2.6.25.15.
> 
> Signed-off-by: Ben Greear <greearb@candelatech.com>

This change makes a lot of sense to me, I'll add it to net-next-2.6
so it can cook in there for a while just in case there are some
unwanted side-effects.

Thanks.


* Re: ARP table question
  2008-11-17  3:16       ` David Miller
@ 2008-11-17 18:17         ` Ben Greear
  2008-11-18  0:33           ` Ben Greear
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Greear @ 2008-11-17 18:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Patrick McHardy

David Miller wrote:

> This change makes a lot of sense to me, I'll add it to net-next-2.6
> so it can cook in there for a while just in case there are some
> unwanted side-effects.

Thanks Dave.

I think I found another problem as well:  If I start 1 TCP and 1 UDP connection
between each of the 500 interfaces on mac-vlans, the ARP tables will not converge.

It seems that because mac-vlan has to copy broadcast packets to every
mac-vlan on a physical device, there are simply too many packets:

500 mac-vlans each ARPing once per second means 500 packets per second
arriving on the other NIC.  That NIC must copy each of these 500 times,
so 250,000 packets per second in each direction are processed by the
stack (at least they are not all on the wire).
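The fan-out arithmetic above can be written down as a rough model. This is an editorial sketch, not measured data: it assumes each mac-vlan sends one ARP request every interval_s seconds and that the receiving side delivers every broadcast to each of its local mac-vlans. The function names are hypothetical.

```c
/* Rough model of macvlan broadcast fan-out (editorial sketch).
 * interval_s must be > 0; results are rounded down. */
unsigned long arp_requests_on_wire(unsigned long vlans, unsigned long interval_s)
{
	/* ARP requests per second arriving at the other NIC */
	return vlans / interval_s;
}

unsigned long arp_copies_in_stack(unsigned long vlans, unsigned long interval_s)
{
	/* each received broadcast is copied once per local mac-vlan */
	return arp_requests_on_wire(vlans, interval_s) * vlans;
}
```

arp_copies_in_stack(500, 1) gives the 250,000 packets per second quoted above; arp_copies_in_stack(500, 5) gives 50,000, so a 5-second interval cuts the load by the same factor of five.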

A few get through and those UDP/TCP connections start consuming
bandwidth, which clogs up the 1G link enough that other responses
are lost most of the time.

I'm going to try to work on some sort of random backoff for ARP that can
be enabled in this situation next.

Thanks,
Ben



* Re: ARP table question
  2008-11-17 18:17         ` Ben Greear
@ 2008-11-18  0:33           ` Ben Greear
  2008-11-18  0:51             ` Rick Jones
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Greear @ 2008-11-18  0:33 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy

[-- Attachment #1: Type: text/plain, Size: 1963 bytes --]

Ben Greear wrote:
> David Miller wrote:
> 
>> This change makes a lot of sense to me, I'll add it to net-next-2.6
>> so it can cook in there for a while just in case there are some
>> unwanted side-effects.
> 
> Thanks Dave.
> 
> I think I found another problem as well:  If I start 1 TCP and 1 UDP connection
> between each of the 500 interfaces on mac-vlans, the ARP tables will not converge.
> 
> It seems that because mac-vlan has to copy broadcast packets to every
> mac-vlan on a physical device, there are simply too many packets:
> 
> 500 mac-vlans each ARPing once per second means 500 packets per second
> arriving on the other NIC.  That NIC must copy each of these 500 times,
> so 250,000 packets per second in each direction are processed by the
> stack (at least they are not all on the wire).
> 
> A few get through and those UDP/TCP connections start consuming
> bandwidth, which clogs up the 1G link enough that other responses
> are lost most of the time.
> 
> I'm going to try to work on some sort of random backoff for ARP that can
> be enabled in this situation next.

OK, here is the patch that implements this.  The idea is to spread out
ARP requests when you do something like start 500 TCP connections on 500
MAC-VLANs talking to 500 other MAC-VLANs.

With a retrans timer of 1 sec, a high volume of traffic, and a semi-flaky
network in between, my system will not resolve the ARPs, and the retransmits
overload my processors.

Setting the retrans timer to 5 secs on my system also works, so I'm not sure
if this patch is really required, but it might help spread ARP requests out
in cases where the ARP timers would otherwise all try to fire at the same
time.
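A user-space sketch of what the random backoff does to a herd of retransmit timers that are all armed at the same instant (as after a cable pull). It assumes HZ=1000, so 1000 jiffies is 1 second, and uses the 1 s retrans time and 5 s random backoff mentioned later in this thread. A toy xorshift generator replaces net_random(); toy_random(), rand_retry() and worst_burst() are hypothetical names. The extra-delay arithmetic mirrors neigh_rand_retry() in the attached patch.

```c
#include <stdint.h>
#include <string.h>

#define MAX_WINDOW 8192	/* largest rand_backoff (in ticks) this toy handles */

/* xorshift32: hypothetical user-space stand-in for net_random() */
static uint32_t rng_state = 2463534242u;

static uint32_t toy_random(void)
{
	rng_state ^= rng_state << 13;
	rng_state ^= rng_state >> 17;
	rng_state ^= rng_state << 5;
	return rng_state;
}

/* Same arithmetic as neigh_rand_retry(): 0 .. rand_backoff-1, or 0 if off. */
static unsigned long rand_retry(unsigned long rand_backoff)
{
	return rand_backoff ? toy_random() % rand_backoff : 0;
}

/* Arm n timers at the same instant (now = 0) with
 * next = now + retrans_time + rand_retry(rand_backoff), and return the
 * largest number of them that expire on the same tick, or -1 if
 * rand_backoff is too large for this toy. */
int worst_burst(int n, unsigned long retrans_time, unsigned long rand_backoff)
{
	static int per_tick[MAX_WINDOW];
	unsigned long now = 0;
	int worst = 0;

	if (rand_backoff >= MAX_WINDOW)
		return -1;
	memset(per_tick, 0, sizeof(per_tick));
	for (int i = 0; i < n; i++) {
		unsigned long next = now + retrans_time + rand_retry(rand_backoff);
		int *slot = &per_tick[next - now - retrans_time];

		if (++*slot > worst)
			worst = *slot;
	}
	return worst;
}
```

With no backoff, worst_burst(500, 1000, 0) returns 500: every timer fires on the same tick. With a 5000-tick backoff the same 500 timers are spread over five seconds, so only a handful can land on any one tick. Note that, as Rick points out further down the thread, this only affects retransmissions, not the first request.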

This is against 2.6.25.20 plus my patches, but I believe it should
apply to a clean 2.6.25.20 as well.

Comments are welcome.

Signed-off-by: Ben Greear <greearb@candelatech.com>

Thanks,
Ben




[-- Attachment #2: neigh_retrans.patch --]
[-- Type: text/x-patch, Size: 4413 bytes --]

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 518ebe6..4c805b3 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -2028,6 +2028,16 @@ Expression of retrans_time, which is deprecated, is in 1/100 seconds (for
 IPv4) or in jiffies (for IPv6).
 Expression of retrans_time_ms is in milliseconds.
 
+
+retrans_rand_backoff_ms
+-----------------------
+
+This is an extra delay (ms) for the retransmit timer.  A random value between
+0 and retrans_rand_backoff_ms will be added to retrans_time_ms.  Default
+is zero.  Setting this to a larger value can help large broadcast domains
+resolve ARP (for instance, 500 mac-vlans talking to 500 other mac-vlans).
+
+
 unres_qlen
 ----------
 
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 8dbe468..a45b5df 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -608,6 +608,7 @@ enum {
 	NET_NEIGH_GC_THRESH3=16,
 	NET_NEIGH_RETRANS_TIME_MS=17,
 	NET_NEIGH_REACHABLE_TIME_MS=18,
+	NET_NEIGH_RETRANS_RAND_BACKOFF=19,
 	__NET_NEIGH_MAX
 };
 
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 64a5f01..4947976 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -65,6 +65,7 @@ struct neigh_parms
 	int	proxy_delay;
 	int	proxy_qlen;
 	int	locktime;
+	int	retrans_rand_backoff;
 };
 
 struct neigh_statistics
diff --git a/kernel/sysctl_check.c b/kernel/sysctl_check.c
index 37c8fab..6f3467c 100644
--- a/kernel/sysctl_check.c
+++ b/kernel/sysctl_check.c
@@ -249,6 +249,7 @@ static const struct trans_ctl_table trans_net_neigh_vars_table[] = {
 	{ NET_NEIGH_GC_THRESH3,		"gc_thresh3" },
 	{ NET_NEIGH_RETRANS_TIME_MS,	"retrans_time_ms" },
 	{ NET_NEIGH_REACHABLE_TIME_MS,	"base_reachable_time_ms" },
+	{ NET_NEIGH_RETRANS_RAND_BACKOFF, "retrans_rand_backoff_ms"},
 	{}
 };
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 19b8e00..ec1f048 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -765,6 +765,13 @@ static __inline__ int neigh_max_probes(struct neighbour *n)
 		p->ucast_probes + p->app_probes + p->mcast_probes);
 }
 
+static unsigned long neigh_rand_retry(struct neighbour* neigh) {
+	if (neigh->parms->retrans_rand_backoff) {
+		return net_random() % neigh->parms->retrans_rand_backoff;
+	}
+	return 0;
+}
+
 /* Called when a timer expires for a neighbour entry. */
 
 static void neigh_timer_handler(unsigned long arg)
@@ -820,11 +827,11 @@ static void neigh_timer_handler(unsigned long arg)
 			neigh->nud_state = NUD_PROBE;
 			neigh->updated = jiffies;
 			atomic_set(&neigh->probes, 0);
-			next = now + neigh->parms->retrans_time;
+			next = now + neigh->parms->retrans_time + neigh_rand_retry(neigh);
 		}
 	} else {
 		/* NUD_PROBE|NUD_INCOMPLETE */
-		next = now + neigh->parms->retrans_time;
+		next = now + neigh->parms->retrans_time + neigh_rand_retry(neigh);
 	}
 
 	if ((neigh->nud_state & (NUD_INCOMPLETE | NUD_PROBE)) &&
@@ -2642,6 +2649,14 @@ static struct neigh_sysctl_table {
 			.strategy	= &sysctl_ms_jiffies,
 		},
 		{
+			.ctl_name	= NET_NEIGH_RETRANS_RAND_BACKOFF,
+			.procname	= "retrans_rand_backoff_ms",
+			.maxlen		= sizeof(int),
+			.mode		= 0644,
+			.proc_handler	= &proc_dointvec_ms_jiffies,
+			.strategy	= &sysctl_ms_jiffies,
+		},
+		{
 			.ctl_name	= NET_NEIGH_GC_INTERVAL,
 			.procname	= "gc_interval",
 			.maxlen		= sizeof(int),
@@ -2712,18 +2727,19 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 	t->neigh_vars[11].data = &p->locktime;
 	t->neigh_vars[12].data  = &p->retrans_time;
 	t->neigh_vars[13].data  = &p->base_reachable_time;
+	t->neigh_vars[14].data  = &p->retrans_rand_backoff;
 
 	if (dev) {
 		dev_name_source = dev->name;
 		neigh_path[NEIGH_CTL_PATH_DEV].ctl_name = dev->ifindex;
 		/* Terminate the table early */
-		memset(&t->neigh_vars[14], 0, sizeof(t->neigh_vars[14]));
+		memset(&t->neigh_vars[15], 0, sizeof(t->neigh_vars[15]));
 	} else {
 		dev_name_source = neigh_path[NEIGH_CTL_PATH_DEV].procname;
-		t->neigh_vars[14].data = (int *)(p + 1);
-		t->neigh_vars[15].data = (int *)(p + 1) + 1;
-		t->neigh_vars[16].data = (int *)(p + 1) + 2;
-		t->neigh_vars[17].data = (int *)(p + 1) + 3;
+		t->neigh_vars[15].data = (int *)(p + 1);
+		t->neigh_vars[16].data = (int *)(p + 1) + 1;
+		t->neigh_vars[17].data = (int *)(p + 1) + 2;
+		t->neigh_vars[18].data = (int *)(p + 1) + 3;
 	}
 
 


* Re: ARP table question
  2008-11-18  0:33           ` Ben Greear
@ 2008-11-18  0:51             ` Rick Jones
  2008-11-18  1:23               ` Ben Greear
  0 siblings, 1 reply; 12+ messages in thread
From: Rick Jones @ 2008-11-18  0:51 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, Patrick McHardy

Ben Greear wrote:
> OK, here is the patch that implements this.  The idea is to spread out
> ARP requests when you do something like start 500 TCP connections on 500
> MAC-VLANs talking to 500 other MAC-VLANs.
> 
> With a retrans timer of 1 sec, a high volume of traffic, and a semi-flaky
> network in between, my system will not resolve the ARPs, and the
> retransmits overload my processors.
> 
> Setting the retrans timer to 5 secs on my system also works, so I'm
> not sure if this patch is really required, but it might help spread ARP
> requests out in cases where the ARP timers would otherwise all try to
> fire at the same time.
> 
> This is against 2.6.25.20 plus my patches, but I believe it should
> apply to a clean 2.6.25.20 as well.
> 
> Comments are welcome.
> 
> Signed-off-by: Ben Greear <greearb@candelatech.com>
> 
> Thanks,
> Ben
> 
> 
> 
> ------------------------------------------------------------------------
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 518ebe6..4c805b3 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -2028,6 +2028,16 @@ Expression of retrans_time, which is deprecated, is in 1/100 seconds (for
>  IPv4) or in jiffies (for IPv6).
>  Expression of retrans_time_ms is in milliseconds.
>  
> +
> +retrans_rand_backoff_ms
> +-----------------------
> +
> +This is an extra delay (ms) for the retransmit timer.  A random value between
> +0 and retrans_rand_backoff_ms will be added to retrans_time_ms.  Default
> +is zero.  Setting this to a larger value can help large broadcast domains
> +resolve ARP (for instance, 500 mac-vlans talking to 500 other mac-vlans).
> +
> +
>  unres_qlen
>  ----------
> ...
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 19b8e00..ec1f048 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -765,6 +765,13 @@ static __inline__ int neigh_max_probes(struct neighbour *n)
>  		p->ucast_probes + p->app_probes + p->mcast_probes);
>  }
>  
> +static unsigned long neigh_rand_retry(struct neighbour* neigh) {
> +	if (neigh->parms->retrans_rand_backoff) {
> +		return net_random() % neigh->parms->retrans_rand_backoff;
> +	}
> +	return 0;
> +}
> +
>  /* Called when a timer expires for a neighbour entry. */

I thought that mod was something we tried to avoid?  Could you instead 
use something that isn't random but perhaps varies among all the 
requests?  Say some of the low-order bits of the IP being resolved?

It wouldn't necessarily be "fair" to some destination IPs, but it should
serve to spread things out a bit without having to generate a random 
number and mod it.
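A sketch of the alternative Rick describes: derive the extra delay from the low-order bits of the address being resolved instead of net_random() % backoff. It assumes the IPv4 address is in host byte order and that the spread window is a power of two, so a mask replaces the modulo. ip_spread_delay() is a hypothetical name; nothing in this thread implements it.

```c
#include <stdint.h>

/* Deterministic per-destination spread: keep the low window_bits bits
 * of the target IPv4 address (host byte order).  The delay therefore
 * lies in [0, 2^window_bits) ticks, and no division is needed. */
unsigned long ip_spread_delay(uint32_t target_ip, unsigned int window_bits)
{
	uint32_t mask = (window_bits >= 32) ? UINT32_MAX
					    : ((1u << window_bits) - 1);

	return target_ip & mask;
}
```

For example, 10.0.1.5 (0x0A000105) maps to 261 with a 10-bit window and 10.0.1.6 maps to 262, but 10.0.5.5 (0x0A000505) also maps to 261: addresses that differ only above the mask collide, which is the "not fair" case Rick mentions.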

rick jones


* Re: ARP table question
  2008-11-18  0:51             ` Rick Jones
@ 2008-11-18  1:23               ` Ben Greear
  2008-11-18  1:39                 ` Rick Jones
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Greear @ 2008-11-18  1:23 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, Patrick McHardy

Rick Jones wrote:

>> +static unsigned long neigh_rand_retry(struct neighbour* neigh) {
>> +    if (neigh->parms->retrans_rand_backoff) {
>> +        return net_random() % neigh->parms->retrans_rand_backoff;
>> +    }
>> +    return 0;
>> +}
>> +
>>  /* Called when a timer expires for a neighbour entry. */
> 
> I thought that mod was something we tried to avoid?  Could you instead 
> use something that isn't random but perhaps varies among all the 
> requests?  Say some of the low-order bits of the IP being resolved?

This is only called when we are going to retransmit an ARP, which shouldn't
be in any sort of hot path, so I figured MOD was fine.

The net_random is a very cheap method (last I checked), as well.

So, I think that part is OK as it is, but I'm open to
persuasion :)
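If the modulo were ever a concern, one common division-free alternative (not what the patch does) is multiply-and-shift range reduction: treat the 32-bit random value as a fraction of 2^32 and scale it by the range. Later kernels carry a helper along these lines, reciprocal_scale(); the function below is a standalone sketch with a hypothetical name.

```c
#include <stdint.h>

/* Map a 32-bit random value onto [0, range) with one multiply and one
 * shift instead of a division.  range == 0 yields 0, matching
 * neigh_rand_retry() when the backoff is disabled. */
uint32_t scale_to_range(uint32_t rnd, uint32_t range)
{
	return (uint32_t)(((uint64_t)rnd * range) >> 32);
}
```

With range 5000, a random value of 0x80000000 maps to 2500 and UINT32_MAX maps to 4999, so the result stays in [0, range). Like the modulo, it is very slightly non-uniform when range does not divide 2^32, which is harmless for spreading timers.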

Thanks,
Ben




* Re: ARP table question
  2008-11-18  1:23               ` Ben Greear
@ 2008-11-18  1:39                 ` Rick Jones
  2008-11-18  1:50                   ` Ben Greear
  0 siblings, 1 reply; 12+ messages in thread
From: Rick Jones @ 2008-11-18  1:39 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev, Patrick McHardy

Ben Greear wrote:
> Rick Jones wrote:
> 
>>> +static unsigned long neigh_rand_retry(struct neighbour* neigh) {
>>> +    if (neigh->parms->retrans_rand_backoff) {
>>> +        return net_random() % neigh->parms->retrans_rand_backoff;
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>>  /* Called when a timer expires for a neighbour entry. */
>>
>>
>> I thought that mod was something we tried to avoid?  Could you instead 
>> use something that isn't random but perhaps varies among all the 
>> requests?  Say some of the low-order bits of the IP being resolved?
> 
> 
> This is only called when we are going to retransmit an ARP, which shouldn't
> be in any sort of hot path, so I figured MOD was fine.
> 
> The net_random is a very cheap method (last I checked), as well.
> 
> So, I think that part is OK as it is, but I'm open to
> persuasion :)

Perhaps I'm confused, or simply channeling Emily Litella again, but if 
you only do this on the 1st through Nth retransmissions (ie after the 
first retransmission timer has popped) don't you still have a thundering 
herd problem on the first transmission and the first retransmission of 
ARP requests?

rick jones


* Re: ARP table question
  2008-11-18  1:39                 ` Rick Jones
@ 2008-11-18  1:50                   ` Ben Greear
  2008-11-20  8:33                     ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Greear @ 2008-11-18  1:50 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, Patrick McHardy

Rick Jones wrote:
> Ben Greear wrote:
>> Rick Jones wrote:
>>
>>>> +static unsigned long neigh_rand_retry(struct neighbour* neigh) {
>>>> +    if (neigh->parms->retrans_rand_backoff) {
>>>> +        return net_random() % neigh->parms->retrans_rand_backoff;
>>>> +    }
>>>> +    return 0;
>>>> +}
>>>> +
>>>>  /* Called when a timer expires for a neighbour entry. */
>>>
>>>
>>> I thought that mod was something we tried to avoid?  Could you 
>>> instead use something that isn't random but perhaps varies among all 
>>> the requests?  Say some of the low-order bits of the IP being resolved?
>>
>>
>> This is only called when we are going to retransmit an ARP, which shouldn't
>> be in any sort of hot path, so I figured MOD was fine.
>>
>> The net_random is a very cheap method (last I checked), as well.
>>
>> So, I think that part is OK as it is, but I'm open to
>> persuasion :)
> 
> Perhaps I'm confused, or simply channeling Emily Litella again, but if 
> you only do this on the 1st through Nth retransmissions (ie after the 
> first retransmission timer has popped) don't you still have a thundering 
> herd problem on the first transmission and the first retransmission of 
> ARP requests?

You'd certainly have it on the first transmission, but I think from there on
the randomness should kick in.  This is a pretty rare case, and I'd rather
not slow down the initial ARP.  If we *are* in the overload situation, then
the network can just purge/drop/whatever the initial flood and then the
retransmits should start doing their random thing.  On my system, it still
takes maybe 30 seconds for all the ARPs to resolve since a good deal of
the requests and/or responses are being lost.

After some more testing, I can still get it into a bad
state if I have a retrans timer of 1 sec and a randomness of 5 secs
and manage to cause all 1000 arp entries to go stale at once (by
yanking a cable, for instance).

It seems I have to bump up the base timer to 3-5 seconds (I'm
leaving the random backoff at 5 secs as well).

Thanks,
Ben



* Re: ARP table question
  2008-11-18  1:50                   ` Ben Greear
@ 2008-11-20  8:33                     ` David Miller
  2008-11-20 17:23                       ` Ben Greear
  0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2008-11-20  8:33 UTC (permalink / raw)
  To: greearb; +Cc: rick.jones2, netdev, kaber

From: Ben Greear <greearb@candelatech.com>
Date: Mon, 17 Nov 2008 17:50:50 -0800

> Rick Jones wrote:
> > Ben Greear wrote:
> >> Rick Jones wrote:
> >>
> >>>> +static unsigned long neigh_rand_retry(struct neighbour* neigh) {
> >>>> +    if (neigh->parms->retrans_rand_backoff) {
> >>>> +        return net_random() % neigh->parms->retrans_rand_backoff;
> >>>> +    }
> >>>> +    return 0;
> >>>> +}
> >>>> +
> >>>>  /* Called when a timer expires for a neighbour entry. */
> >>>
> >>>
> >>> I thought that mod was something we tried to avoid?  Could you instead
> >>> use something that isn't random but perhaps varies among all the
> >>> requests?  Say some of the low-order bits of the IP being resolved?
> >>
> >>
> >> This is only called when we are going to retransmit an ARP, which shouldn't
> >> be in any sort of hot path, so I figured MOD was fine.
> >>
> >> The net_random is a very cheap method (last I checked), as well.
> >>
> >> So, I think that part is OK as it is, but I'm open to
> >> persuasion :)
> > Perhaps I'm confused, or simply channeling Emily Litella again, but if
> > you only do this on the 1st through Nth retransmissions (ie after the
> > first retransmission timer has popped) don't you still have a thundering
> > herd problem on the first transmission and the first retransmission of
> > ARP requests?
> 
> You'd certainly have it on the first transmission, but I think from there on
> the randomness should kick in.  This is a pretty rare case, and I'd rather
> not slow down the initial ARP.  If we *are* in the overload situation, then
> the network can just purge/drop/whatever the initial flood and then the
> retransmits should start doing their random thing.  On my system, it still
> takes maybe 30 seconds for all the ARPs to resolve since a good deal of
> the requests and/or responses are being lost.
> 
> After some more testing, I can still get it into a bad
> state if I have a retrans timer of 1 sec and a randomness of 5 secs
> and manage to cause all 1000 arp entries to go stale at once (by
> yanking a cable, for instance).
> 
> It seems I have to bump up the base timer to 3-5 seconds (I'm
> leaving the random backoff at 5 secs as well).

This scheme still seems hackish to me, so I'm going to defer on this
for now.


* Re: ARP table question
  2008-11-20  8:33                     ` David Miller
@ 2008-11-20 17:23                       ` Ben Greear
  2008-11-20 17:33                         ` Benjamin LaHaise
  0 siblings, 1 reply; 12+ messages in thread
From: Ben Greear @ 2008-11-20 17:23 UTC (permalink / raw)
  To: David Miller; +Cc: rick.jones2, netdev, kaber

David Miller wrote:

>> After some more testing, I can still get it into a bad
>> state if I have a retrans timer of 1 sec and a randomness of 5 secs
>> and manage to cause all 1000 arp entries to go stale at once (by
>> yanking a cable, for instance).
>>
>> It seems I have to bump up the base timer to 3-5 seconds (I'm
>> leaving the random backoff at 5 secs as well).
> 
> This scheme still seems hackish to me, so I'm going to defer on this
> for now.

You think something like an exponential backoff capped at some user-configurable
max-value would be better?





* Re: ARP table question
  2008-11-20 17:23                       ` Ben Greear
@ 2008-11-20 17:33                         ` Benjamin LaHaise
  0 siblings, 0 replies; 12+ messages in thread
From: Benjamin LaHaise @ 2008-11-20 17:33 UTC (permalink / raw)
  To: Ben Greear; +Cc: David Miller, rick.jones2, netdev, kaber

On Thu, Nov 20, 2008 at 09:23:53AM -0800, Ben Greear wrote:
> You think something like an exponential backoff capped at some
> user-configurable max-value would be better?

I'll throw in an observation on arp behaviour on wifi / HomePNA: neither 
protocol provides reliable delivery of broadcast traffic, while point-to-point 
traffic is reliably delivered.  If arp traffic is not sufficiently 
aggressive when a connection is first used, the user can end up waiting 
some time until one of the broadcast packets finally gets through.  Doing 
an exponential backoff will make this significantly worse, unless the 
initial timeout is sufficiently small.
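A minimal sketch of the capped exponential backoff Ben asks about, written with LaHaise's caveat in mind: the first retry uses a small initial timeout and the doubling stops at a configurable cap. backoff_timeout() and its parameters are hypothetical; nothing in this thread implements them, and the units are whatever the caller uses (milliseconds in the example).

```c
/* Timeout before retransmission number `probe` (0-based): start at
 * `initial`, double once per probe, never exceed `cap`.  The loop stops
 * doubling as soon as the cap is reached, so it cannot overflow unless
 * cap itself is close to ULONG_MAX. */
unsigned long backoff_timeout(unsigned long initial, unsigned long cap,
			      unsigned int probe)
{
	unsigned long t = initial;

	for (unsigned int i = 0; i < probe && t < cap; i++)
		t *= 2;
	return t < cap ? t : cap;
}
```

With initial = 100 and cap = 5000, successive probes wait 100, 200, 400, 800, 1600, 3200 and then 5000 ms. The small initial value keeps the first retries aggressive on lossy broadcast media; starting from a larger value is what would make the wifi/HomePNA case noticeably slower.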

		-ben
-- 
"Time is of no importance, Mr. President, only life is important."
Don't Email: <zyntrop@kvack.org>.

