public inbox for linux-rdma@vger.kernel.org
* gratuitous arps lost during IB switch failure
@ 2010-09-21 23:42 Sumeet Lahorani
  2010-09-30 23:23 ` Sumeet
  0 siblings, 1 reply; 4+ messages in thread
From: Sumeet Lahorani @ 2010-09-21 23:42 UTC (permalink / raw)
  To: linux-rdma@vger.kernel.org


Hi All,

We are using dual-ported HCAs with each port connected to a different
IB switch, so that we can tolerate the failure of any one of those
switches. We are trying to cut down the time it takes for traffic (TCP
and RDS) to resume when an IB switch fails and the hosts fail over from
one port to the other.

We have the bonding driver configured in active-backup mode and set up
to send out 100 gratuitous ARPs at intervals of 100 ms whenever there is
a failover. In most cases, traffic resumes within a few seconds after a
failover because these gratuitous ARPs update all the nodes with the new
IP:GID mapping.
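
As a rough illustration of the setup described above, the following
Python sketch reads a few bonding attributes from sysfs and estimates
how long the burst of gratuitous ARPs lasts. The bond name bond0 and
both function names are assumptions made for this example. The message
does not say how the 100 ms spacing is configured, so miimon is read
only as one possibly related setting.

```python
from pathlib import Path


def read_bond_settings(bond="bond0", sysfs_root="/sys/class/net"):
    """Return selected bonding attributes as raw strings (None if absent).

    The bond name "bond0" is an assumption; substitute the real interface.
    """
    base = Path(sysfs_root) / bond / "bonding"
    settings = {}
    for name in ("mode", "miimon", "num_grat_arp", "active_slave"):
        attr = base / name
        # Attributes missing on this kernel or bond are reported as None.
        settings[name] = attr.read_text().strip() if attr.is_file() else None
    return settings


def announce_window_seconds(count, interval_ms):
    """Approximate duration of `count` announcements sent every `interval_ms`."""
    return count * interval_ms / 1000.0


if __name__ == "__main__":
    print(read_bond_settings())
    # 100 gratuitous ARPs at 100 ms spacing, as described in the message.
    print(announce_window_seconds(100, 100))
```

With the values from the message, the burst spans roughly 10 seconds, so
a node that misses every announcement was unreachable for about that
long after the failover.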

The problem we are seeing is that sometimes one or more of the nodes on
the fabric do not receive even one of these gratuitous ARPs, and
re-establishing communication with these nodes takes much longer (over
40 seconds) because it depends on various ARP cache timeouts. Does
anyone know why all of these gratuitous ARPs might be lost?
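
For reference, the ARP cache timeouts mentioned above are exposed on
Linux as per-interface neighbour settings under
/proc/sys/net/ipv4/neigh/. The sketch below only reads and prints them;
the interface name bond0 and the function name are assumptions made for
this example, and it does not explain why the gratuitous ARPs are lost.

```python
from pathlib import Path

# Neighbour (ARP) cache settings that influence how quickly a stale
# entry is noticed and re-resolved.
NEIGH_KEYS = (
    "base_reachable_time_ms",
    "gc_stale_time",
    "delay_first_probe_time",
    "retrans_time_ms",
    "ucast_solicit",
    "mcast_solicit",
)


def read_neigh_tunables(iface="bond0", proc_root="/proc/sys/net/ipv4/neigh"):
    """Return the neighbour settings for `iface` as ints (None if absent).

    The interface name "bond0" is an assumption; use the IPoIB bond in question.
    """
    base = Path(proc_root) / iface
    values = {}
    for key in NEIGH_KEYS:
        path = base / key
        values[key] = int(path.read_text().strip()) if path.is_file() else None
    return values


if __name__ == "__main__":
    for key, value in read_neigh_tunables().items():
        print(f"{key} = {value}")
```

Comparing these values across the nodes that recover quickly and the
ones that take over 40 seconds may help narrow down which timeout
dominates the recovery time.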

Besides the gratuitous ARP settings, are there any other tunables to
look at to minimize the time it takes for IPoIB traffic to resume?

- Sumeet

