* gratuitous arps lost during IB switch failure
From: Sumeet Lahorani @ 2010-09-21 23:42 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi All,
We are using dual-ported HCAs, with each port connected to a different IB
switch, so that we can tolerate the failure of either switch. We are
trying to cut down the time it takes for traffic (TCP & RDS) to resume
when an IB switch fails and the hosts fail over from one port to the
other.
We have the bonding driver configured in active-backup mode and set up to
send out 100 gratuitous ARPs at intervals of 100ms whenever there is a
failover. In most cases, traffic resumes within a few seconds after a
failover because these gratuitous ARPs update all the nodes with the new
IP:GID mapping.
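For reference, the gratuitous-ARP burst can be requested from the bonding
driver itself via its num_grat_arp option; a sketch only (whether the ARPs
go out at the miimon interval, i.e. ~100ms apart with miimon=100, is an
assumption to verify against your kernel's bonding documentation):

```shell
# Sketch only: active-backup bonding options for an IPoIB bond.
# num_grat_arp=100 asks the driver for 100 gratuitous ARPs on failover;
# with miimon=100 they are assumed to be spaced ~100ms apart.
BONDING_OPTS="mode=active-backup miimon=100 num_grat_arp=100 downdelay=5000 updelay=5000"
```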
The problem we are seeing is that sometimes one or more of the nodes on
the fabric do not receive even one of these gratuitous ARPs, and
re-establishing communication with those nodes takes much longer (over
40 seconds), as it depends on various ARP cache timeouts. Does anyone
know why all of these gratuitous ARPs might be lost?
Besides the gratuitous ARP settings, are there any other tunables to
look at to minimize the time it takes for IPoIB traffic to resume?
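One set of knobs worth checking is the kernel's neighbour (ARP) cache
timing, which bounds how long a stale entry for the failed port can
linger. A sketch of the relevant sysctl keys; the values below are
illustrative assumptions, not recommendations, and per-interface keys
(e.g. neigh.ib0, neigh.bond0) exist alongside the defaults:

```
# Sketch only (illustrative values): /etc/sysctl.conf fragment.
# gc_stale_time (seconds) controls when an entry is considered stale;
# base_reachable_time_ms controls how long a confirmed entry stays valid.
net.ipv4.neigh.default.gc_stale_time = 10
net.ipv4.neigh.default.base_reachable_time_ms = 15000
```

Lowering these shortens the window at the cost of more ARP traffic.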
- Sumeet
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: gratuitous arps lost during IB switch failure
From: Sumeet @ 2010-09-30 23:23 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
It turns out that this problem was caused by our having multiple IPs
configured on the bonded InfiniBand interface. It appears that gratuitous
ARPs are sent out for only one of those IPs.
For example, if we configure the bonded interfaces as below and trigger a
failover, we only see gratuitous ARPs go out for the bond0:1 IP. Can the
bonding driver be fixed to send out gratuitous ARPs for both of these IPs?
# cat ifcfg-bond0
DEVICE=bond0
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.50.118
NETMASK=255.255.254.0
NETWORK=192.168.50.0
BROADCAST=192.168.51.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000"
MTU=65520
# cat ifcfg-bond0:1
DEVICE=bond0:1
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.169.50.118
NETMASK=255.255.254.0
NETWORK=192.169.50.0
BROADCAST=192.169.51.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000"
MTU=65520
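Until the driver handles this, a user-space workaround sketch: send a
gratuitous ARP for every address on the bond from a failover hook. This
assumes iputils arping is installed (-U sends an unsolicited/gratuitous
ARP update); the script itself is hypothetical, not something the bonding
driver provides:

```shell
#!/bin/sh
# Sketch of a workaround: after a failover, send a gratuitous ARP for
# every address on the bond, not just the primary one.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0
# to actually send (requires root and iputils arping).
DRY_RUN=${DRY_RUN:-1}

send_garp() {
    # -U: unsolicited (gratuitous) ARP update, -c 1: send a single packet
    if [ "$DRY_RUN" = "1" ]; then
        echo "arping -U -c 1 -I $1 $2"
    else
        arping -U -c 1 -I "$1" "$2"
    fi
}

# Addresses taken from the ifcfg files above.
send_garp bond0 192.168.50.118
send_garp bond0 192.169.50.118
```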
- Sumeet
* Re: gratuitous arps lost during IB switch failure
From: Or Gerlitz @ 2010-10-02 20:17 UTC (permalink / raw)
To: Sumeet; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Sumeet <Sumeet.Lahorani-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> It turns out that this problem was caused by our having multiple IPs
> configured on the bonded InfiniBand interface. It appears that gratuitous
> ARPs are sent out for only one of those IPs. [...] Can the bonding
> driver be fixed to send out gratuitous ARPs for both of these IPs?
* Re: gratuitous arps lost during IB switch failure
From: Or Gerlitz @ 2010-10-02 20:19 UTC (permalink / raw)
To: Sumeet; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Sumeet <Sumeet.Lahorani-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> It turns out that this problem was caused by our having multiple IPs
> configured on the bonded InfiniBand interface. It appears that gratuitous
> ARPs are sent out for only one of those IPs. [...] Can the bonding
> driver be fixed to send out gratuitous ARPs for both of these IPs?
Is there anything that makes you think this issue is specific to
IPoIB/bonding? Did you check with Ethernet? Also note that the bonding
driver is maintained on netdev, not linux-rdma.
Or.