netdev.vger.kernel.org archive mirror
* Question regarding failure utilizing bonding mode 5 (balance-tlb)
@ 2013-08-01  5:12 Yuval Mintz
From: Yuval Mintz @ 2013-08-01  5:12 UTC
  To: netdev@vger.kernel.org; +Cc: Ariel Elior

We've had reports that load/unload tests using the bonding driver in
balance-tlb mode over bnx2x interfaces result in loss of traffic.

While investigating, we found that the bonding driver uses the ndo
(ndo_set_mac_address()) during ifenslave to override the slaves' HW MAC
address. It then directly changes the slave netdevices' dev_addr so
that each network interface possesses a distinct MAC address (as seen
in ifconfig), while the FW/HW of both interfaces is still configured
with the MAC passed to the ndo.

When the active slave is unloaded, the ifconfig MAC (dev_addr) is
swapped between the slaves directly, i.e., without calling the ndo. Once
the interface of the previously active slave is reloaded, it configures
its HW MAC according to that dev_addr value (i.e., the bonding driver
takes no additional measures to force its own MAC on the interface when
reloading), leaving it with a configured MAC which differs from the one
held by the bonding driver.

If this is done an additional time (on the newly active slave), both
slave devices will be configured with a MAC which differs from the one
held by the bond interface (i.e., the bond interface holds the MAC of
the original active slave, while both interfaces are configured with
the MAC of the original inactive slave). This obviously prevents any
traffic from being successfully sent/received.
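
To make the sequence above easier to follow, here is a minimal
user-space sketch of it in Python. It is a toy model, not kernel code:
the Slave and Bond classes, their method names, and the MAC_A/MAC_B
placeholders are all hypothetical and only mirror the behaviour
described in this mail (the ndo sets both dev_addr and the HW MAC, the
bonding driver writes dev_addr directly, and a reload programs the HW
from dev_addr).

```python
from dataclasses import dataclass


@dataclass
class Slave:
    """Hypothetical model of a slave netdevice."""
    name: str
    dev_addr: str  # MAC shown by ifconfig
    hw_mac: str    # MAC actually programmed into FW/HW

    def ndo_set_mac(self, mac):
        # Models the ndo: updates dev_addr and the FW/HW together.
        self.dev_addr = mac
        self.hw_mac = mac

    def reload(self):
        # Models a driver that programs the HW from dev_addr on load.
        self.hw_mac = self.dev_addr


class Bond:
    """Hypothetical model of the bond as described in this mail."""

    def __init__(self, slaves):
        self.slaves = slaves
        self.active = slaves[0]
        self.mac = self.active.dev_addr
        for s in slaves:
            own = s.dev_addr
            s.ndo_set_mac(self.mac)  # HW of every slave gets the bond MAC
            s.dev_addr = own         # direct write, HW is not touched

    def fail_over(self):
        # Swap dev_addr between the slaves without calling the ndo.
        old = self.active
        new = next(s for s in self.slaves if s is not old)
        old.dev_addr, new.dev_addr = new.dev_addr, old.dev_addr
        self.active = new
        return old


def simulate_failovers(rounds=2):
    a = Slave("eth0", "MAC_A", "MAC_A")
    b = Slave("eth1", "MAC_B", "MAC_B")
    bond = Bond([a, b])
    for _ in range(rounds):
        unloaded = bond.fail_over()  # active slave is unloaded
        unloaded.reload()            # and later loaded again
    return bond, a, b


if __name__ == "__main__":
    bond, a, b = simulate_failovers()
    print("bond:", bond.mac)
    for s in (a, b):
        print(s.name, "dev_addr:", s.dev_addr, "hw:", s.hw_mac)
```

In this model, after two unload/reload rounds the bond still holds
MAC_A while the HW of both slaves ends up programmed with MAC_B, which
is the state described above.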

bnx2x uses dev_addr directly for MAC configuration, which I think is the
default behaviour for most network drivers - ixgbe has a shadow value
which it uses instead, but I think that's the exception and not the
rule.

As I see it, either:

   1. The bonding driver is flawed in balance-tlb mode and should be
      fixed.

   2. bnx2x's behaviour is flawed - it should have a persistent shadow
      MAC containing the last MAC set (either the factory value or
      what was configured by the ndo), and use it instead of dev_addr
      when configuring the HW MAC. This would probably indicate that
      other drivers are flawed as well.

   3. The test itself is flawed, since the user should not unload slave
      interfaces.
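
For illustration, option 2 could look roughly like the Python toy model
below. This is only a sketch under assumptions: ShadowMacSlave, its
shadow_mac field, and the run_failovers() helper are hypothetical names
and not bnx2x or ixgbe code. The only idea taken from the list above is
that a reload programs the HW from the last MAC set through the ndo (or
the factory value) rather than from dev_addr.

```python
class ShadowMacSlave:
    """Hypothetical slave model that keeps a persistent shadow MAC."""

    def __init__(self, factory_mac):
        self.dev_addr = factory_mac    # MAC shown by ifconfig
        self.shadow_mac = factory_mac  # last MAC set: factory or via ndo
        self.hw_mac = factory_mac      # MAC programmed into FW/HW

    def ndo_set_mac(self, mac):
        # The ndo is the only path that updates the shadow value.
        self.dev_addr = mac
        self.shadow_mac = mac
        self.hw_mac = mac

    def reload(self):
        # Program the HW from the shadow value, ignoring dev_addr.
        self.hw_mac = self.shadow_mac


def run_failovers(rounds=2):
    a = ShadowMacSlave("MAC_A")
    b = ShadowMacSlave("MAC_B")
    bond_mac = a.dev_addr
    # Enslave: ndo call, then a direct dev_addr write by the bond.
    for s in (a, b):
        own = s.dev_addr
        s.ndo_set_mac(bond_mac)
        s.dev_addr = own
    active, backup = a, b
    for _ in range(rounds):
        # Direct dev_addr swap without the ndo, then unload/reload.
        active.dev_addr, backup.dev_addr = backup.dev_addr, active.dev_addr
        active.reload()
        active, backup = backup, active
    return bond_mac, a, b


if __name__ == "__main__":
    bond_mac, a, b = run_failovers()
    print("bond:", bond_mac, "hw:", a.hw_mac, b.hw_mac)
```

In this model the HW MAC of both slaves stays equal to the bond MAC no
matter how often dev_addr is swapped, since direct dev_addr writes never
reach the shadow value.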

What's the correct approach for fixing the issue?
Ideas are welcome.

Thanks,
Yuval


Thread overview: 9+ messages
2013-08-01  5:12 Question regarding failure utilizing bonding mode 5 (balance-tlb) Yuval Mintz
2013-08-02  3:09 ` Jay Vosburgh
2013-08-02 20:16   ` Yuval Mintz
2013-08-02 20:53     ` Jay Vosburgh
2013-08-03  7:47       ` Yuval Mintz
2013-09-30 11:30         ` Yuval Mintz
2013-09-30 21:24           ` Veaceslav Falico
2013-10-01 12:56             ` Yuval Mintz
2013-10-01 12:58               ` Veaceslav Falico
