netdev.vger.kernel.org archive mirror
* Bonding driver unreliable under high CPU load
@ 2002-09-14 16:38 Pascal Brisset
  2002-09-15  3:24 ` Andrew Morton
  0 siblings, 1 reply; 2+ messages in thread
From: Pascal Brisset @ 2002-09-14 16:38 UTC (permalink / raw)
  To: bonding-devel, netdev

I would like to confirm the problem reported by Tony Cureington at
http://sourceforge.net/mailarchive/forum.php?thread_id=1015008&forum_id=2094

Problem: In MII-monitoring mode, when the CPU load is high,
the ethernet bonding driver silently fails to detect dead links.

How to reproduce:
On i686 with 2.4.19, run "modprobe bonding mode=1 miimon=100", ifenslave
two interfaces, then ping while you plug/unplug cables. Bonding will
switch to the available interface, as expected. Now load the CPU
with a busy loop such as "while(1) { }", and failover stops working
entirely.

Explanation:
The bonding driver monitors the state of its slave interfaces by
calling their dev->do_ioctl() (with SIOCGMIIREG or ETHTOOL_GLINK) from
a timer callback function. Whenever the timer fires while a user task
is current, the get_user() in the slave's ioctl handling code fails
with -EFAULT, because the ifreq struct is allocated on the stack of
the timer function, above 0xC0000000, which is outside the user
address range. In that case, the bonding driver considers the link
up by default.

This problem went unnoticed because for most applications, when the
active link dies, the host becomes idle and the monitoring function
gets a chance to run while a kernel thread is current (in which case
it works). The active-backup switchover is just slower than it should
be. Serious trouble only happens when the active link dies during a
long, CPU-intensive job.

Is anyone working on a fix? Perhaps by running the monitoring code
in a dedicated task?

-- Pascal
