* [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
@ 2005-04-07 19:59 Jay Vosburgh
2005-04-07 20:31 ` David S. Miller
2005-04-08 3:50 ` Jeff Garzik
0 siblings, 2 replies; 17+ messages in thread
From: Jay Vosburgh @ 2005-04-07 19:59 UTC (permalink / raw)
To: netdev; +Cc: davem
This patch backs out some of the calls to dev_set_mac_address
and replaces them with calls to a similar function that does not call
notifier_call_chain.
The reason for this is that the rtnetlink event handler and its
descendents make GFP_KERNEL memory allocation requests, and the bonding
driver makes some of its MAC address change calls from timer context
with a lock held (notably the ALB mode).
Rearranging the bonding driver to not call this way is a fairly
involved change; this patch merely reverts one part of bonding to the
way it used to be.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
diff -ur linux-2.6.12-rc2-virgin/drivers/net/bonding/bond_alb.c linux-2.6.12-rc2-setmac/drivers/net/bonding/bond_alb.c
--- linux-2.6.12-rc2-virgin/drivers/net/bonding/bond_alb.c 2005-04-05 22:04:27.000000000 -0700
+++ linux-2.6.12-rc2-setmac/drivers/net/bonding/bond_alb.c 2005-04-07 12:25:02.000000000 -0700
@@ -138,6 +138,23 @@
return hash;
}
+/*
+ * Stopgap that doesn't notifier_call_chain like real dev_set_mac_address,
+ * as rtnetlink event notifier makes sleepable memory allocations, and we
+ * call this with a lock held.
+ */
+static int bond_dev_set_mac_address(struct net_device *dev, struct sockaddr *sa)
+{
+ if (!dev->set_mac_address)
+ return -EOPNOTSUPP;
+ if (sa->sa_family != dev->type)
+ return -EINVAL;
+ if (!netif_device_present(dev))
+ return -ENODEV;
+ return dev->set_mac_address(dev, sa);
+}
+
+
/*********************** tlb specific functions ***************************/
static inline void _lock_tx_hashtbl(struct bonding *bond)
@@ -955,7 +972,7 @@
/* each slave will receive packets destined to a different mac */
memcpy(s_addr.sa_data, addr, dev->addr_len);
s_addr.sa_family = dev->type;
- if (dev_set_mac_address(dev, &s_addr)) {
+ if (bond_dev_set_mac_address(dev, &s_addr)) {
printk(KERN_ERR DRV_NAME
": Error: dev_set_mac_address of dev %s failed! ALB "
"mode requires that the base driver support setting "
@@ -1210,7 +1227,7 @@
/* save net_device's current hw address */
memcpy(tmp_addr, slave->dev->dev_addr, ETH_ALEN);
- res = dev_set_mac_address(slave->dev, addr);
+ res = bond_dev_set_mac_address(slave->dev, addr);
/* restore net_device's hw address */
memcpy(slave->dev->dev_addr, tmp_addr, ETH_ALEN);
@@ -1230,7 +1247,7 @@
stop_at = slave;
bond_for_each_slave_from_to(bond, slave, i, bond->first_slave, stop_at) {
memcpy(tmp_addr, slave->dev->dev_addr, ETH_ALEN);
- dev_set_mac_address(slave->dev, &sa);
+ bond_dev_set_mac_address(slave->dev, &sa);
memcpy(slave->dev->dev_addr, tmp_addr, ETH_ALEN);
}
Only in linux-2.6.12-rc2-setmac/drivers/net/bonding/: bond_alb.o
Only in linux-2.6.12-rc2-setmac/drivers/net/bonding/: bond_main.o
Only in linux-2.6.12-rc2-setmac/drivers/net/bonding/: bonding.ko
Only in linux-2.6.12-rc2-setmac/drivers/net/bonding/: bonding.mod.c
Only in linux-2.6.12-rc2-setmac/drivers/net/bonding/: bonding.mod.o
Only in linux-2.6.12-rc2-setmac/drivers/net/bonding/: bonding.o
Only in linux-2.6.12-rc2-setmac/drivers/net/bonding/: built-in.o
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-07 19:59 [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address Jay Vosburgh
@ 2005-04-07 20:31 ` David S. Miller
2005-04-07 20:55 ` Jay Vosburgh
2005-04-08 3:50 ` Jeff Garzik
1 sibling, 1 reply; 17+ messages in thread
From: David S. Miller @ 2005-04-07 20:31 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: netdev, jgarzik
On Thu, 07 Apr 2005 12:59:35 -0700
Jay Vosburgh <fubar@us.ibm.com> wrote:
> This patch backs out some of the calls to dev_set_mac_address
> and replaces them with calls to a similar function that does not call
> notifier_call_chain.
>
> The reason for this is that the rtnetlink event handler and its
> descendents make GFP_KERNEL memory allocation requests, and the bonding
> driver makes some of its MAC address change calls from timer context
> with a lock held (notably the ALB mode).
>
> Rearranging the bonding driver to not call this way is a fairly
> involved change; this patch merely reverts one part of bonding to the
> way it used to be.
You can't remove that notifier call, you will break ipv4 ARP,
ipv6 neighbour discovery, and bridging if you do that.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-07 20:31 ` David S. Miller
@ 2005-04-07 20:55 ` Jay Vosburgh
2005-04-07 20:57 ` David S. Miller
0 siblings, 1 reply; 17+ messages in thread
From: Jay Vosburgh @ 2005-04-07 20:55 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, jgarzik
David S. Miller <davem@davemloft.net> wrote:
>On Thu, 07 Apr 2005 12:59:35 -0700
>Jay Vosburgh <fubar@us.ibm.com> wrote:
[...]
>> Rearranging the bonding driver to not call this way is a fairly
>> involved change; this patch merely reverts one part of bonding to the
>> way it used to be.
>
>You can't remove that notifier call, you will break ipv4 ARP,
>ipv6 neighbour discovery, and bridging if you do that.
Only as they relate to bonding slave device MAC changes in the
ALB and TLB modes (the "no notifier" version is only called by MAC
changes that occur in those circumstances, all of which are in
bond_alb.c).
Until a month ago, none of the bonding calls that change MAC
addresses issued notifiers, this reverts to that behavior only for ALB
and TLB modes. Is that worse than going forward with a potential sleep
from a timer context with a lock held?
Another alternative would be to change the rtnetlink handler to
not use a sleepable memory allocation, e.g.,
--- linux-2.6.12-rc2-virgin/net/core/rtnetlink.c 2005-03-03 15:53:48.000000000 -0800
+++ linux-2.6.12-rc2-setmac/net/core/rtnetlink.c 2005-04-07 09:30:03.000000000 -0700
@@ -441,7 +441,7 @@
sizeof(struct rtnl_link_ifmap) +
sizeof(struct rtnl_link_stats) + 128);
- skb = alloc_skb(size, GFP_KERNEL);
+ skb = alloc_skb(size, GFP_ATOMIC);
if (!skb)
return;
@@ -450,7 +450,7 @@
return;
}
NETLINK_CB(skb).dst_groups = RTMGRP_LINK;
- netlink_broadcast(rtnl, skb, 0, RTMGRP_LINK, GFP_KERNEL);
+ netlink_broadcast(rtnl, skb, 0, RTMGRP_LINK, GFP_ATOMIC);
}
static int rtnetlink_done(struct netlink_callback *cb)
My presumption is that the above would be unacceptable, if for
no other reason than other notifiers could be attached that also make
sleepable memory allocations.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-07 20:55 ` Jay Vosburgh
@ 2005-04-07 20:57 ` David S. Miller
2005-04-07 21:35 ` Jay Vosburgh
0 siblings, 1 reply; 17+ messages in thread
From: David S. Miller @ 2005-04-07 20:57 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: netdev, jgarzik
On Thu, 07 Apr 2005 13:55:33 -0700
Jay Vosburgh <fubar@us.ibm.com> wrote:
> Another alternative would be to change the rtnetlink handler to
> not use a sleepable memory allocation, e.g.,
...
> My presumption is that the above would be unacceptable, if for
> no other reason than other notifiers could be attached that also make
> sleepable memory allocations.
You could change it instead to just use gfp_any().
Would that work? The problematic case occurs from softirq
not hardirq right?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-07 20:57 ` David S. Miller
@ 2005-04-07 21:35 ` Jay Vosburgh
2005-04-08 12:36 ` Herbert Xu
0 siblings, 1 reply; 17+ messages in thread
From: Jay Vosburgh @ 2005-04-07 21:35 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, jgarzik
David S. Miller <davem@davemloft.net> wrote:
>> My presumption is that the above would be unacceptable, if for
>> no other reason than other notifiers could be attached that also make
>> sleepable memory allocations.
>
>You could change it instead to just use gfp_any().
>
>Would that work? The problematic case occurs from softirq
>not hardirq right?
Yes, that works, and yes, the troublesome calls come from
softirq (timers). In that case, the rtnetlink patch would be:
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
--- linux-2.6.12-rc2-virgin/net/core/rtnetlink.c 2005-03-03 15:53:48.000000000 -0800
+++ linux-2.6.12-rc2-setmac/net/core/rtnetlink.c 2005-04-07 14:05:29.000000000 -0700
@@ -441,7 +441,7 @@
sizeof(struct rtnl_link_ifmap) +
sizeof(struct rtnl_link_stats) + 128);
- skb = alloc_skb(size, GFP_KERNEL);
+ skb = alloc_skb(size, gfp_any());
if (!skb)
return;
@@ -450,7 +450,7 @@
return;
}
NETLINK_CB(skb).dst_groups = RTMGRP_LINK;
- netlink_broadcast(rtnl, skb, 0, RTMGRP_LINK, GFP_KERNEL);
+ netlink_broadcast(rtnl, skb, 0, RTMGRP_LINK, gfp_any());
}
static int rtnetlink_done(struct netlink_callback *cb)
This doesn't do anything about other event handlers that might
also have potential sleeps.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-07 19:59 [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address Jay Vosburgh
2005-04-07 20:31 ` David S. Miller
@ 2005-04-08 3:50 ` Jeff Garzik
2005-04-08 4:45 ` David S. Miller
1 sibling, 1 reply; 17+ messages in thread
From: Jeff Garzik @ 2005-04-08 3:50 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: netdev, davem
Jay Vosburgh wrote:
> This patch backs out some of the calls to dev_set_mac_address
> and replaces them with calls to a similar function that does not call
> notifier_call_chain.
>
> The reason for this is that the rtnetlink event handler and its
> descendents make GFP_KERNEL memory allocation requests, and the bonding
> driver makes some of its MAC address change calls from timer context
> with a lock held (notably the ALB mode).
>
> Rearranging the bonding driver to not call this way is a fairly
> involved change; this patch merely reverts one part of bonding to the
> way it used to be.
Do I need to forward this for -rc3?
Jeff
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-08 3:50 ` Jeff Garzik
@ 2005-04-08 4:45 ` David S. Miller
0 siblings, 0 replies; 17+ messages in thread
From: David S. Miller @ 2005-04-08 4:45 UTC (permalink / raw)
To: Jeff Garzik; +Cc: fubar, netdev
On Thu, 07 Apr 2005 23:50:01 -0400
Jeff Garzik <jgarzik@pobox.com> wrote:
> Jay Vosburgh wrote:
> > This patch backs out some of the calls to dev_set_mac_address
> > and replaces them with calls to a similar function that does not call
> > notifier_call_chain.
> >
> > The reason for this is that the rtnetlink event handler and its
> > descendents make GFP_KERNEL memory allocation requests, and the bonding
> > driver makes some of its MAC address change calls from timer context
> > with a lock held (notably the ALB mode).
> >
> > Rearranging the bonding driver to not call this way is a fairly
> > involved change; this patch merely reverts one part of bonding to the
> > way it used to be.
>
> Do I need to forward this for -rc3?
No, read the rest of the thread, we're discussing an alternative
version of the fix.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-07 21:35 ` Jay Vosburgh
@ 2005-04-08 12:36 ` Herbert Xu
2005-04-08 20:58 ` Jay Vosburgh
0 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2005-04-08 12:36 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: davem, netdev, jgarzik
Jay Vosburgh <fubar@us.ibm.com> wrote:
>
> Yes, that works, and yes, the troublesome calls come from
> softirq (timers). In that case, the rtnetlink patch would be:
This should work.
However, this is a bit of an eye sore as it is the only place
where rtmsg_ifinfo is called from softirq context. In fact,
this is probably the only place where CHANGEADDR (and perhaps
even call_netdevice_notifiers) is called from softirq context.
It just so happens that all the CHANGEADDR handlers were coded to be
on the safe side and this seems to be the only place where it sleeps.
Nevertheless, I'd like to ask the possibility for bond to not do
this from softirq context.
I looked at bond_alb.c briefly and it appears that the MAC addresses
aren't actually set from the timers. It is however set in user
context with spin locks held to guard against the timer.
I looked at one of those places: alb_handle_addr_collision_on_attach.
As far as I can see, there is no need for the address change to occur
under the write_lock.
It would work just as well if that function returned the MAC address
to be set and performed the actual setting after dropping the lock.
So if you have time, could you please comb through the bonding driver
and let us know if it is possible to move the setting of the MAC
address outside atomic areas.
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-08 12:36 ` Herbert Xu
@ 2005-04-08 20:58 ` Jay Vosburgh
2005-04-08 22:16 ` Herbert Xu
0 siblings, 1 reply; 17+ messages in thread
From: Jay Vosburgh @ 2005-04-08 20:58 UTC (permalink / raw)
To: Herbert Xu; +Cc: davem, netdev, jgarzik
Herbert Xu <herbert@gondor.apana.org.au> wrote:
[...]
>However, this is a bit of an eye sore as it is the only place
>where rtmsg_ifinfo is called from softirq context. In fact,
>this is probably the only place where CHANGEADDR (and perhaps
>even call_netdevice_notifiers) is called from softirq context.
I looked as well, and didn't find anywhere else doing the netdev
related notifiers from softirq.
[...]
>Nevertheless, I'd like to ask the possibility for bond to not do
>this from softirq context.
>
>I looked at bond_alb.c briefly and it appears that the MAC addresses
>aren't actually set from the timers. It is however set in user
>context with spin locks held to guard against the timer.
There are several paths from a timer context, e.g.,
timer -> bond_mii_monitor -> bond_change_active_slave ->
bond_alb_handle_active_change -> alb_set_slave_mac_addr
There are also calls that go through bond_select_active_slave
(then on to bond_change_active_slave). In all of these cases, the path
is taken when a link failure is detected.
Running the timers function from a work queue instead (to get a
sleepable context) is technically possible. I'm not sure how much
overhead that would add (given that the link monitor timer typically
runs on intervals in the 10 - 50 ms range). Getting the locks out is a
little trickier; I've only looked at it briefly and don't see a simple
way to do it. Well, at least that isn't, shall we say, less than
elegant: e.g., a work queue out of line call that just changes the MAC
address. I'm not sure what extra delay that kind of scheme would add to
the failover time for the link.
>I looked at one of those places: alb_handle_addr_collision_on_attach.
>As far as I can see, there is no need for the address change to occur
>under the write_lock.
That case, yes, could be rearranged to do the assignment outside
the lock without too much pain.
>So if you have time, could you please comb through the bonding driver
>and let us know if it is possible to move the setting of the MAC
>address outside atomic areas.
As I said above, the timer case doesn't have an obviously good
way to change it to do the MAC sets in a sleepable context with no
locks. I also wonder how expensive it is to run work queue events every
10ms as compared to running timer events every 10ms?
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-08 20:58 ` Jay Vosburgh
@ 2005-04-08 22:16 ` Herbert Xu
2005-04-08 23:55 ` Jay Vosburgh
0 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2005-04-08 22:16 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: davem, netdev, jgarzik
On Fri, Apr 08, 2005 at 01:58:43PM -0700, Jay Vosburgh wrote:
>
> There are several paths from a timer context, e.g.,
>
> timer -> bond_mii_monitor -> bond_change_active_slave ->
> bond_alb_handle_active_change -> alb_set_slave_mac_addr
Is that what it's for. I sense an opportunity for you to
greatly clean up this section of code :)
The networking drivers are trying to switch over to using
netif_carrier_ok as the interface for notification and
monitoring of link status. In fact, even the bonding driver
appears to do this by default.
When you're using netif_carrier_ok, you're relying on the
driver itself to monitor the physical status. It will usually
do so in its own timer.
In other words, it's pointless to examine netif_carrier_ok
more often than the underlying driver timer. The standard
interface to monitor its status is through a netdev
notification (NETDEV_CHANGE). That is called in process
context by net/core/link_watch.c.
So I'd suggest that we get rid of directly MII monitoring in
bonding, and always use netif_carrier_ok. Further more,
instead of monitoring that in a timer you just move the code
into bond_netdev_event.
> As I said above, the timer case doesn't have an obviously good
> way to change it to do the MAC sets in a sleepable context with no
> locks. I also wonder how expensive it is to run work queue events every
> 10ms as compared to running timer events every 10ms?
If you really want to keep monitoring the MII status directly, then
you can still schedule the changing of the MAC address via a work
queue.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-08 22:16 ` Herbert Xu
@ 2005-04-08 23:55 ` Jay Vosburgh
2005-04-09 0:21 ` Herbert Xu
0 siblings, 1 reply; 17+ messages in thread
From: Jay Vosburgh @ 2005-04-08 23:55 UTC (permalink / raw)
To: Herbert Xu; +Cc: davem, netdev, jgarzik
Herbert Xu <herbert@gondor.apana.org.au> wrote:
[...]
>The networking drivers are trying to switch over to using
>netif_carrier_ok as the interface for notification and
>monitoring of link status. In fact, even the bonding driver
>appears to do this by default.
Yep. It's been that way for a couple of years, as I recall.
[...]
>In other words, it's pointless to examine netif_carrier_ok
>more often than the underlying driver timer. The standard
>interface to monitor its status is through a netdev
>notification (NETDEV_CHANGE). That is called in process
>context by net/core/link_watch.c.
Looking at link_watch.c, it appears that it issues events on a
one second timer, so there could be a lag of up to one second between
the time a driver does "netif_carrier_off" and the time bonding would
receive the event from link_watch.
So, yes, it's pointless to check netif_carrier more often that
the driver updates it, but it's not pointless to check netif_carrier on
a timer if the granularity of events from link_watch is too slow (which
it appears to be for a "hot standby" type of application).
>So I'd suggest that we get rid of directly MII monitoring in
>bonding, and always use netif_carrier_ok. Further more,
>instead of monitoring that in a timer you just move the code
>into bond_netdev_event.
The direct MII monitor option has stayed because some drivers
update netif_carrier at fairly long intervals; the extreme example is
3c59x, which checks every 60 seconds. For a hot standby use, even one
second is a pretty long time.
FWIW, It's been a while since a user has reported trouble with
bonding link detection with a driver because it didn't support
netif_carrier; lately, when it comes up it's the notification interval.
We've also not been discussing the bonding "arp monitor," which
does link integrity checking by means of detecting traffic flow across
the link(s) (generating traffic, via ARP requests, when the link is
idle). It will trigger the same kinds of failovers that the mii monitor
does; the saving grace right now is that it doesn't currently run with
the alb/tlb modes that do the MAC address swapping from the failover.
My other question here is how much time is there left to get
changes for this into 2.6.12? Rearchitecting the link monitor /
failover gizmos is reasonably nontrivial; I don't know if it's feasible
to make those kind of substantial changes this late in the cycle. So,
either 2.6.12 goes out with the potential sleep from timer / with lock,
the bonding MAC notifier change is partially backed out, the "gfp_any()"
change goes into rtnetlink.c, or some other solution that eludes me
occurs.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-08 23:55 ` Jay Vosburgh
@ 2005-04-09 0:21 ` Herbert Xu
2005-04-09 0:31 ` Herbert Xu
0 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2005-04-09 0:21 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: davem, netdev, jgarzik
On Fri, Apr 08, 2005 at 04:55:53PM -0700, Jay Vosburgh wrote:
>
> Looking at link_watch.c, it appears that it issues events on a
> one second timer, so there could be a lag of up to one second between
> the time a driver does "netif_carrier_off" and the time bonding would
> receive the event from link_watch.
Events are delivered immediately except when there are multiple failures
within a second. Even then it will delay the subsequent failures only
if they occured after the queue run has already started.
Remember that whenever the queue run does trigger, it will update all
devices that's on the linked list.
We can easily change this to do something different if it really
bothers you.
> The direct MII monitor option has stayed because some drivers
> update netif_carrier at fairly long intervals; the extreme example is
> 3c59x, which checks every 60 seconds. For a hot standby use, even one
> second is a pretty long time.
Wouldn't it be better to modify those drivers so that they updated
their status more frequently? That way everybody would benefit and
not just bonding.
> We've also not been discussing the bonding "arp monitor," which
> does link integrity checking by means of detecting traffic flow across
> the link(s) (generating traffic, via ARP requests, when the link is
> idle). It will trigger the same kinds of failovers that the mii monitor
> does; the saving grace right now is that it doesn't currently run with
> the alb/tlb modes that do the MAC address swapping from the failover.
The ARP monitor would presumably start changing MAC addresses only after
a longish timeout. In that event, I don't see any problems with delaying
it a little bit more by putting it into a work queue.
> My other question here is how much time is there left to get
> changes for this into 2.6.12? Rearchitecting the link monitor /
> failover gizmos is reasonably nontrivial; I don't know if it's feasible
> to make those kind of substantial changes this late in the cycle. So,
> either 2.6.12 goes out with the potential sleep from timer / with lock,
> the bonding MAC notifier change is partially backed out, the "gfp_any()"
> change goes into rtnetlink.c, or some other solution that eludes me
> occurs.
It's up to Dave of course.
Personally I'd rather we aimed for a proper solution that will be in
2.6.13 since the current symptom is only a warning which doesn't
really hurt anyone.
After all, we have lived with this problem for years so a few more
weeks can't be fatal :)
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-09 0:21 ` Herbert Xu
@ 2005-04-09 0:31 ` Herbert Xu
[not found] ` <20050424185149.278ffb93.davem@davemloft.net>
0 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2005-04-09 0:31 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: davem, netdev, jgarzik
On Sat, Apr 09, 2005 at 10:21:37AM +1000, herbert wrote:
>
> Personally I'd rather we aimed for a proper solution that will be in
> 2.6.13 since the current symptom is only a warning which doesn't
> really hurt anyone.
Actually there is another reason why we need to move the MAC setting
to process context rather than simply patching up the rtnetlink code.
The driver implementation for set_mac_address may well want to sleep
since it needs to communicate with the hardware.
In fact I just looked at some USB net drivers and the very first one
(zd1201) wants to sleep in there badly :)
So if you want a quick and dirty fix, why not make bonding call
dev_set_mac_address from a work queue?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
[not found] ` <20050424185149.278ffb93.davem@davemloft.net>
@ 2005-04-25 12:41 ` Herbert Xu
0 siblings, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2005-04-25 12:41 UTC (permalink / raw)
To: David S. Miller; +Cc: fubar, netdev, jgarzik
[-- Attachment #1: Type: text/plain, Size: 1386 bytes --]
On Sun, Apr 24, 2005 at 06:51:49PM -0700, David S. Miller wrote:
>
> > So if you want a quick and dirty fix, why not make bonding call
> > dev_set_mac_address from a work queue?
>
> I'd say we still have at least two weeks until 2.6.12 final goes out.
> So if you want to work on the proper long-term fix now, by all means
> please do so.
Well I had a go at it but I realised that this isn't something that
can be fixed properly in two weeks. The work queue hack can
probably be done in that time frame but it'll be darn ugly.
I found another reason why we need to defer the dev_set_mac_address to
process context: We need to hold the RTNL while it's taking place.
Otherwise someone else could come along and change the slave's MAC
address at the same time. Because device drivers tend to rely on the
RTNL to guarantee non-reentrancy, this is likely to break.
I also managed to find a little bug along the way though so let's quash
it :) bond_alb_set_mac_address did not acquire bond->lock before
operating on the slaves. As it can be done at any time by the user,
this could interfere with the timers.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: p --]
[-- Type: text/plain, Size: 533 bytes --]
drivers/net/bonding/bond_alb.c: 5ce606d9dc03f9b145c3024abecfca20ec65fd9d
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1666,6 +1666,7 @@ int bond_alb_set_mac_address(struct net_
}
}
+ write_lock_bh(&bond->lock);
if (swap_slave) {
alb_swap_mac_addr(bond, swap_slave, bond->curr_active_slave);
} else {
@@ -1678,6 +1679,7 @@ int bond_alb_set_mac_address(struct net_
rlb_req_update_slave_clients(bond, bond->curr_active_slave);
}
}
+ write_unlock_bh(&bond->lock);
return 0;
}
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
[not found] ` <200504260411.j3Q4BYke004030@death.nxdomain.ibm.com>
@ 2005-04-26 11:18 ` Herbert Xu
2005-04-27 2:09 ` Jay Vosburgh
0 siblings, 1 reply; 17+ messages in thread
From: Herbert Xu @ 2005-04-26 11:18 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: David S. Miller, netdev, jgarzik
On Mon, Apr 25, 2005 at 09:11:33PM -0700, Jay Vosburgh wrote:
>
> There's a path in the "old" link monitor that does an
> SIOCGMIIREG ioctl to read the PHY registers directly; the handler
> functions typically do a copy_from_user sort of thing, and there's some
> segment switching magic done in bonding because it's calling from the
> timer. It will give warnings if DEBUG_SPINLOCK_SLEEP is on and the
> bonding "use_carrier=0" is supplied (to turn off netif_carrier detection
> and use the MII inspection). There's an IOCTL macro in bonding.h that
> does the nasty parts.
Indeed. But how can this stuff work at all? Surely if use_carrier is
disabled while miimon is enabled we should get a dead-lock as soon as
this call chain is run:
dev_ioctl => bond_enslave => bond_check_dev_link => slave_dev->ioctl
So either nobody ever uses this setup or I'm missing something :)
> >The timers would be converted to delayed work.
>
> It is better, performance-wise, to run the "main" part of the
> link monitoring in a timer, and then call out to a work queue only for
> those operations that need a context? I.e., how expensive are work
> queues compared to timers?
For the amount of work that these timers are doing, the overhead is
pretty small. It is also gentler on the system when the CPU load
goes up.
> I've been thinking in terms of rearranging the monitor system to
> have, essentially, one or more "inspectors" that check link state, and a
> single "executor" that makes link up/down decisions based on input from
> the "inspectors." The "executor" (or perhaps one of its minions, if the
> executor itself is a timer for performance reasons) would be the work
> queue piece, as it would be doing the actual swapping around.
This sounds good. One thing to watch out for though is that you need
to drop the bond lock between the inspector and the executor. This
is so that the executor can acquire the rtnl. The only other alternative
would be to always hold the rtnl.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-26 11:18 ` Herbert Xu
@ 2005-04-27 2:09 ` Jay Vosburgh
2005-04-27 2:15 ` Herbert Xu
0 siblings, 1 reply; 17+ messages in thread
From: Jay Vosburgh @ 2005-04-27 2:09 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, netdev, jgarzik
Herbert Xu <herbert@gondor.apana.org.au> wrote:
[...]
>Indeed. But how can this stuff work at all? Surely if use_carrier is
>disabled while miimon is enabled we should get a dead-lock as soon as
>this call chain is run:
>
>dev_ioctl => bond_enslave => bond_check_dev_link => slave_dev->ioctl
Why would it deadlock? dev_ioctl holds RTNL, bonding grabs
various bond locks, and the slave device ioctl handler may or may not
get a lock of its own.
>> It is better, performance-wise, to run the "main" part of the
>> link monitoring in a timer, and then call out to a work queue only for
>> those operations that need a context? I.e., how expensive are work
>> queues compared to timers?
>
>For the amount of work that these timers are doing, the overhead is
>pretty small. It is also gentler on the system when the CPU load
>goes up.
Just so I'm clear: by "the overhead" do you mean the overhead of
running everything in a work queue, or the overhead of calling out from
a timer to a work queue for "special occasions"?
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address
2005-04-27 2:09 ` Jay Vosburgh
@ 2005-04-27 2:15 ` Herbert Xu
0 siblings, 0 replies; 17+ messages in thread
From: Herbert Xu @ 2005-04-27 2:15 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: David S. Miller, netdev, jgarzik
On Tue, Apr 26, 2005 at 07:09:01PM -0700, Jay Vosburgh wrote:
>
> >dev_ioctl => bond_enslave => bond_check_dev_link => slave_dev->ioctl
>
> Why would it deadlock? dev_ioctl holds RTNL, bonding grabs
> various bond locks, and the slave device ioctl handler may or may not
> get a lock of its own.
I get it now. I had thought that it was calling sys_ioctl but it's
going direct to slave_dev->ioctl which bypasses the rtnl.
So the bond_enslave path is safe but the timer path isn't as you
noted previously.
> >For the amount of work that these timers are doing, the overhead is
> >pretty small. It is also gentler on the system when the CPU load
> >goes up.
>
> Just so I'm clear: by "the overhead" do you mean the overhead of
> running everything in a work queue, or the overhead of calling out from
> a timer to a work queue for "special occasions"?
I'm referring to running everything out of a work queue.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2005-04-27 2:15 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-04-07 19:59 [PATCH 2.6.12-rc2] bonding: partially back out dev_set_mac_address Jay Vosburgh
2005-04-07 20:31 ` David S. Miller
2005-04-07 20:55 ` Jay Vosburgh
2005-04-07 20:57 ` David S. Miller
2005-04-07 21:35 ` Jay Vosburgh
2005-04-08 12:36 ` Herbert Xu
2005-04-08 20:58 ` Jay Vosburgh
2005-04-08 22:16 ` Herbert Xu
2005-04-08 23:55 ` Jay Vosburgh
2005-04-09 0:21 ` Herbert Xu
2005-04-09 0:31 ` Herbert Xu
[not found] ` <20050424185149.278ffb93.davem@davemloft.net>
2005-04-25 12:41 ` Herbert Xu
2005-04-08 3:50 ` Jeff Garzik
2005-04-08 4:45 ` David S. Miller
[not found] <20050426011907.GA13846@gondor.apana.org.au>
[not found] ` <200504260411.j3Q4BYke004030@death.nxdomain.ibm.com>
2005-04-26 11:18 ` Herbert Xu
2005-04-27 2:09 ` Jay Vosburgh
2005-04-27 2:15 ` Herbert Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).