* ethtool physical identify vs netlink locking?
From: Stephen Hemminger @ 2011-03-29 20:52 UTC
To: David Miller; +Cc: netdev

Right now if an administrator uses the ethtool function to identify a network
interface, the netlink lock can be held indefinitely. In other words, doing
"ethtool -p eth1" will stop all other netlink activity. This is bad: imagine
the case of an operator doing that to find a NIC in a rack, and because of
the netlink lockout all routing daemon activity stops.

There are several possible solutions but most involve fixing all the device
drivers (24). Options:

 1. Have device driver drop and reacquire rtnl() while blinking
 2. Have ethtool core drop rtnl before calling device driver
 3. Add per-device ethtool rtnl lock

#1 is the least disruption
#2 means additional locking may be required for each device driver
#3 seems like excessive overhead.

Comments?

* Re: ethtool physical identify vs netlink locking?
From: Ben Hutchings @ 2011-03-30 0:07 UTC
To: Stephen Hemminger; +Cc: David Miller, netdev

On Tue, 2011-03-29 at 13:52 -0700, Stephen Hemminger wrote:
> Right now if an administrator uses the ethtool function to identify a network
> interface, the netlink lock can be held indefinitely. In other words, doing
> "ethtool -p eth1" will stop all other netlink activity. This is bad: imagine
> the case of an operator doing that to find a NIC in a rack, and because of
> the netlink lockout all routing daemon activity stops.

Also, glibc can enumerate devices during name lookup now (if I remember
correctly), so new connections to servers that do reverse name lookups
tend to stall immediately.

> There are several possible solutions but most involve fixing all the device
> drivers (24). Options:
>
>  1. Have device driver drop and reacquire rtnl() while blinking
>  2. Have ethtool core drop rtnl before calling device driver
>  3. Add per-device ethtool rtnl lock
>
> #1 is the least disruption

but nasty!

> #2 means additional locking may be required for each device driver
> #3 seems like excessive overhead.

In the sfc driver, physical ID used to be delegated to the PHY
operations. Then I realised that it was pointless to use a PHY's blink
mode where it was available and a periodic timer on the host where it
wasn't, when the latter would work for all of them. So I would propose:

4. Define an ethtool operation 'set_id_state' with an argument that sets
identification on/off/inactive/active (the last optional, for any driver
that really wants to do this differently). When this is defined, the
ethtool core runs the loop and acquires the lock each time it calls this
operation.

This requires changes to every driver, though not all at once. As an
additional benefit, it should result in consistent behaviour for the
count = 0 case.

The core ethtool function would look something like:

static int ethtool_phys_id(struct net_device *dev, void __user *useraddr)
{
	struct ethtool_value id;
	int rc;

	if (!dev->ethtool_ops->phys_id && !dev->ethtool_ops->set_id_state)
		return -EOPNOTSUPP;

	if (copy_from_user(&id, useraddr, sizeof(id)))
		return -EFAULT;

	if (!dev->ethtool_ops->set_id_state)
		/* Do it the old way */
		return dev->ethtool_ops->phys_id(dev, id.data);

	rc = dev->ethtool_ops->set_id_state(dev, ETHTOOL_ID_ACTIVE);
	if (rc && rc != -EINVAL)
		return rc;

	dev_hold(dev);
	rtnl_unlock();

	if (rc == 0) {
		/* Driver will handle this itself */
		schedule_timeout_interruptible(
			id.data ? id.data : MAX_SCHEDULE_TIMEOUT);
	} else {
		/* Driver expects to be called periodically */
		do {
			rtnl_lock();
			rc = dev->ethtool_ops->set_id_state(dev, ETHTOOL_ID_ON);
			rtnl_unlock();
			if (rc)
				break;
			schedule_timeout_interruptible(HZ / 2);

			rtnl_lock();
			rc = dev->ethtool_ops->set_id_state(dev, ETHTOOL_ID_OFF);
			rtnl_unlock();
			if (rc)
				break;
			schedule_timeout_interruptible(HZ / 2);
		} while (!signal_pending(current) &&
			 (id.data == 0 || --id.data != 0));
	}

	rtnl_lock();
	dev_put(dev);

	(void)dev->ethtool_ops->set_id_state(dev, ETHTOOL_ID_INACTIVE);
	return rc;
}

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

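For illustration, the driver side of option 4 might look like the sketch
below. It is a userspace model, not kernel code: struct fake_nic,
fake_set_id_state() and its fields are hypothetical names, and the
ETHTOOL_ID_* enum only mirrors the four states named in the proposal (the
numeric values are arbitrary). Returning -EINVAL for ETHTOOL_ID_ACTIVE
follows the convention in the code above, where that value tells the core
to run the blink loop itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* The four states named in the proposal; values are arbitrary here. */
enum ethtool_id_state {
	ETHTOOL_ID_INACTIVE,
	ETHTOOL_ID_ACTIVE,
	ETHTOOL_ID_ON,
	ETHTOOL_ID_OFF,
};

/* Hypothetical device with a single LED used for identification. */
struct fake_nic {
	bool led_on;    /* current LED output */
	bool saved_led; /* LED state before identification started */
};

/*
 * Hypothetical driver callback. It never sleeps; it only records or
 * changes LED state, so the caller can hold the lock for a very short
 * time around each call.
 */
static int fake_set_id_state(struct fake_nic *nic, enum ethtool_id_state state)
{
	assert(nic != NULL);

	switch (state) {
	case ETHTOOL_ID_ACTIVE:
		/* Remember the normal LED state so it can be restored. */
		nic->saved_led = nic->led_on;
		/* No hardware blink mode: ask the core to call ON/OFF. */
		return -EINVAL;
	case ETHTOOL_ID_ON:
		nic->led_on = true;
		return 0;
	case ETHTOOL_ID_OFF:
		nic->led_on = false;
		return 0;
	case ETHTOOL_ID_INACTIVE:
		/* Identification finished: restore the saved LED state. */
		nic->led_on = nic->saved_led;
		return 0;
	}
	return -EINVAL;
}
```

The point of the split is that the callback does no waiting of its own, so
all sleeping happens in the core with the lock dropped.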
* Re: ethtool physical identify vs netlink locking?
From: David Miller @ 2011-03-30 0:13 UTC
To: bhutchings; +Cc: shemminger, netdev

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 30 Mar 2011 01:07:56 +0100

> In the sfc driver, physical ID used to be delegated to the PHY
> operations. Then I realised that it was pointless to use a PHY's blink
> mode where it was available and a periodic timer on the host where it
> wasn't, when the latter would work for all of them. So I would propose:
>
> 4. Define an ethtool operation 'set_id_state' with an argument that sets
> identification on/off/inactive/active (the last optional, for any driver
> that really wants to do this differently). When this is defined, the
> ethtool core runs the loop and acquires the lock each time it calls this
> operation.
>
> This requires changes to every driver, though not all at once. As an
> additional benefit, it should result in consistent behaviour for the
> count = 0 case.

This seems like a good way to solve the problem.

* Re: ethtool physical identify vs netlink locking?
From: Stephen Hemminger @ 2011-03-30 0:23 UTC
To: Ben Hutchings; +Cc: David Miller, netdev

On Wed, 30 Mar 2011 01:07:56 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> 4. Define an ethtool operation 'set_id_state' with an argument that sets

I like this way, but the code should return -EINTR if interrupted?

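One possible reading of this suggestion, sketched as a userspace model
rather than kernel code. Everything here is hypothetical: struct
blink_model and the model_*() helpers stand in for the rtnl lock, the
driver callback and signal_pending(), and the "signal" is just a counter
threshold. Unlike the proposed core function, a cycle count of 0 means
"do nothing" in this model, not "blink forever".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model state; none of this is a kernel structure. */
struct blink_model {
	int lock_depth;      /* 1 while the modelled lock is held */
	int led_toggles;     /* number of ON/OFF callbacks made so far */
	int interrupt_after; /* a signal "arrives" after this many toggles;
			      * negative means never */
};

static void model_lock(struct blink_model *m)
{
	assert(m->lock_depth == 0);
	m->lock_depth = 1;
}

static void model_unlock(struct blink_model *m)
{
	assert(m->lock_depth == 1);
	m->lock_depth = 0;
}

static bool model_signal_pending(const struct blink_model *m)
{
	return m->interrupt_after >= 0 &&
	       m->led_toggles >= m->interrupt_after;
}

/* One LED callback; the lock is held only for the call itself. */
static void model_set_led(struct blink_model *m)
{
	model_lock(m);
	m->led_toggles++;
	model_unlock(m);
}

/*
 * Blink for the given number of on/off cycles. The signal check sits
 * where the real code would sleep, with the lock dropped. Returns 0 if
 * all cycles completed and -EINTR if the wait was cut short.
 */
static int model_blink(struct blink_model *m, unsigned int cycles)
{
	while (cycles-- > 0) {
		model_set_led(m);	/* LED on */
		if (model_signal_pending(m))
			return -EINTR;
		model_set_led(m);	/* LED off */
		if (model_signal_pending(m))
			return -EINTR;
	}
	return 0;
}
```

In the real function the final ETHTOOL_ID_INACTIVE call would still have
to run on the interrupted path, so the LED is not left on.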
* Re: ethtool physical identify vs netlink locking?
From: Michał Mirosław @ 2011-03-30 1:35 UTC
To: Ben Hutchings; +Cc: Stephen Hemminger, David Miller, netdev

2011/3/30 Ben Hutchings <bhutchings@solarflare.com>:
> On Tue, 2011-03-29 at 13:52 -0700, Stephen Hemminger wrote:
>> Right now if an administrator uses the ethtool function to identify a network
>> interface, the netlink lock can be held indefinitely. In other words, doing
>> "ethtool -p eth1" will stop all other netlink activity. This is bad: imagine
>> the case of an operator doing that to find a NIC in a rack, and because of
>> the netlink lockout all routing daemon activity stops.
[...]
>> There are several possible solutions but most involve fixing all the device
>> drivers (24). Options:
>>
>>  1. Have device driver drop and reacquire rtnl() while blinking
>>  2. Have ethtool core drop rtnl before calling device driver
>>  3. Add per-device ethtool rtnl lock
> 4. Define an ethtool operation 'set_id_state' with an argument that sets
> identification on/off/inactive/active (the last optional, for any driver
> that really wants to do this differently). When this is defined, the
> ethtool core runs the loop and acquires the lock each time it calls this
> operation.

5. Have a driver register a LED class device instead of implementing
an ethtool op.

Hmm. This would require changes to the userspace ethtool command. I wonder
if anything else uses this call?

Best Regards,
Michał Mirosław

* Re: ethtool physical identify vs netlink locking?
From: Stephen Hemminger @ 2011-03-30 6:29 UTC
To: Michał Mirosław; +Cc: Ben Hutchings, David Miller, netdev

On Wed, 30 Mar 2011 03:35:40 +0200
Michał Mirosław <mirqus@gmail.com> wrote:

> 2011/3/30 Ben Hutchings <bhutchings@solarflare.com>:
> > On Tue, 2011-03-29 at 13:52 -0700, Stephen Hemminger wrote:
> >> Right now if an administrator uses the ethtool function to identify a network
> >> interface, the netlink lock can be held indefinitely. In other words, doing
> >> "ethtool -p eth1" will stop all other netlink activity. This is bad: imagine
> >> the case of an operator doing that to find a NIC in a rack, and because of
> >> the netlink lockout all routing daemon activity stops.
> [...]
> >> There are several possible solutions but most involve fixing all the device
> >> drivers (24). Options:
> >>
> >>  1. Have device driver drop and reacquire rtnl() while blinking
> >>  2. Have ethtool core drop rtnl before calling device driver
> >>  3. Add per-device ethtool rtnl lock
> > 4. Define an ethtool operation 'set_id_state' with an argument that sets
> > identification on/off/inactive/active (the last optional, for any driver
> > that really wants to do this differently). When this is defined, the
> > ethtool core runs the loop and acquires the lock each time it calls this
> > operation.
>
> 5. Have a driver register a LED class device instead of implementing
> an ethtool op.
>
> Hmm. This would require changes to userspace ethtool command. I wonder
> if anything else uses this call?

Full LED support is overkill for this, I think, especially if it means
creating 24 unique LED drivers. Ben's idea seems the best so far.