netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] net: Always fire at least one linkwatch event
@ 2011-09-21 19:51 Neil Horman
  2011-09-27 18:59 ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Neil Horman @ 2011-09-21 19:51 UTC (permalink / raw)
  To: netdev; +Cc: Neil Horman, David S. Miller, jfeeney

It was recently noted that the tg3 driver had a problem in that after boot a
kernel and if-upping the tg3 interface the sysfs operstate attribute continued
to read 'unkown'.  This was happening because tg3 assumes the default carrier
state (which is to say the __LINK_STATE_NOCARRIER bit is clear) is correct.
That said, when the device is if-upped, and the open path, calls
netif_carrier_on, the test_and_set_bit call in that function returns false
(since the bit was previously zero from its initial state).  This means that
netif_carrier_on call never generates a linkwatch event, and as a result
dev->operstate never gets recomputed.  This could be fixed by unconditionally
calling netif_carrier_off in the probe routine, to simply force a state change
on that bit, but that seems like a sub-par solution, given that many drivers may
have this error.  Instead it seems like it might be better to burn an extra bit
in the state field to indicate that the CARRIER bit is still in the initial
state and our first call to netif_carrier_[off|on] should always fire a
linkwatch event.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: jfeeney@redhat.com
---
 include/linux/netdevice.h |    1 +
 net/sched/sch_generic.c   |    8 ++++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0a7f619..85d6f68 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -306,6 +306,7 @@ enum netdev_state_t {
 	__LINK_STATE_NOCARRIER,
 	__LINK_STATE_LINKWATCH_PENDING,
 	__LINK_STATE_DORMANT,
+	__LINK_STATE_CARRIER_INIT,
 };
 
 
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 69fca27..6f8bfd1 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -298,7 +298,9 @@ static void dev_watchdog_down(struct net_device *dev)
  */
 void netif_carrier_on(struct net_device *dev)
 {
-	if (test_and_clear_bit(__LINK_STATE_NOCARRIER, &dev->state)) {
+	int force = !test_and_set_bit(__LINK_STATE_CARRIER_INIT, &dev->state);
+
+	if (test_and_clear_bit(__LINK_STATE_NOCARRIER, &dev->state) || force) {
 		if (dev->reg_state == NETREG_UNINITIALIZED)
 			return;
 		linkwatch_fire_event(dev);
@@ -316,7 +318,9 @@ EXPORT_SYMBOL(netif_carrier_on);
  */
 void netif_carrier_off(struct net_device *dev)
 {
-	if (!test_and_set_bit(__LINK_STATE_NOCARRIER, &dev->state)) {
+	int force = !test_and_set_bit(__LINK_STATE_CARRIER_INIT, &dev->state);
+
+	if (!test_and_set_bit(__LINK_STATE_NOCARRIER, &dev->state) || force) {
 		if (dev->reg_state == NETREG_UNINITIALIZED)
 			return;
 		linkwatch_fire_event(dev);
-- 
1.7.6.2

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] net: Always fire at least one linkwatch event
  2011-09-21 19:51 [RFC PATCH] net: Always fire at least one linkwatch event Neil Horman
@ 2011-09-27 18:59 ` David Miller
  2011-09-27 19:34   ` Neil Horman
  0 siblings, 1 reply; 4+ messages in thread
From: David Miller @ 2011-09-27 18:59 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, jfeeney

From: Neil Horman <nhorman@tuxdriver.com>
Date: Wed, 21 Sep 2011 15:51:29 -0400

> It was recently noted that the tg3 driver had a problem in that after boot a
> kernel and if-upping the tg3 interface the sysfs operstate attribute continued
> to read 'unkown'.  This was happening because tg3 assumes the default carrier
> state (which is to say the __LINK_STATE_NOCARRIER bit is clear) is correct.
> That said, when the device is if-upped, and the open path, calls
> netif_carrier_on, the test_and_set_bit call in that function returns false
> (since the bit was previously zero from its initial state).  This means that
> netif_carrier_on call never generates a linkwatch event, and as a result
> dev->operstate never gets recomputed.  This could be fixed by unconditionally
> calling netif_carrier_off in the probe routine, to simply force a state change
> on that bit, but that seems like a sub-par solution, given that many drivers may
> have this error.  Instead it seems like it might be better to burn an extra bit
> in the state field to indicate that the CARRIER bit is still in the initial
> state and our first call to netif_carrier_[off|on] should always fire a
> linkwatch event.

I'm finding this analysis hard to follow.

tg3_open() does netif_carrier_off(), and this will set the
__LINK_STATE_NOCARRIER bit.

And since the registration state of the device is not
NETREG_UNINITIALIZED it will fire off a linkwatch event too.

So whenever the netif_carrier_on() happens later, the bit will be set
when the test_and_clear_bit() happens there.  So the
test_and_clear_bit() will not return false.

The registration state is not NETREG_UNINITIALIZED, so (again) that
will not block the linkwatch event from firing.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] net: Always fire at least one linkwatch event
  2011-09-27 18:59 ` David Miller
@ 2011-09-27 19:34   ` Neil Horman
  2011-09-27 19:49     ` David Miller
  0 siblings, 1 reply; 4+ messages in thread
From: Neil Horman @ 2011-09-27 19:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, jfeeney

On Tue, Sep 27, 2011 at 02:59:43PM -0400, David Miller wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Wed, 21 Sep 2011 15:51:29 -0400
> 
> > It was recently noted that the tg3 driver had a problem in that after boot a
> > kernel and if-upping the tg3 interface the sysfs operstate attribute continued
> > to read 'unkown'.  This was happening because tg3 assumes the default carrier
> > state (which is to say the __LINK_STATE_NOCARRIER bit is clear) is correct.
> > That said, when the device is if-upped, and the open path, calls
> > netif_carrier_on, the test_and_set_bit call in that function returns false
> > (since the bit was previously zero from its initial state).  This means that
> > netif_carrier_on call never generates a linkwatch event, and as a result
> > dev->operstate never gets recomputed.  This could be fixed by unconditionally
> > calling netif_carrier_off in the probe routine, to simply force a state change
> > on that bit, but that seems like a sub-par solution, given that many drivers may
> > have this error.  Instead it seems like it might be better to burn an extra bit
> > in the state field to indicate that the CARRIER bit is still in the initial
> > state and our first call to netif_carrier_[off|on] should always fire a
> > linkwatch event.
> 
> I'm finding this analysis hard to follow.
> 
> tg3_open() does netif_carrier_off(), and this will set the
> __LINK_STATE_NOCARRIER bit.
> 
Sorry, I should have explained further.  In the interests of full disclosure,
this was initially reported on a RHEL 2.6.32 kernel, where netif_carrier_off was
not called from tg3_open.  As a result, when tg3_carrier_on was called later in
the open path, the test_and_clear would return 0, since NOCARRIER was
initialized to 0, and we wouldn't fire a linkwatch event, which in turn meant
that operstate was never updated until a full ifup/down/up cycle was completed.

So tg3 actually works properly upstream, but the larger issue remains - Drivers
individually must set and clear the NOCARRIER flag in order to effectively prime
the linkwatch state machine, which seems to me haphazard and prone to recurring
bugs.  What I'm proposing here is a driver independent method of ensuring that
the first call to netif_carrier_off/on gets called regardless of initial state.
This prevents drivers from having to individually remember to call
netif_carrier_off at the start of an open routine, which visually makes more
sense to me, especially when they almost immediately call netif_carrier_on right
afterwards.

Hope that clarifies things somewhat.
Neil

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [RFC PATCH] net: Always fire at least one linkwatch event
  2011-09-27 19:34   ` Neil Horman
@ 2011-09-27 19:49     ` David Miller
  0 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2011-09-27 19:49 UTC (permalink / raw)
  To: nhorman; +Cc: netdev, jfeeney

From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 27 Sep 2011 15:34:13 -0400

> So tg3 actually works properly upstream, but the larger issue remains - Drivers
> individually must set and clear the NOCARRIER flag in order to effectively prime
> the linkwatch state machine, which seems to me haphazard and prone to recurring
> bugs.

Driver controls when the PHY is reset, auto-negotiation is started, etc. so it is
the only entity which is in the position to set the correct state.

So we kind of depend upon drivers managing the state correctly and accurately.

If I follow what tg3 is currently doing, just to show an example, it first
sets carrier off then resets then entire chip atomically.  This reset will
restart auto-neg, etc. and trigger a subsequent link-up event which will
netif_carrier_on() and get the proper transition.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-09-27 19:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-21 19:51 [RFC PATCH] net: Always fire at least one linkwatch event Neil Horman
2011-09-27 18:59 ` David Miller
2011-09-27 19:34   ` Neil Horman
2011-09-27 19:49     ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).