* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-12 13:23 Patch: Idea for RFC2863 conform OperStatus Stefan Rompf
@ 2002-10-12 13:13 ` jamal
2002-10-13 12:48 ` Stefan Rompf
2002-10-12 14:09 ` jamal
2002-10-14 10:38 ` bert hubert
2 siblings, 1 reply; 24+ messages in thread
From: jamal @ 2002-10-12 13:13 UTC (permalink / raw)
To: Stefan Rompf; +Cc: netdev
I forgot about this. I hate to rain on your parade Stefan, but
if you made one global worklist that will complete the discussion.
cheers,
jamal
^ permalink raw reply [flat|nested] 24+ messages in thread
* Patch: Idea for RFC2863 conform OperStatus
@ 2002-10-12 13:23 Stefan Rompf
2002-10-12 13:13 ` jamal
` (2 more replies)
0 siblings, 3 replies; 24+ messages in thread
From: Stefan Rompf @ 2002-10-12 13:23 UTC (permalink / raw)
To: netdev; +Cc: hadi
[-- Attachment #1: Type: text/plain, Size: 578 bytes --]
Hi,
coming back to Jamal's idea of an RFC2863-conformant operative state, I've made
up a short patch showing what an implementation could look like. It also takes
Tim's idea of one struct work_struct per device for link state
notification (slightly modified). I know it's incomplete; it doesn't
even contain the needed fix in unregister_netdev that I criticised in Tim's
patch, but I want to put it up for discussion early. It may also break the
hotplugging stuff.
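For readers without the patch at hand, here is a minimal userspace sketch of the idea, not the kernel code itself. The enum values follow the ifOperStatus numbering in RFC 2863; the names `oper_state` and `oper_state_is_running` are made up for this illustration, and the mask mirrors the first version of the patch (UP, TESTING and UNKNOWN count as "running").

```c
#include <assert.h>
#include <stdbool.h>

/* ifOperStatus values as numbered in RFC 2863 (names are illustrative). */
enum oper_state {
	OPER_UP = 1,
	OPER_DOWN,
	OPER_TESTING,
	OPER_UNKNOWN,
	OPER_DORMANT,
	OPER_NOTPRESENT,
	OPER_LOWERDOWN
};

/* Collapse the seven states into the single RUNNING bit.
 * Assumption: same mask as the first patch version in this thread. */
static bool oper_state_is_running(enum oper_state s)
{
	const unsigned int running_mask =
		1u << OPER_UP | 1u << OPER_TESTING | 1u << OPER_UNKNOWN;

	assert(s >= OPER_UP && s <= OPER_LOWERDOWN);
	return ((1u << s) & running_mask) != 0;
}
```

Calling `oper_state_is_running(OPER_DORMANT)` yields false, so a dormant device would lose IFF_RUNNING under this mapping.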
If we decide to go this way, we'll also need an extension to rtnetlink
and a new ioctl to transport the information to userspace.
Stefan
[-- Attachment #2: patch-rfc2863-2.5.41 --]
[-- Type: text/plain, Size: 9362 bytes --]
diff -rNuX dontdiff linux-2.5.41/include/linux/netdevice.h linux-2.5.41-stefan/include/linux/netdevice.h
--- linux-2.5.41/include/linux/netdevice.h Tue Oct 8 22:18:50 2002
+++ linux-2.5.41-stefan/include/linux/netdevice.h Sat Oct 12 14:33:20 2002
@@ -39,6 +39,10 @@
#include <net/profile.h>
#endif
+#ifdef CONFIG_LINKWATCH
+#include <linux/workqueue.h>
+#endif
+
struct divert_blk;
struct vlan_group;
@@ -204,13 +208,25 @@
{
__LINK_STATE_XOFF=0,
__LINK_STATE_START,
- __LINK_STATE_PRESENT,
+ __LINK_STATE_PRESENT_OBSOLETE,
__LINK_STATE_SCHED,
- __LINK_STATE_NOCARRIER,
+ __LINK_STATE_NOCARRIER_OBSOLETE,
__LINK_STATE_RX_SCHED
};
+/* Device operative state as per RFC2863 */
+enum netdev_operstate_t {
+ NETDEV_OPER_UP = 1,
+ NETDEV_OPER_DOWN, /* Obsoletes LINK_STATE_NOCARRIER */
+ NETDEV_OPER_TESTING,
+ NETDEV_OPER_UNKNOWN,
+ NETDEV_OPER_DORMANT,
+ NETDEV_OPER_NOTPRESENT, /* Obsoletes !LINK_STATE_PRESENT */
+ NETDEV_OPER_LOWERDOWN
+};
+
+
/*
* This structure holds at boot time configured netdevice settings. They
* are then used in the device probing.
@@ -308,6 +324,15 @@
* which this device is member of.
*/
+ /* Operative state, semaphore and work_struct for
+ * userspace notification
+ */
+#ifdef CONFIG_LINKWATCH
+ struct work_struct linkwatch_work;
+#endif
+ rwlock_t operstate_lock;
+ unsigned short operstate;
+
/* Interface address info. */
unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */
unsigned char dev_addr[MAX_ADDR_LEN]; /* hw address */
@@ -631,34 +656,77 @@
* who is responsible for serialization of these calls.
*/
+#ifdef CONFIG_LINKWATCH
+extern void netdev_fire_linkwatch_event(struct net_device *dev);
+#endif
+
+static inline unsigned short netif_set_operstate(struct net_device *dev, unsigned short newstate)
+{
+ unsigned long flags;
+ unsigned short oldstate;
+
+ write_lock_irqsave(&dev->operstate_lock, flags);
+ oldstate = dev->operstate;
+ dev->operstate = newstate;
+ write_unlock_irqrestore(&dev->operstate_lock, flags);
+
+#ifdef CONFIG_LINKWATCH
+ if (oldstate != newstate) netdev_fire_linkwatch_event(dev);
+#endif
+
+ return oldstate;
+}
+
+static inline unsigned short netif_get_operstate(struct net_device *dev)
+{
+ unsigned long flags;
+ unsigned short state;
+
+ read_lock_irqsave(&dev->operstate_lock, flags);
+ state = dev->operstate;
+ read_unlock_irqrestore(&dev->operstate_lock, flags);
+
+ return state;
+}
+
static inline int netif_carrier_ok(struct net_device *dev)
{
- return !test_bit(__LINK_STATE_NOCARRIER, &dev->state);
+ return netif_get_operstate(dev) == NETDEV_OPER_UP;
+}
+
+static inline int netif_operstate_to_iff_running(struct net_device *dev)
+{
+ unsigned short state = netif_get_operstate(dev);
+
+ return((1 << state) &
+ (1 << NETDEV_OPER_UP | 1 << NETDEV_OPER_TESTING |
+ 1 << NETDEV_OPER_UNKNOWN));
}
extern void __netdev_watchdog_up(struct net_device *dev);
+
static inline void netif_carrier_on(struct net_device *dev)
{
- clear_bit(__LINK_STATE_NOCARRIER, &dev->state);
+ netif_set_operstate(dev, NETDEV_OPER_UP);
if (netif_running(dev))
__netdev_watchdog_up(dev);
}
static inline void netif_carrier_off(struct net_device *dev)
{
- set_bit(__LINK_STATE_NOCARRIER, &dev->state);
+ netif_set_operstate(dev, NETDEV_OPER_DOWN);
}
/* Hot-plugging. */
static inline int netif_device_present(struct net_device *dev)
{
- return test_bit(__LINK_STATE_PRESENT, &dev->state);
+ return netif_get_operstate(dev) != NETDEV_OPER_NOTPRESENT;
}
static inline void netif_device_detach(struct net_device *dev)
{
- if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
+ if (netif_set_operstate(dev, NETDEV_OPER_NOTPRESENT) != NETDEV_OPER_NOTPRESENT &&
netif_running(dev)) {
netif_stop_queue(dev);
}
@@ -666,7 +734,7 @@
static inline void netif_device_attach(struct net_device *dev)
{
- if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
+ if (netif_set_operstate(dev, NETDEV_OPER_UNKNOWN) == NETDEV_OPER_NOTPRESENT &&
netif_running(dev)) {
netif_wake_queue(dev);
__netdev_watchdog_up(dev);
diff -rNuX dontdiff linux-2.5.41/net/Config.help linux-2.5.41-stefan/net/Config.help
--- linux-2.5.41/net/Config.help Tue Oct 1 09:06:18 2002
+++ linux-2.5.41-stefan/net/Config.help Sat Oct 12 00:56:59 2002
@@ -472,6 +472,17 @@
However, do not say Y here if you did not experience any serious
problems.
+CONFIG_LINKWATCH
+ When this option is enabled, the kernel will forward changes in the
+ operative ("RUNNING") state of an interface via the netlink socket.
+ This is most useful when running linux as a router.
+
+ Note that currently not many drivers support this, compliant ones
+ can be found by watching the RUNNING flag in ifconfig output
+ that should follow operative state.
+
+ If unsure, say 'N'.
+
CONFIG_NET_SCHED
When the kernel has several packets to send out over a network
device, it has to decide which ones to send first, which ones to
diff -rNuX dontdiff linux-2.5.41/net/Config.in linux-2.5.41-stefan/net/Config.in
--- linux-2.5.41/net/Config.in Tue Oct 1 09:06:24 2002
+++ linux-2.5.41-stefan/net/Config.in Tue Oct 8 22:44:07 2002
@@ -82,6 +82,7 @@
tristate 'WAN router' CONFIG_WAN_ROUTER
bool 'Fast switching (read help!)' CONFIG_NET_FASTROUTE
bool 'Forwarding between high speed interfaces' CONFIG_NET_HW_FLOWCONTROL
+ bool 'Device link state notification (EXPERIMENTAL)' CONFIG_LINKWATCH
fi
mainmenu_option next_comment
diff -rNuX dontdiff linux-2.5.41/net/core/dev.c linux-2.5.41-stefan/net/core/dev.c
--- linux-2.5.41/net/core/dev.c Tue Oct 8 22:18:51 2002
+++ linux-2.5.41-stefan/net/core/dev.c Sat Oct 12 14:54:25 2002
@@ -198,6 +198,9 @@
int netdev_fastroute_obstacles;
#endif
+#ifdef CONFIG_LINKWATCH
+static void netdev_linkwatch_event(void *data);
+#endif
/*******************************************************************************
@@ -716,6 +719,8 @@
* Set the flags.
*/
dev->flags |= IFF_UP;
+ if (netif_operstate_to_iff_running(dev))
+ dev->flags |= IFF_RUNNING;
/*
* Initialize multicasting status
@@ -2017,7 +2022,7 @@
IFF_RUNNING)) |
(dev->gflags & (IFF_PROMISC |
IFF_ALLMULTI));
- if (netif_running(dev) && netif_carrier_ok(dev))
+ if (netif_running(dev) && netif_operstate_to_iff_running(dev))
ifr->ifr_flags |= IFF_RUNNING;
return 0;
@@ -2432,6 +2437,13 @@
goto out;
#endif /* CONFIG_NET_DIVERT */
+ /* Initial operstate */
+ dev->operstate_lock = RW_LOCK_UNLOCKED;
+ dev->operstate = NETDEV_OPER_UNKNOWN;
+#ifdef CONFIG_LINKWATCH
+ INIT_WORK(&dev->linkwatch_work, netdev_linkwatch_event, dev); // FIXME
+#endif
+
dev->iflink = -1;
/* Init, if this function is available */
@@ -2457,13 +2469,6 @@
if (!dev->rebuild_header)
dev->rebuild_header = default_rebuild_header;
- /*
- * Default initial state at registry is that the
- * device is present.
- */
-
- set_bit(__LINK_STATE_PRESENT, &dev->state);
-
dev->next = NULL;
dev_init_scheduler(dev);
write_lock_bh(&dev_base_lock);
@@ -2735,6 +2740,11 @@
#ifdef CONFIG_NET_FASTROUTE
dev->fastpath_lock = RW_LOCK_UNLOCKED;
#endif
+ dev->operstate_lock = RW_LOCK_UNLOCKED;
+ dev->operstate = NETDEV_OPER_UNKNOWN;
+#ifdef CONFIG_LINKWATCH
+ INIT_WORK(&dev->linkwatch_work, netdev_linkwatch_event, dev); // FIXME
+#endif
dev->xmit_lock_owner = -1;
dev->iflink = -1;
dev_hold(dev);
@@ -2767,7 +2777,6 @@
if (!dev->rebuild_header)
dev->rebuild_header = default_rebuild_header;
dev_init_scheduler(dev);
- set_bit(__LINK_STATE_PRESENT, &dev->state);
}
}
@@ -2848,3 +2857,32 @@
return call_usermodehelper(argv [0], argv, envp);
}
#endif
+
+
+#ifdef CONFIG_LINKWATCH
+static void netdev_linkwatch_event(void *data) {
+ struct net_device *dev = data;
+ unsigned int iff_running = netif_operstate_to_iff_running(dev);
+
+ rtnl_lock();
+ if (dev->flags & IFF_RUNNING && !iff_running) {
+ write_lock(&dev_base_lock);
+ dev->flags &= ~IFF_RUNNING;
+ write_unlock(&dev_base_lock);
+ netdev_state_change(dev);
+ } else if (!(dev->flags & IFF_RUNNING) && iff_running) {
+ write_lock(&dev_base_lock);
+ dev->flags |= IFF_RUNNING;
+ write_unlock(&dev_base_lock);
+ netdev_state_change(dev);
+ }
+ rtnl_unlock();
+}
+
+
+void netdev_fire_linkwatch_event(struct net_device *dev) {
+ schedule_delayed_work(&dev->linkwatch_work, HZ / 4);
+}
+
+#endif
+
diff -rNuX dontdiff linux-2.5.41/net/core/rtnetlink.c linux-2.5.41-stefan/net/core/rtnetlink.c
--- linux-2.5.41/net/core/rtnetlink.c Tue Oct 1 09:07:57 2002
+++ linux-2.5.41-stefan/net/core/rtnetlink.c Sat Oct 12 14:27:43 2002
@@ -165,7 +165,7 @@
r->ifi_flags = dev->flags;
r->ifi_change = change;
- if (!netif_running(dev) || !netif_carrier_ok(dev))
+ if (!netif_running(dev) || !netif_operstate_to_iff_running(dev))
r->ifi_flags &= ~IFF_RUNNING;
else
r->ifi_flags |= IFF_RUNNING;
diff -rNuX dontdiff linux-2.5.41/net/netsyms.c linux-2.5.41-stefan/net/netsyms.c
--- linux-2.5.41/net/netsyms.c Tue Oct 8 22:18:53 2002
+++ linux-2.5.41-stefan/net/netsyms.c Sat Oct 12 14:34:38 2002
@@ -596,4 +596,8 @@
EXPORT_SYMBOL(wireless_send_event);
#endif /* CONFIG_NET_RADIO || CONFIG_NET_PCMCIA_RADIO */
+#ifdef CONFIG_LINKWATCH
+EXPORT_SYMBOL(netdev_fire_linkwatch_event);
+#endif
+
#endif /* CONFIG_NET */
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-12 13:23 Patch: Idea for RFC2863 conform OperStatus Stefan Rompf
2002-10-12 13:13 ` jamal
@ 2002-10-12 14:09 ` jamal
2002-10-13 19:14 ` kuznet
2002-10-14 10:38 ` bert hubert
2 siblings, 1 reply; 24+ messages in thread
From: jamal @ 2002-10-12 14:09 UTC (permalink / raw)
To: Stefan Rompf; +Cc: netdev
On Sat, 12 Oct 2002, Stefan Rompf wrote:
> If we decide to go this way, we'll also need an extension to rtnetlink
> and a new ioctl to transport the information to userspace.
Before you go doing that:
IFF_RUNNING and IFF_UP are already part of the ifi_flags(struct ifinfomsg)
passed today.
How about making use of ifi_change to extend them? Alexey, would this be
proper use of ifi_change?
cheers,
jamal
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-12 13:13 ` jamal
@ 2002-10-13 12:48 ` Stefan Rompf
2002-10-13 14:04 ` jamal
0 siblings, 1 reply; 24+ messages in thread
From: Stefan Rompf @ 2002-10-13 12:48 UTC (permalink / raw)
To: jamal; +Cc: netdev
[-- Attachment #1: Type: text/plain, Size: 1087 bytes --]
Hi,
> I forgot about this. I hate to rain on your parade Stefan, but
> if you made one global worklist that will complete the discussion.
here we go. Changes since the last version:
-One global worklist. Still using __LINK_STATE_LINKWATCH_PENDING to
quickly know whether an event is already pending
-unsigned char operstate instead of short. If there was no alignment,
this would have reduced the size of struct net_device by one byte ;-)
-removed usage of in-kernel IFF_RUNNING as a mirror. Useless if we want
to broadcast complete operstate via netlink
-Map only NETDEV_OPER_UP and NETDEV_OPER_UNKNOWN to IFF_RUNNING. I have
kept UNKNOWN as a compatibility kludge for the majority of drivers that
cannot determine any operstate yet
-Use dev_hold()/dev_put()
While doing tests with a hacked vlan driver that creates
NETDEV_OPER_LOWERDOWN/_UP events I found that I get a "No buffer space
available" in ip monitor if the event list is longer than about 20
entries. This can be worked around with setsockopt on SO_RCVBUF, but
does anyone have a clue why netlink events are that expensive?
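The SO_RCVBUF workaround mentioned here could look roughly like the following. This is only a sketch: `grow_rcvbuf` is a hypothetical helper name, it works on any socket descriptor, and the size the kernel actually grants may differ from the request because of system-wide limits.

```c
#include <assert.h>
#include <sys/socket.h>

/* Ask for a larger receive buffer on fd and return the size the
 * kernel reports afterwards, or -1 on error. */
static int grow_rcvbuf(int fd, int bytes)
{
	int actual = 0;
	socklen_t len = sizeof(actual);

	assert(bytes > 0);
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
		return -1;
	return actual;
}
```

A monitoring tool would call this once on its netlink socket before entering the receive loop, for example `grow_rcvbuf(fd, 256 * 1024)`.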
Cheers, Stefan
[-- Attachment #2: patch-rfc2863-2.5.41-2 --]
[-- Type: text/plain, Size: 12120 bytes --]
diff -uNrX dontdiff linux-2.5.41/include/linux/netdevice.h linux-2.5.41-stefan/include/linux/netdevice.h
--- linux-2.5.41/include/linux/netdevice.h Tue Oct 8 22:18:50 2002
+++ linux-2.5.41-stefan/include/linux/netdevice.h Sun Oct 13 12:47:13 2002
@@ -204,10 +204,23 @@
{
__LINK_STATE_XOFF=0,
__LINK_STATE_START,
- __LINK_STATE_PRESENT,
+ __LINK_STATE_PRESENT_OBSOLETE,
__LINK_STATE_SCHED,
- __LINK_STATE_NOCARRIER,
- __LINK_STATE_RX_SCHED
+ __LINK_STATE_NOCARRIER_OBSOLETE,
+ __LINK_STATE_RX_SCHED,
+ __LINK_STATE_LINKWATCH_PENDING
+};
+
+
+/* Device operative state as per RFC2863 */
+enum netdev_operstate_t {
+ NETDEV_OPER_UP = 1,
+ NETDEV_OPER_DOWN, /* Obsoletes LINK_STATE_NOCARRIER */
+ NETDEV_OPER_TESTING,
+ NETDEV_OPER_UNKNOWN,
+ NETDEV_OPER_DORMANT,
+ NETDEV_OPER_NOTPRESENT, /* Obsoletes !LINK_STATE_PRESENT */
+ NETDEV_OPER_LOWERDOWN
};
@@ -308,6 +321,10 @@
* which this device is member of.
*/
+ /* Operative state, access semaphore */
+ rwlock_t operstate_lock;
+ unsigned char operstate;
+
/* Interface address info. */
unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */
unsigned char dev_addr[MAX_ADDR_LEN]; /* hw address */
@@ -631,34 +648,76 @@
* who is responsible for serialization of these calls.
*/
+#ifdef CONFIG_LINKWATCH
+extern void linkwatch_fire_event(struct net_device *dev);
+#endif
+
+static inline unsigned char netif_set_operstate(struct net_device *dev, unsigned char newstate)
+{
+ unsigned long flags;
+ unsigned char oldstate;
+
+ write_lock_irqsave(&dev->operstate_lock, flags);
+ oldstate = dev->operstate;
+ dev->operstate = newstate;
+ write_unlock_irqrestore(&dev->operstate_lock, flags);
+
+#ifdef CONFIG_LINKWATCH
+ if (oldstate != newstate) linkwatch_fire_event(dev);
+#endif
+
+ return oldstate;
+}
+
+static inline unsigned char netif_get_operstate(struct net_device *dev)
+{
+ unsigned long flags;
+ unsigned char state;
+
+ read_lock_irqsave(&dev->operstate_lock, flags);
+ state = dev->operstate;
+ read_unlock_irqrestore(&dev->operstate_lock, flags);
+
+ return state;
+}
+
static inline int netif_carrier_ok(struct net_device *dev)
{
- return !test_bit(__LINK_STATE_NOCARRIER, &dev->state);
+ return netif_get_operstate(dev) == NETDEV_OPER_UP;
+}
+
+static inline int netif_operstate_to_iff_running(struct net_device *dev)
+{
+ unsigned char state = netif_get_operstate(dev);
+
+ return((1 << state) &
+ (1 << NETDEV_OPER_UP | 1 << NETDEV_OPER_UNKNOWN));
}
extern void __netdev_watchdog_up(struct net_device *dev);
+
static inline void netif_carrier_on(struct net_device *dev)
{
- clear_bit(__LINK_STATE_NOCARRIER, &dev->state);
+ netif_set_operstate(dev, NETDEV_OPER_UP);
if (netif_running(dev))
__netdev_watchdog_up(dev);
}
static inline void netif_carrier_off(struct net_device *dev)
{
- set_bit(__LINK_STATE_NOCARRIER, &dev->state);
+ netif_set_operstate(dev, NETDEV_OPER_DOWN);
}
/* Hot-plugging. */
static inline int netif_device_present(struct net_device *dev)
{
- return test_bit(__LINK_STATE_PRESENT, &dev->state);
+ return netif_get_operstate(dev) != NETDEV_OPER_NOTPRESENT;
}
static inline void netif_device_detach(struct net_device *dev)
{
- if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
+ if (netif_set_operstate(dev, NETDEV_OPER_NOTPRESENT) != NETDEV_OPER_NOTPRESENT &&
netif_running(dev)) {
netif_stop_queue(dev);
}
@@ -666,7 +725,7 @@
static inline void netif_device_attach(struct net_device *dev)
{
- if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
+ if (netif_set_operstate(dev, NETDEV_OPER_UNKNOWN) == NETDEV_OPER_NOTPRESENT &&
netif_running(dev)) {
netif_wake_queue(dev);
__netdev_watchdog_up(dev);
diff -uNrX dontdiff linux-2.5.41/net/Config.help linux-2.5.41-stefan/net/Config.help
--- linux-2.5.41/net/Config.help Tue Oct 1 09:06:18 2002
+++ linux-2.5.41-stefan/net/Config.help Sat Oct 12 00:56:59 2002
@@ -472,6 +472,17 @@
However, do not say Y here if you did not experience any serious
problems.
+CONFIG_LINKWATCH
+ When this option is enabled, the kernel will forward changes in the
+ operative ("RUNNING") state of an interface via the netlink socket.
+ This is most useful when running linux as a router.
+
+ Note that currently not many drivers support this, compliant ones
+ can be found by watching the RUNNING flag in ifconfig output
+ that should follow operative state.
+
+ If unsure, say 'N'.
+
CONFIG_NET_SCHED
When the kernel has several packets to send out over a network
device, it has to decide which ones to send first, which ones to
diff -uNrX dontdiff linux-2.5.41/net/Config.in linux-2.5.41-stefan/net/Config.in
--- linux-2.5.41/net/Config.in Tue Oct 1 09:06:24 2002
+++ linux-2.5.41-stefan/net/Config.in Tue Oct 8 22:44:07 2002
@@ -82,6 +82,7 @@
tristate 'WAN router' CONFIG_WAN_ROUTER
bool 'Fast switching (read help!)' CONFIG_NET_FASTROUTE
bool 'Forwarding between high speed interfaces' CONFIG_NET_HW_FLOWCONTROL
+ bool 'Device link state notification (EXPERIMENTAL)' CONFIG_LINKWATCH
fi
mainmenu_option next_comment
diff -uNrX dontdiff linux-2.5.41/net/core/Makefile linux-2.5.41-stefan/net/core/Makefile
--- linux-2.5.41/net/core/Makefile Tue Oct 1 09:07:40 2002
+++ linux-2.5.41-stefan/net/core/Makefile Sun Oct 13 12:37:08 2002
@@ -21,4 +21,6 @@
# Ugly. I wish all wireless drivers were moved in drivers/net/wireless
obj-$(CONFIG_NET_PCMCIA_RADIO) += wireless.o
+obj-$(CONFIG_LINKWATCH) += link_watch.o
+
include $(TOPDIR)/Rules.make
diff -uNrX dontdiff linux-2.5.41/net/core/dev.c linux-2.5.41-stefan/net/core/dev.c
--- linux-2.5.41/net/core/dev.c Tue Oct 8 22:18:51 2002
+++ linux-2.5.41-stefan/net/core/dev.c Sun Oct 13 14:00:55 2002
@@ -198,7 +198,6 @@
int netdev_fastroute_obstacles;
#endif
-
/*******************************************************************************
Protocol management and registration routines
@@ -261,6 +260,9 @@
br_write_unlock_bh(BR_NETPROTO_LOCK);
}
+#ifdef CONFIG_LINKWATCH
+void linkwatch_run_queue(void);
+#endif
/**
* dev_remove_pack - remove packet handler
@@ -2017,7 +2019,7 @@
IFF_RUNNING)) |
(dev->gflags & (IFF_PROMISC |
IFF_ALLMULTI));
- if (netif_running(dev) && netif_carrier_ok(dev))
+ if (netif_running(dev) && netif_operstate_to_iff_running(dev))
ifr->ifr_flags |= IFF_RUNNING;
return 0;
@@ -2432,6 +2434,10 @@
goto out;
#endif /* CONFIG_NET_DIVERT */
+ /* Initial operstate */
+ dev->operstate_lock = RW_LOCK_UNLOCKED;
+ dev->operstate = NETDEV_OPER_UNKNOWN;
+
dev->iflink = -1;
/* Init, if this function is available */
@@ -2457,13 +2463,6 @@
if (!dev->rebuild_header)
dev->rebuild_header = default_rebuild_header;
- /*
- * Default initial state at registry is that the
- * device is present.
- */
-
- set_bit(__LINK_STATE_PRESENT, &dev->state);
-
dev->next = NULL;
dev_init_scheduler(dev);
write_lock_bh(&dev_base_lock);
@@ -2592,6 +2591,18 @@
free_divert_blk(dev);
#endif
+#ifdef CONFIG_LINKWATCH
+ if (test_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) {
+ /* We must not have linkwatch events pending
+ * on unregister. If this happens, we simply
+ * run the queue unscheduled, resulting in a
+ * noop for this device
+ */
+ linkwatch_run_queue();
+ BUG_ON(test_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state));
+ }
+#endif
+
if (dev->features & NETIF_F_DYNALLOC) {
#ifdef NET_REFCNT_DEBUG
if (atomic_read(&dev->refcnt) != 1)
@@ -2735,6 +2746,8 @@
#ifdef CONFIG_NET_FASTROUTE
dev->fastpath_lock = RW_LOCK_UNLOCKED;
#endif
+ dev->operstate_lock = RW_LOCK_UNLOCKED;
+ dev->operstate = NETDEV_OPER_UNKNOWN;
dev->xmit_lock_owner = -1;
dev->iflink = -1;
dev_hold(dev);
@@ -2767,7 +2780,6 @@
if (!dev->rebuild_header)
dev->rebuild_header = default_rebuild_header;
dev_init_scheduler(dev);
- set_bit(__LINK_STATE_PRESENT, &dev->state);
}
}
@@ -2848,3 +2860,5 @@
return call_usermodehelper(argv [0], argv, envp);
}
#endif
+
+
diff -uNrX dontdiff linux-2.5.41/net/core/link_watch.c linux-2.5.41-stefan/net/core/link_watch.c
--- linux-2.5.41/net/core/link_watch.c Thu Jan 1 01:00:00 1970
+++ linux-2.5.41-stefan/net/core/link_watch.c Sun Oct 13 13:59:23 2002
@@ -0,0 +1,115 @@
+/*
+ * Linux network device link state notification
+ *
+ * Author:
+ * Stefan Rompf <sux@isg.de>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/workqueue.h>
+#include <linux/config.h>
+#include <linux/netdevice.h>
+#include <linux/if.h>
+#include <linux/rtnetlink.h>
+#include <linux/jiffies.h>
+#include <linux/spinlock.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <asm/bitops.h>
+#include <asm/types.h>
+
+
+static unsigned long linkwatch_nowake = 0;
+static unsigned long linkwatch_nextevent = 0;
+
+static void linkwatch_event(void *dummy);
+static DECLARE_WORK(linkwatch_work, linkwatch_event, NULL);
+
+static LIST_HEAD(lweventlist);
+static spinlock_t lweventlist_lock = SPIN_LOCK_UNLOCKED;
+
+struct lw_event {
+ struct list_head list;
+ struct net_device *dev;
+};
+
+/* Must be called with the rtnl semaphore held */
+void linkwatch_run_queue(void) {
+ LIST_HEAD(head);
+ struct list_head *n, *next;
+
+ spin_lock_irq(&lweventlist_lock);
+ list_splice_init(&lweventlist, &head);
+ spin_unlock_irq(&lweventlist_lock);
+
+ list_for_each_safe(n, next, &head) {
+ struct lw_event *event = list_entry(n, struct lw_event, list);
+ struct net_device *dev = event->dev;
+
+ kfree(event);
+ /* We are about to handle this device,
+ * so new events can be accepted
+ */
+ clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
+
+ if (dev->flags & IFF_UP) {
+ netdev_state_change(dev);
+ }
+
+ dev_put(dev);
+ }
+}
+
+
+static void linkwatch_event(void *dummy)
+{
+ /* Limit the number of linkwatch events to one
+ * per second so that a runaway driver does not
+ * cause a storm of messages on the netlink
+ * socket
+ */
+ linkwatch_nextevent = jiffies + HZ;
+ clear_bit(0, &linkwatch_nowake);
+
+ rtnl_lock();
+ linkwatch_run_queue();
+ rtnl_unlock();
+}
+
+
+void linkwatch_fire_event(struct net_device *dev)
+{
+ if (!test_and_set_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) {
+ unsigned long flags;
+ struct lw_event *event = kmalloc(sizeof(struct lw_event), GFP_ATOMIC);
+
+ if (unlikely(event == NULL)) {
+ clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
+ return;
+ }
+
+ dev_hold(dev);
+ event->dev = dev;
+
+ spin_lock_irqsave(&lweventlist_lock, flags);
+ list_add_tail(&event->list, &lweventlist);
+ spin_unlock_irqrestore(&lweventlist_lock, flags);
+
+ if (!test_and_set_bit(0, &linkwatch_nowake)) {
+ unsigned long thisevent = jiffies;
+
+ if (thisevent >= linkwatch_nextevent) {
+ schedule_work(&linkwatch_work);
+ } else {
+ schedule_delayed_work(&linkwatch_work, linkwatch_nextevent - thisevent);
+ }
+ }
+ }
+}
+
diff -uNrX dontdiff linux-2.5.41/net/core/rtnetlink.c linux-2.5.41-stefan/net/core/rtnetlink.c
--- linux-2.5.41/net/core/rtnetlink.c Tue Oct 1 09:07:57 2002
+++ linux-2.5.41-stefan/net/core/rtnetlink.c Sat Oct 12 14:27:43 2002
@@ -165,7 +165,7 @@
r->ifi_flags = dev->flags;
r->ifi_change = change;
- if (!netif_running(dev) || !netif_carrier_ok(dev))
+ if (!netif_running(dev) || !netif_operstate_to_iff_running(dev))
r->ifi_flags &= ~IFF_RUNNING;
else
r->ifi_flags |= IFF_RUNNING;
diff -uNrX dontdiff linux-2.5.41/net/netsyms.c linux-2.5.41-stefan/net/netsyms.c
--- linux-2.5.41/net/netsyms.c Tue Oct 8 22:18:53 2002
+++ linux-2.5.41-stefan/net/netsyms.c Sun Oct 13 13:27:40 2002
@@ -596,4 +596,8 @@
EXPORT_SYMBOL(wireless_send_event);
#endif /* CONFIG_NET_RADIO || CONFIG_NET_PCMCIA_RADIO */
+#ifdef CONFIG_LINKWATCH
+EXPORT_SYMBOL(linkwatch_fire_event);
+#endif
+
#endif /* CONFIG_NET */
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-13 12:48 ` Stefan Rompf
@ 2002-10-13 14:04 ` jamal
2002-10-15 9:53 ` Stefan Rompf
0 siblings, 1 reply; 24+ messages in thread
From: jamal @ 2002-10-13 14:04 UTC (permalink / raw)
To: Stefan Rompf; +Cc: netdev
Stefan,
This is looking good; some small nitpicks:
-where do you set the netlink state change, ifi_change etc?
I know we are waiting for Alexey to respond but how do you propagate
iff_up -> down and the cause fatale to user space?
-Do you really have to malloc and free every time for that lw_event?
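One possible answer to the allocation question, sketched in userspace: embed the list link in the device itself, so that firing an event never allocates. This is an illustration only and not what the patch does; `model_dev`, `event_list`, `event_fire` and `event_run` are hypothetical names, and the locking and reference counting that the kernel version needs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev {
	const char *name;
	bool event_pending;           /* role of __LINK_STATE_LINKWATCH_PENDING */
	unsigned int notifications;   /* how often userspace was told */
	struct model_dev *next_event; /* embedded link, no per-event malloc */
};

struct event_list {
	struct model_dev *head;
	struct model_dev **tail;
};

static void event_list_init(struct event_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
}

/* Queue dev unless it is already queued; returns true if it was added. */
static bool event_fire(struct event_list *l, struct model_dev *dev)
{
	assert(l != NULL && dev != NULL);
	if (dev->event_pending)
		return false;
	dev->event_pending = true;
	dev->next_event = NULL;
	*l->tail = dev;
	l->tail = &dev->next_event;
	return true;
}

/* Detach the whole list, then notify once per queued device. */
static unsigned int event_run(struct event_list *l)
{
	struct model_dev *dev = l->head;
	unsigned int handled = 0;

	event_list_init(l);
	while (dev != NULL) {
		struct model_dev *next = dev->next_event;

		dev->event_pending = false; /* new events may be queued again */
		dev->next_event = NULL;
		dev->notifications++;
		handled++;
		dev = next;
	}
	return handled;
}
```

The trade-off is one extra pointer in every device structure in exchange for removing the GFP_ATOMIC allocation and its failure path.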
Dave, Alexey,
It's your call now to dissect its maintainability; i am happy with it
when Stefan addresses the above nitpicks.
On Sun, 13 Oct 2002, Stefan Rompf wrote:
> While doing tests with a hacked vlan driver that creates
> NETDEV_OPER_LOWERDOWN/_UP events I found that I get a "No buffer space
> available" in ip monitor if the event list is longer than about 20
> entries. This can be worked around with setsockopt on SO_RCVBUF, but
> does anyone have a clue why netlink events are that expensive?
Take a look at the way memory is allocated in that area and you'll see it.
May i suggest that's another fire that may need to be put out at some
point? ;-> <hint, hint, wink, wink>
cheers,
jamal
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-12 14:09 ` jamal
@ 2002-10-13 19:14 ` kuznet
2002-10-13 20:30 ` jamal
0 siblings, 1 reply; 24+ messages in thread
From: kuznet @ 2002-10-13 19:14 UTC (permalink / raw)
To: jamal; +Cc: netdev
Hello!
> How about making use of ifi_change to extend them? Alexey, would this be
> proper use of ifi_change?
I did not understand the question.
Alexey
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-13 19:14 ` kuznet
@ 2002-10-13 20:30 ` jamal
2002-10-13 21:00 ` kuznet
0 siblings, 1 reply; 24+ messages in thread
From: jamal @ 2002-10-13 20:30 UTC (permalink / raw)
To: kuznet; +Cc: netdev
Look at RFC 2863 section 3.1.12; there is some description on operational
status. At the moment the only status is IFF_RUNNING in the ifi_flags.
So the question was could we use ifi_change to send the other pieces of
info (as per RFC 2863) and as implemented in Stefan's patch?
If not, could we take advantage of that pad in the ifinfomsg? This should
not break any backward compatibility.
cheers,
jamal
On Sun, 13 Oct 2002 kuznet@ms2.inr.ac.ru wrote:
> Hello!
>
> > How about making use of ifi_change to extend them? Alexey, would this be
> > proper use of ifi_change?
>
> I did not understand the question.
>
> Alexey
>
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-13 20:30 ` jamal
@ 2002-10-13 21:00 ` kuznet
2002-10-13 21:34 ` jamal
0 siblings, 1 reply; 24+ messages in thread
From: kuznet @ 2002-10-13 21:00 UTC (permalink / raw)
To: jamal; +Cc: netdev
Hello!
> status. At the moment the only status is IFF_RUNNING in the ifi_flags.
> So the question was could we use ifi_change to send the other pieces of
> info
No, of course. The question is really strange. :-)
> If not, could we take advantage of that pad in the ifinfomsg?
ifi_flags has lots of spare space, 16 bits.
And the second: IFF_RUNNING seems to be enough. Their "dormant" and
"lowerLayerDown" are logically indistinguishable.
Alexey
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-13 21:00 ` kuznet
@ 2002-10-13 21:34 ` jamal
2002-10-13 22:04 ` kuznet
0 siblings, 1 reply; 24+ messages in thread
From: jamal @ 2002-10-13 21:34 UTC (permalink / raw)
To: kuznet; +Cc: netdev
On Mon, 14 Oct 2002 kuznet@ms2.inr.ac.ru wrote:
> Hello!
>
> > status. At the moment the only status is IFF_RUNNING in the ifi_flags.
> > So the question was could we use ifi_change to send the other pieces of
> > info
>
> No, of course. The question is really strange. :-)
>
> > If not, could we take advantage of that pad in the ifinfomsg?
>
> ifi_flags has lots of spare space, 16 bits.
>
Actually the extra flags are only valid when IFF_RUNNING is not set.
Maybe Stefan was pushing it to also want to flag tx operational failure ..
In any case please review his patch.
> And the second: IFF_RUNNING seems to be enough. Their "dormant" and
> "lowerLayerDown" are logically indistinguishable.
Some of those states are useless.
dormant may refer to things like tunnel devices on top of physical
devices. The example that was given was an ipsec tunnel sending pings
periodically;
lowerLayerDown is when you have multiple physical devices under an
aggregator like bonding or maybe even VLAN; in that case one of the
physical devices underneath being down would imply a "lowerLayerDown" flag
on the aggregator device. A second query would reveal which of the
underlying devices is down.
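To make the lowerLayerDown case concrete, here is a small userspace sketch of the rule as described above: any lower device that is not up marks the aggregator lowerLayerDown, and a second lookup names the culprit. The function names are hypothetical, and a real bonding or VLAN driver may well want a different policy.

```c
#include <assert.h>
#include <stddef.h>

/* ifOperStatus values as numbered in RFC 2863 (names are illustrative). */
enum oper_state {
	OPER_UP = 1,
	OPER_DOWN,
	OPER_TESTING,
	OPER_UNKNOWN,
	OPER_DORMANT,
	OPER_NOTPRESENT,
	OPER_LOWERDOWN
};

/* State of the aggregator derived from its n lower devices. */
static enum oper_state aggregate_oper_state(const enum oper_state *lower,
					    size_t n)
{
	size_t i;

	assert(lower != NULL && n > 0);
	for (i = 0; i < n; i++)
		if (lower[i] != OPER_UP)
			return OPER_LOWERDOWN;
	return OPER_UP;
}

/* The "second query": index of the first lower device that is not up,
 * or -1 if all of them are up. */
static long first_lower_not_up(const enum oper_state *lower, size_t n)
{
	size_t i;

	assert(lower != NULL && n > 0);
	for (i = 0; i < n; i++)
		if (lower[i] != OPER_UP)
			return (long)i;
	return -1;
}
```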
cheers,
jamal
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-13 21:34 ` jamal
@ 2002-10-13 22:04 ` kuznet
2002-10-14 12:42 ` Stefan Rompf
2002-10-14 13:01 ` jamal
0 siblings, 2 replies; 24+ messages in thread
From: kuznet @ 2002-10-13 22:04 UTC (permalink / raw)
To: jamal; +Cc: netdev
Hello!
> Actually the extra flags are only valid when IFF_RUNNING is not set.
Even so...
Well, then I am inclined not to agree to give even one of those valuable
16 spare bits for this. :-)
I cannot imagine how much I should drink to consider the states described
in this rfc as a valid abstraction. Device can be working and can be dead
by thousands of reasons. The best which I can propose is to show a string
somewhere in /proc (well, or as a _string_ attribute in RTM_NEWLINK),
explaining why device is not alive and let snmpd to translate this string
to these bogus states to generate traps.
Alexey
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-12 13:23 Patch: Idea for RFC2863 conform OperStatus Stefan Rompf
2002-10-12 13:13 ` jamal
2002-10-12 14:09 ` jamal
@ 2002-10-14 10:38 ` bert hubert
2002-10-14 11:16 ` Robert Olsson
2 siblings, 1 reply; 24+ messages in thread
From: bert hubert @ 2002-10-14 10:38 UTC (permalink / raw)
To: netdev
On Sat, Oct 12, 2002 at 03:23:53PM +0200, Stefan Rompf wrote:
> +CONFIG_LINKWATCH
> + When this option is enabled, the kernel will forward changes in the
> + operative ("RUNNING") state of an interface via the netlink socket.
> + This is most useful when running linux as a router.
I know people who would kill for this feature. So powers that be, please
consider merging.
Is there no quick way of getting all cards using the MII stuff to support
this in one go? mii-diag seems to work on most cards already?
Regards,
bert
--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-14 11:16 ` Robert Olsson
@ 2002-10-14 11:11 ` bert hubert
2002-10-14 11:50 ` Robert Olsson
0 siblings, 1 reply; 24+ messages in thread
From: bert hubert @ 2002-10-14 11:11 UTC (permalink / raw)
To: Robert Olsson; +Cc: netdev
On Mon, Oct 14, 2002 at 01:16:46PM +0200, Robert Olsson wrote:
> If you run a routing protocol (BGP, OSPF etc.) it has its own mechanism
> with timers for declaring neighbours/link partners down. But yes in some
> setups it can be useful -- with dampening.
I bet Zebra would like to use this. I also know people who want to use this
to change their own routes to, say, an alternate interface.
> PS. Have seen 100 kpps in a Linux production router now. :-)
With a bgp full view? 100kpps is not all that impressive in itself, is it?
We benchmarked our nameserver at 120kpps recently and linux seemed
unimpressed :-)
Regards,
bert hubert
--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-14 10:38 ` bert hubert
@ 2002-10-14 11:16 ` Robert Olsson
2002-10-14 11:11 ` bert hubert
0 siblings, 1 reply; 24+ messages in thread
From: Robert Olsson @ 2002-10-14 11:16 UTC (permalink / raw)
To: bert hubert; +Cc: netdev
bert hubert writes:
> On Sat, Oct 12, 2002 at 03:23:53PM +0200, Stefan Rompf wrote:
>
> > +CONFIG_LINKWATCH
> > + When this option is enabled, the kernel will forward changes in the
> > + operative ("RUNNING") state of an interface via the netlink socket.
> > + This is most useful when running linux as a router.
>
> I know people who would kill for this feature. So powers that be, please
> consider merging.
>
> Is there no quick way of getting all cards using the MII stuff to support
> this in one go? mii-diag seems to work on most cards already?
Hello!
Well it has to be handled with somewhat caution too... Just a little link-flap
can cause massive network/routing changes.
If you run a routing protocol (BGP, OSPF etc.) it has its own mechanism with
timers for declaring neighbours/link partners down. But yes in some setups it
can be useful -- with damping.
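To make "damping" concrete, here is a minimal userspace sketch of a hold-down
debouncer: a raw link change is only passed on once the link has stayed in the
new state for a configurable time. The struct, the function names and the
hold-down policy are illustrative assumptions, not anything from the patch or
from an existing routing daemon.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical link-flap damper (all names are illustrative). */
struct link_damper {
    bool     reported;            /* last state passed on to the consumer */
    bool     candidate;           /* most recent raw state observed */
    uint64_t candidate_since_ms;  /* when the candidate state was first seen */
    uint64_t hold_ms;             /* how long a new state must persist */
};

static void damper_init(struct link_damper *d, bool initial, uint64_t hold_ms)
{
    d->reported = initial;
    d->candidate = initial;
    d->candidate_since_ms = 0;
    d->hold_ms = hold_ms;
}

/* Feed one raw observation. Returns true when the reported state changes. */
static bool damper_update(struct link_damper *d, bool raw_up, uint64_t now_ms)
{
    if (raw_up != d->candidate) {
        /* The raw state flipped: restart the hold-down timer. */
        d->candidate = raw_up;
        d->candidate_since_ms = now_ms;
    }
    if (d->candidate != d->reported &&
        now_ms - d->candidate_since_ms >= d->hold_ms) {
        d->reported = d->candidate;
        return true;
    }
    return false;
}
```

With a hold time of one second, a link that drops and recovers within a few
hundred milliseconds never produces a reported change, so a short flap would
not trigger a routing update.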
--ro
PS. Have seen 100 kpps in a Linux production router now. :-)
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-14 11:11 ` bert hubert
@ 2002-10-14 11:50 ` Robert Olsson
0 siblings, 0 replies; 24+ messages in thread
From: Robert Olsson @ 2002-10-14 11:50 UTC (permalink / raw)
To: bert hubert; +Cc: Robert Olsson, netdev
bert hubert writes:
> With a bgp full view? 100kpps is not all that impressive in itself, is it?
> We benchmarked our nameserver at 120kpps recently and linux seemed
> unimpressed :-)
For me yes. :-) It connects one of the major ftp sites... Box can handle 350
kpps w/o filter and connection tracking. With current traffic pattern we have
160 kpps for wire speed at GIGE.
Yes full BGP. Well even full BGP from 2 peers plus some handful prefixes from
other BGP peers.
ip route | wc -l
115670
Gated has 41 MB of BGP routes. With Zebra this would be ~75 MB.
ps aux
root 232 0.0 7.9 41392 40876 ? S Aug 15 36:54 gated
Cheers.
--ro
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-13 22:04 ` kuznet
@ 2002-10-14 12:42 ` Stefan Rompf
2002-10-14 13:11 ` jamal
2002-10-14 13:38 ` jamal
2002-10-14 13:01 ` jamal
1 sibling, 2 replies; 24+ messages in thread
From: Stefan Rompf @ 2002-10-14 12:42 UTC (permalink / raw)
To: kuznet, jamal; +Cc: netdev
Alexey, Jamal,
the first version of this patch just added the feature to distribute
state changes created by netif_carrier_on()/_off() inside the kernel and
to netlink via netdev_state_change(). This is still the part most
important to me - not only in 2.5, but also as a 2.4 backport.
I don't think RFC2863 state keeping makes much sense if we cannot
forward the results to userspace. So if you are happy with keeping the
current semantics, but using the more sophisticated implementation of
forwarding worked out last weekend, I'll be happy to provide a new patch
(also addressing Jamal's nitpicks). Just agree on one way.
Cheers, Stefan
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-13 22:04 ` kuznet
2002-10-14 12:42 ` Stefan Rompf
@ 2002-10-14 13:01 ` jamal
1 sibling, 0 replies; 24+ messages in thread
From: jamal @ 2002-10-14 13:01 UTC (permalink / raw)
To: kuznet; +Cc: netdev
On Mon, 14 Oct 2002 kuznet@ms2.inr.ac.ru wrote:
> Hello!
>
> > Actually the extra flags are only valid when IFF_RUNNING is not set.
>
> Even so...
>
> Well, then I am inclined not to agree to give even one of those valuable
> 16 spare bits for this. :-)
>
> I cannot imagine how much should I drink to consider states described
> in this rfc as a valid abstraction. Device can be working and can be dead
> by thousands of reasons. The best which I can propose is to show a string
> somewhere in /proc (well, or as a _string_ attribute in RTM_NEWLINK),
> explaining why device is not alive and let snmpd to translate this string
> to these bogus states to generate traps.
;-> Believe it or not people use these things to draw nice GUI
representations (search for lowerLayerDown at cisco for example);
how we represent them shouldnt matter whether its via ifi_flags, /proc
etc. In fact it should also be fine if we dont propagate them to user space
for now, but the abstraction should stay in the kernel at least.
cheers,
jamal
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-14 12:42 ` Stefan Rompf
@ 2002-10-14 13:11 ` jamal
2002-10-14 13:38 ` jamal
1 sibling, 0 replies; 24+ messages in thread
From: jamal @ 2002-10-14 13:11 UTC (permalink / raw)
To: Stefan Rompf; +Cc: kuznet, netdev
On Mon, 14 Oct 2002, Stefan Rompf wrote:
> Alexey, Jamal,
>
> the first version of this patch just added the feature to distribute
> state changes created by netif_carrier_on()/_off() inside the kernel and
> to netlink via netdev_state_change(). This is still the part most
> important to me - not only in 2.5, but also as a 2.4 backport.
>
> I don't think RFC2863 state keeping makes much sense if we cannot
> forward the results to userspace. So if you are happy with keeping the
> current semantics, but using the more sophisticated implementation of
> forwarding worked out last weekend, I'll be happy to provide a new patch
> (also addressing Jamal's nitpicks). Just agree on one way.
the RFC2863 semantics should stay; the thing we havent agreed on is
how to do it. Maybe for the first phase patch you dont need to expose them
to user space - maybe in the way Alexey suggested as attributes without
losing any of the flag bits...
Alexey should make that call.
cheers,
jamal
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-14 12:42 ` Stefan Rompf
2002-10-14 13:11 ` jamal
@ 2002-10-14 13:38 ` jamal
2002-10-14 18:14 ` Stefan Rompf
1 sibling, 1 reply; 24+ messages in thread
From: jamal @ 2002-10-14 13:38 UTC (permalink / raw)
To: Stefan Rompf; +Cc: kuznet, netdev
If you think about it NOTPRESENT was always there even without these
changes but was never exposed. PRESENT will always somehow get mapped
to IFF_RUNNING (0/1).
- NOTPRESENT makes sense more from a NMS pov where a hotplug device has
been removed and SNMP finds that the device is no longer there. This
can be done without any state from the kernel being exposed.
- UNKNOWN is when the NMS cant find the state of the device - maybe when
they cant reach us. Again we dont need to expose this from the kernel and
if it is not useful in the kernel perhaps should be deleted.
The remainder seem to make more sense to me.
thoughts?
cheers,
jamal
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-14 13:38 ` jamal
@ 2002-10-14 18:14 ` Stefan Rompf
2002-10-14 18:55 ` David Brownell
0 siblings, 1 reply; 24+ messages in thread
From: Stefan Rompf @ 2002-10-14 18:14 UTC (permalink / raw)
To: jamal; +Cc: kuznet, netdev, david-b
Hi Jamal,
> - NOTPRESENT makes sense more from a NMS pov where a hotplug device has
> been removed and SNMP finds that the device is no longer there. This
> can be done without any state from the kernel being exposed.
I want to forward this question mainly to the USB people. David, as you
are familiar with this thread, how does a USB driver react when an
ethernet device is configured, running and then disconnected from the
USB port? From RFC2863, I'd expect the devicename ethx not to be removed
at least until it is ifconfigured down, but change from UP to NOTPRESENT
until reconnection. So it would be useful to have this state inside the
kernel.
> - UNKNOWN is when the NMS cant find the state of the device - maybe when
> they cant reach us. Again we dont need to expose this from the kernel and
> if it is not useful in the kernel perhaps should be deleted.
We have two different views here: For an external NMS, it makes perfect
sense to change devices of a host from whatever to UNKNOWN if the host
becomes unreachable. For the host itself, UNKNOWN can also be a driver
that does not know about link state detection.
I opt against removal of these states.
Cheers, Stefan
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-14 18:14 ` Stefan Rompf
@ 2002-10-14 18:55 ` David Brownell
2002-10-14 19:03 ` David Brownell
0 siblings, 1 reply; 24+ messages in thread
From: David Brownell @ 2002-10-14 18:55 UTC (permalink / raw)
To: Stefan Rompf; +Cc: jamal, kuznet, netdev
>>- NOTPRESENT makes sense more from a NMS pov where a hotplug device has
>>been removed and SNMP finds that the device is no longer there. This
>>can be done without any state from the kernel being exposed.
>
>
> I want to forward this question mainly to the USB people. David, as you
> are familiar with this thread, how does a USB driver react when an
> ethernet device is configured, running and then disconnected from the
> USB port? From RFC2863, I'd expect the devicename ethx not to be removed
> at least until it is ifconfigured down, but change from UP to NOTPRESENT
> until reconnection. So it would be useful to have this state inside the
> kernel.
Right now almost all USB networking drivers make the device disappear
immediately ... certainly the ones that have made sure they handle the
USB disconnect processing correctly (no more I/O to that device!) tend
to do that, by calling unregister_netdev().
And they don't try reconnection. The goal of disconnect processing has
been to scrub out all the relevant device state; only the usb-storage
driver tries to keep a history of devices that have ever been attached,
so it can make "/dev/sdg" mean the same thing until reboot.
I wouldn't swear that unregistering is the perfect solution, but it's
clearly better than the oopses that tended to show up previously.
(And oddly enough, the drivers that _don't_ unregister seem to be the
ones that still have an EXPERIMENTAL label in the 2.5 kernels.)
- Dave
(Note that I'm just skimming this thread at the moment, limited time...
good that you cc'd me directly.)
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-14 18:55 ` David Brownell
@ 2002-10-14 19:03 ` David Brownell
0 siblings, 0 replies; 24+ messages in thread
From: David Brownell @ 2002-10-14 19:03 UTC (permalink / raw)
To: Stefan Rompf, jamal; +Cc: kuznet, netdev
> Right now almost all USB networking drivers make the device disappear
> immediately ... certainly the ones that have made sure they handle the
> USB disconnect processing correctly (no more I/O to that device!) tend
> to do that, by calling unregister_netdev().
... which would let Linux implement a NotPresent(ifname) test
just by seeing if it's registered, unless that RFC demands some
more abstruse meaning. (I've not read it lately.)
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-13 14:04 ` jamal
@ 2002-10-15 9:53 ` Stefan Rompf
2002-10-16 2:49 ` jamal
0 siblings, 1 reply; 24+ messages in thread
From: Stefan Rompf @ 2002-10-15 9:53 UTC (permalink / raw)
To: jamal; +Cc: netdev
[-- Attachment #1: Type: text/plain, Size: 658 bytes --]
Hi Jamal,
attached is the latest version of the patch. Changes:
-Try to use a static struct lw_event for an event. For systems without
slave devices, this will avoid memory allocation in most cases. But by
adding code and data, it permanently takes as much memory as about ten of
the additional pointers you didn't want to have in the net_device
structure ;-)
-moved the event queue flushing in unregister_netdev() down some lines
so that it is not attempted for new style devices with destructor.
Since Alexey does not want to expand the netlink message, the only
result of this patch visible to userspace is the IFF_RUNNING emulation.
Cheers, Stefan
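For readers who want to see the IFF_RUNNING emulation in isolation, here is a
small userspace model of the mapping: only the "up" and "unknown" operative
states count as running. The enum values follow RFC 2863 ifOperStatus
numbering; the names oper_state and oper_state_is_running are illustrative
assumptions and are not the identifiers used in the kernel patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Operative states numbered as in RFC 2863 ifOperStatus. */
enum oper_state {
    OPER_UP = 1,
    OPER_DOWN,
    OPER_TESTING,
    OPER_UNKNOWN,
    OPER_DORMANT,
    OPER_NOTPRESENT,
    OPER_LOWERDOWN
};

/* Model of the emulation: a state maps to "running" if its bit is set in
 * a mask containing only UP and UNKNOWN. UNKNOWN is included so drivers
 * that never report link state keep showing RUNNING. */
static bool oper_state_is_running(enum oper_state state)
{
    const unsigned int running_mask = (1u << OPER_UP) | (1u << OPER_UNKNOWN);

    assert(state >= OPER_UP && state <= OPER_LOWERDOWN);
    return ((1u << state) & running_mask) != 0;
}
```

Using a bit mask rather than a chain of comparisons keeps the set of
"running" states in one place, so adding or removing a state is a one-line
change.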
[-- Attachment #2: patch-rfc2863-2.5.41-3 --]
[-- Type: text/plain, Size: 12478 bytes --]
diff -uNrX dontdiff linux-2.5.41/include/linux/netdevice.h linux-2.5.41-stefan/include/linux/netdevice.h
--- linux-2.5.41/include/linux/netdevice.h Tue Oct 8 22:18:50 2002
+++ linux-2.5.41-stefan/include/linux/netdevice.h Sun Oct 13 12:47:13 2002
@@ -204,10 +204,23 @@
{
__LINK_STATE_XOFF=0,
__LINK_STATE_START,
- __LINK_STATE_PRESENT,
+ __LINK_STATE_PRESENT_OBSOLETE,
__LINK_STATE_SCHED,
- __LINK_STATE_NOCARRIER,
- __LINK_STATE_RX_SCHED
+ __LINK_STATE_NOCARRIER_OBSOLETE,
+ __LINK_STATE_RX_SCHED,
+ __LINK_STATE_LINKWATCH_PENDING
+};
+
+
+/* Device operative state as per RFC2863 */
+enum netdev_operstate_t {
+ NETDEV_OPER_UP = 1,
+ NETDEV_OPER_DOWN, /* Obsoletes LINK_STATE_NOCARRIER */
+ NETDEV_OPER_TESTING,
+ NETDEV_OPER_UNKNOWN,
+ NETDEV_OPER_DORMANT,
+ NETDEV_OPER_NOTPRESENT, /* Obsoletes !LINK_STATE_PRESENT */
+ NETDEV_OPER_LOWERDOWN
};
@@ -308,6 +321,10 @@
* which this device is member of.
*/
+ /* Operative state, access semaphore */
+ rwlock_t operstate_lock;
+ unsigned char operstate;
+
/* Interface address info. */
unsigned char broadcast[MAX_ADDR_LEN]; /* hw bcast add */
unsigned char dev_addr[MAX_ADDR_LEN]; /* hw address */
@@ -631,34 +648,76 @@
* who is responsible for serialization of these calls.
*/
+#ifdef CONFIG_LINKWATCH
+extern void linkwatch_fire_event(struct net_device *dev);
+#endif
+
+static inline unsigned char netif_set_operstate(struct net_device *dev, unsigned char newstate)
+{
+ unsigned long flags;
+ unsigned char oldstate;
+
+ write_lock_irqsave(&dev->operstate_lock, flags);
+ oldstate = dev->operstate;
+ dev->operstate = newstate;
+ write_unlock_irqrestore(&dev->operstate_lock, flags);
+
+#ifdef CONFIG_LINKWATCH
+ if (oldstate != newstate) linkwatch_fire_event(dev);
+#endif
+
+ return oldstate;
+}
+
+static inline unsigned char netif_get_operstate(struct net_device *dev)
+{
+ unsigned long flags;
+ unsigned char state;
+
+ read_lock_irqsave(&dev->operstate_lock, flags);
+ state = dev->operstate;
+ read_unlock_irqrestore(&dev->operstate_lock, flags);
+
+ return state;
+}
+
static inline int netif_carrier_ok(struct net_device *dev)
{
- return !test_bit(__LINK_STATE_NOCARRIER, &dev->state);
+ return netif_get_operstate(dev) != NETDEV_OPER_DOWN;
+}
+
+static inline int netif_operstate_to_iff_running(struct net_device *dev)
+{
+ unsigned char state = netif_get_operstate(dev);
+
+ return((1 << state) &
+ (1 << NETDEV_OPER_UP | 1 << NETDEV_OPER_UNKNOWN));
}
extern void __netdev_watchdog_up(struct net_device *dev);
+
static inline void netif_carrier_on(struct net_device *dev)
{
- clear_bit(__LINK_STATE_NOCARRIER, &dev->state);
+ netif_set_operstate(dev, NETDEV_OPER_UP);
if (netif_running(dev))
__netdev_watchdog_up(dev);
}
static inline void netif_carrier_off(struct net_device *dev)
{
- set_bit(__LINK_STATE_NOCARRIER, &dev->state);
+ netif_set_operstate(dev, NETDEV_OPER_DOWN);
}
/* Hot-plugging. */
static inline int netif_device_present(struct net_device *dev)
{
- return test_bit(__LINK_STATE_PRESENT, &dev->state);
+ return netif_get_operstate(dev) != NETDEV_OPER_NOTPRESENT;
}
static inline void netif_device_detach(struct net_device *dev)
{
- if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
+ if (netif_set_operstate(dev, NETDEV_OPER_NOTPRESENT) != NETDEV_OPER_NOTPRESENT &&
netif_running(dev)) {
netif_stop_queue(dev);
}
@@ -666,7 +725,7 @@
static inline void netif_device_attach(struct net_device *dev)
{
- if (!test_and_set_bit(__LINK_STATE_PRESENT, &dev->state) &&
+ if (netif_set_operstate(dev, NETDEV_OPER_UNKNOWN) == NETDEV_OPER_NOTPRESENT &&
netif_running(dev)) {
netif_wake_queue(dev);
__netdev_watchdog_up(dev);
diff -uNrX dontdiff linux-2.5.41/net/Config.help linux-2.5.41-stefan/net/Config.help
--- linux-2.5.41/net/Config.help Tue Oct 1 09:06:18 2002
+++ linux-2.5.41-stefan/net/Config.help Sat Oct 12 00:56:59 2002
@@ -472,6 +472,17 @@
However, do not say Y here if you did not experience any serious
problems.
+CONFIG_LINKWATCH
+ When this option is enabled, the kernel will forward changes in the
+ operative ("RUNNING") state of an interface via the netlink socket.
+ This is most useful when running linux as a router.
+
+ Note that currently not many drivers support this, compliant ones
+ can be found by watching the RUNNING flag in ifconfig output
+ that should follow operative state.
+
+ If unsure, say 'N'.
+
CONFIG_NET_SCHED
When the kernel has several packets to send out over a network
device, it has to decide which ones to send first, which ones to
diff -uNrX dontdiff linux-2.5.41/net/Config.in linux-2.5.41-stefan/net/Config.in
--- linux-2.5.41/net/Config.in Tue Oct 1 09:06:24 2002
+++ linux-2.5.41-stefan/net/Config.in Tue Oct 8 22:44:07 2002
@@ -82,6 +82,7 @@
tristate 'WAN router' CONFIG_WAN_ROUTER
bool 'Fast switching (read help!)' CONFIG_NET_FASTROUTE
bool 'Forwarding between high speed interfaces' CONFIG_NET_HW_FLOWCONTROL
+ bool 'Device link state notification (EXPERIMENTAL)' CONFIG_LINKWATCH
fi
mainmenu_option next_comment
diff -uNrX dontdiff linux-2.5.41/net/core/Makefile linux-2.5.41-stefan/net/core/Makefile
--- linux-2.5.41/net/core/Makefile Tue Oct 1 09:07:40 2002
+++ linux-2.5.41-stefan/net/core/Makefile Sun Oct 13 12:37:08 2002
@@ -21,4 +21,6 @@
# Ugly. I wish all wireless drivers were moved in drivers/net/wireless
obj-$(CONFIG_NET_PCMCIA_RADIO) += wireless.o
+obj-$(CONFIG_LINKWATCH) += link_watch.o
+
include $(TOPDIR)/Rules.make
diff -uNrX dontdiff linux-2.5.41/net/core/dev.c linux-2.5.41-stefan/net/core/dev.c
--- linux-2.5.41/net/core/dev.c Tue Oct 8 22:18:51 2002
+++ linux-2.5.41-stefan/net/core/dev.c Mon Oct 14 23:00:00 2002
@@ -198,7 +198,6 @@
int netdev_fastroute_obstacles;
#endif
-
/*******************************************************************************
Protocol management and registration routines
@@ -261,6 +260,9 @@
br_write_unlock_bh(BR_NETPROTO_LOCK);
}
+#ifdef CONFIG_LINKWATCH
+void linkwatch_run_queue(void);
+#endif
/**
* dev_remove_pack - remove packet handler
@@ -2017,7 +2019,7 @@
IFF_RUNNING)) |
(dev->gflags & (IFF_PROMISC |
IFF_ALLMULTI));
- if (netif_running(dev) && netif_carrier_ok(dev))
+ if (netif_running(dev) && netif_operstate_to_iff_running(dev))
ifr->ifr_flags |= IFF_RUNNING;
return 0;
@@ -2432,6 +2434,10 @@
goto out;
#endif /* CONFIG_NET_DIVERT */
+ /* Initial operstate */
+ dev->operstate_lock = RW_LOCK_UNLOCKED;
+ dev->operstate = NETDEV_OPER_UNKNOWN;
+
dev->iflink = -1;
/* Init, if this function is available */
@@ -2457,13 +2463,6 @@
if (!dev->rebuild_header)
dev->rebuild_header = default_rebuild_header;
- /*
- * Default initial state at registry is that the
- * device is present.
- */
-
- set_bit(__LINK_STATE_PRESENT, &dev->state);
-
dev->next = NULL;
dev_init_scheduler(dev);
write_lock_bh(&dev_base_lock);
@@ -2641,6 +2640,17 @@
/* Rebroadcast unregister notification */
notifier_call_chain(&netdev_chain,
NETDEV_UNREGISTER, dev);
+
+#ifdef CONFIG_LINKWATCH
+ if (test_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) {
+ /* We must not have linkwatch events pending
+ * on unregister. If this happens, we simply
+ * run the queue unscheduled, resulting in a
+ * noop for this device
+ */
+ linkwatch_run_queue();
+ }
+#endif
}
current->state = TASK_INTERRUPTIBLE;
schedule_timeout(HZ / 4);
@@ -2735,6 +2745,8 @@
#ifdef CONFIG_NET_FASTROUTE
dev->fastpath_lock = RW_LOCK_UNLOCKED;
#endif
+ dev->operstate_lock = RW_LOCK_UNLOCKED;
+ dev->operstate = NETDEV_OPER_UNKNOWN;
dev->xmit_lock_owner = -1;
dev->iflink = -1;
dev_hold(dev);
@@ -2767,7 +2779,6 @@
if (!dev->rebuild_header)
dev->rebuild_header = default_rebuild_header;
dev_init_scheduler(dev);
- set_bit(__LINK_STATE_PRESENT, &dev->state);
}
}
@@ -2848,3 +2859,5 @@
return call_usermodehelper(argv [0], argv, envp);
}
#endif
+
+
diff -uNrX dontdiff linux-2.5.41/net/core/link_watch.c linux-2.5.41-stefan/net/core/link_watch.c
--- linux-2.5.41/net/core/link_watch.c Thu Jan 1 01:00:00 1970
+++ linux-2.5.41-stefan/net/core/link_watch.c Mon Oct 14 22:51:02 2002
@@ -0,0 +1,134 @@
+/*
+ * Linux network device link state notification
+ *
+ * Author:
+ * Stefan Rompf <sux@isg.de>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ */
+
+#include <linux/workqueue.h>
+#include <linux/config.h>
+#include <linux/netdevice.h>
+#include <linux/if.h>
+#include <linux/rtnetlink.h>
+#include <linux/jiffies.h>
+#include <linux/spinlock.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+#include <asm/bitops.h>
+#include <asm/types.h>
+
+
+enum lw_bits {
+ LW_RUNNING = 0,
+ LW_SE_USED
+};
+
+static unsigned long linkwatch_flags = 0;
+static unsigned long linkwatch_nextevent = 0;
+
+static void linkwatch_event(void *dummy);
+static DECLARE_WORK(linkwatch_work, linkwatch_event, NULL);
+
+static LIST_HEAD(lweventlist);
+static spinlock_t lweventlist_lock = SPIN_LOCK_UNLOCKED;
+
+struct lw_event {
+ struct list_head list;
+ struct net_device *dev;
+};
+
+/* Avoid kmalloc() for most systems */
+struct lw_event singleevent;
+
+/* Must be called with the rtnl semaphore held */
+void linkwatch_run_queue(void) {
+ LIST_HEAD(head);
+ struct list_head *n, *next;
+
+ spin_lock_irq(&lweventlist_lock);
+ list_splice_init(&lweventlist, &head);
+ spin_unlock_irq(&lweventlist_lock);
+
+ list_for_each_safe(n, next, &head) {
+ struct lw_event *event = list_entry(n, struct lw_event, list);
+ struct net_device *dev = event->dev;
+
+ if (event == &singleevent) {
+ clear_bit(LW_SE_USED, &linkwatch_flags);
+ } else {
+ kfree(event);
+ }
+
+ /* We are about to handle this device,
+ * so new events can be accepted
+ */
+ clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
+
+ if (dev->flags & IFF_UP) {
+ netdev_state_change(dev);
+ }
+
+ dev_put(dev);
+ }
+}
+
+
+static void linkwatch_event(void *dummy)
+{
+ /* Limit the number of linkwatch events to one
+ * per second so that a runaway driver does not
+ * cause a storm of messages on the netlink
+ * socket
+ */
+ linkwatch_nextevent = jiffies + HZ;
+ clear_bit(LW_RUNNING, &linkwatch_flags);
+
+ rtnl_lock();
+ linkwatch_run_queue();
+ rtnl_unlock();
+}
+
+
+void linkwatch_fire_event(struct net_device *dev)
+{
+ if (!test_and_set_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state)) {
+ unsigned long flags;
+ struct lw_event *event;
+
+ if (test_and_set_bit(LW_SE_USED, &linkwatch_flags)) {
+ event = kmalloc(sizeof(struct lw_event), GFP_ATOMIC);
+
+ if (unlikely(event == NULL)) {
+ clear_bit(__LINK_STATE_LINKWATCH_PENDING, &dev->state);
+ return;
+ }
+ } else {
+ event = &singleevent;
+ }
+
+ dev_hold(dev);
+ event->dev = dev;
+
+ spin_lock_irqsave(&lweventlist_lock, flags);
+ list_add_tail(&event->list, &lweventlist);
+ spin_unlock_irqrestore(&lweventlist_lock, flags);
+
+ if (!test_and_set_bit(LW_RUNNING, &linkwatch_flags)) {
+ unsigned long thisevent = jiffies;
+
+ if (time_after_eq(thisevent, linkwatch_nextevent)) {
+ schedule_work(&linkwatch_work);
+ } else {
+ schedule_delayed_work(&linkwatch_work, linkwatch_nextevent - thisevent);
+ }
+ }
+ }
+}
+
diff -uNrX dontdiff linux-2.5.41/net/core/rtnetlink.c linux-2.5.41-stefan/net/core/rtnetlink.c
--- linux-2.5.41/net/core/rtnetlink.c Tue Oct 1 09:07:57 2002
+++ linux-2.5.41-stefan/net/core/rtnetlink.c Sat Oct 12 14:27:43 2002
@@ -165,7 +165,7 @@
r->ifi_flags = dev->flags;
r->ifi_change = change;
- if (!netif_running(dev) || !netif_carrier_ok(dev))
+ if (!netif_running(dev) || !netif_operstate_to_iff_running(dev))
r->ifi_flags &= ~IFF_RUNNING;
else
r->ifi_flags |= IFF_RUNNING;
diff -uNrX dontdiff linux-2.5.41/net/netsyms.c linux-2.5.41-stefan/net/netsyms.c
--- linux-2.5.41/net/netsyms.c Tue Oct 8 22:18:53 2002
+++ linux-2.5.41-stefan/net/netsyms.c Sun Oct 13 13:27:40 2002
@@ -596,4 +596,8 @@
EXPORT_SYMBOL(wireless_send_event);
#endif /* CONFIG_NET_RADIO || CONFIG_NET_PCMCIA_RADIO */
+#ifdef CONFIG_LINKWATCH
+EXPORT_SYMBOL(linkwatch_fire_event);
+#endif
+
#endif /* CONFIG_NET */
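
The one-per-second limit in linkwatch_fire_event() depends on comparing two
tick counters. As a rough userspace model, the sketch below computes how long
to wait before the next run in a way that stays correct when the counter
wraps. The function name and the 32-bit counter width are assumptions made
for illustration; the kernel uses jiffies and its own helpers.

```c
#include <stdint.h>

/* Hypothetical model: `now` and `next_allowed` are free-running 32-bit
 * tick counters. Returns the number of ticks to wait, or 0 if the work
 * may run immediately. */
static uint32_t linkwatch_delay(uint32_t now, uint32_t next_allowed)
{
    /* The unsigned difference wraps modulo 2^32; reading it as signed
     * tells us whether next_allowed lies ahead of or behind now, as long
     * as the two values are less than 2^31 ticks apart. */
    int32_t remaining = (int32_t)(next_allowed - now);

    return remaining > 0 ? (uint32_t)remaining : 0;
}
```

For example, with now = 0xFFFFFFF0 and next_allowed = 0x10 the counter has
wrapped in between, and the function still reports 32 ticks to wait, where a
plain `now >= next_allowed` test would conclude the deadline had passed.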
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-15 9:53 ` Stefan Rompf
@ 2002-10-16 2:49 ` jamal
2002-10-21 21:38 ` Stefan Rompf
0 siblings, 1 reply; 24+ messages in thread
From: jamal @ 2002-10-16 2:49 UTC (permalink / raw)
To: Stefan Rompf; +Cc: netdev
On Tue, 15 Oct 2002, Stefan Rompf wrote:
> Hi Jamal,
>
> attached is the latest version of the patch. Changes:
>
> -Try to use a static struct lw_event for an event. For systems without
> slave devices, this will avoid memory allocation in most cases. But by
> adding code and data, it permanently takes as much memory as about ten of
> the additional pointers you didn't want to have in the net_device
> structure ;-)
>
That lw_event is still bothering me ;-> But i dont wanna push it any
further ;-> I think we got some good stuff. Dave and Alexey should make
the call now.
cheers,
jamal
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Patch: Idea for RFC2863 conform OperStatus
2002-10-16 2:49 ` jamal
@ 2002-10-21 21:38 ` Stefan Rompf
0 siblings, 0 replies; 24+ messages in thread
From: Stefan Rompf @ 2002-10-21 21:38 UTC (permalink / raw)
To: davem, kuznet; +Cc: netdev
Hi David, Hi Alexey,
jamal wrote:
> That lw_event is still bothering me ;-> But i dont wanna push it any
> further ;-> I think we got some good stuff. Dave and Alexey should make
> the call now.
did you already have a chance to look at the latest patch?
Cheers, Stefan
^ permalink raw reply [flat|nested] 24+ messages in thread