* [RFC v2 1/2] net/hsr: Add support for IEC 62439-3 High-availability Seamless Redundancy
From: Arvid Brodin @ 2012-07-04  0:12 UTC
  To: netdev@vger.kernel.org
  Cc: Stephen Hemminger, Alexey Kuznetsov, Javier Boticario,
	Bruno Ferreira

The kernel patch.

 Documentation/networking/hsr/hsr_genl.c |  213 +++++++++++++
 include/linux/if_ether.h                |    1 +
 include/linux/if_link.h                 |   11 +
 net/Kconfig                             |    1 +
 net/Makefile                            |    1 +
 net/hsr/Kconfig                         |   84 +++++
 net/hsr/Makefile                        |    7 +
 net/hsr/hsr_device.c                    |  531 +++++++++++++++++++++++++++++++
 net/hsr/hsr_device.h                    |   27 ++
 net/hsr/hsr_framereg.c                  |  328 +++++++++++++++++++
 net/hsr/hsr_framereg.h                  |   54 ++++
 net/hsr/hsr_main.c                      |  411 ++++++++++++++++++++++++
 net/hsr/hsr_netlink.c                   |  293 +++++++++++++++++
 net/hsr/hsr_netlink.h                   |   64 ++++
 net/hsr/hsr_private.h                   |  114 +++++++
 15 files changed, 2140 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/networking/hsr/hsr_genl.c
 create mode 100644 net/hsr/Kconfig
 create mode 100644 net/hsr/Makefile
 create mode 100644 net/hsr/hsr_device.c
 create mode 100644 net/hsr/hsr_device.h
 create mode 100644 net/hsr/hsr_framereg.c
 create mode 100644 net/hsr/hsr_framereg.h
 create mode 100644 net/hsr/hsr_main.c
 create mode 100644 net/hsr/hsr_netlink.c
 create mode 100644 net/hsr/hsr_netlink.h
 create mode 100644 net/hsr/hsr_private.h



diff --git a/Documentation/networking/hsr/hsr_genl.c b/Documentation/networking/hsr/hsr_genl.c
new file mode 100644
index 0000000..1319258
--- /dev/null
+++ b/Documentation/networking/hsr/hsr_genl.c
@@ -0,0 +1,213 @@
+/*
+ * Copyright 2011-2012 Autronica Fire and Security AS
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * Author(s):
+ *	2011-2012 Arvid Brodin, arvid.brodin@xdin.com
+ *
+ * Userspace example of using Generic Netlink (through libnl-3) to get HSR
+ * ("High-availability Seamless Redundancy") link/network status.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <netlink/netlink.h>
+#include <netlink/socket.h>
+#include <netlink/attr.h>
+#include <netlink/genl/genl.h>
+#include <netlink/genl/ctrl.h>
+#include "../../linux-next/net/hsr/hsr_netlink.h"
+
+
+static struct nla_policy hsr_genl_policy[HSR_A_MAX + 1] = {
+	[HSR_A_NODE_ADDR] = { .type = NLA_UNSPEC },
+	[HSR_A_IFINDEX] = { .type = NLA_U32 },
+	[HSR_A_IF1AGE] = { .type = NLA_U32 },
+	[HSR_A_IF2AGE] = { .type = NLA_U32 },
+};
+
+#define ETH_ALEN	6
+
+void print_mac(const unsigned char *addr)
+{
+	int i;
+
+	for (i = 0; i < ETH_ALEN; i++)
+		printf("%02x ", addr[i]);
+}
+
+
+int parse_genlmsg(struct nl_msg *msg, void *arg)
+{
+	struct nlattr *attrs[HSR_A_MAX + 1];
+	int rc;
+	struct genlmsghdr *hdr;
+	int i;
+
+	rc = genlmsg_parse(nlmsg_hdr(msg), 0, attrs, HSR_A_MAX, hsr_genl_policy);
+	if (rc < 0) {
+		printf("Error parsing genlmsg: %d\n", rc);
+		return rc;
+	}
+
+
+	/*
+	 * Extract command ID from "message" -> "netlink header" ->
+	 * "generic netlink header".
+	 *
+	 * These are the command enums used when creating a genl msg header
+	 * in the kernel with genlmsg_put().
+	 */
+	hdr = genlmsg_hdr(nlmsg_hdr(msg));
+
+	switch (hdr->cmd) {
+	case HSR_C_RING_ERROR:
+		printf("Ring error: \n");
+		break;
+	case HSR_C_NODE_DOWN:
+		printf("Node down: \n");
+		break;
+	case HSR_C_SET_NODE_STATUS:
+		printf("Node status: \n");
+		break;
+	default:
+		printf("Unknown genl message (%d)\n", hdr->cmd);
+	}
+
+
+	/*
+	 * Extract the attached data (the "attributes").
+	 */
+	for (i = 0; i < HSR_A_MAX + 1; i++)
+		if (attrs[i]) {
+			switch (nla_type(attrs[i])) {
+			case HSR_A_NODE_ADDR:
+				printf("    node address ");
+				print_mac(nla_data(attrs[i]));
+				printf("\n");
+				break;
+			case HSR_A_IFINDEX:
+				printf("    interface index %u\n", nla_get_u32(attrs[i]));
+				break;
+			case HSR_A_IF1AGE:
+				printf("    last frame over slave #1 %d ms ago\n", (int) nla_get_u32(attrs[i]));
+				break;
+			case HSR_A_IF2AGE:
+				printf("    last frame over slave #2 %d ms ago\n", (int) nla_get_u32(attrs[i]));
+				break;
+			default:
+				printf("    unknown attribute type: %d\n", nla_type(attrs[i]));
+			}
+		}
+
+	return 0;
+}
+
+/*
+ * Send a "simple" (header only) Generic Netlink message
+int query_link_status(int family)
+{
+	return (genl_send_simple(nlsk, family, HSR_C_GET_STATUS, 1, 0));
+}
+ */
+
+int query_node_status(struct nl_sock *nlsk, int family, int ifindex, const unsigned char node_addr[ETH_ALEN])
+{
+	struct nl_msg *msg;
+	void *user_hdr;
+
+	msg = nlmsg_alloc();
+	if (!msg)
+		return -1;
+
+	user_hdr = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family,
+						0, 0, HSR_C_GET_NODE_STATUS, 1);
+	if (!user_hdr)
+		goto nla_put_failure;
+
+/*
+ * Query by interface name could be implemented in the kernel if needed:
+  	NLA_PUT_STRING(msg, HSR_A_IFNAME, ifname);
+ */
+	NLA_PUT_U32(msg, HSR_A_IFINDEX, ifindex);
+	NLA_PUT(msg, HSR_A_NODE_ADDR, ETH_ALEN, node_addr);
+
+	printf("Querying if %d for status of node ", ifindex);
+	print_mac(node_addr);
+	printf("\n");
+
+	return (nl_send_auto(nlsk, msg));
+
+nla_put_failure:
+	nlmsg_free(msg);
+	return -1;
+}
+
+
+int main(void)
+{
+	struct nl_sock *nlsk;
+	int hsr_mgroup;
+	int rc;
+
+	nlsk = nl_socket_alloc();
+	if (!nlsk) {
+		printf("nl_socket_alloc() failed\n");
+		return EXIT_FAILURE;
+	}
+	nl_socket_disable_seq_check(nlsk);
+	nl_socket_modify_cb(nlsk, NL_CB_VALID, NL_CB_CUSTOM, parse_genlmsg, NULL);
+	genl_connect(nlsk);
+
+	/*
+	 * Sign up for HSR messages
+	 */
+	hsr_mgroup = genl_ctrl_resolve_grp(nlsk, "HSR", "hsr-network");
+	if (hsr_mgroup < 0) {
+		printf("genl_ctrl_resolve_grp() failed: %d\n", hsr_mgroup);
+		rc = EXIT_FAILURE;
+		goto out;
+	}
+
+	printf("Registering for multicast group %d\n", hsr_mgroup);
+	rc = nl_socket_add_memberships(nlsk, hsr_mgroup, 0);
+	if (rc < 0) {
+		printf("nl_socket_add_memberships() failed: %d\n", rc);
+		goto out;
+	}
+
+	/*
+	 * Send a query about the status of another node on the HSR network:
+	 */
+	int hsr_family;
+	/* The hsr interface we send the query to (get its index with e.g.
+	 * 'cat /sys/class/net/hsr0/ifindex'): */
+	const int hsr_ifindex = 4;
+	/* The node to enquire about: */
+	const unsigned char node[ETH_ALEN] = {0x00, 0x24, 0x74, 0x00, 0x17, 0xAD};
+
+	hsr_family = genl_ctrl_resolve(nlsk, "HSR");
+	if (hsr_family < 0) {
+		printf("genl_ctrl_resolve() failed: %d\n", hsr_family);
+		goto receive;
+	}
+	rc = query_node_status(nlsk, hsr_family, hsr_ifindex, node);
+	printf("query_node_status() returned %d\n", rc);
+
+	/*
+	 * Receive messages
+	 */
+receive:
+	while (1)
+		nl_recvmsgs_default(nlsk);
+
+	rc = EXIT_SUCCESS;
+out:
+	nl_close(nlsk);
+	nl_socket_free(nlsk);
+	return rc;
+}
diff --git a/include/linux/if_ether.h b/include/linux/if_ether.h
index 56d907a..0d0e2f9 100644
--- a/include/linux/if_ether.h
+++ b/include/linux/if_ether.h
@@ -83,6 +83,7 @@
 #define ETH_P_TIPC	0x88CA		/* TIPC 			*/
 #define ETH_P_8021AH	0x88E7          /* 802.1ah Backbone Service Tag */
 #define ETH_P_1588	0x88F7		/* IEEE 1588 Timesync */
+#define ETH_P_HSR	0x88FB		/* IEC 62439-3 HSR/PRP		*/
 #define ETH_P_FCOE	0x8906		/* Fibre Channel over Ethernet  */
 #define ETH_P_TDLS	0x890D          /* TDLS */
 #define ETH_P_FIP	0x8914		/* FCoE Initialization Protocol */
diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 4b24ff4..3e3efb4 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -391,4 +391,15 @@ struct ifla_port_vsi {
 	__u8 pad[3];
 };

+/* HSR section */
+
+enum {
+	IFLA_HSR_UNSPEC,
+	IFLA_HSR_SLAVE1,
+	IFLA_HSR_SLAVE2,
+	__IFLA_HSR_MAX,
+};
+
+#define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/net/Kconfig b/net/Kconfig
index e07272d..22446d3 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -216,6 +216,7 @@ source "net/dcb/Kconfig"
 source "net/dns_resolver/Kconfig"
 source "net/batman-adv/Kconfig"
 source "net/openvswitch/Kconfig"
+source "net/hsr/Kconfig"

 config RPS
 	boolean
diff --git a/net/Makefile b/net/Makefile
index ad432fa..7d8787f 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -70,3 +70,4 @@ obj-$(CONFIG_CEPH_LIB)		+= ceph/
 obj-$(CONFIG_BATMAN_ADV)	+= batman-adv/
 obj-$(CONFIG_NFC)		+= nfc/
 obj-$(CONFIG_OPENVSWITCH)	+= openvswitch/
+obj-$(CONFIG_HSR)		+= hsr/
diff --git a/net/hsr/Kconfig b/net/hsr/Kconfig
new file mode 100644
index 0000000..895afda
--- /dev/null
+++ b/net/hsr/Kconfig
@@ -0,0 +1,84 @@
+#
+# IEC 62439-3 High-availability Seamless Redundancy
+#
+
+config HSR
+	tristate "High-availability Seamless Redundancy (HSR)"
+	---help---
+	  If you say Y here, then your Linux box will be able to act as a
+	  DANH ("Doubly attached node implementing HSR"). For this to work,
+	  your Linux box needs (at least) two physical Ethernet interfaces,
+	  and you need to enslave these to a virtual hsr interface using the
+	  appropriate user space tool, e.g.:
+
+	  # ip link add name hsr0 type hsr dev1 dev2
+
+	  Your Linux box must be connected as a node in a ring network
+	  together with other HSR capable nodes.
+
+	  All Ethernet frames sent over the hsr device will be sent in both
+	  directions on the ring (over both slave ports), giving a redundant,
+	  instant fail-over network.
+
+	  Each HSR node in the ring acts like a bridge for HSR frames, but
+	  filters frames that have been forwarded earlier.
+
+	  This code is a "best effort" to comply with the HSR standard as
+	  described in IEC 62439-3, but no compliance tests have been run.
+	  You need to perform any and all necessary tests yourself before
+	  relying on this code in a safety critical system. In particular, the
+	  standard is very vague about how to use the Ring ID field in the HSR
+	  tag, and it's probable that this code does not do the right thing.
+
+	  If unsure, say N.
+
+if HAVE_EFFICIENT_UNALIGNED_ACCESS
+
+config NONSTANDARD_HSR
+	bool "HSR: Use efficient tag (breaks HSR standard, read help!)"
+	depends on HSR
+	---help---
+	  The HSR standard specifies a 6-byte HSR tag to be inserted into the
+	  transmitted network frames. This breaks the 32-bit alignment that the
+	  Linux network stack relies on, and would cause kernel panics on
+	  certain architectures. To avoid this, the whole frame payload is
+	  memmoved 2 bytes on reception on these architectures - which is very
+	  inefficient!
+
+	  If you select Y here, 2 padding bytes are inserted into the HSR tag,
+	  which makes it possible to skip the memmove. This however breaks
+	  compatibility with compliant HSR devices. I.e., either all or none of
+	  the devices in your HSR ring need to have this option set.
+
+	  Your architecture has HAVE_EFFICIENT_UNALIGNED_ACCESS, so you do not
+	  need this unless you have other nodes in your ring which have this
+	  option set.
+
+	  If unsure, say N.
+
+endif # HAVE_EFFICIENT_UNALIGNED_ACCESS
+if !HAVE_EFFICIENT_UNALIGNED_ACCESS
+
+config NONSTANDARD_HSR
+	bool "HSR: Use efficient tag (breaks HSR standard, read help!)"
+	depends on HSR
+	---help---
+	  The HSR standard specifies a 6-byte HSR tag to be inserted into the
+	  transmitted network frames. This breaks the 32-bit alignment that the
+	  Linux network stack relies on, and would cause kernel panics on
+	  certain architectures. To avoid this, the whole frame payload is
+	  memmoved 2 bytes on reception on these architectures - which is very
+	  inefficient!
+
+	  If you select Y here, 2 padding bytes are inserted into the HSR tag,
+	  which makes it possible to skip the memmove. This however breaks
+	  compatibility with compliant HSR devices. I.e., either all or none of
+	  the devices in your HSR ring need to have this option set.
+
+	  Your architecture does not have HAVE_EFFICIENT_UNALIGNED_ACCESS, so
+	  you should seriously consider saying Y here if performance is at all
+	  important to you.
+
+	  If unsure, say N.
+
+endif # !HAVE_EFFICIENT_UNALIGNED_ACCESS
diff --git a/net/hsr/Makefile b/net/hsr/Makefile
new file mode 100644
index 0000000..b68359f
--- /dev/null
+++ b/net/hsr/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for HSR
+#
+
+obj-$(CONFIG_HSR)	+= hsr.o
+
+hsr-y			:= hsr_main.o hsr_framereg.o hsr_device.o hsr_netlink.o
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c
new file mode 100644
index 0000000..e95c006
--- /dev/null
+++ b/net/hsr/hsr_device.c
@@ -0,0 +1,531 @@
+/*
+ * Copyright 2011-2012 Autronica Fire and Security AS
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * Author(s):
+ *	2011-2012 Arvid Brodin, arvid.brodin@xdin.com
+ *
+ * This file contains device methods for creating, using and destroying
+ * virtual HSR devices.
+ */
+
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <linux/etherdevice.h>
+#include <linux/if_arp.h>
+#include <linux/rtnetlink.h>
+#include <linux/netfilter.h>
+#include <linux/netpoll.h>
+#include "hsr_framereg.h"
+#include "hsr_private.h"
+
+
+static int is_admin_up(struct net_device *dev)
+{
+	return (dev->flags & IFF_UP);
+}
+
+static int is_operstate_up(struct net_device *dev)
+{
+	return (dev->operstate == IF_OPER_UP);
+}
+
+static void __hsr_set_operstate(struct net_device *dev, int transition)
+{
+	if (dev->operstate != transition) {
+/*
+		switch (transition) {
+		case IF_OPER_UP:
+			printk(KERN_INFO "%s: new operstate is IF_OPER_UP\n", dev->name);
+			break;
+		default:
+			printk(KERN_INFO "%s: new operstate is !IF_OPER_UP (%d)\n", dev->name, transition);
+		}
+*/
+		write_lock_bh(&dev_base_lock);
+		dev->operstate = transition;
+		write_unlock_bh(&dev_base_lock);
+		netdev_state_change(dev);
+	}
+}
+
+void hsr_set_operstate(struct net_device *hsr_dev, struct net_device *slave1,
+						struct net_device *slave2)
+{
+	if (!is_admin_up(hsr_dev)) {
+		__hsr_set_operstate(hsr_dev, IF_OPER_DOWN);
+		return;
+	}
+/*
+	printk(KERN_INFO "Slave1/2 operstate: %d/%d\n",
+					slave1->operstate, slave2->operstate);
+*/
+	if (is_operstate_up(slave1) || is_operstate_up(slave2))
+		__hsr_set_operstate(hsr_dev, IF_OPER_UP);
+	else
+		__hsr_set_operstate(hsr_dev, IF_OPER_LOWERLAYERDOWN);
+}
+
+void hsr_set_carrier(struct net_device *hsr_dev, struct net_device *slave1,
+						struct net_device *slave2)
+{
+	if (is_operstate_up(slave1) || is_operstate_up(slave2))
+		netif_carrier_on(hsr_dev);
+	else
+		netif_carrier_off(hsr_dev);
+}
+
+
+void hsr_check_announce(struct net_device *hsr_dev, int old_operstate)
+{
+	struct hsr_priv *hsr_priv;
+
+	hsr_priv = netdev_priv(hsr_dev);
+
+	if ((hsr_dev->operstate == IF_OPER_UP) && (old_operstate != IF_OPER_UP)) {
+		/* Went up */
+		hsr_priv->announce_count = 0;
+		hsr_priv->announce_timer.expires = jiffies +
+				msecs_to_jiffies(HSR_ANNOUNCE_INTERVAL);
+		add_timer(&hsr_priv->announce_timer);
+	}
+
+	if ((hsr_dev->operstate != IF_OPER_UP) && (old_operstate == IF_OPER_UP))
+		/* Went down */
+		del_timer(&hsr_priv->announce_timer);
+}
+
+
+
+static int hsr_dev_open(struct net_device *dev)
+{
+	struct hsr_priv *hsr_priv;
+
+	hsr_priv = netdev_priv(dev);
+
+	dev_open(hsr_priv->slave_data[0].dev);
+	dev_open(hsr_priv->slave_data[1].dev);
+
+	return 0;
+}
+
+static int hsr_dev_close(struct net_device *dev)
+{
+	// FIXME: reset status of slaves
+	return 0;
+}
+
+
+static void hsr_fill_tag(struct hsr_ethhdr *hsr_ethhdr, struct hsr_priv *hsr_priv)
+{
+	u16 path;
+	u16 LSDU_size;
+	unsigned long irqflags;
+
+	/*
+	 * IEC 62439-3, p 48, says the 4-bit "path" field can take values
+	 * between 0001-1001 ("ring identifier", for regular HSR frames),
+	 * or 1111 ("HSR management", supervision frames). Unfortunately, the
+	 * spec writers forgot to explain what a "ring identifier" is, or
+	 * how it is used. So we just set this to 0001 for regular frames,
+	 * and 1111 for supervision frames.
+	 */
+	path = 0x1;
+
+	/*
+	 * IEC 62439-1, p 12: "The link service data unit in an Ethernet frame
+	 * is the content of the frame located between the Length/Type field
+	 * and the Frame Check Sequence."
+	 *
+	 * IEC 62439-3, p 48, specifies the "original LPDU" to include the
+	 * original "LT" field (what "LT" means is not explained anywhere as
+	 * far as I can see - perhaps "Length/Type"?). So LSDU_size might
+	 * equal original length + 2.
+	 *   Also, the fact that this field is not used anywhere (might be used
+	 * by a RedBox connecting HSR and PRP nets?) means I cannot test its
+	 * correctness. Instead of guessing, I set this to 0 here, to make any
+	 * problems immediately apparent. Anyone using this driver with PRP/HSR
+	 * RedBoxes might need to fix this...
+	 */
+	LSDU_size = 0;
+	hsr_ethhdr->hsr_tag.path_and_LSDU_size =
+					htons((u16) (path << 12) | LSDU_size);
+
+	spin_lock_irqsave(&hsr_priv->seqlock, irqflags);
+	hsr_ethhdr->hsr_tag.sequence_nr = htons(hsr_priv->sequence_nr);
+	hsr_priv->sequence_nr++;
+	spin_unlock_irqrestore(&hsr_priv->seqlock, irqflags);
+
+	hsr_ethhdr->hsr_tag.encap_proto = hsr_ethhdr->ethhdr.h_proto;
+
+	hsr_ethhdr->ethhdr.h_proto = htons(ETH_P_HSR);
+}
+
+static int slave_xmit(struct sk_buff *skb, struct net_device *dev,
+						struct net_device *hsr_dev)
+{
+	skb_set_dev(skb, dev);
+	skb->priority = 1; // FIXME: what does this mean?
+
+	// FIXME: what's netpoll_tx_running?
+	if (netpoll_tx_running(hsr_dev))
+		return skb->dev->netdev_ops->ndo_start_xmit(skb, skb->dev);
+
+	return dev_queue_xmit(skb);
+}
+
+
+static int hsr_dev_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct hsr_priv *hsr_priv;
+	struct hsr_ethhdr *hsr_ethhdr;
+	struct sk_buff *skb2;
+	int res1, res2;
+
+	hsr_priv = netdev_priv(dev);
+	hsr_ethhdr = (struct hsr_ethhdr *) skb->data;
+
+	if ((ntohs(skb->protocol) != ETH_P_HSR) ||
+			(ntohs(hsr_ethhdr->ethhdr.h_proto) != ETH_P_HSR)) {
+
+		hsr_fill_tag(hsr_ethhdr, hsr_priv);
+		skb->protocol = htons(ETH_P_HSR);
+	}
+
+	skb2 = skb_clone(skb, GFP_ATOMIC);
+
+	res1 = NET_XMIT_DROP;
+	res2 = NET_XMIT_DROP;
+	res1 = slave_xmit(skb, hsr_priv->slave_data[0].dev, dev);
+	if (skb2) {
+		/* Address substitution (IEC62439-3 pp 26, 50): replace mac
+		 * address of outgoing frame with that of the outgoing slave's.
+		 */
+		memcpy(hsr_ethhdr->ethhdr.h_source,
+					hsr_priv->slave_data[1].dev->dev_addr,
+					ETH_ALEN);
+		res2 = slave_xmit(skb2, hsr_priv->slave_data[1].dev, dev);
+	}
+
+	if (likely(res1 == NET_XMIT_SUCCESS || res1 == NET_XMIT_CN ||
+			res2 == NET_XMIT_SUCCESS || res2 == NET_XMIT_CN)) {
+		hsr_priv->dev->stats.tx_packets++;
+		hsr_priv->dev->stats.tx_bytes += skb->len;
+	} else
+		hsr_priv->dev->stats.tx_dropped++;
+
+	return NETDEV_TX_OK;
+}
+
+
+static int hsr_header_create(struct sk_buff *skb, struct net_device *dev,
+					unsigned short type,
+					const void *daddr, const void *saddr,
+					unsigned int len)
+{
+	int res;
+
+	/* Make room for the HSR tag now. We will fill it in later (in
+	   hsr_dev_xmit) */
+	skb_push(skb, HSR_TAGLEN);
+	res = eth_header(skb, dev, type, daddr, saddr, len + HSR_TAGLEN);
+	if (res <= 0)
+		return res;
+	skb_reset_mac_header(skb);
+
+	return res + HSR_TAGLEN;
+}
+
+
+static const struct header_ops hsr_header_ops = {
+	.create	 = hsr_header_create,
+	.parse	 = eth_header_parse,
+};
+
+
+static void send_hsr_supervision_frame(struct net_device *hsr_dev, u8 type)
+{
+	struct hsr_priv *hsr_priv;
+	struct sk_buff *skb;
+	struct hsr_ethhdr *hsr_ethhdr;
+	unsigned char *mac;
+	u16 path, HSR_Ver, HSR_TLV_Length;
+	unsigned long irqflags;
+
+	skb = alloc_skb(sizeof(struct ethhdr) +
+				sizeof(struct hsr_supervision_tag) +
+				LL_RESERVED_SPACE(hsr_dev) +
+				hsr_dev->needed_tailroom, GFP_ATOMIC);
+	if (skb == NULL)
+		return;
+
+	hsr_priv = netdev_priv(hsr_dev);
+
+	skb_reserve(skb, LL_RESERVED_SPACE(hsr_dev));
+	skb_reset_network_header(skb);
+
+	/* Payload: MacAddressA */
+	mac = (unsigned char *) skb_put(skb, ETH_ALEN);
+	memcpy(mac, hsr_dev->dev_addr, ETH_ALEN);
+
+	skb->dev = hsr_dev;
+	skb->protocol = htons(ETH_P_HSR);
+
+	if (dev_hard_header(skb, skb->dev, ETH_P_HSR, hsr_multicast_addr,
+					skb->dev->dev_addr, skb->len) < 0)
+		goto out;
+
+	hsr_ethhdr = (struct hsr_ethhdr *) skb->data;
+
+	path = 0x0f;
+	HSR_Ver = 0;
+	hsr_ethhdr->hsr_tag.path_and_LSDU_size = htons(path << 12 | HSR_Ver);
+
+	spin_lock_irqsave(&hsr_priv->seqlock, irqflags);
+	hsr_ethhdr->hsr_tag.sequence_nr = htons(hsr_priv->sequence_nr);
+	hsr_priv->sequence_nr++;
+	spin_unlock_irqrestore(&hsr_priv->seqlock, irqflags);
+
+	HSR_TLV_Length = 12;
+	hsr_ethhdr->hsr_tag.encap_proto = htons(type << 8 | HSR_TLV_Length);
+
+	dev_queue_xmit(skb);
+	return;
+
+out:
+	kfree_skb(skb);
+}
+
+
+/*
+ * Announce (supervision frame) timer function
+ */
+static void hsr_announce(unsigned long data)
+{
+	struct hsr_priv *hsr_priv;
+
+	hsr_priv = (struct hsr_priv *) data;
+
+	if (hsr_priv->announce_count < 3) {
+		send_hsr_supervision_frame(hsr_priv->dev, HSR_TLV_ANNOUNCE);
+		hsr_priv->announce_count++;
+	} else
+		send_hsr_supervision_frame(hsr_priv->dev, HSR_TLV_LIFE_CHECK);
+
+	if (hsr_priv->announce_count < 3)
+		hsr_priv->announce_timer.expires = jiffies +
+				msecs_to_jiffies(HSR_ANNOUNCE_INTERVAL);
+	else
+		hsr_priv->announce_timer.expires = jiffies +
+				msecs_to_jiffies(HSR_LIFE_CHECK_INTERVAL);
+
+	if (is_admin_up(hsr_priv->dev))
+		add_timer(&hsr_priv->announce_timer);
+}
+
+
+
+
+static void restore_slaves(struct net_device *hsr_dev)
+{
+	struct hsr_priv *hsr_priv;
+	struct net_device *slave[2];
+	int i;
+	int res;
+
+	hsr_priv = netdev_priv(hsr_dev);
+	for (i = 0; i < 2; i++)
+		slave[i] = hsr_priv->slave_data[i].dev;
+
+	rtnl_lock();
+
+	/* Restore promiscuity */
+	for (i = 0; i < 2; i++) {
+		if (!hsr_priv->slave_data[i].promisc)
+			continue;
+		res = dev_set_promiscuity(slave[i],
+					-hsr_priv->slave_data[i].promisc);
+		if (res)
+			pr_info("HSR: Cannot restore promiscuity (%s, %d)\n",
+							slave[i]->name,
+							res);
+	}
+
+	/* Restore up state */
+/*
+	for (i = 0; i < 2; i++)
+		if (hsr_priv->slave_data[i].was_up)
+			dev_open(slave[i]);
+		else
+			dev_close(slave[i]);
+*/
+	rtnl_unlock();
+}
+
+static void reclaim_hsr_dev(struct rcu_head *rh)
+{
+	struct hsr_priv *hsr_priv;
+
+	hsr_priv = container_of(rh, struct hsr_priv, rcu_head);
+	free_netdev(hsr_priv->dev);
+}
+
+/*
+ * According to comments in the declaration of struct net_device, this function
+ * is "Called from unregister, can be used to call free_netdev". Ok then...
+ */
+static void hsr_dev_destroy(struct net_device *hsr_dev)
+{
+	struct hsr_priv *hsr_priv;
+
+	hsr_priv = netdev_priv(hsr_dev);
+
+	del_timer(&hsr_priv->announce_timer);
+	unregister_hsr_master(hsr_priv);    /* calls list_del_rcu on hsr_priv */
+	restore_slaves(hsr_dev);
+	call_rcu(&hsr_priv->rcu_head, reclaim_hsr_dev);   /* reclaim hsr_priv */
+}
+
+static struct net_device_ops hsr_device_ops = {
+	.ndo_open = hsr_dev_open,
+	.ndo_stop = hsr_dev_close,
+	.ndo_start_xmit = hsr_dev_xmit,
+};
+
+
+void hsr_dev_setup(struct net_device *dev)
+{
+	random_ether_addr(dev->dev_addr);
+
+	ether_setup(dev);
+	dev->header_ops		 = &hsr_header_ops;
+	dev->netdev_ops		 = &hsr_device_ops;
+	dev->hard_header_len	+= HSR_TAGLEN;
+	dev->mtu		-= HSR_TAGLEN;
+	dev->tx_queue_len	 = 0;
+
+	dev->destructor = hsr_dev_destroy;
+}
+
+
+/*
+ * If dev is a HSR master, return 1; otherwise, return 0.
+ */
+int is_hsr_master(struct net_device *dev)
+{
+	return (dev->netdev_ops->ndo_start_xmit == hsr_dev_xmit);
+}
+
+static int check_slave_ok(struct net_device *dev)
+{
+	/* Don't allow HSR on non-Ethernet-like devices */
+	if ((dev->flags & IFF_LOOPBACK) || (dev->type != ARPHRD_ETHER) ||
+						(dev->addr_len != ETH_ALEN)) {
+		pr_info("%s: Cannot enslave loopback or non-ethernet device\n",
+								dev->name);
+		return -EINVAL;
+	}
+
+	/* Don't allow enslaving hsr devices */
+	if (is_hsr_master(dev)) {
+		pr_info("%s: Don't try to create trees of hsr devices!\n",
+								dev->name);
+		return -ELOOP;
+	}
+
+	/* FIXME: What about VLAN devices, bonded devices, etc? */
+
+	return 0;
+}
+
+int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2])
+{
+	struct hsr_priv *hsr_priv;
+	int i;
+	int res;
+
+	hsr_priv = netdev_priv(hsr_dev);
+	hsr_priv->dev = hsr_dev;
+	INIT_LIST_HEAD(&hsr_priv->node_db);
+	INIT_LIST_HEAD(&hsr_priv->self_node_db);
+	for (i = 0; i < 2; i++)
+		hsr_priv->slave_data[i].dev = slave[i];
+
+	spin_lock_init(&hsr_priv->seqlock);
+	hsr_priv->sequence_nr = 0;
+
+	init_timer(&hsr_priv->announce_timer);
+	hsr_priv->announce_timer.function = hsr_announce;
+	hsr_priv->announce_timer.data = (unsigned long) hsr_priv;
+
+
+/*
+ * FIXME: do I need to set the value of these?
+ *
+ * - hsr_dev->flags
+ * - hsr_dev->priv_flags
+ */
+
+	for (i = 0; i < 2; i++) {
+		res = check_slave_ok(slave[i]);
+		if (res)
+			return res;
+	}
+
+	hsr_dev->features = slave[0]->features & slave[1]->features;
+	hsr_dev->features |= NETIF_F_LLTX; /* Prevent recursive tx locking */
+
+	/* Save/init data needed for restore */
+	for (i = 0; i < 2; i++) {
+		hsr_priv->slave_data[i].was_up = slave[i]->flags & IFF_UP;
+		hsr_priv->slave_data[i].promisc = 0;
+	}
+
+	/* Set hsr_dev's MAC address to that of mac_slave1 */
+	memcpy(hsr_dev->dev_addr, hsr_priv->slave_data[0].dev->dev_addr,
+							hsr_dev->addr_len);
+
+	/* MTU */
+	for (i = 0; i < 2; i++)
+		if (slave[i]->mtu < hsr_dev->mtu)
+			hsr_dev->mtu = slave[i]->mtu;
+
+	/* Make sure the 1st call to netif_carrier_on() gets through */
+	netif_carrier_off(hsr_dev);
+
+	/* Promiscuity */
+	for (i = 0; i < 2; i++) {
+		res = dev_set_promiscuity(slave[i], 1);
+		if (res) {
+			pr_info("HSR: Cannot set promiscuous mode (%s, %d)\n",
+								slave[i]->name,
+								res);
+			goto fail;
+		}
+		/* Remember what we have done so we can restore it later */
+		hsr_priv->slave_data[i].promisc = 1;
+	}
+
+	/* Make sure we recognize frames from ourselves in hsr_rcv() */
+	res = frameref_create_self_node(&hsr_priv->self_node_db,
+					hsr_dev->dev_addr,
+					hsr_priv->slave_data[1].dev->dev_addr);
+	if (res < 0)
+		goto fail;
+
+	res = register_netdevice(hsr_dev);
+	if (res)
+		goto fail;
+
+	register_hsr_master(hsr_priv);
+
+	return 0;
+
+fail:
+	restore_slaves(hsr_dev);
+	return res;
+}
diff --git a/net/hsr/hsr_device.h b/net/hsr/hsr_device.h
new file mode 100644
index 0000000..a7596a2
--- /dev/null
+++ b/net/hsr/hsr_device.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright 2011-2012 Autronica Fire and Security AS
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * Author(s):
+ *	2011-2012 Arvid Brodin, arvid.brodin@xdin.com
+ */
+
+#ifndef __HSR_DEVICE_H
+#define __HSR_DEVICE_H
+
+#include <linux/netdevice.h>
+
+void hsr_dev_setup(struct net_device *dev);
+int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2]);
+void hsr_set_operstate(struct net_device *hsr_dev, struct net_device *slave1,
+						struct net_device *slave2);
+void hsr_set_carrier(struct net_device *hsr_dev, struct net_device *slave1,
+						struct net_device *slave2);
+void hsr_check_announce(struct net_device *hsr_dev, int old_operstate);
+int is_hsr_master(struct net_device *dev);
+
+#endif /* __HSR_DEVICE_H */
diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
new file mode 100644
index 0000000..f92bc9f
--- /dev/null
+++ b/net/hsr/hsr_framereg.c
@@ -0,0 +1,328 @@
+/*
+ * Copyright 2011-2012 Autronica Fire and Security AS
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * Author(s):
+ *	2011-2012 Arvid Brodin, arvid.brodin@xdin.com
+ *
+ * The HSR spec says never to forward the same frame twice on the same
+ * interface. A frame is identified by its source MAC address and its HSR
+ * sequence number. This code keeps track of senders and their sequence numbers
+ * to allow filtering of duplicate frames.
+ */
+
+#include <linux/if_ether.h>
+#include <linux/etherdevice.h>
+#include <linux/slab.h>
+#include <linux/rculist.h>
+#include "hsr_private.h"
+#include "hsr_framereg.h"
+#include "hsr_netlink.h"
+
+
+/*
+	TODO: use hash lists for mac addresses (linux/jhash.h)?
+*/
+
+struct node_entry {
+	struct list_head mac_list;
+	unsigned char	MacAddressA[ETH_ALEN];
+	unsigned char	MacAddressB[ETH_ALEN];
+	unsigned long	time_in[HSR_MAX_SLAVE];
+	u16		seq_out[HSR_MAX_DEV];
+	struct rcu_head rcu_head;
+};
+
+
+unsigned char *node_get_addr(struct node_entry *node)
+{
+	return node->MacAddressA;
+}
+
+
+/*
+ * Search for mac entry. Caller must hold rcu read lock.
+ */
+static struct node_entry *find_node_by_AddrA(struct list_head *node_db,
+						unsigned char addr[ETH_ALEN])
+{
+	struct node_entry *node;
+
+	list_for_each_entry_rcu(node, node_db, mac_list)
+		if (!compare_ether_addr(node->MacAddressA, addr))
+			return node;
+
+	return NULL;
+}
+
+/*
+ * Search for mac entry. Caller must hold rcu read lock.
+ */
+static struct node_entry *find_node_by_AddrB(struct list_head *node_db,
+						unsigned char addr[ETH_ALEN])
+{
+	struct node_entry *node;
+
+	list_for_each_entry_rcu(node, node_db, mac_list)
+		if (!compare_ether_addr(node->MacAddressB, addr))
+			return node;
+
+	return NULL;
+}
+
+/*
+ * Search for mac entry. Caller must hold rcu read lock.
+ */
+struct node_entry *framereg_find_node(struct list_head *node_db,
+							struct sk_buff *skb)
+{
+	struct node_entry *node;
+	struct ethhdr *ethhdr;
+
+	if (!skb_mac_header_was_set(skb))
+		return NULL;
+
+	ethhdr = (struct ethhdr *) skb_mac_header(skb);
+
+	list_for_each_entry_rcu(node, node_db, mac_list) {
+		if (!compare_ether_addr(node->MacAddressA, ethhdr->h_source))
+			return node;
+		if (!compare_ether_addr(node->MacAddressB, ethhdr->h_source))
+			return node;
+	}
+
+	return NULL;
+}
+
+/*
+ * Helper for device init; the self_node_db is used in hsr_rcv() to recognize
+ * frames from self that have been looped over the HSR ring.
+ */
+int frameref_create_self_node(struct list_head *self_node_db,
+						unsigned char addr_a[ETH_ALEN],
+						unsigned char addr_b[ETH_ALEN])
+{
+	struct node_entry *node;
+
+	node = kmalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
+
+	memcpy(node->MacAddressA, addr_a, ETH_ALEN);
+	memcpy(node->MacAddressB, addr_b, ETH_ALEN);
+
+	list_add_tail_rcu(&node->mac_list, self_node_db);
+	return 0;
+}
+
+int framereg_merge_node(struct hsr_priv *hsr_priv, enum hsr_dev_idx dev_idx,
+							struct sk_buff *skb)
+{
+	struct ethhdr *ethhdr;
+	struct hsr_supervision_tag *hsr_stag;
+	struct node_entry *node;
+	int i;
+	int found;
+
+	ethhdr = (struct ethhdr *) skb_mac_header(skb);
+	hsr_stag = (struct hsr_supervision_tag *)
+				(&((struct hsr_ethhdr *) ethhdr)->hsr_tag);
+
+	found = 1;
+	rcu_read_lock();
+	node = find_node_by_AddrA(&hsr_priv->node_db, hsr_stag->MacAddressA);
+	if (!node) {
+		rcu_read_unlock();
+		found = 0;
+		node = kmalloc(sizeof(*node), GFP_ATOMIC);
+		if (!node)
+			return -ENOMEM;
+
+		memcpy(node->MacAddressA, hsr_stag->MacAddressA, ETH_ALEN);
+		memcpy(node->MacAddressB, ethhdr->h_source, ETH_ALEN);
+
+		for (i = 0; i < HSR_MAX_SLAVE; i++)
+			node->time_in[i] = 0;
+		for (i = 0; i < HSR_MAX_DEV; i++)
+			node->seq_out[i] = 0;
+/*
+		printk(KERN_INFO "HSR: Added node %pM / %pM\n",
+							node->MacAddressA,
+							node->MacAddressB);
+*/
+	}
+
+	/* Merge node if it's PICS_SUBS capable */
+	if (compare_ether_addr(hsr_stag->MacAddressA, ethhdr->h_source)) {
+		memcpy(node->MacAddressB, ethhdr->h_source, ETH_ALEN);
+/*
+		printk(KERN_INFO "HSR: Merged node %pM / %pM\n",
+							node->MacAddressA,
+							node->MacAddressB);
+*/
+	}
+
+	node->time_in[dev_idx] = jiffies;
+
+	if (found)
+		rcu_read_unlock();
+	else
+		list_add_tail_rcu(&node->mac_list, &hsr_priv->node_db);
+
+	return 0;
+}
+
+
+void hsr_addr_subst(struct hsr_priv *hsr_priv, struct sk_buff *skb)
+{
+	struct ethhdr *ethhdr;
+	struct node_entry *node;
+
+	ethhdr = (struct ethhdr *) skb_mac_header(skb);
+	rcu_read_lock();
+	node = find_node_by_AddrB(&hsr_priv->node_db, ethhdr->h_source);
+	if (node) {
+/*
+		printk(KERN_INFO "HSR: Substituting %pM -> %pM\n",
+							ethhdr->h_source,
+							node->MacAddressA);
+*/
+		memcpy(ethhdr->h_source, node->MacAddressA, ETH_ALEN);
+	} /*else
+		printk(KERN_INFO "HSR: Not substituting addr %pM\n",
+							ethhdr->h_source);
+*/
+	rcu_read_unlock();
+}
+
+
+
+/*
+ * above(a, b) - return 1 if a > b, 0 otherwise.
+ * Uses C 16-bit unsigned arithmetic, with differences > (1 << 15) interpreted
+ * as negative.
+ */
+#define	MAX_RANGE_DIFF	(1 << 15)
+static int above(u16 a, u16 b)
+{
+	if ((u16) (a - b) == (u16) (b - a))
+		return (a > b);
+	return (((u16) (a - b) > (u16) 0) &&
+		((u16) (a - b) <= (u16) MAX_RANGE_DIFF));
+}
+#define below(a, b)		above((b), (a))
+#define above_or_equal(a, b)	(!below((a), (b)))
+#define below_or_equal(a, b)	(!above((a), (b)))
+
+
+void framereg_frame_in(struct node_entry *node, enum hsr_dev_idx dev_idx)
+{
+	if ((dev_idx < 0) || (dev_idx >= HSR_MAX_DEV)) {
+		WARN_ON(1);
+		return;
+	}
+//	printk(KERN_INFO "node %pM; dev_idx %d\n", node->MacAddressA, dev_idx);
+	node->time_in[dev_idx] = jiffies;
+}
+
+/*
+ * Parameters:
+ *	'skb' is a HSR Ethernet frame (with a HSR tag inserted), with a valid
+ *	ethhdr->h_source address and skb->mac_header set.
+ *
+ * Return:
+ *	 1 if frame can be shown to have been sent recently on this interface,
+ *	 0 otherwise, or
+ *	 negative error code on error
+ */
+int framereg_frame_out(struct node_entry *node, enum hsr_dev_idx dev_idx,
+							struct sk_buff *skb)
+{
+	struct hsr_ethhdr *hsr_ethhdr;
+
+	if ((dev_idx < 0) || (dev_idx >= HSR_MAX_DEV)) {
+		WARN_ON(1);
+		return -EINVAL;
+	}
+	if (!skb_mac_header_was_set(skb)) {
+		printk(KERN_INFO "%s:%d: MAC header not set\n", __func__, __LINE__);
+		return -EINVAL;
+	}
+	hsr_ethhdr = (struct hsr_ethhdr *) skb_mac_header(skb);
+
+	if (below_or_equal(hsr_ethhdr->hsr_tag.sequence_nr,
+							node->seq_out[dev_idx]))
+		return 1;
+
+	node->seq_out[dev_idx] = hsr_ethhdr->hsr_tag.sequence_nr;
+	return 0;
+}
+
+
+static void node_entry_reclaim(struct rcu_head *rh)
+{
+	kfree(container_of(rh, struct node_entry, rcu_head));
+}
+
+/*
+ * Remove stale sequence_nr records. Called by timer every
+ * HSR_LIFE_CHECK_INTERVAL (two seconds or so). This is also the only function
+ * that removes mac_entries; it shouldn't need to be rcu_read_lock():ed.
+ */
+void framereg_prune_nodes(struct list_head *node_db)
+{
+	struct node_entry *node_entry, *node_entry_next;
+	unsigned long timestamp;
+
+	list_for_each_entry_safe(node_entry, node_entry_next, node_db, mac_list) {
+
+		timestamp = max(node_entry->time_in[HSR_DEV_SLAVE1],
+				node_entry->time_in[HSR_DEV_SLAVE2]);
+
+		/* Warn only as long as we get frames at all */
+		if (time_is_after_jiffies(timestamp +
+					msecs_to_jiffies(3 * MAX_SLAVE_DIFF / 2))) {
+
+			/* Check for open ring */
+			if (time_after(node_entry->time_in[HSR_DEV_SLAVE2],
+					node_entry->time_in[HSR_DEV_SLAVE1] +
+					msecs_to_jiffies(MAX_SLAVE_DIFF)))
+				hsr_nl_ringerror(node_entry->MacAddressA, HSR_DEV_SLAVE1);
+			else if (time_after(node_entry->time_in[HSR_DEV_SLAVE1],
+					node_entry->time_in[HSR_DEV_SLAVE2] +
+					msecs_to_jiffies(MAX_SLAVE_DIFF)))
+				hsr_nl_ringerror(node_entry->MacAddressA, HSR_DEV_SLAVE2);
+		}
+
+		/* Prune old entries */
+		if (time_is_before_jiffies(timestamp +
+					msecs_to_jiffies(HSR_NODE_FORGET_TIME))) {
+			hsr_nl_nodedown(node_entry->MacAddressA);
+			list_del_rcu(&node_entry->mac_list);
+			call_rcu(&node_entry->rcu_head, node_entry_reclaim);
+		}
+	}
+}
+
+
+void framereg_get_node_times(struct hsr_priv *hsr_priv,
+				unsigned char addr[ETH_ALEN],
+				unsigned long *time1, unsigned long *time2)
+{
+	struct node_entry *node;
+
+	rcu_read_lock();
+	node = find_node_by_AddrA(&hsr_priv->node_db, addr);
+	if (!node) {
+		*time1 = 0;
+		*time2 = 0;
+	} else {
+		*time1 = node->time_in[HSR_DEV_SLAVE1];
+		*time2 = node->time_in[HSR_DEV_SLAVE2];
+	}
+	rcu_read_unlock();
+}
diff --git a/net/hsr/hsr_framereg.h b/net/hsr/hsr_framereg.h
new file mode 100644
index 0000000..7617b83
--- /dev/null
+++ b/net/hsr/hsr_framereg.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright 2011-2012 Autronica Fire and Security AS
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * Author(s):
+ *	2011-2012 Arvid Brodin, arvid.brodin@xdin.com
+ */
+
+#ifndef _HSR_FRAMEREG_H
+#define _HSR_FRAMEREG_H
+
+#include "hsr_private.h"
+
+enum hsr_dev_idx {
+	HSR_DEV_SLAVE1 = 0,
+	HSR_DEV_SLAVE2,
+	HSR_DEV_MASTER,
+};
+
+struct node_entry;
+
+#define HSR_MAX_SLAVE	(HSR_DEV_SLAVE2 + 1)
+#define HSR_MAX_DEV	(HSR_DEV_MASTER + 1)
+/*
+int framereg_add_node(struct hsr_priv *hsr_priv, unsigned char addr[ETH_ALEN],
+				enum hsr_dev_idx dev_idx, unsigned long time,
+				u16 sequence_nr);
+*/
+struct node_entry *framereg_find_node(struct list_head *node_db,
+							struct sk_buff *skb);
+int framereg_merge_node(struct hsr_priv *hsr_priv, enum hsr_dev_idx dev_idx,
+							struct sk_buff *skb);
+void hsr_addr_subst(struct hsr_priv *hsr_priv, struct sk_buff *skb);
+
+void framereg_frame_in(struct node_entry *node, enum hsr_dev_idx);
+
+int framereg_frame_out(struct node_entry *node, enum hsr_dev_idx,
+							struct sk_buff *skb);
+void framereg_prune_nodes(struct list_head *node_db);
+
+void framereg_get_node_times(struct hsr_priv *hsr_priv,
+				unsigned char addr[ETH_ALEN],
+				unsigned long *time1, unsigned long *time2);
+int frameref_create_self_node(struct list_head *self_node_db,
+						unsigned char addr_a[ETH_ALEN],
+						unsigned char addr_b[ETH_ALEN]);
+
+unsigned char *node_get_addr(struct node_entry *node);
+
+#endif /* _HSR_FRAMEREG_H */
diff --git a/net/hsr/hsr_main.c b/net/hsr/hsr_main.c
new file mode 100644
index 0000000..f17f222
--- /dev/null
+++ b/net/hsr/hsr_main.c
@@ -0,0 +1,411 @@
+/*
+ * Copyright 2011-2012 Autronica Fire and Security AS
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * Author(s):
+ *	2011-2012 Arvid Brodin, arvid.brodin@xdin.com
+ *
+ * In addition to routines for registering and unregistering HSR support, this
+ * file also contains the receive routine that handles all incoming frames with
+ * Ethertype (protocol) ETH_P_HSR.
+ */
+
+#include <linux/netdevice.h>
+#include <linux/rculist.h>
+#include <linux/timer.h>
+#include <linux/etherdevice.h>
+#include "hsr_private.h"
+#include "hsr_device.h"
+#include "hsr_netlink.h"
+#include "hsr_framereg.h"
+
+
+/* Multicast address for HSR Supervision frames */
+const u8 hsr_multicast_addr[ETH_ALEN] = {0x01, 0x15, 0x4e, 0x00, 0x01, 0x00};
+
+
+/* List of all registered virtual HSR devices */
+static LIST_HEAD(hsr_list);
+
+void register_hsr_master(struct hsr_priv *hsr_priv)
+{
+	list_add_tail_rcu(&hsr_priv->hsr_list, &hsr_list);
+}
+
+void unregister_hsr_master(struct hsr_priv *hsr_priv)
+{
+	struct hsr_priv *hsr_priv_it;
+
+	list_for_each_entry(hsr_priv_it, &hsr_list, hsr_list)
+		if (hsr_priv_it == hsr_priv) {
+			list_del_rcu(&hsr_priv_it->hsr_list);
+			return;
+		}
+}
+
+
+/*
+ * If dev is a HSR slave device, return the virtual master device. Return NULL
+ * otherwise.
+ */
+static struct hsr_priv *get_hsr_master(struct net_device *dev)
+{
+	struct hsr_priv *hsr_priv;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(hsr_priv, &hsr_list, hsr_list)
+		if ((dev == hsr_priv->slave_data[0].dev) ||
+				(dev == hsr_priv->slave_data[1].dev)) {
+			rcu_read_unlock();
+			return hsr_priv;
+		}
+
+	rcu_read_unlock();
+	return NULL;
+}
+
+/*
+ * If dev is a HSR slave device, return the other slave device. Return NULL
+ * otherwise.
+ */
+static struct hsr_slave_data *get_other_slave(struct hsr_priv *hsr_priv,
+							struct net_device *dev)
+{
+	if (dev == hsr_priv->slave_data[0].dev)
+		return &hsr_priv->slave_data[1];
+	if (dev == hsr_priv->slave_data[1].dev)
+		return &hsr_priv->slave_data[0];
+
+	return NULL;
+}
+
+
+static int hsr_netdev_notify(struct notifier_block *nb, unsigned long event,
+								void *ptr)
+{
+
+/*
+ * Should do:
+ *
+ * - error monitoring (broken link)
+ * - slave monitoring (disallow down, reconfiguring ?)
+
+	register_netdevice_notifier(...);
+	NETDEV_GOING_DOWN
+	NETDEV_CHANGEADDR
+	NETDEV_CHANGE (dev->flags)
+	NETDEV_UNREGISTER
+ */
+
+	struct net_device *slave, *other_slave;
+	struct hsr_priv *hsr_priv;
+	struct hsr_slave_data *other_data;
+	int old_operstate;
+
+	hsr_priv = get_hsr_master(ptr);
+	if (hsr_priv) { /* Is ptr a slave device? */
+		slave = ptr;
+		other_data = get_other_slave(hsr_priv, slave);
+		other_slave = other_data->dev;
+	} else {
+		if (!is_hsr_master(ptr))
+			return NOTIFY_DONE;
+		hsr_priv = netdev_priv(ptr);
+		slave = hsr_priv->slave_data[0].dev;
+		other_slave = hsr_priv->slave_data[1].dev;
+	}
+
+	switch (event) {
+	case NETDEV_UP:		/* Administrative state UP */
+//printk(KERN_INFO "Got %s event NETDEV_UP\n", ((struct net_device *) ptr)->name);
+		goto netdev_change;
+	case NETDEV_DOWN:	/* Administrative state DOWN */
+//printk(KERN_INFO "Got %s event NETDEV_DOWN\n", ((struct net_device *) ptr)->name);
+		goto netdev_change;
+	case NETDEV_CHANGE:	/* Link (carrier) state changes */
+//printk(KERN_INFO "Got %s event NETDEV_CHANGE\n", ((struct net_device *) ptr)->name);
+netdev_change:
+		old_operstate = hsr_priv->dev->operstate;
+		hsr_set_carrier(hsr_priv->dev, slave, other_slave);
+		hsr_set_operstate(hsr_priv->dev, slave, other_slave);
+		hsr_check_announce(hsr_priv->dev, old_operstate);
+	}
+
+	return NOTIFY_DONE;
+}
+
+
+static struct timer_list prune_timer;
+
+static void hsr_prune_nodes(unsigned long data)
+{
+	struct hsr_priv *hsr_priv;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(hsr_priv, &hsr_list, hsr_list)
+		framereg_prune_nodes(&hsr_priv->node_db);
+	rcu_read_unlock();
+
+	prune_timer.expires = jiffies + msecs_to_jiffies(PRUNE_PERIOD);
+	add_timer(&prune_timer);
+}
+
+
+static struct sk_buff *strip_hsr_tag(struct sk_buff *skb)
+{
+	struct hsr_tag *hsr_tag;
+	struct sk_buff *skb2;
+
+	skb2 = skb_share_check(skb, GFP_ATOMIC);
+	if (unlikely(!skb2))
+		return NULL;	/* skb_share_check() already dropped skb */
+	skb = skb2;
+
+	if (unlikely(!pskb_may_pull(skb, HSR_TAGLEN)))
+		goto err_free;
+
+	hsr_tag = (struct hsr_tag *) skb->data;
+	skb->protocol = hsr_tag->encap_proto;
+	skb_reset_network_header(skb);  // Huh???
+	skb_pull_rcsum(skb, HSR_TAGLEN);
+
+	return skb;
+
+err_free:
+	kfree_skb(skb);
+	return NULL;
+}
+
+
+/*
+ * The uses I can see for these HSR supervision frames are:
+ * 1) Use the frames that are sent after node initialization ("HSR_TLV.Type =
+ *    22") to reset any sequence_nr counters belonging to that node. Useful if
+ *    the other node's counter has been reset for some reason.
+ *    --
+ *    Or not - resetting the counter and bridging the frame would create a
+ *    loop, unfortunately.
+ *
+ * 2) Use the LifeCheck frames to detect ring breaks. I.e. if no LifeCheck
+ *    frame is received from a particular node, we know something is wrong.
+ *    We just register these (as with normal frames) and throw them away.
+ *
+ * 3) These could also be used to allow different MAC addresses for the two
+ *    slave interfaces. This is mentioned in the standard but not explained.
+ */
+static int handle_supervision_frame(struct hsr_priv *hsr_priv,
+				enum hsr_dev_idx dev_idx, struct sk_buff *skb)
+{
+	struct hsr_supervision_tag *hsr_stag;
+
+	if (compare_ether_addr(eth_hdr(skb)->h_dest, hsr_multicast_addr))
+		return 0;
+
+	hsr_stag = (struct hsr_supervision_tag *) skb->data;
+	if (ntohs(hsr_stag->path_and_HSR_ver) >> 12 != 0x0f)
+		return 0;
+	if ((hsr_stag->HSR_TLV_Type != HSR_TLV_ANNOUNCE) &&
+				(hsr_stag->HSR_TLV_Type != HSR_TLV_LIFE_CHECK))
+		return 0;
+	if (hsr_stag->HSR_TLV_Length != 12)
+		return 0;
+/*
+	if (hsr_stag->HSR_TLV_Type == HSR_TLV_ANNOUNCE)
+		printk(KERN_INFO "HSR: Got Announce frame from %02x\n",
+					eth_hdr(skb)->h_source[ETH_ALEN-1]);
+*/
+	framereg_merge_node(hsr_priv, dev_idx, skb);
+
+	return 1;
+}
+
+
+/*
+ * Implementation somewhat according to IEC-62439-3, p. 43
+ */
+static int hsr_rcv(struct sk_buff *skb, struct net_device *dev,
+			struct packet_type *pt, struct net_device *orig_dev)
+{
+	struct hsr_priv *hsr_priv;
+	struct hsr_slave_data *other_slave_data;
+	struct node_entry *node;
+	int deliver_to_self;
+	struct sk_buff *skb_deliver;
+	enum hsr_dev_idx dev_in_idx, dev_other_idx;
+	int ret;
+
+	hsr_priv = get_hsr_master(dev);
+
+	if (!hsr_priv) {
+		printk(KERN_INFO "HSR: Got HSR frame on non-HSR device; "
+							"dropping it.\n");
+		kfree_skb(skb);
+		return NET_RX_DROP;
+	}
+
+	if (dev == hsr_priv->slave_data[0].dev) {
+		dev_in_idx = HSR_DEV_SLAVE1;
+		dev_other_idx = HSR_DEV_SLAVE2;
+	} else {
+		dev_in_idx = HSR_DEV_SLAVE2;
+		dev_other_idx = HSR_DEV_SLAVE1;
+	}
+
+	node = framereg_find_node(&hsr_priv->self_node_db, skb);
+	if (node) {
+		/* Always kill frames sent by ourselves */
+		kfree_skb(skb);
+		return NET_RX_SUCCESS;
+	}
+
+	/* Receive this frame? */
+	deliver_to_self = 0;
+	if ((skb->pkt_type == PACKET_HOST) ||
+				(skb->pkt_type == PACKET_MULTICAST) ||
+				(skb->pkt_type == PACKET_BROADCAST))
+		deliver_to_self = 1;
+	else if (!compare_ether_addr(eth_hdr(skb)->h_dest,
+						hsr_priv->dev->dev_addr)) {
+		skb->pkt_type = PACKET_HOST;
+		deliver_to_self = 1;
+	}
+
+	if (handle_supervision_frame(hsr_priv, dev_in_idx, skb) == 1)
+		deliver_to_self = 0;
+
+	rcu_read_lock(); /* node_db */
+	node = framereg_find_node(&hsr_priv->node_db, skb);
+	if (!node) {
+		/* Source node unknown; don't create a network loop */
+		rcu_read_unlock();
+		printk(KERN_INFO "HSR: Got HSR frame from unknown node %pM "
+				 "on dev %s: dropping it.\n",
+				 eth_hdr(skb)->h_source, dev->name);
+		kfree_skb(skb);
+		return NET_RX_DROP;
+	}
+
+	if (framereg_frame_out(node, HSR_DEV_MASTER, skb) == 1)
+		deliver_to_self = 0;
+
+	framereg_frame_in(node, dev_in_idx);
+
+	/* Forward this frame? */
+	other_slave_data = NULL;
+	if (skb->pkt_type != PACKET_HOST) {
+		other_slave_data = get_other_slave(hsr_priv, dev);
+		if (framereg_frame_out(node, dev_other_idx, skb) == 1)
+			other_slave_data = NULL;
+	}
+
+	rcu_read_unlock(); /* node_db */
+
+	if (!deliver_to_self && !other_slave_data) {
+		kfree_skb(skb);
+		return NET_RX_SUCCESS;
+	}
+
+	skb_deliver = skb;
+	if (deliver_to_self && other_slave_data) {
+#if !defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && \
+						!defined(CONFIG_NONSTANDARD_HSR)
+		/* We have to memmove the whole payload below */
+		skb_deliver = skb_copy(skb, GFP_ATOMIC);
+#else
+		skb_deliver = skb_clone(skb, GFP_ATOMIC);
+#endif
+		if (!skb_deliver) {
+			deliver_to_self = 0;
+			hsr_priv->dev->stats.rx_dropped++;
+		}
+	}
+
+	if (deliver_to_self) {
+		skb_deliver = strip_hsr_tag(skb_deliver);
+		if (!skb_deliver) {
+			hsr_priv->dev->stats.rx_dropped++;
+			goto forward;
+		}
+#if !defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && \
+						!defined(CONFIG_NONSTANDARD_HSR)
+		/*
+		 * skb_deliver should be linear here, after the call to
+		 * skb_copy() in the block above. We need to memmove the
+		 * whole payload to work around alignment problems caused by
+		 * the 6-byte HSR tag.
+		 */
+		memmove(skb_deliver->data - HSR_TAGLEN, skb_deliver->data,
+							skb_deliver->len);
+		skb_deliver->data -= HSR_TAGLEN;
+		skb_deliver->tail -= HSR_TAGLEN;
+		skb_reset_network_header(skb_deliver); // FIXME - should prbl be mac_header()?
+#endif
+		skb_deliver->dev = hsr_priv->dev;
+		hsr_addr_subst(hsr_priv, skb_deliver);
+		ret = netif_rx(skb_deliver);
+		if (ret == NET_RX_DROP)
+			hsr_priv->dev->stats.rx_dropped++;
+		else {
+			hsr_priv->dev->stats.rx_packets++;
+			hsr_priv->dev->stats.rx_bytes += skb->len;
+		}
+	}
+
+forward:
+	if (other_slave_data) {
+		skb_push(skb, ETH_HLEN);
+		skb->dev = other_slave_data->dev;
+		dev_queue_xmit(skb);
+	}
+
+	return NET_RX_SUCCESS;
+}
+
+
+static struct packet_type hsr_pt __read_mostly = {
+	.type = htons(ETH_P_HSR),
+	.func = hsr_rcv,
+};
+
+static struct notifier_block hsr_nb = {
+	.notifier_call = hsr_netdev_notify,	/* Slave event notifications */
+};
+
+
+static int __init hsr_init(void)
+{
+	int res;
+
+	BUG_ON(sizeof(struct hsr_tag) != HSR_TAGLEN);
+	BUG_ON(sizeof(struct hsr_ethhdr) != ETH_HLEN + HSR_TAGLEN);
+
+	dev_add_pack(&hsr_pt);
+
+	init_timer(&prune_timer);
+	prune_timer.function = hsr_prune_nodes;
+	prune_timer.data = 0;
+	prune_timer.expires = jiffies + msecs_to_jiffies(PRUNE_PERIOD);
+	add_timer(&prune_timer);
+
+	register_netdevice_notifier(&hsr_nb);
+
+	res = hsr_netlink_init();
+
+	return res;
+}
+
+static void __exit hsr_exit(void)
+{
+	unregister_netdevice_notifier(&hsr_nb);
+	del_timer(&prune_timer);
+	hsr_netlink_exit();
+	dev_remove_pack(&hsr_pt);
+}
+
+module_init(hsr_init);
+module_exit(hsr_exit);
+MODULE_LICENSE("GPL");
diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c
new file mode 100644
index 0000000..fee910a
--- /dev/null
+++ b/net/hsr/hsr_netlink.c
@@ -0,0 +1,293 @@
+/*
+ * Copyright 2011-2012 Autronica Fire and Security AS
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * Author(s):
+ *	2011-2012 Arvid Brodin, arvid.brodin@xdin.com
+ *
+ * Routines for handling Netlink messages for HSR.
+ */
+
+#include "hsr_netlink.h"
+#include <linux/kernel.h>
+#include <net/rtnetlink.h>
+#include <net/genetlink.h>
+#include "hsr_private.h"
+#include "hsr_device.h"
+#include "hsr_framereg.h"
+
+static const struct nla_policy hsr_policy[IFLA_HSR_MAX + 1] = {
+	[IFLA_HSR_SLAVE1]	= { .type = NLA_U32 },
+	[IFLA_HSR_SLAVE2]	= { .type = NLA_U32 },
+};
+
+
+/*
+ * Here, it seems a netdevice has already been allocated for us, and the
+ * hsr_dev_setup routine has been executed. Nice!
+ */
+static int hsr_newlink(struct net *src_net, struct net_device *dev,
+				struct nlattr *tb[], struct nlattr *data[])
+{
+	struct net_device *link[2];
+
+	if (!data[IFLA_HSR_SLAVE1]) {
+		printk(KERN_INFO "IFLA_HSR_SLAVE1 missing!\n");
+		return -EINVAL;
+	}
+	link[0] = __dev_get_by_index(src_net, nla_get_u32(data[IFLA_HSR_SLAVE1]));
+	if (!data[IFLA_HSR_SLAVE2]) {
+		printk(KERN_INFO "IFLA_HSR_SLAVE2 missing!\n");
+		return -EINVAL;
+	}
+	link[1] = __dev_get_by_index(src_net, nla_get_u32(data[IFLA_HSR_SLAVE2]));
+
+	if (!link[0] || !link[1])
+		return -ENODEV;
+	if (link[0] == link[1])
+		return -EINVAL;
+
+	return hsr_dev_finalize(dev, link);
+}
+
+static struct rtnl_link_ops hsr_link_ops __read_mostly = {
+	.kind		= "hsr",
+	.maxtype	= IFLA_HSR_MAX,
+	.policy		= hsr_policy,
+	.priv_size	= sizeof(struct hsr_priv),
+	.setup		= hsr_dev_setup,
+//	.validate	= vlan_validate,
+	.newlink	= hsr_newlink,
+//	.changelink	= vlan_changelink,
+//	.dellink	= hsr_dellink,  dev->destructor() called automatically?
+//	.get_size	= vlan_get_size,
+//	.fill_info	= vlan_fill_info,
+};
+
+
+
+/* attribute policy */
+/* NLA_BINARY missing in libnl; use unspec in userspace instead. */
+static struct nla_policy hsr_genl_policy[HSR_A_MAX + 1] = {
+	[HSR_A_NODE_ADDR] = { .type = NLA_BINARY, .len = ETH_ALEN },
+	[HSR_A_IFINDEX] = { .type = NLA_U32 },
+	[HSR_A_IF1AGE] = { .type = NLA_U32 }, /* 32-bit int */
+	[HSR_A_IF2AGE] = { .type = NLA_U32 }, /* 32-bit int */
+};
+
+static struct genl_family hsr_genl_family = {
+	.id = GENL_ID_GENERATE,
+	.hdrsize = 0,
+	.name = "HSR",
+	.version = 1,
+	.maxattr = HSR_A_MAX,
+};
+
+static struct genl_multicast_group hsr_network_genl_mcgrp = {
+	.name = "hsr-network",
+};
+
+static int hsr_genl_seq = 0;
+
+
+
+struct sk_buff *hsr_create_genl_msg(void **pmsg_head, unsigned gfp, int cmd)
+{
+	struct sk_buff *skb;
+
+//	printk("Sending HSR_C_[%d]\n", cmd);
+
+	skb = genlmsg_new(NLMSG_GOODSIZE, gfp);
+	if (!skb)
+		return NULL;
+
+	*pmsg_head = genlmsg_put(skb, 0, hsr_genl_seq++, &hsr_genl_family, 0, cmd);
+	if (!*pmsg_head) {
+		kfree_skb(skb);
+		return NULL;
+	}
+
+	return skb;
+}
+
+
+/*
+ * This is called if for some node with MAC address addr, we only get frames
+ * over one of the slave interfaces. This would indicate an open network ring
+ * (i.e. a link has failed somewhere).
+ */
+void hsr_nl_ringerror(unsigned char addr[ETH_ALEN], int dev_idx)
+{
+	struct sk_buff *skb;
+	void *msg_head;
+
+	skb = hsr_create_genl_msg(&msg_head, GFP_ATOMIC, HSR_C_RING_ERROR);
+	if (!skb)
+		return;
+
+	NLA_PUT(skb, HSR_A_NODE_ADDR, ETH_ALEN, addr);
+	NLA_PUT_U32(skb, HSR_A_IFINDEX, dev_idx);
+
+	genlmsg_end(skb, msg_head);
+	genlmsg_multicast(skb, 0, hsr_network_genl_mcgrp.id, GFP_ATOMIC);
+
+	return;
+
+nla_put_failure:
+	kfree_skb(skb);
+}
+
+/*
+ * This is called when we haven't heard from the node with MAC address addr for
+ * some time (before the node is removed from the node table/list).
+ */
+void hsr_nl_nodedown(unsigned char addr[ETH_ALEN])
+{
+	struct sk_buff *skb;
+	void *msg_head;
+
+	skb = hsr_create_genl_msg(&msg_head, GFP_ATOMIC, HSR_C_NODE_DOWN);
+	if (!skb)
+		return;
+
+	NLA_PUT(skb, HSR_A_NODE_ADDR, ETH_ALEN, addr);
+
+	genlmsg_end(skb, msg_head);
+	genlmsg_multicast(skb, 0, hsr_network_genl_mcgrp.id, GFP_ATOMIC);
+
+	return;
+
+nla_put_failure:
+	kfree_skb(skb);
+}
+
+/*
+ * HSR_C_GET_NODE_STATUS lets userspace query the internal HSR node table
+ * about the status of a specific node in the network, defined by its MAC
+ * address.
+ *
+ * Input: hsr ifindex, node mac address
+ * Output: hsr ifindex, node mac address (copied from request),
+ * 	   age of latest frame from node over slave 1, slave 2 [ms]
+ */
+static int hsr_get_node_status(struct sk_buff *skb_in, struct genl_info *info)
+{
+	/* For receiving */
+	struct nlattr *na;
+	char *node_addr;
+	struct net_device *hsr_dev;
+
+	/* For sending */
+	struct sk_buff *skb_out;
+	void *msg_head;
+	struct hsr_priv *hsr_priv;
+	unsigned long time1, time2;
+
+	if (!info)
+		goto invalid;
+
+	na = info->attrs[HSR_A_IFINDEX];
+	if (!na)
+		goto invalid;
+	na = info->attrs[HSR_A_NODE_ADDR];
+	if (!na)
+		goto invalid;
+
+	hsr_dev = __dev_get_by_index(genl_info_net(info),
+					nla_get_u32(info->attrs[HSR_A_IFINDEX]));
+	if (!hsr_dev)
+		goto invalid;
+	if (!is_hsr_master(hsr_dev))
+		goto invalid;
+
+
+	/* Send reply */
+
+	skb_out = hsr_create_genl_msg(&msg_head, GFP_ATOMIC,
+							HSR_C_SET_NODE_STATUS);
+	if (!skb_out)
+		return -ENOMEM;
+
+	NLA_PUT_U32(skb_out, HSR_A_IFINDEX, hsr_dev->ifindex);
+
+	node_addr = nla_data(info->attrs[HSR_A_NODE_ADDR]);
+	NLA_PUT(skb_out, HSR_A_NODE_ADDR, ETH_ALEN, node_addr);
+
+	hsr_priv = netdev_priv(hsr_dev);
+	framereg_get_node_times(hsr_priv, node_addr, &time1, &time2);
+
+	NLA_PUT_U32(skb_out, HSR_A_IF1AGE, time1 ?
+					jiffies_to_msecs(jiffies - time1) : -1);
+	NLA_PUT_U32(skb_out, HSR_A_IF2AGE, time2 ?
+					jiffies_to_msecs(jiffies - time2) : -1);
+
+	genlmsg_end(skb_out, msg_head);
+	genlmsg_unicast(genl_info_net(info), skb_out, info->snd_pid);
+
+	return 0;
+
+nla_put_failure:
+	kfree_skb(skb_out);
+
+	return -ENOMEM;
+
+invalid:
+	return -EINVAL;
+}
+
+static struct genl_ops hsr_ops_get_node_status = {
+	.cmd = HSR_C_GET_NODE_STATUS,
+	.flags = 0,
+	.policy = hsr_genl_policy,
+	.doit = hsr_get_node_status,
+	.dumpit = NULL,
+};
+
+
+int __init hsr_netlink_init(void)
+{
+	int rc;
+
+	rc = rtnl_link_register(&hsr_link_ops);
+	if (rc)
+		goto fail_rtnl_link_register;
+
+	rc = genl_register_family(&hsr_genl_family);
+	if (rc)
+		goto fail_genl_register_family;
+
+	rc = genl_register_ops(&hsr_genl_family, &hsr_ops_get_node_status);
+	if (rc)
+		goto fail_genl_register_ops;
+
+	rc = genl_register_mc_group(&hsr_genl_family, &hsr_network_genl_mcgrp);
+	if (rc)
+		goto fail_genl_register_mc_group;
+
+	return 0;
+
+fail_genl_register_mc_group:
+	genl_unregister_ops(&hsr_genl_family, &hsr_ops_get_node_status);
+fail_genl_register_ops:
+	genl_unregister_family(&hsr_genl_family);
+fail_genl_register_family:
+	rtnl_link_unregister(&hsr_link_ops);
+fail_rtnl_link_register:
+
+	return rc;
+}
+
+void __exit hsr_netlink_exit(void)
+{
+	genl_unregister_mc_group(&hsr_genl_family, &hsr_network_genl_mcgrp);
+	genl_unregister_ops(&hsr_genl_family, &hsr_ops_get_node_status);
+	genl_unregister_family(&hsr_genl_family);
+
+	rtnl_link_unregister(&hsr_link_ops);
+}
+
+MODULE_ALIAS_RTNL_LINK("hsr");
diff --git a/net/hsr/hsr_netlink.h b/net/hsr/hsr_netlink.h
new file mode 100644
index 0000000..4282d9f
--- /dev/null
+++ b/net/hsr/hsr_netlink.h
@@ -0,0 +1,64 @@
+/*
+ * Copyright 2011-2012 Autronica Fire and Security AS
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * Author(s):
+ *	2011-2012 Arvid Brodin, arvid.brodin@xdin.com
+ */
+
+#ifndef __HSR_NETLINK_H
+#define __HSR_NETLINK_H
+
+/* attributes */
+enum {
+	HSR_A_UNSPEC,
+	HSR_A_NODE_ADDR,
+	HSR_A_IFINDEX,
+	HSR_A_IF1AGE,
+	HSR_A_IF2AGE,
+	__HSR_A_MAX,
+};
+#define HSR_A_MAX (__HSR_A_MAX - 1)
+
+
+#ifdef __KERNEL__
+
+#include <linux/if_ether.h>
+#include <linux/module.h>
+
+int __init hsr_netlink_init(void);
+void __exit hsr_netlink_exit(void);
+
+void hsr_nl_ringerror(unsigned char addr[ETH_ALEN], int dev_idx);
+void hsr_nl_nodedown(unsigned char addr[ETH_ALEN]);
+void hsr_nl_framedrop(int dropcount, int dev_idx);
+void hsr_nl_linkdown(int dev_idx);
+
+
+/*
+ * Generic Netlink HSR family definition
+ */
+
+
+#endif /* __KERNEL__ */
+
+
+
+/* commands */
+enum {
+	HSR_C_UNSPEC,
+	HSR_C_RING_ERROR,
+	HSR_C_NODE_DOWN,
+	HSR_C_GET_NODE_STATUS,
+	HSR_C_SET_NODE_STATUS,
+	__HSR_C_MAX,
+};
+#define HSR_C_MAX (__HSR_C_MAX - 1)
+
+
+
+#endif /* __HSR_NETLINK_H */
diff --git a/net/hsr/hsr_private.h b/net/hsr/hsr_private.h
new file mode 100644
index 0000000..1522907
--- /dev/null
+++ b/net/hsr/hsr_private.h
@@ -0,0 +1,114 @@
+/*
+ * Copyright 2011-2012 Autronica Fire and Security AS
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ * Author(s):
+ *	2011-2012 Arvid Brodin, arvid.brodin@xdin.com
+ */
+
+#ifndef _HSR_PRIVATE_H
+#define _HSR_PRIVATE_H
+
+#include <linux/netdevice.h>
+#include <linux/list.h>
+
+
+/*
+ * Time constants as specified in the HSR specification (IEC-62439-3) Table 8.
+ * All values in milliseconds.
+ */
+#define HSR_LIFE_CHECK_INTERVAL		 2000 /* ms */
+#define HSR_NODE_FORGET_TIME		60000 /* ms */
+#define HSR_ANNOUNCE_INTERVAL		  100 /* ms */
+
+/*
+ * By how much may slave1 and slave2 timestamps of latest received frame from
+ * each node differ before we notify of communication problem?
+ */
+#define MAX_SLAVE_DIFF			 3000 /* ms */
+
+/*
+ * How often shall we check for broken ring and remove node entries older than
+ * HSR_NODE_FORGET_TIME?
+ */
+#define PRUNE_PERIOD			 3000 /* ms */
+
+
+#define HSR_TLV_ANNOUNCE		   22
+#define HSR_TLV_LIFE_CHECK		   23
+
+
+/*
+ * HSR Tag.
+ * As defined in IEC-62439-3, the HSR tag is really { ethertype = 0x88FB, path,
+ * LSDU_size, sequence Nr }. But we let eth_header() create { h_dest, h_source,
+ * h_proto = 0x88FB }, and add { path, LSDU_size, sequence Nr, encapsulated
+ * protocol } instead.
+ */
+#ifdef CONFIG_NONSTANDARD_HSR
+#define HSR_TAGLEN	8
+#else
+#define HSR_TAGLEN	6
+#endif
+struct hsr_tag {
+/*
+	This is nice but I'm not sure it is "portably compatible" with
+	endianness swaps:
+	__be16		path:4;
+	__be16		LSDU_size:12;
+*/
+	__be16		path_and_LSDU_size;
+	__be16		sequence_nr;
+	__be16		encap_proto;
+#ifdef CONFIG_NONSTANDARD_HSR
+	__be16		padding;
+#endif
+} __packed;
+
+struct hsr_ethhdr {
+	struct ethhdr	ethhdr;
+	struct hsr_tag	hsr_tag;
+} __packed;
+
+
+struct hsr_supervision_tag {
+	__be16		path_and_HSR_ver;
+	__be16		sequence_nr;
+	__u8		HSR_TLV_Type;
+	__u8		HSR_TLV_Length;
+#ifdef CONFIG_NONSTANDARD_HSR
+	__be16		padding;
+#endif
+	unsigned char	MacAddressA[ETH_ALEN];
+} __packed;
+
+
+struct hsr_slave_data {
+	struct net_device	*dev;
+	int promisc;
+	int was_up;
+};
+
+struct hsr_priv {
+	struct list_head	hsr_list;	/* List of hsr devices */
+	struct rcu_head		rcu_head;
+	struct net_device	*dev;
+	struct hsr_slave_data	slave_data[2];
+	struct list_head	node_db;	/* Other HSR nodes */
+	struct list_head	self_node_db;	/* MACs of slaves */
+	struct timer_list	announce_timer;	/* Supervision frame dispatch */
+	int announce_count;
+	u16 sequence_nr;
+	spinlock_t seqlock;
+};
+
+extern const u8 hsr_multicast_addr[ETH_ALEN];
+
+void register_hsr_master(struct hsr_priv *hsr_priv);
+void unregister_hsr_master(struct hsr_priv *hsr_priv);
+
+#endif /*  _HSR_PRIVATE_H */
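
A side note for anyone reviewing the duplicate detection in
framereg_frame_out(): the wraparound comparison done by above() can be tried
out in userspace. What follows is only a minimal sketch, assuming plain
standard C; uint16_t stands in for the kernel's u16, and seq_is_duplicate()
and seq_self_check() are hypothetical names that do not exist in the patch
(the former mirrors the below_or_equal() test).

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RANGE_DIFF (1 << 15)

/* Same logic as above() in hsr_framereg.c, with uint16_t instead of u16. */
static int above(uint16_t a, uint16_t b)
{
	/* Distance is 0 or exactly 1 << 15: fall back to a plain compare. */
	if ((uint16_t) (a - b) == (uint16_t) (b - a))
		return a > b;
	return ((uint16_t) (a - b) > 0) &&
		((uint16_t) (a - b) <= MAX_RANGE_DIFF);
}

/* Hypothetical helper (not in the patch): mirrors below_or_equal(). */
static int seq_is_duplicate(uint16_t seq, uint16_t last_seq_out)
{
	return !above(seq, last_seq_out);
}

/* A few spot checks, including wraparound past 65535. */
void seq_self_check(void)
{
	assert(above(5, 3));
	assert(!above(3, 5));
	assert(above(2, 65534));	/* 2 is four steps after 65534 */
	assert(!above(65534, 2));
	assert(seq_is_duplicate(7, 7));	/* same number seen again */
	assert(!seq_is_duplicate(8, 7));
}
```

Calling seq_self_check() from a small main() is enough to exercise it. The
point of the sketch is that a frame whose sequence number is equal to, or
"behind", the last number sent on an interface is treated as already sent,
even across the 16-bit wrap.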


-- 
Arvid Brodin | Consultant (Linux)
XDIN AB | Jan Stenbecks Torg 17 | SE-164 40 Kista | Sweden | xdin.com


* [RFC v2 0/2] net/hsr: Add support for IEC 62439-3 High-availability Seamless Redundancy
From: Arvid Brodin @ 2012-07-03 23:58 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: Stephen Hemminger, Alexey Kuznetsov, Javier Boticario,
	Bruno Ferreira

Hi,

This is v2 of this RFC. Background information about this patch and HSR can be found in
the v1 submission: http://www.spinics.net/lists/netdev/msg192817.html.

This patch is now quite useable.

Known major functional problems:
 * Sometimes, when used with slaves that are already up and using IPv6, no packets seem
   to get through. (Possibly a bug in the driver for my slave interfaces - not sure yet.)

Other problems:
 * There are a few FIXMEs that might need attention - except "reset status of slaves",
   these are things I could use some help with.
 * The kernel patch is against linux-next-120330 (so not quite up to date).
 * The iproute patch is against iproute-2.6.35.
 * The code needs cleanup, there are some commented debug printouts and such.

Major changes to v1:
 * Added iproute patch.
 * Fixed lockdep problem.
 * Moved to Generic Netlink for HSR-specific userspace comms.
 * Added userspace Generic Netlink example.
 * Added HSR address substitution (so slaves can keep their MAC addresses).
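
To illustrate the address substitution item: a node may send frames with a
different source MAC address (address B) on one slave than the address it
announces in its supervision frames (address A), and received frames get
their source rewritten from B to A. Below is a rough userspace sketch of that
idea, not the kernel code itself. It assumes a plain array instead of the RCU
list, and struct node, find_by_addr_b() and addr_subst() are hypothetical
names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Simplified stand-in for struct node_entry: one node, two MAC addresses. */
struct node {
	unsigned char addr_a[ETH_ALEN];	/* address A, from the supervision frame payload */
	unsigned char addr_b[ETH_ALEN];	/* address B, Ethernet source of that frame */
};

/* Hypothetical lookup over a plain array instead of the RCU list. */
static const struct node *find_by_addr_b(const struct node *db, size_t n,
					 const unsigned char *addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!memcmp(db[i].addr_b, addr, ETH_ALEN))
			return &db[i];
	return NULL;
}

/*
 * Rewrite src in place to the node's address A if it matches some node's
 * address B. Returns 1 if a substitution was made, 0 otherwise.
 */
int addr_subst(const struct node *db, size_t n, unsigned char *src)
{
	const struct node *node = find_by_addr_b(db, n, src);

	if (!node)
		return 0;
	memcpy(src, node->addr_a, ETH_ALEN);
	return 1;
}
```

With this, upper layers only ever see address A for a given node, whichever
slave interface the frame arrived on.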


Part 1/2 contains the kernel patch (including the userspace example).
Part 2/2 contains the iproute patch (applies to iproute-2.6.35).


(I'm going on holiday on Thursday, so don't feel hurt if you write me and don't get a
reply. I'll check your messages when I get back in early August, though!)

-- 
Arvid Brodin | Consultant (Linux)
XDIN AB | Jan Stenbecks Torg 17 | SE-164 40 Kista | Sweden | xdin.com


* Re: [PATCH v6] sctp: be more restrictive in transport selection on bundled sacks
From: Neil Horman @ 2012-07-03 23:42 UTC (permalink / raw)
  To: Jan Ceuleers; +Cc: David Miller, netdev, vyasevich, linux-sctp
In-Reply-To: <4FF33DBE.7060103@computer.org>

On Tue, Jul 03, 2012 at 08:45:18PM +0200, Jan Ceuleers wrote:
> On 07/02/2012 02:25 PM, Neil Horman wrote:
> ...
> 
> > How does this language sound to you?
> ...
> 
> > +tree maintanier may reapply the subsystem maintainers Acked-by: to the new
> 
> s/maintanier/maintainer/
> 
> HTH, Jan
Thanks Jan, I've fixed it in my local tree, it'll get rolled in the next
version. 
Neil

^ permalink raw reply

* Re: [PATCH] netdev: driver: ethernet: add sysfs interface for ti cpsw
From: John Fastabend @ 2012-07-03 23:38 UTC (permalink / raw)
  To: s-paulraj; +Cc: David Miller, cyril, mugunthanvnm, netdev
In-Reply-To: <20120703.161623.1843927245895613219.davem@davemloft.net>

On 7/3/2012 4:16 PM, David Miller wrote:
> From: <s-paulraj@ti.com>
> Date: Tue, 3 Jul 2012 15:51:26 -0400
>
>> From: Sandeep Paulraj <s-paulraj@ti.com>
>>
>> This patch adds sysfs entries for address lookup engine entries and
>> control for the ALE(address lookup engine) found in TI SOC's.
>>
>>
>> Signed-off-by: Sandeep Paulraj <s-paulraj@ti.com>
>
> You may not create private, driver specific, unique interfaces to
> configure your hardware.
>
> You must use existing facilities such as ethtool to add such things.
> If the existing facilities are insufficient, you must extend them to
> meet your (and potentially others') needs.
> --

I can't seem to dig up the original email (perhaps my server is slow
today), but did you consider using the recently added hooks
ndo_fdb_add(), ndo_fdb_del(), and ndo_fdb_dump()?

These are for adding to, deleting from, and dumping the address
forwarding databases. Failing that, would something like this RFC, with
another attribute, work?

http://comments.gmane.org/gmane.linux.network/232104

Thanks,
John

^ permalink raw reply

* Re: [PATCH 12/19] neigh: Convert over to dst_neigh_lookup_skb().
From: David Miller @ 2012-07-03 23:18 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev
In-Reply-To: <1341353309.2839.19.camel@bwh-desktop.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Tue, 3 Jul 2012 23:08:29 +0100

> On Tue, 2012-07-03 at 02:46 -0700, David Miller wrote:
>> Signed-off-by: David S. Miller <davem@davemloft.net>
>> ---
>>  net/core/neighbour.c |   10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>> 
>> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
>> index a793af9..eb3efdc 100644
>> --- a/net/core/neighbour.c
>> +++ b/net/core/neighbour.c
>> @@ -1202,9 +1202,15 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
>>  
>>  			rcu_read_lock();
>>  			/* On shaper/eql skb->dst->neighbour != neigh :( */
> 
> It might be time to delete that comment too.

It's still accurate, so it needs to be adjusted rather than removed.
sch_teql creates this situation as well.

What this code is effectively doing is reinjecting the packet back to
the top-most neigh, and it will filter back down to the thing that
uses a different neigh for packet output.

^ permalink raw reply

* [PATCH] net/fsl_pq_mdio: use spin_event_timeout() to poll the indicator register
From: Timur Tabi @ 2012-07-03 23:16 UTC (permalink / raw)
  To: Andy Fleming, davem, netdev

Macro spin_event_timeout() was designed for simple polling of hardware
registers with a timeout, so use it when we poll the MIIMIND register.
This allows us to return an error code instead of polling indefinitely.

Note that PHY_INIT_TIMEOUT is a count of loop iterations, so we can't use
it for spin_event_timeout(), which asks for microseconds.

Signed-off-by: Timur Tabi <timur@freescale.com>
---
 drivers/net/ethernet/freescale/fsl_pq_mdio.c |   25 +++++++++++++++----------
 1 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fsl_pq_mdio.c b/drivers/net/ethernet/freescale/fsl_pq_mdio.c
index 9eb8159..ab0fabd 100644
--- a/drivers/net/ethernet/freescale/fsl_pq_mdio.c
+++ b/drivers/net/ethernet/freescale/fsl_pq_mdio.c
@@ -64,6 +64,8 @@ struct fsl_pq_mdio_priv {
 int fsl_pq_local_mdio_write(struct fsl_pq_mdio __iomem *regs, int mii_id,
 		int regnum, u16 value)
 {
+	u32 status;
+
 	/* Set the PHY address and the register address we want to write */
 	out_be32(&regs->miimadd, (mii_id << 8) | regnum);
 
@@ -71,10 +73,10 @@ int fsl_pq_local_mdio_write(struct fsl_pq_mdio __iomem *regs, int mii_id,
 	out_be32(&regs->miimcon, value);
 
 	/* Wait for the transaction to finish */
-	while (in_be32(&regs->miimind) & MIIMIND_BUSY)
-		cpu_relax();
+	status = spin_event_timeout(!(in_be32(&regs->miimind) &	MIIMIND_BUSY),
+		1000, 0);
 
-	return 0;
+	return status ? 0 : -ETIMEDOUT;
 }
 
 /*
@@ -91,6 +93,7 @@ int fsl_pq_local_mdio_read(struct fsl_pq_mdio __iomem *regs,
 		int mii_id, int regnum)
 {
 	u16 value;
+	u32 status;
 
 	/* Set the PHY address and the register address we want to read */
 	out_be32(&regs->miimadd, (mii_id << 8) | regnum);
@@ -99,9 +102,11 @@ int fsl_pq_local_mdio_read(struct fsl_pq_mdio __iomem *regs,
 	out_be32(&regs->miimcom, 0);
 	out_be32(&regs->miimcom, MII_READ_COMMAND);
 
-	/* Wait for the transaction to finish */
-	while (in_be32(&regs->miimind) & (MIIMIND_NOTVALID | MIIMIND_BUSY))
-		cpu_relax();
+	/* Wait for the transaction to finish, normally less than 100us */
+	status = spin_event_timeout(!(in_be32(&regs->miimind) &
+		(MIIMIND_NOTVALID | MIIMIND_BUSY)), 1000, 0);
+	if (!status)
+		return -ETIMEDOUT;
 
 	/* Grab the value of the register from miimstat */
 	value = in_be32(&regs->miimstat);
@@ -144,7 +149,7 @@ int fsl_pq_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
 static int fsl_pq_mdio_reset(struct mii_bus *bus)
 {
 	struct fsl_pq_mdio __iomem *regs = fsl_pq_mdio_get_regs(bus);
-	int timeout = PHY_INIT_TIMEOUT;
+	u32 status;
 
 	mutex_lock(&bus->mdio_lock);
 
@@ -155,12 +160,12 @@ static int fsl_pq_mdio_reset(struct mii_bus *bus)
 	out_be32(&regs->miimcfg, MIIMCFG_INIT_VALUE);
 
 	/* Wait until the bus is free */
-	while ((in_be32(&regs->miimind) & MIIMIND_BUSY) && timeout--)
-		cpu_relax();
+	status = spin_event_timeout(!(in_be32(&regs->miimind) &	MIIMIND_BUSY),
+		1000, 0);
 
 	mutex_unlock(&bus->mdio_lock);
 
-	if (timeout < 0) {
+	if (!status) {
 		printk(KERN_ERR "%s: The MII Bus is stuck!\n",
 				bus->name);
 		return -EBUSY;
-- 
1.7.3.4
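
The polling pattern described in the commit message can be modelled in
user space. The sketch below is not the driver code and not the powerpc
spin_event_timeout() macro itself: every fake_* name, the bit value, and
the simulated cost of 1 us per register read are assumptions made so the
example compiles outside the kernel. It only illustrates polling a busy
bit against a time budget and returning -ETIMEDOUT instead of spinning
forever.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical busy bit; the real value lives in the driver header. */
#define FAKE_MIIMIND_BUSY 0x1u

/* Simulated MDIO block: reports BUSY for a fixed number of reads. */
struct fake_mdio {
	unsigned int busy_reads_left;	/* reads that still return BUSY */
	uint64_t now_us;		/* simulated clock, in microseconds */
};

/* Assumption: each register read advances the simulated clock by 1 us. */
static uint32_t fake_read_miimind(struct fake_mdio *m)
{
	m->now_us++;
	if (m->busy_reads_left > 0) {
		m->busy_reads_left--;
		return FAKE_MIIMIND_BUSY;
	}
	return 0;
}

/*
 * Poll until the busy bit clears or the time budget is exceeded.
 * Returns 0 on success and -ETIMEDOUT if the bit never cleared.
 */
int fake_wait_not_busy(struct fake_mdio *m, uint64_t timeout_us)
{
	uint64_t start;
	int idle;

	assert(m != NULL);
	start = m->now_us;

	do {
		idle = !(fake_read_miimind(m) & FAKE_MIIMIND_BUSY);
	} while (!idle && (m->now_us - start) <= timeout_us);

	return idle ? 0 : -ETIMEDOUT;
}
```

The budget here is expressed in (simulated) microseconds rather than in
loop iterations, which mirrors the distinction the commit message draws
between PHY_INIT_TIMEOUT and the timeout argument.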

^ permalink raw reply related

* Re: [PATCH] netdev: driver: ethernet: add sysfs interface for ti cpsw
From: David Miller @ 2012-07-03 23:16 UTC (permalink / raw)
  To: s-paulraj; +Cc: cyril, mugunthanvnm, netdev
In-Reply-To: <1341345086-25093-1-git-send-email-s-paulraj@ti.com>

From: <s-paulraj@ti.com>
Date: Tue, 3 Jul 2012 15:51:26 -0400

> From: Sandeep Paulraj <s-paulraj@ti.com>
> 
> This patch adds sysfs entries for address lookup engine entries and
> control for the ALE(address lookup engine) found in TI SOC's.
> 
> 
> Signed-off-by: Sandeep Paulraj <s-paulraj@ti.com>

You may not create private, driver specific, unique interfaces to
configure your hardware.

You must use existing facilities such as ethtool to add such things.
If the existing facilities are insufficient, you must extend them to
meet your (and potentially others') needs.

^ permalink raw reply

* Re: [PATCH net 1/7] qlge: Fixed packet transmit errors due to potential driver errors.
From: David Miller @ 2012-07-03 23:14 UTC (permalink / raw)
  To: jitendra.kalsaria; +Cc: netdev, ron.mercer, Dept_NX_Linux_NIC_Driver
In-Reply-To: <5E4F49720D0BAD499EE1F01232234BA877435B2797@AVEXMB1.qlogic.org>

From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Date: Tue, 3 Jul 2012 12:38:04 -0700

> This patch was intended to remove the line that increments the
> tx_error statistic when the queue is correctly stopped.

It isn't correctly stopped, for the millionth time this condition is a
BUG, there is a kernel log message there because it is a BUG, are you
blind?
	if (unlikely(atomic_read(&tx_ring->tx_count) < 2)) {
		netif_info(qdev, tx_queued, qdev->ndev,
			   "%s: shutting down tx queue %d du to lack of resources.\n",
			   __func__, tx_ring_idx);
		netif_stop_subqueue(ndev, tx_ring->wq_id);
		atomic_inc(&tx_ring->queue_stopped);
		tx_ring->tx_errors++;
		return NETDEV_TX_BUSY;
	}

THIS CODE BLOCK SHOULD NEVER EXECUTE.  It's a driver bug, it should
never happen.

Even if the driver recovers correctly, it's still an error condition.

It's a bug, and bumping the statistic is not wrong at all.  You should
find out why this happens, because it's a bug, and it should be fixed.

^ permalink raw reply

* Re: [RFC PATCH 00/10] Make XPS usable within ixgbe
From: John Fastabend @ 2012-07-03 22:41 UTC (permalink / raw)
  To: Tom Herbert, Alexander Duyck
  Cc: netdev, davem, jeffrey.t.kirsher, edumazet, bhutchings,
	alexander.duyck
In-Reply-To: <CA+mtBx-ZGFUT6JcxiLcTOMteN1-DS5_eoAC5CgU7UEFup1FaqQ@mail.gmail.com>

On 7/3/2012 3:30 PM, Tom Herbert wrote:
> Hi Alexander,
>
> Thanks for this work!
>
> Some general comments:
>
> 1) skb_tx_hash is called from a handful of drivers (bnx2x, ixgbe,
> mlx4, and bonding).  Would it make sense to call xps_get_cpu from that
> function? (Unfortunately, the use of ndo_select_queue is likely
> bypassing xps unnecessarily in these drivers.)

I suspect we can get rid of the select_queue cases for at least
bnx2x, ixgbe, and mlx4. We might need to be a bit clever to resolve
the mlx4 case but should be doable.

Anyways I would like to see these cases refactored away.

> 2) Instead of (or maybe in addition to) allowing drivers to program xps
> maps, we could parameterize get_xps_cpu to optionally include a bit
> map of acceptable queues.  This would be useful to define a
> hierarchical queue selection (like first choosing a set for QoS, then
> choosing one amongst those based on xps).
>

Agreed.

We likely need something like (2) to get this to work with mqprio and
other QOS schemes in use.

.John

^ permalink raw reply

* Re: [PATCH net 1/7] qlge: Fixed packet transmit errors due to potential driver errors.
From: Francois Romieu @ 2012-07-03 22:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jitendra Kalsaria, David Miller, netdev, Ron Mercer,
	Dept-NX Linux NIC Driver
In-Reply-To: <1341347235.2583.796.camel@edumazet-glaptop>

Eric Dumazet <eric.dumazet@gmail.com> :
> On Tue, 2012-07-03 at 12:38 -0700, Jitendra Kalsaria wrote:
> 
> > I think my patch description might have been misleading. We are not
> > fixing a logical problem but rather a statistics reporting problem.
> > Our transmit function is not getting called when the queue is full, but
> > when we stop the queue it increments the tx_error statistic. One of our
> > customers is running a test that deliberately floods the queue, causing
> > it to be stopped periodically. The customer has not reported a logical
> > problem with the test, where the driver performs very well; they merely
> > pointed out that we were incorrectly reporting the queue-full
> > condition as a tx_error.
> > 
> > This patch was intended to remove the line that increments the
> > tx_error statistic when the queue is correctly stopped.
> 
> I believe everybody is kindly asking you to fix the driver logic instead
> of trying to hide the problems from your customers.

:o/

Jitendra was speaking about qlge_ethtool_ops.self_test(). It will need
fixing as well.

> In fact, you could just BUG() at this point, and maybe David will accept
> such a patch.

Mildly. It would turn qlge_ethtool_ops.self_test() into a system killer.

[...]
> testing atomic_read(&tx_ring->tx_count) at the beginning of qlge_send()
> is too late. NETDEV_TX_BUSY is deprecated.

Yes.

Returning NETDEV_TX_BUSY when dma mapping fails in ql_map_send isn't nice
either.

-- 
Ueimor

^ permalink raw reply

* RE: [PATCH net 1/7] qlge: Fixed packet transmit errors due to potential driver errors.
From: Jitendra Kalsaria @ 2012-07-03 22:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Ron Mercer, Dept-NX Linux NIC Driver,
	krkumar2@in.ibm.com
In-Reply-To: <1341347235.2583.796.camel@edumazet-glaptop>



-----Original Message-----
>From: Eric Dumazet [mailto:eric.dumazet@gmail.com] 
>Sent: Tuesday, July 03, 2012 1:27 PM
>To: Jitendra Kalsaria
>Cc: David Miller; netdev; Ron Mercer; Dept-NX Linux NIC Driver
>Subject: RE: [PATCH net 1/7] qlge: Fixed packet transmit errors due to potential driver errors.
>
>On Tue, 2012-07-03 at 12:38 -0700, Jitendra Kalsaria wrote:
>
>I believe everybody is kindly asking you to fix the driver logic instead
>of trying to hide the problems from your customers.
>
>In fact, you could just BUG() at this point, and maybe David will accept
>such a patch.
>
>Testing atomic_read(&tx_ring->tx_count) at the beginning of qlge_send()
>is too late. NETDEV_TX_BUSY is deprecated.
>
>Well-behaved drivers should perform the test at the end of their
>ndo_start_xmit() and stop the queue so that the next packet won't come at
>all.

Thanks, everyone. We will definitely change the logic.




^ permalink raw reply

* Re: [RFC PATCH 00/10] Make XPS usable within ixgbe
From: Tom Herbert @ 2012-07-03 22:30 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, davem, jeffrey.t.kirsher, edumazet, bhutchings,
	alexander.duyck
In-Reply-To: <20120630000652.29939.11108.stgit@gitlad.jf.intel.com>

Hi Alexander,

Thanks for this work!

Some general comments:

1) skb_tx_hash is called from a handful of drivers (bnx2x, ixgbe,
mlx4, and bonding).  Would it make sense to call xps_get_cpu from that
function? (Unfortunately, the use of ndo_select_queue is likely
bypassing xps unnecessarily in these drivers.)
2) Instead of (or maybe in addition to) allowing drivers to program xps
maps, we could parameterize get_xps_cpu to optionally include a bit
map of acceptable queues.  This would be useful to define a
hierarchical queue selection (like first choosing a set for QoS, then
choosing one amongst those based on xps).
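
To make point (2) concrete, here is a rough user-space sketch of a
hierarchical selection. It is not the kernel's XPS code: the function
name fake_pick_queue, the representation of the per-CPU map and the
acceptable set as 32-bit masks, the fallback policy, and the
hash-modulo pick are all assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * cpu_map: queues that XPS has mapped to the current CPU (bit q = queue q).
 * allowed: queues acceptable to an outer policy, e.g. one QoS class.
 * hash:    flow hash used to spread flows over the remaining candidates.
 *
 * Returns a queue index, or -1 if no queue is acceptable at all.
 */
int fake_pick_queue(uint32_t cpu_map, uint32_t allowed, uint32_t hash)
{
	uint32_t candidates = cpu_map & allowed;
	uint32_t n = 0, k;
	int q;

	/* Assumed policy: if XPS offers nothing in the set, use the set. */
	if (!candidates)
		candidates = allowed;
	if (!candidates)
		return -1;

	/* Count the candidate queues. */
	for (q = 0; q < 32; q++)
		if (candidates & (UINT32_C(1) << q))
			n++;
	assert(n > 0);

	/* Pick the k-th candidate, k derived from the flow hash. */
	k = hash % n;
	for (q = 0; q < 32; q++) {
		if (!(candidates & (UINT32_C(1) << q)))
			continue;
		if (k == 0)
			return q;
		k--;
	}
	return -1;	/* not reached: n > 0 guarantees a hit above */
}
```

With queues 0-3 mapped to the CPU and queues 2-3 allowed for the class,
only queues 2 and 3 can be returned, so the outer (QoS) choice always
wins and XPS only refines it.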

Tom

On Fri, Jun 29, 2012 at 5:16 PM, Alexander Duyck
<alexander.h.duyck@intel.com> wrote:
> The following patch series makes it so that the ixgbe driver can support
> ATR even when the number of queues is less than the number of CPUs.  To do
> this I have updated the kernel to support letting drivers set their own XPS
> configuration.  To do this it was necessary to move the code out of the
> sysfs specific code and into the dev specific regions.
>
> I am still working out a few issues such as the fact that with routing I
> only ever seem to be able to get the first queue that is mapped to the CPU
> when XPS is enabled.
>
> Also I am looking for input on if it is acceptable to only let the
> set_channels/get_channels calls report/set the number of queues per traffic
> class as I implemented the code this way to avoid any significant conflicts
> between the DCB traffic classes code and these functions.
>
> ---
>
> Alexander Duyck (10):
>       ixgbe: Add support for set_channels ethtool operation
>       ixgbe: Add support for displaying the number of Tx/Rx channels
>       ixgbe: Update ixgbe driver to use __dev_pick_tx in ixgbe_select_queue
>       ixgbe: Add function for setting XPS queue mapping
>       ixgbe: Define FCoE and Flow director limits much sooner to allow for changes
>       net: Add support for XPS without SYSFS being defined
>       net: Rewrite netif_set_xps_queues to address several issues
>       net: Rewrite netif_reset_xps_queue to allow for better code reuse
>       net: Add functions netif_reset_xps_queue and netif_set_xps_queue
>       net: Split core bits of dev_pick_tx into __dev_pick_tx
>
>
>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |  112 +++++++++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |   10 -
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |   48 +++-
>  include/linux/netdevice.h                        |   15 +
>  net/Kconfig                                      |    2
>  net/core/dev.c                                   |  283 ++++++++++++++++++++--
>  net/core/net-sysfs.c                             |  160 ------------
>  7 files changed, 428 insertions(+), 202 deletions(-)
>
> --
> Thanks,
>
> Alex

^ permalink raw reply

* Re: [PATCH 12/19] neigh: Convert over to dst_neigh_lookup_skb().
From: Ben Hutchings @ 2012-07-03 22:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20120703.024644.638173463391088464.davem@davemloft.net>

On Tue, 2012-07-03 at 02:46 -0700, David Miller wrote:
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>  net/core/neighbour.c |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index a793af9..eb3efdc 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -1202,9 +1202,15 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
>  
>  			rcu_read_lock();
>  			/* On shaper/eql skb->dst->neighbour != neigh :( */

It might be time to delete that comment too.

Ben.

> -			if (dst && (n2 = dst_get_neighbour_noref(dst)) != NULL)
> -				n1 = n2;
> +			n2 = NULL;
> +			if (dst) {
> +				n2 = dst_neigh_lookup_skb(dst, skb);
> +				if (n2)
> +					n1 = n2;
> +			}
>  			n1->output(n1, skb);
> +			if (n2)
> +				neigh_release(n2);
>  			rcu_read_unlock();
>  
>  			write_lock_bh(&neigh->lock);

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH] netem: fix rate extension and drop accounting
From: Hagen Paul Pfeifer @ 2012-07-03 22:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Yuchung Cheng, Andreas Terzis, Mark Gordon
In-Reply-To: <1341309257.2583.153.camel@edumazet-glaptop>

* Eric Dumazet | 2012-07-03 11:54:17 [+0200]:

>> commit 7bc0f28c7a0c (netem: rate extension) did wrong maths when packet
>> is enqueued while queue is not empty.
>> 
>> Result is unexpected cumulative delays
>> 
>> # tc qd add dev eth0 root est 1sec 4sec netem delay 200ms rate 100kbit
>> # ping -i 0.1 172.30.42.18
>> PING 172.30.42.18 (172.30.42.18) 56(84) bytes of data.
>> 64 bytes from 172.30.42.18: icmp_req=1 ttl=64 time=208 ms
>> 64 bytes from 172.30.42.18: icmp_req=2 ttl=64 time=424 ms
>> 64 bytes from 172.30.42.18: icmp_req=3 ttl=64 time=838 ms
>> 64 bytes from 172.30.42.18: icmp_req=4 ttl=64 time=1142 ms
>> 64 bytes from 172.30.42.18: icmp_req=5 ttl=64 time=1335 ms
>> 64 bytes from 172.30.42.18: icmp_req=6 ttl=64 time=1949 ms
>> 64 bytes from 172.30.42.18: icmp_req=7 ttl=64 time=2450 ms
>> 64 bytes from 172.30.42.18: icmp_req=8 ttl=64 time=2840 ms
>> 64 bytes from 172.30.42.18: icmp_req=9 ttl=64 time=3121 ms
>> 64 bytes from 172.30.42.18: icmp_req=10 ttl=64 time=3291 ms
>> 64 bytes from 172.30.42.18: icmp_req=11 ttl=64 time=3784 ms

Strange, we tested the patch in detail. I will take a look ...
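
For reference, the delay one would expect from this configuration can
be estimated with simple arithmetic. The sketch assumes that the qdisc
accounts 98 bytes per echo request (the 84-byte IP packet reported by
ping plus a 14-byte Ethernet header) and that "100kbit" means 100,000
bits per second; fake_tx_time_us is a hypothetical helper, not a netem
function.

```c
#include <assert.h>
#include <stdint.h>

/* Serialization time of one packet at a given rate, in microseconds. */
uint64_t fake_tx_time_us(uint32_t len_bytes, uint64_t rate_bps)
{
	assert(rate_bps > 0);
	return (uint64_t)len_bytes * 8u * 1000000u / rate_bps;
}

/* Expected per-packet latency when the queue is empty on arrival. */
uint64_t fake_expected_latency_us(uint32_t len_bytes, uint64_t rate_bps,
				  uint64_t delay_us)
{
	return delay_us + fake_tx_time_us(len_bytes, rate_bps);
}
```

Under those assumptions each packet needs about 7.84 ms at the
configured rate, so with a 200 ms delay and one ping every 100 ms the
queue should not build up and every reply should take roughly 208 ms,
as the first one does. The growing times in the log are the unexpected
cumulative delays the commit message refers to.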

^ permalink raw reply

* RE: [PATCH net 1/7] qlge: Fixed packet transmit errors due to potential driver errors.
From: Eric Dumazet @ 2012-07-03 20:27 UTC (permalink / raw)
  To: Jitendra Kalsaria
  Cc: David Miller, netdev, Ron Mercer, Dept-NX Linux NIC Driver
In-Reply-To: <5E4F49720D0BAD499EE1F01232234BA877435B2797@AVEXMB1.qlogic.org>

On Tue, 2012-07-03 at 12:38 -0700, Jitendra Kalsaria wrote:

> I think my patch description might have been misleading. We are not
> fixing a logical problem but rather a statistics reporting problem.
> Our transmit function is not getting called when the queue is full, but
> when we stop the queue it increments the tx_error statistic. One of our
> customers is running a test that deliberately floods the queue, causing
> it to be stopped periodically. The customer has not reported a logical
> problem with the test, where the driver performs very well; they merely
> pointed out that we were incorrectly reporting the queue-full
> condition as a tx_error.
> 
> This patch was intended to remove the line that increments the
> tx_error statistic when the queue is correctly stopped.

I believe everybody is kindly asking you to fix the driver logic instead
of trying to hide the problems from your customers.

In fact, you could just BUG() at this point, and maybe David will accept
such a patch.

Testing atomic_read(&tx_ring->tx_count) at the beginning of qlge_send()
is too late. NETDEV_TX_BUSY is deprecated.

Well-behaved drivers should perform the test at the end of their
ndo_start_xmit() and stop the queue so that the next packet won't come at
all.
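
As a rough illustration of that ordering, here is a user-space model of
a TX ring. It is not qlge code: the fake_* names, the ring layout, and
the worst-case figure of 2 descriptors per packet (chosen to echo the
"tx_count < 2" check) are assumptions. The point is only that the "is
there room for the next packet?" test runs after queuing the current
one, so the busy path at the top is reached only if something else is
broken.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed worst-case number of descriptors a single packet can need. */
#define FAKE_MAX_DESC_PER_PKT 2

struct fake_tx_ring {
	int free_desc;			/* descriptors currently available */
	bool stopped;			/* models netif_stop_subqueue() state */
	unsigned long tx_errors;	/* bumped only on the "bug" path */
};

enum fake_tx_status { FAKE_TX_OK, FAKE_TX_BUSY };

/* Transmit path: queue the packet, then decide whether to stop. */
enum fake_tx_status fake_start_xmit(struct fake_tx_ring *r, int needed)
{
	assert(needed > 0 && needed <= FAKE_MAX_DESC_PER_PKT);

	if (r->free_desc < needed) {
		/* Should be unreachable if the queue was stopped in time. */
		r->tx_errors++;
		r->stopped = true;
		return FAKE_TX_BUSY;
	}

	r->free_desc -= needed;

	/* Test at the END: stop now if a worst-case packet would not fit. */
	if (r->free_desc < FAKE_MAX_DESC_PER_PKT)
		r->stopped = true;

	return FAKE_TX_OK;
}

/* Completion path: reclaim descriptors and wake the queue if possible. */
void fake_tx_complete(struct fake_tx_ring *r, int reclaimed)
{
	r->free_desc += reclaimed;
	if (r->stopped && r->free_desc >= FAKE_MAX_DESC_PER_PKT)
		r->stopped = false;
}
```

In this model, as long as the caller respects the stopped flag, the
error counter stays at zero even when the ring fills up, which is the
behaviour the thread is asking the driver to provide.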

^ permalink raw reply

* RE: [PATCH net 1/7] qlge: Fixed packet transmit errors due to potential driver errors.
From: Jitendra Kalsaria @ 2012-07-03 19:38 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Ron Mercer, Dept-NX Linux NIC Driver
In-Reply-To: <20120702.184134.1131493483786674336.davem@davemloft.net>


-----Original Message-----
>From: David Miller [mailto:davem@davemloft.net] 
>Sent: Monday, July 02, 2012 6:42 PM
>To: Jitendra Kalsaria
>Cc: netdev; Ron Mercer; Dept-NX Linux NIC Driver
>Subject: Re: [PATCH net 1/7] qlge: Fixed packet transmit errors due to potential driver errors.
>
>From: David Miller <davem@davemloft.net>
>Date: Mon, 02 Jul 2012 18:38:26 -0700 (PDT)
>
>> From: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
>> Date: Mon, 2 Jul 2012 18:30:47 -0700
>> 
>>> As per your comments, TX ring full is not expected behavior? All I
>>> can think of is increasing the TX queue to 1024 and cleaning up in a
>>> timer instead of in the interrupt?
>> 
>> Your transmit function should never be invoked when the queue is
>> full, logic elsewhere in your driver should have stopped the queue
>> therefore preventing further invocations of your transmit function
>> until you wake the queue when space is liberated in the TX ring.
>
>BTW, did it even occur to you that there is a kernel log message here
>in this code path for a reason?
>
>That log message is there because this event is unexpected and a
>driver error.

I think my patch description might have been misleading. We are not fixing a logical problem but rather a statistics reporting problem. Our transmit function is not getting called when the queue is full, but when we stop the queue it increments the tx_error statistic. One of our customers is running a test that deliberately floods the queue, causing it to be stopped periodically. The customer has not reported a logical problem with the test, where the driver performs very well; they merely pointed out that we were incorrectly reporting the queue-full condition as a tx_error.

This patch was intended to remove the line that increments the tx_error statistic when the queue is correctly stopped.

^ permalink raw reply

* Re: [PATCH v6] sctp: be more restrictive in transport selection on bundled sacks
From: Jan Ceuleers @ 2012-07-03 18:45 UTC (permalink / raw)
  To: Neil Horman; +Cc: David Miller, netdev, vyasevich, linux-sctp
In-Reply-To: <20120702122531.GA29681@hmsreliant.think-freely.org>

On 07/02/2012 02:25 PM, Neil Horman wrote:
...

> How does this language sound to you?
...

> +tree maintanier may reapply the subsystem maintainers Acked-by: to the new

s/maintanier/maintainer/

HTH, Jan

^ permalink raw reply

* Re: AF_BUS socket address family
From: Chris Friesen @ 2012-07-03 17:18 UTC (permalink / raw)
  To: Javier Martinez Canillas
  Cc: David Miller, vincent.sanders, netdev, linux-kernel
In-Reply-To: <4FF32352.5040800@genband.com>

On 07/03/2012 10:52 AM, Chris Friesen wrote:

> To be fair, since it was implemented as a separate protocol family the
> maintenance burden actually hasn't been large--it's been fairly simple
> to port between versions. Also, we do embedded telecom stuff and don't
> jump kernel versions all that often. (It's a big headache, requires
> coordinating between multiple vendors, etc.)
>
> In our case we typically send small (100-200 byte) messages to a
> smallish (1-10) number of listeners, though there are exceptions of
> course. Back before I started, the original implementation used a
> userspace daemon, but it had a number of issues. Originally I was
> focussed on the performance gains but I must admit that since then other
> factors have made that less of an issue.

I should point out that some of the other factors that have been 
discussed for AF_BUS also hold true for our implementation:

--strict ordering
--reliable (in our case, if the sender has space in the tx buffer then 
messages get to all recipients with buffer space, there are kernel logs 
if recipients don't have space)

Also, the fact that it's in the kernel rather than a userspace daemon 
reduces priority inversion type issues.  Presumably this would apply to 
an IP-multicast based solution as well.

One problem that I ran into back when I was experimenting with this 
stuff was trying to isolate host-local IP multicast from the rest of the 
network.  It would be suboptimal to need to set up filtering and such 
before being able to use the communication protocol.

Chris

^ permalink raw reply

* Re: AF_BUS socket address family
From: Chris Friesen @ 2012-07-03 16:52 UTC (permalink / raw)
  To: Javier Martinez Canillas
  Cc: David Miller, vincent.sanders, netdev, linux-kernel
In-Reply-To: <4FF1BBD5.8080804@collabora.co.uk>

On 07/02/2012 09:18 AM, Javier Martinez Canillas wrote:

> We tried different approaches before developing the AF_BUS socket family and one
> of them was extending AF_UNIX to support multicast. We posted our patches [1]
> and the feedback was that the AF_UNIX code was already complex and difficult
> to maintain. So, we decided to implement a new family (AF_BUS) that is
> orthogonal to the rest of the networking stack, so that a user not using our
> IPC solution pays no added complexity or performance penalty.

That's what I ended up doing as well.  In our case it's basically a 
stripped-down AF_UNIX with only datagram support, no security, no fd 
passing, etc., but with the addition of multicast and wildcard (for 
debugging).

> Looking at netdev archives I saw that you both raised the question about
> multicast on unix sockets and posted an implementation in early 2003. So if I
> understand correctly you have been maintaining an out-of-tree solution for around 9
> years now.

That's correct.

> It would be a great help if you can join the discussion and explain the
> arguments of your company (and the others companies you were talking about) in
> favor of a simpler multicast socket family.
>
> The fact that your company spent lots of engineering resources to maintain an
> out-of-tree patch-set for 9 years should raise some eyebrows and convince more
> than one person that a simpler local multicast solution is needed in the Linux
> kernel (which was one of the reasons why Google also developed Binder I guess).

To be fair, since it was implemented as a separate protocol family the 
maintenance burden actually hasn't been large--it's been fairly simple 
to port between versions.  Also, we do embedded telecom stuff and don't 
jump kernel versions all that often.  (It's a big headache, requires 
coordinating between multiple vendors, etc.)

In our case we typically send small (100-200 byte) messages to a 
smallish (1-10) number of listeners, though there are exceptions of 
course.  Back before I started, the original implementation used a 
userspace daemon, but it had a number of issues.  Originally I was 
focussed on the performance gains but I must admit that since then other 
factors have made that less of an issue.

Among other things, this messaging is used on some systems to configure 
the IP addressing for the system, so it does simplify things to not use 
an IP-based protocol for this purpose.

Also, back when I did my original implementation IP multicast wasn't 
supported on the loopback device--David, has that changed since then? 
If it has, then we probably could figure out a way to make it work using 
IP multicast, but I don't know that it would be worth the effort given 
the minimal ongoing maintenance costs for our patch.

Chris

^ permalink raw reply

* Re: "ADDRCONF(NETDEV_UP): eth0: link is not ready" with IPv6
From: Nicolas Ferre @ 2012-07-03 16:15 UTC (permalink / raw)
  To: Arvid Brodin, Ben Hutchings
  Cc: netdev@vger.kernel.org, Alexey Kuznetsov, Stephen Hemminger,
	linux-arm-kernel
In-Reply-To: <4FF313F6.7010600@xdin.com>

On 07/03/2012 05:47 PM, Arvid Brodin :
> (Added MACB "patch" contact Nicolas Ferre to CC list.)

Hi,

(adding linux-arm-kernel)

> On 2012-06-29 17:24, Ben Hutchings wrote:
>> On Fri, 2012-06-29 at 02:36 +0000, Arvid Brodin wrote:
>>> Hi,
>>>
>>> After 'ip link set eth0 up' on an avr32 board (network driver macb), the device ends up in
>>> operational mode "UNKNOWN":
>>>
>>> # ip link
>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
>>>     link/ether 00:24:74:00:17:9d brd ff:ff:ff:ff:ff:ff
>>>
>>> Unplugging and plugging in the network cable gets the device to mode "UP".
>>>
>>> This is a problem for me because I'm trying to use this device as a "slave" device (for a
>>> virtual HSR device*) and I need to be able to decide if the slave device is operational or
>>> not.
>>>
>>> Following Stephen's advice here:
>>> http://kerneltrap.org/mailarchive/linux-netdev/2008/9/24/3398834 I checked the macb.c code
>>> and noticed they call netif_carrier_off() neither before register_netdev() nor in
>>> dev_open().
>
>> It should be called after register_netdev() and before the driver's
>> ndo_open implementation returns.

After having read several drivers, it seems that some are calling
netif_carrier_off() *before* register_netdev() and some *after*... What
is the proper way?


> I'm guessing this allows linkwatch to do netif_carrier_on() some time after the dev_open()?
> 
> Besides not calling netif_carrier_off() in dev_open(), the Cadence/MACB driver calls
> netif_carrier_off() in dev_close(). Is this correct?
> 
> 
> How should I handle carrier state for a virtual device? The device should have "carrier"
> as long as at least one of the underlying physical interfaces is operational (which I
> guess means operational state UP). Would it be correct to watch NETDEV_CHANGE and DOWN/UP
> events of the slaves and call netif_carrier_on()/off() on the virtual device depending on
> the slaves' states? 
>>
>>> I added the call before register_netdev(), which fixed the problem. However, if I then
>>> enable IPv6:
>>>
>>> # ip link set eth0 up
>>> ADDRCONF(NETDEV_UP): eth0: link is not ready
>>> eth0: link up (100/Full)
>>> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
>>
>> This looks normal.
> 
> Good, that narrows it down a bit.
> 
>>
>>> Any idea what is happening / what I'm doing wrong? (This is not just cosmetic; in some
>>> situations this seems to kill the interface - e.g. ping does not work, down/up does not
>>> help...) Things work fine without IPv6 configured.
>>
>> Perhaps some packets sent automatically by IPv6 are triggering a driver
>> bug?  Or there is a bug in multicast support, which IPv6 always uses.

Sorry, I have no clue on this topic. But I am eager to know if you find
something. I can queue your patch for netif_carrier_off() at least...

Best regards,
-- 
Nicolas Ferre

^ permalink raw reply

* Re: "ADDRCONF(NETDEV_UP): eth0: link is not ready" with IPv6
From: Ben Hutchings @ 2012-07-03 15:55 UTC (permalink / raw)
  To: Arvid Brodin
  Cc: netdev@vger.kernel.org, Alexey Kuznetsov, Stephen Hemminger,
	Nicolas Ferre
In-Reply-To: <4FF313F6.7010600@xdin.com>

On Tue, 2012-07-03 at 15:47 +0000, Arvid Brodin wrote:
> (Added MACB "patch" contact Nicolas Ferre to CC list.)
> 
> On 2012-06-29 17:24, Ben Hutchings wrote:
> > On Fri, 2012-06-29 at 02:36 +0000, Arvid Brodin wrote:
> >> Hi,
> >>
> >> After 'ip link set eth0 up' on an avr32 board (network driver macb), the device ends up in
> >> operational mode "UNKNOWN":
> >>
> >> # ip link
> >> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
> >>     link/ether 00:24:74:00:17:9d brd ff:ff:ff:ff:ff:ff
> >>
> >> Unplugging and plugging in the network cable gets the device to mode "UP".
> >>
> >> This is a problem for me because I'm trying to use this device as a "slave" device (for a
> >> virtual HSR device*) and I need to be able to decide if the slave device is operational or
> >> not.
> >>
> >> Following Stephen's advice here:
> >> http://kerneltrap.org/mailarchive/linux-netdev/2008/9/24/3398834 I checked the macb.c code
> >> and noticed they call netif_carrier_off() neither before register_netdev() nor in
> >> dev_open().
> > 
> > It should be called after register_netdev() and before the driver's
> > ndo_open implementation returns.
> 
> I'm guessing this allows linkwatch to do netif_carrier_on() some time after the dev_open()?

No, the driver is always responsible for calling
netif_carrier_{on,off}() in a timely manner.  link_watch takes care of
stopping the software TX queues if the link goes down.

> Besides not calling netif_carrier_off() in dev_open(), the Cadence/MACB driver calls
> netif_carrier_off() in dev_close(). Is this correct?

Unnecessary but harmless.

> How should I handle carrier state for a virtual device? The device should have "carrier"
> as long as at least one of the underlying physical interfaces is operational (which I
> guess means operational state UP). Would it be correct to watch NETDEV_CHANGE and DOWN/UP
> events of the slaves and call netif_carrier_on()/off() on the virtual device depending on
> the slaves' states?
[...]

That sounds about right.
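
The rule discussed above (the virtual device has carrier while at least one slave is operational) can be sketched as a small userspace model. This is only an illustration under stated assumptions, not code from the HSR patch: the types `slave` and `virt_dev`, the event enum, the two-slave limit, and the function names are all hypothetical, and a plain `carrier` flag stands in for where a real driver would call netif_carrier_on()/off() from its netdevice notifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define NUM_SLAVES 2	/* assumption: two underlying interfaces */

/* Stand-ins for the UP, DOWN and CHANGE notifications of a slave. */
enum slave_event { SLAVE_UP, SLAVE_DOWN, SLAVE_CHANGE };

struct slave {
	bool admin_up;	/* toggled by UP/DOWN events */
	bool link_ok;	/* link state last reported with UP/CHANGE */
};

struct virt_dev {
	struct slave slaves[NUM_SLAVES];
	bool carrier;	/* a driver would call netif_carrier_on()/off() */
};

static bool slave_operational(const struct slave *s)
{
	return s->admin_up && s->link_ok;
}

/* Carrier is on if at least one slave is operational. */
static void virt_update_carrier(struct virt_dev *vd)
{
	bool any = false;

	for (size_t i = 0; i < NUM_SLAVES; i++)
		any = any || slave_operational(&vd->slaves[i]);
	vd->carrier = any;
}

/* Record one slave event, then recompute the virtual carrier. */
static void virt_slave_event(struct virt_dev *vd, size_t idx,
			     enum slave_event ev, bool link_ok)
{
	assert(idx < NUM_SLAVES);

	switch (ev) {
	case SLAVE_UP:
		vd->slaves[idx].admin_up = true;
		vd->slaves[idx].link_ok = link_ok;
		break;
	case SLAVE_DOWN:
		vd->slaves[idx].admin_up = false;
		break;
	case SLAVE_CHANGE:
		vd->slaves[idx].link_ok = link_ok;
		break;
	}
	virt_update_carrier(vd);
}
```

Recomputing the state from all slaves on every event, rather than counting transitions, keeps the model correct even if an event is repeated or arrives for a slave whose state did not actually change.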

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: "ADDRCONF(NETDEV_UP): eth0: link is not ready" with IPv6
From: Arvid Brodin @ 2012-07-03 15:47 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev@vger.kernel.org, Alexey Kuznetsov, Stephen Hemminger,
	Nicolas Ferre
In-Reply-To: <1340983473.3066.6.camel@bwh-desktop.uk.solarflarecom.com>

(Added MACB "patch" contact Nicolas Ferre to CC list.)

On 2012-06-29 17:24, Ben Hutchings wrote:
> On Fri, 2012-06-29 at 02:36 +0000, Arvid Brodin wrote:
>> Hi,
>>
>> After 'ip link set eth0 up' on an avr32 board (network driver macb), the device ends up in
>> operational mode "UNKNOWN":
>>
>> # ip link
>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
>>     link/ether 00:24:74:00:17:9d brd ff:ff:ff:ff:ff:ff
>>
>> Unplugging and plugging in the network cable gets the device to mode "UP".
>>
>> This is a problem for me because I'm trying to use this device as a "slave" device (for a
>> virtual HSR device*) and I need to be able to decide if the slave device is operational or
>> not.
>>
>> Following Stephen's advice here:
>> http://kerneltrap.org/mailarchive/linux-netdev/2008/9/24/3398834 I checked the macb.c code
>> and noticed they call netif_carrier_off() neither before register_netdev() nor in
>> dev_open().
> 
> It should be called after register_netdev() and before the driver's
> ndo_open implementation returns.

I'm guessing this allows linkwatch to do netif_carrier_on() some time after the dev_open()?

Besides not calling netif_carrier_off() in dev_open(), the Cadence/MACB driver calls
netif_carrier_off() in dev_close(). Is this correct?


How should I handle carrier state for a virtual device? The device should have "carrier"
as long as at least one of the underlying physical interfaces is operational (which I
guess means operational state UP). Would it be correct to watch NETDEV_CHANGE and DOWN/UP
events of the slaves and call netif_carrier_on()/off() on the virtual device depending on
the slaves' states?


> 
>> I added the call before register_netdev(), which fixed the problem. However, if I then
>> enable IPv6:
>>
>> # ip link set eth0 up
>> ADDRCONF(NETDEV_UP): eth0: link is not ready
>> eth0: link up (100/Full)
>> ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> 
> This looks normal.

Good, that narrows it down a bit.

> 
>> Any idea what is happening / what I'm doing wrong? (This is not just cosmetic; in some
>> situations this seems to kill the interface - e.g. ping does not work, down/up does not
>> help...) Things work fine without IPv6 configured.
> 
> Perhaps some packets sent automatically by IPv6 are triggering a driver
> bug?  Or there is a bug in multicast support, which IPv6 always uses.
> 
> Ben.
> 
>> *N.B. I'm writing a driver for a network protocol called "High-availability Seamless
>> Redundancy".
> 


-- 
Arvid Brodin | Consultant (Linux)
XDIN AB | Jan Stenbecks Torg 17 | SE-164 40 Kista | Sweden | xdin.com


* RE: [PATCH 00/13] drivers: hv: kvp
From: KY Srinivasan @ 2012-07-03 15:35 UTC (permalink / raw)
  To: Stephen Hemminger, Olaf Hering
  Cc: apw@canonical.com, Greg KH, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, devel@linuxdriverproject.org
In-Reply-To: <642e6af6-fc6b-4c54-a4a6-f5bdd38512c7@tahiti.vyatta.com>



> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen.hemminger@vyatta.com]
> Sent: Tuesday, July 03, 2012 11:03 AM
> To: Olaf Hering
> Cc: Greg KH; apw@canonical.com; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org; KY Srinivasan
> Subject: Re: [PATCH 00/13] drivers: hv: kvp
> 
> 
> > On Mon, Jul 02, KY Srinivasan wrote:
> >
> > > While I toyed with your proposal, I feel it just pushes the problem
> > > out of the daemon code - we would still need to write distro
> > > specific
> > > scripts. If this approach is something that everybody is
> > > comfortable
> > > with, I can take a stab at implementing that.
> >
> > Until NetworkManager is feature complete and until every distro is
> > using
> > NetworkManager per default the kvp_daemon needs distro specific code
> > to
> > get and set network related settings.
> > Doing it with an external script will simplify debugging and changes
> > to
> > the code.
> 
> Although Network Manager is a good tool for what it does,
> it is not appropriate for every distro. It is overkill
> in embedded systems, and its GUI dependency makes it unmanageable
> on servers.

Thanks Stephen. I will retain the code that I currently have for the "GET" side and
I will implement a script as Olaf suggested that can be distro specific to implement
the SET operation.

Regards,

K. Y


* RE: [PATCH 00/13] drivers: hv: kvp
From: KY Srinivasan @ 2012-07-03 15:32 UTC (permalink / raw)
  To: Olaf Hering
  Cc: Greg KH, apw@canonical.com, devel@linuxdriverproject.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <20120703132049.GA10663@aepfle.de>



> -----Original Message-----
> From: Olaf Hering [mailto:olaf@aepfle.de]
> Sent: Tuesday, July 03, 2012 9:21 AM
> To: KY Srinivasan
> Cc: Greg KH; apw@canonical.com; devel@linuxdriverproject.org; linux-
> kernel@vger.kernel.org; netdev@vger.kernel.org
> Subject: Re: [PATCH 00/13] drivers: hv: kvp
> 
> On Mon, Jul 02, KY Srinivasan wrote:
> 
> > While I toyed with your proposal, I feel it just pushes the problem
> > out of the daemon code - we would still need to write distro specific
> > scripts. If this approach is something that everybody is comfortable
> > with, I can take a stab at implementing that.
> 
> Until NetworkManager is feature complete and until every distro is using
> NetworkManager per default the kvp_daemon needs distro specific code to
> get and set network related settings.
> Doing it with an external script will simplify debugging and changes to
> the code.

Fair enough. I will keep my current implementation of the GET operation as is since
it is distro independent. On the SET side, I will implement a script as you have suggested.

Regards,

K. Y

> 
> Olaf
> 
> 



