Netdev List


* Re: Fw: Bug in ks8851.c
From: Denis Kirjanov @ 2013-03-29 14:07 UTC (permalink / raw)
  To: Max.Nekludov
  Cc: Linus Torvalds, David S. Miller, Jiri Pirko, linus971,
	Linux Kernel Mailing List, Matt Renzelmann, Network Development,
	Stephen Boyd, Greg Ungerer, linux-arm-kernel
In-Reply-To: <OF2DFE75BF.1BA5B9C8-ON44257B3D.00492A11-44257B3D.0049472A@gb.elster.com>

Please respin this patch with a commit description.

On 3/29/13, Max.Nekludov@us.elster.com <Max.Nekludov@us.elster.com> wrote:
>
> Linus,
>
> I tried to send the mail to 'Ben Dooks <ben@simtec.co.uk>' but the address
> is dead now.
>> I assume you've tested it in practice?
> Yes, I'm running the modified code both in bootloader and Linux kernel on
> my board.
>
> Thanks,
> Max
>
>  Signed-off-by: Max Nekludov <Max.Nekludov@us.elster.com>
>  ---
>  drivers/net/ethernet/micrel/ks8851.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c
> index 33bcb63d56a2..8fb481252e2c 100644
> --- a/drivers/net/ethernet/micrel/ks8851.c
> +++ b/drivers/net/ethernet/micrel/ks8851.c
> @@ -528,7 +528,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
>  	for (; rxfc != 0; rxfc--) {
>  		rxh = ks8851_rdreg32(ks, KS_RXFHSR);
>  		rxstat = rxh & 0xffff;
> -		rxlen = rxh >> 16;
> +		rxlen = (rxh >> 16) & 0xfff;
>
>  		netif_dbg(ks, rx_status, ks->netdev,
>  			  "rx: stat 0x%04x, len 0x%04x\n", rxstat, rxlen);
>
>
>
>
> Max,
>  please cc the actual maintainers of the driver. The patch looks sane,
> though. I assume you've tested it in practice?
>
> You also seem to have based this on an ancient version, the code has
> long since moved from drivers/net/ks8851.c to
> drivers/net/ethernet/micrel/ks8851.c (back in June of 2011), and it's
> missing a sign-off from you.
>
> I'm attaching an updated patch for the rename/capitalization issue.
>
>      Linus
>
> On Thu, Mar 28, 2013 at 11:25 AM,  <Max.Nekludov@us.elster.com> wrote:
>>
>> According to the Datasheet (page 52):
>> 15-12 Reserved
>> 11-0 RXBC Receive Byte Count
>> This field indicates the present received frame byte size.
>>
>> I suppose the code has a bug:
>>                 rxh = ks8851_rdreg32(ks, KS_RXFHSR);
>>                 rxstat = rxh & 0xffff;
>>                 rxlen = rxh >> 16; // BUG!!! 0xFFF mask should be applied
>>
>> P.S.
>> Without the bit mask applied I saw rxlen equal to 15360, which is bigger than
>> the entire RX queue size (12KB).
>>
>> Thanks,
>> Max Nekludov
>>
>> From cb3199cee4490f98d6062e32a75ca377a32b55bc Mon Sep 17 00:00:00 2001
>> From: Max Neklyudov <macscomp@gmail.com>
>> Date: Tue, 26 Mar 2013 11:46:57 +0400
>> Subject: [PATCH] Fix bug in ks8851 driver
>>
>> ---
>>  drivers/net/ks8851.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ks8851.c b/drivers/net/ks8851.c
>> index 91a93cb..0dc03da 100644
>> --- a/drivers/net/ks8851.c
>> +++ b/drivers/net/ks8851.c
>> @@ -553,7 +553,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
>>         for (; rxfc != 0; rxfc--) {
>>                 rxh = ks8851_rdreg32(ks, KS_RXFHSR);
>>                 rxstat = rxh & 0xffff;
>> -               rxlen = rxh >> 16;
>> +               rxlen = (rxh >> 16) & 0xFFF;
>>
>>                 netif_dbg(ks, rx_status, ks->netdev,
>>                           "rx: stat 0x%04x, len 0x%04x\n", rxstat, rxlen);
>> --
>> 1.7.10.4
>>
>
>
> ______________________________________________________________________
> This email has been spam and virus checked by Elster IT Services.(See
> attached file: patch.diff)
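
For reference, the masking discussed in this thread can be sketched in plain userspace C. This is only an illustration, not the driver code: the names `rx_frame_hdr`, `split_rxfhsr` and `RX_BYTE_CNT_MASK` are hypothetical, and the layout (status in the low 16 bits, byte count in bits 11-0 of the high half-word, bits 15-12 reserved) is assumed from the datasheet excerpt quoted above.

```c
#include <stdint.h>

/* Hypothetical name: only bits 11-0 of the byte-count half-word are valid. */
#define RX_BYTE_CNT_MASK 0x0fffu

struct rx_frame_hdr {
	uint16_t status; /* low 16 bits of the 32-bit read */
	uint16_t length; /* byte count with reserved bits 15-12 cleared */
};

/* Split one 32-bit header read into status and masked length. */
static struct rx_frame_hdr split_rxfhsr(uint32_t rxh)
{
	struct rx_frame_hdr hdr;

	hdr.status = (uint16_t)(rxh & 0xffffu);
	hdr.length = (uint16_t)((rxh >> 16) & RX_BYTE_CNT_MASK);
	return hdr;
}
```

With a high half-word of 0x3c00 (15360, the value reported above), the helper yields a length of 0x0c00 (3072). Whether that was the real frame length on the reporter's board is not something the thread confirms.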


* Re: [PATCH] netlink: fix the warning introduced by netlink API replacement
From: Thomas Graf @ 2013-03-29 13:58 UTC (permalink / raw)
  To: Hong Zhiguo; +Cc: netdev, davem, stephen, brian.haley
In-Reply-To: <1364563360-3516-1-git-send-email-honkiko@gmail.com>

On 03/29/13 at 09:22pm, Hong Zhiguo wrote:
> Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
> ---
>  net/ieee802154/netlink.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ieee802154/netlink.c b/net/ieee802154/netlink.c
> index 9247252..91b0363 100644
> --- a/net/ieee802154/netlink.c
> +++ b/net/ieee802154/netlink.c
> @@ -65,7 +65,8 @@ struct sk_buff *ieee802154_nl_create(int flags, u8 req)
>  int ieee802154_nl_mcast(struct sk_buff *msg, unsigned int group)
>  {
>  	/* XXX: nlh is right at the start of msg */
> -	void *hdr = genlmsg_data(nlmsg_data(msg->data));
> +	struct nlmsghdr *nlh = (struct nlmsghdr *)msg->data;
> +	void *hdr = genlmsg_data(nlmsg_data(nlh));

You should be using nlmsg_hdr(), which would also allow dropping
the 'XXX' comment.

Obviously this was a partial API abuse that led to these warnings,
and the whole point of converting is to trigger such warnings instead
of silently accepting mistakes, but it would have been great to catch
this in the first place by compiling with allmodconfig + some random
configs.


* [PATCH net-next 5/6] openvswitch: Use ETH_ALEN to define ethernet addresses
From: Thomas Graf @ 2013-03-29 13:46 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev, dev
In-Reply-To: <cover.1364563346.git.tgraf@suug.ch>

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 include/linux/openvswitch.h | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index 67d6c7b..8b9d721 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -20,6 +20,7 @@
 #define _LINUX_OPENVSWITCH_H 1
 
 #include <linux/types.h>
+#include <linux/if_ether.h>
 
 /**
  * struct ovs_header - header for OVS Generic Netlink messages.
@@ -269,8 +270,8 @@ enum ovs_frag_type {
 #define OVS_FRAG_TYPE_MAX (__OVS_FRAG_TYPE_MAX - 1)
 
 struct ovs_key_ethernet {
-	__u8	 eth_src[6];
-	__u8	 eth_dst[6];
+	__u8	 eth_src[ETH_ALEN];
+	__u8	 eth_dst[ETH_ALEN];
 };
 
 struct ovs_key_ipv4 {
@@ -316,14 +317,14 @@ struct ovs_key_arp {
 	__be32 arp_sip;
 	__be32 arp_tip;
 	__be16 arp_op;
-	__u8   arp_sha[6];
-	__u8   arp_tha[6];
+	__u8   arp_sha[ETH_ALEN];
+	__u8   arp_tha[ETH_ALEN];
 };
 
 struct ovs_key_nd {
 	__u32 nd_target[4];
-	__u8  nd_sll[6];
-	__u8  nd_tll[6];
+	__u8  nd_sll[ETH_ALEN];
+	__u8  nd_tll[ETH_ALEN];
 };
 
 /**
-- 
1.7.11.7
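
The substitution in this patch is meant to be layout-neutral. A minimal userspace sketch of that claim, assuming ETH_ALEN is 6 as in <linux/if_ether.h>: the macro is defined locally so the snippet builds without kernel headers, and the `_old`/`_new` struct names are hypothetical, used only to compare the two spellings.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for the definition in <linux/if_ether.h>. */
#ifndef ETH_ALEN
#define ETH_ALEN 6
#endif

/* Hypothetical names: the layout before and after the patch. */
struct ovs_key_ethernet_old {
	uint8_t eth_src[6];
	uint8_t eth_dst[6];
};

struct ovs_key_ethernet_new {
	uint8_t eth_src[ETH_ALEN];
	uint8_t eth_dst[ETH_ALEN];
};

/* Compile-time checks (C11): size and member offset are unchanged. */
_Static_assert(sizeof(struct ovs_key_ethernet_old) ==
	       sizeof(struct ovs_key_ethernet_new), "size changed");
_Static_assert(offsetof(struct ovs_key_ethernet_old, eth_dst) ==
	       offsetof(struct ovs_key_ethernet_new, eth_dst),
	       "offset changed");
```

If either assertion failed, the build would stop, which is the point: the named constant documents intent without changing the netlink ABI.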


* [PATCH net-next 2/6] openvswitch: Use nla_memcpy() to memcpy() data from attributes
From: Thomas Graf @ 2013-03-29 13:46 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev, dev
In-Reply-To: <cover.1364563346.git.tgraf@suug.ch>

This is less error prone, as nla_memcpy() takes into account the
length of both the destination buffer and the source attribute, and
it documents when data is copied from an attribute.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/openvswitch/datapath.c | 2 +-
 net/openvswitch/flow.c     | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 5b58d16..bca63c8 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -672,7 +672,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 		goto err;
 	skb_reserve(packet, NET_IP_ALIGN);
 
-	memcpy(__skb_put(packet, len), nla_data(a[OVS_PACKET_ATTR_PACKET]), len);
+	nla_memcpy(__skb_put(packet, len), a[OVS_PACKET_ATTR_PACKET], len);
 
 	skb_reset_mac_header(packet);
 	eth = eth_hdr(packet);
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 3324868..cf9328b 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -211,7 +211,7 @@ struct sw_flow_actions *ovs_flow_actions_alloc(const struct nlattr *actions)
 		return ERR_PTR(-ENOMEM);
 
 	sfa->actions_len = actions_len;
-	memcpy(sfa->actions, nla_data(actions), actions_len);
+	nla_memcpy(sfa->actions, actions, actions_len);
 	return sfa;
 }
 
-- 
1.7.11.7
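
The bounded copy described in the commit message can be modelled in userspace roughly as follows. This is a sketch under assumptions, not the kernel implementation: every `model_*` name is hypothetical, the attribute is reduced to a 4-byte header followed by its payload, and the guard against a negative length is an addition of this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for struct nlattr: 4-byte header, then the payload. */
struct model_nlattr {
	uint16_t nla_len;  /* header length plus payload length */
	uint16_t nla_type;
};

#define MODEL_NLA_HDRLEN ((int)sizeof(struct model_nlattr))

/* Payload length in bytes. */
static int model_nla_len(const struct model_nlattr *nla)
{
	return nla->nla_len - MODEL_NLA_HDRLEN;
}

/* Pointer to the first payload byte. */
static const void *model_nla_data(const struct model_nlattr *nla)
{
	return (const unsigned char *)nla + MODEL_NLA_HDRLEN;
}

/* Copy at most count bytes, and never more than the payload holds. */
static int model_nla_memcpy(void *dest, const struct model_nlattr *src,
			    int count)
{
	int len = model_nla_len(src);
	int minlen = count < len ? count : len;

	if (minlen < 0) /* defensive guard added in this sketch */
		minlen = 0;
	memcpy(dest, model_nla_data(src), (size_t)minlen);
	return minlen;
}
```

Compared with an open-coded memcpy(dst, nla_data(attr), len), the caller passes the attribute itself, so the copy length is clamped by the attribute's own payload length rather than trusted from a separate variable.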


* [PATCH net-next 1/6] openvswitch: Specify the minimal length of OVS_PACKET_ATTR_PACKET in the policy
From: Thomas Graf @ 2013-03-29 13:46 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev, dev
In-Reply-To: <cover.1364563346.git.tgraf@suug.ch>

Specifying the minimal length in the policy makes it reusable
and documents the interface.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/openvswitch/datapath.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 8759265..5b58d16 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -662,8 +662,7 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info)
 
 	err = -EINVAL;
 	if (!a[OVS_PACKET_ATTR_PACKET] || !a[OVS_PACKET_ATTR_KEY] ||
-	    !a[OVS_PACKET_ATTR_ACTIONS] ||
-	    nla_len(a[OVS_PACKET_ATTR_PACKET]) < ETH_HLEN)
+	    !a[OVS_PACKET_ATTR_ACTIONS])
 		goto err;
 
 	len = nla_len(a[OVS_PACKET_ATTR_PACKET]);
@@ -744,7 +743,7 @@ err:
 }
 
 static const struct nla_policy packet_policy[OVS_PACKET_ATTR_MAX + 1] = {
-	[OVS_PACKET_ATTR_PACKET] = { .type = NLA_UNSPEC },
+	[OVS_PACKET_ATTR_PACKET] = { .len = ETH_HLEN },
 	[OVS_PACKET_ATTR_KEY] = { .type = NLA_NESTED },
 	[OVS_PACKET_ATTR_ACTIONS] = { .type = NLA_NESTED },
 };
-- 
1.7.11.7
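
To illustrate what moving the check into the policy means, here is a small userspace model. It is a sketch under assumptions: the `model_*` and `MODEL_*` names are hypothetical, and it assumes that a policy entry with only `.len` set is treated as a minimum payload length, with a too-short attribute rejected with -ERANGE during parsing. The real validation code should be consulted for the exact semantics.

```c
#include <errno.h>

/* Ethernet header length, mirroring ETH_HLEN in <linux/if_ether.h>. */
#define MODEL_ETH_HLEN 14

/* Hypothetical reduced policy entry: only the length field is modelled. */
struct model_policy {
	int len; /* 0 means "no minimum length requested" */
};

/* Reject attributes whose payload is shorter than the policy minimum. */
static int model_check_min_len(const struct model_policy *pt, int attrlen)
{
	if (pt->len && attrlen < pt->len)
		return -ERANGE;
	return 0;
}
```

With the minimum expressed this way, the handler no longer needs its own nla_len() comparison, which matches the hunk removed from ovs_packet_cmd_execute() above.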


* [PATCH net-next 6/6] openvswitch: Expose <linux/openvswitch.h> to userspace
From: Thomas Graf @ 2013-03-29 13:46 UTC (permalink / raw)
  To: Jesse Gross; +Cc: dev, netdev
In-Reply-To: <cover.1364563346.git.tgraf@suug.ch>

It contains the public netlink interface bits required by userspace to
make use of the interface.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 include/linux/openvswitch.h      | 433 +------------------------------------
 include/uapi/linux/Kbuild        |   1 +
 include/uapi/linux/openvswitch.h | 456 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 458 insertions(+), 432 deletions(-)
 create mode 100644 include/uapi/linux/openvswitch.h

diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h
index 8b9d721..e6b240b 100644
--- a/include/linux/openvswitch.h
+++ b/include/linux/openvswitch.h
@@ -19,437 +19,6 @@
 #ifndef _LINUX_OPENVSWITCH_H
 #define _LINUX_OPENVSWITCH_H 1
 
-#include <linux/types.h>
-#include <linux/if_ether.h>
-
-/**
- * struct ovs_header - header for OVS Generic Netlink messages.
- * @dp_ifindex: ifindex of local port for datapath (0 to make a request not
- * specific to a datapath).
- *
- * Attributes following the header are specific to a particular OVS Generic
- * Netlink family, but all of the OVS families use this header.
- */
-
-struct ovs_header {
-	int dp_ifindex;
-};
-
-/* Datapaths. */
-
-#define OVS_DATAPATH_FAMILY  "ovs_datapath"
-#define OVS_DATAPATH_MCGROUP "ovs_datapath"
-#define OVS_DATAPATH_VERSION 0x1
-
-enum ovs_datapath_cmd {
-	OVS_DP_CMD_UNSPEC,
-	OVS_DP_CMD_NEW,
-	OVS_DP_CMD_DEL,
-	OVS_DP_CMD_GET,
-	OVS_DP_CMD_SET
-};
-
-/**
- * enum ovs_datapath_attr - attributes for %OVS_DP_* commands.
- * @OVS_DP_ATTR_NAME: Name of the network device that serves as the "local
- * port".  This is the name of the network device whose dp_ifindex is given in
- * the &struct ovs_header.  Always present in notifications.  Required in
- * %OVS_DP_NEW requests.  May be used as an alternative to specifying
- * dp_ifindex in other requests (with a dp_ifindex of 0).
- * @OVS_DP_ATTR_UPCALL_PID: The Netlink socket in userspace that is initially
- * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
- * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
- * not be sent.
- * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
- * datapath.  Always present in notifications.
- *
- * These attributes follow the &struct ovs_header within the Generic Netlink
- * payload for %OVS_DP_* commands.
- */
-enum ovs_datapath_attr {
-	OVS_DP_ATTR_UNSPEC,
-	OVS_DP_ATTR_NAME,       /* name of dp_ifindex netdev */
-	OVS_DP_ATTR_UPCALL_PID, /* Netlink PID to receive upcalls */
-	OVS_DP_ATTR_STATS,      /* struct ovs_dp_stats */
-	__OVS_DP_ATTR_MAX
-};
-
-#define OVS_DP_ATTR_MAX (__OVS_DP_ATTR_MAX - 1)
-
-struct ovs_dp_stats {
-	__u64 n_hit;             /* Number of flow table matches. */
-	__u64 n_missed;          /* Number of flow table misses. */
-	__u64 n_lost;            /* Number of misses not sent to userspace. */
-	__u64 n_flows;           /* Number of flows present */
-};
-
-struct ovs_vport_stats {
-	__u64   rx_packets;		/* total packets received       */
-	__u64   tx_packets;		/* total packets transmitted    */
-	__u64   rx_bytes;		/* total bytes received         */
-	__u64   tx_bytes;		/* total bytes transmitted      */
-	__u64   rx_errors;		/* bad packets received         */
-	__u64   tx_errors;		/* packet transmit problems     */
-	__u64   rx_dropped;		/* no space in linux buffers    */
-	__u64   tx_dropped;		/* no space available in linux  */
-};
-
-/* Fixed logical ports. */
-#define OVSP_LOCAL      ((__u32)0)
-
-/* Packet transfer. */
-
-#define OVS_PACKET_FAMILY "ovs_packet"
-#define OVS_PACKET_VERSION 0x1
-
-enum ovs_packet_cmd {
-	OVS_PACKET_CMD_UNSPEC,
-
-	/* Kernel-to-user notifications. */
-	OVS_PACKET_CMD_MISS,    /* Flow table miss. */
-	OVS_PACKET_CMD_ACTION,  /* OVS_ACTION_ATTR_USERSPACE action. */
-
-	/* Userspace commands. */
-	OVS_PACKET_CMD_EXECUTE  /* Apply actions to a packet. */
-};
-
-/**
- * enum ovs_packet_attr - attributes for %OVS_PACKET_* commands.
- * @OVS_PACKET_ATTR_PACKET: Present for all notifications.  Contains the entire
- * packet as received, from the start of the Ethernet header onward.  For
- * %OVS_PACKET_CMD_ACTION, %OVS_PACKET_ATTR_PACKET reflects changes made by
- * actions preceding %OVS_ACTION_ATTR_USERSPACE, but %OVS_PACKET_ATTR_KEY is
- * the flow key extracted from the packet as originally received.
- * @OVS_PACKET_ATTR_KEY: Present for all notifications.  Contains the flow key
- * extracted from the packet as nested %OVS_KEY_ATTR_* attributes.  This allows
- * userspace to adapt its flow setup strategy by comparing its notion of the
- * flow key against the kernel's.
- * @OVS_PACKET_ATTR_ACTIONS: Contains actions for the packet.  Used
- * for %OVS_PACKET_CMD_EXECUTE.  It has nested %OVS_ACTION_ATTR_* attributes.
- * @OVS_PACKET_ATTR_USERDATA: Present for an %OVS_PACKET_CMD_ACTION
- * notification if the %OVS_ACTION_ATTR_USERSPACE action specified an
- * %OVS_USERSPACE_ATTR_USERDATA attribute, with the same length and content
- * specified there.
- *
- * These attributes follow the &struct ovs_header within the Generic Netlink
- * payload for %OVS_PACKET_* commands.
- */
-enum ovs_packet_attr {
-	OVS_PACKET_ATTR_UNSPEC,
-	OVS_PACKET_ATTR_PACKET,      /* Packet data. */
-	OVS_PACKET_ATTR_KEY,         /* Nested OVS_KEY_ATTR_* attributes. */
-	OVS_PACKET_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* attributes. */
-	OVS_PACKET_ATTR_USERDATA,    /* OVS_ACTION_ATTR_USERSPACE arg. */
-	__OVS_PACKET_ATTR_MAX
-};
-
-#define OVS_PACKET_ATTR_MAX (__OVS_PACKET_ATTR_MAX - 1)
-
-/* Virtual ports. */
-
-#define OVS_VPORT_FAMILY  "ovs_vport"
-#define OVS_VPORT_MCGROUP "ovs_vport"
-#define OVS_VPORT_VERSION 0x1
-
-enum ovs_vport_cmd {
-	OVS_VPORT_CMD_UNSPEC,
-	OVS_VPORT_CMD_NEW,
-	OVS_VPORT_CMD_DEL,
-	OVS_VPORT_CMD_GET,
-	OVS_VPORT_CMD_SET
-};
-
-enum ovs_vport_type {
-	OVS_VPORT_TYPE_UNSPEC,
-	OVS_VPORT_TYPE_NETDEV,   /* network device */
-	OVS_VPORT_TYPE_INTERNAL, /* network device implemented by datapath */
-	__OVS_VPORT_TYPE_MAX
-};
-
-#define OVS_VPORT_TYPE_MAX (__OVS_VPORT_TYPE_MAX - 1)
-
-/**
- * enum ovs_vport_attr - attributes for %OVS_VPORT_* commands.
- * @OVS_VPORT_ATTR_PORT_NO: 32-bit port number within datapath.
- * @OVS_VPORT_ATTR_TYPE: 32-bit %OVS_VPORT_TYPE_* constant describing the type
- * of vport.
- * @OVS_VPORT_ATTR_NAME: Name of vport.  For a vport based on a network device
- * this is the name of the network device.  Maximum length %IFNAMSIZ-1 bytes
- * plus a null terminator.
- * @OVS_VPORT_ATTR_OPTIONS: Vport-specific configuration information.
- * @OVS_VPORT_ATTR_UPCALL_PID: The Netlink socket in userspace that
- * OVS_PACKET_CMD_MISS upcalls will be directed to for packets received on
- * this port.  A value of zero indicates that upcalls should not be sent.
- * @OVS_VPORT_ATTR_STATS: A &struct ovs_vport_stats giving statistics for
- * packets sent or received through the vport.
- *
- * These attributes follow the &struct ovs_header within the Generic Netlink
- * payload for %OVS_VPORT_* commands.
- *
- * For %OVS_VPORT_CMD_NEW requests, the %OVS_VPORT_ATTR_TYPE and
- * %OVS_VPORT_ATTR_NAME attributes are required.  %OVS_VPORT_ATTR_PORT_NO is
- * optional; if not specified a free port number is automatically selected.
- * Whether %OVS_VPORT_ATTR_OPTIONS is required or optional depends on the type
- * of vport.
- * and other attributes are ignored.
- *
- * For other requests, if %OVS_VPORT_ATTR_NAME is specified then it is used to
- * look up the vport to operate on; otherwise dp_idx from the &struct
- * ovs_header plus %OVS_VPORT_ATTR_PORT_NO determine the vport.
- */
-enum ovs_vport_attr {
-	OVS_VPORT_ATTR_UNSPEC,
-	OVS_VPORT_ATTR_PORT_NO,	/* u32 port number within datapath */
-	OVS_VPORT_ATTR_TYPE,	/* u32 OVS_VPORT_TYPE_* constant. */
-	OVS_VPORT_ATTR_NAME,	/* string name, up to IFNAMSIZ bytes long */
-	OVS_VPORT_ATTR_OPTIONS, /* nested attributes, varies by vport type */
-	OVS_VPORT_ATTR_UPCALL_PID, /* u32 Netlink PID to receive upcalls */
-	OVS_VPORT_ATTR_STATS,	/* struct ovs_vport_stats */
-	__OVS_VPORT_ATTR_MAX
-};
-
-#define OVS_VPORT_ATTR_MAX (__OVS_VPORT_ATTR_MAX - 1)
-
-/* Flows. */
-
-#define OVS_FLOW_FAMILY  "ovs_flow"
-#define OVS_FLOW_MCGROUP "ovs_flow"
-#define OVS_FLOW_VERSION 0x1
-
-enum ovs_flow_cmd {
-	OVS_FLOW_CMD_UNSPEC,
-	OVS_FLOW_CMD_NEW,
-	OVS_FLOW_CMD_DEL,
-	OVS_FLOW_CMD_GET,
-	OVS_FLOW_CMD_SET
-};
-
-struct ovs_flow_stats {
-	__u64 n_packets;         /* Number of matched packets. */
-	__u64 n_bytes;           /* Number of matched bytes. */
-};
-
-enum ovs_key_attr {
-	OVS_KEY_ATTR_UNSPEC,
-	OVS_KEY_ATTR_ENCAP,	/* Nested set of encapsulated attributes. */
-	OVS_KEY_ATTR_PRIORITY,  /* u32 skb->priority */
-	OVS_KEY_ATTR_IN_PORT,   /* u32 OVS dp port number */
-	OVS_KEY_ATTR_ETHERNET,  /* struct ovs_key_ethernet */
-	OVS_KEY_ATTR_VLAN,	/* be16 VLAN TCI */
-	OVS_KEY_ATTR_ETHERTYPE,	/* be16 Ethernet type */
-	OVS_KEY_ATTR_IPV4,      /* struct ovs_key_ipv4 */
-	OVS_KEY_ATTR_IPV6,      /* struct ovs_key_ipv6 */
-	OVS_KEY_ATTR_TCP,       /* struct ovs_key_tcp */
-	OVS_KEY_ATTR_UDP,       /* struct ovs_key_udp */
-	OVS_KEY_ATTR_ICMP,      /* struct ovs_key_icmp */
-	OVS_KEY_ATTR_ICMPV6,    /* struct ovs_key_icmpv6 */
-	OVS_KEY_ATTR_ARP,       /* struct ovs_key_arp */
-	OVS_KEY_ATTR_ND,        /* struct ovs_key_nd */
-	OVS_KEY_ATTR_SKB_MARK,  /* u32 skb mark */
-	__OVS_KEY_ATTR_MAX
-};
-
-#define OVS_KEY_ATTR_MAX (__OVS_KEY_ATTR_MAX - 1)
-
-/**
- * enum ovs_frag_type - IPv4 and IPv6 fragment type
- * @OVS_FRAG_TYPE_NONE: Packet is not a fragment.
- * @OVS_FRAG_TYPE_FIRST: Packet is a fragment with offset 0.
- * @OVS_FRAG_TYPE_LATER: Packet is a fragment with nonzero offset.
- *
- * Used as the @ipv4_frag in &struct ovs_key_ipv4 and as @ipv6_frag &struct
- * ovs_key_ipv6.
- */
-enum ovs_frag_type {
-	OVS_FRAG_TYPE_NONE,
-	OVS_FRAG_TYPE_FIRST,
-	OVS_FRAG_TYPE_LATER,
-	__OVS_FRAG_TYPE_MAX
-};
-
-#define OVS_FRAG_TYPE_MAX (__OVS_FRAG_TYPE_MAX - 1)
-
-struct ovs_key_ethernet {
-	__u8	 eth_src[ETH_ALEN];
-	__u8	 eth_dst[ETH_ALEN];
-};
-
-struct ovs_key_ipv4 {
-	__be32 ipv4_src;
-	__be32 ipv4_dst;
-	__u8   ipv4_proto;
-	__u8   ipv4_tos;
-	__u8   ipv4_ttl;
-	__u8   ipv4_frag;	/* One of OVS_FRAG_TYPE_*. */
-};
-
-struct ovs_key_ipv6 {
-	__be32 ipv6_src[4];
-	__be32 ipv6_dst[4];
-	__be32 ipv6_label;	/* 20-bits in least-significant bits. */
-	__u8   ipv6_proto;
-	__u8   ipv6_tclass;
-	__u8   ipv6_hlimit;
-	__u8   ipv6_frag;	/* One of OVS_FRAG_TYPE_*. */
-};
-
-struct ovs_key_tcp {
-	__be16 tcp_src;
-	__be16 tcp_dst;
-};
-
-struct ovs_key_udp {
-	__be16 udp_src;
-	__be16 udp_dst;
-};
-
-struct ovs_key_icmp {
-	__u8 icmp_type;
-	__u8 icmp_code;
-};
-
-struct ovs_key_icmpv6 {
-	__u8 icmpv6_type;
-	__u8 icmpv6_code;
-};
-
-struct ovs_key_arp {
-	__be32 arp_sip;
-	__be32 arp_tip;
-	__be16 arp_op;
-	__u8   arp_sha[ETH_ALEN];
-	__u8   arp_tha[ETH_ALEN];
-};
-
-struct ovs_key_nd {
-	__u32 nd_target[4];
-	__u8  nd_sll[ETH_ALEN];
-	__u8  nd_tll[ETH_ALEN];
-};
-
-/**
- * enum ovs_flow_attr - attributes for %OVS_FLOW_* commands.
- * @OVS_FLOW_ATTR_KEY: Nested %OVS_KEY_ATTR_* attributes specifying the flow
- * key.  Always present in notifications.  Required for all requests (except
- * dumps).
- * @OVS_FLOW_ATTR_ACTIONS: Nested %OVS_ACTION_ATTR_* attributes specifying
- * the actions to take for packets that match the key.  Always present in
- * notifications.  Required for %OVS_FLOW_CMD_NEW requests, optional for
- * %OVS_FLOW_CMD_SET requests.
- * @OVS_FLOW_ATTR_STATS: &struct ovs_flow_stats giving statistics for this
- * flow.  Present in notifications if the stats would be nonzero.  Ignored in
- * requests.
- * @OVS_FLOW_ATTR_TCP_FLAGS: An 8-bit value giving the OR'd value of all of the
- * TCP flags seen on packets in this flow.  Only present in notifications for
- * TCP flows, and only if it would be nonzero.  Ignored in requests.
- * @OVS_FLOW_ATTR_USED: A 64-bit integer giving the time, in milliseconds on
- * the system monotonic clock, at which a packet was last processed for this
- * flow.  Only present in notifications if a packet has been processed for this
- * flow.  Ignored in requests.
- * @OVS_FLOW_ATTR_CLEAR: If present in a %OVS_FLOW_CMD_SET request, clears the
- * last-used time, accumulated TCP flags, and statistics for this flow.
- * Otherwise ignored in requests.  Never present in notifications.
- *
- * These attributes follow the &struct ovs_header within the Generic Netlink
- * payload for %OVS_FLOW_* commands.
- */
-enum ovs_flow_attr {
-	OVS_FLOW_ATTR_UNSPEC,
-	OVS_FLOW_ATTR_KEY,       /* Sequence of OVS_KEY_ATTR_* attributes. */
-	OVS_FLOW_ATTR_ACTIONS,   /* Nested OVS_ACTION_ATTR_* attributes. */
-	OVS_FLOW_ATTR_STATS,     /* struct ovs_flow_stats. */
-	OVS_FLOW_ATTR_TCP_FLAGS, /* 8-bit OR'd TCP flags. */
-	OVS_FLOW_ATTR_USED,      /* u64 msecs last used in monotonic time. */
-	OVS_FLOW_ATTR_CLEAR,     /* Flag to clear stats, tcp_flags, used. */
-	__OVS_FLOW_ATTR_MAX
-};
-
-#define OVS_FLOW_ATTR_MAX (__OVS_FLOW_ATTR_MAX - 1)
-
-/**
- * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE action.
- * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to sample with
- * @OVS_ACTION_ATTR_SAMPLE.  A value of 0 samples no packets, a value of
- * %UINT32_MAX samples all packets and intermediate values sample intermediate
- * fractions of packets.
- * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling event.
- * Actions are passed as nested attributes.
- *
- * Executes the specified actions with the given probability on a per-packet
- * basis.
- */
-enum ovs_sample_attr {
-	OVS_SAMPLE_ATTR_UNSPEC,
-	OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
-	OVS_SAMPLE_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* attributes. */
-	__OVS_SAMPLE_ATTR_MAX,
-};
-
-#define OVS_SAMPLE_ATTR_MAX (__OVS_SAMPLE_ATTR_MAX - 1)
-
-/**
- * enum ovs_userspace_attr - Attributes for %OVS_ACTION_ATTR_USERSPACE action.
- * @OVS_USERSPACE_ATTR_PID: u32 Netlink PID to which the %OVS_PACKET_CMD_ACTION
- * message should be sent.  Required.
- * @OVS_USERSPACE_ATTR_USERDATA: If present, its variable-length argument is
- * copied to the %OVS_PACKET_CMD_ACTION message as %OVS_PACKET_ATTR_USERDATA.
- */
-enum ovs_userspace_attr {
-	OVS_USERSPACE_ATTR_UNSPEC,
-	OVS_USERSPACE_ATTR_PID,	      /* u32 Netlink PID to receive upcalls. */
-	OVS_USERSPACE_ATTR_USERDATA,  /* Optional user-specified cookie. */
-	__OVS_USERSPACE_ATTR_MAX
-};
-
-#define OVS_USERSPACE_ATTR_MAX (__OVS_USERSPACE_ATTR_MAX - 1)
-
-/**
- * struct ovs_action_push_vlan - %OVS_ACTION_ATTR_PUSH_VLAN action argument.
- * @vlan_tpid: Tag protocol identifier (TPID) to push.
- * @vlan_tci: Tag control identifier (TCI) to push.  The CFI bit must be set
- * (but it will not be set in the 802.1Q header that is pushed).
- *
- * The @vlan_tpid value is typically %ETH_P_8021Q.  The only acceptable TPID
- * values are those that the kernel module also parses as 802.1Q headers, to
- * prevent %OVS_ACTION_ATTR_PUSH_VLAN followed by %OVS_ACTION_ATTR_POP_VLAN
- * from having surprising results.
- */
-struct ovs_action_push_vlan {
-	__be16 vlan_tpid;	/* 802.1Q TPID. */
-	__be16 vlan_tci;	/* 802.1Q TCI (VLAN ID and priority). */
-};
-
-/**
- * enum ovs_action_attr - Action types.
- *
- * @OVS_ACTION_ATTR_OUTPUT: Output packet to port.
- * @OVS_ACTION_ATTR_USERSPACE: Send packet to userspace according to nested
- * %OVS_USERSPACE_ATTR_* attributes.
- * @OVS_ACTION_ATTR_SET: Replaces the contents of an existing header.  The
- * single nested %OVS_KEY_ATTR_* attribute specifies a header to modify and its
- * value.
- * @OVS_ACTION_ATTR_PUSH_VLAN: Push a new outermost 802.1Q header onto the
- * packet.
- * @OVS_ACTION_ATTR_POP_VLAN: Pop the outermost 802.1Q header off the packet.
- * @OVS_ACTION_ATTR_SAMPLE: Probabilitically executes actions, as specified in
- * the nested %OVS_SAMPLE_ATTR_* attributes.
- *
- * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
- * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
- * type may not be changed.
- */
-
-enum ovs_action_attr {
-	OVS_ACTION_ATTR_UNSPEC,
-	OVS_ACTION_ATTR_OUTPUT,	      /* u32 port number. */
-	OVS_ACTION_ATTR_USERSPACE,    /* Nested OVS_USERSPACE_ATTR_*. */
-	OVS_ACTION_ATTR_SET,          /* One nested OVS_KEY_ATTR_*. */
-	OVS_ACTION_ATTR_PUSH_VLAN,    /* struct ovs_action_push_vlan. */
-	OVS_ACTION_ATTR_POP_VLAN,     /* No argument. */
-	OVS_ACTION_ATTR_SAMPLE,       /* Nested OVS_SAMPLE_ATTR_*. */
-	__OVS_ACTION_ATTR_MAX
-};
-
-#define OVS_ACTION_ATTR_MAX (__OVS_ACTION_ATTR_MAX - 1)
+#include <uapi/linux/openvswitch.h>
 
 #endif /* _LINUX_OPENVSWITCH_H */
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 5c8a1d2..d8fbc6a 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -285,6 +285,7 @@ header-y += nvram.h
 header-y += omap3isp.h
 header-y += omapfb.h
 header-y += oom.h
+header-y += openvswitch.h
 header-y += packet_diag.h
 header-y += param.h
 header-y += parport.h
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
new file mode 100644
index 0000000..405918d
--- /dev/null
+++ b/include/uapi/linux/openvswitch.h
@@ -0,0 +1,456 @@
+
+/*
+ * Copyright (c) 2007-2011 Nicira Networks.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ */
+
+#ifndef _UAPI__LINUX_OPENVSWITCH_H
+#define _UAPI__LINUX_OPENVSWITCH_H 1
+
+#include <linux/types.h>
+#include <linux/if_ether.h>
+
+/**
+ * struct ovs_header - header for OVS Generic Netlink messages.
+ * @dp_ifindex: ifindex of local port for datapath (0 to make a request not
+ * specific to a datapath).
+ *
+ * Attributes following the header are specific to a particular OVS Generic
+ * Netlink family, but all of the OVS families use this header.
+ */
+
+struct ovs_header {
+	int dp_ifindex;
+};
+
+/* Datapaths. */
+
+#define OVS_DATAPATH_FAMILY  "ovs_datapath"
+#define OVS_DATAPATH_MCGROUP "ovs_datapath"
+#define OVS_DATAPATH_VERSION 0x1
+
+enum ovs_datapath_cmd {
+	OVS_DP_CMD_UNSPEC,
+	OVS_DP_CMD_NEW,
+	OVS_DP_CMD_DEL,
+	OVS_DP_CMD_GET,
+	OVS_DP_CMD_SET
+};
+
+/**
+ * enum ovs_datapath_attr - attributes for %OVS_DP_* commands.
+ * @OVS_DP_ATTR_NAME: Name of the network device that serves as the "local
+ * port".  This is the name of the network device whose dp_ifindex is given in
+ * the &struct ovs_header.  Always present in notifications.  Required in
+ * %OVS_DP_NEW requests.  May be used as an alternative to specifying
+ * dp_ifindex in other requests (with a dp_ifindex of 0).
+ * @OVS_DP_ATTR_UPCALL_PID: The Netlink socket in userspace that is initially
+ * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
+ * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
+ * not be sent.
+ * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through the
+ * datapath.  Always present in notifications.
+ *
+ * These attributes follow the &struct ovs_header within the Generic Netlink
+ * payload for %OVS_DP_* commands.
+ */
+enum ovs_datapath_attr {
+	OVS_DP_ATTR_UNSPEC,
+	OVS_DP_ATTR_NAME,       /* name of dp_ifindex netdev */
+	OVS_DP_ATTR_UPCALL_PID, /* Netlink PID to receive upcalls */
+	OVS_DP_ATTR_STATS,      /* struct ovs_dp_stats */
+	__OVS_DP_ATTR_MAX
+};
+
+#define OVS_DP_ATTR_MAX (__OVS_DP_ATTR_MAX - 1)
+
+struct ovs_dp_stats {
+	__u64 n_hit;             /* Number of flow table matches. */
+	__u64 n_missed;          /* Number of flow table misses. */
+	__u64 n_lost;            /* Number of misses not sent to userspace. */
+	__u64 n_flows;           /* Number of flows present */
+};
+
+struct ovs_vport_stats {
+	__u64   rx_packets;		/* total packets received       */
+	__u64   tx_packets;		/* total packets transmitted    */
+	__u64   rx_bytes;		/* total bytes received         */
+	__u64   tx_bytes;		/* total bytes transmitted      */
+	__u64   rx_errors;		/* bad packets received         */
+	__u64   tx_errors;		/* packet transmit problems     */
+	__u64   rx_dropped;		/* no space in linux buffers    */
+	__u64   tx_dropped;		/* no space available in linux  */
+};
+
+/* Fixed logical ports. */
+#define OVSP_LOCAL      ((__u32)0)
+
+/* Packet transfer. */
+
+#define OVS_PACKET_FAMILY "ovs_packet"
+#define OVS_PACKET_VERSION 0x1
+
+enum ovs_packet_cmd {
+	OVS_PACKET_CMD_UNSPEC,
+
+	/* Kernel-to-user notifications. */
+	OVS_PACKET_CMD_MISS,    /* Flow table miss. */
+	OVS_PACKET_CMD_ACTION,  /* OVS_ACTION_ATTR_USERSPACE action. */
+
+	/* Userspace commands. */
+	OVS_PACKET_CMD_EXECUTE  /* Apply actions to a packet. */
+};
+
+/**
+ * enum ovs_packet_attr - attributes for %OVS_PACKET_* commands.
+ * @OVS_PACKET_ATTR_PACKET: Present for all notifications.  Contains the entire
+ * packet as received, from the start of the Ethernet header onward.  For
+ * %OVS_PACKET_CMD_ACTION, %OVS_PACKET_ATTR_PACKET reflects changes made by
+ * actions preceding %OVS_ACTION_ATTR_USERSPACE, but %OVS_PACKET_ATTR_KEY is
+ * the flow key extracted from the packet as originally received.
+ * @OVS_PACKET_ATTR_KEY: Present for all notifications.  Contains the flow key
+ * extracted from the packet as nested %OVS_KEY_ATTR_* attributes.  This allows
+ * userspace to adapt its flow setup strategy by comparing its notion of the
+ * flow key against the kernel's.
+ * @OVS_PACKET_ATTR_ACTIONS: Contains actions for the packet.  Used
+ * for %OVS_PACKET_CMD_EXECUTE.  It has nested %OVS_ACTION_ATTR_* attributes.
+ * @OVS_PACKET_ATTR_USERDATA: Present for an %OVS_PACKET_CMD_ACTION
+ * notification if the %OVS_ACTION_ATTR_USERSPACE action specified an
+ * %OVS_USERSPACE_ATTR_USERDATA attribute, with the same length and content
+ * specified there.
+ *
+ * These attributes follow the &struct ovs_header within the Generic Netlink
+ * payload for %OVS_PACKET_* commands.
+ */
+enum ovs_packet_attr {
+	OVS_PACKET_ATTR_UNSPEC,
+	OVS_PACKET_ATTR_PACKET,      /* Packet data. */
+	OVS_PACKET_ATTR_KEY,         /* Nested OVS_KEY_ATTR_* attributes. */
+	OVS_PACKET_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* attributes. */
+	OVS_PACKET_ATTR_USERDATA,    /* OVS_ACTION_ATTR_USERSPACE arg. */
+	__OVS_PACKET_ATTR_MAX
+};
+
+#define OVS_PACKET_ATTR_MAX (__OVS_PACKET_ATTR_MAX - 1)
+
+/* Virtual ports. */
+
+#define OVS_VPORT_FAMILY  "ovs_vport"
+#define OVS_VPORT_MCGROUP "ovs_vport"
+#define OVS_VPORT_VERSION 0x1
+
+enum ovs_vport_cmd {
+	OVS_VPORT_CMD_UNSPEC,
+	OVS_VPORT_CMD_NEW,
+	OVS_VPORT_CMD_DEL,
+	OVS_VPORT_CMD_GET,
+	OVS_VPORT_CMD_SET
+};
+
+enum ovs_vport_type {
+	OVS_VPORT_TYPE_UNSPEC,
+	OVS_VPORT_TYPE_NETDEV,   /* network device */
+	OVS_VPORT_TYPE_INTERNAL, /* network device implemented by datapath */
+	__OVS_VPORT_TYPE_MAX
+};
+
+#define OVS_VPORT_TYPE_MAX (__OVS_VPORT_TYPE_MAX - 1)
+
+/**
+ * enum ovs_vport_attr - attributes for %OVS_VPORT_* commands.
+ * @OVS_VPORT_ATTR_PORT_NO: 32-bit port number within datapath.
+ * @OVS_VPORT_ATTR_TYPE: 32-bit %OVS_VPORT_TYPE_* constant describing the type
+ * of vport.
+ * @OVS_VPORT_ATTR_NAME: Name of vport.  For a vport based on a network device
+ * this is the name of the network device.  Maximum length %IFNAMSIZ-1 bytes
+ * plus a null terminator.
+ * @OVS_VPORT_ATTR_OPTIONS: Vport-specific configuration information.
+ * @OVS_VPORT_ATTR_UPCALL_PID: The Netlink socket in userspace that
+ * OVS_PACKET_CMD_MISS upcalls will be directed to for packets received on
+ * this port.  A value of zero indicates that upcalls should not be sent.
+ * @OVS_VPORT_ATTR_STATS: A &struct ovs_vport_stats giving statistics for
+ * packets sent or received through the vport.
+ *
+ * These attributes follow the &struct ovs_header within the Generic Netlink
+ * payload for %OVS_VPORT_* commands.
+ *
+ * For %OVS_VPORT_CMD_NEW requests, the %OVS_VPORT_ATTR_TYPE and
+ * %OVS_VPORT_ATTR_NAME attributes are required.  %OVS_VPORT_ATTR_PORT_NO is
+ * optional; if not specified a free port number is automatically selected.
+ * Whether %OVS_VPORT_ATTR_OPTIONS is required or optional depends on the type
+ * of vport.
+ * Other attributes are ignored.
+ *
+ * For other requests, if %OVS_VPORT_ATTR_NAME is specified then it is used to
+ * look up the vport to operate on; otherwise dp_idx from the &struct
+ * ovs_header plus %OVS_VPORT_ATTR_PORT_NO determine the vport.
+ */
+enum ovs_vport_attr {
+	OVS_VPORT_ATTR_UNSPEC,
+	OVS_VPORT_ATTR_PORT_NO,	/* u32 port number within datapath */
+	OVS_VPORT_ATTR_TYPE,	/* u32 OVS_VPORT_TYPE_* constant. */
+	OVS_VPORT_ATTR_NAME,	/* string name, up to IFNAMSIZ bytes long */
+	OVS_VPORT_ATTR_OPTIONS, /* nested attributes, varies by vport type */
+	OVS_VPORT_ATTR_UPCALL_PID, /* u32 Netlink PID to receive upcalls */
+	OVS_VPORT_ATTR_STATS,	/* struct ovs_vport_stats */
+	__OVS_VPORT_ATTR_MAX
+};
+
+#define OVS_VPORT_ATTR_MAX (__OVS_VPORT_ATTR_MAX - 1)
+
+/* Flows. */
+
+#define OVS_FLOW_FAMILY  "ovs_flow"
+#define OVS_FLOW_MCGROUP "ovs_flow"
+#define OVS_FLOW_VERSION 0x1
+
+enum ovs_flow_cmd {
+	OVS_FLOW_CMD_UNSPEC,
+	OVS_FLOW_CMD_NEW,
+	OVS_FLOW_CMD_DEL,
+	OVS_FLOW_CMD_GET,
+	OVS_FLOW_CMD_SET
+};
+
+struct ovs_flow_stats {
+	__u64 n_packets;         /* Number of matched packets. */
+	__u64 n_bytes;           /* Number of matched bytes. */
+};
+
+enum ovs_key_attr {
+	OVS_KEY_ATTR_UNSPEC,
+	OVS_KEY_ATTR_ENCAP,	/* Nested set of encapsulated attributes. */
+	OVS_KEY_ATTR_PRIORITY,  /* u32 skb->priority */
+	OVS_KEY_ATTR_IN_PORT,   /* u32 OVS dp port number */
+	OVS_KEY_ATTR_ETHERNET,  /* struct ovs_key_ethernet */
+	OVS_KEY_ATTR_VLAN,	/* be16 VLAN TCI */
+	OVS_KEY_ATTR_ETHERTYPE,	/* be16 Ethernet type */
+	OVS_KEY_ATTR_IPV4,      /* struct ovs_key_ipv4 */
+	OVS_KEY_ATTR_IPV6,      /* struct ovs_key_ipv6 */
+	OVS_KEY_ATTR_TCP,       /* struct ovs_key_tcp */
+	OVS_KEY_ATTR_UDP,       /* struct ovs_key_udp */
+	OVS_KEY_ATTR_ICMP,      /* struct ovs_key_icmp */
+	OVS_KEY_ATTR_ICMPV6,    /* struct ovs_key_icmpv6 */
+	OVS_KEY_ATTR_ARP,       /* struct ovs_key_arp */
+	OVS_KEY_ATTR_ND,        /* struct ovs_key_nd */
+	OVS_KEY_ATTR_SKB_MARK,  /* u32 skb mark */
+	__OVS_KEY_ATTR_MAX
+};
+
+#define OVS_KEY_ATTR_MAX (__OVS_KEY_ATTR_MAX - 1)
+
+/**
+ * enum ovs_frag_type - IPv4 and IPv6 fragment type
+ * @OVS_FRAG_TYPE_NONE: Packet is not a fragment.
+ * @OVS_FRAG_TYPE_FIRST: Packet is a fragment with offset 0.
+ * @OVS_FRAG_TYPE_LATER: Packet is a fragment with nonzero offset.
+ *
+ * Used as the @ipv4_frag in &struct ovs_key_ipv4 and as @ipv6_frag in &struct
+ * ovs_key_ipv6.
+ */
+enum ovs_frag_type {
+	OVS_FRAG_TYPE_NONE,
+	OVS_FRAG_TYPE_FIRST,
+	OVS_FRAG_TYPE_LATER,
+	__OVS_FRAG_TYPE_MAX
+};
+
+#define OVS_FRAG_TYPE_MAX (__OVS_FRAG_TYPE_MAX - 1)
+
+struct ovs_key_ethernet {
+	__u8	 eth_src[ETH_ALEN];
+	__u8	 eth_dst[ETH_ALEN];
+};
+
+struct ovs_key_ipv4 {
+	__be32 ipv4_src;
+	__be32 ipv4_dst;
+	__u8   ipv4_proto;
+	__u8   ipv4_tos;
+	__u8   ipv4_ttl;
+	__u8   ipv4_frag;	/* One of OVS_FRAG_TYPE_*. */
+};
+
+struct ovs_key_ipv6 {
+	__be32 ipv6_src[4];
+	__be32 ipv6_dst[4];
+	__be32 ipv6_label;	/* 20-bits in least-significant bits. */
+	__u8   ipv6_proto;
+	__u8   ipv6_tclass;
+	__u8   ipv6_hlimit;
+	__u8   ipv6_frag;	/* One of OVS_FRAG_TYPE_*. */
+};
+
+struct ovs_key_tcp {
+	__be16 tcp_src;
+	__be16 tcp_dst;
+};
+
+struct ovs_key_udp {
+	__be16 udp_src;
+	__be16 udp_dst;
+};
+
+struct ovs_key_icmp {
+	__u8 icmp_type;
+	__u8 icmp_code;
+};
+
+struct ovs_key_icmpv6 {
+	__u8 icmpv6_type;
+	__u8 icmpv6_code;
+};
+
+struct ovs_key_arp {
+	__be32 arp_sip;
+	__be32 arp_tip;
+	__be16 arp_op;
+	__u8   arp_sha[ETH_ALEN];
+	__u8   arp_tha[ETH_ALEN];
+};
+
+struct ovs_key_nd {
+	__u32 nd_target[4];
+	__u8  nd_sll[ETH_ALEN];
+	__u8  nd_tll[ETH_ALEN];
+};
+
+/**
+ * enum ovs_flow_attr - attributes for %OVS_FLOW_* commands.
+ * @OVS_FLOW_ATTR_KEY: Nested %OVS_KEY_ATTR_* attributes specifying the flow
+ * key.  Always present in notifications.  Required for all requests (except
+ * dumps).
+ * @OVS_FLOW_ATTR_ACTIONS: Nested %OVS_ACTION_ATTR_* attributes specifying
+ * the actions to take for packets that match the key.  Always present in
+ * notifications.  Required for %OVS_FLOW_CMD_NEW requests, optional for
+ * %OVS_FLOW_CMD_SET requests.
+ * @OVS_FLOW_ATTR_STATS: &struct ovs_flow_stats giving statistics for this
+ * flow.  Present in notifications if the stats would be nonzero.  Ignored in
+ * requests.
+ * @OVS_FLOW_ATTR_TCP_FLAGS: An 8-bit value giving the OR'd value of all of the
+ * TCP flags seen on packets in this flow.  Only present in notifications for
+ * TCP flows, and only if it would be nonzero.  Ignored in requests.
+ * @OVS_FLOW_ATTR_USED: A 64-bit integer giving the time, in milliseconds on
+ * the system monotonic clock, at which a packet was last processed for this
+ * flow.  Only present in notifications if a packet has been processed for this
+ * flow.  Ignored in requests.
+ * @OVS_FLOW_ATTR_CLEAR: If present in a %OVS_FLOW_CMD_SET request, clears the
+ * last-used time, accumulated TCP flags, and statistics for this flow.
+ * Otherwise ignored in requests.  Never present in notifications.
+ *
+ * These attributes follow the &struct ovs_header within the Generic Netlink
+ * payload for %OVS_FLOW_* commands.
+ */
+enum ovs_flow_attr {
+	OVS_FLOW_ATTR_UNSPEC,
+	OVS_FLOW_ATTR_KEY,       /* Sequence of OVS_KEY_ATTR_* attributes. */
+	OVS_FLOW_ATTR_ACTIONS,   /* Nested OVS_ACTION_ATTR_* attributes. */
+	OVS_FLOW_ATTR_STATS,     /* struct ovs_flow_stats. */
+	OVS_FLOW_ATTR_TCP_FLAGS, /* 8-bit OR'd TCP flags. */
+	OVS_FLOW_ATTR_USED,      /* u64 msecs last used in monotonic time. */
+	OVS_FLOW_ATTR_CLEAR,     /* Flag to clear stats, tcp_flags, used. */
+	__OVS_FLOW_ATTR_MAX
+};
+
+#define OVS_FLOW_ATTR_MAX (__OVS_FLOW_ATTR_MAX - 1)
+
+/**
+ * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE action.
+ * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to sample with
+ * @OVS_ACTION_ATTR_SAMPLE.  A value of 0 samples no packets, a value of
+ * %UINT32_MAX samples all packets and intermediate values sample intermediate
+ * fractions of packets.
+ * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling event.
+ * Actions are passed as nested attributes.
+ *
+ * Executes the specified actions with the given probability on a per-packet
+ * basis.
+ */
+enum ovs_sample_attr {
+	OVS_SAMPLE_ATTR_UNSPEC,
+	OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */
+	OVS_SAMPLE_ATTR_ACTIONS,     /* Nested OVS_ACTION_ATTR_* attributes. */
+	__OVS_SAMPLE_ATTR_MAX,
+};
+
+#define OVS_SAMPLE_ATTR_MAX (__OVS_SAMPLE_ATTR_MAX - 1)
+
+/**
+ * enum ovs_userspace_attr - Attributes for %OVS_ACTION_ATTR_USERSPACE action.
+ * @OVS_USERSPACE_ATTR_PID: u32 Netlink PID to which the %OVS_PACKET_CMD_ACTION
+ * message should be sent.  Required.
+ * @OVS_USERSPACE_ATTR_USERDATA: If present, its variable-length argument is
+ * copied to the %OVS_PACKET_CMD_ACTION message as %OVS_PACKET_ATTR_USERDATA.
+ */
+enum ovs_userspace_attr {
+	OVS_USERSPACE_ATTR_UNSPEC,
+	OVS_USERSPACE_ATTR_PID,	      /* u32 Netlink PID to receive upcalls. */
+	OVS_USERSPACE_ATTR_USERDATA,  /* Optional user-specified cookie. */
+	__OVS_USERSPACE_ATTR_MAX
+};
+
+#define OVS_USERSPACE_ATTR_MAX (__OVS_USERSPACE_ATTR_MAX - 1)
+
+/**
+ * struct ovs_action_push_vlan - %OVS_ACTION_ATTR_PUSH_VLAN action argument.
+ * @vlan_tpid: Tag protocol identifier (TPID) to push.
+ * @vlan_tci: Tag control identifier (TCI) to push.  The CFI bit must be set
+ * (but it will not be set in the 802.1Q header that is pushed).
+ *
+ * The @vlan_tpid value is typically %ETH_P_8021Q.  The only acceptable TPID
+ * values are those that the kernel module also parses as 802.1Q headers, to
+ * prevent %OVS_ACTION_ATTR_PUSH_VLAN followed by %OVS_ACTION_ATTR_POP_VLAN
+ * from having surprising results.
+ */
+struct ovs_action_push_vlan {
+	__be16 vlan_tpid;	/* 802.1Q TPID. */
+	__be16 vlan_tci;	/* 802.1Q TCI (VLAN ID and priority). */
+};
+
+/**
+ * enum ovs_action_attr - Action types.
+ *
+ * @OVS_ACTION_ATTR_OUTPUT: Output packet to port.
+ * @OVS_ACTION_ATTR_USERSPACE: Send packet to userspace according to nested
+ * %OVS_USERSPACE_ATTR_* attributes.
+ * @OVS_ACTION_ATTR_SET: Replaces the contents of an existing header.  The
+ * single nested %OVS_KEY_ATTR_* attribute specifies a header to modify and its
+ * value.
+ * @OVS_ACTION_ATTR_PUSH_VLAN: Push a new outermost 802.1Q header onto the
+ * packet.
+ * @OVS_ACTION_ATTR_POP_VLAN: Pop the outermost 802.1Q header off the packet.
+ * @OVS_ACTION_ATTR_SAMPLE: Probabilistically executes actions, as specified in
+ * the nested %OVS_SAMPLE_ATTR_* attributes.
+ *
+ * Only a single header can be set with a single %OVS_ACTION_ATTR_SET.  Not all
+ * fields within a header are modifiable, e.g. the IPv4 protocol and fragment
+ * type may not be changed.
+ */
+
+enum ovs_action_attr {
+	OVS_ACTION_ATTR_UNSPEC,
+	OVS_ACTION_ATTR_OUTPUT,	      /* u32 port number. */
+	OVS_ACTION_ATTR_USERSPACE,    /* Nested OVS_USERSPACE_ATTR_*. */
+	OVS_ACTION_ATTR_SET,          /* One nested OVS_KEY_ATTR_*. */
+	OVS_ACTION_ATTR_PUSH_VLAN,    /* struct ovs_action_push_vlan. */
+	OVS_ACTION_ATTR_POP_VLAN,     /* No argument. */
+	OVS_ACTION_ATTR_SAMPLE,       /* Nested OVS_SAMPLE_ATTR_*. */
+	__OVS_ACTION_ATTR_MAX
+};
+
+#define OVS_ACTION_ATTR_MAX (__OVS_ACTION_ATTR_MAX - 1)
+
+#endif /* _LINUX_OPENVSWITCH_H */
-- 
1.7.11.7

* [PATCH net-next 4/6] openvswitch: Move common genl notify code into ovs_notify()
From: Thomas Graf @ 2013-03-29 13:46 UTC (permalink / raw)
  To: Jesse Gross; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <cover.1364563346.git.tgraf-G/eBtMaohhA@public.gmane.org>

Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
---
 net/openvswitch/datapath.c | 36 ++++++++++++++++--------------------
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 49ee37b..d406503 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -71,6 +71,13 @@ static int ovs_net_id __read_mostly;
 static void rehash_flow_table(struct work_struct *work);
 static DECLARE_DELAYED_WORK(rehash_flow_wq, rehash_flow_table);
 
+static void ovs_notify(struct sk_buff *skb, struct genl_info *info,
+		       struct genl_multicast_group *grp)
+{
+	genl_notify(skb, genl_info_net(info), info->snd_portid,
+		    grp->id, info->nlhdr, GFP_KERNEL);
+}
+
 /**
  * DOC: Locking:
  *
@@ -1061,9 +1068,7 @@ static int ovs_flow_cmd_new_or_set(struct sk_buff *skb, struct genl_info *info)
 	}
 
 	if (!IS_ERR(reply))
-		genl_notify(reply, genl_info_net(info), info->snd_portid,
-			   ovs_dp_flow_multicast_group.id, info->nlhdr,
-			   GFP_KERNEL);
+		ovs_notify(reply, info, &ovs_dp_flow_multicast_group);
 	else
 		netlink_set_err(sock_net(skb->sk)->genl_sock, 0,
 				ovs_dp_flow_multicast_group.id, PTR_ERR(reply));
@@ -1150,8 +1155,7 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct genl_info *info)
 
 	ovs_flow_deferred_free(flow);
 
-	genl_notify(reply, genl_info_net(info), info->snd_portid,
-		    ovs_dp_flow_multicast_group.id, info->nlhdr, GFP_KERNEL);
+	ovs_notify(reply, info, &ovs_dp_flow_multicast_group);
 	return 0;
 }
 
@@ -1383,9 +1387,7 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct genl_info *info)
 	list_add_tail(&dp->list_node, &ovs_net->dps);
 	rtnl_unlock();
 
-	genl_notify(reply, genl_info_net(info), info->snd_portid,
-		    ovs_dp_datapath_multicast_group.id, info->nlhdr,
-		    GFP_KERNEL);
+	ovs_notify(reply, info, &ovs_dp_datapath_multicast_group);
 	return 0;
 
 err_destroy_local_port:
@@ -1453,9 +1455,7 @@ static int ovs_dp_cmd_del(struct sk_buff *skb, struct genl_info *info)
 
 	__dp_destroy(dp);
 
-	genl_notify(reply, genl_info_net(info), info->snd_portid,
-		    ovs_dp_datapath_multicast_group.id, info->nlhdr,
-		    GFP_KERNEL);
+	ovs_notify(reply, info, &ovs_dp_datapath_multicast_group);
 
 	return 0;
 }
@@ -1479,9 +1479,7 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct genl_info *info)
 		return 0;
 	}
 
-	genl_notify(reply, genl_info_net(info), info->snd_portid,
-		    ovs_dp_datapath_multicast_group.id, info->nlhdr,
-		    GFP_KERNEL);
+	ovs_notify(reply, info, &ovs_dp_datapath_multicast_group);
 
 	return 0;
 }
@@ -1727,8 +1725,8 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		ovs_dp_detach_port(vport);
 		goto exit_unlock;
 	}
-	genl_notify(reply, genl_info_net(info), info->snd_portid,
-		    ovs_dp_vport_multicast_group.id, info->nlhdr, GFP_KERNEL);
+
+	ovs_notify(reply, info, &ovs_dp_vport_multicast_group);
 
 exit_unlock:
 	rtnl_unlock();
@@ -1769,8 +1767,7 @@ static int ovs_vport_cmd_set(struct sk_buff *skb, struct genl_info *info)
 		goto exit_unlock;
 	}
 
-	genl_notify(reply, genl_info_net(info), info->snd_portid,
-		    ovs_dp_vport_multicast_group.id, info->nlhdr, GFP_KERNEL);
+	ovs_notify(reply, info, &ovs_dp_vport_multicast_group);
 
 exit_unlock:
 	rtnl_unlock();
@@ -1804,8 +1801,7 @@ static int ovs_vport_cmd_del(struct sk_buff *skb, struct genl_info *info)
 	err = 0;
 	ovs_dp_detach_port(vport);
 
-	genl_notify(reply, genl_info_net(info), info->snd_portid,
-		    ovs_dp_vport_multicast_group.id, info->nlhdr, GFP_KERNEL);
+	ovs_notify(reply, info, &ovs_dp_vport_multicast_group);
 
 exit_unlock:
 	rtnl_unlock();
-- 
1.7.11.7

* [PATCH net-next 3/6] openvswitch: Refine Netlink message size calculation and kill FLOW_BUFSIZE
From: Thomas Graf @ 2013-03-29 13:46 UTC (permalink / raw)
  To: Jesse Gross; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <cover.1364563346.git.tgraf-G/eBtMaohhA@public.gmane.org>

Kills the FLOW_BUFSIZE constant which needs to be calculated manually
and replaces it with key_attr_size() based on nla_total_size().
Calculates the size of datapath messages instead of relying on
NLMSG_DEFAULT_SIZE and moves the existing message size calculations
into their own functions for clarity.

Signed-off-by: Thomas Graf <tgraf-G/eBtMaohhA@public.gmane.org>
---
 net/openvswitch/datapath.c | 76 +++++++++++++++++++++++++++++++---------------
 net/openvswitch/flow.h     | 21 -------------
 2 files changed, 52 insertions(+), 45 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index bca63c8..49ee37b 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -337,6 +337,35 @@ static int queue_gso_packets(struct net *net, int dp_ifindex,
 	return err;
 }
 
+static size_t key_attr_size(void)
+{
+	return    nla_total_size(4)   /* OVS_KEY_ATTR_PRIORITY */
+		+ nla_total_size(4)   /* OVS_KEY_ATTR_IN_PORT */
+		+ nla_total_size(4)   /* OVS_KEY_ATTR_SKB_MARK */
+		+ nla_total_size(12)  /* OVS_KEY_ATTR_ETHERNET */
+		+ nla_total_size(2)   /* OVS_KEY_ATTR_ETHERTYPE */
+		+ nla_total_size(4)   /* OVS_KEY_ATTR_8021Q */
+		+ nla_total_size(0)   /* OVS_KEY_ATTR_ENCAP */
+		+ nla_total_size(2)   /* OVS_KEY_ATTR_ETHERTYPE */
+		+ nla_total_size(40)  /* OVS_KEY_ATTR_IPV6 */
+		+ nla_total_size(2)   /* OVS_KEY_ATTR_ICMPV6 */
+		+ nla_total_size(28); /* OVS_KEY_ATTR_ND */
+}
+
+static size_t upcall_msg_size(const struct sk_buff *skb,
+			      const struct nlattr *userdata)
+{
+	size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
+		+ nla_total_size(skb->len) /* OVS_PACKET_ATTR_PACKET */
+		+ nla_total_size(key_attr_size()); /* OVS_PACKET_ATTR_KEY */
+
+	/* OVS_PACKET_ATTR_USERDATA */
+	if (userdata)
+		size += NLA_ALIGN(userdata->nla_len);
+
+	return size;
+}
+
 static int queue_userspace_packet(struct net *net, int dp_ifindex,
 				  struct sk_buff *skb,
 				  const struct dp_upcall_info *upcall_info)
@@ -345,7 +374,6 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 	struct sk_buff *nskb = NULL;
 	struct sk_buff *user_skb; /* to be queued to userspace */
 	struct nlattr *nla;
-	unsigned int len;
 	int err;
 
 	if (vlan_tx_tag_present(skb)) {
@@ -366,13 +394,7 @@ static int queue_userspace_packet(struct net *net, int dp_ifindex,
 		goto out;
 	}
 
-	len = sizeof(struct ovs_header);
-	len += nla_total_size(skb->len);
-	len += nla_total_size(FLOW_BUFSIZE);
-	if (upcall_info->userdata)
-		len += NLA_ALIGN(upcall_info->userdata->nla_len);
-
-	user_skb = genlmsg_new(len, GFP_ATOMIC);
+	user_skb = genlmsg_new(upcall_msg_size(skb, upcall_info->userdata), GFP_ATOMIC);
 	if (!user_skb) {
 		err = -ENOMEM;
 		goto out;
@@ -801,6 +823,16 @@ static struct genl_multicast_group ovs_dp_flow_multicast_group = {
 	.name = OVS_FLOW_MCGROUP
 };
 
+static size_t ovs_flow_cmd_msg_size(const struct sw_flow_actions *acts)
+{
+	return NLMSG_ALIGN(sizeof(struct ovs_header))
+		+ nla_total_size(key_attr_size()) /* OVS_FLOW_ATTR_KEY */
+		+ nla_total_size(sizeof(struct ovs_flow_stats)) /* OVS_FLOW_ATTR_STATS */
+		+ nla_total_size(1) /* OVS_FLOW_ATTR_TCP_FLAGS */
+		+ nla_total_size(8) /* OVS_FLOW_ATTR_USED */
+		+ nla_total_size(acts->actions_len); /* OVS_FLOW_ATTR_ACTIONS */
+}
+
 /* Called with genl_lock. */
 static int ovs_flow_cmd_fill_info(struct sw_flow *flow, struct datapath *dp,
 				  struct sk_buff *skb, u32 portid,
@@ -879,25 +911,11 @@ error:
 static struct sk_buff *ovs_flow_cmd_alloc_info(struct sw_flow *flow)
 {
 	const struct sw_flow_actions *sf_acts;
-	int len;
 
 	sf_acts = rcu_dereference_protected(flow->sf_acts,
 					    lockdep_genl_is_held());
 
-	/* OVS_FLOW_ATTR_KEY */
-	len = nla_total_size(FLOW_BUFSIZE);
-	/* OVS_FLOW_ATTR_ACTIONS */
-	len += nla_total_size(sf_acts->actions_len);
-	/* OVS_FLOW_ATTR_STATS */
-	len += nla_total_size(sizeof(struct ovs_flow_stats));
-	/* OVS_FLOW_ATTR_TCP_FLAGS */
-	len += nla_total_size(1);
-	/* OVS_FLOW_ATTR_USED */
-	len += nla_total_size(8);
-
-	len += NLMSG_ALIGN(sizeof(struct ovs_header));
-
-	return genlmsg_new(len, GFP_KERNEL);
+	return genlmsg_new(ovs_flow_cmd_msg_size(sf_acts), GFP_KERNEL);
 }
 
 static struct sk_buff *ovs_flow_cmd_build_info(struct sw_flow *flow,
@@ -1213,6 +1231,16 @@ static struct genl_multicast_group ovs_dp_datapath_multicast_group = {
 	.name = OVS_DATAPATH_MCGROUP
 };
 
+static size_t ovs_dp_cmd_msg_size(void)
+{
+	size_t msgsize = NLMSG_ALIGN(sizeof(struct ovs_header));
+
+	msgsize += nla_total_size(IFNAMSIZ);
+	msgsize += nla_total_size(sizeof(struct ovs_dp_stats));
+
+	return msgsize;
+}
+
 static int ovs_dp_cmd_fill_info(struct datapath *dp, struct sk_buff *skb,
 				u32 portid, u32 seq, u32 flags, u8 cmd)
 {
@@ -1251,7 +1279,7 @@ static struct sk_buff *ovs_dp_cmd_build_info(struct datapath *dp, u32 portid,
 	struct sk_buff *skb;
 	int retval;
 
-	skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	skb = genlmsg_new(ovs_dp_cmd_msg_size(), GFP_KERNEL);
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index a7bb60f..0875fde 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -138,27 +138,6 @@ int ovs_flow_extract(struct sk_buff *, u16 in_port, struct sw_flow_key *,
 void ovs_flow_used(struct sw_flow *, struct sk_buff *);
 u64 ovs_flow_used_time(unsigned long flow_jiffies);
 
-/* Upper bound on the length of a nlattr-formatted flow key.  The longest
- * nlattr-formatted flow key would be:
- *
- *                         struct  pad  nl hdr  total
- *                         ------  ---  ------  -----
- *  OVS_KEY_ATTR_PRIORITY      4    --     4      8
- *  OVS_KEY_ATTR_IN_PORT       4    --     4      8
- *  OVS_KEY_ATTR_SKB_MARK      4    --     4      8
- *  OVS_KEY_ATTR_ETHERNET     12    --     4     16
- *  OVS_KEY_ATTR_ETHERTYPE     2     2     4      8  (outer VLAN ethertype)
- *  OVS_KEY_ATTR_8021Q         4    --     4      8
- *  OVS_KEY_ATTR_ENCAP         0    --     4      4  (VLAN encapsulation)
- *  OVS_KEY_ATTR_ETHERTYPE     2     2     4      8  (inner VLAN ethertype)
- *  OVS_KEY_ATTR_IPV6         40    --     4     44
- *  OVS_KEY_ATTR_ICMPV6        2     2     4      8
- *  OVS_KEY_ATTR_ND           28    --     4     32
- *  -------------------------------------------------
- *  total                                       152
- */
-#define FLOW_BUFSIZE 152
-
 int ovs_flow_to_nlattrs(const struct sw_flow_key *, struct sk_buff *);
 int ovs_flow_from_nlattrs(struct sw_flow_key *swkey, int *key_lenp,
 		      const struct nlattr *);
-- 
1.7.11.7

* [PATCH net-next 0/6] Open vSwitch updates
From: Thomas Graf @ 2013-03-29 13:46 UTC (permalink / raw)
  To: Jesse Gross; +Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA

A couple of small Open vSwitch fixes and cleanups that accumulated
while working on larger pieces.

The biggest change is the exposure of <linux/openvswitch.h> to user
space to make the Netlink interface available without requiring every
application to copy the header file from a source tree.

Thomas Graf (6):
  openvswitch: Specify the minimal length of OVS_PACKET_ATTR_PACKET in
    the policy
  openvswitch: Use nla_memcpy() to memcpy() data from attributes
  openvswitch: Refine Netlink message size calculation and kill
    FLOW_BUFSIZE
  openvswitch: Move common genl notify code into ovs_notify()
  openvswitch: Use ETH_ALEN to define ethernet addresses
  openvswitch: Expose <linux/openvswitch.h> to userspace

 include/linux/openvswitch.h      | 432 +------------------------------------
 include/uapi/linux/Kbuild        |   1 +
 include/uapi/linux/openvswitch.h | 456 +++++++++++++++++++++++++++++++++++++++
 net/openvswitch/datapath.c       | 119 +++++-----
 net/openvswitch/flow.c           |   2 +-
 net/openvswitch/flow.h           |  21 --
 6 files changed, 530 insertions(+), 501 deletions(-)
 create mode 100644 include/uapi/linux/openvswitch.h

-- 
1.7.11.7

* Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
From: Eric Dumazet @ 2013-03-29 13:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Jiri Pirko, Andy Gospodarek, David S. Miller, LKML, netdev,
	Nicolas de Pesloüan, Thomas Gleixner, Guy Streeter,
	Paul E. McKenney, stephen
In-Reply-To: <1364563039.10629.19.camel@gandalf.local.home>

On Fri, 2013-03-29 at 09:17 -0400, Steven Rostedt wrote:

> I've thought about this too, but I wasn't sure we wanted two
> synchronize_*() functions, as the caller does a synchronize as well.
> That said, I think this is the more robust solution and it lets all
> rx_handler() functions assume that their rx_handler_data is set. And it
> removes the check from the fast path which outweighs an added
> synchronization in the slow path.
> 

Note that I used synchronize_net(), which does a
synchronize_rcu_expedited() when RTNL is locked, so it's normally quite
fast.

> Acked-by: Steven Rostedt <rostedt@goodmis.org>
> 
> Thanks!

Thanks a lot for your very detailed report and analysis!

* [PATCH] man: packet.7: document fanout, ring and auxiliary options
From: Willem de Bruijn @ 2013-03-29 13:29 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
	linux-man-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, kaber-dcUjhNyLwpNeoWH0uzbU5w,
	scott.a.mcmillan-ral2JQCrhuEAvxtiuMwx3w,
	johann.baudy-1YmjpbiIw0bR7s880joybQ,
	herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q, Willem de Bruijn
In-Reply-To: <CA+FuTScjf2nkPykOkscuWXMuUSfAhDtQGCBhByxcyJ_cSsOPcQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

The packet socket manual page does not list all socket options.

This patch adds descriptions of the common packet socket options
  PACKET_AUXDATA, PACKET_FANOUT, PACKET_RX_RING, PACKET_STATISTICS,
  PACKET_TX_RING

and the ring-specific options
  PACKET_LOSS, PACKET_RESERVE, PACKET_TIMESTAMP, PACKET_VERSION

It does not yet add descriptions for
  PACKET_COPY_THRESH, PACKET_HDRLEN, PACKET_ORIGDEV,
  PACKET_TX_HAS_OFF, PACKET_TX_TIMESTAMP, PACKET_VNET_HDR

It tries to balance being informative with exposing kernel detail
that is unlikely to be used by most readers or that may change
frequently. For implementation details, the manpage points to the
documentation in kernel Documentation/networking. Let me know if
options should be added or removed.

Source: PACKET_FANOUT, PACKET_RX_RING and PACKET_VERSION are in
tools/testing/selftests/net/psock_fanout.c in the latest Linux kernel source
tree. PACKET_STATISTICS was in the first version of that test.
PACKET_TX_RING I have used elsewhere. The other options are based
on reading kernel code.

If you are on the CC: list, then you are the author of one of
the commits referred to in this manpage. If you can, please
check whether my description of your change is correct. Thanks.

Signed-off-by: Willem de Bruijn <willemb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
---
 man7/packet.7 | 207 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 198 insertions(+), 9 deletions(-)

diff --git a/man7/packet.7 b/man7/packet.7
index 006f2ac..a84ebee 100644
--- a/man7/packet.7
+++ b/man7/packet.7
@@ -177,17 +177,22 @@ and
 .I sll_ifindex
 are used.
 .SS Socket options
+Packet socket options are configured by calling
+.BR setsockopt (2)
+with level
+.BR SOL_PACKET .
+.TP
+.BR PACKET_ADD_MEMBERSHIP
+.PD 0
+.TP
+.BR PACKET_DROP_MEMBERSHIP
+.PD
 Packet sockets can be used to configure physical layer multicasting
 and promiscuous mode.
-It works by calling
-.BR setsockopt (2)
-on a packet socket for
-.B SOL_PACKET
-and one of the options
 .B PACKET_ADD_MEMBERSHIP
-to add a binding or
+adds a binding and
 .B PACKET_DROP_MEMBERSHIP
-to drop it.
+drops it.
 They both expect a
 .B packet_mreq
 structure as argument:
@@ -227,11 +232,195 @@ In addition the traditional ioctls
 .BR SIOCADDMULTI ,
 .B SIOCDELMULTI
 can be used for the same purpose.
+.TP
+.BR PACKET_AUXDATA " (since Linux 2.6.21)"
+.\" commit 8dc4194474159660d7f37c495e3fc3f10d0db8cc
+If this binary option is enabled, the packet socket passes a metadata
+structure along with each packet in the
+.BR recvmsg (2)
+control field.
+The structure can be read with
+.BR cmsg (3).
+It is defined as
+
+.in +4n
+.nf
+struct tpacket_auxdata {
+    __u32 tp_status;
+    __u32 tp_len;      /* packet length */
+    __u32 tp_snaplen;  /* captured length */
+    __u16 tp_mac;
+    __u16 tp_net;
+    __u16 tp_vlan_tci;
+    __u16 tp_padding;
+};
+.fi
+.in
+
+.I tp_net
+stores the offset to the network layer.
+If the packet socket is of type
+.BR SOCK_DGRAM ,
+then
+.I tp_mac
+is the same.
+If it is of type
+.BR SOCK_RAW ,
+then that field stores the offset to the link layer frame.
+.TP
+.BR PACKET_FANOUT " (since Linux 3.1)"
+.\" commit dc99f600698dcac69b8f56dda9a8a00d645c5ffc
+To scale processing across threads, packet sockets can form a fanout
+group.
+In this mode, each matching packet is enqueued onto only one
+socket in the group.
+A socket joins a fanout group by calling
+.BR setsockopt (2)
+with level
+.B SOL_PACKET
+and option
+.BR PACKET_FANOUT .
+Each network namespace can have up to 65536 independent groups.
+A socket selects a group by encoding the ID in the low 16 bits of
+the integer option value.
+The first packet socket to join a group implicitly creates it.
+To successfully join an existing group, subsequent packet sockets
+must have the same protocol, device settings and fanout mode and
+flags (see below).
+Packet sockets can leave a fanout group only by closing the socket.
+The group is deleted when the last socket is closed.
+
+Fanout supports multiple algorithms to spread traffic between sockets.
+The default mode,
+.BR PACKET_FANOUT_HASH ,
+sends packets from the same flow to the same socket to maintain
+per-flow ordering.
+For each packet, it chooses a socket by taking the packet flow hash
+modulo the number of sockets in the group, where a flow hash is a hash
+over network layer address and optional transport layer port fields.
+The load balance mode
+.BR PACKET_FANOUT_LB
+implements a round-robin algorithm.
+.BR PACKET_FANOUT_CPU
+selects the socket based on the CPU that the packet arrived on.
+
+Fanout modes can take additional options.
+IP fragmentation causes packets from the same flow to have different
+flow hashes.
+The flag
+.BR PACKET_FANOUT_FLAG_DEFRAG ,
+if set, causes packets to be defragmented before fanout is applied, to
+preserve order even in this case.
+Fanout mode and options are communicated in the upper 16 bits of the
+integer option value.
+.TP
+.BR PACKET_LOSS " (with PACKET_TX_RING)"
+If set, do not silently drop a packet on transmission error, but
+return it with status set to
+.BR TP_STATUS_WRONG_FORMAT .
+.TP
+.BR PACKET_RESERVE " (with PACKET_RX_RING)"
+By default, a packet receive ring writes packets immediately following the
+metadata structure and alignment padding.
+This integer option reserves additional headroom.
+.TP
+.BR PACKET_RX_RING
+Create a memory mapped ring buffer for asynchronous packet reception.
+The packet socket reserves a contiguous region of application address
+space, lays it out into an array of packet slots and copies packets
+(up to
+.IR tp_snaplen )
+into subsequent slots.
+Each packet is preceded by a metadata structure similar to
+.IR tpacket_auxdata .
+Packet socket and application communicate the head and tail of the ring
+through the
+.I tp_status
+field.
+The packet socket owns all slots with status
+.BR TP_STATUS_KERNEL .
+After filling a slot, it changes the status of the slot to transfer
+ownership to the application.
+During normal operation, the new status is
+.BR TP_STATUS_USER ,
+to signal that a correctly received packet has been stored.
+When the application has finished processing a packet, it transfers
+ownership of the slot back to the socket by setting the status to
+.BR TP_STATUS_KERNEL .
+Packet sockets implement multiple variants of the packet ring.
+The implementation details are described in
+.IR Documentation/networking/packet_mmap.txt
+in the Linux kernel source tree.
+.TP
+.BR PACKET_STATISTICS
+Retrieve packet socket statistics in the form of a structure
+
+.in +4n
+.nf
+struct tpacket_stats {
+    __u32 tp_packets;  /* total packet count */
+    __u32 tp_drops;    /* dropped packet count */
+};
+.fi
+.in
+
+Receiving statistics resets the internal counters.
+The statistics structure differs when using a ring of variant
+.BR TPACKET_V3 .
+.TP
+.BR PACKET_TIMESTAMP " (with PACKET_RX_RING)"
+.\" commit 614f60fa9d73a9e8fdff3df83381907fea7c5649
+The packet receive ring always stores a timestamp in the metadata header.
+By default, this is a software timestamp generated when the
+packet is copied into the ring.
+This integer option selects the type of timestamp.
+Besides the default, it supports the two hardware formats described in
+.IR Documentation/networking/timestamping.txt
+in the Linux kernel source tree.
+.TP
+.BR PACKET_TX_RING " (since Linux 2.6.31)"
+.\" commit 69e3c75f4d541a6eb151b3ef91f34033cb3ad6e1
+Create a memory mapped ring buffer for packet transmission.
+This option is similar to
+.BR PACKET_RX_RING
+and takes the same arguments.
+The application writes packets into slots with status
+.BR TP_STATUS_AVAILABLE
+and schedules them for transmission by changing the status to
+.BR TP_STATUS_SEND_REQUEST .
+When packets are ready to be transmitted, the application calls
+.BR send (2)
+or a variant thereof.
+The
+.I buf
+and
+.I len
+fields of this call are ignored.
+If an address is passed using
+.BR sendto (2)
+or
+.BR sendmsg (2) ,
+then that overrides the socket default.
+On successful transmission, the socket resets the slot to
+.BR TP_STATUS_AVAILABLE .
+It discards packets silently on error unless
+.BR PACKET_LOSS
+is set.
+.TP
+.BR PACKET_VERSION " (with PACKET_RX_RING)"
+.\" commit bbd6ef87c544d88c30e4b762b1b61ef267a7d279
+By default,
+.BR PACKET_RX_RING
+creates a packet receive ring of variant
+.BR TPACKET_V1 .
+To create another variant, configure the desired variant by setting this
+integer option before creating the ring.
+
 .SS Ioctls
 .B SIOCGSTAMP
 can be used to receive the timestamp of the last received packet.
 Argument is a
-.I struct timeval.
+.I struct timeval .
 .\" FIXME Document SIOCGSTAMPNS
 
 In addition all standard ioctls defined in
@@ -318,7 +507,7 @@ header to get a fully conforming packet.
 Incoming 802.3 packets are not multiplexed on the DSAP/SSAP protocol
 fields; instead they are supplied to the user as protocol
 .B ETH_P_802_2
-with the LLC header prepended.
+with the LLC header prefixed.
 It is thus not possible to bind to
 .BR ETH_P_802_3 ;
 bind to
-- 
1.8.1.3

--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH] man: packet.7: document fanout, ring and auxiliary options
From: Willem de Bruijn @ 2013-03-29 13:25 UTC (permalink / raw)
  To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-man-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CAKgNAkgYG7_iSAg0zZYs4V4TbYaBpQmNFcV8=XBGwUfrzi1amA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Thu, Mar 28, 2013 at 6:01 AM, Michael Kerrisk (man-pages)
<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> Willem,
>
> Thanks for sending this patch. This all looks good and authoritative.
> Could I ask you to make a few small clean-ups and resubmit? See below.

Thanks for reviewing the patch, Michael. I will send the revised
version following this email.

> On Mon, Mar 18, 2013 at 6:13 PM, Willem de Bruijn <willemb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>> The packet socket manual page does not list all socket options.
>>
>> This patch adds descriptions of the common packet socket options
>>   PACKET_AUXDATA, PACKET_FANOUT, PACKET_RX_RING, PACKET_STATISTICS,
>>   PACKET_TX_RING
>>
>> and the ring-specific options
>>   PACKET_LOSS, PACKET_RESERVE, PACKET_TIMESTAMP, PACKET_VERSION
>>
>> It does not yet add descriptions for
>>   PACKET_COPY_THRESH, PACKET_HDRLEN, PACKET_ORIGDEV,
>>   PACKET_TX_HAS_OFF, PACKET_TX_TIMESTAMP, PACKET_VNET_HDR
>>
>> It tries to balance being informative with exposing kernel detail
>> that is unlikely to be used by most readers or that may change
>> frequently. For implementation details, the manpage points to the
>> documentation in kernel Documentation/networking. Let me know if
>> options should be added or removed.
>
> For the commit log message, could you just add a few lines for each of
> the options stating how you determined the information. Also, if there
> are specific individuals who could Ack the patch, please CC them and
> ask them if they might Ack the patch.

I will cc: the developers of the commits referenced in the man page.

>
>> Signed-off-by: Willem de Bruijn <willemb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>> ---
>>  man7/packet.7 | 183 +++++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 175 insertions(+), 8 deletions(-)
>>
>> diff --git a/man7/packet.7 b/man7/packet.7
>> index 006f2ac..a9cc168 100644
>> --- a/man7/packet.7
>> +++ b/man7/packet.7
>> @@ -177,17 +177,21 @@ and
>>  .I sll_ifindex
>>  are used.
>>  .SS Socket options
>> +Packet socket options are configured by calling
>> +. BR setsockopt (2)
>> +with level SOL_PACKET.
>
> +with level
> +.BR SOL_PACKET .
>
>> +.TP
>> +.BR PACKET_ADD_MEMBERSHIP
>> +.PD 0
>> +.TP
>> +.BR PACKET_DROP_MEMBERSHIP
>> +.PD
>>  Packet sockets can be used to configure physical layer multicasting
>>  and promiscuous mode.
>> -It works by calling
>> -.BR setsockopt (2)
>> -on a packet socket for
>> -.B SOL_PACKET
>> -and one of the options
>>  .B PACKET_ADD_MEMBERSHIP
>> -to add a binding or
>> +adds a binding and
>>  .B PACKET_DROP_MEMBERSHIP
>> -to drop it.
>> +drops it.
>>  They both expect a
>>  .B packet_mreq
>>  structure as argument:
>> @@ -227,6 +231,169 @@ In addition the traditional ioctls
>>  .BR SIOCADDMULTI ,
>>  .B SIOCDELMULTI
>>  can be used for the same purpose.
>> +.TP
>> +.BR PACKET_AUXDATA " (since Linux 2.6.21)"
>> +.\" commit 8dc419447
>
> It's great that you include these commit IDs, but I strongly prefer to
> have the full 40-char ID. Potentially useful one day for scripting,
> etc. Same comment for the instances below.
>
>> +If this binary option is enabled, the packet socket passes a metadata
>> +structure along with each packet in the
>> +.BR recvmsg (2)
>> +control field. The
>
> Please start new sentences on new source lines (see man-pages(7)).
> Same comment at numerous places below.
>
>
>> +structure can be read with
>> +.BR cmsg (3). It is defined as
>
> Formatting broken there. Start new line after the period.
>
>> +
>> +.in +4n
>> +.nf
>> +struct tpacket_auxdata {
>> +    __u32 tp_status;
>> +    __u32 tp_len;      /* packet length */
>> +    __u32 tp_snaplen;  /* captured length */
>> +    __u16 tp_mac;
>> +    __u16 tp_net;
>> +    __u16 tp_vlan_tci;
>> +    __u16 tp_padding;
>> +};
>> +.fi
>> +.in
>> +
>> +.B tp_net
>
> .I tp_net
>
>> +stores the offset to the network layer. If the packet socket is of type
>> +.BR SOCK_DGRAM ,
>> +then
>> +.B tp_mac
>> +is the same. If it is of type
>> +.B SOCK_RAW ,
>
> .BR SOCK_RAW ,
>
>> +then that stores the offset to the link layer frame.
>> +.TP
>> +.BR PACKET_FANOUT " (since Linux 3.1)"
>> +.\" commit dc99f6006
>> +To scale processing across threads, packet sockets can form a fanout
>> +group. In this mode, each matching packet is enqueued onto only one
>> +socket in the group. A socket joins a fanout group by calling
>> +.B setsockopt(2)
>> +with level SOL_PACKET and option PACKET_FANOUT.
>
> .B SOL_PACKET
> .BR PACKET_FANOUT .
>
>> +Each network namespace can have up to 65536 independent groups. A
>> +socket selects a group by encoding the ID in the first 16 bits of
>> +the integer option value. The first packet socket to join a group
>> +implicitly creates it. To successfully join an existing group,
>> +subsequent packet sockets must have the same
>> +protocol, device settings and fanout mode and flags (see below).
>> +Packet sockets can leave a fanout group only by closing the socket.
>> +The group is deleted when the last socket is closed.
>> +
>> +Fanout supports multiple algorithms to spread traffic between sockets.
>> +The default mode,
>> +. BR PACKET_FANOUT_HASH ,
>> +sends packets from the same flow to the same socket to maintain per-flow
>> +ordering. For each packet, it chooses a socket by taking the packet
>> +flow hash modulo the number of sockets in the group, where a flow hash
>> +is a hash over network layer address and optional transport layer port
>> +fields. The load balance mode
>> +. BR PACKET_FANOUT_LB
>> +implements a round robin algorithm.
>
> round-robin
>
>> +. BR PACKET_FANOUT_CPU
>> +selects the socket based on the cpu that the packet arrived on.
>
> CPU
>
>> +
>> +Fanout modes can take additional options. IP fragmentation causes packets
>> +from the same flow to have different flow hashes. The flag
>> +.BR PACKET_FANOUT_FLAG_DEFRAG ,
>> +if set, causes packet to be defragmented before fanout is applied, to
>> +preserve order even in this case. Fanout mode and options are communicated
>> +in the second 16 bits of the integer option value.
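
As an illustration of the option-value layout described here (group ID in
the low 16 bits, mode and flags in the high 16 bits), a minimal C sketch
follows. The ex_/EX_ names are made up for this example. The numeric values
mirror PACKET_FANOUT_HASH, PACKET_FANOUT_LB, PACKET_FANOUT_CPU and
PACKET_FANOUT_FLAG_DEFRAG from <linux/if_packet.h> and are repeated only so
that the snippet builds without kernel headers.

```c
#include <stdint.h>

/* Illustrative copies of the <linux/if_packet.h> values (assumed names). */
#define EX_FANOUT_HASH        0u       /* PACKET_FANOUT_HASH */
#define EX_FANOUT_LB          1u       /* PACKET_FANOUT_LB */
#define EX_FANOUT_CPU         2u       /* PACKET_FANOUT_CPU */
#define EX_FANOUT_FLAG_DEFRAG 0x8000u  /* PACKET_FANOUT_FLAG_DEFRAG */

/*
 * Hypothetical helper: pack the group ID (low 16 bits) and the
 * mode-plus-flags word (high 16 bits) into one option value.
 */
static uint32_t ex_fanout_arg(uint16_t group_id, uint16_t mode_and_flags)
{
    return (uint32_t)group_id | ((uint32_t)mode_and_flags << 16);
}
```

A caller would convert the result to int and pass it to setsockopt(2) with
level SOL_PACKET and option PACKET_FANOUT, for example with
ex_fanout_arg(42, EX_FANOUT_LB) to join group 42 in round-robin mode.
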
>> +.TP
>> +.BR PACKET_LOSS " (with PACKET_TX_RING)"
>> +If set, do not silently drop on transmission errors, but return the
>> +packet with status set to
>> +.BR TP_STATUS_WRONG_FORMAT
>> +.TP
>> +.BR PACKET_RESERVE " (with PACKET_RX_RING)"
>> +By default, a packet receive ring writes packets immediately following the
>> +metadata structure and alignment padding. This integer option reserves
>> +additional headroom.
>> +.TP
>> +.BR PACKET_RX_RING
>> +Create a memory mapped ring buffer for asynchronous packet reception.
>> +The packet socket reserves a contiguous region of application address
>> +space, lays it out into an array of packet slots and copies packets
>> +(up to snaplen)
>
> .IR tp_snaplen )
>
>> into subsequent slots. Each packet is preceded by a
>> +metadata structure similar to
>> +.B tpacket_auxdata.
>
> .IR tpacket_auxdata .
>
>> +Packet socket and application communicate the head and tail of the ring
>> +through the
>> +.B tp_status
>
> .I
>
>> +field. The packet socket owns all slots with status
>> +.BR TP_STATUS_KERNEL .
>> +After filling a slot, it changes the status of the slot to transfer
>> +ownership to the application. During normal operation, the new status is
>> +.BR TP_STATUS_USER ,
>> +to signal that a correctly received packet has been stored. When the
>> +application has finished processing a packet, it transfers ownership of
>> +the slot back to the socket by setting the status to
>> +.BR TP_STATUS_KERNEL .
>> +Packet sockets implement multiple
>> +variants of the packet ring. The implementation details are described in
>> +.IR Documentation/networking/packet_mmap.txt
>> +in the Linux kernel source tree.
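
The ownership handshake described for the receive ring could be sketched
roughly as below. This is a simplified model, not the real ring layout:
struct ex_slot and ex_ring_drain() are hypothetical, and the status values
mirror TP_STATUS_KERNEL (0) and TP_STATUS_USER (1) from
<linux/if_packet.h>. A real consumer would mmap(2) the ring, wait with
poll(2), use the tpacket header structures, and add the memory barriers
that packet_mmap.txt calls for.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_STATUS_KERNEL 0u  /* slot owned by the packet socket */
#define EX_STATUS_USER   1u  /* slot holds a packet for the application */

/* Hypothetical, simplified slot header; not the kernel's layout. */
struct ex_slot {
    uint32_t status;
    uint32_t len;
};

/*
 * Consume every slot that the socket has handed over, starting at *head.
 * Each processed slot is returned to the socket by resetting its status.
 * Returns the number of slots consumed and advances *head past them.
 */
static size_t ex_ring_drain(struct ex_slot *ring, size_t nslots, size_t *head)
{
    size_t done = 0;

    while (done < nslots && (ring[*head].status & EX_STATUS_USER)) {
        /* ... process ring[*head].len bytes of packet data here ... */
        ring[*head].status = EX_STATUS_KERNEL;  /* give the slot back */
        *head = (*head + 1) % nslots;
        done++;
    }
    return done;
}
```

The loop stops at the first slot still owned by the socket, so the
application never reads a slot that is being filled.
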
>> +.TP
>> +.BR PACKET_STATISTICS
>> +Retrieve packet socket statistics in the form of a structure
>> +
>> +.in +4n
>> +.nf
>> +struct tpacket_stats {
>> +    __u32 tp_packets;  /* total packet count */
>> +    __u32 tp_drops;    /* dropped packet count */
>> +};
>> +.fi
>> +.in
>> +
>> +Receiving statistics resets the internal counters. The exact statistics
>> +structure differs when using a ring of variant
>> +.BR TPACKET_V3 .
>> +.TP
>> +.BR PACKET_TIMESTAMP " (with PACKET_RX_RING)"
>> +The packet receive ring always stores a timestamp in the metadata header.
>> +By default, this is a software generated timestamp generated when the
>> +packet is copied into the ring. This integer option selects the type of
>> +timestamp. Besides the default, it support the two hardware formats
>> +described in
>> +.IR Documentation/networking/timestamping.txt
>> +in the Linux kernel source tree.
>> +.TP
>> +.BR PACKET_TX_RING " (since Linux 2.6.31)"
>> +.\" commit 69e3c75f4
>> +Create a memory mapped ring buffer for packet transmission. This option
>> +is similar to
>> +.BR PACKET_RX_RING
>> +and takes the same arguments. The application writes packets into slots
>> +with status
>> +.BR TP_STATUS_AVAILABLE
>> +and schedules them for transmission by changing the status to
>> +.BR TP_STATUS_SEND_REQUEST .
>> +When packets are ready to be transmitted, the application calls
>> +.BR send (2)
>> +Or a variant thereof. The
>
> s/Or/or/
>
>> +.B buf
>
> .I buf
>
>> +and
>> +.B len
>
> .I len
>
>> +fields of this call are ignored. If an address is passed using
>> +.BR sendto (2)
>> +or
>> +.BR sendmsg (2) ,
>> +then that overrides the socket default. On successful transmission, the
>> +socket resets the slot to
>> +.BR TP_STATUS_AVAILABLE .
>> +It discards packets silently on error unless
>> +.BR PACKET_LOSS
>> +is set.
>> +.TP
>> +.BR PACKET_VERSION " (with PACKET_RX_RING)"
>> +By default,
>> +.BR PACKET_RX_RING
>> +creates a packet receive ring of variant
>> +.BR TPACKET_V1 .
>> +To create another variant, configure the desired variant by setting this
>> +integer option before creating the ring.
>> +
>>  .SS Ioctls
>>  .B SIOCGSTAMP
>>  can be used to receive the timestamp of the last received packet.
>> @@ -318,7 +485,7 @@ header to get a fully conforming packet.
>>  Incoming 802.3 packets are not multiplexed on the DSAP/SSAP protocol
>>  fields; instead they are supplied to the user as protocol
>>  .B ETH_P_802_2
>> -with the LLC header prepended.
>> +with the LLC header prefixed.
>>  It is thus not possible to bind to
>>  .BR ETH_P_802_3 ;
>>  bind to
>
> Thanks,
>
> Michael
>
> --
> Michael Kerrisk
> Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
> Author of "The Linux Programming Interface"; http://man7.org/tlpi/

^ permalink raw reply

* [PATCH] netlink: fix the warning introduced by netlink API replacement
From: Hong Zhiguo @ 2013-03-29 13:22 UTC (permalink / raw)
  To: netdev; +Cc: davem, stephen, brian.haley, tgraf
In-Reply-To: <1364402824-32680-1-git-send-email-honkiko@gmail.com>

Signed-off-by: Hong Zhiguo <honkiko@gmail.com>
---
 net/ieee802154/netlink.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ieee802154/netlink.c b/net/ieee802154/netlink.c
index 9247252..91b0363 100644
--- a/net/ieee802154/netlink.c
+++ b/net/ieee802154/netlink.c
@@ -65,7 +65,8 @@ struct sk_buff *ieee802154_nl_create(int flags, u8 req)
 int ieee802154_nl_mcast(struct sk_buff *msg, unsigned int group)
 {
 	/* XXX: nlh is right at the start of msg */
-	void *hdr = genlmsg_data(nlmsg_data(msg->data));
+	struct nlmsghdr *nlh = (struct nlmsghdr *)msg->data;
+	void *hdr = genlmsg_data(nlmsg_data(nlh));
 
 	if (genlmsg_end(msg, hdr) < 0)
 		goto out;
@@ -98,7 +99,8 @@ struct sk_buff *ieee802154_nl_new_reply(struct genl_info *info,
 int ieee802154_nl_reply(struct sk_buff *msg, struct genl_info *info)
 {
 	/* XXX: nlh is right at the start of msg */
-	void *hdr = genlmsg_data(nlmsg_data(msg->data));
+	struct nlmsghdr *nlh = (struct nlmsghdr *)msg->data;
+	void *hdr = genlmsg_data(nlmsg_data(nlh));
 
 	if (genlmsg_end(msg, hdr) < 0)
 		goto out;
-- 
1.7.10.4

^ permalink raw reply related

* Fw: Bug in ks8851.c
From: Max.Nekludov @ 2013-03-29 13:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David S. Miller, Jiri Pirko, linus971, Linux Kernel Mailing List,
	Matt Renzelmann, Network Development, Stephen Boyd, Greg Ungerer,
	linux-arm-kernel

[-- Attachment #1: Type: text/plain, Size: 2972 bytes --]


Linus,

I tried to send the mail to 'Ben Dooks <ben@simtec.co.uk>' but the address
is dead now.
> I assume you've tested it in practice?
Yes, I'm running the modified code both in bootloader and Linux kernel on
my board.

Thanks,
Max

 Signed-off-by: Max Nekludov <Max.Nekludov@us.elster.com>
 ---
 drivers/net/ethernet/micrel/ks8851.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c
index 33bcb63d56a2..8fb481252e2c 100644
--- a/drivers/net/ethernet/micrel/ks8851.c
+++ b/drivers/net/ethernet/micrel/ks8851.c
@@ -528,7 +528,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
 	for (; rxfc != 0; rxfc--) {
 		rxh = ks8851_rdreg32(ks, KS_RXFHSR);
 		rxstat = rxh & 0xffff;
-		rxlen = rxh >> 16;
+		rxlen = (rxh >> 16) & 0xfff;

 		netif_dbg(ks, rx_status, ks->netdev,
 			  "rx: stat 0x%04x, len 0x%04x\n", rxstat, rxlen);




Max,
 please cc the actual maintainers of the driver. The patch looks sane,
though. I assume you've tested it in practice?

You also seem to have based this on an ancient version, the code has
long since moved from drivers/net/ks8851.c to
drivers/net/ethernet/micrel/ks8851.c (back in June of 2011), and it's
missing a sign-off from you.

I'm attaching an updated patch for the rename/capitalization issue.

     Linus

On Thu, Mar 28, 2013 at 11:25 AM,  <Max.Nekludov@us.elster.com> wrote:
>
> According to the Datasheet (page 52):
> 15-12 Reserved
> 11-0 RXBC Receive Byte Count
> This field indicates the present received frame byte size.
>
> I suppose the code has a bug:
>                 rxh = ks8851_rdreg32(ks, KS_RXFHSR);
>                 rxstat = rxh & 0xffff;
>                 rxlen = rxh >> 16; // BUG!!! 0xFFF mask should be applied
>
> P.S.
> without the bit mask applied I saw rxlen equal to 15360, which is bigger than
> the entire RX queue size (12KB).
>
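
To make the field layout concrete, here is a small stand-alone C sketch of
the decoding, assuming the layout used by the driver code above: status in
the low 16 bits of the 32-bit read, byte count in bits 11-0 of the high 16
bits, with bits 15-12 reserved. The ex_ helpers are hypothetical and are
not part of the driver.

```c
#include <stdint.h>

#define EX_RXBC_MASK 0x0fffu  /* bits 11-0: receive byte count */

/* Low 16 bits of the combined read: frame status. */
static uint16_t ex_rx_status(uint32_t rxh)
{
    return (uint16_t)(rxh & 0xffff);
}

/* High 16 bits taken as-is, reserved bits included (the old behaviour). */
static uint16_t ex_rx_len_raw(uint32_t rxh)
{
    return (uint16_t)(rxh >> 16);
}

/* High 16 bits with the reserved bits 15-12 masked off (the fix). */
static uint16_t ex_rx_len(uint32_t rxh)
{
    return (uint16_t)((rxh >> 16) & EX_RXBC_MASK);
}
```

For a made-up register value of 0x305e8000, the unmasked length is 0x305e
while the masked length is 0x05e, so any stray reserved bit inflates the
length by a multiple of 4096.
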
> Thanks,
> Max Nekludov
>
> From cb3199cee4490f98d6062e32a75ca377a32b55bc Mon Sep 17 00:00:00 2001
> From: Max Neklyudov <macscomp@gmail.com>
> Date: Tue, 26 Mar 2013 11:46:57 +0400
> Subject: [PATCH] Fix bug in ks8851 driver
>
> ---
>  drivers/net/ks8851.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ks8851.c b/drivers/net/ks8851.c
> index 91a93cb..0dc03da 100644
> --- a/drivers/net/ks8851.c
> +++ b/drivers/net/ks8851.c
> @@ -553,7 +553,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
>         for (; rxfc != 0; rxfc--) {
>                 rxh = ks8851_rdreg32(ks, KS_RXFHSR);
>                 rxstat = rxh & 0xffff;
> -               rxlen = rxh >> 16;
> +               rxlen = (rxh >> 16) & 0xFFF;
>
>                 netif_dbg(ks, rx_status, ks->netdev,
>                           "rx: stat 0x%04x, len 0x%04x\n", rxstat, rxlen);
> --
> 1.7.10.4
>


______________________________________________________________________
This email has been spam and virus checked by Elster IT Services.(See
attached file: patch.diff)

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 625 bytes --]

 drivers/net/ethernet/micrel/ks8851.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/micrel/ks8851.c b/drivers/net/ethernet/micrel/ks8851.c
index 33bcb63d56a2..8fb481252e2c 100644
--- a/drivers/net/ethernet/micrel/ks8851.c
+++ b/drivers/net/ethernet/micrel/ks8851.c
@@ -528,7 +528,7 @@ static void ks8851_rx_pkts(struct ks8851_net *ks)
 	for (; rxfc != 0; rxfc--) {
 		rxh = ks8851_rdreg32(ks, KS_RXFHSR);
 		rxstat = rxh & 0xffff;
-		rxlen = rxh >> 16;
+		rxlen = (rxh >> 16) & 0xfff;
 
 		netif_dbg(ks, rx_status, ks->netdev,
 			  "rx: stat 0x%04x, len 0x%04x\n", rxstat, rxlen);

^ permalink raw reply related

* Re: [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
From: Steven Rostedt @ 2013-03-29 13:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, Andy Gospodarek, David S. Miller, LKML, netdev,
	Nicolas de Pesloüan, Thomas Gleixner, Guy Streeter,
	Paul E. McKenney, stephen
In-Reply-To: <1364562082.5113.16.camel@edumazet-glaptop>

On Fri, 2013-03-29 at 06:01 -0700, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> On Fri, 2013-03-29 at 10:48 +0100, Jiri Pirko wrote:
> 
> > Hmm. I think that this might be issue introduced by:
> > commit a9b3cd7f323b2e57593e7215362a7b02fc933e3a
> > Author: Stephen Hemminger <shemminger@vyatta.com>
> > Date:   Mon Aug 1 16:19:00 2011 +0000
> > 
> >     rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER
> > 
> > 
> > Because, if rcu_dereference(dev->rx_handler) is null,
> > rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe
> > we are hitting following scenario:
> > 
> > 
> >    CPU0				CPU1
> >    ----				----
> >   			    dev->rx_handler_data = NULL
> >  rcu_read_lock()
> >  			    dev->rx_handler = NULL
> > 
> > 
> > CPU0 will see rx_handler set and yet, rx_handler_data nulled. Write
> > barrier in rcu_assign_pointer() might prevent this reorder from happening.
> > Therefore I suggest:
> > 
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 0caa38e..c16b829 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device *dev)
> >  {
> >  
> >  	ASSERT_RTNL();
> > -	RCU_INIT_POINTER(dev->rx_handler, NULL);
> > -	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
> > +	rcu_assign_pointer(dev->rx_handler, NULL);
> > +	rcu_assign_pointer(dev->rx_handler_data, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
> >  
> > 
> 
> Nope this changes nothing at all.

Exactly! In fact, the bug triggered on an older kernel that had the
original rcu_assign_pointer()

> 
> However, we can fix the bug in a different way, if we want to avoid a
> test in fast path.
> 
> With following patch, we can make sure that a reader seeing a non NULL
> rx_handler has a guarantee to see a non NULL rx_handler_data
> 

[..]

> We can fix this bug in two ways. The first is adding a test in
> bond_handle_frame() and others to check if rx_handler_data is NULL.
> 
> A second way is adding a synchronize_net() in
> netdev_rx_handler_unregister() to make sure that a rcu protected reader
> has the guarantee to see a non NULL rx_handler_data.
> 
> The second way is better as it avoids an extra test in fast path.
> 
> Reported-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jiri Pirko <jpirko@redhat.com>
> Cc: Paul E. McKenney <paulmck@us.ibm.com>
> ---
>  net/core/dev.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b13e5c7..56932a4 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3314,6 +3314,7 @@ int netdev_rx_handler_register(struct net_device *dev,
>  	if (dev->rx_handler)
>  		return -EBUSY;
>  
> +	/* Note: rx_handler_data must be set before rx_handler */
>  	rcu_assign_pointer(dev->rx_handler_data, rx_handler_data);
>  	rcu_assign_pointer(dev->rx_handler, rx_handler);
>  
> @@ -3334,6 +3335,11 @@ void netdev_rx_handler_unregister(struct net_device *dev)
>  
>  	ASSERT_RTNL();
>  	RCU_INIT_POINTER(dev->rx_handler, NULL);
> +	/* a reader seeing a non NULL rx_handler in a rcu_read_lock()
> +	 * section has a guarantee to see a non NULL rx_handler_data
> +	 * as well.
> +	 */
> +	synchronize_net();

I've thought about this too, but I wasn't sure we wanted two
synchronize_*() functions, as the caller does a synchronize as well.
That said, I think this is the more robust solution and it lets all
rx_handler() functions assume that their rx_handler_data is set. And it
removes the check from the fast path which outweighs an added
synchronization in the slow path.
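
For readers who want the ordering spelled out, a user-space sketch follows.
It only models the two-step teardown and is not kernel code: C11 atomics
stand in for the RCU primitives, all ex_ names are hypothetical, and the
grace period (synchronize_net() in the patch) is represented by whatever
the caller does between ex_unregister_begin() and ex_unregister_finish().

```c
#include <stdatomic.h>
#include <stddef.h>

typedef int (*ex_handler_t)(void *data);

/* Hypothetical stand-in for the two net_device fields involved. */
struct ex_dev {
    _Atomic(ex_handler_t) rx_handler;
    _Atomic(void *) rx_handler_data;
};

/* Stands in for bond_handle_frame(): dereferences data without a NULL test. */
static int ex_deref_handler(void *data)
{
    return *(int *)data;
}

/* Publish the data first, then the handler that will use it. */
static void ex_register(struct ex_dev *dev, ex_handler_t fn, void *data)
{
    atomic_store(&dev->rx_handler_data, data);
    atomic_store(&dev->rx_handler, fn);
}

/* Step 1: hide the handler so that new readers skip it. */
static void ex_unregister_begin(struct ex_dev *dev)
{
    atomic_store(&dev->rx_handler, (ex_handler_t)0);
}

/* Step 2: only after all old readers are done, clear the data. */
static void ex_unregister_finish(struct ex_dev *dev)
{
    atomic_store(&dev->rx_handler_data, (void *)0);
}

/* Reader side: returns -1 when no handler is installed. */
static int ex_receive(struct ex_dev *dev)
{
    ex_handler_t fn = atomic_load(&dev->rx_handler);

    if (!fn)
        return -1;
    return fn(atomic_load(&dev->rx_handler_data));
}
```

Between the two steps the data pointer is still valid, so a reader that
fetched the handler just before step 1 can finish safely. That is the
window the added synchronize_net() is meant to cover.
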

Acked-by: Steven Rostedt <rostedt@goodmis.org>

Thanks!

-- Steve

>  	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
>  }
>  EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
> 

^ permalink raw reply

* [PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()
From: Eric Dumazet @ 2013-03-29 13:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Steven Rostedt, Andy Gospodarek, David S. Miller, LKML, netdev,
	Nicolas de Pesloüan, Thomas Gleixner, Guy Streeter,
	Paul E. McKenney, stephen
In-Reply-To: <20130329094856.GB1677@minipsycho.orion>

From: Eric Dumazet <edumazet@google.com>

On Fri, 2013-03-29 at 10:48 +0100, Jiri Pirko wrote:

> Hmm. I think that this might be issue introduced by:
> commit a9b3cd7f323b2e57593e7215362a7b02fc933e3a
> Author: Stephen Hemminger <shemminger@vyatta.com>
> Date:   Mon Aug 1 16:19:00 2011 +0000
> 
>     rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER
> 
> 
> Because, if rcu_dereference(dev->rx_handler) is null,
> rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe
> we are hitting following scenario:
> 
> 
>    CPU0				CPU1
>    ----				----
>   			    dev->rx_handler_data = NULL
>  rcu_read_lock()
>  			    dev->rx_handler = NULL
> 
> 
> CPU0 will see rx_handler set and yet, rx_handler_data nulled. Write
> barrier in rcu_assign_pointer() might prevent this reorder from happening.
> Therefore I suggest:
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 0caa38e..c16b829 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device *dev)
>  {
>  
>  	ASSERT_RTNL();
> -	RCU_INIT_POINTER(dev->rx_handler, NULL);
> -	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
> +	rcu_assign_pointer(dev->rx_handler, NULL);
> +	rcu_assign_pointer(dev->rx_handler_data, NULL);
>  }
>  EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
>  
> 

Nope this changes nothing at all.

However, we can fix the bug in a different way, if we want to avoid a
test in fast path.

With the following patch, we can make sure that a reader seeing a non-NULL
rx_handler is guaranteed to see a non-NULL rx_handler_data.

Thanks

[PATCH] net: add a synchronize_net() in netdev_rx_handler_unregister()

commit 35d48903e97819 (bonding: fix rx_handler locking) added a race
in the bonding driver, reported by Steven Rostedt, who did a very good
diagnosis:

<quoting Steven>

I'm currently debugging a crash in an old 3.0-rt kernel that one of our
customers is seeing. The bug happens with a stress test that loads and
unloads the bonding module in a loop (I don't know all the details as
I'm not the one that is directly interacting with the customer). But the
bug looks to be something that may still be present and possibly present
in mainline too. It will just be much harder to trigger it in mainline.

In -rt, interrupts are threads, and can schedule in and out just like
any other thread. Note, mainline now supports interrupt threads so this
may be easily reproducible in mainline as well. I don't have the ability
to tell the customer to try mainline or other kernels, so my hands are
somewhat tied to what I can do.

But according to a core dump, I tracked down that the eth irq thread
crashed in bond_handle_frame() here:

        slave = bond_slave_get_rcu(skb->dev);
        bond = slave->bond; <--- BUG


the slave returned was NULL and accessing slave->bond caused a NULL
pointer dereference.

Looking at the code that unregisters the handler:

void netdev_rx_handler_unregister(struct net_device *dev)
{

        ASSERT_RTNL();
        RCU_INIT_POINTER(dev->rx_handler, NULL);
        RCU_INIT_POINTER(dev->rx_handler_data, NULL);
}

Which is basically:
        dev->rx_handler = NULL;
        dev->rx_handler_data = NULL;

And looking at __netif_receive_skb() we have:

        rx_handler = rcu_dereference(skb->dev->rx_handler);
        if (rx_handler) {
                if (pt_prev) {
                        ret = deliver_skb(skb, pt_prev, orig_dev);
                        pt_prev = NULL;
                }
                switch (rx_handler(&skb)) {

My question to all of you is, what stops this interrupt from happening
while the bonding module is unloading?  What happens if the interrupt
triggers and we have this:


        CPU0                    CPU1
        ----                    ----
  rx_handler = skb->dev->rx_handler

                        netdev_rx_handler_unregister() {
                           dev->rx_handler = NULL;
                           dev->rx_handler_data = NULL;

  rx_handler()
   bond_handle_frame() {
    slave = skb->dev->rx_handler_data;
    bond = slave->bond; <-- NULL pointer dereference!!!


What protection am I missing in the bond release handler that would
prevent the above from happening?

</quoting Steven>

We can fix this bug in two ways. The first is adding a test in
bond_handle_frame() and others to check if rx_handler_data is NULL.

A second way is adding a synchronize_net() in
netdev_rx_handler_unregister() to make sure that a rcu protected reader
has the guarantee to see a non NULL rx_handler_data.

The second way is better as it avoids an extra test in fast path.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
---
 net/core/dev.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index b13e5c7..56932a4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3314,6 +3314,7 @@ int netdev_rx_handler_register(struct net_device *dev,
 	if (dev->rx_handler)
 		return -EBUSY;
 
+	/* Note: rx_handler_data must be set before rx_handler */
 	rcu_assign_pointer(dev->rx_handler_data, rx_handler_data);
 	rcu_assign_pointer(dev->rx_handler, rx_handler);
 
@@ -3334,6 +3335,11 @@ void netdev_rx_handler_unregister(struct net_device *dev)
 
 	ASSERT_RTNL();
 	RCU_INIT_POINTER(dev->rx_handler, NULL);
+	/* a reader seeing a non NULL rx_handler in a rcu_read_lock()
+	 * section has a guarantee to see a non NULL rx_handler_data
+	 * as well.
+	 */
+	synchronize_net();
 	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
 }
 EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);

^ permalink raw reply related

* [PATCH] cirrus: cs89x0: remove two obsolete Kconfig macros
From: Paul Bolle @ 2013-03-29 10:51 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

The CONFIG_ARCH_IXDP2X01 and CONFIG_MACH_IXDP2351 Kconfig macros are
unused since the ixp23xx and ixp2000 platforms were removed in v3.5. So
remove the last code still depending on these macros. And since
CS89x0_NONISA_IRQ was only set if either of these two macros was defined
we can also remove that macro and the code depending on it.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
---
Compile tested only, for 32 bit x86. (Which was slightly more work than
I expected, since Fedora - which I happen to run - has ISA disabled as
of Fedora 17. And I use their configuration as base for my local setups.
Do other distributions also disable ISA by default?)

 drivers/net/ethernet/cirrus/cs89x0.c | 54 +-----------------------------------
 1 file changed, 1 insertion(+), 53 deletions(-)

diff --git a/drivers/net/ethernet/cirrus/cs89x0.c b/drivers/net/ethernet/cirrus/cs89x0.c
index 1384469..8f90eca 100644
--- a/drivers/net/ethernet/cirrus/cs89x0.c
+++ b/drivers/net/ethernet/cirrus/cs89x0.c
@@ -101,23 +101,6 @@ static char version[] __initdata =
  * them to system IRQ numbers. This mapping is card specific and is set to
  * the configuration of the Cirrus Eval board for this chip.
  */
-#if defined(CONFIG_MACH_IXDP2351)
-#define CS89x0_NONISA_IRQ
-static unsigned int netcard_portlist[] __used __initdata = {
-	IXDP2351_VIRT_CS8900_BASE, 0
-};
-static unsigned int cs8900_irq_map[] = {
-	IRQ_IXDP2351_CS8900, 0, 0, 0
-};
-#elif defined(CONFIG_ARCH_IXDP2X01)
-#define CS89x0_NONISA_IRQ
-static unsigned int netcard_portlist[] __used __initdata = {
-	IXDP2X01_CS8900_VIRT_BASE, 0
-};
-static unsigned int cs8900_irq_map[] = {
-	IRQ_IXDP2X01_CS8900, 0, 0, 0
-};
-#else
 #ifndef CONFIG_CS89x0_PLATFORM
 static unsigned int netcard_portlist[] __used __initdata = {
 	0x300, 0x320, 0x340, 0x360, 0x200, 0x220, 0x240,
@@ -127,7 +110,6 @@ static unsigned int cs8900_irq_map[] = {
 	10, 11, 12, 5
 };
 #endif
-#endif
 
 #if DEBUGGING
 static unsigned int net_debug = DEBUGGING;
@@ -210,32 +192,6 @@ static int __init media_fn(char *str)
 __setup("cs89x0_media=", media_fn);
 #endif
 
-#if defined(CONFIG_MACH_IXDP2351)
-static u16
-readword(unsigned long base_addr, int portno)
-{
-	return __raw_readw(base_addr + (portno << 1));
-}
-
-static void
-writeword(unsigned long base_addr, int portno, u16 value)
-{
-	__raw_writew(value, base_addr + (portno << 1));
-}
-#elif defined(CONFIG_ARCH_IXDP2X01)
-static u16
-readword(unsigned long base_addr, int portno)
-{
-	return __raw_readl(base_addr + (portno << 1));
-}
-
-static void
-writeword(unsigned long base_addr, int portno, u16 value)
-{
-	__raw_writel(value, base_addr + (portno << 1));
-}
-#endif
-
 static void readwords(struct net_local *lp, int portno, void *buf, int length)
 {
 	u8 *buf8 = (u8 *)buf;
@@ -908,7 +864,7 @@ net_open(struct net_device *dev)
 			goto bad_out;
 		}
 	} else {
-#if !defined(CS89x0_NONISA_IRQ) && !defined(CONFIG_CS89x0_PLATFORM)
+#if !defined(CONFIG_CS89x0_PLATFORM)
 		if (((1 << dev->irq) & lp->irq_map) == 0) {
 			pr_err("%s: IRQ %d is not in our map of allowable IRQs, which is %x\n",
 			       dev->name, dev->irq, lp->irq_map);
@@ -1321,9 +1277,7 @@ static const struct net_device_ops net_ops = {
 static void __init reset_chip(struct net_device *dev)
 {
 #if !defined(CONFIG_MACH_MX31ADS)
-#if !defined(CS89x0_NONISA_IRQ)
 	struct net_local *lp = netdev_priv(dev);
-#endif /* CS89x0_NONISA_IRQ */
 	int reset_start_time;
 
 	writereg(dev, PP_SelfCTL, readreg(dev, PP_SelfCTL) | POWER_ON_RESET);
@@ -1331,7 +1285,6 @@ static void __init reset_chip(struct net_device *dev)
 	/* wait 30 ms */
 	msleep(30);
 
-#if !defined(CS89x0_NONISA_IRQ)
 	if (lp->chip_type != CS8900) {
 		/* Hardware problem requires PNP registers to be reconfigured after a reset */
 		iowrite16(PP_CS8920_ISAINT, lp->virt_addr + ADD_PORT);
@@ -1344,7 +1297,6 @@ static void __init reset_chip(struct net_device *dev)
 		iowrite8((dev->mem_start >> 8) & 0xff,
 			 lp->virt_addr + DATA_PORT + 1);
 	}
-#endif /* CS89x0_NONISA_IRQ */
 
 	/* Wait until the chip is reset */
 	reset_start_time = jiffies;
@@ -1579,9 +1531,6 @@ cs89x0_probe1(struct net_device *dev, void __iomem *ioaddr, int modular)
 		i = lp->isa_config & INT_NO_MASK;
 #ifndef CONFIG_CS89x0_PLATFORM
 		if (lp->chip_type == CS8900) {
-#ifdef CS89x0_NONISA_IRQ
-			i = cs8900_irq_map[0];
-#else
 			/* Translate the IRQ using the IRQ mapping table. */
 			if (i >= ARRAY_SIZE(cs8900_irq_map))
 				pr_err("invalid ISA interrupt number %d\n", i);
@@ -1599,7 +1548,6 @@ cs89x0_probe1(struct net_device *dev, void __iomem *ioaddr, int modular)
 					lp->irq_map = ((irq_map_buff[0] >> 8) |
 						       (irq_map_buff[1] << 8));
 			}
-#endif
 		}
 #endif
 		if (!dev->irq)
-- 
1.7.11.7


* [PATCH net-next 1/1] MAINTAINERS: Update netxen_nic maintainers list
From: Manish Chopra @ 2013-03-29  9:54 UTC (permalink / raw)
  To: davem; +Cc: netdev, Dept_NX_Linux_NIC_Driver

o Add myself to netxen_nic maintainers list

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
---
 MAINTAINERS |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d32cb8d..3082c49 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5563,6 +5563,7 @@ F:	include/uapi/linux/if_*
 F:	include/uapi/linux/netdevice.h
 
 NETXEN (1/10) GbE SUPPORT
+M:	Manish Chopra <manish.chopra@qlogic.com>
 M:	Sony Chacko <sony.chacko@qlogic.com>
 M:	Rajesh Borundia <rajesh.borundia@qlogic.com>
 L:	netdev@vger.kernel.org
-- 
1.7.1


* Re: [BUG] Crash with NULL pointer dereference in bond_handle_frame in -rt (possibly mainline)
From: Jiri Pirko @ 2013-03-29  9:48 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Steven Rostedt, Andy Gospodarek, David S. Miller, LKML, netdev,
	Nicolas de Pesloüan, Thomas Gleixner, Guy Streeter,
	Paul E. McKenney, stephen
In-Reply-To: <1364491792.15753.47.camel@edumazet-glaptop>

Thu, Mar 28, 2013 at 06:29:52PM CET, eric.dumazet@gmail.com wrote:
>On Thu, 2013-03-28 at 13:16 -0400, Steven Rostedt wrote:
>> Hi,
>> 
>> I'm currently debugging a crash in an old 3.0-rt kernel that one of our
>> customers is seeing. The bug happens with a stress test that loads and
>> unloads the bonding module in a loop (I don't know all the details as
>> I'm not the one that is directly interacting with the customer). But the
>> bug looks to be something that may still be present and possibly present
>> in mainline too. It will just be much harder to trigger it in mainline.
>> 
>> In -rt, interrupts are threads, and can schedule in and out just like
>> any other thread. Note, mainline now supports interrupt threads so this
>> may be easily reproducible in mainline as well. I don't have the ability
>> to tell the customer to try mainline or other kernels, so my hands are
>> somewhat tied to what I can do.
>> 
>> But according to a core dump, I tracked down that the eth irq thread
>> crashed in bond_handle_frame() here:
>> 
>> 	slave = bond_slave_get_rcu(skb->dev);
>> 	bond = slave->bond; <--- BUG
>> 
>> 
>> the slave returned was NULL and accessing slave->bond caused a NULL
>> pointer dereference.
>> 
>> Looking at the code that unregisters the handler:
>> 
>> void netdev_rx_handler_unregister(struct net_device *dev)
>> {
>> 
>>         ASSERT_RTNL();
>>         RCU_INIT_POINTER(dev->rx_handler, NULL);
>>         RCU_INIT_POINTER(dev->rx_handler_data, NULL);
>> }
>> 
>> Which is basically:
>> 	dev->rx_handler = NULL;
>> 	dev->rx_handler_data = NULL;
>> 
>> And looking at __netif_receive_skb() we have:
>> 
>>         rx_handler = rcu_dereference(skb->dev->rx_handler);
>>         if (rx_handler) {
>>                 if (pt_prev) {
>>                         ret = deliver_skb(skb, pt_prev, orig_dev);
>>                         pt_prev = NULL;
>>                 }
>>                 switch (rx_handler(&skb)) {
>> 
>> My question to all of you is, what stops this interrupt from happening
>> while the bonding module is unloading?  What happens if the interrupt
>> triggers and we have this:
>> 
>> 
>> 	CPU0			CPU1
>> 	----			----
>>   rx_handler = skb->dev->rx_handler
>> 
>> 			netdev_rx_handler_unregister() {
>> 			   dev->rx_handler = NULL;
>> 			   dev->rx_handler_data = NULL;
>> 
>>   rx_handler()
>>    bond_handle_frame() {
>>     slave = skb->dev->rx_handler;
>>     bond = slave->bond; <-- NULL pointer dereference!!!
>> 
>> 
>> What protection am I missing in the bond release handler that would
>> prevent the above from happening?


Hmm. I think this might be an issue introduced by:
commit a9b3cd7f323b2e57593e7215362a7b02fc933e3a
Author: Stephen Hemminger <shemminger@vyatta.com>
Date:   Mon Aug 1 16:19:00 2011 +0000

    rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER


Because if rcu_dereference(dev->rx_handler) is NULL,
rcu_dereference(dev->rx_handler_data) is never done. Therefore I believe
we are hitting the following scenario:


   CPU0				CPU1
   ----				----
  			    dev->rx_handler_data = NULL
 rcu_read_lock()
 			    dev->rx_handler = NULL


CPU0 will see rx_handler still set, yet rx_handler_data already NULL. The
write barrier in rcu_assign_pointer() might prevent this reordering from
happening. Therefore I suggest:

diff --git a/net/core/dev.c b/net/core/dev.c
index 0caa38e..c16b829 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3332,8 +3332,8 @@ void netdev_rx_handler_unregister(struct net_device *dev)
 {
 
 	ASSERT_RTNL();
-	RCU_INIT_POINTER(dev->rx_handler, NULL);
-	RCU_INIT_POINTER(dev->rx_handler_data, NULL);
+	rcu_assign_pointer(dev->rx_handler, NULL);
+	rcu_assign_pointer(dev->rx_handler_data, NULL);
 }
 EXPORT_SYMBOL_GPL(netdev_rx_handler_unregister);
 

>
>Nothing :(
>
>bug introduced in commit 35d48903e9781975e823b359ee85c257c9ff5c1c
>(bonding: fix rx_handler locking)
>
>CC Jiri
>
>Fix seems simple :
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 6bbd90e..7956ca5 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1457,6 +1457,8 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
> 	*pskb = skb;
> 
> 	slave = bond_slave_get_rcu(skb->dev);
>+	if (!slave)
>+		return ret;
> 	bond = slave->bond;
> 
> 	if (bond->params.arp_interval)
>
>
>
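
The defensive NULL check proposed in the quoted fix can be illustrated in
user space. What follows is only a minimal sketch using C11 atomics: the
names demo_dev, demo_slave, demo_receive and friends are hypothetical
stand-ins, not kernel APIs, and the sketch does not model RCU grace
periods. It only shows that a handler which checks its data pointer
survives the window in which the handler is still visible but the data
has already been cleared.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real API. */
struct demo_slave { int bond_id; };
struct demo_dev;
typedef int (*demo_rx_handler_t)(struct demo_dev *dev);

struct demo_dev {
	_Atomic(demo_rx_handler_t) rx_handler;
	_Atomic(struct demo_slave *) rx_handler_data;
};

enum { DEMO_RX_PASS = -1 };

static void demo_dev_init(struct demo_dev *dev)
{
	atomic_init(&dev->rx_handler, (demo_rx_handler_t)NULL);
	atomic_init(&dev->rx_handler_data, (struct demo_slave *)NULL);
}

/* Handler: tolerate rx_handler_data having been cleared already. */
static int demo_handle_frame(struct demo_dev *dev)
{
	struct demo_slave *slave =
		atomic_load_explicit(&dev->rx_handler_data, memory_order_acquire);

	if (!slave)
		return DEMO_RX_PASS;
	return slave->bond_id;
}

static void demo_register(struct demo_dev *dev, demo_rx_handler_t h,
			  struct demo_slave *s)
{
	/* Publish the data before the handler that will use it. */
	atomic_store_explicit(&dev->rx_handler_data, s, memory_order_release);
	atomic_store_explicit(&dev->rx_handler, h, memory_order_release);
}

static void demo_unregister(struct demo_dev *dev)
{
	/* Hide the handler first, then clear its data. */
	atomic_store_explicit(&dev->rx_handler, (demo_rx_handler_t)NULL,
			      memory_order_release);
	atomic_store_explicit(&dev->rx_handler_data, (struct demo_slave *)NULL,
			      memory_order_release);
}

/* Receive path: sample the handler once, then call it if present. */
static int demo_receive(struct demo_dev *dev)
{
	demo_rx_handler_t h =
		atomic_load_explicit(&dev->rx_handler, memory_order_acquire);

	return h ? h(dev) : DEMO_RX_PASS;
}
```

A reader that sampled the handler just before demo_unregister() ran can
still call it afterwards, which is why the check inside the handler is
what keeps this sketch from dereferencing NULL.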


* [PATCH 1/1] DM9000B: driver initialization upgrade
From: Joseph CHANG @ 2013-03-29  9:13 UTC (permalink / raw)
  To: David S. Miller, Bill Pemberton, Matthew Leach,
	Greg Kroah-Hartman, Joseph CHANG, Jiri Pirko, netdev
  Cc: linux-kernel, Joseph CHANG

Fix a bug for DM9000 revision B, which contains a DSP PHY.

The DM9000B uses a DSP PHY instead of the analog PHY of previous DM9000
revisions, so its initialization needs extra steps: an explicit PHY
reset, a PHY init parameter, and the NCR_MAC_LBK bit set on the first
DM9000_NCR reset in dm9000_probe().

The following DM9000_NCR reset, caused by dm9000_open(), clears the
NCR_MAC_LBK bit.

Without this fix, power-up FIFO pointer errors occur at a rate of around
2% among Davicom's customers' boards. With this fix, all of the above
cases are solved.

Signed-off-by: Joseph CHANG <josright123@gmail.com>
---
 drivers/net/ethernet/davicom/dm9000.c |  214 +++++++++++++++++----------------
 drivers/net/ethernet/davicom/dm9000.h |   11 ++-
 2 files changed, 120 insertions(+), 105 deletions(-)

diff --git a/drivers/net/ethernet/davicom/dm9000.c b/drivers/net/ethernet/davicom/dm9000.c
index 8cdf025..9eada8e 100644
--- a/drivers/net/ethernet/davicom/dm9000.c
+++ b/drivers/net/ethernet/davicom/dm9000.c
@@ -257,6 +257,107 @@ static void dm9000_dumpblk_32bit(void __iomem *reg, int count)
 		tmp = readl(reg);
 }
 
+/*
+ * Sleep, either by using msleep() or if we are suspending, then
+ * use mdelay() to sleep.
+ */
+static void dm9000_msleep(board_info_t *db, unsigned int ms)
+{
+	if (db->in_suspend)
+		mdelay(ms);
+	else
+		msleep(ms);
+}
+
+/* Read a word from phyxcer */
+static int
+dm9000_phy_read(struct net_device *dev, int phy_reg_unused, int reg)
+{
+	board_info_t *db = netdev_priv(dev);
+	unsigned long flags;
+	unsigned int reg_save;
+	int ret;
+
+	mutex_lock(&db->addr_lock);
+
+	spin_lock_irqsave(&db->lock, flags);
+
+	/* Save previous register address */
+	reg_save = readb(db->io_addr);
+
+	/* Fill the phyxcer register into REG_0C */
+	iow(db, DM9000_EPAR, DM9000_PHY | reg);
+
+	/* Issue phyxcer read command */
+	iow(db, DM9000_EPCR, EPCR_ERPRR | EPCR_EPOS);
+
+	writeb(reg_save, db->io_addr);
+	spin_unlock_irqrestore(&db->lock, flags);
+
+	dm9000_msleep(db, 1);		/* Wait read complete */
+
+	spin_lock_irqsave(&db->lock, flags);
+	reg_save = readb(db->io_addr);
+
+	iow(db, DM9000_EPCR, 0x0);	/* Clear phyxcer read command */
+
+	/* The read data keeps on REG_0D & REG_0E */
+	ret = (ior(db, DM9000_EPDRH) << 8) | ior(db, DM9000_EPDRL);
+
+	/* restore the previous address */
+	writeb(reg_save, db->io_addr);
+	spin_unlock_irqrestore(&db->lock, flags);
+
+	mutex_unlock(&db->addr_lock);
+
+	dm9000_dbg(db, 5, "phy_read[%02x] -> %04x\n", reg, ret);
+	return ret;
+}
+
+/* Write a word to phyxcer */
+static void
+dm9000_phy_write(struct net_device *dev,
+		 int phyaddr_unused, int reg, int value)
+{
+	board_info_t *db = netdev_priv(dev);
+	unsigned long flags;
+	unsigned long reg_save;
+
+	dm9000_dbg(db, 5, "phy_write[%02x] = %04x\n", reg, value);
+	mutex_lock(&db->addr_lock);
+
+	spin_lock_irqsave(&db->lock, flags);
+
+	/* Save previous register address */
+	reg_save = readb(db->io_addr);
+
+	/* Fill the phyxcer register into REG_0C */
+	iow(db, DM9000_EPAR, DM9000_PHY | reg);
+
+	/* Fill the written data into REG_0D & REG_0E */
+	iow(db, DM9000_EPDRL, value);
+	iow(db, DM9000_EPDRH, value >> 8);
+
+	/* Issue phyxcer write command */
+	iow(db, DM9000_EPCR, EPCR_EPOS | EPCR_ERPRW);
+
+	writeb(reg_save, db->io_addr);
+	spin_unlock_irqrestore(&db->lock, flags);
+
+	dm9000_msleep(db, 1);		/* Wait write complete */
+
+	spin_lock_irqsave(&db->lock, flags);
+	reg_save = readb(db->io_addr);
+
+	iow(db, DM9000_EPCR, 0x0);	/* Clear phyxcer write command */
+
+	/* restore the previous address */
+	writeb(reg_save, db->io_addr);
+
+	spin_unlock_irqrestore(&db->lock, flags);
+	mutex_unlock(&db->addr_lock);
+}
+
 /* dm9000_set_io
  *
  * select the specified set of io routines to use with the
@@ -795,6 +896,9 @@ dm9000_init_dm9000(struct net_device *dev)
 
 	iow(db, DM9000_GPCR, GPCR_GEP_CNTL);	/* Let GPIO0 output */
 
+	dm9000_phy_write(dev, 0, MII_BMCR, BMCR_RESET); /* PHY RESET */
+	dm9000_phy_write(dev, 0, MII_DM_DSPCR, DSPCR_INIT_PARAM); /* Init */
+
 	ncr = (db->flags & DM9000_PLATF_EXT_PHY) ? NCR_EXT_PHY : 0;
 
 	/* if wol is needed, then always set NCR_WAKEEN otherwise we end
@@ -1201,109 +1305,6 @@ dm9000_open(struct net_device *dev)
 	return 0;
 }
 
-/*
- * Sleep, either by using msleep() or if we are suspending, then
- * use mdelay() to sleep.
- */
-static void dm9000_msleep(board_info_t *db, unsigned int ms)
-{
-	if (db->in_suspend)
-		mdelay(ms);
-	else
-		msleep(ms);
-}
-
-/*
- *   Read a word from phyxcer
- */
-static int
-dm9000_phy_read(struct net_device *dev, int phy_reg_unused, int reg)
-{
-	board_info_t *db = netdev_priv(dev);
-	unsigned long flags;
-	unsigned int reg_save;
-	int ret;
-
-	mutex_lock(&db->addr_lock);
-
-	spin_lock_irqsave(&db->lock,flags);
-
-	/* Save previous register address */
-	reg_save = readb(db->io_addr);
-
-	/* Fill the phyxcer register into REG_0C */
-	iow(db, DM9000_EPAR, DM9000_PHY | reg);
-
-	iow(db, DM9000_EPCR, EPCR_ERPRR | EPCR_EPOS);	/* Issue phyxcer read command */
-
-	writeb(reg_save, db->io_addr);
-	spin_unlock_irqrestore(&db->lock,flags);
-
-	dm9000_msleep(db, 1);		/* Wait read complete */
-
-	spin_lock_irqsave(&db->lock,flags);
-	reg_save = readb(db->io_addr);
-
-	iow(db, DM9000_EPCR, 0x0);	/* Clear phyxcer read command */
-
-	/* The read data keeps on REG_0D & REG_0E */
-	ret = (ior(db, DM9000_EPDRH) << 8) | ior(db, DM9000_EPDRL);
-
-	/* restore the previous address */
-	writeb(reg_save, db->io_addr);
-	spin_unlock_irqrestore(&db->lock,flags);
-
-	mutex_unlock(&db->addr_lock);
-
-	dm9000_dbg(db, 5, "phy_read[%02x] -> %04x\n", reg, ret);
-	return ret;
-}
-
-/*
- *   Write a word to phyxcer
- */
-static void
-dm9000_phy_write(struct net_device *dev,
-		 int phyaddr_unused, int reg, int value)
-{
-	board_info_t *db = netdev_priv(dev);
-	unsigned long flags;
-	unsigned long reg_save;
-
-	dm9000_dbg(db, 5, "phy_write[%02x] = %04x\n", reg, value);
-	mutex_lock(&db->addr_lock);
-
-	spin_lock_irqsave(&db->lock,flags);
-
-	/* Save previous register address */
-	reg_save = readb(db->io_addr);
-
-	/* Fill the phyxcer register into REG_0C */
-	iow(db, DM9000_EPAR, DM9000_PHY | reg);
-
-	/* Fill the written data into REG_0D & REG_0E */
-	iow(db, DM9000_EPDRL, value);
-	iow(db, DM9000_EPDRH, value >> 8);
-
-	iow(db, DM9000_EPCR, EPCR_EPOS | EPCR_ERPRW);	/* Issue phyxcer write command */
-
-	writeb(reg_save, db->io_addr);
-	spin_unlock_irqrestore(&db->lock, flags);
-
-	dm9000_msleep(db, 1);		/* Wait write complete */
-
-	spin_lock_irqsave(&db->lock,flags);
-	reg_save = readb(db->io_addr);
-
-	iow(db, DM9000_EPCR, 0x0);	/* Clear phyxcer write command */
-
-	/* restore the previous address */
-	writeb(reg_save, db->io_addr);
-
-	spin_unlock_irqrestore(&db->lock, flags);
-	mutex_unlock(&db->addr_lock);
-}
-
 static void
 dm9000_shutdown(struct net_device *dev)
 {
@@ -1502,7 +1503,12 @@ dm9000_probe(struct platform_device *pdev)
 	db->flags |= DM9000_PLATF_SIMPLE_PHY;
 #endif
 
-	dm9000_reset(db);
+	/* Fix for dm9000_probe: this replaces dm9000_reset(db). The
+	 * NCR_MAC_LBK bit is needed to stabilize the DM9000 FIFO
+	 * during the probe stage.
+	 */
+
+	iow(db, DM9000_NCR, NCR_MAC_LBK | NCR_RST);
 
 	/* try multiple times, DM9000 sometimes gets the read wrong */
 	for (i = 0; i < 8; i++) {
diff --git a/drivers/net/ethernet/davicom/dm9000.h b/drivers/net/ethernet/davicom/dm9000.h
index 55688bd..9ce058a 100644
--- a/drivers/net/ethernet/davicom/dm9000.h
+++ b/drivers/net/ethernet/davicom/dm9000.h
@@ -69,7 +69,9 @@
 #define NCR_WAKEEN          (1<<6)
 #define NCR_FCOL            (1<<4)
 #define NCR_FDX             (1<<3)
-#define NCR_LBK             (3<<1)
+
+#define NCR_RESERVED        (3<<1)
+#define NCR_MAC_LBK         (1<<1)
 #define NCR_RST	            (1<<0)
 
 #define NSR_SPEED           (1<<7)
@@ -167,5 +169,12 @@
 #define ISR_LNKCHNG		(1<<5)
 #define ISR_UNDERRUN		(1<<4)
 
+/* Davicom MII registers.
+ */
+
+#define MII_DM_DSPCR		0x1b    /* DSP Control Register */
+
+#define DSPCR_INIT_PARAM	0xE100	/* DSP init parameter */
+
 #endif /* _DM9000X_H_ */
 
-- 
1.7.1


* RE: r8169 auto speed down issue
From: hayeswang @ 2013-03-29  8:20 UTC (permalink / raw)
  To: 'Francois Romieu'
  Cc: netdev, linux-kernel, bowgotsai, 'Ryankao'
In-Reply-To: <20130329072056.GA31269@electric-eye.fr.zoreil.com>

Francois Romieu [mailto:romieu@fr.zoreil.com] 
> Sent: Friday, March 29, 2013 3:21 PM
> To: Hayeswang
> Cc: netdev@vger.kernel.org; linux-kernel@vger.kernel.org; 
> bowgotsai@google.com; Ryankao
> Subject: Re: r8169 auto speed down issue
> 
[...]
> 
> I don't get your point. Can you reformulate ?

Sorry for my unclear description. I was thinking of the case where the nic
suspends or shuts down with no cable plugged in, and the cable is plugged in
afterwards. If the nic speeds down to 10M and the link partner forces 100M, the
issue appears again. If the nic doesn't speed down for a normal link partner, it
requires more power when the link recovers. In the end, I decided to set the
speed to 10M when the link partner supports 10M, and to 100M otherwise. This
avoids keeping a giga nic at 1000M, and could fix this issue. However, I wonder
whether there is a switch which forces the speed to giga.

Best Regards,
Hayes


* [PATCH] phy: Elimination the forced speed reduction algorithm.
From: Kirill Kapranov @ 2013-03-26 13:34 UTC (permalink / raw)
  To: netdev
  Cc: davem, bhutchings, peppe.cavallaro, joe, bruce.w.allan,
	linux-kernel, Kirill Kapranov

When a fixed speed is set for a NIC (e.g. ethtool -s eth0 autoneg off speed
100 duplex full) and the Ethernet cable is unplugged, the forced speed reduction
algorithm slows down the NIC speed, so plugging the cable back in leads to a
non-operational link state.

Signed-off-by: Kirill Kapranov <kapranoff@inbox.ru>
---
 drivers/net/phy/phy.c |   50 +------------------------------------------------
 1 files changed, 1 insertions(+), 49 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index ef9ea92..fc2dd94 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -463,33 +463,6 @@ void phy_stop_machine(struct phy_device *phydev)
 }
 
 /**
- * phy_force_reduction - reduce PHY speed/duplex settings by one step
- * @phydev: target phy_device struct
- *
- * Description: Reduces the speed/duplex settings by one notch,
- *   in this order--
- *   1000/FULL, 1000/HALF, 100/FULL, 100/HALF, 10/FULL, 10/HALF.
- *   The function bottoms out at 10/HALF.
- */
-static void phy_force_reduction(struct phy_device *phydev)
-{
-	int idx;
-
-	idx = phy_find_setting(phydev->speed, phydev->duplex);
-	
-	idx++;
-
-	idx = phy_find_valid(idx, phydev->supported);
-
-	phydev->speed = settings[idx].speed;
-	phydev->duplex = settings[idx].duplex;
-
-	pr_info("Trying %d/%s\n",
-		phydev->speed, DUPLEX_FULL == phydev->duplex ? "FULL" : "HALF");
-}
-
-
-/**
  * phy_error - enter HALTED state for this PHY device
  * @phydev: target phy_device struct
  *
@@ -818,30 +791,11 @@ void phy_state_machine(struct work_struct *work)
 				phydev->adjust_link(phydev->attached_dev);
 
 			} else if (0 == phydev->link_timeout--) {
-				int idx;
-
 				needs_aneg = 1;
 				/* If we have the magic_aneg bit,
 				 * we try again */
 				if (phydev->drv->flags & PHY_HAS_MAGICANEG)
 					break;
-
-				/* The timer expired, and we still
-				 * don't have a setting, so we try
-				 * forcing it until we find one that
-				 * works, starting from the fastest speed,
-				 * and working our way down */
-				idx = phy_find_valid(0, phydev->supported);
-
-				phydev->speed = settings[idx].speed;
-				phydev->duplex = settings[idx].duplex;
-
-				phydev->autoneg = AUTONEG_DISABLE;
-
-				pr_info("Trying %d/%s\n",
-					phydev->speed,
-					DUPLEX_FULL == phydev->duplex ?
-					"FULL" : "HALF");
 			}
 			break;
 		case PHY_NOLINK:
@@ -866,10 +820,8 @@ void phy_state_machine(struct work_struct *work)
 				phydev->state = PHY_RUNNING;
 				netif_carrier_on(phydev->attached_dev);
 			} else {
-				if (0 == phydev->link_timeout--) {
-					phy_force_reduction(phydev);
+				if (0 == phydev->link_timeout--) 
 					needs_aneg = 1;
-				}
 			}
 
 			phydev->adjust_link(phydev->attached_dev);
-- 
1.7.2.5


* Re: r8169 auto speed down issue
From: Francois Romieu @ 2013-03-29  7:20 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev, linux-kernel, bowgotsai, 'Ryankao'
In-Reply-To: <83FC3118294143DC81063B29EF570FC1@realtek.com.tw>

hayeswang <hayeswang@realtek.com> :
> Francois Romieu [mailto:romieu@fr.zoreil.com] 
> [...]
> > Your description suggests that testing against the link 
> > partner ability to work at 10M instead of testing for
                                   ^^^^^^^^^^ -> "and"
> > tp->link_ok could be good enough.

As a policy we may assume that capabilities of the link partner won't
change after a link loss event - and there is currently no way we can
change this policy - but it won't always work as expected (especially
if "expected == automagically").

[...]
> Furthermore, should it not speed down without linking, even though the cable
> would be plugged after suspending or shutdowning?

I don't get your point. Can you reformulate ?

-- 
Ueimor


* Re: [net-next PATCH] [RFC] [v2] net: add option to enable error queue packets waking select
From: Richard Cochran @ 2013-03-29  7:20 UTC (permalink / raw)
  To: Jacob Keller; +Cc: netdev, Jeffrey Kirsher, Matthew Vick
In-Reply-To: <20130328211925.7644.15781.stgit@jekeller-hub.jf.intel.com>

On Thu, Mar 28, 2013 at 02:19:25PM -0700, Jacob Keller wrote:
> Currently, when a socket receives something on the error queue it only wakes up
> the socket on select if it is in the "read" list, that is the socket has
> something to read. It is useful also to wake the socket if it is in the error
> list, which would enable software to wait on error queue packets without waking
> up for regular data on the socket. The main use case is for receiving
> timestamped transmit packets which return the timestamp to the socket via the
> error queue. This enables an application to select on the socket for the error
> queue only instead of for the regular traffic.

I would fully support having this kind of ability. As it stands now,
the SO_TIMESTAMPING interface makes it very inconvenient for the
application to obtain transmit time stamps.

As a concrete example, take the ptp4l [1] program. After sending a
packet for which a time stamp is expected, we call recvmsg() on the
MSG_ERRQUEUE repeatedly, but give up after a certain number of
tries. The repetitions are needed because the driver may delay the
time stamped packet for quite some time, and the giving up is required
because it is possible for time stamps to be lost or dropped.

Having the ability to poll/select the error queue would open more
attractive possibilities for applications. Instead of a cheesy retry
loop, just poll/select the error queue with a reasonable timeout
value.

Thanks,
Richard

1. http://linuxptp.sourceforge.net/
   file: sk.c
   line: 199
   function: sk_receive()
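
The poll-with-timeout approach described in this message could look
roughly like the sketch below. It is an illustration only: the helper
names wait_errqueue() and read_errqueue() are hypothetical and are not
taken from ptp4l. Note also that whether a sleeping poll() is actually
woken for error queue packets when POLLIN is not requested depends on the
kernel, which is exactly what the RFC under discussion proposes to
enable.

```c
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Wait up to timeout_ms for POLLERR on fd.
 * Returns 1 if the error queue (or a socket error) is pending,
 * 0 on timeout, and -1 if poll() failed or reported some other
 * condition than POLLERR.
 */
static int wait_errqueue(int fd, int timeout_ms)
{
	/* events = 0: POLLERR is reported in revents regardless. */
	struct pollfd pfd = { .fd = fd, .events = 0 };
	int n = poll(&pfd, 1, timeout_ms);

	if (n <= 0)
		return n;
	return (pfd.revents & POLLERR) ? 1 : -1;
}

/* Fetch one message from the error queue without blocking. */
static ssize_t read_errqueue(int fd, void *data, size_t len,
			     void *control, size_t control_len)
{
	struct iovec iov = { .iov_base = data, .iov_len = len };
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control;
	msg.msg_controllen = control_len;

	return recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);
}
```

A caller would send the packet, call wait_errqueue() with a reasonable
timeout, and only call read_errqueue() when it returns 1, treating a
return of 0 as a lost or dropped time stamp.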


* [PATCH] r8169: fix auto speed down issue
From: Hayes Wang @ 2013-03-29  7:11 UTC (permalink / raw)
  To: romieu; +Cc: netdev, linux-kernel, bowgotsai, Hayes Wang

After suspend or shutdown there is no link if the nic changes its
speed to 10M while connected to a link partner which forces the
speed to 100M.

Check the link partner ability to determine whether to change the
speed to 10M when suspending or shutting down. In either case, avoid
keeping the speed at giga, to save power.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/ethernet/realtek/r8169.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 28fb50a..a9eedf7 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -3818,6 +3818,21 @@ static void rtl_init_mdio_ops(struct rtl8169_private *tp)
 	}
 }
 
+static void rtl_speed_down(struct rtl8169_private *tp)
+{
+	u32	adv;
+	int	lpa;
+
+	rtl_writephy(tp, 0x1f, 0x0000);
+	lpa = rtl_readphy(tp, MII_LPA);
+
+	adv = ADVERTISED_10baseT_Half | ADVERTISED_10baseT_Full;
+	if (!(lpa & (ADVERTISE_10HALF | ADVERTISE_10FULL)))
+		adv |= ADVERTISED_100baseT_Half | ADVERTISED_100baseT_Full;
+
+	rtl8169_set_speed(tp->dev, AUTONEG_ENABLE, SPEED_100, DUPLEX_FULL, adv);
+}
+
 static void rtl_wol_suspend_quirk(struct rtl8169_private *tp)
 {
 	void __iomem *ioaddr = tp->mmio_addr;
@@ -3848,9 +3863,7 @@ static bool rtl_wol_pll_power_down(struct rtl8169_private *tp)
 	if (!(__rtl8169_get_wol(tp) & WAKE_ANY))
 		return false;
 
-	rtl_writephy(tp, 0x1f, 0x0000);
-	rtl_writephy(tp, MII_BMCR, 0x0000);
-
+	rtl_speed_down(tp);
 	rtl_wol_suspend_quirk(tp);
 
 	return true;
-- 
1.8.1
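
The selection logic of rtl_speed_down() can be tried in isolation with
the sketch below. It is a stand-alone illustration, not driver code: the
SKETCH_* constants are local copies whose values are assumed to match
the corresponding bits in linux/mii.h and linux/ethtool.h, and
sketch_speed_down_adv() is a hypothetical helper that only computes the
advertisement mask from a link partner ability value.

```c
#include <stdint.h>

/* Link partner ability bits (assumed to match LPA_* in linux/mii.h). */
#define SKETCH_LPA_10HALF	0x0020
#define SKETCH_LPA_10FULL	0x0040
#define SKETCH_LPA_100HALF	0x0080
#define SKETCH_LPA_100FULL	0x0100

/* Advertisement bits (assumed to match ADVERTISED_* in linux/ethtool.h). */
#define SKETCH_ADV_10_HALF	(1u << 0)
#define SKETCH_ADV_10_FULL	(1u << 1)
#define SKETCH_ADV_100_HALF	(1u << 2)
#define SKETCH_ADV_100_FULL	(1u << 3)

/*
 * Always advertise 10M. Add 100M only when the link partner did not
 * report any 10M ability, so that a partner forced to 100M can still
 * establish a link. 1000M is never advertised.
 */
static uint32_t sketch_speed_down_adv(uint16_t lpa)
{
	uint32_t adv = SKETCH_ADV_10_HALF | SKETCH_ADV_10_FULL;

	if (!(lpa & (SKETCH_LPA_10HALF | SKETCH_LPA_10FULL)))
		adv |= SKETCH_ADV_100_HALF | SKETCH_ADV_100_FULL;

	return adv;
}
```

With a partner that reports 10M ability the mask stays at the two 10M
bits; with a partner that reports only 100M, or nothing at all, the
100M bits are added as well.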

