Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: enh @ 2010-05-04  3:58 UTC (permalink / raw)
  To: Brian Haley; +Cc: netdev
In-Reply-To: <4BDF8387.4000303@hp.com>

On Mon, May 3, 2010 at 19:16, Brian Haley <brian.haley@hp.com> wrote:
> enh wrote:
>> RFC 3493 (http://tools.ietf.org/rfc/rfc3493.txt) says:
>>
>>       IPV6_MULTICAST_HOPS
>>
>>          Set the hop limit to use for outgoing multicast packets.  (Note
>>          a separate option - IPV6_UNICAST_HOPS - is provided to set the
>>          hop limit to use for outgoing unicast packets.)
>>
>>          The interpretation of the argument is the same as for the
>>          IPV6_UNICAST_HOPS option:
>>
>>             x < -1:        return an error of EINVAL
>>             x == -1:       use kernel default
>>             0 <= x <= 255: use x
>>             x >= 256:      return an error of EINVAL
>>
>>             If IPV6_MULTICAST_HOPS is not set, the default is 1
>>             (same as IPv4 today)
>>
>>          Argument type: int
>>
>> but if i create a socket and call getsockopt, i get 64, not 1. this
>> happens both on Android (2.6.32) and on Ubuntu 8.04 (2.6.24).
>
> <snip>
>
>> is this a bug? is this the right place to report it? thanks!
>
> It looks like a bug to me, feel free to send along a patch :)

a grep for IPV6_DEFAULT_MCASTHOPS suggests it isn't used:

http://git.kernel.org/?p=linux%2Fkernel%2Fgit%2Fnext%2Flinux-next.git&a=search&h=HEAD&st=grep&s=IPV6_DEFAULT_MCASTHOPS

i assumed IPV6_DEFAULT_HOPLIMIT (the unicast hop limit) was being used
by accident where IPV6_DEFAULT_MCASTHOPS should be used.

looking at net/ipv6/ipv6_sockglue.c, i see that getsockopt for both
unicast and multicast hop limits defaults to the device's hop_limit:

1109         case IPV6_UNICAST_HOPS:
1110         case IPV6_MULTICAST_HOPS:
1111         {
1112                 struct dst_entry *dst;
1113
1114                 if (optname == IPV6_UNICAST_HOPS)
1115                         val = np->hop_limit;
1116                 else
1117                         val = np->mcast_hops;
1118
1119                 if (val < 0) {
1120                         rcu_read_lock();
1121                         dst = __sk_dst_get(sk);
1122                         if (dst)
1123                                 val = ip6_dst_hoplimit(dst);
1124                         rcu_read_unlock();
1125                 }
1126
1127                 if (val < 0)
1128                         val = sock_net(sk)->ipv6.devconf_all->hop_limit;
1129                 break;
1130         }

and look how net/ipv6/af_inet6.c initializes the two fields:

 202         np->hop_limit   = -1;
 203         np->mcast_hops  = -1;

so the easiest fix would be to change net/ipv6/af_inet6.c to:

 202         np->hop_limit   = -1; /* Use the configured device default. */
 203         np->mcast_hops  = IPV6_DEFAULT_MCASTHOPS; /* Use RFC 3493
default. */

userspace programmers still have the ability to ask for the device's
default by calling setsockopt with the value -1 (as mentioned in the
RFC). in practice, i'd imagine anyone who actually wanted to use that
feature would want a separate tunable from the existing unicast one.

> -Brian
>
>



-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/

^ permalink raw reply

* [PATCH] ethernet: add sanity check before memory dereferencing
From: Changli Gao @ 2010-05-04  3:33 UTC (permalink / raw)
  To: David S. Miller; +Cc: Eric Dumazet, netdev, Changli Gao

add sanity check before memory dereferencing

Some callers of eth_type_trans() only can assure the length of the packets
passed to it is not less than ETH_HLEN. We'd better check the packets length
before dereferencing skb->data.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/ethernet/eth.c |    5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 61ec032..215c839 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -158,7 +158,6 @@ EXPORT_SYMBOL(eth_rebuild_header);
 __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ethhdr *eth;
-	unsigned char *rawp;
 
 	skb->dev = dev;
 	skb_reset_mac_header(skb);
@@ -199,15 +198,13 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev)
 	if (ntohs(eth->h_proto) >= 1536)
 		return eth->h_proto;
 
-	rawp = skb->data;
-
 	/*
 	 *      This is a magic hack to spot IPX packets. Older Novell breaks
 	 *      the protocol design and runs IPX over 802.3 without an 802.2 LLC
 	 *      layer. We look for FFFF which isn't a used 802.2 SSAP/DSAP. This
 	 *      won't work for fault tolerant netware but does for the rest.
 	 */
-	if (*(unsigned short *)rawp == 0xFFFF)
+	if (skb->len >= 2 && *(unsigned short *)(skb->data) == 0xFFFF)
 		return htons(ETH_P_802_3);
 
 	/*

^ permalink raw reply related

* Re: OOP in ip_cmsg_recv (net-next)
From: Eric Dumazet @ 2010-05-04  4:43 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, netdev
In-Reply-To: <20100503.152351.181464355.davem@davemloft.net>

Le lundi 03 mai 2010 à 15:23 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 03 May 2010 19:21:09 +0200
> 
> >  
> > -	/* skb is now orphaned, might be freed outside of locked section */
> > -	consume_skb(skb);
> > +	/* skb is now orphaned, can be freed outside of locked section */
> > +	__kfree_skb(skb);
> >  }
> >  EXPORT_SYMBOL(skb_free_datagram_locked);
> 
> Eric, if you do this you undo the utility of the SKB packet drop tracing
> that Neil wrote.
> 
> consome_skb() says that the application actually took in the packet and
> we didn't drop it due to some error or similar.
> 
> Whereas __kfree_skb() is going to be tagged as a packet drop and the
> data didn't reach the application.
> 
> So if you need to use __kfree_skb() to fix this you'll need to somehow
> add some appropriate annotations for the tracer.  Perhaps add a
> __consume_skb() that is marked for the tracing stuff and does what
> you need.
> --

David, if I am not mistaken (not thea yet for me this early morning) the
tracer you mention is included in kfree_skb(), not in __kfree_skb() :

void kfree_skb(struct sk_buff *skb)
{
        if (unlikely(!skb))
                return;
        if (likely(atomic_read(&skb->users) == 1))
                smp_rmb();
        else if (likely(!atomic_dec_and_test(&skb->users)))
                return;
        trace_kfree_skb(skb, __builtin_return_address(0));
        __kfree_skb(skb);
}
EXPORT_SYMBOL(kfree_skb);



I only copied part of consume_skb() which doesnt call
trace_kfree_skb() :

void consume_skb(struct sk_buff *skb)
{
        if (unlikely(!skb))
                return;
        if (likely(atomic_read(&skb->users) == 1))
                smp_rmb();
        else if (likely(!atomic_dec_and_test(&skb->users)))
                return;
        __kfree_skb(skb);
}
EXPORT_SYMBOL(consume_skb);


So I believe my second patch is a bit better : We dont even lock the
socket in the (rare) case we should not orphan the skb ;)

We keep the two slab calls outside of sock lock, so we keep sock locked
for a very very short time period (remember we now use lock_sock_bh() :
producers now might spin on the lock instead of queueing packet in
backlog)

Thanks !

[PATCH net-next-2.6] net: skb_free_datagram_locked() fix

Commit 4b0b72f7dd617b ( net: speedup udp receive path )
introduced a bug in skb_free_datagram_locked().

We should not skb_orphan() skb if we dont have the guarantee we are the
last skb user, this might happen with MSG_PEEK concurrent users.

To keep socket locked for the smallest period of time, we split
consume_skb() logic, inlined in skb_free_datagram_locked()

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/datagram.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index 95b851f..e009753 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -229,13 +229,18 @@ EXPORT_SYMBOL(skb_free_datagram);
 
 void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
 {
+	if (likely(atomic_read(&skb->users) == 1))
+		smp_rmb();
+	else if (likely(!atomic_dec_and_test(&skb->users)))
+		return;
+
 	lock_sock_bh(sk);
 	skb_orphan(skb);
 	sk_mem_reclaim_partial(sk);
 	unlock_sock_bh(sk);
 
-	/* skb is now orphaned, might be freed outside of locked section */
-	consume_skb(skb);
+	/* skb is now orphaned, can be freed outside of locked section */
+	__kfree_skb(skb);
 }
 EXPORT_SYMBOL(skb_free_datagram_locked);
 



^ permalink raw reply related

* Re: Question about vlans, bonding, etc.
From: Eric Dumazet @ 2010-05-04  4:48 UTC (permalink / raw)
  To: George B.; +Cc: netdev
In-Reply-To: <i2tb65cae941005031706o19a6f1e9zc86b33d73462113f@mail.gmail.com>

Le lundi 03 mai 2010 à 17:06 -0700, George B. a écrit :
> Watching the "Receive issues with bonding and vlans" thread brought a
> question to mind.  In what order should things be done for best
> performance?
> 
> For example, say I have a pair of ethernet interfaces.  Do I slave the
> ethernet interfaces to the bond device and then make the vlans on the
> bond devices?
> Or do I make the vlans on the ethernet devices and then bond the vlan
> interfaces?
> 
> In the first case I would have:
> 
> 
> 
> bond0.3--|     |------eth0
>              bond0
> bond0.5--|     |------eth1
> 
> The second case would be:
> 
>       |------------------eth0.5-----|
>       |          |-------eth0.3---eth0
> bond0  bond1
>       |          |-------eth1.3---eth1
>       |------------------eth1.5-----|
> 
> I am using he first method currently as it seemed more intuitive to me
> at the time to bond the ethernets and then put the vlans on the bonds
> but it seems life might be easier for the vlan driver if it is bound
> directly to the hardware.  I am using Intel NICs (igb driver) with 4
> queues per NIC.
> 
> Would there be a performance difference expected between the two
> configurations?  Can the vlan driver "see through" the bond interface
> to the
> hardware and take advantage of multiple queues if the hardware
> supports it in the first configuration?

Unfortunatly, first combination is not multiqueue aware yet.

You'll need to patch bonding driver like this if your nics have 4
queues :

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 85e813c..98cc3c0 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4915,8 +4915,8 @@ int bond_create(struct net *net, const char *name)
 
        rtnl_lock();
 
-       bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
-                               bond_setup);
+       bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
+                               bond_setup, 4);
        if (!bond_dev) {
                pr_err("%s: eek! can't alloc netdev!\n", name);
                rtnl_unlock();



^ permalink raw reply related

* [net-next-2.6 PATCH 0/3] Add port-profile netlink support
From: Scott Feldman @ 2010-05-04  4:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, chrisw, arnd

The following series adds port-profile netlink support and adds an
implementation to Cisco's enic netdev driver:

	1/3: Adds port-profile netlink RTM_SETLINK/RTM_GETLINK support, and
	     adds matching netdev ops net_{set|get}_vf_port_profile.

	2/3: Adds enic support for net_{set|get}_vf_port_profile for enic
	     dynamic devices.

	3/3: (please don't apply) Enables SR-IOV support for enic to
	     illustrate support for port-profile netlink using SR-IOV-
	     compliant devices.

The SETLINK/GETLINK support follows the model for other IFLA_VF_* msgs used
for SR-IOV devices where the receipent of the netlink msg is the PF, but the
target is the VF.

The intent of this patch set is to cover both definitions of port-profile
as defined by Cisco's enic use and as defined by VSI discover protocol (VDP),
used in VEPA implemenations.  While both definitions are based on pre-
standards, the concept of a port-profile to be applied to an external switch
port on behalf of a virtual machine interface is common, as well as many
of the fields defining the protocols.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu<roprabhu@cisco.com>

^ permalink raw reply

* [net-next-2.6 PATCH 1/3] Add netdev/netlink port-profile support (take IV, was iovnl)
From: Scott Feldman @ 2010-05-04  4:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, chrisw, arnd
In-Reply-To: <20100504045327.9261.33330.stgit@savbu-pc100.cisco.com>

From: Scott Feldman <scofeldm@cisco.com>

Add new netdev ops ndo_{set|get}_vf_port_profile to allow setting of
port-profile on a netdev interface.  Extends netlink socket RTM_SETLINK/
RTM_GETLINK with new sub cmd called IFLA_VF_PORT_PROFILE (added to end of
IFLA_cmd list).

A port-profile is used to configure/enable the external switch port backing
the netdev interface, not to configure the host-facing side of the netdev.  A
port-profile is an identifier known to the switch.  How port-profiles are
installed on the switch or how available port-profiles are made know to the
host is outside the scope of this patch.

The general flow is the port-profile is applied to a host netdev interface
using RTM_SETLINK, the receiver of the RTM_SETLINK msg (more about that later)
communicates with the switch, and the switch port backing the host netdev
interface is configured/enabled based on the settings defined by the port-
profile.  What those settings comprise, and how those settings are managed is
again outside the scope of this patch, since this patch only deals with the
first step in the flow.

Since we're using netlink sockets, the receiver of the RTM_SETLINK msg can
be in kernel- or user-space.  For kernel-space recipient, rtnetlink.c, the
new ndo_set_vf_port_profile netdev op is called to set the port-profile.
User-space recipients can decide how they comminucate the IFLA_VF_PORT_PROFILE
to the external switch.

There is a RTM_GETLINK cmd to to return port-profile setting of an
interface and to also return the status of the last port-profile.

IFLA_VF_PORT_PROFILE is modeled after the existing IFLA_VF_* cmd where a
VF number is passed in to identify the virtual function (VF) of an SR-IOV-
capable device.  In this case, the target of IFLA_VF_PORT_PROFILE msg is the
netdev physical function (PF) device.  The PF will apply the port-profile
to the VF.  IFLA_VF_PORT_PROFILE can also be used for devices that don't
adhere to SR-IOV and can apply the port-profile directly to the netdev
target.  In this case, the VF number is ignored.

Passing in a NULL port-profile is used to delete the port-profile association.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu<roprabhu@cisco.com>
---
 include/linux/if_link.h   |   25 +++++++++++++++++++++++++
 include/linux/netdevice.h |   10 ++++++++++
 net/core/rtnetlink.c      |   41 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 75 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index cfd420b..d763358 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -116,6 +116,7 @@ enum {
 	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
 	IFLA_VFINFO,
 	IFLA_STATS64,
+	IFLA_VF_PORT_PROFILE,
 	__IFLA_MAX
 };
 
@@ -259,4 +260,28 @@ struct ifla_vf_info {
 	__u32 qos;
 	__u32 tx_rate;
 };
+
+enum {
+	IFLA_VF_PORT_PROFILE_STATUS_UNKNOWN,
+	IFLA_VF_PORT_PROFILE_STATUS_SUCCESS,
+	IFLA_VF_PORT_PROFILE_STATUS_INPROGRESS,
+	IFLA_VF_PORT_PROFILE_STATUS_ERROR,
+};
+
+#define IFLA_VF_PORT_PROFILE_MAX	40
+#define IFLA_VF_UUID_MAX		40
+#define IFLA_VF_CLIENT_NAME_MAX		40
+
+struct ifla_vf_port_profile {
+	__u32 vf;
+	__u32 flags;
+	__u32 status;
+	__u8 port_profile[IFLA_VF_PORT_PROFILE_MAX];
+	__u8 mac[32];					/* MAX_ADDR_LEN */
+	/* UUID e.g. "CEEFD3B1-9E11-11DE-BDFD-000BAB01C0FB" */
+	__u8 host_uuid[IFLA_VF_UUID_MAX];
+	__u8 client_uuid[IFLA_VF_UUID_MAX];
+	__u8 client_name[IFLA_VF_CLIENT_NAME_MAX];	/* e.g. "vm0-eth1" */
+};
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3c5ed5f..949abdb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -696,6 +696,10 @@ struct netdev_rx_queue {
  * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
  * int (*ndo_get_vf_config)(struct net_device *dev,
  *			    int vf, struct ifla_vf_info *ivf);
+ * int (*ndo_set_vf_port_profile)(struct net_device *dev, int vf,
+ *				  struct ifla_vf_port_profile *ivp);
+ * int (*ndo_get_vf_port_profile)(struct net_device *dev, int vf,
+ *				  struct ifla_vf_port_profile *ivp);
  */
 #define HAVE_NET_DEVICE_OPS
 struct net_device_ops {
@@ -744,6 +748,12 @@ struct net_device_ops {
 	int			(*ndo_get_vf_config)(struct net_device *dev,
 						     int vf,
 						     struct ifla_vf_info *ivf);
+	int			(*ndo_set_vf_port_profile)(
+					struct net_device *dev, int vf,
+					struct ifla_vf_port_profile *ivp);
+	int			(*ndo_get_vf_port_profile)(
+					struct net_device *dev, int vf,
+					struct ifla_vf_port_profile *ivp);
 #if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
 	int			(*ndo_fcoe_enable)(struct net_device *dev);
 	int			(*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 78c8598..82420c9 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -747,17 +747,42 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 		goto nla_put_failure;
 	copy_rtnl_link_stats64(nla_data(attr), stats);
 
+	if (dev->dev.parent)
+		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
+	else
+		NLA_PUT_U32(skb, IFLA_NUM_VF, 0);
+
 	if (dev->netdev_ops->ndo_get_vf_config && dev->dev.parent) {
 		int i;
 		struct ifla_vf_info ivi;
 
-		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
 		for (i = 0; i < dev_num_vf(dev->dev.parent); i++) {
 			if (dev->netdev_ops->ndo_get_vf_config(dev, i, &ivi))
 				break;
 			NLA_PUT(skb, IFLA_VFINFO, sizeof(ivi), &ivi);
 		}
 	}
+
+	if (dev->netdev_ops->ndo_get_vf_port_profile) {
+		struct ifla_vf_port_profile ivp;
+
+		if (dev->dev.parent) {
+			int i;
+
+			for (i = 0; i < dev_num_vf(dev->dev.parent); i++) {
+				if (dev->netdev_ops->ndo_get_vf_port_profile(
+					dev, i, &ivp))
+					break;
+				NLA_PUT(skb, IFLA_VF_PORT_PROFILE,
+					sizeof(ivp), &ivp);
+			}
+		} else if (!dev->netdev_ops->ndo_get_vf_port_profile(dev,
+			0, &ivp)) {
+			NLA_PUT(skb, IFLA_VF_PORT_PROFILE,
+				sizeof(ivp), &ivp);
+		}
+	}
+
 	if (dev->rtnl_link_ops) {
 		if (rtnl_link_fill(skb, dev) < 0)
 			goto nla_put_failure;
@@ -824,6 +849,8 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 				    .len = sizeof(struct ifla_vf_vlan) },
 	[IFLA_VF_TX_RATE]	= { .type = NLA_BINARY,
 				    .len = sizeof(struct ifla_vf_tx_rate) },
+	[IFLA_VF_PORT_PROFILE]	= { .type = NLA_BINARY,
+				    .len = sizeof(struct ifla_vf_port_profile)},
 };
 EXPORT_SYMBOL(ifla_policy);
 
@@ -1028,6 +1055,18 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 	}
 	err = 0;
 
+	if (tb[IFLA_VF_PORT_PROFILE]) {
+		struct ifla_vf_port_profile *ivp;
+		ivp = nla_data(tb[IFLA_VF_PORT_PROFILE]);
+		err = -EOPNOTSUPP;
+		if (ops->ndo_set_vf_port_profile)
+			err = ops->ndo_set_vf_port_profile(dev, ivp->vf, ivp);
+		if (err < 0)
+			goto errout;
+		modified = 1;
+	}
+	err = 0;
+
 errout:
 	if (err < 0 && modified && net_ratelimit())
 		printk(KERN_WARNING "A link change request failed with "


^ permalink raw reply related

* [net-next-2.6 PATCH 2/3] Add ndo_{set|get}_vf_port_profile op support for enic dynamic vnics
From: Scott Feldman @ 2010-05-04  4:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, chrisw, arnd
In-Reply-To: <20100504045327.9261.33330.stgit@savbu-pc100.cisco.com>

From: Scott Feldman <scofeldm@cisco.com>

Add enic ndo_{set|get}_vf_port_profile ops to support setting/getting
port-profile for enic dynamic devices.  Enic dynamic devices are just like
normal enic eth devices except dynamic enics require an extra configuration
step to assign a port-profile identifier to the interface before the
interface is useable.  Once a port-profile is assigned, link comes up on the
interface and is ready for I/O.  The port-profile is used to configure the
network port assigned to the interface.  The network port configuration
includes VLAN membership, QoS policies, and port security settings typical
of a data center network.

A dynamic enic is assigned a default random mac address.  If no mac address
parameter is specified in the ndo_set_vf_port_profile op, the default random
mac address is used when assigning the port-profile.  Otherwise the mac
address specified in the op is used.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu<roprabhu@cisco.com>
---
 drivers/net/enic/Makefile    |    2 
 drivers/net/enic/enic.h      |    3 -
 drivers/net/enic/enic_main.c |  200 +++++++++++++++++++++++++++++++++++++-----
 drivers/net/enic/vnic_dev.c  |   50 +++++++++++
 drivers/net/enic/vnic_dev.h  |    3 +
 drivers/net/enic/vnic_vic.c  |   73 +++++++++++++++
 drivers/net/enic/vnic_vic.h  |   59 ++++++++++++
 7 files changed, 363 insertions(+), 27 deletions(-)

diff --git a/drivers/net/enic/Makefile b/drivers/net/enic/Makefile
index 391c3bc..e7b6c31 100644
--- a/drivers/net/enic/Makefile
+++ b/drivers/net/enic/Makefile
@@ -1,5 +1,5 @@
 obj-$(CONFIG_ENIC) := enic.o
 
 enic-y := enic_main.o vnic_cq.o vnic_intr.o vnic_wq.o \
-	enic_res.o vnic_dev.o vnic_rq.o
+	enic_res.o vnic_dev.o vnic_rq.o vnic_vic.o
 
diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 5fa56f1..718033f 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -34,7 +34,7 @@
 
 #define DRV_NAME		"enic"
 #define DRV_DESCRIPTION		"Cisco VIC Ethernet NIC Driver"
-#define DRV_VERSION		"1.3.1.1"
+#define DRV_VERSION		"1.3.1.1-pp"
 #define DRV_COPYRIGHT		"Copyright 2008-2009 Cisco Systems, Inc"
 #define PFX			DRV_NAME ": "
 
@@ -95,6 +95,7 @@ struct enic {
 	u32 port_mtu;
 	u32 rx_coalesce_usecs;
 	u32 tx_coalesce_usecs;
+	struct ifla_vf_port_profile pp;
 
 	/* work queue cache line section */
 	____cacheline_aligned struct vnic_wq wq[ENIC_WQ_MAX];
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 1232887..8e5e46b 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -29,6 +29,7 @@
 #include <linux/etherdevice.h>
 #include <linux/if_ether.h>
 #include <linux/if_vlan.h>
+#include <linux/if_link.h>
 #include <linux/ethtool.h>
 #include <linux/in.h>
 #include <linux/ip.h>
@@ -40,6 +41,7 @@
 #include "vnic_dev.h"
 #include "vnic_intr.h"
 #include "vnic_stats.h"
+#include "vnic_vic.h"
 #include "enic_res.h"
 #include "enic.h"
 
@@ -49,10 +51,12 @@
 #define ENIC_DESC_MAX_SPLITS		(MAX_TSO / WQ_ENET_MAX_DESC_LEN + 1)
 
 #define PCI_DEVICE_ID_CISCO_VIC_ENET         0x0043  /* ethernet vnic */
+#define PCI_DEVICE_ID_CISCO_VIC_ENET_DYN     0x0044  /* enet dynamic vnic */
 
 /* Supported devices */
 static DEFINE_PCI_DEVICE_TABLE(enic_id_table) = {
 	{ PCI_VDEVICE(CISCO, PCI_DEVICE_ID_CISCO_VIC_ENET) },
+	{ PCI_VDEVICE(CISCO, PCI_DEVICE_ID_CISCO_VIC_ENET_DYN) },
 	{ 0, }	/* end of table */
 };
 
@@ -113,6 +117,11 @@ static const struct enic_stat enic_rx_stats[] = {
 static const unsigned int enic_n_tx_stats = ARRAY_SIZE(enic_tx_stats);
 static const unsigned int enic_n_rx_stats = ARRAY_SIZE(enic_rx_stats);
 
+static int enic_is_dynamic(struct enic *enic)
+{
+	return enic->pdev->device == PCI_DEVICE_ID_CISCO_VIC_ENET_DYN;
+}
+
 static int enic_get_settings(struct net_device *netdev,
 	struct ethtool_cmd *ecmd)
 {
@@ -810,14 +819,24 @@ static void enic_reset_mcaddrs(struct enic *enic)
 
 static int enic_set_mac_addr(struct net_device *netdev, char *addr)
 {
-	if (!is_valid_ether_addr(addr))
-		return -EADDRNOTAVAIL;
+	struct enic *enic = netdev_priv(netdev);
 
-	memcpy(netdev->dev_addr, addr, netdev->addr_len);
+	if (enic_is_dynamic(enic)) {
+		random_ether_addr(netdev->dev_addr);
+	} else {
+		if (!is_valid_ether_addr(addr))
+			return -EADDRNOTAVAIL;
+		memcpy(netdev->dev_addr, addr, netdev->addr_len);
+	}
 
 	return 0;
 }
 
+static int enic_set_mac_address(struct net_device *netdev, void *p)
+{
+	return -EOPNOTSUPP;
+}
+
 /* netif_tx_lock held, BHs disabled */
 static void enic_set_multicast_list(struct net_device *netdev)
 {
@@ -922,6 +941,131 @@ static void enic_tx_timeout(struct net_device *netdev)
 	schedule_work(&enic->reset);
 }
 
+static int enic_vnic_dev_deinit(struct enic *enic)
+{
+	int err;
+
+	spin_lock(&enic->devcmd_lock);
+	err = vnic_dev_deinit(enic->vdev);
+	spin_unlock(&enic->devcmd_lock);
+
+	return err;
+}
+
+static int enic_dev_init_prov(struct enic *enic, struct vic_provinfo *vp)
+{
+	int err;
+
+	spin_lock(&enic->devcmd_lock);
+	err = vnic_dev_init_prov(enic->vdev,
+		(u8 *)vp, vic_provinfo_size(vp));
+	spin_unlock(&enic->devcmd_lock);
+
+	return err;
+}
+
+static int enic_dev_init_done(struct enic *enic, int *done, int *error)
+{
+	int err;
+
+	spin_lock(&enic->devcmd_lock);
+	err = vnic_dev_init_done(enic->vdev, done, error);
+	spin_unlock(&enic->devcmd_lock);
+
+	return err;
+}
+
+static int enic_provinfo_add_tlv_str(struct vic_provinfo *vp, u16 type,
+	u16 max_length, char *str)
+{
+	if (!str)
+		return 0;
+
+	if (strlen(str) + 1 > max_length)
+		return 0;
+
+	return vic_provinfo_add_tlv(vp, type, strlen(str) + 1, str);
+}
+
+static int enic_set_vf_port_profile(struct net_device *netdev, int vf,
+	struct ifla_vf_port_profile *ivp)
+{
+	struct enic *enic = netdev_priv(netdev);
+	struct vic_provinfo *vp;
+	u8 oui[3] = VIC_PROVINFO_CISCO_OUI;
+	u8 *mac = ivp->mac;
+	int err;
+
+	if (!enic_is_dynamic(enic))
+		return -EOPNOTSUPP;
+
+	memset(&enic->pp, 0, sizeof(enic->pp));
+
+	enic_vnic_dev_deinit(enic);
+
+	if (strlen(ivp->port_profile) == 0)
+		return 0;
+
+	if (is_zero_ether_addr(mac))
+		mac = netdev->dev_addr;
+
+	if (!is_valid_ether_addr(mac))
+		return -EADDRNOTAVAIL;
+
+	vp = vic_provinfo_alloc(GFP_KERNEL, oui, VIC_PROVINFO_LINUX_TYPE);
+	if (!vp)
+		return -ENOMEM;
+
+	enic_provinfo_add_tlv_str(vp, VIC_LINUX_PROV_TLV_PORT_PROFILE_NAME_STR,
+		IFLA_VF_PORT_PROFILE_MAX, ivp->port_profile);
+	vic_provinfo_add_tlv(vp, VIC_LINUX_PROV_TLV_CLIENT_MAC_ADDR,
+		ETH_ALEN, mac);
+	enic_provinfo_add_tlv_str(vp, VIC_LINUX_PROV_TLV_HOST_UUID_STR,
+		IFLA_VF_UUID_MAX, ivp->host_uuid);
+	enic_provinfo_add_tlv_str(vp, VIC_LINUX_PROV_TLV_CLIENT_UUID_STR,
+		IFLA_VF_UUID_MAX, ivp->client_uuid);
+	enic_provinfo_add_tlv_str(vp, VIC_LINUX_PROV_TLV_CLIENT_NAME_STR,
+		IFLA_VF_CLIENT_NAME_MAX, ivp->client_name);
+
+	err = enic_dev_init_prov(enic, vp);
+	if (err)
+		goto err_out;
+
+	memcpy(&enic->pp, ivp, sizeof(enic->pp));
+
+err_out:
+	vic_provinfo_free(vp);
+
+	return err;
+}
+
+static int enic_get_vf_port_profile(struct net_device *netdev, int vf,
+	struct ifla_vf_port_profile *ivp)
+{
+	struct enic *enic = netdev_priv(netdev);
+	int err, error, done;
+
+	if (!enic_is_dynamic(enic))
+		return -EOPNOTSUPP;
+
+	enic->pp.status = IFLA_VF_PORT_PROFILE_STATUS_UNKNOWN;
+
+	err = enic_dev_init_done(enic, &done, &error);
+
+	if (err || error)
+		enic->pp.status = IFLA_VF_PORT_PROFILE_STATUS_ERROR;
+
+	if (!done)
+		enic->pp.status = IFLA_VF_PORT_PROFILE_STATUS_INPROGRESS;
+
+	if (!error)
+		enic->pp.status = IFLA_VF_PORT_PROFILE_STATUS_SUCCESS;
+
+	memcpy(ivp, &enic->pp, sizeof(enic->pp));
+
+	return 0;
+}
+
 static void enic_free_rq_buf(struct vnic_rq *rq, struct vnic_rq_buf *buf)
 {
 	struct enic *enic = vnic_dev_priv(rq->vdev);
@@ -1440,10 +1584,12 @@ static int enic_open(struct net_device *netdev)
 	for (i = 0; i < enic->rq_count; i++)
 		vnic_rq_enable(&enic->rq[i]);
 
-	spin_lock(&enic->devcmd_lock);
-	enic_add_station_addr(enic);
-	spin_unlock(&enic->devcmd_lock);
-	enic_set_multicast_list(netdev);
+	if (!enic_is_dynamic(enic)) {
+		spin_lock(&enic->devcmd_lock);
+		enic_add_station_addr(enic);
+		spin_unlock(&enic->devcmd_lock);
+		enic_set_multicast_list(netdev);
+	}
 
 	netif_wake_queue(netdev);
 	napi_enable(&enic->napi);
@@ -1775,20 +1921,22 @@ static void enic_clear_intr_mode(struct enic *enic)
 }
 
 static const struct net_device_ops enic_netdev_ops = {
-	.ndo_open		= enic_open,
-	.ndo_stop		= enic_stop,
-	.ndo_start_xmit		= enic_hard_start_xmit,
-	.ndo_get_stats		= enic_get_stats,
-	.ndo_validate_addr	= eth_validate_addr,
-	.ndo_set_mac_address 	= eth_mac_addr,
-	.ndo_set_multicast_list	= enic_set_multicast_list,
-	.ndo_change_mtu		= enic_change_mtu,
-	.ndo_vlan_rx_register	= enic_vlan_rx_register,
-	.ndo_vlan_rx_add_vid	= enic_vlan_rx_add_vid,
-	.ndo_vlan_rx_kill_vid	= enic_vlan_rx_kill_vid,
-	.ndo_tx_timeout		= enic_tx_timeout,
+	.ndo_open			= enic_open,
+	.ndo_stop			= enic_stop,
+	.ndo_start_xmit			= enic_hard_start_xmit,
+	.ndo_get_stats			= enic_get_stats,
+	.ndo_validate_addr		= eth_validate_addr,
+	.ndo_set_multicast_list		= enic_set_multicast_list,
+	.ndo_set_mac_address		= enic_set_mac_address,
+	.ndo_change_mtu			= enic_change_mtu,
+	.ndo_vlan_rx_register		= enic_vlan_rx_register,
+	.ndo_vlan_rx_add_vid		= enic_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid		= enic_vlan_rx_kill_vid,
+	.ndo_tx_timeout			= enic_tx_timeout,
+	.ndo_set_vf_port_profile	= enic_set_vf_port_profile,
+	.ndo_get_vf_port_profile	= enic_get_vf_port_profile,
 #ifdef CONFIG_NET_POLL_CONTROLLER
-	.ndo_poll_controller	= enic_poll_controller,
+	.ndo_poll_controller		= enic_poll_controller,
 #endif
 };
 
@@ -2010,11 +2158,13 @@ static int __devinit enic_probe(struct pci_dev *pdev,
 
 	netif_carrier_off(netdev);
 
-	err = vnic_dev_init(enic->vdev, 0);
-	if (err) {
-		printk(KERN_ERR PFX
-			"vNIC dev init failed, aborting.\n");
-		goto err_out_dev_close;
+	if (!enic_is_dynamic(enic)) {
+		err = vnic_dev_init(enic->vdev, 0);
+		if (err) {
+			printk(KERN_ERR PFX
+				"vNIC dev init failed, aborting.\n");
+			goto err_out_dev_close;
+		}
 	}
 
 	err = enic_dev_init(enic);
diff --git a/drivers/net/enic/vnic_dev.c b/drivers/net/enic/vnic_dev.c
index d43a9d4..e351b0f 100644
--- a/drivers/net/enic/vnic_dev.c
+++ b/drivers/net/enic/vnic_dev.c
@@ -682,6 +682,56 @@ int vnic_dev_init(struct vnic_dev *vdev, int arg)
 	return r;
 }
 
+int vnic_dev_init_done(struct vnic_dev *vdev, int *done, int *err)
+{
+	u64 a0 = 0, a1 = 0;
+	int wait = 1000;
+	int ret;
+
+	*done = 0;
+
+	ret = vnic_dev_cmd(vdev, CMD_INIT_STATUS, &a0, &a1, wait);
+	if (ret)
+		return ret;
+
+	*done = (a0 == 0);
+
+	*err = (a0 == 0) ? a1 : 0;
+
+	return 0;
+}
+
+int vnic_dev_init_prov(struct vnic_dev *vdev, u8 *buf, u32 len)
+{
+	u64 a0, a1 = len;
+	int wait = 1000;
+	u64 prov_pa;
+	void *prov_buf;
+	int ret;
+
+	prov_buf = pci_alloc_consistent(vdev->pdev, len, &prov_pa);
+	if (!prov_buf)
+		return -ENOMEM;
+
+	memcpy(prov_buf, buf, len);
+
+	a0 = prov_pa;
+
+	ret = vnic_dev_cmd(vdev, CMD_INIT_PROV_INFO, &a0, &a1, wait);
+
+	pci_free_consistent(vdev->pdev, len, prov_buf, prov_pa);
+
+	return ret;
+}
+
+int vnic_dev_deinit(struct vnic_dev *vdev)
+{
+	u64 a0 = 0, a1 = 0;
+	int wait = 1000;
+
+	return vnic_dev_cmd(vdev, CMD_DEINIT, &a0, &a1, wait);
+}
+
 int vnic_dev_link_status(struct vnic_dev *vdev)
 {
 	if (vdev->linkstatus)
diff --git a/drivers/net/enic/vnic_dev.h b/drivers/net/enic/vnic_dev.h
index f5be640..27f5a5a 100644
--- a/drivers/net/enic/vnic_dev.h
+++ b/drivers/net/enic/vnic_dev.h
@@ -124,6 +124,9 @@ int vnic_dev_disable(struct vnic_dev *vdev);
 int vnic_dev_open(struct vnic_dev *vdev, int arg);
 int vnic_dev_open_done(struct vnic_dev *vdev, int *done);
 int vnic_dev_init(struct vnic_dev *vdev, int arg);
+int vnic_dev_init_done(struct vnic_dev *vdev, int *done, int *err);
+int vnic_dev_init_prov(struct vnic_dev *vdev, u8 *buf, u32 len);
+int vnic_dev_deinit(struct vnic_dev *vdev);
 int vnic_dev_soft_reset(struct vnic_dev *vdev, int arg);
 int vnic_dev_soft_reset_done(struct vnic_dev *vdev, int *done);
 void vnic_dev_set_intr_mode(struct vnic_dev *vdev,
diff --git a/drivers/net/enic/vnic_vic.c b/drivers/net/enic/vnic_vic.c
new file mode 100644
index 0000000..d769772
--- /dev/null
+++ b/drivers/net/enic/vnic_vic.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ *
+ * This program is free software; you may redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+
+#include "vnic_vic.h"
+
+struct vic_provinfo *vic_provinfo_alloc(gfp_t flags, u8 *oui, u8 type)
+{
+	struct vic_provinfo *vp = kzalloc(VIC_PROVINFO_MAX_DATA, flags);
+
+	if (!vp || !oui)
+		return NULL;
+
+	memcpy(vp->oui, oui, sizeof(vp->oui));
+	vp->type = type;
+	vp->length = htonl(sizeof(vp->num_tlvs));
+
+	return vp;
+}
+
+void vic_provinfo_free(struct vic_provinfo *vp)
+{
+	kfree(vp);
+}
+
+int vic_provinfo_add_tlv(struct vic_provinfo *vp, u16 type, u16 length,
+	void *value)
+{
+	struct vic_provinfo_tlv *tlv;
+
+	if (!vp || !value)
+		return -EINVAL;
+
+	if (ntohl(vp->length) + sizeof(*tlv) + length >
+		VIC_PROVINFO_MAX_TLV_DATA)
+		return -ENOMEM;
+
+	tlv = (struct vic_provinfo_tlv *)((u8 *)vp->tlv +
+		ntohl(vp->length) - sizeof(vp->num_tlvs));
+
+	tlv->type = htons(type);
+	tlv->length = htons(length);
+	memcpy(tlv->value, value, length);
+
+	vp->num_tlvs = htonl(ntohl(vp->num_tlvs) + 1);
+	vp->length = htonl(ntohl(vp->length) + sizeof(*tlv) + length);
+
+	return 0;
+}
+
+size_t vic_provinfo_size(struct vic_provinfo *vp)
+{
+	return vp ?  ntohl(vp->length) + sizeof(*vp) - sizeof(vp->num_tlvs) : 0;
+}
diff --git a/drivers/net/enic/vnic_vic.h b/drivers/net/enic/vnic_vic.h
new file mode 100644
index 0000000..085c2a2
--- /dev/null
+++ b/drivers/net/enic/vnic_vic.h
@@ -0,0 +1,59 @@
+/*
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ *
+ * This program is free software; you may redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#ifndef _VNIC_VIC_H_
+#define _VNIC_VIC_H_
+
+/* Note: All integer fields in NETWORK byte order */
+
+/* Note: String field lengths include null char */
+
+#define VIC_PROVINFO_CISCO_OUI		{ 0x00, 0x00, 0x0c }
+#define VIC_PROVINFO_LINUX_TYPE		0x2
+
+enum vic_linux_prov_tlv_type {
+	VIC_LINUX_PROV_TLV_PORT_PROFILE_NAME_STR = 0,
+	VIC_LINUX_PROV_TLV_CLIENT_MAC_ADDR = 1,			/* u8[6] */
+	VIC_LINUX_PROV_TLV_CLIENT_NAME_STR = 2,
+	VIC_LINUX_PROV_TLV_HOST_UUID_STR = 8,
+	VIC_LINUX_PROV_TLV_CLIENT_UUID_STR = 9,
+};
+
+struct vic_provinfo {
+	u8 oui[3];		/* OUI of data provider */
+	u8 type;		/* provider-specific type */
+	u32 length;		/* length of data below */
+	u32 num_tlvs;		/* number of tlvs */
+	struct vic_provinfo_tlv {
+		u16 type;
+		u16 length;
+		u8 value[0];
+	} tlv[0];
+} __attribute__ ((packed));
+
+#define VIC_PROVINFO_MAX_DATA		1385
+#define VIC_PROVINFO_MAX_TLV_DATA (VIC_PROVINFO_MAX_DATA - \
+	sizeof(struct vic_provinfo))
+
+struct vic_provinfo *vic_provinfo_alloc(gfp_t flags, u8 *oui, u8 type);
+void vic_provinfo_free(struct vic_provinfo *vp);
+int vic_provinfo_add_tlv(struct vic_provinfo *vp, u16 type, u16 length,
+	void *value);
+size_t vic_provinfo_size(struct vic_provinfo *vp);
+
+#endif	/* _VNIC_VIC_H_ */


^ permalink raw reply related

* [net-next-2.6 PATCH 3/3] Add SR-IOV support to enic (please don't apply this patch)
From: Scott Feldman @ 2010-05-04  4:53 UTC (permalink / raw)
  To: davem; +Cc: netdev, chrisw, arnd
In-Reply-To: <20100504045327.9261.33330.stgit@savbu-pc100.cisco.com>

From: Scott Feldman <scofeldm@cisco.com>

This patch is to illustrate how port-profiles will be assigned to VFs in
a compliant SR-IOV enic device.  Here the VF devices are dynamic enics and
the PF device is a "static" enic device.  Only the PF device resonds to
ndo_vf_{set|get}_port_profile to set/get the port-profile on a VF.  It's
not possible to set a port-profile on a PF since PFs have an immutable port
assignment on the external switch, established when the PF was provisioned.

The same driver (enic) is used for both PFs and VFs devices.  The PF
enables N number of VFs based on a PF configuration parameter assigned
when the PF is provisioned.

While this patch is functionally complete, we (Cisco) need to do more testing
before we can cliam full SR-IOV support in Linux, so we ask that this patch
not be applied at this time.  it is provide with this patch set for
illustrative purposes only to show how the port-profile netlink API would
be used for a SR-IOV compliant device that supports port-profiles.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu<roprabhu@cisco.com>
---
 drivers/net/enic/enic.h      |    5 +-
 drivers/net/enic/enic_main.c |   96 +++++++++++++++++++++++++++++++-----------
 drivers/net/enic/enic_res.c  |    3 +
 drivers/net/enic/vnic_dev.c  |   12 +++--
 drivers/net/enic/vnic_dev.h  |    6 +--
 drivers/net/enic/vnic_enet.h |    1 
 6 files changed, 86 insertions(+), 37 deletions(-)

diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 718033f..4d00e5e 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -34,7 +34,7 @@
 
 #define DRV_NAME		"enic"
 #define DRV_DESCRIPTION		"Cisco VIC Ethernet NIC Driver"
-#define DRV_VERSION		"1.3.1.1-pp"
+#define DRV_VERSION		"1.3.1.1-sr-iov"
 #define DRV_COPYRIGHT		"Copyright 2008-2009 Cisco Systems, Inc"
 #define PFX			DRV_NAME ": "
 
@@ -95,7 +95,8 @@ struct enic {
 	u32 port_mtu;
 	u32 rx_coalesce_usecs;
 	u32 tx_coalesce_usecs;
-	struct ifla_vf_port_profile pp;
+	struct ifla_vf_port_profile *pp;
+	unsigned int vf_count;
 
 	/* work queue cache line section */
 	____cacheline_aligned struct vnic_wq wq[ENIC_WQ_MAX];
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 8e5e46b..1488431 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -941,35 +941,36 @@ static void enic_tx_timeout(struct net_device *netdev)
 	schedule_work(&enic->reset);
 }
 
-static int enic_vnic_dev_deinit(struct enic *enic)
+static int enic_vnic_dev_deinit(struct enic *enic, int vf)
 {
 	int err;
 
 	spin_lock(&enic->devcmd_lock);
-	err = vnic_dev_deinit(enic->vdev);
+	err = vnic_dev_deinit(enic->vdev, vf);
 	spin_unlock(&enic->devcmd_lock);
 
 	return err;
 }
 
-static int enic_dev_init_prov(struct enic *enic, struct vic_provinfo *vp)
+static int enic_dev_init_prov(struct enic *enic, int vf,
+	struct vic_provinfo *vp)
 {
 	int err;
 
 	spin_lock(&enic->devcmd_lock);
-	err = vnic_dev_init_prov(enic->vdev,
+	err = vnic_dev_init_prov(enic->vdev, vf,
 		(u8 *)vp, vic_provinfo_size(vp));
 	spin_unlock(&enic->devcmd_lock);
 
 	return err;
 }
 
-static int enic_dev_init_done(struct enic *enic, int *done, int *error)
+static int enic_dev_init_done(struct enic *enic, int vf, int *done, int *error)
 {
 	int err;
 
 	spin_lock(&enic->devcmd_lock);
-	err = vnic_dev_init_done(enic->vdev, done, error);
+	err = vnic_dev_init_done(enic->vdev, vf, done, error);
 	spin_unlock(&enic->devcmd_lock);
 
 	return err;
@@ -993,23 +994,22 @@ static int enic_set_vf_port_profile(struct net_device *netdev, int vf,
 	struct enic *enic = netdev_priv(netdev);
 	struct vic_provinfo *vp;
 	u8 oui[3] = VIC_PROVINFO_CISCO_OUI;
-	u8 *mac = ivp->mac;
 	int err;
 
-	if (!enic_is_dynamic(enic))
+	if (enic_is_dynamic(enic))
 		return -EOPNOTSUPP;
 
-	memset(&enic->pp, 0, sizeof(enic->pp));
+	if (vf < 0 || vf >= enic->vf_count)
+		return -EOPNOTSUPP;
+
+	memset(&enic->pp[vf], 0, sizeof(enic->pp[vf]));
 
-	enic_vnic_dev_deinit(enic);
+	enic_vnic_dev_deinit(enic, vf);
 
 	if (strlen(ivp->port_profile) == 0)
 		return 0;
 
-	if (is_zero_ether_addr(mac))
-		mac = netdev->dev_addr;
-
-	if (!is_valid_ether_addr(mac))
+	if (!is_valid_ether_addr(ipv->mac))
 		return -EADDRNOTAVAIL;
 
 	vp = vic_provinfo_alloc(GFP_KERNEL, oui, VIC_PROVINFO_LINUX_TYPE);
@@ -1019,7 +1019,7 @@ static int enic_set_vf_port_profile(struct net_device *netdev, int vf,
 	enic_provinfo_add_tlv_str(vp, VIC_LINUX_PROV_TLV_PORT_PROFILE_NAME_STR,
 		IFLA_VF_PORT_PROFILE_MAX, ivp->port_profile);
 	vic_provinfo_add_tlv(vp, VIC_LINUX_PROV_TLV_CLIENT_MAC_ADDR,
-		ETH_ALEN, mac);
+		ETH_ALEN, ivp->mac);
 	enic_provinfo_add_tlv_str(vp, VIC_LINUX_PROV_TLV_HOST_UUID_STR,
 		IFLA_VF_UUID_MAX, ivp->host_uuid);
 	enic_provinfo_add_tlv_str(vp, VIC_LINUX_PROV_TLV_CLIENT_UUID_STR,
@@ -1027,11 +1027,11 @@ static int enic_set_vf_port_profile(struct net_device *netdev, int vf,
 	enic_provinfo_add_tlv_str(vp, VIC_LINUX_PROV_TLV_CLIENT_NAME_STR,
 		IFLA_VF_CLIENT_NAME_MAX, ivp->client_name);
 
-	err = enic_dev_init_prov(enic, vp);
+	err = enic_dev_init_prov(enic, vf, vp);
 	if (err)
 		goto err_out;
 
-	memcpy(&enic->pp, ivp, sizeof(enic->pp));
+	memcpy(&enic->pp[vf], ivp, sizeof(enic->pp[vf]));
 
 err_out:
 	vic_provinfo_free(vp);
@@ -1045,23 +1045,26 @@ static int enic_get_vf_port_profile(struct net_device *netdev, int vf,
 	struct enic *enic = netdev_priv(netdev);
 	int err, error, done;
 
-	if (!enic_is_dynamic(enic))
+	if (enic_is_dynamic(enic))
+		return -EOPNOTSUPP;
+
+	if (vf < 0 || vf >= enic->vf_count)
 		return -EOPNOTSUPP;
 
-	enic->pp.status = IFLA_VF_PORT_PROFILE_STATUS_UNKNOWN;
+	enic->pp[vf].status = IFLA_VF_PORT_PROFILE_STATUS_UNKNOWN;
 
-	err = enic_dev_init_done(enic, &done, &error);
+	err = enic_dev_init_done(enic, vf, &done, &error);
 
 	if (err || error)
-		enic->pp.status = IFLA_VF_PORT_PROFILE_STATUS_ERROR;
+		enic->pp[vf].status = IFLA_VF_PORT_PROFILE_STATUS_ERROR;
 
 	if (!done)
-		enic->pp.status = IFLA_VF_PORT_PROFILE_STATUS_INPROGRESS;
+		enic->pp[vf].status = IFLA_VF_PORT_PROFILE_STATUS_INPROGRESS;
 
 	if (!error)
-		enic->pp.status = IFLA_VF_PORT_PROFILE_STATUS_SUCCESS;
+		enic->pp[vf].status = IFLA_VF_PORT_PROFILE_STATUS_SUCCESS;
 
-	memcpy(ivp, &enic->pp, sizeof(enic->pp));
+	memcpy(ivp, &enic->pp[vf], sizeof(enic->pp[vf]));
 
 	return 0;
 }
@@ -2023,6 +2026,37 @@ err_out_free_vnic_resources:
 	return err;
 }
 
+static int enic_enable_vfs(struct enic *enic)
+{
+	int err;
+
+	enic->vf_count = enic->config.vf_count;
+
+	enic->pp = kzalloc(enic->vf_count  *
+		sizeof(struct ifla_vf_port_profile), GFP_KERNEL);
+	if (!enic->pp)
+		return -ENOMEM;
+
+	if (enic->pdev->is_physfn && enic->vf_count > 0) {
+
+		err = pci_enable_sriov(enic->pdev, enic->vf_count);
+		if (err) {
+			kfree(enic->pp);
+			return err;
+		}
+	}
+
+	return 0;
+}
+
+static void enic_disable_vfs(struct enic *enic)
+{
+       if (enic->pdev->is_physfn && enic->vf_count > 0)
+               pci_disable_sriov(enic->pdev);
+       kfree(enic->pp);
+       enic->vf_count = 0;
+}
+
 static void enic_iounmap(struct enic *enic)
 {
 	unsigned int i;
@@ -2174,6 +2208,13 @@ static int __devinit enic_probe(struct pci_dev *pdev,
 		goto err_out_dev_close;
 	}
 
+	err = enic_enable_vfs(enic);
+	if (err) {
+		printk(KERN_ERR PFX
+			"SR-IOV VF enable failed, aborting.\n");
+		goto err_out_dev_deinit;
+	}
+
 	/* Setup notification timer, HW reset task, and locks
 	 */
 
@@ -2198,7 +2239,7 @@ static int __devinit enic_probe(struct pci_dev *pdev,
 	if (err) {
 		printk(KERN_ERR PFX
 			"Invalid MAC address, aborting.\n");
-		goto err_out_dev_deinit;
+		goto err_out_disable_vfs;
 	}
 
 	enic->tx_coalesce_usecs = enic->config.intr_timer_usec;
@@ -2234,11 +2275,13 @@ static int __devinit enic_probe(struct pci_dev *pdev,
 	if (err) {
 		printk(KERN_ERR PFX
 			"Cannot register net device, aborting.\n");
-		goto err_out_dev_deinit;
+		goto err_out_disable_vfs;
 	}
 
 	return 0;
 
+err_out_disable_vfs:
+	enic_disable_vfs(enic);
 err_out_dev_deinit:
 	enic_dev_deinit(enic);
 err_out_dev_close:
@@ -2267,6 +2310,7 @@ static void __devexit enic_remove(struct pci_dev *pdev)
 
 		flush_scheduled_work();
 		unregister_netdev(netdev);
+		enic_disable_vfs(enic);
 		enic_dev_deinit(enic);
 		vnic_dev_close(enic->vdev);
 		vnic_dev_unregister(enic->vdev);
diff --git a/drivers/net/enic/enic_res.c b/drivers/net/enic/enic_res.c
index 02839bf..dfb37f2 100644
--- a/drivers/net/enic/enic_res.c
+++ b/drivers/net/enic/enic_res.c
@@ -69,6 +69,7 @@ int enic_get_vnic_config(struct enic *enic)
 	GET_CONFIG(intr_timer_type);
 	GET_CONFIG(intr_mode);
 	GET_CONFIG(intr_timer_usec);
+	GET_CONFIG(vf_count);
 
 	c->wq_desc_count =
 		min_t(u32, ENIC_MAX_WQ_DESCS,
@@ -99,6 +100,8 @@ int enic_get_vnic_config(struct enic *enic)
 		c->mtu, ENIC_SETTING(enic, TXCSUM),
 		ENIC_SETTING(enic, RXCSUM), ENIC_SETTING(enic, TSO),
 		ENIC_SETTING(enic, LRO), c->intr_timer_usec);
+	if (c->vf_count)
+		printk(KERN_INFO PFX "vNIC SR-IOV VF count %d\n", c->vf_count);
 
 	return 0;
 }
diff --git a/drivers/net/enic/vnic_dev.c b/drivers/net/enic/vnic_dev.c
index e351b0f..261d5f0 100644
--- a/drivers/net/enic/vnic_dev.c
+++ b/drivers/net/enic/vnic_dev.c
@@ -682,9 +682,9 @@ int vnic_dev_init(struct vnic_dev *vdev, int arg)
 	return r;
 }
 
-int vnic_dev_init_done(struct vnic_dev *vdev, int *done, int *err)
+int vnic_dev_init_done(struct vnic_dev *vdev, u16 vf, int *done, int *err)
 {
-	u64 a0 = 0, a1 = 0;
+	u64 a0 = vf, a1 = 0;
 	int wait = 1000;
 	int ret;
 
@@ -701,9 +701,9 @@ int vnic_dev_init_done(struct vnic_dev *vdev, int *done, int *err)
 	return 0;
 }
 
-int vnic_dev_init_prov(struct vnic_dev *vdev, u8 *buf, u32 len)
+int vnic_dev_init_prov(struct vnic_dev *vdev, u16 vf, u8 *buf, u32 len)
 {
-	u64 a0, a1 = len;
+	u64 a0, a1 = (u64)len | ((u64)vf << 32);
 	int wait = 1000;
 	u64 prov_pa;
 	void *prov_buf;
@@ -724,9 +724,9 @@ int vnic_dev_init_prov(struct vnic_dev *vdev, u8 *buf, u32 len)
 	return ret;
 }
 
-int vnic_dev_deinit(struct vnic_dev *vdev)
+int vnic_dev_deinit(struct vnic_dev *vdev, u16 vf)
 {
-	u64 a0 = 0, a1 = 0;
+	u64 a0 = vf, a1 = 0;
 	int wait = 1000;
 
 	return vnic_dev_cmd(vdev, CMD_DEINIT, &a0, &a1, wait);
diff --git a/drivers/net/enic/vnic_dev.h b/drivers/net/enic/vnic_dev.h
index 27f5a5a..d508187 100644
--- a/drivers/net/enic/vnic_dev.h
+++ b/drivers/net/enic/vnic_dev.h
@@ -124,9 +124,9 @@ int vnic_dev_disable(struct vnic_dev *vdev);
 int vnic_dev_open(struct vnic_dev *vdev, int arg);
 int vnic_dev_open_done(struct vnic_dev *vdev, int *done);
 int vnic_dev_init(struct vnic_dev *vdev, int arg);
-int vnic_dev_init_done(struct vnic_dev *vdev, int *done, int *err);
-int vnic_dev_init_prov(struct vnic_dev *vdev, u8 *buf, u32 len);
-int vnic_dev_deinit(struct vnic_dev *vdev);
+int vnic_dev_init_done(struct vnic_dev *vdev, u16 vf, int *done, int *err);
+int vnic_dev_init_prov(struct vnic_dev *vdev, u16 vf, u8 *buf, u32 len);
+int vnic_dev_deinit(struct vnic_dev *vdev, u16 vf);
 int vnic_dev_soft_reset(struct vnic_dev *vdev, int arg);
 int vnic_dev_soft_reset_done(struct vnic_dev *vdev, int *done);
 void vnic_dev_set_intr_mode(struct vnic_dev *vdev,
diff --git a/drivers/net/enic/vnic_enet.h b/drivers/net/enic/vnic_enet.h
index 8eeb675..466a7b3 100644
--- a/drivers/net/enic/vnic_enet.h
+++ b/drivers/net/enic/vnic_enet.h
@@ -35,6 +35,7 @@ struct vnic_enet_config {
 	u8 intr_mode;
 	char devname[16];
 	u32 intr_timer_usec;
+	u16 vf_count;
 };
 
 #define VENETF_TSO		0x1	/* TSO enabled */


^ permalink raw reply related

* Re: [PATCH] net-next: remove useless union keyword
From: Eric Dumazet @ 2010-05-04  5:07 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netdev
In-Reply-To: <1272942254-14760-1-git-send-email-xiaosuo@gmail.com>

Le mardi 04 mai 2010 à 11:04 +0800, Changli Gao a écrit :
> remove useless union keyword in rtable, rt6_info and dn_route.
> 
> Since there is only one member in a union, the union keyword isn't useful.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

I'll respin my dst patch

^ permalink raw reply

* [PATCH] forcedeth: GRO support
From: Tom Herbert @ 2010-05-04  5:08 UTC (permalink / raw)
  To: netdev, aabdulla, davem

Add GRO support to forcedeth.

Signed-off-by: Tom Herbert <therbert@google.com>
---
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 5cf0e66..4a24cc7 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -2817,7 +2817,7 @@ static int nv_rx_process(struct net_device *dev, int limit)
 		dprintk(KERN_DEBUG "%s: nv_rx_process: %d bytes, proto %d accepted.\n",
 					dev->name, len, skb->protocol);
 #ifdef CONFIG_FORCEDETH_NAPI
-		netif_receive_skb(skb);
+		napi_gro_receive(&np->napi, skb);
 #else
 		netif_rx(skb);
 #endif
@@ -2910,7 +2910,7 @@ static int nv_rx_process_optimized(struct net_device *dev, int limit)
 
 			if (likely(!np->vlangrp)) {
 #ifdef CONFIG_FORCEDETH_NAPI
-				netif_receive_skb(skb);
+				napi_gro_receive(&np->napi, skb);
 #else
 				netif_rx(skb);
 #endif
@@ -2918,15 +2918,15 @@ static int nv_rx_process_optimized(struct net_device *dev, int limit)
 				vlanflags = le32_to_cpu(np->get_rx.ex->buflow);
 				if (vlanflags & NV_RX3_VLAN_TAG_PRESENT) {
 #ifdef CONFIG_FORCEDETH_NAPI
-					vlan_hwaccel_receive_skb(skb, np->vlangrp,
-								 vlanflags & NV_RX3_VLAN_TAG_MASK);
+					vlan_gro_receive(&np->napi, np->vlangrp,
+							 vlanflags & NV_RX3_VLAN_TAG_MASK, skb);
 #else
 					vlan_hwaccel_rx(skb, np->vlangrp,
 							vlanflags & NV_RX3_VLAN_TAG_MASK);
 #endif
 				} else {
 #ifdef CONFIG_FORCEDETH_NAPI
-					netif_receive_skb(skb);
+					napi_gro_receive(&np->napi, skb);
 #else
 					netif_rx(skb);
 #endif
@@ -5711,6 +5711,9 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 		np->txrxctl_bits |= NVREG_TXRXCTL_RXCHECK;
 		dev->features |= NETIF_F_IP_CSUM | NETIF_F_SG;
 		dev->features |= NETIF_F_TSO;
+#ifdef CONFIG_FORCEDETH_NAPI
+		dev->features |= NETIF_F_GRO;
+#endif
 	}
 
 	np->vlanctl_bits = 0;

^ permalink raw reply related

* Re: [PATCH] net-next: remove useless union keyword
From: Eric Dumazet @ 2010-05-04  5:12 UTC (permalink / raw)
  To: Changli Gao; +Cc: David S. Miller, netdev
In-Reply-To: <1272949625.2407.177.camel@edumazet-laptop>

Le mardi 04 mai 2010 à 07:07 +0200, Eric Dumazet a écrit :
> Le mardi 04 mai 2010 à 11:04 +0800, Changli Gao a écrit :
> > remove useless union keyword in rtable, rt6_info and dn_route.
> > 
> > Since there is only one member in a union, the union keyword isn't useful.
> > 
> > Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> > ----
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> I'll respin my dst patch

Scratch my Acked-by, this doesnt even compile...

Changli, you _must_ compile your patches.

You _must_ also state if you did not test them.


  CC      net/ipv6/route.o
net/ipv6/route.c:129: error: unknown field ‘u’ specified in initializer
net/ipv6/route.c:130: error: unknown field ‘dst’ specified in initializer
net/ipv6/route.c:131: error: unknown field ‘__refcnt’ specified in initializer
net/ipv6/route.c:131: warning: braces around scalar initializer
net/ipv6/route.c:131: warning: (near initialization for ‘ip6_null_entry_template.dst.rcu_head.next’)
net/ipv6/route.c:131: warning: initialization makes pointer from integer without a cast
net/ipv6/route.c:132: error: unknown field ‘__use’ specified in initializer
net/ipv6/route.c:132: warning: initialization makes pointer from integer without a cast
net/ipv6/route.c:133: error: unknown field ‘obsolete’ specified in initializer
net/ipv6/route.c:133: warning: excess elements in struct initializer
net/ipv6/route.c:133: warning: (near initialization for ‘ip6_null_entry_template.dst.rcu_head’)
net/ipv6/route.c:134: error: unknown field ‘error’ specified in initializer
net/ipv6/route.c:134: warning: excess elements in struct initializer
net/ipv6/route.c:134: warning: (near initialization for ‘ip6_null_entry_template.dst.rcu_head’)



^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: David Miller @ 2010-05-04  6:05 UTC (permalink / raw)
  To: brian.haley; +Cc: enh, netdev
In-Reply-To: <4BDF8387.4000303@hp.com>

From: Brian Haley <brian.haley@hp.com>
Date: Mon, 03 May 2010 22:16:39 -0400

> It looks like a bug to me, feel free to send along a patch :)

Is it?  The quoted text is only about setting the value and what
effect setting -1 or whatever has.

For getting the value, the behavior described sounds just fine.

The default for a socket is whatever the kernel-wide default is.

^ permalink raw reply

* Re: [PATCH] ethernet: add sanity check before memory dereferencing
From: David Miller @ 2010-05-04  6:11 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, netdev
In-Reply-To: <1272944028-23410-1-git-send-email-xiaosuo@gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Tue,  4 May 2010 11:33:48 +0800

> add sanity check before memory dereferencing
> 
> Some callers of eth_type_trans() only can assure the length of the packets
> passed to it is not less than ETH_HLEN. We'd better check the packets length
> before dereferencing skb->data.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

We can deference skb->data for at least 16 bytes or so past the end
of the valid packet data area however we want.

It might give garbage values, but it will not cause a fault or any
kind.

We want to remove checks here, not add new ones Changli.

^ permalink raw reply

* Re: [PATCH v2] ethernet: call __skb_pull() in eth_type_trans()
From: David Miller @ 2010-05-04  6:16 UTC (permalink / raw)
  To: xiaosuo; +Cc: eric.dumazet, netdev
In-Reply-To: <h2h412e6f7f1005031934m971342dw965e046d40485ec7@mail.gmail.com>

From: Changli Gao <xiaosuo@gmail.com>
Date: Tue, 4 May 2010 10:34:07 +0800

> It seems no callers pass eth_type_trans() a packet, whose length is
> less than ETH_HLEN. It means that skb_pull() always returns non-NULL.
> And if skb_pull() returns NULL, the later memory dereferences must be
> invalid.

In your opinion.  As I explained in the reply to your latest
eth_type_trans() patch, we can defererence several bytes past
the end of skb->data's valid packet data area without faulting.

We'll just read in garbage, because it's things like skb_shared_info()
and friends might be there.

But it's completely safe, and frankly I'm fine with the kernel doing
this when runts make it into this code if that allows us to avoid
stupid checks.

> As Eric mentioned above, GRE only assures the length of the packets
> passed to eth_type_trans() isn't less than ETH_HLEN, we should check
> skb->len before we dereference skb->data.

The code needs only ETH_HLEN, but valid ethernet packets must be at
least ETH_ZLEN.

> For performance, how about inlining eth_type_trans(). Because its main
> users are NIC drivers, and there aren't likely many kinds of NICs at
> the same time, inlining it won't increases the size of the kernel
> image much.

No, that's unnecessary bloat, plus I want to make ->ndo_type_trans()
a netdev operation so it can possibly be deferred.

Changli, please go hack elsewhere and on some other piece of code,
all your ideas here in eth_type_trans() are not well founded.

I see from your struct dst union removal patch that you don't even
build test your changes, so why don't you spend your excess energy on
making sure your code at least compiles successfully?

Thanks.

^ permalink raw reply

* Re: OOP in ip_cmsg_recv (net-next)
From: David Miller @ 2010-05-04  6:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, netdev
In-Reply-To: <1272948225.2407.170.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 04 May 2010 06:43:45 +0200

> David, if I am not mistaken (not thea yet for me this early morning) the
> tracer you mention is included in kfree_skb(), not in __kfree_skb() :
 ...
> I only copied part of consume_skb() which doesnt call
> trace_kfree_skb() :
 ...
> So I believe my second patch is a bit better : We dont even lock the
> socket in the (rare) case we should not orphan the skb ;)

Right you are.

> [PATCH net-next-2.6] net: skb_free_datagram_locked() fix

I'll apply this, thanks!

^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: enh @ 2010-05-04  6:19 UTC (permalink / raw)
  To: David Miller; +Cc: brian.haley, netdev
In-Reply-To: <20100503.230553.200626786.davem@davemloft.net>

On Mon, May 3, 2010 at 23:05, David Miller <davem@davemloft.net> wrote:
> From: Brian Haley <brian.haley@hp.com>
> Date: Mon, 03 May 2010 22:16:39 -0400
>
>> It looks like a bug to me, feel free to send along a patch :)
>
> Is it?  The quoted text is only about setting the value and what
> effect setting -1 or whatever has.
>
> For getting the value, the behavior described sounds just fine.
>
> The default for a socket is whatever the kernel-wide default is.

for the *unicast* hops, a part of the RFC i didn't quote says:

   If the [IPV6_UNICAST_HOPS] option is not set, the
   system selects a default value.

but for the *multicast* hops, which is what i'm talking about, this
part of the quoted text seems pretty definitive:

           If IPV6_MULTICAST_HOPS is not set, the default is 1
           (same as IPv4 today)

this is what my test shows isn't true of linux; linux reuses its
unicast default instead.

-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/

^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: David Miller @ 2010-05-04  6:22 UTC (permalink / raw)
  To: enh; +Cc: brian.haley, netdev
In-Reply-To: <AANLkTingTdmhyJhENb9AYDjUT7m_-yQflw5-gJqIA9iw@mail.gmail.com>

From: enh <enh@google.com>
Date: Mon, 3 May 2010 23:19:22 -0700

> for the *unicast* hops, a part of the RFC i didn't quote says:
> 
>    If the [IPV6_UNICAST_HOPS] option is not set, the
>    system selects a default value.
> 
> but for the *multicast* hops, which is what i'm talking about, this
> part of the quoted text seems pretty definitive:
> 
>            If IPV6_MULTICAST_HOPS is not set, the default is 1
>            (same as IPv4 today)
> 
> this is what my test shows isn't true of linux; linux reuses its
> unicast default instead.

Ok, I see, so yeah this needs to be fixed to use "1" instead of
"-1" in the np->xxx ipv6 socket initialization.

^ permalink raw reply

* Re: [PATCH] ep93xx_eth stopps receiving packets
From: David Miller @ 2010-05-04  6:23 UTC (permalink / raw)
  To: buytenh; +Cc: stefan, netdev
In-Reply-To: <20100504014606.GI4586@mail.wantstofly.org>

From: Lennert Buytenhek <buytenh@wantstofly.org>
Date: Tue, 4 May 2010 03:46:07 +0200

> On Mon, May 03, 2010 at 01:42:44PM +0200, Stefan Agner wrote:
> 
>> Receiving small packet(s) in a fast pace leads to not receiving any
>> packets at all after some time.
>> 
>> After ethernet packet(s) arrived the receive descriptor is incremented
>> by the number of frames processed. If another packet arrives while
>> processing, this is processed in another call of ep93xx_rx. This
>> second call leads that too many receive descriptors getting released.
>> 
>> This fix increments, even in these case, the right number of processed
>> receive descriptors.
>> 
>> Signed-off-by: Stefan Agner <stefan@agner.ch>
> 
> I haven't opened my ep93xx docs for a while, but if this works for you:
> 
> Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>

I had to apply this by hand because Stefan's email client corrupted the
spacing et al. in the patch.  Stefan please post a usable patch next
time.

^ permalink raw reply

* Re: [PATCH] forcedeth: GRO support
From: David Miller @ 2010-05-04  6:27 UTC (permalink / raw)
  To: therbert; +Cc: netdev, aabdulla
In-Reply-To: <alpine.DEB.1.00.1005032203520.11991@pokey.mtv.corp.google.com>

From: Tom Herbert <therbert@google.com>
Date: Mon, 3 May 2010 22:08:45 -0700 (PDT)

> Add GRO support to forcedeth.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>

Applied, nice work Tom.

I think it's really time to kill the NAPI ifdef crap from this driver.
Something marked "experimental" that people have been actively using
and distributions have been turning on for years isn't experimental
any more.

Actually this goes back to 2006 even.

In fact, I'm just going to kill it right now in net-next-2.6

Thanks again Tom!

^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: enh @ 2010-05-04  6:27 UTC (permalink / raw)
  To: David Miller; +Cc: brian.haley, netdev
In-Reply-To: <20100503.232249.200767273.davem@davemloft.net>

On Mon, May 3, 2010 at 23:22, David Miller <davem@davemloft.net> wrote:
> From: enh <enh@google.com>
> Date: Mon, 3 May 2010 23:19:22 -0700
>
>> for the *unicast* hops, a part of the RFC i didn't quote says:
>>
>>    If the [IPV6_UNICAST_HOPS] option is not set, the
>>    system selects a default value.
>>
>> but for the *multicast* hops, which is what i'm talking about, this
>> part of the quoted text seems pretty definitive:
>>
>>            If IPV6_MULTICAST_HOPS is not set, the default is 1
>>            (same as IPv4 today)
>>
>> this is what my test shows isn't true of linux; linux reuses its
>> unicast default instead.
>
> Ok, I see, so yeah this needs to be fixed to use "1" instead of
> "-1" in the np->xxx ipv6 socket initialization.

i think so. there's already the IPV6_DEFAULT_MCASTHOPS constant
defined to 1 but unused according to gitweb, so you might want to
either use it or remove it.

thanks!

-- 
Elliott Hughes - http://who/enh - http://jessies.org/~enh/

^ permalink raw reply

* Re: [PATCH] forcedeth: GRO support
From: David Miller @ 2010-05-04  6:33 UTC (permalink / raw)
  To: therbert; +Cc: netdev, aabdulla
In-Reply-To: <20100503.232714.43422983.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Mon, 03 May 2010 23:27:14 -0700 (PDT)

> In fact, I'm just going to kill it right now in net-next-2.6

--------------------
forcedeth: Kill NAPI config options.

All distributions enable it, therefore no significant body of users
are even testing the driver with it disabled.  And making NAPI
configurable is heavily discouraged anyways.

I left the MSI-X interrupt enabling thing in an "#if 0" block
so hopefully someone can debug that and it can get re-enabled.
Probably it was just one of the NVIDIA chipset MSI erratas that
we work handle these days in the PCI quirks (see drivers/pci/quirks.c
and stuff like nvenet_msi_disable()).

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/Kconfig     |   14 ----
 drivers/net/forcedeth.c |  194 +----------------------------------------------
 2 files changed, 1 insertions(+), 207 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index dbd26f9..b9e7618 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -1453,20 +1453,6 @@ config FORCEDETH
 	  To compile this driver as a module, choose M here. The module
 	  will be called forcedeth.
 
-config FORCEDETH_NAPI
-	bool "Use Rx Polling (NAPI) (EXPERIMENTAL)"
-	depends on FORCEDETH && EXPERIMENTAL
-	help
-	  NAPI is a new driver API designed to reduce CPU and interrupt load
-	  when the driver is receiving lots of packets from the card. It is
-	  still somewhat experimental and thus not yet enabled by default.
-
-	  If your estimated Rx load is 10kpps or more, or if the card will be
-	  deployed on potentially unfriendly networks (e.g. in a firewall),
-	  then say Y here.
-
-	  If in doubt, say N.
-
 config CS89x0
 	tristate "CS89x0 support"
 	depends on NET_ETHERNET && (ISA || EISA || MACH_IXDP2351 \
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 4a24cc7..f9e1dd4 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -1104,20 +1104,16 @@ static void nv_disable_hw_interrupts(struct net_device *dev, u32 mask)
 
 static void nv_napi_enable(struct net_device *dev)
 {
-#ifdef CONFIG_FORCEDETH_NAPI
 	struct fe_priv *np = get_nvpriv(dev);
 
 	napi_enable(&np->napi);
-#endif
 }
 
 static void nv_napi_disable(struct net_device *dev)
 {
-#ifdef CONFIG_FORCEDETH_NAPI
 	struct fe_priv *np = get_nvpriv(dev);
 
 	napi_disable(&np->napi);
-#endif
 }
 
 #define MII_READ	(-1)
@@ -1810,7 +1806,6 @@ static int nv_alloc_rx_optimized(struct net_device *dev)
 }
 
 /* If rx bufs are exhausted called after 50ms to attempt to refresh */
-#ifdef CONFIG_FORCEDETH_NAPI
 static void nv_do_rx_refill(unsigned long data)
 {
 	struct net_device *dev = (struct net_device *) data;
@@ -1819,41 +1814,6 @@ static void nv_do_rx_refill(unsigned long data)
 	/* Just reschedule NAPI rx processing */
 	napi_schedule(&np->napi);
 }
-#else
-static void nv_do_rx_refill(unsigned long data)
-{
-	struct net_device *dev = (struct net_device *) data;
-	struct fe_priv *np = netdev_priv(dev);
-	int retcode;
-
-	if (!using_multi_irqs(dev)) {
-		if (np->msi_flags & NV_MSI_X_ENABLED)
-			disable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
-		else
-			disable_irq(np->pci_dev->irq);
-	} else {
-		disable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
-	}
-	if (!nv_optimized(np))
-		retcode = nv_alloc_rx(dev);
-	else
-		retcode = nv_alloc_rx_optimized(dev);
-	if (retcode) {
-		spin_lock_irq(&np->lock);
-		if (!np->in_shutdown)
-			mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
-		spin_unlock_irq(&np->lock);
-	}
-	if (!using_multi_irqs(dev)) {
-		if (np->msi_flags & NV_MSI_X_ENABLED)
-			enable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_ALL].vector);
-		else
-			enable_irq(np->pci_dev->irq);
-	} else {
-		enable_irq(np->msi_x_entry[NV_MSI_X_VECTOR_RX].vector);
-	}
-}
-#endif
 
 static void nv_init_rx(struct net_device *dev)
 {
@@ -2816,11 +2776,7 @@ static int nv_rx_process(struct net_device *dev, int limit)
 		skb->protocol = eth_type_trans(skb, dev);
 		dprintk(KERN_DEBUG "%s: nv_rx_process: %d bytes, proto %d accepted.\n",
 					dev->name, len, skb->protocol);
-#ifdef CONFIG_FORCEDETH_NAPI
 		napi_gro_receive(&np->napi, skb);
-#else
-		netif_rx(skb);
-#endif
 		dev->stats.rx_packets++;
 		dev->stats.rx_bytes += len;
 next_pkt:
@@ -2909,27 +2865,14 @@ static int nv_rx_process_optimized(struct net_device *dev, int limit)
 				dev->name, len, skb->protocol);
 
 			if (likely(!np->vlangrp)) {
-#ifdef CONFIG_FORCEDETH_NAPI
 				napi_gro_receive(&np->napi, skb);
-#else
-				netif_rx(skb);
-#endif
 			} else {
 				vlanflags = le32_to_cpu(np->get_rx.ex->buflow);
 				if (vlanflags & NV_RX3_VLAN_TAG_PRESENT) {
-#ifdef CONFIG_FORCEDETH_NAPI
 					vlan_gro_receive(&np->napi, np->vlangrp,
 							 vlanflags & NV_RX3_VLAN_TAG_MASK, skb);
-#else
-					vlan_hwaccel_rx(skb, np->vlangrp,
-							vlanflags & NV_RX3_VLAN_TAG_MASK);
-#endif
 				} else {
-#ifdef CONFIG_FORCEDETH_NAPI
 					napi_gro_receive(&np->napi, skb);
-#else
-					netif_rx(skb);
-#endif
 				}
 			}
 
@@ -3496,10 +3439,6 @@ static irqreturn_t nv_nic_irq(int foo, void *data)
 	struct net_device *dev = (struct net_device *) data;
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
-#ifndef CONFIG_FORCEDETH_NAPI
-	int total_work = 0;
-	int loop_count = 0;
-#endif
 
 	dprintk(KERN_DEBUG "%s: nv_nic_irq\n", dev->name);
 
@@ -3516,7 +3455,6 @@ static irqreturn_t nv_nic_irq(int foo, void *data)
 
 	nv_msi_workaround(np);
 
-#ifdef CONFIG_FORCEDETH_NAPI
 	if (napi_schedule_prep(&np->napi)) {
 		/*
 		 * Disable further irq's (msix not enabled with napi)
@@ -3525,65 +3463,6 @@ static irqreturn_t nv_nic_irq(int foo, void *data)
 		__napi_schedule(&np->napi);
 	}
 
-#else
-	do
-	{
-		int work = 0;
-		if ((work = nv_rx_process(dev, RX_WORK_PER_LOOP))) {
-			if (unlikely(nv_alloc_rx(dev))) {
-				spin_lock(&np->lock);
-				if (!np->in_shutdown)
-					mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
-				spin_unlock(&np->lock);
-			}
-		}
-
-		spin_lock(&np->lock);
-		work += nv_tx_done(dev, TX_WORK_PER_LOOP);
-		spin_unlock(&np->lock);
-
-		if (!work)
-			break;
-
-		total_work += work;
-
-		loop_count++;
-	}
-	while (loop_count < max_interrupt_work);
-
-	if (nv_change_interrupt_mode(dev, total_work)) {
-		/* setup new irq mask */
-		writel(np->irqmask, base + NvRegIrqMask);
-	}
-
-	if (unlikely(np->events & NVREG_IRQ_LINK)) {
-		spin_lock(&np->lock);
-		nv_link_irq(dev);
-		spin_unlock(&np->lock);
-	}
-	if (unlikely(np->need_linktimer && time_after(jiffies, np->link_timeout))) {
-		spin_lock(&np->lock);
-		nv_linkchange(dev);
-		spin_unlock(&np->lock);
-		np->link_timeout = jiffies + LINK_TIMEOUT;
-	}
-	if (unlikely(np->events & NVREG_IRQ_RECOVER_ERROR)) {
-		spin_lock(&np->lock);
-		/* disable interrupts on the nic */
-		if (!(np->msi_flags & NV_MSI_X_ENABLED))
-			writel(0, base + NvRegIrqMask);
-		else
-			writel(np->irqmask, base + NvRegIrqMask);
-		pci_push(base);
-
-		if (!np->in_shutdown) {
-			np->nic_poll_irq = np->irqmask;
-			np->recover_error = 1;
-			mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
-		}
-		spin_unlock(&np->lock);
-	}
-#endif
 	dprintk(KERN_DEBUG "%s: nv_nic_irq completed\n", dev->name);
 
 	return IRQ_HANDLED;
@@ -3599,10 +3478,6 @@ static irqreturn_t nv_nic_irq_optimized(int foo, void *data)
 	struct net_device *dev = (struct net_device *) data;
 	struct fe_priv *np = netdev_priv(dev);
 	u8 __iomem *base = get_hwbase(dev);
-#ifndef CONFIG_FORCEDETH_NAPI
-	int total_work = 0;
-	int loop_count = 0;
-#endif
 
 	dprintk(KERN_DEBUG "%s: nv_nic_irq_optimized\n", dev->name);
 
@@ -3619,7 +3494,6 @@ static irqreturn_t nv_nic_irq_optimized(int foo, void *data)
 
 	nv_msi_workaround(np);
 
-#ifdef CONFIG_FORCEDETH_NAPI
 	if (napi_schedule_prep(&np->napi)) {
 		/*
 		 * Disable further irq's (msix not enabled with napi)
@@ -3627,66 +3501,6 @@ static irqreturn_t nv_nic_irq_optimized(int foo, void *data)
 		writel(0, base + NvRegIrqMask);
 		__napi_schedule(&np->napi);
 	}
-#else
-	do
-	{
-		int work = 0;
-		if ((work = nv_rx_process_optimized(dev, RX_WORK_PER_LOOP))) {
-			if (unlikely(nv_alloc_rx_optimized(dev))) {
-				spin_lock(&np->lock);
-				if (!np->in_shutdown)
-					mod_timer(&np->oom_kick, jiffies + OOM_REFILL);
-				spin_unlock(&np->lock);
-			}
-		}
-
-		spin_lock(&np->lock);
-		work += nv_tx_done_optimized(dev, TX_WORK_PER_LOOP);
-		spin_unlock(&np->lock);
-
-		if (!work)
-			break;
-
-		total_work += work;
-
-		loop_count++;
-	}
-	while (loop_count < max_interrupt_work);
-
-	if (nv_change_interrupt_mode(dev, total_work)) {
-		/* setup new irq mask */
-		writel(np->irqmask, base + NvRegIrqMask);
-	}
-
-	if (unlikely(np->events & NVREG_IRQ_LINK)) {
-		spin_lock(&np->lock);
-		nv_link_irq(dev);
-		spin_unlock(&np->lock);
-	}
-	if (unlikely(np->need_linktimer && time_after(jiffies, np->link_timeout))) {
-		spin_lock(&np->lock);
-		nv_linkchange(dev);
-		spin_unlock(&np->lock);
-		np->link_timeout = jiffies + LINK_TIMEOUT;
-	}
-	if (unlikely(np->events & NVREG_IRQ_RECOVER_ERROR)) {
-		spin_lock(&np->lock);
-		/* disable interrupts on the nic */
-		if (!(np->msi_flags & NV_MSI_X_ENABLED))
-			writel(0, base + NvRegIrqMask);
-		else
-			writel(np->irqmask, base + NvRegIrqMask);
-		pci_push(base);
-
-		if (!np->in_shutdown) {
-			np->nic_poll_irq = np->irqmask;
-			np->recover_error = 1;
-			mod_timer(&np->nic_poll, jiffies + POLL_WAIT);
-		}
-		spin_unlock(&np->lock);
-	}
-
-#endif
 	dprintk(KERN_DEBUG "%s: nv_nic_irq_optimized completed\n", dev->name);
 
 	return IRQ_HANDLED;
@@ -3735,7 +3549,6 @@ static irqreturn_t nv_nic_irq_tx(int foo, void *data)
 	return IRQ_RETVAL(i);
 }
 
-#ifdef CONFIG_FORCEDETH_NAPI
 static int nv_napi_poll(struct napi_struct *napi, int budget)
 {
 	struct fe_priv *np = container_of(napi, struct fe_priv, napi);
@@ -3805,7 +3618,6 @@ static int nv_napi_poll(struct napi_struct *napi, int budget)
 	}
 	return rx_work;
 }
-#endif
 
 static irqreturn_t nv_nic_irq_rx(int foo, void *data)
 {
@@ -5711,9 +5523,7 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 		np->txrxctl_bits |= NVREG_TXRXCTL_RXCHECK;
 		dev->features |= NETIF_F_IP_CSUM | NETIF_F_SG;
 		dev->features |= NETIF_F_TSO;
-#ifdef CONFIG_FORCEDETH_NAPI
 		dev->features |= NETIF_F_GRO;
-#endif
 	}
 
 	np->vlanctl_bits = 0;
@@ -5766,9 +5576,7 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 	else
 		dev->netdev_ops = &nv_netdev_ops_optimized;
 
-#ifdef CONFIG_FORCEDETH_NAPI
 	netif_napi_add(dev, &np->napi, nv_napi_poll, RX_WORK_PER_LOOP);
-#endif
 	SET_ETHTOOL_OPS(dev, &ops);
 	dev->watchdog_timeo = NV_WATCHDOG_TIMEO;
 
@@ -5871,7 +5679,7 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 		/* msix has had reported issues when modifying irqmask
 		   as in the case of napi, therefore, disable for now
 		*/
-#ifndef CONFIG_FORCEDETH_NAPI
+#if 0
 		np->msi_flags |= NV_MSI_X_CAPABLE;
 #endif
 	}
-- 
1.7.0.4


^ permalink raw reply related

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: David Miller @ 2010-05-04  6:42 UTC (permalink / raw)
  To: enh; +Cc: brian.haley, netdev
In-Reply-To: <AANLkTimf-FWCsiLqYDGg4nl-relRMuNvDowfTKZbe9kx@mail.gmail.com>

From: enh <enh@google.com>
Date: Mon, 3 May 2010 23:27:23 -0700

> On Mon, May 3, 2010 at 23:22, David Miller <davem@davemloft.net> wrote:
>> Ok, I see, so yeah this needs to be fixed to use "1" instead of
>> "-1" in the np->xxx ipv6 socket initialization.
> 
> i think so. there's already the IPV6_DEFAULT_MCASTHOPS constant
> defined to 1 but unused according to gitweb, so you might want to
> either use it or remove it.

I've applied the following, thanks Elliot.

--------------------
ipv6: Fix default multicast hops setting.

As per RFC 3493 the default multicast hops setting
for a socket should be "1" just like ipv4.

Ironically we have a IPV6_DEFAULT_MCASTHOPS macro
it just wasn't being used.

Reported-by: Elliot Hughes <enh@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/af_inet6.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 3192aa0..3f9e86b 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -200,7 +200,7 @@ lookup_protocol:
 
 	inet_sk(sk)->pinet6 = np = inet6_sk_generic(sk);
 	np->hop_limit	= -1;
-	np->mcast_hops	= -1;
+	np->mcast_hops	= IPV6_DEFAULT_MCASTHOPS;
 	np->mc_loop	= 1;
 	np->pmtudisc	= IPV6_PMTUDISC_WANT;
 	np->ipv6only	= net->ipv6.sysctl.bindv6only;
-- 
1.7.0.4


^ permalink raw reply related

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: David Stevens @ 2010-05-04  7:48 UTC (permalink / raw)
  To: enh; +Cc: brian.haley, David Miller, netdev, netdev-owner
In-Reply-To: <AANLkTimf-FWCsiLqYDGg4nl-relRMuNvDowfTKZbe9kx@mail.gmail.com>

It's set to -1 by default, but the common code for unicast and
multicast in getsockopt is falling through to use the dst_entry.

I believe (though I haven't actually tried it recently) it actually
uses "1" for the default value for multicast; it just doesn't
report it correctly. It should distinguish multicast from unicast
in the <0 check in getsockopt.
                                        +-DLS

^ permalink raw reply

* Re: linux kernel's IPV6_MULTICAST_HOPS default is 64; should be 1?
From: David Miller @ 2010-05-04  7:57 UTC (permalink / raw)
  To: dlstevens; +Cc: enh, brian.haley, netdev, netdev-owner
In-Reply-To: <OF8BB88147.4F3756DA-ON88257719.002A895A-88257719.002AEB08@us.ibm.com>

From: David Stevens <dlstevens@us.ibm.com>
Date: Tue, 4 May 2010 00:48:46 -0700

> It's set to -1 by default, but the common code for unicast and
> multicast in getsockopt is falling through to use the dst_entry.
> 
> I believe (though I haven't actually tried it recently) it actually
> uses "1" for the default value for multicast;

It doesn't, all of the uses in the ipv6 stack say something like:

	if (multicast)
		hlimit = np->mcast_hops;
	else
		hlimit = np->hop_limit;
	if (hlimit < 0)
		hlimit = ip6_dst_hoplimit(dst);

Therefore, the change suggested by Elliot and which I committed is the
way to get the correct behavior and fix this.

^ permalink raw reply

* Re: [PATCH 1/3] ptp: Added a brand new class driver for ptp clocks.
From: Wolfgang Grandegger @ 2010-05-04  7:57 UTC (permalink / raw)
  To: Richard Cochran; +Cc: netdev
In-Reply-To: <20100503100754.GA30417@riccoc20.at.omicron.at>

On 05/03/2010 12:07 PM, Richard Cochran wrote:
> On Sun, May 02, 2010 at 12:50:56PM +0200, Wolfgang Grandegger wrote:
>>
>> As long as the device is in use by an application, no other can access
>> it, because the mutex is locked. Other application may want to read the
>> PTP clock time while ptpd is running, though.
> 
> Yes, of course. I implemented it that way just to get started. I first
> want to concentrate on getting the basic drivers in place (still have
> IXP46x and Phyter to do), and then on the ancillary features, like
> timers, time stamping external inputs, and so on.
> 
> I understand that some fine grained access control to the PTP clock
> woul be nice to have, but I am not sure exactly what would work best,
> and I would like to save that decision for later...

OK, fair enough.

> However, if you have some ideas, please take a look at the list of
> features in the docu, and explain how you would like the access
> control to work.

The fine grain locking for set/get properties is tricky. I personally
would just require "CAP_SYS_TIME" for setting properties and a
spin_lock_irq* to protect against concurrent access, just like for
setting/getting system clock properties.

> Or better yet, post a patch ;)

Would do if we could agree on the solution mentioned above.

Wolfgang.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox