Netdev List

* Re: [PATCH] e100: expose broadcast_disabled as a module option
From: Erwan Velu @ 2010-04-23 21:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Jeff Kirsher, netdev, David Miller, linux-kernel,
	jesse.brandeburg, bruce.w.allan, alexander.h.duyck,
	peter.p.waskiewicz.jr, john.ronciak
In-Reply-To: <20100423135816.23f5861f@nehalam>

I first tried "ifconfig -broadcast" without any success, so I forced
the driver to unset IFF_BROADCAST; the interface no longer showed
the BROADCAST option with ifconfig. But I didn't notice any reduction
in the number of context switches on my host.

I found broadcast_disabled far more effective at reducing the
CPU impact.


2010/4/23 Stephen Hemminger <shemminger@vyatta.com>:
> On Fri, 23 Apr 2010 13:22:22 -0700
> Jeff Kirsher <jeffrey.t.kirsher@intel.com> wrote:
>
>> On Fri, Apr 23, 2010 at 13:14, Erwan Velu <erwanaliasr1@gmail.com> wrote:
>> > Hi folks,
>> >
>> > I've been facing a very noisy network where hundreds of broadcast
>> > packets were generated every second.
>> > When this traffic can't be controlled at the source, there is a side
>> > effect on some systems.
>> > I had some idle systems that would never be targeted by this
>> > broadcast traffic, yet got loaded just by receiving that "flood".
>> > By "loaded" I mean that this light hardware was generating 300
>> > context switches per second.
>> >
>> > I looked at many options to keep this traffic from disturbing these
>> > hosts and discovered that the e100 driver features a
>> > "broadcast_disabled" configuration option.
>> > I realized that this option is not controllable, so I wrote this simple
>> > patch that exposes it as a module option.
>> > This allows me to tell these hosts to stop listening to this traffic.
>> >
>> > The result is clearly good, as my systems now run at 21
>> > context switches while idle.
>> > I hope this patch isn't too bad and can help others who face the same problem.
>> >
>> > Patch can be downloaded here :
>> > http://konilope.linuxeries.org/e100_broadcast_disabled.patch
>> >
>> > Even if gmail is eating the inlined patch, at least that makes it
>> > easier for humans to read.
>> > If the patch is acked, the downloaded one will be cleaner ;)
>> >
>> > This patch was generated on top of the latest 2.6 Torvalds git tree.
>> > Cheers,
>> > Erwan
>> >
>> > Signed-off-by: Erwan Velu <erwanaliasr1@gmail.com>
>> >
>> > diff --git a/drivers/net/e100.c b/drivers/net/e100.c
>> > index b997e57..2ba582f 100644
>> > --- a/drivers/net/e100.c
>> > +++ b/drivers/net/e100.c
>> > @@ -194,12 +194,15 @@ MODULE_FIRMWARE(FIRMWARE_D102E);
>> >  static int debug = 3;
>> >  static int eeprom_bad_csum_allow = 0;
>> >  static int use_io = 0;
>> > +static int broadcast_disabled = 0;
>> >  module_param(debug, int, 0);
>> >  module_param(eeprom_bad_csum_allow, int, 0);
>> >  module_param(use_io, int, 0);
>> > +module_param(broadcast_disabled, int, 0);
>> >  MODULE_PARM_DESC(debug, "Debug level (0=none,...,16=all)");
>> >  MODULE_PARM_DESC(eeprom_bad_csum_allow, "Allow bad eeprom checksums");
>> >  MODULE_PARM_DESC(use_io, "Force use of i/o access mode");
>> > +MODULE_PARM_DESC(broadcast_disabled, "Filter broadcast packets (0=disabled (default), 1=enabled)");
>> >  #define DPRINTK(nlevel, klevel, fmt, args...) \
>> >        (void)((NETIF_MSG_##nlevel & nic->msg_enable) && \
>> >        printk(KERN_##klevel PFX "%s: %s: " fmt, nic->netdev->name, \
>> > @@ -1131,6 +1134,8 @@ static void e100_configure(struct nic *nic, struct cb *cb, struct sk_buff *skb)
>> >                config->promiscuous_mode = 0x1;         /* 1=on, 0=off */
>> >        }
>> >
>> > +       config->broadcast_disabled = broadcast_disabled; /* Broadcast filtering */
>> > +
>> >        if (nic->flags & multicast_all)
>> >                config->multicast_all = 0x1;            /* 1=accept, 0=no */
>> > --
>>
>> Adding Netdev...
>>
>
> What is wrong with using existing IFF_BROADCAST flag?
>
>
> --
>


* Re: eSwitch management
From: Anirban Chakraborty @ 2010-04-23 21:08 UTC (permalink / raw)
  To: Chris Wright
  Cc: Scott Feldman, David Miller, netdev@vger.kernel.org,
	Arnd Bergmann, Ameen Rahman, Amit Salecha, Rajesh Borundia,
	shemminger@vyatta.com
In-Reply-To: <20100423194455.GA3843@x200.localdomain>


On Apr 23, 2010, at 12:44 PM, Chris Wright wrote:

> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>> On Apr 23, 2010, at 9:23 AM, Chris Wright wrote:
>>> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
>>>> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?
>>> 
>>> No, you don't need to use netlink in your driver.  You just need to fill
>>> in the relevant net_device_ops in your driver init.  Specifically:
>>> 
>>> *      SR-IOV management functions.
>>> * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
>>> * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
>>> * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
>>> * int (*ndo_get_vf_config)(struct net_device *dev,
>>> *                          int vf, struct ifla_vf_info *ivf);
>>> 
>>> These are all operating on a VF indexed internally w/in the driver, so it's
>>> a little cumbersome to use from userspace.
>> 
>> These are all intended for VFs and are configurable from the PF.
> 
> Yes, and while the set of callbacks can change, they are always tied to
> some net_device (typically the PF) that knows how to make hardware
> settings on behalf of a VF.
> 
>> However, in our case, there are multiple physical NIC functions on a
>> port which are configurable by the eswitch.
> 
> Is there a PCI function that represents the switch?  Or a special PCI
> NIC function that has VEB mgmt plane access?  And do you have examples
> of configuration that you'll do here?
There is no PCI function that represents the switch. However, one of the NIC functions can act as a privileged function to configure the eswitch. Typically the first NIC function enumerated on the bus manages the eswitch. Typical configuration would be setting the tx bandwidth, VLAN ID, MAC address, and promiscuous mode for each of these ports at the start of the day. This is useful in a virtualization scenario where we can do PCI passthrough of the functions to the guest, and these settings for the guest are configured via the driver in the host.

<snip>
> 
> One idea that has been discussed in the past is to create essentially
> a pluggable set of bridge_ops.  The first step would be purely internal
> shuffling, to make the existing sw bridge code go through the bridge_ops.
> The second step would be making your driver for whichever PCI function
> you have that supports managing the bridge create a net_device which is
> a bridge during driver init.  And now normal brctl can call into your
> VEB via the bridge_ops callbacks. </handwave>
> 
I liked the idea of iovnl, as it works by using port profiles. That way the eswitch can be configured with the same port profile that a vswitch in a hypervisor has.

thanks,
Anirban






* [PATCH 0/3] IPv6: Add IPV6_RECVPATHMTU, IPV6_PATHMTU and IPV6_DONTFRAG support
From: Brian Haley @ 2010-04-23 21:26 UTC (permalink / raw)
  To: davem, yoshfuji; +Cc: netdev

This series adds support for IPV6_RECVPATHMTU, IPV6_PATHMTU, and
IPV6_DONTFRAG socket options as defined in RFC 3542.

 include/linux/in6.h      |    2 +-
 include/linux/ipv6.h     |   15 +++++-
 include/net/ipv6.h       |    5 ++-
 include/net/transp_v6.h  |    3 +-
 net/ipv6/af_inet6.c      |    3 +
 net/ipv6/datagram.c      |  108 +++++++++++++++++++++++++++++++++++++++++++++-
 net/ipv6/icmp.c          |    5 +-
 net/ipv6/ip6_flowlabel.c |    3 +-
 net/ipv6/ip6_output.c    |   26 +++++++----
 net/ipv6/ipv6_sockglue.c |   49 ++++++++++++++++++++-
 net/ipv6/raw.c           |   12 ++++-
 net/ipv6/udp.c           |   12 ++++-
 12 files changed, 219 insertions(+), 24 deletions(-)


* [PATCH 1/3] IPv6: data structure changes for new socket options
From: Brian Haley @ 2010-04-23 21:26 UTC (permalink / raw)
  To: davem, yoshfuji; +Cc: netdev
In-Reply-To: <1272057969-6526-1-git-send-email-brian.haley@hp.com>

Add underlying data structure changes and basic setsockopt()
and getsockopt() support for IPV6_RECVPATHMTU, IPV6_PATHMTU,
and IPV6_DONTFRAG.  IPV6_PATHMTU is actually fully functional
at this point.

Signed-off-by: Brian Haley <brian.haley@hp.com>
---
 include/linux/in6.h      |    2 +-
 include/linux/ipv6.h     |   13 ++++++++++---
 net/ipv6/ipv6_sockglue.c |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/include/linux/in6.h b/include/linux/in6.h
index 9b90cb2..c4bf46f 100644
--- a/include/linux/in6.h
+++ b/include/linux/in6.h
@@ -221,10 +221,10 @@ struct in6_flowlabel_req {
 #define IPV6_RTHDR		57
 #define IPV6_RECVDSTOPTS	58
 #define IPV6_DSTOPTS		59
-#if 0	/* not yet */
 #define IPV6_RECVPATHMTU	60
 #define IPV6_PATHMTU		61
 #define IPV6_DONTFRAG		62
+#if 0	/* not yet */
 #define IPV6_USE_MIN_MTU	63
 #endif
 
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 1bdbebf..1976942 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -21,6 +21,10 @@ struct in6_pktinfo {
 	int		ipi6_ifindex;
 };
 
+struct ip6_mtuinfo {
+	struct sockaddr_in6	ip6m_addr;
+	__u32			ip6m_mtu;
+};
 
 struct in6_ifreq {
 	struct in6_addr	ifr6_addr;
@@ -334,22 +338,25 @@ struct ipv6_pinfo {
 				dstopts:1,
 				odstopts:1,
                                 rxflow:1,
-				rxtclass:1;
+				rxtclass:1,
+				rxpmtu:1;
 		} bits;
 		__u16		all;
 	} rxopt;
 
 	/* sockopt flags */
-	__u8			recverr:1,
+	__u16			recverr:1,
 	                        sndflow:1,
 				pmtudisc:2,
 				ipv6only:1,
-				srcprefs:3;	/* 001: prefer temporary address
+				srcprefs:3,	/* 001: prefer temporary address
 						 * 010: prefer public address
 						 * 100: prefer care-of address
 						 */
+				dontfrag:1;
 	__u8			min_hopcount;
 	__u8			tclass;
+	__u8			padding;
 
 	__u32			dst_cookie;
 
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 92295ad..2bf9eda 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -337,6 +337,13 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		retv = 0;
 		break;
 
+	case IPV6_RECVPATHMTU:
+		if (optlen < sizeof(int))
+			goto e_inval;
+		np->rxopt.bits.rxpmtu = valbool;
+		retv = 0;
+		break;
+
 	case IPV6_HOPOPTS:
 	case IPV6_RTHDRDSTOPTS:
 	case IPV6_RTHDR:
@@ -773,6 +780,9 @@ pref_skip_coa:
 		if (val < 0 || val > 255)
 			goto e_inval;
 		np->min_hopcount = val;
+		break;
+	case IPV6_DONTFRAG:
+		np->dontfrag = valbool;
 		retv = 0;
 		break;
 	}
@@ -1063,6 +1073,38 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		val = np->rxopt.bits.rxflow;
 		break;
 
+	case IPV6_RECVPATHMTU:
+		val = np->rxopt.bits.rxpmtu;
+		break;
+
+	case IPV6_PATHMTU:
+	{
+		struct dst_entry *dst;
+		struct ip6_mtuinfo mtuinfo;
+
+		if (len < sizeof(mtuinfo))
+			return -EINVAL;
+
+		len = sizeof(mtuinfo);
+		memset(&mtuinfo, 0, sizeof(mtuinfo));
+
+		rcu_read_lock();
+		dst = __sk_dst_get(sk);
+		if (dst)
+			mtuinfo.ip6m_mtu = dst_mtu(dst);
+		rcu_read_unlock();
+		if (!mtuinfo.ip6m_mtu)
+			return -ENOTCONN;
+
+		if (put_user(len, optlen))
+			return -EFAULT;
+		if (copy_to_user(optval, &mtuinfo, len))
+			return -EFAULT;
+
+		return 0;
+		break;
+	}
+
 	case IPV6_UNICAST_HOPS:
 	case IPV6_MULTICAST_HOPS:
 	{
@@ -1128,6 +1170,10 @@ static int do_ipv6_getsockopt(struct sock *sk, int level, int optname,
 		val = np->min_hopcount;
 		break;
 
+	case IPV6_DONTFRAG:
+		val = np->dontfrag;
+		break;
+
 	default:
 		return -ENOPROTOOPT;
 	}
-- 
1.5.4.3



* [PATCH 2/3] IPv6: Add dontfrag argument to relevant functions
From: Brian Haley @ 2010-04-23 21:26 UTC (permalink / raw)
  To: davem, yoshfuji; +Cc: netdev
In-Reply-To: <1272057969-6526-2-git-send-email-brian.haley@hp.com>

Add dontfrag argument to relevant functions for
IPV6_DONTFRAG support, as well as allowing the value
to be passed-in via ancillary cmsg data.

Signed-off-by: Brian Haley <brian.haley@hp.com>
---
 include/net/ipv6.h       |    3 ++-
 include/net/transp_v6.h  |    3 ++-
 net/ipv6/datagram.c      |   21 ++++++++++++++++++++-
 net/ipv6/icmp.c          |    5 +++--
 net/ipv6/ip6_flowlabel.c |    3 ++-
 net/ipv6/ip6_output.c    |    2 +-
 net/ipv6/ipv6_sockglue.c |    3 ++-
 net/ipv6/raw.c           |    9 +++++++--
 net/ipv6/udp.c           |    9 +++++++--
 9 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index b1d8db9..7ab6323 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -503,7 +503,8 @@ extern int			ip6_append_data(struct sock *sk,
 						struct ipv6_txoptions *opt,
 						struct flowi *fl,
 						struct rt6_info *rt,
-						unsigned int flags);
+						unsigned int flags,
+						int dontfrag);
 
 extern int			ip6_push_pending_frames(struct sock *sk);
 
diff --git a/include/net/transp_v6.h b/include/net/transp_v6.h
index d65381c..42a0eb6 100644
--- a/include/net/transp_v6.h
+++ b/include/net/transp_v6.h
@@ -44,7 +44,8 @@ extern int			datagram_send_ctl(struct net *net,
 						  struct msghdr *msg,
 						  struct flowi *fl,
 						  struct ipv6_txoptions *opt,
-						  int *hlimit, int *tclass);
+						  int *hlimit, int *tclass,
+						  int *dontfrag);
 
 #define		LOOPBACK4_IPV6		cpu_to_be32(0x7f000006)
 
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 622dc79..f5076d3 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -497,7 +497,7 @@ int datagram_recv_ctl(struct sock *sk, struct msghdr *msg, struct sk_buff *skb)
 int datagram_send_ctl(struct net *net,
 		      struct msghdr *msg, struct flowi *fl,
 		      struct ipv6_txoptions *opt,
-		      int *hlimit, int *tclass)
+		      int *hlimit, int *tclass, int *dontfrag)
 {
 	struct in6_pktinfo *src_info;
 	struct cmsghdr *cmsg;
@@ -737,6 +737,25 @@ int datagram_send_ctl(struct net *net,
 
 			break;
 		    }
+
+		case IPV6_DONTFRAG:
+		    {
+			int df;
+
+			err = -EINVAL;
+			if (cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
+				goto exit_f;
+			}
+
+			df = *(int *)CMSG_DATA(cmsg);
+			if (df < 0 || df > 1)
+				goto exit_f;
+
+			err = 0;
+			*dontfrag = df;
+
+			break;
+		    }
 		default:
 			LIMIT_NETDEBUG(KERN_DEBUG "invalid cmsg type: %d\n",
 				       cmsg->cmsg_type);
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 12d2fa4..ce79929 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -481,7 +481,7 @@ route_done:
 			      len + sizeof(struct icmp6hdr),
 			      sizeof(struct icmp6hdr), hlimit,
 			      np->tclass, NULL, &fl, (struct rt6_info*)dst,
-			      MSG_DONTWAIT);
+			      MSG_DONTWAIT, np->dontfrag);
 	if (err) {
 		ICMP6_INC_STATS_BH(net, idev, ICMP6_MIB_OUTMSGS);
 		ip6_flush_pending_frames(sk);
@@ -561,7 +561,8 @@ static void icmpv6_echo_reply(struct sk_buff *skb)
 
 	err = ip6_append_data(sk, icmpv6_getfrag, &msg, skb->len + sizeof(struct icmp6hdr),
 				sizeof(struct icmp6hdr), hlimit, np->tclass, NULL, &fl,
-				(struct rt6_info*)dst, MSG_DONTWAIT);
+				(struct rt6_info*)dst, MSG_DONTWAIT,
+				np->dontfrag);
 
 	if (err) {
 		ICMP6_INC_STATS_BH(net, idev, ICMP6_MIB_OUTMSGS);
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index 14e2321..1365468 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -360,7 +360,8 @@ fl_create(struct net *net, struct in6_flowlabel_req *freq, char __user *optval,
 		msg.msg_control = (void*)(fl->opt+1);
 		flowi.oif = 0;
 
-		err = datagram_send_ctl(net, &msg, &flowi, fl->opt, &junk, &junk);
+		err = datagram_send_ctl(net, &msg, &flowi, fl->opt, &junk,
+					&junk, &junk);
 		if (err)
 			goto done;
 		err = -EINVAL;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 263d4cf..54d43dd 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1105,7 +1105,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 	int offset, int len, int odd, struct sk_buff *skb),
 	void *from, int length, int transhdrlen,
 	int hlimit, int tclass, struct ipv6_txoptions *opt, struct flowi *fl,
-	struct rt6_info *rt, unsigned int flags)
+	struct rt6_info *rt, unsigned int flags, int dontfrag)
 {
 	struct inet_sock *inet = inet_sk(sk);
 	struct ipv6_pinfo *np = inet6_sk(sk);
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 2bf9eda..bd43f01 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -458,7 +458,8 @@ sticky_done:
 		msg.msg_controllen = optlen;
 		msg.msg_control = (void*)(opt+1);
 
-		retv = datagram_send_ctl(net, &msg, &fl, opt, &junk, &junk);
+		retv = datagram_send_ctl(net, &msg, &fl, opt, &junk, &junk,
+					 &junk);
 		if (retv)
 			goto done;
 update:
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 8763b1a..44a84ea 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -733,6 +733,7 @@ static int rawv6_sendmsg(struct kiocb *iocb, struct sock *sk,
 	int addr_len = msg->msg_namelen;
 	int hlimit = -1;
 	int tclass = -1;
+	int dontfrag = -1;
 	u16 proto;
 	int err;
 
@@ -811,7 +812,8 @@ static int rawv6_sendmsg(struct kiocb *iocb, struct sock *sk,
 		memset(opt, 0, sizeof(struct ipv6_txoptions));
 		opt->tot_len = sizeof(struct ipv6_txoptions);
 
-		err = datagram_send_ctl(sock_net(sk), msg, &fl, opt, &hlimit, &tclass);
+		err = datagram_send_ctl(sock_net(sk), msg, &fl, opt, &hlimit,
+					&tclass, &dontfrag);
 		if (err < 0) {
 			fl6_sock_release(flowlabel);
 			return err;
@@ -880,6 +882,9 @@ static int rawv6_sendmsg(struct kiocb *iocb, struct sock *sk,
 	if (tclass < 0)
 		tclass = np->tclass;
 
+	if (dontfrag < 0)
+		dontfrag = np->dontfrag;
+
 	if (msg->msg_flags&MSG_CONFIRM)
 		goto do_confirm;
 
@@ -890,7 +895,7 @@ back_from_confirm:
 		lock_sock(sk);
 		err = ip6_append_data(sk, ip_generic_getfrag, msg->msg_iov,
 			len, 0, hlimit, tclass, opt, &fl, (struct rt6_info*)dst,
-			msg->msg_flags);
+			msg->msg_flags, dontfrag);
 
 		if (err)
 			ip6_flush_pending_frames(sk);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 92bf903..39e3665 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -919,6 +919,7 @@ int udpv6_sendmsg(struct kiocb *iocb, struct sock *sk,
 	int ulen = len;
 	int hlimit = -1;
 	int tclass = -1;
+	int dontfrag = -1;
 	int corkreq = up->corkflag || msg->msg_flags&MSG_MORE;
 	int err;
 	int connected = 0;
@@ -1049,7 +1050,8 @@ do_udp_sendmsg:
 		memset(opt, 0, sizeof(struct ipv6_txoptions));
 		opt->tot_len = sizeof(*opt);
 
-		err = datagram_send_ctl(sock_net(sk), msg, &fl, opt, &hlimit, &tclass);
+		err = datagram_send_ctl(sock_net(sk), msg, &fl, opt, &hlimit,
+					&tclass, &dontfrag);
 		if (err < 0) {
 			fl6_sock_release(flowlabel);
 			return err;
@@ -1120,6 +1122,9 @@ do_udp_sendmsg:
 	if (tclass < 0)
 		tclass = np->tclass;
 
+	if (dontfrag < 0)
+		dontfrag = np->dontfrag;
+
 	if (msg->msg_flags&MSG_CONFIRM)
 		goto do_confirm;
 back_from_confirm:
@@ -1143,7 +1148,7 @@ do_append_data:
 	err = ip6_append_data(sk, getfrag, msg->msg_iov, ulen,
 		sizeof(struct udphdr), hlimit, tclass, opt, &fl,
 		(struct rt6_info*)dst,
-		corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags);
+		corkreq ? msg->msg_flags|MSG_MORE : msg->msg_flags, dontfrag);
 	if (err)
 		udp_v6_flush_pending_frames(sk);
 	else if (!corkreq)
-- 
1.5.4.3
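
From userspace, the per-packet form of IPV6_DONTFRAG handled in patch 2/3
would be supplied as ancillary data on sendmsg(). The following is a minimal
sketch, not part of the series: attach_dontfrag() is a hypothetical helper
name, IPPROTO_IPV6 is used as the cmsg level on the assumption that it matches
the level the kernel expects, and the fallback value 62 is copied from
patch 1/3 for headers that do not define IPV6_DONTFRAG. It mirrors the two
checks in datagram_send_ctl(): the cmsg must be exactly CMSG_LEN(sizeof(int))
long and the value must be 0 or 1.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IPV6_DONTFRAG
#define IPV6_DONTFRAG 62	/* assumption: value taken from patch 1/3 */
#endif

/*
 * Hypothetical helper: point msg's control area at buf and fill it with a
 * single IPV6_DONTFRAG cmsg. buf should be aligned for struct cmsghdr.
 * Returns 0 on success, -1 if the buffer is too small or `on` is not 0/1.
 */
int attach_dontfrag(struct msghdr *msg, void *buf, size_t buflen, int on)
{
	struct cmsghdr *cmsg;

	assert(msg != NULL && buf != NULL);

	/* Same range check the kernel side applies (df < 0 || df > 1). */
	if (on != 0 && on != 1)
		return -1;
	if (buflen < CMSG_SPACE(sizeof(int)))
		return -1;

	memset(buf, 0, buflen);
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(int));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = IPPROTO_IPV6;
	cmsg->cmsg_type = IPV6_DONTFRAG;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));	/* kernel requires this exact length */
	memcpy(CMSG_DATA(cmsg), &on, sizeof(on));

	return 0;
}
```

The prepared msghdr can then be passed to sendmsg() along with the usual
msg_name and msg_iov. When no such cmsg is present, the patch falls back to
the socket-wide np->dontfrag setting.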



* [PATCH 3/3] IPv6: Complete IPV6_DONTFRAG support
From: Brian Haley @ 2010-04-23 21:26 UTC (permalink / raw)
  To: davem, yoshfuji; +Cc: netdev
In-Reply-To: <1272057969-6526-3-git-send-email-brian.haley@hp.com>

Finally add support to detect a local IPV6_DONTFRAG event
and return the relevant data to the user if they've enabled
IPV6_RECVPATHMTU on the socket.  The next recvmsg() will
return no data, but have an IPV6_PATHMTU as ancillary data.

Signed-off-by: Brian Haley <brian.haley@hp.com>
---
 include/linux/ipv6.h  |    2 +
 include/net/ipv6.h    |    2 +
 net/ipv6/af_inet6.c   |    3 ++
 net/ipv6/datagram.c   |   87 +++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv6/ip6_output.c |   24 +++++++++----
 net/ipv6/raw.c        |    3 ++
 net/ipv6/udp.c        |    3 ++
 7 files changed, 116 insertions(+), 8 deletions(-)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 1976942..2ab5509 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -257,6 +257,7 @@ struct inet6_skb_parm {
 };
 
 #define IP6CB(skb)	((struct inet6_skb_parm*)((skb)->cb))
+#define IP6CBMTU(skb)	((struct ip6_mtuinfo *)((skb)->cb))
 
 static inline int inet6_iif(const struct sk_buff *skb)
 {
@@ -366,6 +367,7 @@ struct ipv6_pinfo {
 
 	struct ipv6_txoptions	*opt;
 	struct sk_buff		*pktoptions;
+	struct sk_buff		*rxpmtu;
 	struct {
 		struct ipv6_txoptions *opt;
 		u8 hop_limit;
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 7ab6323..eba5cc0 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -578,9 +578,11 @@ extern int			ip6_datagram_connect(struct sock *sk,
 						     struct sockaddr *addr, int addr_len);
 
 extern int 			ipv6_recv_error(struct sock *sk, struct msghdr *msg, int len);
+extern int 			ipv6_recv_rxpmtu(struct sock *sk, struct msghdr *msg, int len);
 extern void			ipv6_icmp_error(struct sock *sk, struct sk_buff *skb, int err, __be16 port,
 						u32 info, u8 *payload);
 extern void			ipv6_local_error(struct sock *sk, int err, struct flowi *fl, u32 info);
+extern void			ipv6_local_rxpmtu(struct sock *sk, struct flowi *fl, u32 mtu);
 
 extern int inet6_release(struct socket *sock);
 extern int inet6_bind(struct socket *sock, struct sockaddr *uaddr, 
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 3192aa0..d2df314 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -417,6 +417,9 @@ void inet6_destroy_sock(struct sock *sk)
 	if ((skb = xchg(&np->pktoptions, NULL)) != NULL)
 		kfree_skb(skb);
 
+	if ((skb = xchg(&np->rxpmtu, NULL)) != NULL)
+		kfree_skb(skb);
+
 	/* Free flowlabels */
 	fl6_free_socklist(sk);
 
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index f5076d3..5959230 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -278,6 +278,45 @@ void ipv6_local_error(struct sock *sk, int err, struct flowi *fl, u32 info)
 		kfree_skb(skb);
 }
 
+void ipv6_local_rxpmtu(struct sock *sk, struct flowi *fl, u32 mtu)
+{
+	struct ipv6_pinfo *np = inet6_sk(sk);
+	struct ipv6hdr *iph;
+	struct sk_buff *skb;
+	struct ip6_mtuinfo *mtu_info;
+
+	if (!np->rxopt.bits.rxpmtu)
+		return;
+
+	skb = alloc_skb(sizeof(struct ipv6hdr), GFP_ATOMIC);
+	if (!skb)
+		return;
+
+	skb_put(skb, sizeof(struct ipv6hdr));
+	skb_reset_network_header(skb);
+	iph = ipv6_hdr(skb);
+	ipv6_addr_copy(&iph->daddr, &fl->fl6_dst);
+
+	mtu_info = IP6CBMTU(skb);
+	if (!mtu_info) {
+		kfree_skb(skb);
+		return;
+	}
+
+	mtu_info->ip6m_mtu = mtu;
+	mtu_info->ip6m_addr.sin6_family = AF_INET6;
+	mtu_info->ip6m_addr.sin6_port = 0;
+	mtu_info->ip6m_addr.sin6_flowinfo = 0;
+	mtu_info->ip6m_addr.sin6_scope_id = fl->oif;
+	ipv6_addr_copy(&mtu_info->ip6m_addr.sin6_addr, &ipv6_hdr(skb)->daddr);
+
+	__skb_pull(skb, skb_tail_pointer(skb) - skb->data);
+	skb_reset_transport_header(skb);
+
+	skb = xchg(&np->rxpmtu, skb);
+	kfree_skb(skb);
+}
+
 /*
  *	Handle MSG_ERRQUEUE
  */
@@ -381,6 +420,54 @@ out:
 	return err;
 }
 
+/*
+ *	Handle IPV6_RECVPATHMTU
+ */
+int ipv6_recv_rxpmtu(struct sock *sk, struct msghdr *msg, int len)
+{
+	struct ipv6_pinfo *np = inet6_sk(sk);
+	struct sk_buff *skb;
+	struct sockaddr_in6 *sin;
+	struct ip6_mtuinfo mtu_info;
+	int err;
+	int copied;
+
+	err = -EAGAIN;
+	skb = xchg(&np->rxpmtu, NULL);
+	if (skb == NULL)
+		goto out;
+
+	copied = skb->len;
+	if (copied > len) {
+		msg->msg_flags |= MSG_TRUNC;
+		copied = len;
+	}
+	err = skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);
+	if (err)
+		goto out_free_skb;
+
+	sock_recv_timestamp(msg, sk, skb);
+
+	memcpy(&mtu_info, IP6CBMTU(skb), sizeof(mtu_info));
+
+	sin = (struct sockaddr_in6 *)msg->msg_name;
+	if (sin) {
+		sin->sin6_family = AF_INET6;
+		sin->sin6_flowinfo = 0;
+		sin->sin6_port = 0;
+		sin->sin6_scope_id = mtu_info.ip6m_addr.sin6_scope_id;
+		ipv6_addr_copy(&sin->sin6_addr, &mtu_info.ip6m_addr.sin6_addr);
+	}
+
+	put_cmsg(msg, SOL_IPV6, IPV6_PATHMTU, sizeof(mtu_info), &mtu_info);
+
+	err = copied;
+
+out_free_skb:
+	kfree_skb(skb);
+out:
+	return err;
+}
 
 
 int datagram_recv_ctl(struct sock *sk, struct msghdr *msg, struct sk_buff *skb)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 54d43dd..61e2bef 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1219,15 +1219,23 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 	 */
 
 	inet->cork.length += length;
-	if (((length > mtu) && (sk->sk_protocol == IPPROTO_UDP)) &&
-	    (rt->u.dst.dev->features & NETIF_F_UFO)) {
+	if (length > mtu) {
+		int proto = sk->sk_protocol;
+		if (dontfrag && (proto == IPPROTO_UDP || proto == IPPROTO_RAW)){
+			ipv6_local_rxpmtu(sk, fl, mtu-exthdrlen);
+			return -EMSGSIZE;
+		}
 
-		err = ip6_ufo_append_data(sk, getfrag, from, length, hh_len,
-					  fragheaderlen, transhdrlen, mtu,
-					  flags);
-		if (err)
-			goto error;
-		return 0;
+		if (proto == IPPROTO_UDP &&
+		    (rt->u.dst.dev->features & NETIF_F_UFO)) {
+
+			err = ip6_ufo_append_data(sk, getfrag, from, length,
+						  hh_len, fragheaderlen,
+						  transhdrlen, mtu, flags);
+			if (err)
+				goto error;
+			return 0;
+		}
 	}
 
 	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL)
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 44a84ea..8562738 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -461,6 +461,9 @@ static int rawv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	if (flags & MSG_ERRQUEUE)
 		return ipv6_recv_error(sk, msg, len);
 
+	if (np->rxpmtu && np->rxopt.bits.rxpmtu)
+		return ipv6_recv_rxpmtu(sk, msg, len);
+
 	skb = skb_recv_datagram(sk, flags, noblock, &err);
 	if (!skb)
 		goto out;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 39e3665..2850e35 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -335,6 +335,9 @@ int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	if (flags & MSG_ERRQUEUE)
 		return ipv6_recv_error(sk, msg, len);
 
+	if (np->rxpmtu && np->rxopt.bits.rxpmtu)
+		return ipv6_recv_rxpmtu(sk, msg, len);
+
 try_again:
 	skb = __skb_recv_datagram(sk, flags | (noblock ? MSG_DONTWAIT : 0),
 				  &peeked, &err);
-- 
1.5.4.3
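
On the receiving side, patch 3/3 describes a recvmsg() that returns no data
but carries an IPV6_PATHMTU cmsg. A minimal sketch of how an application might
pick that out of the control buffer follows. It is an illustration only:
find_pathmtu() and struct mtuinfo_sketch are hypothetical names, the struct is
a local mirror of the ip6_mtuinfo layout from patch 1/3, the fallback value 61
is copied from that patch, and IPPROTO_IPV6 is assumed to equal the SOL_IPV6
level used by put_cmsg() in the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef IPV6_PATHMTU
#define IPV6_PATHMTU 61		/* assumption: value taken from patch 1/3 */
#endif

/* Local mirror of struct ip6_mtuinfo as added in patch 1/3. */
struct mtuinfo_sketch {
	struct sockaddr_in6 addr;	/* destination the MTU applies to */
	uint32_t mtu;			/* path MTU in host byte order */
};

/*
 * Hypothetical helper: walk the ancillary data of a received message.
 * Returns 1 and fills *out if an IPV6_PATHMTU cmsg is found, else 0.
 */
int find_pathmtu(struct msghdr *msg, struct mtuinfo_sketch *out)
{
	struct cmsghdr *c;

	assert(msg != NULL && out != NULL);

	for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
		if (c->cmsg_level != IPPROTO_IPV6 ||
		    c->cmsg_type != IPV6_PATHMTU)
			continue;
		/* Ignore a cmsg too short to hold the whole structure. */
		if (c->cmsg_len < CMSG_LEN(sizeof(*out)))
			continue;
		memcpy(out, CMSG_DATA(c), sizeof(*out));
		return 1;
	}
	return 0;
}
```

A caller would enable IPV6_RECVPATHMTU on the socket, call recvmsg() with a
control buffer of at least CMSG_SPACE(sizeof(struct mtuinfo_sketch)) bytes,
and run the helper on the result to decide whether to shrink its datagrams.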



* Re: [PATCH] e100: expose broadcast_disabled as a module option
From: Stephen Hemminger @ 2010-04-23 22:02 UTC (permalink / raw)
  To: Erwan Velu
  Cc: Jeff Kirsher, netdev, David Miller, linux-kernel,
	jesse.brandeburg, bruce.w.allan, alexander.h.duyck,
	peter.p.waskiewicz.jr, john.ronciak
In-Reply-To: <r2ob43bf5491004231403o64f8b88bsa9543d9910648d97@mail.gmail.com>

On Fri, 23 Apr 2010 23:03:59 +0200
Erwan Velu <erwanaliasr1@gmail.com> wrote:

> I first tried "ifconfig -broadcast" without any success, so I forced
> the driver to unset IFF_BROADCAST; the interface no longer showed
> the BROADCAST option with ifconfig. But I didn't notice any reduction
> in the number of context switches on my host.
> 
> I found broadcast_disabled far more effective at reducing the
> CPU impact.

The point is that the driver can look at IFF_BROADCAST rather than having a
module parameter. Module parameters are device-driver specific and should
be avoided as much as possible in favor of general mechanisms. This is a repeated
problem where users and vendors make special hooks that only work with their
driver, which makes life hard for other users and distribution providers.
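
A minimal sketch of that approach, written as a userspace model rather than
actual e100 code: the hardware's broadcast filter bit is derived from the
interface's IFF_BROADCAST flag, so no module parameter is needed.
struct sketch_config and sketch_configure() are hypothetical names invented
for this illustration; only IFF_BROADCAST and the broadcast_disabled field
name come from the thread. The fallback definition of IFF_BROADCAST is there
only for builds where <net/if.h> hides the flag.

```c
#include <assert.h>
#include <stddef.h>
#include <net/if.h>	/* IFF_BROADCAST */

#ifndef IFF_BROADCAST
#define IFF_BROADCAST 0x2	/* standard interface flag value */
#endif

/* Hypothetical stand-in for the driver's hardware configuration block. */
struct sketch_config {
	unsigned int broadcast_disabled : 1;	/* 1 = drop broadcast frames */
};

/*
 * Derive the filter setting from the generic interface flags instead of a
 * driver-specific knob: broadcast is filtered exactly when the administrator
 * has cleared IFF_BROADCAST on the interface.
 */
void sketch_configure(unsigned int dev_flags, struct sketch_config *cfg)
{
	assert(cfg != NULL);
	cfg->broadcast_disabled = (dev_flags & IFF_BROADCAST) ? 0 : 1;
}
```

In a real driver the flags would come from the net_device and the
configuration would have to be re-applied whenever the flags change; both
details are left out of this sketch.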


* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: jamal @ 2010-04-23 22:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Changli Gao, David S. Miller, Tom Herbert, Stephen Hemminger,
	netdev
In-Reply-To: <1272014825.7895.7851.camel@edumazet-laptop>

On Fri, 2010-04-23 at 11:27 +0200, Eric Dumazet wrote:

> 
> Let's see how it improves things for Jamal's benchmarks ;)


I've done a setup with the last patch from Changli + net-next. I will
post test results tomorrow morning.

cheers,
jamal



* [PATCH net-2.6 1/9] sfc: Wait at most 10ms for the MC to finish reading out MAC statistics
From: Ben Hutchings @ 2010-04-23 22:23 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers

From: Steve Hodgson <shodgson@solarflare.com>

The original code would wait indefinitely if MAC stats DMA failed.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/siena.c |   13 +++++++++++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/sfc/siena.c b/drivers/net/sfc/siena.c
index 38dcc42..e0c46f5 100644
--- a/drivers/net/sfc/siena.c
+++ b/drivers/net/sfc/siena.c
@@ -456,8 +456,17 @@ static int siena_try_update_nic_stats(struct efx_nic *efx)
 
 static void siena_update_nic_stats(struct efx_nic *efx)
 {
-	while (siena_try_update_nic_stats(efx) == -EAGAIN)
-		cpu_relax();
+	int retry;
+
+	/* If we're unlucky enough to read statistics during the DMA, wait
+	 * up to 10ms for it to finish (typically takes <500us) */
+	for (retry = 0; retry < 100; ++retry) {
+		if (siena_try_update_nic_stats(efx) == 0)
+			return;
+		udelay(100);
+	}
+
+	/* Use the old values instead */
 }
 
 static void siena_start_nic_stats(struct efx_nic *efx)
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
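
The bounded-wait pattern from patch 1/9 can be modelled outside the kernel as
follows. This is a sketch under stated assumptions, not the sfc code:
poll_bounded() and busy_for() are hypothetical names, nanosleep() stands in
for udelay(), and unlike siena_update_nic_stats() the helper reports
exhaustion with -ETIMEDOUT instead of silently keeping the old values.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L	/* for nanosleep() under strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

typedef int (*attempt_fn)(void *ctx);

/*
 * Call attempt(ctx) up to max_tries times, sleeping delay_us microseconds
 * after each failure (delay_us must be below 1000000). Returns 0 as soon as
 * an attempt succeeds, or -ETIMEDOUT once the budget is used up.
 */
int poll_bounded(attempt_fn attempt, void *ctx, int max_tries, long delay_us)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = delay_us * 1000L };
	int i;

	assert(attempt != NULL);

	for (i = 0; i < max_tries; ++i) {
		if (attempt(ctx) == 0)
			return 0;
		nanosleep(&ts, NULL);
	}
	return -ETIMEDOUT;
}

/* Demo operation: reports -EAGAIN until the counter in ctx reaches zero. */
int busy_for(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		--*remaining;
		return -EAGAIN;
	}
	return 0;
}
```

With max_tries = 100 and delay_us = 100 this gives the same 10ms worst case
as the patch, while an operation that never completes can no longer spin
forever.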



* [PATCH net-2.6 2/9] sfc: Ignore parity errors in the other port's SRAM
From: Ben Hutchings @ 2010-04-23 22:24 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <0998e2e60655106abde76dcb4b7bac1136d1a11f.1272061382.git.bhutchings@solarflare.com>

From: Steve Hodgson <shodgson@solarflare.com>

Siena has a separate SRAM bank for each port.  On single-port boards
these can be merged together, so each port has an interrupt flag for
parity errors in the other port's SRAM.  Currently we do not enable
such merging and should mask this interrupt source.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/nic.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/sfc/nic.c b/drivers/net/sfc/nic.c
index b06f8e3..664fd6c 100644
--- a/drivers/net/sfc/nic.c
+++ b/drivers/net/sfc/nic.c
@@ -1563,6 +1563,8 @@ void efx_nic_init_common(struct efx_nic *efx)
 			     FRF_AZ_ILL_ADR_INT_KER_EN, 1,
 			     FRF_AZ_RBUF_OWN_INT_KER_EN, 1,
 			     FRF_AZ_TBUF_OWN_INT_KER_EN, 1);
+	if (efx_nic_rev(efx) >= EFX_REV_SIENA_A0)
+		EFX_SET_OWORD_FIELD(temp, FRF_CZ_SRAM_PERR_INT_P_KER_EN, 1);
 	EFX_INVERT_OWORD(temp);
 	efx_writeo(efx, &temp, FR_AZ_FATAL_INTR_KER);
 
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* [PATCH net-2.6 3/9] sfc: Always close net device at the end of a disabling reset
From: Ben Hutchings @ 2010-04-23 22:25 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272061459.5520.15.camel@achroite.uk.solarflarecom.com>

This fixes a regression introduced by commit
eb9f6744cbfa97674c13263802259b5aa0034594 "sfc: Implement ethtool
reset operation".

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/efx.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/sfc/efx.c b/drivers/net/sfc/efx.c
index 6486657..649a264 100644
--- a/drivers/net/sfc/efx.c
+++ b/drivers/net/sfc/efx.c
@@ -1861,6 +1861,7 @@ out:
 	}
 
 	if (disabled) {
+		dev_close(efx->net_dev);
 		EFX_ERR(efx, "has been disabled\n");
 		efx->state = STATE_DISABLED;
 	} else {
@@ -1884,8 +1885,7 @@ static void efx_reset_work(struct work_struct *data)
 	}
 
 	rtnl_lock();
-	if (efx_reset(efx, efx->reset_pending))
-		dev_close(efx->net_dev);
+	(void)efx_reset(efx, efx->reset_pending);
 	rtnl_unlock();
 }
 
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* [PATCH net-2.6 4/9] sfc: Consistently report short MCDI responses as EIO
From: Ben Hutchings @ 2010-04-23 22:25 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272061459.5520.15.camel@achroite.uk.solarflarecom.com>

EMSGSIZE means 'Message too large' whereas these are too short.
EINVAL means 'Invalid argument' whereas this is a response.
In some cases failing functions were returning 0 which is obviously wrong.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/mcdi.c     |   22 ++++++++++++++--------
 drivers/net/sfc/mcdi_phy.c |    6 +++---
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/drivers/net/sfc/mcdi.c b/drivers/net/sfc/mcdi.c
index c48669c..1344afa 100644
--- a/drivers/net/sfc/mcdi.c
+++ b/drivers/net/sfc/mcdi.c
@@ -613,7 +613,7 @@ int efx_mcdi_fwver(struct efx_nic *efx, u64 *version, u32 *build)
 	}
 
 	if (outlength < MC_CMD_GET_VERSION_V1_OUT_LEN) {
-		rc = -EMSGSIZE;
+		rc = -EIO;
 		goto fail;
 	}
 
@@ -647,8 +647,10 @@ int efx_mcdi_drv_attach(struct efx_nic *efx, bool driver_operating,
 			  outbuf, sizeof(outbuf), &outlen);
 	if (rc)
 		goto fail;
-	if (outlen < MC_CMD_DRV_ATTACH_OUT_LEN)
+	if (outlen < MC_CMD_DRV_ATTACH_OUT_LEN) {
+		rc = -EIO;
 		goto fail;
+	}
 
 	if (was_attached != NULL)
 		*was_attached = MCDI_DWORD(outbuf, DRV_ATTACH_OUT_OLD_STATE);
@@ -676,7 +678,7 @@ int efx_mcdi_get_board_cfg(struct efx_nic *efx, u8 *mac_address,
 		goto fail;
 
 	if (outlen < MC_CMD_GET_BOARD_CFG_OUT_LEN) {
-		rc = -EMSGSIZE;
+		rc = -EIO;
 		goto fail;
 	}
 
@@ -738,8 +740,10 @@ int efx_mcdi_nvram_types(struct efx_nic *efx, u32 *nvram_types_out)
 			  outbuf, sizeof(outbuf), &outlen);
 	if (rc)
 		goto fail;
-	if (outlen < MC_CMD_NVRAM_TYPES_OUT_LEN)
+	if (outlen < MC_CMD_NVRAM_TYPES_OUT_LEN) {
+		rc = -EIO;
 		goto fail;
+	}
 
 	*nvram_types_out = MCDI_DWORD(outbuf, NVRAM_TYPES_OUT_TYPES);
 	return 0;
@@ -765,8 +769,10 @@ int efx_mcdi_nvram_info(struct efx_nic *efx, unsigned int type,
 			  outbuf, sizeof(outbuf), &outlen);
 	if (rc)
 		goto fail;
-	if (outlen < MC_CMD_NVRAM_INFO_OUT_LEN)
+	if (outlen < MC_CMD_NVRAM_INFO_OUT_LEN) {
+		rc = -EIO;
 		goto fail;
+	}
 
 	*size_out = MCDI_DWORD(outbuf, NVRAM_INFO_OUT_SIZE);
 	*erase_size_out = MCDI_DWORD(outbuf, NVRAM_INFO_OUT_ERASESIZE);
@@ -968,7 +974,7 @@ static int efx_mcdi_read_assertion(struct efx_nic *efx)
 	if (rc)
 		return rc;
 	if (outlen < MC_CMD_GET_ASSERTS_OUT_LEN)
-		return -EINVAL;
+		return -EIO;
 
 	/* Print out any recorded assertion state */
 	flags = MCDI_DWORD(outbuf, GET_ASSERTS_OUT_GLOBAL_FLAGS);
@@ -1086,7 +1092,7 @@ int efx_mcdi_wol_filter_set(struct efx_nic *efx, u32 type,
 		goto fail;
 
 	if (outlen < MC_CMD_WOL_FILTER_SET_OUT_LEN) {
-		rc = -EMSGSIZE;
+		rc = -EIO;
 		goto fail;
 	}
 
@@ -1121,7 +1127,7 @@ int efx_mcdi_wol_filter_get_magic(struct efx_nic *efx, int *id_out)
 		goto fail;
 
 	if (outlen < MC_CMD_WOL_FILTER_GET_OUT_LEN) {
-		rc = -EMSGSIZE;
+		rc = -EIO;
 		goto fail;
 	}
 
diff --git a/drivers/net/sfc/mcdi_phy.c b/drivers/net/sfc/mcdi_phy.c
index 2f23546..5d34487 100644
--- a/drivers/net/sfc/mcdi_phy.c
+++ b/drivers/net/sfc/mcdi_phy.c
@@ -48,7 +48,7 @@ efx_mcdi_get_phy_cfg(struct efx_nic *efx, struct efx_mcdi_phy_cfg *cfg)
 		goto fail;
 
 	if (outlen < MC_CMD_GET_PHY_CFG_OUT_LEN) {
-		rc = -EMSGSIZE;
+		rc = -EIO;
 		goto fail;
 	}
 
@@ -111,7 +111,7 @@ static int efx_mcdi_loopback_modes(struct efx_nic *efx, u64 *loopback_modes)
 		goto fail;
 
 	if (outlen < MC_CMD_GET_LOOPBACK_MODES_OUT_LEN) {
-		rc = -EMSGSIZE;
+		rc = -EIO;
 		goto fail;
 	}
 
@@ -587,7 +587,7 @@ static int efx_mcdi_phy_test_alive(struct efx_nic *efx)
 		return rc;
 
 	if (outlen < MC_CMD_GET_PHY_STATE_OUT_LEN)
-		return -EMSGSIZE;
+		return -EIO;
 	if (MCDI_DWORD(outbuf, GET_PHY_STATE_STATE) != MC_CMD_PHY_STATE_OK)
 		return -EINVAL;
 
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
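
For readers following the thread, the pattern the hunks above converge on can be sketched in user space. This is only an illustration, not the driver code: check_response_len() and EXAMPLE_OUT_LEN are hypothetical names, and the point is simply that a response shorter than expected maps to -EIO rather than -EMSGSIZE, -EINVAL or 0.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical minimum response length; not a real MCDI constant. */
#define EXAMPLE_OUT_LEN 8

/*
 * Return 0 if the response is long enough, or -EIO if the peer sent
 * fewer bytes than the caller needs.  A short response is an I/O
 * failure, not an oversized message (EMSGSIZE) or a bad argument
 * (EINVAL), and it must never be reported as success.
 */
int check_response_len(size_t outlen, size_t min_len)
{
	if (outlen < min_len)
		return -EIO;
	return 0;
}
```

A caller would propagate the nonzero return to its own fail path before touching the response buffer.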



* [PATCH net-2.6 5/9] sfc: Change falcon_probe_board() to fail for unsupported boards
From: Ben Hutchings @ 2010-04-23 22:26 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272061459.5520.15.camel@achroite.uk.solarflarecom.com>

The driver needs specific PHY and board support code for each SFC4000
board; there is no point trying to continue if it is missing.
Currently unsupported boards can trigger an 'oops'.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/falcon.c        |    4 +++-
 drivers/net/sfc/falcon_boards.c |   13 +++----------
 drivers/net/sfc/nic.h           |    2 +-
 3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/net/sfc/falcon.c b/drivers/net/sfc/falcon.c
index d294d66..08278e7 100644
--- a/drivers/net/sfc/falcon.c
+++ b/drivers/net/sfc/falcon.c
@@ -1320,7 +1320,9 @@ static int falcon_probe_nvconfig(struct efx_nic *efx)
 
 	EFX_LOG(efx, "PHY is %d phy_id %d\n", efx->phy_type, efx->mdio.prtad);
 
-	falcon_probe_board(efx, board_rev);
+	rc = falcon_probe_board(efx, board_rev);
+	if (rc)
+		goto fail2;
 
 	kfree(nvconfig);
 	return 0;
diff --git a/drivers/net/sfc/falcon_boards.c b/drivers/net/sfc/falcon_boards.c
index 5712fdd..c7a933a 100644
--- a/drivers/net/sfc/falcon_boards.c
+++ b/drivers/net/sfc/falcon_boards.c
@@ -728,15 +728,7 @@ static const struct falcon_board_type board_types[] = {
 	},
 };
 
-static const struct falcon_board_type falcon_dummy_board = {
-	.init		= efx_port_dummy_op_int,
-	.init_phy	= efx_port_dummy_op_void,
-	.fini		= efx_port_dummy_op_void,
-	.set_id_led	= efx_port_dummy_op_set_id_led,
-	.monitor	= efx_port_dummy_op_int,
-};
-
-void falcon_probe_board(struct efx_nic *efx, u16 revision_info)
+int falcon_probe_board(struct efx_nic *efx, u16 revision_info)
 {
 	struct falcon_board *board = falcon_board(efx);
 	u8 type_id = FALCON_BOARD_TYPE(revision_info);
@@ -754,8 +746,9 @@ void falcon_probe_board(struct efx_nic *efx, u16 revision_info)
 			 (efx->pci_dev->subsystem_vendor == EFX_VENDID_SFC)
 			 ? board->type->ref_model : board->type->gen_type,
 			 'A' + board->major, board->minor);
+		return 0;
 	} else {
 		EFX_ERR(efx, "unknown board type %d\n", type_id);
-		board->type = &falcon_dummy_board;
+		return -ENODEV;
 	}
 }
diff --git a/drivers/net/sfc/nic.h b/drivers/net/sfc/nic.h
index 9351c03..3166baf 100644
--- a/drivers/net/sfc/nic.h
+++ b/drivers/net/sfc/nic.h
@@ -156,7 +156,7 @@ extern struct efx_nic_type siena_a0_nic_type;
  **************************************************************************
  */
 
-extern void falcon_probe_board(struct efx_nic *efx, u16 revision_info);
+extern int falcon_probe_board(struct efx_nic *efx, u16 revision_info);
 
 /* TX data path */
 extern int efx_nic_probe_tx(struct efx_tx_queue *tx_queue);
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
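
The behaviour change in this patch (fail the probe instead of installing dummy operations) can be sketched outside the kernel as follows. The table contents, struct example_board_type and example_probe_board() are made up for illustration; only the return-value convention mirrors the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical board table; the IDs and names are invented. */
struct example_board_type {
	unsigned char id;
	const char *name;
};

static const struct example_board_type example_board_types[] = {
	{ 0x01, "board-a" },
	{ 0x02, "board-b" },
};

/*
 * Look up type_id and store the match in *out.  Return 0 on success
 * or -ENODEV if the board is unknown, so the caller can abort the
 * probe instead of continuing with placeholder operations.
 */
int example_probe_board(unsigned char type_id,
			const struct example_board_type **out)
{
	size_t n = sizeof(example_board_types) / sizeof(example_board_types[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (example_board_types[i].id == type_id) {
			*out = &example_board_types[i];
			return 0;
		}
	}
	return -ENODEV;
}
```

Returning an error here moves the failure to probe time, where it can be reported cleanly, rather than leaving later code to run against a board it has no support for.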



* [PATCH net-2.6 6/9] sfc: Handle serious errors in exactly one interrupt handler
From: Ben Hutchings @ 2010-04-23 22:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272061459.5520.15.camel@achroite.uk.solarflarecom.com>

From: Steve Hodgson <shodgson@solarflare.com>

'Fatal' errors set an interrupt flag associated with a specific event
queue; only read the syndrome vector if we see that queue's flag set
(legacy interrupts) or in the interrupt handler for that queue (MSI).

Do not ignore an interrupt if the fatal error flag is set but specific
error flags are all zero.  Even if we don't schedule a reset, we must
respect the queue mask and rearm the appropriate event queues.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/falcon.c     |   13 ++++++++-----
 drivers/net/sfc/net_driver.h |    2 ++
 drivers/net/sfc/nic.c        |   35 +++++++++++++++++++----------------
 3 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/drivers/net/sfc/falcon.c b/drivers/net/sfc/falcon.c
index 08278e7..e783b0a 100644
--- a/drivers/net/sfc/falcon.c
+++ b/drivers/net/sfc/falcon.c
@@ -175,16 +175,19 @@ irqreturn_t falcon_legacy_interrupt_a1(int irq, void *dev_id)
 	EFX_TRACE(efx, "IRQ %d on CPU %d status " EFX_OWORD_FMT "\n",
 		  irq, raw_smp_processor_id(), EFX_OWORD_VAL(*int_ker));
 
-	/* Check to see if we have a serious error condition */
-	syserr = EFX_OWORD_FIELD(*int_ker, FSF_AZ_NET_IVEC_FATAL_INT);
-	if (unlikely(syserr))
-		return efx_nic_fatal_interrupt(efx);
-
 	/* Determine interrupting queues, clear interrupt status
 	 * register and acknowledge the device interrupt.
 	 */
 	BUILD_BUG_ON(FSF_AZ_NET_IVEC_INT_Q_WIDTH > EFX_MAX_CHANNELS);
 	queues = EFX_OWORD_FIELD(*int_ker, FSF_AZ_NET_IVEC_INT_Q);
+
+	/* Check to see if we have a serious error condition */
+	if (queues & (1U << efx->fatal_irq_level)) {
+		syserr = EFX_OWORD_FIELD(*int_ker, FSF_AZ_NET_IVEC_FATAL_INT);
+		if (unlikely(syserr))
+			return efx_nic_fatal_interrupt(efx);
+	}
+
 	EFX_ZERO_OWORD(*int_ker);
 	wmb(); /* Ensure the vector is cleared before interrupt ack */
 	falcon_irq_ack_a1(efx);
diff --git a/drivers/net/sfc/net_driver.h b/drivers/net/sfc/net_driver.h
index cb018e2..70aea3a 100644
--- a/drivers/net/sfc/net_driver.h
+++ b/drivers/net/sfc/net_driver.h
@@ -672,6 +672,7 @@ union efx_multicast_hash {
  *	This register is written with the SMP processor ID whenever an
  *	interrupt is handled.  It is used by efx_nic_test_interrupt()
  *	to verify that an interrupt has occurred.
+ * @fatal_irq_level: IRQ level (bit number) used for serious errors
  * @spi_flash: SPI flash device
  *	This field will be %NULL if no flash device is present (or for Siena).
  * @spi_eeprom: SPI EEPROM device
@@ -756,6 +757,7 @@ struct efx_nic {
 	struct efx_buffer irq_status;
 	volatile signed int last_irq_cpu;
 	unsigned long irq_zero_count;
+	unsigned fatal_irq_level;
 
 	struct efx_spi_device *spi_flash;
 	struct efx_spi_device *spi_eeprom;
diff --git a/drivers/net/sfc/nic.c b/drivers/net/sfc/nic.c
index 664fd6c..23738f8 100644
--- a/drivers/net/sfc/nic.c
+++ b/drivers/net/sfc/nic.c
@@ -1229,15 +1229,9 @@ static inline void efx_nic_interrupts(struct efx_nic *efx,
 				      bool enabled, bool force)
 {
 	efx_oword_t int_en_reg_ker;
-	unsigned int level = 0;
-
-	if (EFX_WORKAROUND_17213(efx) && !EFX_INT_MODE_USE_MSI(efx))
-		/* Set the level always even if we're generating a test
-		 * interrupt, because our legacy interrupt handler is safe */
-		level = 0x1f;
 
 	EFX_POPULATE_OWORD_3(int_en_reg_ker,
-			     FRF_AZ_KER_INT_LEVE_SEL, level,
+			     FRF_AZ_KER_INT_LEVE_SEL, efx->fatal_irq_level,
 			     FRF_AZ_KER_INT_KER, force,
 			     FRF_AZ_DRV_INT_EN_KER, enabled);
 	efx_writeo(efx, &int_en_reg_ker, FR_AZ_INT_EN_KER);
@@ -1291,8 +1285,6 @@ irqreturn_t efx_nic_fatal_interrupt(struct efx_nic *efx)
 		EFX_OWORD_FMT ": %s\n", EFX_OWORD_VAL(*int_ker),
 		EFX_OWORD_VAL(fatal_intr),
 		error ? "disabling bus mastering" : "no recognised error");
-	if (error == 0)
-		goto out;
 
 	/* If this is a memory parity error dump which blocks are offending */
 	mem_perr = EFX_OWORD_FIELD(fatal_intr, FRF_AZ_MEM_PERR_INT_KER);
@@ -1324,7 +1316,7 @@ irqreturn_t efx_nic_fatal_interrupt(struct efx_nic *efx)
 			"NIC will be disabled\n");
 		efx_schedule_reset(efx, RESET_TYPE_DISABLE);
 	}
-out:
+
 	return IRQ_HANDLED;
 }
 
@@ -1346,9 +1338,11 @@ static irqreturn_t efx_legacy_interrupt(int irq, void *dev_id)
 	queues = EFX_EXTRACT_DWORD(reg, 0, 31);
 
 	/* Check to see if we have a serious error condition */
-	syserr = EFX_OWORD_FIELD(*int_ker, FSF_AZ_NET_IVEC_FATAL_INT);
-	if (unlikely(syserr))
-		return efx_nic_fatal_interrupt(efx);
+	if (queues & (1U << efx->fatal_irq_level)) {
+		syserr = EFX_OWORD_FIELD(*int_ker, FSF_AZ_NET_IVEC_FATAL_INT);
+		if (unlikely(syserr))
+			return efx_nic_fatal_interrupt(efx);
+	}
 
 	if (queues != 0) {
 		if (EFX_WORKAROUND_15783(efx))
@@ -1413,9 +1407,11 @@ static irqreturn_t efx_msi_interrupt(int irq, void *dev_id)
 		  irq, raw_smp_processor_id(), EFX_OWORD_VAL(*int_ker));
 
 	/* Check to see if we have a serious error condition */
-	syserr = EFX_OWORD_FIELD(*int_ker, FSF_AZ_NET_IVEC_FATAL_INT);
-	if (unlikely(syserr))
-		return efx_nic_fatal_interrupt(efx);
+	if (channel->channel == efx->fatal_irq_level) {
+		syserr = EFX_OWORD_FIELD(*int_ker, FSF_AZ_NET_IVEC_FATAL_INT);
+		if (unlikely(syserr))
+			return efx_nic_fatal_interrupt(efx);
+	}
 
 	/* Schedule processing of the channel */
 	efx_schedule_channel(channel);
@@ -1553,6 +1549,13 @@ void efx_nic_init_common(struct efx_nic *efx)
 			     FRF_AZ_INT_ADR_KER, efx->irq_status.dma_addr);
 	efx_writeo(efx, &temp, FR_AZ_INT_ADR_KER);
 
+	if (EFX_WORKAROUND_17213(efx) && !EFX_INT_MODE_USE_MSI(efx))
+		/* Use an interrupt level unused by event queues */
+		efx->fatal_irq_level = 0x1f;
+	else
+		/* Use a valid MSI-X vector */
+		efx->fatal_irq_level = 0;
+
 	/* Enable all the genuinely fatal interrupts.  (They are still
 	 * masked by the overall interrupt mask, controlled by
 	 * falcon_interrupts()).
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
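
The gating logic added to the three handlers can be summarised with a small stand-alone sketch. The function names below are hypothetical and the hardware access is left out; it only shows when a handler should go on to read the fatal-error syndrome, assuming the level is below 32 as in the patch (0 or 0x1f).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Legacy interrupts: one handler sees a bitmask of interrupting
 * queues.  Only consult the fatal-error syndrome if the bit for the
 * level chosen for serious errors is set in that mask.
 * fatal_irq_level must be less than 32.
 */
bool example_legacy_should_check_fatal(uint32_t queues,
				       unsigned int fatal_irq_level)
{
	return (queues & (UINT32_C(1) << fatal_irq_level)) != 0;
}

/*
 * MSI: each channel has its own handler, so only the handler for the
 * channel whose number equals the fatal level checks the syndrome.
 */
bool example_msi_should_check_fatal(unsigned int channel,
				    unsigned int fatal_irq_level)
{
	return channel == fatal_irq_level;
}
```

With level 0x1f in legacy mode the fatal flag uses a bit no event queue uses; with level 0 it shares the first queue's bit or vector, so exactly one handler inspects the syndrome in either mode.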



* [PATCH net-2.6 7/9] sfc: Stop masking out XGMII faults over reconfigures
From: Ben Hutchings @ 2010-04-23 22:27 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272061459.5520.15.camel@achroite.uk.solarflarecom.com>

From: Steve Hodgson <shodgson@solarflare.com>

The aim of this code was to avoid a spurious XGMII fault over a MAC
reconfigure. It's less relevant now that the PHY reconfigure isn't
called from the MAC reconfigure.

After applying this patch, our link stress test passed 48 hours of
testing without ever resetting the PHY.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/falcon_xmac.c |   20 +++++---------------
 1 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/drivers/net/sfc/falcon_xmac.c b/drivers/net/sfc/falcon_xmac.c
index 8ccab2c..3d65abf 100644
--- a/drivers/net/sfc/falcon_xmac.c
+++ b/drivers/net/sfc/falcon_xmac.c
@@ -85,14 +85,14 @@ int falcon_reset_xaui(struct efx_nic *efx)
 	return -ETIMEDOUT;
 }
 
-static void falcon_mask_status_intr(struct efx_nic *efx, bool enable)
+static void falcon_ack_status_intr(struct efx_nic *efx)
 {
 	efx_oword_t reg;
 
 	if ((efx_nic_rev(efx) != EFX_REV_FALCON_B0) || LOOPBACK_INTERNAL(efx))
 		return;
 
-	/* We expect xgmii faults if the wireside link is up */
+	/* We expect xgmii faults if the wireside link is down */
 	if (!EFX_WORKAROUND_5147(efx) || !efx->link_state.up)
 		return;
 
@@ -101,14 +101,7 @@ static void falcon_mask_status_intr(struct efx_nic *efx, bool enable)
 	if (efx->xmac_poll_required)
 		return;
 
-	/* Flush the ISR */
-	if (enable)
-		efx_reado(efx, &reg, FR_AB_XM_MGT_INT_MSK);
-
-	EFX_POPULATE_OWORD_2(reg,
-			     FRF_AB_XM_MSK_RMTFLT, !enable,
-			     FRF_AB_XM_MSK_LCLFLT, !enable);
-	efx_writeo(efx, &reg, FR_AB_XM_MGT_INT_MASK);
+	efx_reado(efx, &reg, FR_AB_XM_MGT_INT_MSK);
 }
 
 static bool falcon_xgxs_link_ok(struct efx_nic *efx)
@@ -283,15 +276,13 @@ static bool falcon_xmac_check_fault(struct efx_nic *efx)
 
 static int falcon_reconfigure_xmac(struct efx_nic *efx)
 {
-	falcon_mask_status_intr(efx, false);
-
 	falcon_reconfigure_xgxs_core(efx);
 	falcon_reconfigure_xmac_core(efx);
 
 	falcon_reconfigure_mac_wrapper(efx);
 
 	efx->xmac_poll_required = !falcon_xmac_link_ok_retry(efx, 5);
-	falcon_mask_status_intr(efx, true);
+	falcon_ack_status_intr(efx);
 
 	return 0;
 }
@@ -362,9 +353,8 @@ void falcon_poll_xmac(struct efx_nic *efx)
 	    !efx->xmac_poll_required)
 		return;
 
-	falcon_mask_status_intr(efx, false);
 	efx->xmac_poll_required = !falcon_xmac_link_ok_retry(efx, 1);
-	falcon_mask_status_intr(efx, true);
+	falcon_ack_status_intr(efx);
 }
 
 struct efx_mac_operations falcon_xmac_operations = {
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* [PATCH net-2.6 8/9] sfc: Reconfigure the XAUI serdes after an EM reset
From: Ben Hutchings @ 2010-04-23 22:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272061459.5520.15.camel@achroite.uk.solarflarecom.com>

From: Steve Hodgson <shodgson@solarflare.com>

Fix a regression introduced in d3245b28ef2a45ec4e115062a38100bd06229289
"sfc: Refactor link configuration".

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/falcon.c      |    3 +++
 drivers/net/sfc/falcon_xmac.c |    2 +-
 drivers/net/sfc/nic.h         |    1 +
 3 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sfc/falcon.c b/drivers/net/sfc/falcon.c
index e783b0a..655b697 100644
--- a/drivers/net/sfc/falcon.c
+++ b/drivers/net/sfc/falcon.c
@@ -507,6 +507,9 @@ static void falcon_reset_macs(struct efx_nic *efx)
 	/* Ensure the correct MAC is selected before statistics
 	 * are re-enabled by the caller */
 	efx_writeo(efx, &mac_ctrl, FR_AB_MAC_CTRL);
+
+	/* This can run even when the GMAC is selected */
+	falcon_setup_xaui(efx);
 }
 
 void falcon_drain_tx_fifo(struct efx_nic *efx)
diff --git a/drivers/net/sfc/falcon_xmac.c b/drivers/net/sfc/falcon_xmac.c
index 3d65abf..c84a2ce 100644
--- a/drivers/net/sfc/falcon_xmac.c
+++ b/drivers/net/sfc/falcon_xmac.c
@@ -26,7 +26,7 @@
  *************************************************************************/
 
 /* Configure the XAUI driver that is an output from Falcon */
-static void falcon_setup_xaui(struct efx_nic *efx)
+void falcon_setup_xaui(struct efx_nic *efx)
 {
 	efx_oword_t sdctl, txdrv;
 
diff --git a/drivers/net/sfc/nic.h b/drivers/net/sfc/nic.h
index 3166baf..bcf1ac4 100644
--- a/drivers/net/sfc/nic.h
+++ b/drivers/net/sfc/nic.h
@@ -203,6 +203,7 @@ extern void falcon_irq_ack_a1(struct efx_nic *efx);
 extern int efx_nic_flush_queues(struct efx_nic *efx);
 extern void falcon_start_nic_stats(struct efx_nic *efx);
 extern void falcon_stop_nic_stats(struct efx_nic *efx);
+extern void falcon_setup_xaui(struct efx_nic *efx);
 extern int falcon_reset_xaui(struct efx_nic *efx);
 extern void efx_nic_init_common(struct efx_nic *efx);
 
-- 
1.6.2.5


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* [PATCH net-2.6 9/9] sfc: Extend the legacy interrupt workarounds
From: Ben Hutchings @ 2010-04-23 22:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272061459.5520.15.camel@achroite.uk.solarflarecom.com>

From: Steve Hodgson <shodgson@solarflare.com>

Siena has two problems with legacy interrupts:
  1. There is no synchronisation between the ISR read completion,
     and the interrupt deassert message.
  2. A downstream read at the "wrong" moment can return 0, and
     suppress generating the next interrupt.

Falcon should suffer from both of these, and it appears it does.
Enable EFX_WORKAROUND_15783 on Falcon as well.

Also, when we see queues == 0, ensure we always schedule or rearm
every event queue.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/sfc/nic.c         |   23 +++++++++--------------
 drivers/net/sfc/workarounds.h |    2 +-
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/net/sfc/nic.c b/drivers/net/sfc/nic.c
index 23738f8..b61674c 100644
--- a/drivers/net/sfc/nic.c
+++ b/drivers/net/sfc/nic.c
@@ -1356,33 +1356,28 @@ static irqreturn_t efx_legacy_interrupt(int irq, void *dev_id)
 		}
 		result = IRQ_HANDLED;
 
-	} else if (EFX_WORKAROUND_15783(efx) &&
-		   efx->irq_zero_count++ == 0) {
+	} else if (EFX_WORKAROUND_15783(efx)) {
 		efx_qword_t *event;
 
-		/* Ensure we rearm all event queues */
+		/* We can't return IRQ_HANDLED more than once on seeing ISR=0
+		 * because this might be a shared interrupt. */
+		if (efx->irq_zero_count++ == 0)
+			result = IRQ_HANDLED;
+
+		/* Ensure we schedule or rearm all event queues */
 		efx_for_each_channel(channel, efx) {
 			event = efx_event(channel, channel->eventq_read_ptr);
 			if (efx_event_present(event))
 				efx_schedule_channel(channel);
+			else
+				efx_nic_eventq_read_ack(channel);
 		}
-
-		result = IRQ_HANDLED;
 	}
 
 	if (result == IRQ_HANDLED) {
 		efx->last_irq_cpu = raw_smp_processor_id();
 		EFX_TRACE(efx, "IRQ %d on CPU %d status " EFX_DWORD_FMT "\n",
 			  irq, raw_smp_processor_id(), EFX_DWORD_VAL(reg));
-	} else if (EFX_WORKAROUND_15783(efx)) {
-		/* We can't return IRQ_HANDLED more than once on seeing ISR0=0
-		 * because this might be a shared interrupt, but we do need to
-		 * check the channel every time and preemptively rearm it if
-		 * it's idle. */
-		efx_for_each_channel(channel, efx) {
-			if (!channel->work_pending)
-				efx_nic_eventq_read_ack(channel);
-		}
 	}
 
 	return result;
diff --git a/drivers/net/sfc/workarounds.h b/drivers/net/sfc/workarounds.h
index acd9c73..518f7fc 100644
--- a/drivers/net/sfc/workarounds.h
+++ b/drivers/net/sfc/workarounds.h
@@ -37,7 +37,7 @@
 /* Truncated IPv4 packets can confuse the TX packet parser */
 #define EFX_WORKAROUND_15592 EFX_WORKAROUND_FALCON_AB
 /* Legacy ISR read can return zero once */
-#define EFX_WORKAROUND_15783 EFX_WORKAROUND_SIENA
+#define EFX_WORKAROUND_15783 EFX_WORKAROUND_ALWAYS
 /* Legacy interrupt storm when interrupt fifo fills */
 #define EFX_WORKAROUND_17213 EFX_WORKAROUND_SIENA
 
-- 
1.6.2.5

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
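
The reworked queues == 0 path can be modelled in user space roughly as below. All types and names are hypothetical stand-ins for the driver's structures, and the sketch assumes the zero counter is reset elsewhere when a nonzero status is seen, which this diff does not show.

```c
#include <stdbool.h>
#include <stddef.h>

enum example_irqreturn { EXAMPLE_IRQ_NONE = 0, EXAMPLE_IRQ_HANDLED = 1 };

struct example_channel {
	bool event_present;	/* an unprocessed event is waiting */
	bool scheduled;		/* set when processing is scheduled */
	bool rearmed;		/* set when the queue is rearmed */
};

struct example_nic {
	unsigned long irq_zero_count;
	struct example_channel *channels;
	size_t n_channels;
};

/* Handle a legacy interrupt whose status read returned zero. */
enum example_irqreturn example_handle_zero_isr(struct example_nic *nic)
{
	enum example_irqreturn result = EXAMPLE_IRQ_NONE;
	size_t i;

	/* Claim the interrupt only the first time since the counter was
	 * last reset; the line may be shared with another device. */
	if (nic->irq_zero_count++ == 0)
		result = EXAMPLE_IRQ_HANDLED;

	/* Every queue is either scheduled or rearmed, every time. */
	for (i = 0; i < nic->n_channels; i++) {
		struct example_channel *ch = &nic->channels[i];

		if (ch->event_present)
			ch->scheduled = true;
		else
			ch->rearmed = true;
	}
	return result;
}
```

The design point is that the decision to claim the interrupt and the decision to service the queues are separate: the first is limited to once per run of zero reads, the second happens unconditionally.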



* Re: [PATCH net-2.6 1/9] sfc: Wait at most 10ms for the MC to finish reading out MAC statistics
From: Ben Hutchings @ 2010-04-23 22:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, sf-linux-drivers
In-Reply-To: <1272061439.5520.14.camel@achroite.uk.solarflarecom.com>

All of these 9 patches should also be applicable to 2.6.33.y, except
that one hunk of "sfc: Consistently report short MCDI responses as EIO"
is not applicable and should be dropped.

Some of the bug fixes are applicable to 2.6.32.y and maybe to 2.6.27.y,
but the patches will need some adjustment.  I intend to send backported
patches to stable@kernel.org separately.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH net-2.6 1/9] sfc: Wait at most 10ms for the MC to finish reading out MAC statistics
From: David Miller @ 2010-04-23 22:36 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272062009.5520.27.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Fri, 23 Apr 2010 23:33:28 +0100

> All of these 9 patches should also be applicable to 2.6.33.y, except
> that one hunk of "sfc: Consistently report short MCDI responses as EIO"
> is not applicable and should be dropped.
> 
> Some of the bug fixes are applicable to 2.6.32.y and maybe to 2.6.27.y,
> but the patches will need some adjustment.  I intend to send backported
> patches to stable@kernel.org separately.

There is zero way I'm applying 9 patches this late in the RC
series.

If you want this stuff to go into net-2.6 and get backported
to -stable, pick a very small (2 or 3) set of the most important
fixes.

Consistent -EIO error code returns and junk like that are
not appropriate this late in the RC, and definitely not -stable
material.


* Re: [PATCH net-2.6 1/9] sfc: Wait at most 10ms for the MC to finish reading out MAC statistics
From: Ben Hutchings @ 2010-04-23 22:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <20100423.153633.45897185.davem@davemloft.net>

On Fri, 2010-04-23 at 15:36 -0700, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Fri, 23 Apr 2010 23:33:28 +0100
> 
> > All of these 9 patches should also be applicable to 2.6.33.y, except
> > that one hunk of "sfc: Consistently report short MCDI responses as EIO"
> > is not applicable and should be dropped.
> > 
> > Some of the bug fixes are applicable to 2.6.32.y and maybe to 2.6.27.y,
> > but the patches will need some adjustment.  I intend to send backported
> > patches to stable@kernel.org separately.
> 
> There is zero way I'm applying 9 patches this late in the RC
> series.
> 
> If you want this stuff to go into net-2.6 and get backported
> to -stable, pick a very small (2 or 3) set of the most important
> fixes.

This makes no sense.  You want to put a quota on bug fixes?  I could
arbitrarily pick some but I'm still going to want to get the other fixes
into distributions.

> Consistent -EIO error code returns and junk like that are
> not appropriate this late in the RC, and definitely not -stable
> material.

The important part of that change is that functions were returning 0 in
a failure case.  I should have made that the first sentence in the
commit message.  I didn't see the point in making a separate commit to
fix the wrong error codes, but I can split this up if you prefer.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.



* Re: [PATCH net-2.6 1/9] sfc: Wait at most 10ms for the MC to finish reading out MAC statistics
From: David Miller @ 2010-04-23 22:58 UTC (permalink / raw)
  To: bhutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1272063270.5520.39.camel@achroite.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Fri, 23 Apr 2010 23:54:30 +0100

> This makes no sense.  You want to put a quota on bug fixes?  I could
> arbitrarily pick some but I'm still going to want to get the other fixes
> into distributions.

It's not a quota.  It's a request that only the most catastrophic
bugs get fixed this late in the RC.

You don't have 9 catastrophic bugs to fix in your driver.


* Re: [PATCH] RCU: don't turn off lockdep when find suspicious rcu_dereference_check() usage
From: Miles Lane @ 2010-04-23 22:59 UTC (permalink / raw)
  To: paulmck
  Cc: Vivek Goyal, Eric Paris, Lai Jiangshan, Ingo Molnar,
	Peter Zijlstra, LKML, nauman, eric.dumazet, netdev, Jens Axboe,
	Gui Jianfeng, Li Zefan
In-Reply-To: <20100423194255.GE2589@linux.vnet.ibm.com>

On Fri, Apr 23, 2010 at 3:42 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Fri, Apr 23, 2010 at 08:50:59AM -0400, Miles Lane wrote:
>> Hi Paul,
>> There has been a bit of back and forth, and I am not sure what patches
>> I should test now.
>> Could you send me a bundle of whatever needs testing now?
>
> Hello, Miles,
>
> I am posting my set as replies to this message.  There are a couple
> of KVM fixes that are going up via Avi's tree, and a number of networking
> fixes that are going up via Dave Miller's tree -- a number of these
> are against quickly changing code, so it didn't make sense for me to
> keep them separately.
>
> I believe that the two splats below are addressed by this patch set
> carried in the networking tree:
>
>        https://patchwork.kernel.org/patch/90754/

With your twelve patches and the one linked to above applied to
2.6.34-rc5-git3, here are the warnings I see:

[    0.173969] [ INFO: suspicious rcu_dereference_check() usage. ]
[    0.174097] ---------------------------------------------------
[    0.174226] include/linux/cgroup.h:534 invoked rcu_dereference_check() without protection!
[    0.174429]
[    0.174430] other info that might help us debug this:
[    0.174431]
[    0.174792]
[    0.174793] rcu_scheduler_active = 1, debug_locks = 1
[    0.175037] no locks held by watchdog/0/5.
[    0.175162]
[    0.175163] stack backtrace:
[    0.175405] Pid: 5, comm: watchdog/0 Not tainted 2.6.34-rc5-git3 #22
[    0.175534] Call Trace:
[    0.175666]  [<ffffffff81067fbe>] lockdep_rcu_dereference+0x9d/0xa5
[    0.175799]  [<ffffffff8102d678>] task_subsys_state+0x59/0x70
[    0.175931]  [<ffffffff810328fa>] __sched_setscheduler+0x19d/0x300
[    0.176064]  [<ffffffff8102b477>] ? need_resched+0x1e/0x28
[    0.176196]  [<ffffffff813cd401>] ? schedule+0x5c3/0x66e
[    0.176327]  [<ffffffff81091943>] ? watchdog+0x0/0x8c
[    0.176457]  [<ffffffff81032a78>] sched_setscheduler+0xe/0x10
[    0.176587]  [<ffffffff8109196d>] watchdog+0x2a/0x8c
[    0.176677]  [<ffffffff81091943>] ? watchdog+0x0/0x8c
[    0.176808]  [<ffffffff81057152>] kthread+0x89/0x91
[    0.176939]  [<ffffffff8106891e>] ? trace_hardirqs_on_caller+0x114/0x13f
[    0.177073]  [<ffffffff81003994>] kernel_thread_helper+0x4/0x10
[    0.177204]  [<ffffffff813cfc40>] ? restore_args+0x0/0x30
[    0.177334]  [<ffffffff810570c9>] ? kthread+0x0/0x91
[    0.177463]  [<ffffffff81003990>] ? kernel_thread_helper+0x0/0x10

[    3.173419] [ INFO: suspicious rcu_dereference_check() usage. ]
[    3.173419] ---------------------------------------------------
[    3.173419] kernel/cgroup.c:4438 invoked rcu_dereference_check() without protection!
[    3.173419]
[    3.173419] other info that might help us debug this:
[    3.173419]
[    3.173419]
[    3.173419] rcu_scheduler_active = 1, debug_locks = 1
[    3.173419] 2 locks held by async/0/668:
[    3.173419]  #0:  (&shost->scan_mutex){+.+.+.}, at: [<ffffffff812df020>] __scsi_add_device+0x83/0xe4
[    3.173419]  #1:  (&(&blkcg->lock)->rlock){......}, at: [<ffffffff811f2df9>] blkiocg_add_blkio_group+0x29/0x7f
[    3.173419]
[    3.173419] stack backtrace:
[    3.173419] Pid: 668, comm: async/0 Not tainted 2.6.34-rc5-git3 #22
[    3.173419] Call Trace:
[    3.173419]  [<ffffffff81067fbe>] lockdep_rcu_dereference+0x9d/0xa5
[    3.173419]  [<ffffffff8107f9ad>] css_id+0x3f/0x51
[    3.173419]  [<ffffffff811f2e08>] blkiocg_add_blkio_group+0x38/0x7f
[    3.173419]  [<ffffffff811f4dd0>] cfq_init_queue+0xdf/0x2dc
[    3.173419]  [<ffffffff811e33b1>] elevator_init+0xba/0xf5
[    3.173419]  [<ffffffff812dbfaa>] ? scsi_request_fn+0x0/0x451
[    3.173419]  [<ffffffff811e68d7>] blk_init_queue_node+0x12f/0x135
[    3.173419]  [<ffffffff811e68e9>] blk_init_queue+0xc/0xe
[    3.173419]  [<ffffffff812dc41c>] __scsi_alloc_queue+0x21/0x111
[    3.173419]  [<ffffffff812dc524>] scsi_alloc_queue+0x18/0x64
[    3.173419]  [<ffffffff812de520>] scsi_alloc_sdev+0x19e/0x256
[    3.173419]  [<ffffffff812de6be>] scsi_probe_and_add_lun+0xe6/0x9c5
[    3.173419]  [<ffffffff8106891e>] ? trace_hardirqs_on_caller+0x114/0x13f
[    3.173419]  [<ffffffff813ce056>] ? __mutex_lock_common+0x3e4/0x43a
[    3.173419]  [<ffffffff812df020>] ? __scsi_add_device+0x83/0xe4
[    3.173419]  [<ffffffff812d09dc>] ? transport_setup_classdev+0x0/0x17
[    3.173419]  [<ffffffff812df020>] ? __scsi_add_device+0x83/0xe4
[    3.173419]  [<ffffffff812df055>] __scsi_add_device+0xb8/0xe4
[    3.173419]  [<ffffffff812ea945>] ata_scsi_scan_host+0x74/0x16e
[    3.173419]  [<ffffffff81057699>] ? autoremove_wake_function+0x0/0x34
[    3.173419]  [<ffffffff812e8de4>] async_port_probe+0xab/0xb7
[    3.173419]  [<ffffffff8105e1b1>] ? async_thread+0x0/0x1f4
[    3.173419]  [<ffffffff8105e2b6>] async_thread+0x105/0x1f4
[    3.173419]  [<ffffffff81033d8e>] ? default_wake_function+0x0/0xf
[    3.173419]  [<ffffffff8105e1b1>] ? async_thread+0x0/0x1f4
[    3.173419]  [<ffffffff81057152>] kthread+0x89/0x91
[    3.173419]  [<ffffffff8106891e>] ? trace_hardirqs_on_caller+0x114/0x13f
[    3.173419]  [<ffffffff81003994>] kernel_thread_helper+0x4/0x10
[    3.173419]  [<ffffffff813cfc40>] ? restore_args+0x0/0x30
[    3.173419]  [<ffffffff810570c9>] ? kthread+0x0/0x91
[    3.173419]  [<ffffffff81003990>] ? kernel_thread_helper+0x0/0x10

[   32.905446] [ INFO: suspicious rcu_dereference_check() usage. ]
[   32.905449] ---------------------------------------------------
[   32.905453] net/core/dev.c:1993 invoked rcu_dereference_check() without protection!
[   32.905456]
[   32.905457] other info that might help us debug this:
[   32.905458]
[   32.905461]
[   32.905462] rcu_scheduler_active = 1, debug_locks = 1
[   32.905466] 2 locks held by canberra-gtk-pl/4182:
[   32.905469]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff81394f7d>] inet_stream_connect+0x3a/0x24d
[   32.905483]  #1:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8134a789>] dev_queue_xmit+0x14e/0x4b8
[   32.905495]
[   32.905496] stack backtrace:
[   32.905500] Pid: 4182, comm: canberra-gtk-pl Not tainted 2.6.34-rc5-git3 #22
[   32.905504] Call Trace:
[   32.905512]  [<ffffffff81067fbe>] lockdep_rcu_dereference+0x9d/0xa5
[   32.905518]  [<ffffffff8134a894>] dev_queue_xmit+0x259/0x4b8
[   32.905524]  [<ffffffff8134a789>] ? dev_queue_xmit+0x14e/0x4b8
[   32.905531]  [<ffffffff81041c66>] ? _local_bh_enable_ip+0xcd/0xda
[   32.905538]  [<ffffffff813536da>] neigh_resolve_output+0x234/0x285
[   32.905544]  [<ffffffff8136f69f>] ip_finish_output2+0x257/0x28c
[   32.905549]  [<ffffffff8136f73c>] ip_finish_output+0x68/0x6a
[   32.905554]  [<ffffffff81370433>] T.866+0x52/0x59
[   32.905559]  [<ffffffff8137067e>] ip_output+0xaa/0xb4
[   32.905565]  [<ffffffff8136eb38>] ip_local_out+0x20/0x24
[   32.905571]  [<ffffffff8136f184>] ip_queue_xmit+0x309/0x368
[   32.905578]  [<ffffffff810e4226>] ? __kmalloc_track_caller+0x111/0x155
[   32.905585]  [<ffffffff8138316f>] ? tcp_connect+0x223/0x3d3
[   32.905591]  [<ffffffff813818f1>] tcp_transmit_skb+0x707/0x745
[   32.905597]  [<ffffffff813832c2>] tcp_connect+0x376/0x3d3
[   32.905604]  [<ffffffff81268a43>] ? secure_tcp_sequence_number+0x55/0x6f
[   32.905610]  [<ffffffff81387270>] tcp_v4_connect+0x3df/0x455
[   32.905617]  [<ffffffff8133cb59>] ? lock_sock_nested+0xf3/0x102
[   32.905623]  [<ffffffff81394fe7>] inet_stream_connect+0xa4/0x24d
[   32.905629]  [<ffffffff8133b398>] sys_connect+0x90/0xd0
[   32.905636]  [<ffffffff81002b9c>] ? sysret_check+0x27/0x62
[   32.905642]  [<ffffffff8106891e>] ? trace_hardirqs_on_caller+0x114/0x13f
[   32.905649]  [<ffffffff813cec80>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   32.905655]  [<ffffffff81002b6b>] system_call_fastpath+0x16/0x1b

[   51.912282] [ INFO: suspicious rcu_dereference_check() usage. ]
[   51.912285] ---------------------------------------------------
[   51.912289] net/mac80211/sta_info.c:886 invoked rcu_dereference_check() without protection!
[   51.912293]
[   51.912293] other info that might help us debug this:
[   51.912295]
[   51.912298]
[   51.912298] rcu_scheduler_active = 1, debug_locks = 1
[   51.912302] no locks held by wpa_supplicant/3951.
[   51.912305]
[   51.912306] stack backtrace:
[   51.912310] Pid: 3951, comm: wpa_supplicant Not tainted 2.6.34-rc5-git3 #22
[   51.912314] Call Trace:
[   51.912317]  <IRQ>  [<ffffffff81067fbe>] lockdep_rcu_dereference+0x9d/0xa5
[   51.912345]  [<ffffffffa014f9ae>] ieee80211_find_sta_by_hw+0x46/0x10f [mac80211]
[   51.912358]  [<ffffffffa014fa8e>] ieee80211_find_sta+0x17/0x19 [mac80211]
[   51.912373]  [<ffffffffa01e50f2>] iwl_tx_queue_reclaim+0xdb/0x1b1 [iwlcore]
[   51.912380]  [<ffffffff8106842b>] ? mark_lock+0x2d/0x235
[   51.912391]  [<ffffffffa0252f1c>] iwl5000_rx_reply_tx+0x4a9/0x556 [iwlagn]
[   51.912399]  [<ffffffff8120a353>] ? is_swiotlb_buffer+0x2e/0x3b
[   51.912407]  [<ffffffffa024bbf4>] iwl_rx_handle+0x163/0x2b5 [iwlagn]
[   51.912414]  [<ffffffff81068904>] ? trace_hardirqs_on_caller+0xfa/0x13f
[   51.912422]  [<ffffffffa024c3ac>] iwl_irq_tasklet+0x2bb/0x3c0 [iwlagn]
[   51.912429]  [<ffffffff810411f3>] tasklet_action+0xa7/0x10f
[   51.912435]  [<ffffffff81042205>] __do_softirq+0x144/0x252
[   51.912442]  [<ffffffff81003a8c>] call_softirq+0x1c/0x34
[   51.912447]  [<ffffffff810050e4>] do_softirq+0x38/0x80
[   51.912452]  [<ffffffff81041cd2>] irq_exit+0x45/0x94
[   51.912457]  [<ffffffff81004829>] do_IRQ+0xad/0xc4
[   51.912463]  [<ffffffff810cbbd3>] ? might_fault+0x63/0xb3
[   51.912470]  [<ffffffff813cfb93>] ret_from_intr+0x0/0xf
[   51.912474]  <EOI>  [<ffffffff810cbbd3>] ? might_fault+0x63/0xb3
[   51.912484]  [<ffffffff8106a75d>] ? lock_release+0x208/0x215
[   51.912490]  [<ffffffff810cbc1c>] might_fault+0xac/0xb3
[   51.912495]  [<ffffffff810cbbd3>] ? might_fault+0x63/0xb3
[   51.912501]  [<ffffffff812025e3>] __clear_user+0x15/0x59
[   51.912508]  [<ffffffff8100b2bc>] save_i387_xstate+0x9c/0x1bc
[   51.912515]  [<ffffffff81002276>] do_signal+0x240/0x686
[   51.912521]  [<ffffffff81002b9c>] ? sysret_check+0x27/0x62
[   51.912527]  [<ffffffff8106891e>] ? trace_hardirqs_on_caller+0x114/0x13f
[   51.912533]  [<ffffffff813cec80>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   51.912539]  [<ffffffff810026e3>] do_notify_resume+0x27/0x5f
[   51.912545]  [<ffffffff813cec80>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[   51.912551]  [<ffffffff81002e86>] int_signal+0x12/0x17

[   51.929529] [ INFO: suspicious rcu_dereference_check() usage. ]
[   51.929532] ---------------------------------------------------
[   51.929536] net/mac80211/sta_info.c:886 invoked rcu_dereference_check() without protection!
[   51.929540]
[   51.929541] other info that might help us debug this:
[   51.929542]
[   51.929545]
[   51.929546] rcu_scheduler_active = 1, debug_locks = 1
[   51.929550] 1 lock held by Xorg/4013:
[   51.929553]  #0:  (clock-AF_UNIX){++.+..}, at: [<ffffffff8133cebd>] sock_def_readable+0x19/0x62
[   51.929567]
[   51.929568] stack backtrace:
[   51.929573] Pid: 4013, comm: Xorg Not tainted 2.6.34-rc5-git3 #22
[   51.929576] Call Trace:
[   51.929579]  <IRQ>  [<ffffffff81067fbe>] lockdep_rcu_dereference+0x9d/0xa5
[   51.929603]  [<ffffffffa014f9fe>] ieee80211_find_sta_by_hw+0x96/0x10f [mac80211]
[   51.929615]  [<ffffffffa014fa8e>] ieee80211_find_sta+0x17/0x19 [mac80211]
[   51.929631]  [<ffffffffa01e50f2>] iwl_tx_queue_reclaim+0xdb/0x1b1 [iwlcore]
[   51.929642]  [<ffffffffa0252f1c>] iwl5000_rx_reply_tx+0x4a9/0x556 [iwlagn]
[   51.929649]  [<ffffffff81068685>] ? mark_held_locks+0x52/0x70
[   51.929656]  [<ffffffff813cf46c>] ? _raw_spin_unlock_irqrestore+0x3a/0x69
[   51.929662]  [<ffffffff8120a353>] ? is_swiotlb_buffer+0x2e/0x3b
[   51.929671]  [<ffffffffa024bbf4>] iwl_rx_handle+0x163/0x2b5 [iwlagn]
[   51.929680]  [<ffffffffa024c3ac>] iwl_irq_tasklet+0x2bb/0x3c0 [iwlagn]
[   51.929687]  [<ffffffff810411f3>] tasklet_action+0xa7/0x10f
[   51.929693]  [<ffffffff81042205>] __do_softirq+0x144/0x252
[   51.929700]  [<ffffffff81003a8c>] call_softirq+0x1c/0x34
[   51.929705]  [<ffffffff810050e4>] do_softirq+0x38/0x80
[   51.929711]  [<ffffffff81041cd2>] irq_exit+0x45/0x94
[   51.929717]  [<ffffffff81019b10>] smp_apic_timer_interrupt+0x87/0x95
[   51.929724]  [<ffffffff81003553>] apic_timer_interrupt+0x13/0x20
[   51.929727]  <EOI>  [<ffffffff813cf46e>] ? _raw_spin_unlock_irqrestore+0x3c/0x69
[   51.929739]  [<ffffffff8102d3fb>] __wake_up_sync_key+0x49/0x52
[   51.929745]  [<ffffffff8133cee7>] sock_def_readable+0x43/0x62
[   51.929751]  [<ffffffff813b1c61>] unix_stream_sendmsg+0x243/0x2e2
[   51.929758]  [<ffffffff8133b912>] ? sock_aio_write+0x0/0xcf
[   51.929764]  [<ffffffff81339342>] __sock_sendmsg+0x59/0x64
[   51.929770]  [<ffffffff8133b9cd>] sock_aio_write+0xbb/0xcf
[   51.929777]  [<ffffffff810e9909>] do_sync_readv_writev+0xbc/0xfb
[   51.929785]  [<ffffffff811c1792>] ? selinux_file_permission+0xa2/0xaf
[   51.929790]  [<ffffffff810e9690>] ? copy_from_user+0x2a/0x2c
[   51.929797]  [<ffffffff811baff1>] ? security_file_permission+0x11/0x13
[   51.929804]  [<ffffffff810ea6a6>] do_readv_writev+0xa2/0x122
[   51.929810]  [<ffffffff810ead93>] ? fcheck_files+0x8f/0xc9
[   51.929816]  [<ffffffff810ea764>] vfs_writev+0x3e/0x49
[   51.929821]  [<ffffffff810ea84a>] sys_writev+0x45/0x8e
[   51.929828]  [<ffffffff81002b6b>] system_call_fastpath+0x16/0x1b


* Re: eSwitch management
From: Chris Wright @ 2010-04-23 23:04 UTC (permalink / raw)
  To: Anirban Chakraborty
  Cc: Chris Wright, Scott Feldman, David Miller, netdev@vger.kernel.org,
	Arnd Bergmann, Ameen Rahman, Amit Salecha, Rajesh Borundia,
	shemminger@vyatta.com
In-Reply-To: <193C9C72-488F-4543-9BC1-F9938F189E91@qlogic.com>

* Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> 
> On Apr 23, 2010, at 12:44 PM, Chris Wright wrote:
> 
> > * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> >> On Apr 23, 2010, at 9:23 AM, Chris Wright wrote:
> >>> * Anirban Chakraborty (anirban.chakraborty@qlogic.com) wrote:
> >>>> It looks like ifla_vf_info does contain most of the data set. But if I use it, what NETLINK protocol family should I use in my driver to receive netlink messages? Do I need to create a private protocol family?
> >>> 
> >>> No, you don't need to use netlink in your driver.  You just need to fill
> >>> in the relevant net_device_ops in your driver init.  Specifically:
> >>> 
> >>> *      SR-IOV management functions.
> >>> * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
> >>> * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
> >>> * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
> >>> * int (*ndo_get_vf_config)(struct net_device *dev,
> >>> *                          int vf, struct ifla_vf_info *ivf);
> >>> 
> >>> These are all operating on a VF indexed internally w/in the driver, so it's
> >>> a little cumbersome to use from userspace.
> >> 
> >> These are all intended for VFs and are configurable from the PF.
> > 
> > Yes, and while the set of callbacks can change, they are always tied to
> > some net_device (typically the PF) that knows how to make hardware
> > settings on behalf of a VF.
> > 
> >> However, in our case, there are multiple physical NIC functions on a
> >> port which are configurable by the eswitch.
> > 
> > Is there a PCI function that represents the switch?  Or a special PCI
> > NIC function that has VEB mgmt plane access?  And do you have examples
> > of configuration that you'll do here?
> 
> There is no PCI function that represents the switch. However, one
> of the NIC functions can act as a privileged function to configure the
> eswitch. Typically the first NIC function that is enumerated in the bus
> manages the eswitch. Typical configurations would be to set tx bandwidth,
> VLAN ID, MAC address, promiscuous mode setting for each of these ports
> at the start of the day. This is useful in a virtualization scenario where
> we can do PCI passthru of the functions to the guest and these settings
> for the guest are configured via the driver in the host.

(btw, this is not uncommon, there are other adapters that have multiple
functions for a single physical port that is not SR-IOV based)

How does the privileged function identify the other functions?  IOW, the
existing SR-IOV ndo callbacks have most of the above (tx bw control, mac,
vlan id), and have an 'int vf' which is basically just a driver specific
identifier to a non-privileged function or set of hw resources.  It looks
like you can use the existing bits (just need to expand a little).

So far we have only:

- tx bw control
- set mac addr
- set vlan id

You've additionally identified:

- set promiscuous mode

I'm also aware of:

- setting port aggregation
- issuing a function reset
- setting port mirroring or bcast/mcast replication
- setting anti-spoofing (mac/vlan..)
- setting security/filtering
- getting port statistics
- ...whatever else I'm forgetting
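
To make the 'int vf' point concrete, here is a minimal userspace sketch
(not kernel code) of a privileged function keeping per-function settings
and exposing them through an ops table shaped after the ndo_set_vf_*
callbacks quoted earlier. Every demo_* name, the function count, the
VLAN/QoS bounds checks and the promisc callback are assumptions made for
illustration; none of them exist in any driver.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_FUNCS 4   /* hypothetical: NIC functions sharing one port */
#define DEMO_MAC_LEN   6

/* Per-function state held by the privileged function (hypothetical). */
struct demo_func_cfg {
	uint8_t  mac[DEMO_MAC_LEN];
	uint16_t vlan;
	uint8_t  qos;
	int      tx_rate;   /* units are left unspecified in this sketch */
	int      promisc;   /* the extra knob identified in this thread */
};

struct demo_eswitch {
	struct demo_func_cfg func[DEMO_MAX_FUNCS];
};

/* Ops table modelled on the SR-IOV callbacks, plus an assumed promisc op. */
struct demo_eswitch_ops {
	int (*set_mac)(struct demo_eswitch *sw, int vf, const uint8_t *mac);
	int (*set_vlan)(struct demo_eswitch *sw, int vf, uint16_t vlan, uint8_t qos);
	int (*set_tx_rate)(struct demo_eswitch *sw, int vf, int rate);
	int (*set_promisc)(struct demo_eswitch *sw, int vf, int on);
};

/* 'vf' is only a driver-private index; reject anything out of range. */
static struct demo_func_cfg *demo_lookup(struct demo_eswitch *sw, int vf)
{
	if (!sw || vf < 0 || vf >= DEMO_MAX_FUNCS)
		return NULL;
	return &sw->func[vf];
}

static int demo_set_mac(struct demo_eswitch *sw, int vf, const uint8_t *mac)
{
	struct demo_func_cfg *cfg = demo_lookup(sw, vf);

	if (!cfg || !mac)
		return -EINVAL;
	memcpy(cfg->mac, mac, DEMO_MAC_LEN);
	return 0;
}

static int demo_set_vlan(struct demo_eswitch *sw, int vf, uint16_t vlan, uint8_t qos)
{
	struct demo_func_cfg *cfg = demo_lookup(sw, vf);

	/* 12-bit VLAN ID and 3-bit priority, as in 802.1Q */
	if (!cfg || vlan > 4095 || qos > 7)
		return -EINVAL;
	cfg->vlan = vlan;
	cfg->qos = qos;
	return 0;
}

static int demo_set_tx_rate(struct demo_eswitch *sw, int vf, int rate)
{
	struct demo_func_cfg *cfg = demo_lookup(sw, vf);

	if (!cfg || rate < 0)
		return -EINVAL;
	cfg->tx_rate = rate;
	return 0;
}

static int demo_set_promisc(struct demo_eswitch *sw, int vf, int on)
{
	struct demo_func_cfg *cfg = demo_lookup(sw, vf);

	if (!cfg)
		return -EINVAL;
	cfg->promisc = !!on;
	return 0;
}

static const struct demo_eswitch_ops demo_ops = {
	.set_mac     = demo_set_mac,
	.set_vlan    = demo_set_vlan,
	.set_tx_rate = demo_set_tx_rate,
	.set_promisc = demo_set_promisc,
};
```

A caller would zero a struct demo_eswitch and invoke, say,
demo_ops.set_vlan(&sw, 1, 100, 3). The only point is that the index does
not need to name an SR-IOV VF; it can name any function the privileged
one manages.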

> <snip>
> > 
> > One idea that has been discussed in the past is to create essentially
> > a pluggable set of bridge_ops.  The first step would be purely internal
> > shuffling, to make the existing sw bridge code go through the bridge_ops.
> > The second step would be making your driver for whichever PCI function
> > you have that supports managing the bridge create a net_device which is
> > a bridge during driver init.  And now normal brctl can call into your
> > VEB via the bridge_ops callbacks. </handwave>
> > 
> I liked the idea of iovnl as it works by utilizing port profile. That way the eswitch can be configured with the same port profile that a vswitch in a hypervisor has.

I don't quite follow you here.

thanks,
-chris


* Re: [PATCH 1/2] fsl_pq_mdio: Fix kernel oops during OF address translation
From: David Miller @ 2010-04-23 23:20 UTC (permalink / raw)
  To: avorontsov; +Cc: Sandeep.Kumar, netdev, linuxppc-dev
In-Reply-To: <20100423171235.GA2140@oksana.dev.rtsoft.ru>

From: Anton Vorontsov <avorontsov@mvista.com>
Date: Fri, 23 Apr 2010 21:12:35 +0400

> Old P1020RDB device trees were not specifying tbipa address for
> MDIO nodes, which is now causing this kernel oops:
> 
>  ...
>  eth2: TX BD ring size for Q[6]: 256
>  eth2: TX BD ring size for Q[7]: 256
>  Unable to handle kernel paging request for data at address 0x00000000
>  Faulting instruction address: 0xc0015504
>  Oops: Kernel access of bad area, sig: 11 [#1]
>  ...
>  NIP [c0015504] memcpy+0x3c/0x9c
>  LR [c000a9f8] __of_translate_address+0xfc/0x21c
>  Call Trace:
>  [df839e00] [c000a94c] __of_translate_address+0x50/0x21c (unreliable)
>  [df839e50] [c01a33e8] get_gfar_tbipa+0xb0/0xe0
>  ...
> 
> The old device trees are buggy, though having a dead ethernet is
> better than a dead kernel, so fix the issue by using of_iomap().
> 
> Also, a somewhat similar issue exists in the probe() routine, though
> there the oops is only a possibility. Nonetheless, fix it too.
> 
> Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>

Seems reasonable, applied to net-2.6 thanks!


* Re: [PATCH 2/2] gianfar: Fix potential oops during OF address translation
From: David Miller @ 2010-04-23 23:20 UTC (permalink / raw)
  To: avorontsov; +Cc: Sandeep.Kumar, netdev, linuxppc-dev
In-Reply-To: <20100423171244.GB2140@oksana.dev.rtsoft.ru>

From: Anton Vorontsov <avorontsov@mvista.com>
Date: Fri, 23 Apr 2010 21:12:44 +0400

> The gianfar driver may pass a NULL pointer to of_translate_address(),
> which may lead to a kernel oops. Fix this by using of_iomap(), which
> is also much simpler and shorter.
> 
> Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>

Also applied to net-2.6, thanks.
