Netdev List

* Re: [PATCH 2/2] smsc95xx: set MII_BUSY bit to read/write PHY regs
From: David Miller @ 2012-11-13 19:26 UTC (permalink / raw)
  To: steve; +Cc: netdev
In-Reply-To: <CAKh2mn4Hm0R5vgOtx+kjjVXh9ucF649cdrXb_VVbRBN47WGr2A@mail.gmail.com>

From: Steve Glendinning <steve@shawell.net>
Date: Tue, 13 Nov 2012 17:44:39 +0000

>> Steve please let me know why you only submitted patch #2
>> of an apparent 2 part series.
> 
> My bad being sloppy when formatting the patch for submission, sorry.
> The first patch in my rebased tree was one you've already accepted
> into net but hadn't pulled into net-next yet.  I just sent the second
> one, and I forgot to remove the 2/2 from it.
> 
> They don't depend on each other, they're independent bugfixes.
> 
> Sorry again for the sloppiness.

Thanks for the clarification, applied, thanks.

* Re: [net-next 04/11] ixgbevf: Add flag to indicate when rx is in net poll
From: Greg Rose @ 2012-11-13 19:25 UTC (permalink / raw)
  To: David Miller; +Cc: jeffrey.t.kirsher, netdev, gospo, sassmann
In-Reply-To: <20121113.142005.106273728495001348.davem@davemloft.net>

On Tue, 13 Nov 2012 14:20:05 -0500
David Miller <davem@davemloft.net> wrote:

> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Tue, 13 Nov 2012 06:03:18 -0800
> 
> > From: Greg Rose <gregory.v.rose@intel.com>
> > 
> > napi_gro_receive shouldn't be called from netpoll context.  Doing
> > so was causing kernel panics when jumbo frames larger than 2K were
> > set. Add a flag to check if the Rx ring processing is occurring
> > from interrupt context or from netpoll context and call netif_rx()
> > if in the polling context.
> > 
> > Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> > Tested-by: Sibai Li <sibai.li@intel.com>
> > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> 
> This is not a scalable solution.
> 
> It is not prudent to have every single driver do a check like
> this.  If using GRO receive from netpoll causes problems,
> then it's a generic issue rather than a driver specific one.

OK, let me look into this a bit more then.

Thanks,

- Greg

* Re: [PATCH 01/11] batman-adv: don't rely on positions in struct for hashing
From: David Miller @ 2012-11-13 19:24 UTC (permalink / raw)
  To: ordex; +Cc: netdev, b.a.t.m.a.n, simon.wunderlich, siwu
In-Reply-To: <1352798139-19458-2-git-send-email-ordex@autistici.org>

From: Antonio Quartulli <ordex@autistici.org>
Date: Tue, 13 Nov 2012 10:15:29 +0100

> @@ -37,18 +37,26 @@ static void batadv_bla_periodic_work(struct work_struct *work);
>  static void batadv_bla_send_announce(struct batadv_priv *bat_priv,
>  				     struct batadv_backbone_gw *backbone_gw);
>  
> +static inline void hash_bytes(uint32_t *hash, void *data, uint32_t size)
> +{
> +	const unsigned char *key = data;
> +	int i;
> +
> +	for (i = 0; i < size; i++) {
> +		*hash += key[i];
> +		*hash += (*hash << 10);
> +		*hash ^= (*hash >> 6);
> +	}
> +}
> +

Remove the inline tag.

Return the uint32_t resulting hash value rather than passing it by
reference.
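The two review points can be sketched as a standalone C comparison. This is a minimal user-space sketch, not the code that was eventually merged. hash_bytes_ref() keeps the by-reference shape from the patch, and hash_bytes() is the shape the review asks for: no inline tag, with the running hash passed in by value and the updated value returned. Two further changes are assumptions of this sketch: the loop index is made unsigned to match size, and data is made const.

```c
#include <assert.h>
#include <stdint.h>

/* Shape from the patch: the running hash is updated through a pointer. */
static void hash_bytes_ref(uint32_t *hash, const void *data, uint32_t size)
{
	const unsigned char *key = data;
	uint32_t i;

	for (i = 0; i < size; i++) {
		*hash += key[i];
		*hash += (*hash << 10);
		*hash ^= (*hash >> 6);
	}
}

/* Shape suggested in review: no inline tag, hash in by value, result returned. */
static uint32_t hash_bytes(uint32_t hash, const void *data, uint32_t size)
{
	const unsigned char *key = data;
	uint32_t i;

	for (i = 0; i < size; i++) {
		hash += key[i];
		hash += (hash << 10);
		hash ^= (hash >> 6);
	}

	return hash;
}
```

Callers can chain the returned form over several fields, for example `hash = hash_bytes(hash, &a, sizeof(a)); hash = hash_bytes(hash, &b, sizeof(b));`. This gives the same result as hashing the concatenated bytes, and it does not depend on where the fields sit inside a struct.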

* Re: [PATCH 01/22] bnx2x: Support probing and removing of VF device
From: David Miller @ 2012-11-13 19:22 UTC (permalink / raw)
  To: ariele; +Cc: netdev, eilong
In-Reply-To: <1352823451-27042-2-git-send-email-ariele@broadcom.com>

From: "Ariel Elior" <ariele@broadcom.com>
Date: Tue, 13 Nov 2012 18:17:10 +0200

> @@ -1579,6 +1598,7 @@ struct bnx2x {
>  	char			fw_ver[32];
>  	const struct firmware	*firmware;
>  
> +
>  	/* DCB support on/off */
>  	u16 dcb_state;
>  #define BNX2X_DCB_STATE_OFF			0

Please re-review this entire patch series and get rid of
gratuitous things like the unnecessary empty line being
added here.

Thanks.

* Re: [net-next 04/11] ixgbevf: Add flag to indicate when rx is in net poll
From: David Miller @ 2012-11-13 19:20 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: gregory.v.rose, netdev, gospo, sassmann
In-Reply-To: <1352815405-751-5-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 13 Nov 2012 06:03:18 -0800

> From: Greg Rose <gregory.v.rose@intel.com>
> 
> napi_gro_receive shouldn't be called from netpoll context.  Doing
> so was causing kernel panics when jumbo frames larger than 2K were set.
> Add a flag to check if the Rx ring processing is occurring from interrupt
> context or from netpoll context and call netif_rx() if in the polling
> context.
> 
> Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
> Tested-by: Sibai Li <sibai.li@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

This is not a scalable solution.

It is not prudent to have every single driver do a check like
this.  If using GRO receive from netpoll causes problems,
then it's a generic issue rather than a driver specific one.
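The design point can be shown with a user-space model. All names here are hypothetical stand-ins, not kernel API, and the model makes no claim about how the kernel eventually fixed this. The idea is that the netpoll-versus-GRO decision lives once in a shared receive helper, so each driver's Rx loop does not need its own flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not kernel API. */
struct rx_stats {
	unsigned int gro;   /* packets handed to the GRO-style path */
	unsigned int plain; /* packets handed to the plain netif_rx()-style path */
};

struct rx_ctx {
	bool in_netpoll;    /* maintained by the shared core, not by each driver */
	struct rx_stats stats;
};

static void gro_path(struct rx_ctx *ctx)   { ctx->stats.gro++; }
static void plain_path(struct rx_ctx *ctx) { ctx->stats.plain++; }

/* Single shared entry point: the context check is written exactly once. */
static void core_receive(struct rx_ctx *ctx)
{
	if (ctx->in_netpoll)
		plain_path(ctx);
	else
		gro_path(ctx);
}

/* A driver's Rx loop only calls the shared entry point. */
static void driver_clean_rx(struct rx_ctx *ctx, unsigned int budget)
{
	unsigned int i;

	for (i = 0; i < budget; i++)
		core_receive(ctx);
}
```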

* Re: [net-next 00/11][pull request] Intel Wired LAN Driver Updates
From: David Miller @ 2012-11-13 19:19 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, sassmann
In-Reply-To: <1352815405-751-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Tue, 13 Nov 2012 06:03:14 -0800

> This series contains updates to ixgbe, ixgbevf and igb.
> 
> The following are changes since commit 9fafd65ad407d4e0c96919a325f568dd95d032af:
>   ipv6 ndisc: Use pre-defined in6addr_linklocal_allnodes.
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Pulled, but I have some long term feedback to give on one of the
patches, thanks.

* Re: [PATCH 0/3] netfilter updates for net-next
From: David Miller @ 2012-11-13 19:12 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <20121113124002.GA5350@1984>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 13 Nov 2012 13:40:02 +0100

> On Tue, Nov 13, 2012 at 01:06:40AM +0100, pablo@netfilter.org wrote:
>> From: Pablo Neira Ayuso <pablo@netfilter.org>
>> 
>> Hi David,
>> 
>> The following three patches contain updates for your net-next tree,
>> they include:
>> 
>> * Small IPVS cleanup that removes a strange notation used to assign the
>>   conntrack object, from Alan Cox.
>> 
>> * getsockopt support to obtain the original IPv6 address after NAT,
>>   similar to the one that IPv4 provides, from Florian Westphal.
>> 
>> * Another little cleanup for nf_nat to save a couple of lines by using
>>   PTR_RET, from Wu Fengguang.
> 
> Please, hold on with this pull request. We need a follow-up patch to
> resolve an issue regarding the getsockopt to obtain original address
> with IPv6 NAT.
> 
> I'll send you a new pull request.

Ok.

* Re: SR-IOV problem with Intel 82599EB (not enough MMIO resources for SR-IOV)
From: Yinghai Lu @ 2012-11-13 18:29 UTC (permalink / raw)
  To: Li, Sibai
  Cc: Jason Gao, bhelgaas@google.com, Rose, Gregory V,
	ddutile@redhat.com, Kirsher, Jeffrey T, linux-kernel, netdev, kvm,
	e1000-devel@lists.sourceforge.net, linux-pci@vger.kernel.org
In-Reply-To: <A3A921390EAE4044940E99EBF13D97321AE21FBB@FMSMSX104.amr.corp.intel.com>

On Tue, Nov 13, 2012 at 10:25 AM, Li, Sibai <sibai.li@intel.com> wrote:
>
> I never appended "pci=realloc" for either kernel 2.6.32-279 or kernel 3.5.0 above.

Well, can you both post boot logs with "debug ignore_loglevel"?

* Re: SR-IOV problem with Intel 82599EB (not enough MMIO resources for SR-IOV)
From: Li, Sibai @ 2012-11-13 18:25 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: kvm, e1000-devel@lists.sourceforge.net, netdev, linux-kernel,
	ddutile@redhat.com, linux-pci@vger.kernel.org,
	bhelgaas@google.com
In-Reply-To: <CAE9FiQXnP2gpy9z6VNGSpimLkhDYKp6Qr=9nGZ9d-pKdohA4bw@mail.gmail.com>


> -----Original Message-----
> From: yhlu.kernel@gmail.com [mailto:yhlu.kernel@gmail.com] On Behalf Of
> Yinghai Lu
> Sent: Tuesday, November 13, 2012 10:17 AM
> To: Li, Sibai
> Cc: Jason Gao; bhelgaas@google.com; Rose, Gregory V; ddutile@redhat.com;
> Kirsher, Jeffrey T; linux-kernel; netdev; kvm; e1000-devel@lists.sourceforge.net;
> linux-pci@vger.kernel.org
> Subject: Re: SR-IOV problem with Intel 82599EB (not enough MMIO resources
> for SR-IOV)
> 
> On Tue, Nov 13, 2012 at 8:04 AM, Li, Sibai <sibai.li@intel.com> wrote:
> >
> >>
> >> Thank you very much, I tried "pci=realloc" in CentOS 6.3, and now it works for me.
> >>
> >> Thank you Sibai. Our server is a "Dell R710", its BIOS version is just
> >> v.6.3.0 with release date 07/24/2012, and I also configured
> >> intel_iommu=on in the grub.conf file, but I can't find these IOMMU
> >> options in "Device Drivers" in my
> >> kernel (2.6.32-279) .config file. BTW my OS is CentOS
> >> 6.3 (RHEL 6.3). Although the problem is solved, I'd like to know your OS
> >> version and kernel version.
> >
> > I am using RHEL6.3 with unstable kernel 3.7.0-rc
> 
> that means that config has
> CONFIG_PCI_REALLOC_ENABLE_AUTO=y
> 
> So you don't need to append "pci=realloc"
> 
> Yinghai

I never appended "pci=realloc" for either kernel 2.6.32-279 or kernel 3.5.0 above.

* Re: [REGRESSION, v3.7-rc5, bisected] 100% CPU usage in softirqd, unable to shutdown(Internet mail)
From: Lekensteyn @ 2012-11-13 18:25 UTC (permalink / raw)
  To: dannyfeng(冯小天)
  Cc: David S. Miller, netdev@vger.kernel.org
In-Reply-To: <0ED22CD02BE51541A6BCC0D6BDA097AE0F5A6D@EXMBX-BJ003.tencent.com>

On Tuesday 13 November 2012 10:26:03 dannyfeng wrote:
> I'm sorry, I didn't noticed that jme_open use tasklet_enable, which may
> reuse a killed tasklet... Could you please try following patch?
> ---
> diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c
> index 92317e9..c0314c1 100644
> --- a/drivers/net/ethernet/jme.c
> +++ b/drivers/net/ethernet/jme.c
> @@ -1860,10 +1860,14 @@ jme_open(struct net_device *netdev)
>         jme_clear_pm(jme);
>         JME_NAPI_ENABLE(jme);
>  
> -       tasklet_enable(&jme->linkch_task);
> -       tasklet_enable(&jme->txclean_task);
> -       tasklet_hi_enable(&jme->rxclean_task);
> -       tasklet_hi_enable(&jme->rxempty_task);
> +       tasklet_init(&jme->linkch_task, jme_link_change_tasklet,
> +                     (unsigned long) jme);
> +       tasklet_init(&jme->txclean_task, jme_tx_clean_tasklet,
> +                     (unsigned long) jme);
> +       tasklet_init(&jme->rxclean_task, jme_rx_clean_tasklet,
> +                     (unsigned long) jme);
> +       tasklet_init(&jme->rxempty_task, jme_rx_empty_tasklet,
> +                     (unsigned long) jme);
>  
>         rc = jme_request_irq(jme);
>         if (rc)
> @@ -3079,22 +3083,6 @@ jme_init_one(struct pci_dev *pdev,
>         tasklet_init(&jme->pcc_task,
>                      jme_pcc_tasklet,
>                      (unsigned long) jme);
> -       tasklet_init(&jme->linkch_task,
> -                    jme_link_change_tasklet,
> -                    (unsigned long) jme);
> -       tasklet_init(&jme->txclean_task,
> -                    jme_tx_clean_tasklet,
> -                    (unsigned long) jme);
> -       tasklet_init(&jme->rxclean_task,
> -                    jme_rx_clean_tasklet,
> -                    (unsigned long) jme);
> -       tasklet_init(&jme->rxempty_task,
> -                    jme_rx_empty_tasklet,
> -                    (unsigned long) jme);
> -       tasklet_disable_nosync(&jme->linkch_task);
> -       tasklet_disable_nosync(&jme->txclean_task);
> -       tasklet_disable_nosync(&jme->rxclean_task);
> -       tasklet_disable_nosync(&jme->rxempty_task);
>         jme->dpi.cur = PCC_P1;
>  
>         jme->reg_ghc = 0;
Tested-by: Peter Wu <lekensteyn@gmail.com>

I have no idea what this does, but it seems to help as I can suspend/resume 
without rapidly rising softirqs. (applied on top of 3.7-rc5.)

I tested it like this:
- rmmod jme
- insmod <path to new patched build>/jme.ko

I hand-edited the code as your patch does not apply due to tab to space 
conversion. You might want to use a real email client instead of the web 
interface.

Regards,
Peter

* Re: SR-IOV problem with Intel 82599EB (not enough MMIO resources for SR-IOV)
From: Yinghai Lu @ 2012-11-13 18:16 UTC (permalink / raw)
  To: Li, Sibai
  Cc: Jason Gao, bhelgaas@google.com, Rose, Gregory V,
	ddutile@redhat.com, Kirsher, Jeffrey T, linux-kernel, netdev, kvm,
	e1000-devel@lists.sourceforge.net, linux-pci@vger.kernel.org
In-Reply-To: <A3A921390EAE4044940E99EBF13D97321AE21E15@FMSMSX104.amr.corp.intel.com>

On Tue, Nov 13, 2012 at 8:04 AM, Li, Sibai <sibai.li@intel.com> wrote:
>
>>
>> Thank you very much, I tried "pci=realloc" in CentOS 6.3, and now it works for me.
>>
>> Thank you Sibai. Our server is a "Dell R710", its BIOS version is just
>> v.6.3.0 with release date 07/24/2012, and I also configured intel_iommu=on in
>> the grub.conf file, but I can't find these IOMMU options in "Device Drivers" in my
>> kernel (2.6.32-279) .config file. BTW my OS is CentOS 6.3 (RHEL 6.3). Although the
>> problem is solved, I'd like to know your OS version and kernel version.
>
> I am using RHEL6.3 with unstable kernel 3.7.0-rc

that means that config has
CONFIG_PCI_REALLOC_ENABLE_AUTO=y

So you don't need to append "pci=realloc"

Yinghai
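To check whether a given kernel was built with that option, you can look for the line CONFIG_PCI_REALLOC_ENABLE_AUTO=y in its build configuration. The sketch below is a minimal example and makes several assumptions. It expects the config text to be already read into memory. On RHEL-style systems the config is commonly installed as /boot/config-<release>, but that location is an assumption and may differ. config_option_enabled() is a hypothetical helper, not an existing tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `config` (text of a kernel .config) has the exact line
 * "<opt>=y". Lines such as "# CONFIG_FOO is not set" and options that only
 * share a prefix (CONFIG_PCI vs CONFIG_PCI_IOV) do not match. */
static bool config_option_enabled(const char *config, const char *opt)
{
	size_t optlen = strlen(opt);
	const char *line = config;

	while (line && *line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len == optlen + 2 && strncmp(line, opt, optlen) == 0 &&
		    line[optlen] == '=' && line[optlen + 1] == 'y')
			return true;

		line = eol ? eol + 1 : NULL;   /* advance to the next line */
	}
	return false;
}
```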

* [PATCH net-next 3/4] bridge: implement BPDU blocking
From: Stephen Hemminger @ 2012-11-13 17:53 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20121113175304.628996029@vyatta.com>

[-- Attachment #1: bridge-bpdu-guard.patch --]
[-- Type: text/plain, Size: 4367 bytes --]

This is the Linux bridge implementation of STP protection
(Cisco BPDU guard/Juniper BPDU block). BPDU block disables
the bridge port if an STP BPDU packet is received on it.

Why would you want to do this?
If Spanning Tree is running on the bridge, hostile devices on the
network may send BPDUs and cause a network failure. Enabling BPDU
block detects and stops this.
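The receive-side rule can be modelled in user space. This is a minimal sketch with hypothetical types: struct port and PORT_BPDU_GUARD only mirror struct net_bridge_port and BR_BPDU_GUARD from the patch. If the guard flag is set on a port, any BPDU arriving there disables the port and is not processed. Otherwise, the BPDU goes on to normal STP handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define PORT_BPDU_GUARD 0x2UL   /* mirrors BR_BPDU_GUARD in the patch */

enum port_state { PORT_FORWARDING, PORT_DISABLED };

struct port {                   /* hypothetical stand-in for a bridge port */
	const char *name;
	unsigned long flags;
	enum port_state state;
	unsigned int bpdus_processed;
};

/* Returns true if the BPDU was passed on to STP, false if it was dropped. */
static bool port_rcv_bpdu(struct port *p)
{
	if (p->flags & PORT_BPDU_GUARD) {
		fprintf(stderr, "BPDU received on blocked port %s\n", p->name);
		p->state = PORT_DISABLED;   /* the patch calls br_stp_disable_port() */
		return false;
	}
	p->bpdus_processed++;           /* normal STP processing would go here */
	return true;
}
```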

How to recover the port?
The port is restarted if the link is brought down, or if the port
is removed and reattached. For example:
 # ip li set dev eth0 down; ip li set dev eth0 up

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
v2 - refresh for earlier changes related to generic bridge setlink

 include/uapi/linux/if_link.h |    1 +
 net/bridge/br_netlink.c      |    7 ++++++-
 net/bridge/br_private.h      |    1 +
 net/bridge/br_stp_bpdu.c     |    7 +++++++
 net/bridge/br_sysfs_if.c     |    2 ++
 5 files changed, 17 insertions(+), 1 deletion(-)

--- a/net/bridge/br_private.h	2012-11-12 07:58:45.000000000 -0800
+++ b/net/bridge/br_private.h	2012-11-12 13:52:51.858985336 -0800
@@ -135,6 +135,7 @@ struct net_bridge_port
 
 	unsigned long 			flags;
 #define BR_HAIRPIN_MODE		0x00000001
+#define BR_BPDU_GUARD           0x00000002
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	u32				multicast_startup_queries_sent;
--- a/net/bridge/br_stp_bpdu.c	2012-11-12 07:58:35.000000000 -0800
+++ b/net/bridge/br_stp_bpdu.c	2012-11-12 13:52:51.858985336 -0800
@@ -170,6 +170,13 @@ void br_stp_rcv(const struct stp_proto *
 	if (!ether_addr_equal(dest, br->group_addr))
 		goto out;
 
+	if (p->flags & BR_BPDU_GUARD) {
+		br_notice(br, "BPDU received on blocked port %u(%s)\n",
+			  (unsigned int) p->port_no, p->dev->name);
+		br_stp_disable_port(p);
+		goto out;
+	}
+
 	buf = skb_pull(skb, 3);
 
 	if (buf[0] == BPDU_TYPE_CONFIG) {
--- a/net/bridge/br_sysfs_if.c	2012-11-12 13:52:42.000000000 -0800
+++ b/net/bridge/br_sysfs_if.c	2012-11-12 13:52:51.858985336 -0800
@@ -156,6 +156,7 @@ static int store_flush(struct net_bridge
 static BRPORT_ATTR(flush, S_IWUSR, NULL, store_flush);
 
 BRPORT_ATTR_FLAG(hairpin_mode, BR_HAIRPIN_MODE);
+BRPORT_ATTR_FLAG(bpdu_guard, BR_BPDU_GUARD);
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 static ssize_t show_multicast_router(struct net_bridge_port *p, char *buf)
@@ -189,6 +190,7 @@ static const struct brport_attribute *br
 	&brport_attr_hold_timer,
 	&brport_attr_flush,
 	&brport_attr_hairpin_mode,
+	&brport_attr_bpdu_guard,
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	&brport_attr_multicast_router,
 #endif
--- a/include/uapi/linux/if_link.h	2012-11-12 13:33:53.000000000 -0800
+++ b/include/uapi/linux/if_link.h	2012-11-12 13:52:51.858985336 -0800
@@ -216,6 +216,7 @@ enum {
 	IFLA_BRPORT_PRIORITY,	/* "             priority  */
 	IFLA_BRPORT_COST,	/* "             cost      */
 	IFLA_BRPORT_MODE,	/* mode (hairpin)          */
+	IFLA_BRPORT_GUARD,	/* bpdu guard              */
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
--- a/net/bridge/br_netlink.c	2012-11-12 13:51:37.000000000 -0800
+++ b/net/bridge/br_netlink.c	2012-11-12 13:53:19.694706427 -0800
@@ -26,6 +26,7 @@ static inline size_t br_port_info_size(v
 		+ nla_total_size(2)	/* IFLA_BRPORT_PRIORITY */
 		+ nla_total_size(4)	/* IFLA_BRPORT_COST */
 		+ nla_total_size(1)	/* IFLA_BRPORT_MODE */
+		+ nla_total_size(1)	/* IFLA_BRPORT_GUARD */
 		+ 0;
 }
 
@@ -49,7 +50,8 @@ static int br_port_fill_attrs(struct sk_
 	if (nla_put_u8(skb, IFLA_BRPORT_STATE, p->state) ||
 	    nla_put_u16(skb, IFLA_BRPORT_PRIORITY, p->priority) ||
 	    nla_put_u32(skb, IFLA_BRPORT_COST, p->path_cost) ||
-	    nla_put_u8(skb, IFLA_BRPORT_MODE, mode))
+	    nla_put_u8(skb, IFLA_BRPORT_MODE, mode) ||
+	    nla_put_u8(skb, IFLA_BRPORT_GUARD, !!(p->flags & BR_BPDU_GUARD)))
 		return -EMSGSIZE;
 
 	return 0;
@@ -162,6 +164,7 @@ static const struct nla_policy ifla_brpo
 	[IFLA_BRPORT_COST]	= { .type = NLA_U32 },
 	[IFLA_BRPORT_PRIORITY]	= { .type = NLA_U16 },
 	[IFLA_BRPORT_MODE]	= { .type = NLA_U8 },
+	[IFLA_BRPORT_GUARD]	= { .type = NLA_U8 },
 };
 
 /* Change the state of the port and notify spanning tree */
@@ -203,6 +206,7 @@ static int br_setport(struct net_bridge_
 	int err;
 
 	br_set_port_flag(p, tb, IFLA_BRPORT_MODE, BR_HAIRPIN_MODE);
+	br_set_port_flag(p, tb, IFLA_BRPORT_GUARD, BR_BPDU_GUARD);
 
 	if (tb[IFLA_BRPORT_COST]) {
 		err = br_stp_set_path_cost(p, nla_get_u32(tb[IFLA_BRPORT_COST]));

* [PATCH net-next 4/4] bridge: add root port blocking
From: Stephen Hemminger @ 2012-11-13 17:53 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20121113175304.628996029@vyatta.com>

[-- Attachment #1: bridge-root-block.patch --]
[-- Type: text/plain, Size: 4599 bytes --]

This is the Linux bridge implementation of root port guard.
If a BPDU is received on a leaf (edge) port, that port should
not be elected as the root port.

Why would you want to do this?
If STP is used on a bridge and the downstream bridges are not fully
trusted, this prevents a hostile guest from rerouting traffic.

Why not just use netfilter?
Netfilter does not track or follow spanning tree decisions.
It would be difficult and error-prone to try to mirror STP
resolution in a netfilter module.
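The modified selection loop can be modelled in user space. This is a simplified sketch with hypothetical types. should_become_root() is only a stand-in for br_should_become_root_port(): it picks the lowest root_path_cost, whereas the real function compares full STP priority vectors. A port flagged with PORT_ROOT_BLOCK (mirroring BR_ROOT_BLOCK) may win the comparison but is refused. The best eligible port is then kept as the root port.

```c
#include <assert.h>
#include <stddef.h>

#define PORT_ROOT_BLOCK 0x4UL   /* mirrors BR_ROOT_BLOCK in the patch */

struct port {                   /* hypothetical stand-in for a bridge port */
	unsigned short port_no;
	unsigned long flags;
	unsigned int root_path_cost; /* simplified: lower is better */
	int blocked;                 /* set when root election was refused */
};

/* Simplified stand-in for br_should_become_root_port(). */
static int should_become_root(const struct port *p, const struct port *best)
{
	return best == NULL || p->root_path_cost < best->root_path_cost;
}

/* Returns the elected root port number, or 0 if none is eligible. */
static unsigned short root_selection(struct port *ports, size_t n)
{
	const struct port *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		struct port *p = &ports[i];

		if (!should_become_root(p, best))
			continue;

		if (p->flags & PORT_ROOT_BLOCK)
			p->blocked = 1;  /* the patch moves such a port to LISTENING */
		else
			best = p;
	}
	return best ? best->port_no : 0;
}
```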

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


---
 include/uapi/linux/if_link.h |    1 +
 net/bridge/br_netlink.c      |    6 +++++-
 net/bridge/br_private.h      |    1 +
 net/bridge/br_stp.c          |   22 +++++++++++++++++++++-
 net/bridge/br_sysfs_if.c     |    2 ++
 5 files changed, 30 insertions(+), 2 deletions(-)

--- a/net/bridge/br_private.h	2012-11-12 13:52:51.858985336 -0800
+++ b/net/bridge/br_private.h	2012-11-12 13:54:52.309778448 -0800
@@ -136,6 +136,7 @@ struct net_bridge_port
 	unsigned long 			flags;
 #define BR_HAIRPIN_MODE		0x00000001
 #define BR_BPDU_GUARD           0x00000002
+#define BR_ROOT_BLOCK		0x00000004
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	u32				multicast_startup_queries_sent;
--- a/net/bridge/br_stp.c	2012-11-12 07:58:31.067183169 -0800
+++ b/net/bridge/br_stp.c	2012-11-12 13:53:27.254630683 -0800
@@ -100,6 +100,21 @@ static int br_should_become_root_port(co
 	return 0;
 }
 
+static void br_root_port_block(const struct net_bridge *br,
+			       struct net_bridge_port *p)
+{
+
+	br_notice(br, "port %u(%s) tried to become root port (blocked)\n",
+		  (unsigned int) p->port_no, p->dev->name);
+
+	p->state = BR_STATE_LISTENING;
+	br_log_state(p);
+	br_ifinfo_notify(RTM_NEWLINK, p);
+
+	if (br->forward_delay > 0)
+		mod_timer(&p->forward_delay_timer, jiffies + br->forward_delay);
+}
+
 /* called under bridge lock */
 static void br_root_selection(struct net_bridge *br)
 {
@@ -107,7 +122,12 @@ static void br_root_selection(struct net
 	u16 root_port = 0;
 
 	list_for_each_entry(p, &br->port_list, list) {
-		if (br_should_become_root_port(p, root_port))
+		if (!br_should_become_root_port(p, root_port))
+			continue;
+
+		if (p->flags & BR_ROOT_BLOCK)
+			br_root_port_block(br, p);
+		else
 			root_port = p->port_no;
 	}
 
--- a/net/bridge/br_sysfs_if.c	2012-11-12 13:52:51.858985336 -0800
+++ b/net/bridge/br_sysfs_if.c	2012-11-12 13:54:52.309778448 -0800
@@ -157,6 +157,7 @@ static BRPORT_ATTR(flush, S_IWUSR, NULL,
 
 BRPORT_ATTR_FLAG(hairpin_mode, BR_HAIRPIN_MODE);
 BRPORT_ATTR_FLAG(bpdu_guard, BR_BPDU_GUARD);
+BRPORT_ATTR_FLAG(root_block, BR_ROOT_BLOCK);
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 static ssize_t show_multicast_router(struct net_bridge_port *p, char *buf)
@@ -191,6 +192,7 @@ static const struct brport_attribute *br
 	&brport_attr_flush,
 	&brport_attr_hairpin_mode,
 	&brport_attr_bpdu_guard,
+	&brport_attr_root_block,
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	&brport_attr_multicast_router,
 #endif
--- a/include/uapi/linux/if_link.h	2012-11-12 13:52:51.858985336 -0800
+++ b/include/uapi/linux/if_link.h	2012-11-12 13:54:52.309778448 -0800
@@ -217,6 +217,7 @@ enum {
 	IFLA_BRPORT_COST,	/* "             cost      */
 	IFLA_BRPORT_MODE,	/* mode (hairpin)          */
 	IFLA_BRPORT_GUARD,	/* bpdu guard              */
+	IFLA_BRPORT_PROTECT,	/* root port protection    */
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
--- a/net/bridge/br_netlink.c	2012-11-12 13:53:19.694706427 -0800
+++ b/net/bridge/br_netlink.c	2012-11-12 13:54:52.309778448 -0800
@@ -27,6 +27,7 @@ static inline size_t br_port_info_size(v
 		+ nla_total_size(4)	/* IFLA_BRPORT_COST */
 		+ nla_total_size(1)	/* IFLA_BRPORT_MODE */
 		+ nla_total_size(1)	/* IFLA_BRPORT_GUARD */
+		+ nla_total_size(1)	/* IFLA_BRPORT_PROTECT */
 		+ 0;
 }
 
@@ -51,7 +52,8 @@ static int br_port_fill_attrs(struct sk_
 	    nla_put_u16(skb, IFLA_BRPORT_PRIORITY, p->priority) ||
 	    nla_put_u32(skb, IFLA_BRPORT_COST, p->path_cost) ||
 	    nla_put_u8(skb, IFLA_BRPORT_MODE, mode) ||
-	    nla_put_u8(skb, IFLA_BRPORT_GUARD, !!(p->flags & BR_BPDU_GUARD)))
+	    nla_put_u8(skb, IFLA_BRPORT_GUARD, !!(p->flags & BR_BPDU_GUARD)) ||
+	    nla_put_u8(skb, IFLA_BRPORT_PROTECT, !!(p->flags & BR_ROOT_BLOCK)))
 		return -EMSGSIZE;
 
 	return 0;
@@ -165,6 +167,7 @@ static const struct nla_policy ifla_brpo
 	[IFLA_BRPORT_PRIORITY]	= { .type = NLA_U16 },
 	[IFLA_BRPORT_MODE]	= { .type = NLA_U8 },
 	[IFLA_BRPORT_GUARD]	= { .type = NLA_U8 },
+	[IFLA_BRPORT_PROTECT]	= { .type = NLA_U8 },
 };
 
 /* Change the state of the port and notify spanning tree */

* [PATCH net-next 0/4] New Bridge security features
From: Stephen Hemminger @ 2012-11-13 17:53 UTC (permalink / raw)
  To: davem; +Cc: netdev

New bridge APIs and security features for protecting the
Spanning Tree Protocol. For more info, see the KVM Forum talk
  http://www.slideshare.net/shemminger/new-bridge
and the BPDU guard explanation here:
  http://blog.ipexpert.com/2010/12/06/bpdu-filter-and-bpdu-guard/

* [PATCH net-next 2/4] bridge: add template for bridge port flags
From: Stephen Hemminger @ 2012-11-13 17:53 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20121113175304.628996029@vyatta.com>

[-- Attachment #1: brport-flags-sysfs.patch --]
[-- Type: text/plain, Size: 1902 bytes --]

Provide a macro to build the sysfs data structures and functions
for accessing flag bits. If the flag bits change, send a netlink
notification.
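The macro pattern can be tried outside the kernel. This is a user-space sketch in which struct port and notify() are hypothetical stand-ins for struct net_bridge_port and br_ifinfo_notify(). One macro generates a show/store pair per flag bit. The store side sets or clears the mask and sends a notification only when the flags actually change.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct net_bridge_port and br_ifinfo_notify(). */
struct port {
	unsigned long flags;
	unsigned int notifications;
};

static void notify(struct port *p) { p->notifications++; }

#define PORT_ATTR_FLAG(_name, _mask)                                   \
static int show_##_name(const struct port *p)                         \
{                                                                     \
	return !!(p->flags & (_mask));                                \
}                                                                     \
static void store_##_name(struct port *p, unsigned long v)            \
{                                                                     \
	unsigned long flags = p->flags;                               \
	if (v)                                                        \
		flags |= (_mask);                                     \
	else                                                          \
		flags &= ~(_mask);                                    \
	if (flags != p->flags) {  /* notify only on a real change */  \
		p->flags = flags;                                     \
		notify(p);                                            \
	}                                                             \
}

PORT_ATTR_FLAG(hairpin_mode, 0x1UL)
PORT_ATTR_FLAG(bpdu_guard, 0x2UL)
```

Writing the same value twice produces a single notification, which is the behavior the commit message describes.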

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/net/bridge/br_sysfs_if.c	2012-11-12 07:58:35.411139543 -0800
+++ b/net/bridge/br_sysfs_if.c	2012-11-12 13:52:42.319080923 -0800
@@ -34,6 +34,28 @@ const struct brport_attribute brport_att
 	.store	= _store,					\
 };
 
+#define BRPORT_ATTR_FLAG(_name, _mask)				\
+static ssize_t show_##_name(struct net_bridge_port *p, char *buf) \
+{								\
+	return sprintf(buf, "%d\n", !!(p->flags & _mask));	\
+}								\
+static int store_##_name(struct net_bridge_port *p, unsigned long v) \
+{								\
+	unsigned long flags = p->flags;				\
+	if (v)							\
+		flags |= _mask;					\
+	else							\
+		flags &= ~_mask;				\
+	if (flags != p->flags) {				\
+		p->flags = flags;				\
+		br_ifinfo_notify(RTM_NEWLINK, p);		\
+	}							\
+	return 0;						\
+}								\
+static BRPORT_ATTR(_name, S_IRUGO | S_IWUSR,			\
+		   show_##_name, store_##_name)
+
+
 static ssize_t show_path_cost(struct net_bridge_port *p, char *buf)
 {
 	return sprintf(buf, "%d\n", p->path_cost);
@@ -133,21 +155,7 @@ static int store_flush(struct net_bridge
 }
 static BRPORT_ATTR(flush, S_IWUSR, NULL, store_flush);
 
-static ssize_t show_hairpin_mode(struct net_bridge_port *p, char *buf)
-{
-	int hairpin_mode = (p->flags & BR_HAIRPIN_MODE) ? 1 : 0;
-	return sprintf(buf, "%d\n", hairpin_mode);
-}
-static int store_hairpin_mode(struct net_bridge_port *p, unsigned long v)
-{
-	if (v)
-		p->flags |= BR_HAIRPIN_MODE;
-	else
-		p->flags &= ~BR_HAIRPIN_MODE;
-	return 0;
-}
-static BRPORT_ATTR(hairpin_mode, S_IRUGO | S_IWUSR,
-		   show_hairpin_mode, store_hairpin_mode);
+BRPORT_ATTR_FLAG(hairpin_mode, BR_HAIRPIN_MODE);
 
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 static ssize_t show_multicast_router(struct net_bridge_port *p, char *buf)

* [PATCH net-next 1/4] bridge: bridge port parameters over netlink
From: Stephen Hemminger @ 2012-11-13 17:53 UTC (permalink / raw)
  To: davem; +Cc: netdev
In-Reply-To: <20121113175304.628996029@vyatta.com>

[-- Attachment #1: bridge-port-flags-netlink.patch --]
[-- Type: text/plain, Size: 7734 bytes --]

Expose bridge port parameters over netlink. By switching to a nested
message, this can also be used for other bridge parameters.

This changes the IFLA_PROTINFO attribute from one byte to a full nested
set of attributes. This is safe for the application interface because the
old message used IFLA_PROTINFO and the new one uses
IFLA_PROTINFO | NLA_F_NESTED.

The code adapts to old-format requests, and therefore stays
compatible with the user-mode RSTP daemon. Since the type fields
for nested and unnested attributes are different, and the old
code in libnetlink doesn't apply the mask, it is also safe to use
with old versions of the bridge monitor command.
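The compatibility argument comes down to simple bit arithmetic on the attribute type. In this standalone sketch, the constants are copied as literal values from the Linux uapi headers (NLA_F_NESTED and NLA_TYPE_MASK from linux/netlink.h, IFLA_PROTINFO = 12 from linux/if_link.h), so the sketch does not need those headers. classify() mirrors the branch in br_setlink(). old_parser_matches() models an old parser that compares the raw type without masking, as the text above describes for libnetlink. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in the Linux uapi headers. */
#define NLA_F_NESTED        (1 << 15)
#define NLA_F_NET_BYTEORDER (1 << 14)
#define NLA_TYPE_MASK       ~(NLA_F_NESTED | NLA_F_NET_BYTEORDER)
#define IFLA_PROTINFO       12

enum protinfo_format { PROTINFO_OTHER, PROTINFO_OLD_U8, PROTINFO_NESTED };

/* New kernel side: mask off the flag bits, then let NLA_F_NESTED pick the format. */
static enum protinfo_format classify(uint16_t nla_type)
{
	if ((nla_type & NLA_TYPE_MASK) != IFLA_PROTINFO)
		return PROTINFO_OTHER;
	return (nla_type & NLA_F_NESTED) ? PROTINFO_NESTED : PROTINFO_OLD_U8;
}

/* Old userspace side: raw comparison, so the nested form is simply ignored. */
static int old_parser_matches(uint16_t nla_type)
{
	return nla_type == IFLA_PROTINFO;
}
```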

Note: although mode is only a boolean, it is treated as a
full byte, since in the future someone will probably want to add more
values (like macvlan has).
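The following worked example shows the size arithmetic behind br_port_info_size() in this patch. It is a standalone sketch that redefines the netlink macros locally, using the same values as the kernel headers: 4-byte alignment and a 4-byte attribute header. A u8 attribute occupies 8 bytes after the header and padding, which is the same as a u16 or u32. So carrying mode as a full byte costs no more space than a wider integer would. The four port attributes total 32 bytes, and the nested IFLA_PROTINFO wrapper adds another 4-byte header.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the netlink sizing rules (4-byte alignment, 4-byte header). */
#define NLA_ALIGNTO     4
#define NLA_ALIGN(len)  (((len) + NLA_ALIGNTO - 1) & ~(NLA_ALIGNTO - 1))
#define NLA_HDRLEN      4   /* sizeof(struct nlattr): two 16-bit fields */

static size_t nla_total_size(size_t payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload);
}

/* Payload of the nested IFLA_PROTINFO attribute as defined in this patch. */
static size_t br_port_info_size(void)
{
	return nla_total_size(1)	/* IFLA_BRPORT_STATE */
		+ nla_total_size(2)	/* IFLA_BRPORT_PRIORITY */
		+ nla_total_size(4)	/* IFLA_BRPORT_COST */
		+ nla_total_size(1);	/* IFLA_BRPORT_MODE */
}
```

Each later one-byte flag (IFLA_BRPORT_GUARD, IFLA_BRPORT_PROTECT) adds another nla_total_size(1), which is 8 bytes.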

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
v2 - adapt to new bridge generic setlink infrastructure
     fix cost/priority width in policy

 include/uapi/linux/if_link.h |   10 ++
 net/bridge/br_netlink.c      |  145 ++++++++++++++++++++++++++++++++-----------
 2 files changed, 119 insertions(+), 36 deletions(-)

--- a/include/uapi/linux/if_link.h	2012-11-12 07:58:35.000000000 -0800
+++ b/include/uapi/linux/if_link.h	2012-11-12 13:33:53.558390820 -0800
@@ -205,6 +205,21 @@ enum {
 
 #define IFLA_INET6_MAX	(__IFLA_INET6_MAX - 1)
 
+enum {
+	BRIDGE_MODE_UNSPEC,
+	BRIDGE_MODE_HAIRPIN,
+};
+
+enum {
+	IFLA_BRPORT_UNSPEC,
+	IFLA_BRPORT_STATE,	/* Spanning tree state     */
+	IFLA_BRPORT_PRIORITY,	/* "             priority  */
+	IFLA_BRPORT_COST,	/* "             cost      */
+	IFLA_BRPORT_MODE,	/* mode (hairpin)          */
+	__IFLA_BRPORT_MAX
+};
+#define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
+
 struct ifla_cacheinfo {
 	__u32	max_reasm_len;
 	__u32	tstamp;		/* ipv6InterfaceTable updated timestamp */
--- a/net/bridge/br_netlink.c	2012-11-12 07:58:45.000000000 -0800
+++ b/net/bridge/br_netlink.c	2012-11-12 13:51:24.367861973 -0800
@@ -20,16 +20,39 @@
 #include "br_private.h"
 #include "br_private_stp.h"
 
+static inline size_t br_port_info_size(void)
+{
+	return nla_total_size(1)	/* IFLA_BRPORT_STATE  */
+		+ nla_total_size(2)	/* IFLA_BRPORT_PRIORITY */
+		+ nla_total_size(4)	/* IFLA_BRPORT_COST */
+		+ nla_total_size(1)	/* IFLA_BRPORT_MODE */
+		+ 0;
+}
+
 static inline size_t br_nlmsg_size(void)
 {
 	return NLMSG_ALIGN(sizeof(struct ifinfomsg))
-	       + nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
-	       + nla_total_size(MAX_ADDR_LEN) /* IFLA_ADDRESS */
-	       + nla_total_size(4) /* IFLA_MASTER */
-	       + nla_total_size(4) /* IFLA_MTU */
-	       + nla_total_size(4) /* IFLA_LINK */
-	       + nla_total_size(1) /* IFLA_OPERSTATE */
-	       + nla_total_size(1); /* IFLA_PROTINFO */
+		+ nla_total_size(IFNAMSIZ) /* IFLA_IFNAME */
+		+ nla_total_size(MAX_ADDR_LEN) /* IFLA_ADDRESS */
+		+ nla_total_size(4) /* IFLA_MASTER */
+		+ nla_total_size(4) /* IFLA_MTU */
+		+ nla_total_size(4) /* IFLA_LINK */
+		+ nla_total_size(1) /* IFLA_OPERSTATE */
+		+ nla_total_size(br_port_info_size()); /* IFLA_PROTINFO */
+}
+
+static int br_port_fill_attrs(struct sk_buff *skb,
+			      const struct net_bridge_port *p)
+{
+	u8 mode = !!(p->flags & BR_HAIRPIN_MODE);
+
+	if (nla_put_u8(skb, IFLA_BRPORT_STATE, p->state) ||
+	    nla_put_u16(skb, IFLA_BRPORT_PRIORITY, p->priority) ||
+	    nla_put_u32(skb, IFLA_BRPORT_COST, p->path_cost) ||
+	    nla_put_u8(skb, IFLA_BRPORT_MODE, mode))
+		return -EMSGSIZE;
+
+	return 0;
 }
 
 /*
@@ -67,10 +90,18 @@ static int br_fill_ifinfo(struct sk_buff
 	    (dev->addr_len &&
 	     nla_put(skb, IFLA_ADDRESS, dev->addr_len, dev->dev_addr)) ||
 	    (dev->ifindex != dev->iflink &&
-	     nla_put_u32(skb, IFLA_LINK, dev->iflink)) ||
-	    (event == RTM_NEWLINK &&
-	     nla_put_u8(skb, IFLA_PROTINFO, port->state)))
+	     nla_put_u32(skb, IFLA_LINK, dev->iflink)))
 		goto nla_put_failure;
+
+	if (event == RTM_NEWLINK) {
+		struct nlattr *nest
+			= nla_nest_start(skb, IFLA_PROTINFO | NLA_F_NESTED);
+
+		if (nest == NULL || br_port_fill_attrs(skb, port) < 0)
+			goto nla_put_failure;
+		nla_nest_end(skb, nest);
+	}
+
 	return nlmsg_end(skb, nlh);
 
 nla_put_failure:
@@ -126,49 +157,117 @@ out:
 	return err;
 }
 
-/*
- * Change state of port (ie from forwarding to blocking etc)
- * Used by spanning tree in user space.
- */
-int br_setlink(struct net_device *dev, struct nlmsghdr *nlh)
-{
-	struct ifinfomsg *ifm;
-	struct nlattr *protinfo;
-	struct net_bridge_port *p;
-	u8 new_state;
-
-	ifm = nlmsg_data(nlh);
+static const struct nla_policy ifla_brport_policy[IFLA_BRPORT_MAX + 1] = {
+	[IFLA_BRPORT_STATE]	= { .type = NLA_U8 },
+	[IFLA_BRPORT_COST]	= { .type = NLA_U32 },
+	[IFLA_BRPORT_PRIORITY]	= { .type = NLA_U16 },
+	[IFLA_BRPORT_MODE]	= { .type = NLA_U8 },
+};
 
-	protinfo = nlmsg_find_attr(nlh, sizeof(*ifm), IFLA_PROTINFO);
-	if (!protinfo || nla_len(protinfo) < sizeof(u8))
-		return -EINVAL;
-
-	new_state = nla_get_u8(protinfo);
-	if (new_state > BR_STATE_BLOCKING)
-		return -EINVAL;
-
-	p = br_port_get_rtnl(dev);
-	if (!p)
+/* Change the state of the port and notify spanning tree */
+static int br_set_port_state(struct net_bridge_port *p, u8 state)
+{
+	if (state > BR_STATE_BLOCKING)
 		return -EINVAL;
 
 	/* if kernel STP is running, don't allow changes */
 	if (p->br->stp_enabled == BR_KERNEL_STP)
 		return -EBUSY;
 
-	if (!netif_running(dev) ||
-	    (!netif_carrier_ok(dev) && new_state != BR_STATE_DISABLED))
+	if (!netif_running(p->dev) ||
+	    (!netif_carrier_ok(p->dev) && state != BR_STATE_DISABLED))
 		return -ENETDOWN;
 
-	p->state = new_state;
+	p->state = state;
 	br_log_state(p);
-
-	spin_lock_bh(&p->br->lock);
 	br_port_state_selection(p->br);
-	spin_unlock_bh(&p->br->lock);
+	return 0;
+}
+
+/* Set or clear port flags based on attribute */
+static void br_set_port_flag(struct net_bridge_port *p, struct nlattr *tb[],
+			   int attrtype, unsigned long mask)
+{
+	if (tb[attrtype]) {
+		u8 flag = nla_get_u8(tb[attrtype]);
+		if (flag)
+			p->flags |= mask;
+		else
+			p->flags &= ~mask;
+	}
+}
+
+/* Process bridge protocol info on port */
+static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
+{
+	int err;
 
+	br_set_port_flag(p, tb, IFLA_BRPORT_MODE, BR_HAIRPIN_MODE);
+
+	if (tb[IFLA_BRPORT_COST]) {
+		err = br_stp_set_path_cost(p, nla_get_u32(tb[IFLA_BRPORT_COST]));
+		if (err)
+			return err;
+	}
+
+	if (tb[IFLA_BRPORT_PRIORITY]) {
+		err = br_stp_set_port_priority(p, nla_get_u16(tb[IFLA_BRPORT_PRIORITY]));
+		if (err)
+			return err;
+	}
+
+	if (tb[IFLA_BRPORT_STATE]) {
+		err = br_set_port_state(p, nla_get_u8(tb[IFLA_BRPORT_STATE]));
+		if (err)
+			return err;
+	}
 	return 0;
 }
 
+/* Change state and parameters on port. */
+int br_setlink(struct net_device *dev, struct nlmsghdr *nlh)
+{
+	struct ifinfomsg *ifm;
+	struct nlattr *protinfo;
+	struct net_bridge_port *p;
+	struct nlattr *tb[IFLA_BRPORT_MAX + 1];
+	int err;
+
+	ifm = nlmsg_data(nlh);
+
+	protinfo = nlmsg_find_attr(nlh, sizeof(*ifm), IFLA_PROTINFO);
+	if (!protinfo)
+		return 0;
+
+	p = br_port_get_rtnl(dev);
+	if (!p)
+		return -EINVAL;
+
+	if (protinfo->nla_type & NLA_F_NESTED) {
+		err = nla_parse_nested(tb, IFLA_BRPORT_MAX,
+				       protinfo, ifla_brport_policy);
+		if (err)
+			return err;
+
+		spin_lock_bh(&p->br->lock);
+		err = br_setport(p, tb);
+		spin_unlock_bh(&p->br->lock);
+	} else {
+		/* Binary compatibility with old RSTP */
+		if (nla_len(protinfo) < sizeof(u8))
+			return -EINVAL;
+
+		spin_lock_bh(&p->br->lock);
+		err = br_set_port_state(p, nla_get_u8(protinfo));
+		spin_unlock_bh(&p->br->lock);
+	}
+
+	if (err == 0)
+		br_ifinfo_notify(RTM_NEWLINK, p);
+
+	return err;
+}
+
 static int br_validate(struct nlattr *tb[], struct nlattr *data[])
 {
 	if (tb[IFLA_ADDRESS]) {

^ permalink raw reply

* Re: [PATCH 2/2] smsc95xx: set MII_BUSY bit to read/write PHY regs
From: Steve Glendinning @ 2012-11-13 17:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20121109.160850.901809648224300452.davem@davemloft.net>

> Steve please let me know why you only submitted patch #2
> of an apparent 2 part series.

My bad being sloppy when formatting the patch for submission, sorry.
The first patch in my rebased tree was one you've already accepted
into net but hadn't pulled into net-next yet.  I just sent the second
one, and I forgot to remove the 2/2 from it.

They don't depend on each other, they're independent bugfixes.

Sorry again for the sloppiness.

^ permalink raw reply

* Re: [net-next 01/11] ixgbe: Do not use DCA to prefetch the entire packet into the cache
From: Alexander Duyck @ 2012-11-13 17:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jeff Kirsher, davem, netdev, gospo, sassmann
In-Reply-To: <1352816011.6185.33.camel@edumazet-glaptop>

On 11/13/2012 06:13 AM, Eric Dumazet wrote:
> On Tue, 2012-11-13 at 06:03 -0800, Jeff Kirsher wrote:
>> From: Alexander Duyck <alexander.h.duyck@intel.com>
>>
>> The way the code was previously written it was causing DCA to prefetch the
>> entire packet into the cache when it was enabled.  That is excessive as we
>> only really need the headers.
>>
>> We are now prefetching the headers via software so doing this from DCA would
>> be redundant anyway.  So clear the bit that was causing us to prefetch the
>> packet data and instead only use DCA for the descriptor rings.
>>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
>> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> ---
> Excellent !
>
> My own ixgbe cards are moving, so I can't test this. Do you guys have some
> numbers to share ?
>
> Thanks

In my tests I saw no real change from the DCA changes.  I kind of
suspected that would be the case: as mentioned in the patch, the prefetch
and DCA were both doing the same thing, so by dropping the extra DCA
prefetch I am just polluting the cache less.

I have a set of changes for the ixgbe transmit path that are similar
to the changes I made to igb in this patch set.  Those are a bit
more interesting, as they actually decreased the ixgbe_xmit_frame_ring
function overhead by something like 15% in some of my tests.

Thanks,

Alex

^ permalink raw reply

* Re: [net-next PATCH v2 0/3] extend set/get netlink for embedded
From: John Fastabend @ 2012-11-13 17:16 UTC (permalink / raw)
  To: Ariel Elior
  Cc: shemminger@vyatta.com, buytenh@wantstofly.org,
	davem@davemloft.net, vyasevic@redhat.com, jhs@mojatatu.com,
	chrisw@redhat.com, krkumar2@in.ibm.com, samudrala@us.ibm.com,
	peter.p.waskiewicz.jr@intel.com, jeffrey.t.kirsher@intel.com,
	netdev@vger.kernel.org, bhutchings@solarflare.com,
	gregory.v.rose@intel.com, Eilon Greenstein
In-Reply-To: <6AE768456CEC4B4A9B2248CB6B87EB3E1BC95245@SJEXCHMB05.corp.ad.broadcom.com>

>
> This series looks fine from bnx2x point of view.
> The dynamic change from VEB to VEPA will require a firmware change,
> so we might arrive a little late for the party.
> Ariel
>

In the meantime you could enable just the get routines
so the upper layer can learn how (or whether) your SR-IOV
switching is being done.

My basic expectation is that hardware defaults to VEB, but
I'm not sure that is true in all cases.

.John

^ permalink raw reply

* [REGRESSION] r8169: jumbo fixes caused jumbo regressions!
From: Kirill Smelkov @ 2012-11-13 17:06 UTC (permalink / raw)
  To: Francois Romieu
  Cc: Realtek linux nic maintainers, Hayes Wang, David S. Miller,
	Greg Kroah-Hartman, netdev


Short description:

    I run net-next on my netbook, which has a yukon2 Ethernet controller, and
    stable-3.0 at work on several hosts with PCIe Realtek network chips.
    Upgrading those hosts from 3.0.45 to 3.0.46 revealed a jumbo-related
    regression caused by

        r8169: jumbo fixes.

    which is

        cc669c37ba4a9c5c54c7842d0c9428aab64d62d7 at stable-3.0, and
        d58d46b5d85139d18eb939aa7279c160bab70484 upstream

    The problem is that it is no longer possible to use a 7200 MTU together
    with tx checksum offload. Both features used to work without problems.


Details
-------

I have two machines with realtek chips in them. They are

    eth0: RTL8168cp/8111cp at 0xdffb8000, 00:18:7d:11:83:2b, XID 1cb00080 IRQ 16
    Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)

and

    eth0: RTL8168c/8111c at 0xf8062000, 00:22:15:90:7e:c6, XID 1c4000c0 IRQ 17
    Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)


Looking at the chips themselves, I can confirm that they are labelled
RTL8111CP and RTL8111C respectively.

I used to set mtu=7200, turn tx checksum offload on, and transmit/receive
almost gigabit traffic to/from either of them without a problem. This
worked fine until the upgrade from 3.0.45 to 3.0.46, where things broke.
Now, for both devices, the r8169 driver says:

    eth0: jumbo features [frames: 6128 bytes, tx checksumming: ko]

i.e. a maximum MTU of only 6128 and no support for tx checksum offload.

Indeed, for one thing the patch says tx checksumming cannot work together with
jumbo frames:

commit cc669c37ba4a9c5c54c7842d0c9428aab64d62d7
Author:     Francois Romieu <romieu@fr.zoreil.com>
AuthorDate: Fri Oct 5 23:29:11 2012 +0200
Commit:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CommitDate: Sat Oct 13 05:28:12 2012 +0900

    r8169: jumbo fixes.
    
    commit d58d46b5d85139d18eb939aa7279c160bab70484 upstream.
    
    - fix features : jumbo frames and checksumming can not be used at the
      same time.
    
    - introduce hw_jumbo_{enable / disable} helpers. Their content has been
      creatively extracted from Realtek's own drivers. As an illustration,
      it would be nice to know how/if the MaxTxPacketSize register operates
      when the device can work with a 9k jumbo frame as its documentation
      (8168c) can not be applied beyond ~7k.
    
    - rtl_tx_performance_tweak is moved forward. No change.
    
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
    Acked-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


But, again, up until now I have used ~7K jumbos with tx checksum offload
on those chips just fine:

My test streams raw video from 8 PAL cameras to the network: 4 at 720x576@25
and 4 at 360x288@25, which in YUYV format occupies ~860 Mbps of bandwidth.
The program that transmits/receives the video is here:
http://repo.or.cz/w/rawv.git

The vivi.ko video driver, with fps set to 25, provides the video sources.
The streams are generated with

    $ rawv -d /dev/video$X,720x576 -t 239.255.17.$X:1200$X  # X=1..4, 5834 eth framelen
    $ rawv -d /dev/video$X,360x288 -t 239.255.17.$X:1200$X  # X=5..8, 6554 eth framelen

(the second case already exceeds 6K jumbos), and, to come close to the 7K
limit, with

    $ rawv -d /dev/video$X,708x576 -t 239.255.17.$X:1200$X  # X=1..4, 7154 eth framelen
    $ rawv -d /dev/video$X,352x288 -t 239.255.17.$X:1200$X  # X=5..8, 7114 eth framelen

This used to work fine with the MTU set to 7200 or 7152 (7152 + 14 + 2 =
7168 = 7*1024 max eth framelen) and tx csum offload turned on via
`ethtool -K eth0 tx on`.


Patching the driver to print the "true XID":

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index f7a56f4..247a238 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -1773,6 +1778,7 @@ static void rtl8169_get_mac_version(struct rtl8169_private *tp,
        reg = RTL_R32(TxConfig);
        while ((reg & p->mask) != p->val)
                p++;
+       dprintk("mac_version for 0x%08x (0x%08x): %i\n", reg, reg & 0x9cf0f8ff,p->mac_version);
        tp->mac_version = p->mac_version;
 
        if (tp->mac_version == RTL_GIGA_MAC_NONE) {


I've found that RTL_R32(TxConfig) is 0x3fb00080 and 0x3f4006c0 for my chips.
Judging by the table in rtl8169_get_mac_version(), this gives
RTL_GIGA_MAC_VER_24 and RTL_GIGA_MAC_VER_22 respectively.

I am now running a 3.0.46 kernel with the following patch applied:

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index f7a56f4..247a238 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -210,11 +212,11 @@ static const struct {
        [RTL_GIGA_MAC_VER_21] =
                _R("RTL8168c/8111c",    RTL_TD_1, NULL, JUMBO_6K, false),
        [RTL_GIGA_MAC_VER_22] =
-               _R("RTL8168c/8111c",    RTL_TD_1, NULL, JUMBO_6K, false),
+               _R("RTL8168c/8111c",    RTL_TD_1, NULL, JUMBO_7K, true),
        [RTL_GIGA_MAC_VER_23] =
                _R("RTL8168cp/8111cp",  RTL_TD_1, NULL, JUMBO_6K, false),
        [RTL_GIGA_MAC_VER_24] =
-               _R("RTL8168cp/8111cp",  RTL_TD_1, NULL, JUMBO_6K, false),
+               _R("RTL8168cp/8111cp",  RTL_TD_1, NULL, JUMBO_7K, true),
        [RTL_GIGA_MAC_VER_25] =
                _R("RTL8168d/8111d",    RTL_TD_1, FIRMWARE_8168D_1,
                                                        JUMBO_9K, false),


and ~7K jumbos and tx csum offload work again.
(By the way, on the Atom system, without tx csum offload, half of the CPU
time is spent just calculating checksums...)


Now I wonder where that 6K limit came from, and why the patch says jumbos
can no longer be used together with tx csum offload. Is my testing enough
to justify raising the limits and allowing tx offload? If so, how do we
handle this regression?


Thanks,
Kirill


P.S. Just for info: I've also tried 9K jumbos, but they do not work on
either of my chips.

^ permalink raw reply related

* Re: SR-IOV problem with Intel 82599EB (not enough MMIO resources for SR-IOV)
From: Don Dutile @ 2012-11-13 16:40 UTC (permalink / raw)
  To: Li, Sibai
  Cc: Jason Gao, bhelgaas@google.com, Rose, Gregory V,
	Kirsher, Jeffrey T, linux-kernel, netdev, kvm,
	e1000-devel@lists.sourceforge.net, linux-pci@vger.kernel.org,
	Yinghai Lu
In-Reply-To: <A3A921390EAE4044940E99EBF13D97321AE21E15@FMSMSX104.amr.corp.intel.com>

On 11/13/2012 11:04 AM, Li, Sibai wrote:
>
>
>> -----Original Message-----
>> From: Jason Gao [mailto:pkill.2012@gmail.com]
>> Sent: Tuesday, November 13, 2012 5:38 AM
>> To: bhelgaas@google.com; Rose, Gregory V; Li, Sibai
>> Cc: ddutile@redhat.com; Kirsher, Jeffrey T; linux-kernel; netdev; kvm; e1000-
>> devel@lists.sourceforge.net; linux-pci@vger.kernel.org; Yinghai Lu
>> Subject: Re: SR-IOV problem with Intel 82599EB (not enough MMIO resources
>> for SR-IOV)
>>
>> I'm very sorry for the delayed reply. SR-IOV now works for me in CentOS 6.3;
>> thank you all.
>>
>>
>> On Fri, Nov 9, 2012 at 11:26 PM, Bjorn Helgaas<bhelgaas@google.com>  wrote:
>>> Linux normally uses the resource assignments done by the BIOS, but it
>>> is possible for the kernel to reassign those.  We don't have good
>>> automatic support for that yet, but on a recent upstream kernel, you
>>> can try "pci=realloc".  I doubt this option is in CentOS 6.3, though
>>
>> Thank you very much. I tried "pci=realloc" in CentOS 6.3, and now it works for me.
>>
>>
>>
>> On Sat, Nov 10, 2012 at 2:08 AM, Li, Sibai<sibai.li@intel.com>  wrote:
>>> Dell R710 with the latest BIOS should work fine for SR-IOV. My BIOS is
>>> v.6.3.0 and its release date is 07/24/2012. Please check whether you
>>> configured intel_iommu=on in the grub.conf file.
>>> If you did, check your kernel .config file under Device Drivers -> IOMMU
>>> Hardware support -> enable Support for Intel IOMMU using DMA Remapping
>>> Devices, enable Intel DMA Remapping Devices by Default, and enable Support
>>> for Interrupt Remapping.
>>
>> Thank you, Sibai. Our server is a "Dell R710"; its BIOS version is exactly
>> v.6.3.0 with release date 07/24/2012, and I also configured intel_iommu=on in
>> the grub.conf file, but I can't find these IOMMU options in "Device Drivers" in my
Sibai is referring to kernel config options.  RHEL6.3 has the IOMMU options built into
the kernel, but not enabled by default; you have to add 'intel_iommu=on' to the kernel
cmdline to enable the IOMMU. SR-IOV support (CONFIG_PCI_IOV) is built into the RHEL6.3 kernel as well.

>> kernel (2.6.32-279) .config file. BTW, my OS is CentOS 6.3 (RHEL6.3). Although the
>> problem is solved, I'd like to know: what are your OS and kernel versions?
>
> I am using RHEL6.3 with unstable kernel 3.7.0-rc

^ permalink raw reply

* RE: [PATCH 00/20] bnx2x: support SR-IOV
From: Ariel Elior @ 2012-11-13 16:20 UTC (permalink / raw)
  To: Ariel Elior, David Miller; +Cc: netdev
In-Reply-To: <1352823451-27042-1-git-send-email-ariele@broadcom.com>

> -----Original Message-----
> From: Ariel Elior [mailto:ariele@broadcom.com]
> Sent: Tuesday, November 13, 2012 6:17 PM
> To: David Miller
> Cc: netdev; Ariel Elior
> Subject: [PATCH 00/20] bnx2x: support SR-IOV
> 
> Please consider applying these patches.
To net-next, of course

^ permalink raw reply

* net, bluetooth: object debug warning in bt_host_release()
From: Sasha Levin @ 2012-11-13 16:18 UTC (permalink / raw)
  To: marcel, gustavo, Johan Hedberg, David S. Miller
  Cc: linux-bluetooth, netdev, linux-kernel@vger.kernel.org, Dave Jones

Hi all,

While fuzzing with trinity inside a KVM tools (lkvm) guest running the latest -next kernel,
I've stumbled on the following:

[ 1434.201149] ------------[ cut here ]------------
[ 1434.204998] WARNING: at lib/debugobjects.c:261 debug_print_object+0x8e/0xb0()
[ 1434.208324] ODEBUG: free active (active state 0) object type: work_struct hint: hci_power_on+0x0/0x90
[ 1434.210386] Pid: 8564, comm: trinity-child25 Tainted: G        W    3.7.0-rc5-next-20121112-sasha-00018-g2f4ce0e #127
[ 1434.210760] Call Trace:
[ 1434.210760]  [<ffffffff819f3d6e>] ? debug_print_object+0x8e/0xb0
[ 1434.210760]  [<ffffffff8110b887>] warn_slowpath_common+0x87/0xb0
[ 1434.210760]  [<ffffffff8110b911>] warn_slowpath_fmt+0x41/0x50
[ 1434.210760]  [<ffffffff819f3d6e>] debug_print_object+0x8e/0xb0
[ 1434.210760]  [<ffffffff8376b750>] ? hci_dev_open+0x310/0x310
[ 1434.210760]  [<ffffffff83bf94e5>] ? _raw_spin_unlock_irqrestore+0x55/0xa0
[ 1434.210760]  [<ffffffff819f3ee5>] __debug_check_no_obj_freed+0xa5/0x230
[ 1434.210760]  [<ffffffff83785db0>] ? bt_host_release+0x10/0x20
[ 1434.210760]  [<ffffffff819f4d15>] debug_check_no_obj_freed+0x15/0x20
[ 1434.210760]  [<ffffffff8125eee7>] kfree+0x227/0x330
[ 1434.210760]  [<ffffffff83785db0>] bt_host_release+0x10/0x20
[ 1434.210760]  [<ffffffff81e539e5>] device_release+0x65/0xc0
[ 1434.210760]  [<ffffffff819d3975>] kobject_cleanup+0x145/0x190
[ 1434.210760]  [<ffffffff819d39cd>] kobject_release+0xd/0x10
[ 1434.210760]  [<ffffffff819d33cc>] kobject_put+0x4c/0x60
[ 1434.210760]  [<ffffffff81e548b2>] put_device+0x12/0x20
[ 1434.210760]  [<ffffffff8376a334>] hci_free_dev+0x24/0x30
[ 1434.210760]  [<ffffffff82fd8fe1>] vhci_release+0x31/0x60
[ 1434.210760]  [<ffffffff8127be12>] __fput+0x122/0x250
[ 1434.210760]  [<ffffffff811cab0d>] ? rcu_user_exit+0x9d/0xd0
[ 1434.210760]  [<ffffffff8127bf49>] ____fput+0x9/0x10
[ 1434.210760]  [<ffffffff81133402>] task_work_run+0xb2/0xf0
[ 1434.210760]  [<ffffffff8106cfa7>] do_notify_resume+0x77/0xa0
[ 1434.210760]  [<ffffffff83bfb0ea>] int_signal+0x12/0x17
[ 1434.210760] ---[ end trace a6d57fefbc8a8cc7 ]---

Note that the guest doesn't emulate anything that looks like a Bluetooth device or
has Bluetooth capabilities.


Thanks,
Sasha

^ permalink raw reply

* [PATCH 20/22] bnx2x: Support VF FLR
From: Ariel Elior @ 2012-11-13 16:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Ariel Elior, Eilon Greenstein
In-Reply-To: <1352823451-27042-1-git-send-email-ariele@broadcom.com>

The FLR indication arrives as an attention from the management processor.
Upon VF FLR, all FLRed functions in the indication have already been
released by the firmware, and now we basically need to free the resources
allocated to those VFs and clean any remainders from the device
(FLR final cleanup).

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x.h       |    6 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  |   19 +-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h   |    1 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c |  295 ++++++++++++++++++++-
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h |    7 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h  |    1 +
 6 files changed, 321 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
index 88dbf02..dfae9b0 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x.h
@@ -1880,7 +1880,13 @@ void bnx2x_prep_dmae_with_comp(struct bnx2x *bp, struct dmae_command *dmae,
 int bnx2x_issue_dmae_with_comp(struct bnx2x *bp, struct dmae_command *dmae);
 void bnx2x_dp_dmae(struct bnx2x *bp, struct dmae_command *dmae, int msglvl);
 
+/* FLR related routines */
+u32 bnx2x_flr_clnup_poll_count(struct bnx2x *bp);
+void bnx2x_tx_hw_flushed(struct bnx2x *bp, u32 poll_count);
+int bnx2x_send_final_clnup(struct bnx2x *bp, u8 clnup_func, u32 poll_cnt);
 u8 bnx2x_is_pcie_pending(struct pci_dev *dev);
+int bnx2x_flr_clnup_poll_hw_counter(struct bnx2x *bp, u32 reg,
+				    char *msg, u32 poll_cnt);
 
 void bnx2x_calc_fc_adv(struct bnx2x *bp);
 int bnx2x_sp_post(struct bnx2x *bp, int command, int cid,
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 4418bc4..05aef19 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -1096,8 +1096,8 @@ static u32 bnx2x_flr_clnup_reg_poll(struct bnx2x *bp, u32 reg,
 	return val;
 }
 
-static int bnx2x_flr_clnup_poll_hw_counter(struct bnx2x *bp, u32 reg,
-					   char *msg, u32 poll_cnt)
+int bnx2x_flr_clnup_poll_hw_counter(struct bnx2x *bp, u32 reg,
+				    char *msg, u32 poll_cnt)
 {
 	u32 val = bnx2x_flr_clnup_reg_poll(bp, reg, 0, poll_cnt);
 	if (val != 0) {
@@ -1107,7 +1107,8 @@ static int bnx2x_flr_clnup_poll_hw_counter(struct bnx2x *bp, u32 reg,
 	return 0;
 }
 
-static u32 bnx2x_flr_clnup_poll_count(struct bnx2x *bp)
+/* Common routines with VF FLR cleanup */
+u32 bnx2x_flr_clnup_poll_count(struct bnx2x *bp)
 {
 	/* adjust polling timeout */
 	if (CHIP_REV_IS_EMUL(bp))
@@ -1119,7 +1120,7 @@ static u32 bnx2x_flr_clnup_poll_count(struct bnx2x *bp)
 	return FLR_POLL_CNT;
 }
 
-static void bnx2x_tx_hw_flushed(struct bnx2x *bp, u32 poll_count)
+void bnx2x_tx_hw_flushed(struct bnx2x *bp, u32 poll_count)
 {
 	struct pbf_pN_cmd_regs cmd_regs[] = {
 		{0, (CHIP_IS_E3B0(bp)) ?
@@ -1194,8 +1195,7 @@ static void bnx2x_tx_hw_flushed(struct bnx2x *bp, u32 poll_count)
 	(((index) << SDM_OP_GEN_AGG_VECT_IDX_SHIFT) & SDM_OP_GEN_AGG_VECT_IDX)
 
 
-static int bnx2x_send_final_clnup(struct bnx2x *bp, u8 clnup_func,
-					 u32 poll_cnt)
+int bnx2x_send_final_clnup(struct bnx2x *bp, u8 clnup_func, u32 poll_cnt)
 {
 	struct sdm_op_gen op_gen = {0};
 
@@ -1220,7 +1220,8 @@ static int bnx2x_send_final_clnup(struct bnx2x *bp, u8 clnup_func,
 		BNX2X_ERR("FW final cleanup did not succeed\n");
 		DP(BNX2X_MSG_SP, "At timeout completion address contained %x\n",
 		   (REG_RD(bp, comp_addr)));
-		ret = 1;
+		bnx2x_panic();
+		return 1;
 	}
 	/* Zero completion for nxt FLR */
 	REG_WR(bp, comp_addr, 0);
@@ -3884,6 +3885,10 @@ static void bnx2x_attn_int_deasserted3(struct bnx2x *bp, u32 attn)
 
 			if (val & DRV_STATUS_DRV_INFO_REQ)
 				bnx2x_handle_drv_info_req(bp);
+
+			if (val & DRV_STATUS_VF_DISABLED)
+				bnx2x_vf_handle_flr_event(bp);
+
 			if ((bp->port.pmf == 0) && (val & DRV_STATUS_PMF))
 				bnx2x_pmf_update(bp);
 
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
index 3997f63..823d1b6 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_reg.h
@@ -876,6 +876,7 @@
 #define HC_CONFIG_0_REG_MSI_MSIX_INT_EN_0			 (0x1<<2)
 #define HC_CONFIG_0_REG_SINGLE_ISR_EN_0				 (0x1<<1)
 #define HC_CONFIG_1_REG_BLOCK_DISABLE_1				 (0x1<<0)
+#define DORQ_REG_VF_USAGE_CNT					 0x170320
 #define HC_REG_AGG_INT_0					 0x108050
 #define HC_REG_AGG_INT_1					 0x108054
 #define HC_REG_ATTN_BIT 					 0x108120
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
index f083b5d..ee02374 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
@@ -137,6 +137,17 @@ enum bnx2x_vfop_mcast_state {
 	   BNX2X_VFOP_MCAST_ADD,
 	   BNX2X_VFOP_MCAST_CHK_DONE
 };
+enum bnx2x_vfop_qflr_state {
+	   BNX2X_VFOP_QFLR_CLR_VLAN,
+	   BNX2X_VFOP_QFLR_CLR_MAC,
+	   BNX2X_VFOP_QFLR_TERMINATE,
+	   BNX2X_VFOP_QFLR_DONE
+};
+
+enum bnx2x_vfop_flr_state {
+	   BNX2X_VFOP_FLR_QUEUES,
+	   BNX2X_VFOP_FLR_HW
+};
 
 enum bnx2x_vfop_close_state {
 	   BNX2X_VFOP_CLOSE_QUEUES,
@@ -974,6 +985,93 @@ int bnx2x_vfop_qsetup_cmd(struct bnx2x *bp,
 	return -ENOMEM;
 }
 
+/* VFOP queue FLR handling (clear vlans, clear macs, queue destructor) */
+static void bnx2x_vfop_qflr(struct bnx2x *bp, struct bnx2x_virtf *vf)
+{
+	struct bnx2x_vfop *vfop = bnx2x_vfop_cur(bp, vf);
+	int qid = vfop->args.qx.qid;
+	enum bnx2x_vfop_qflr_state state = vfop->state;
+	struct bnx2x_queue_state_params *qstate;
+	struct bnx2x_vfop_cmd cmd;
+
+	bnx2x_vfop_reset_wq(vf);
+
+	if (vfop->rc < 0)
+		goto op_err;
+
+	DP(BNX2X_MSG_IOV, "VF[%d] STATE: %d\n", vf->abs_vfid, state);
+
+	cmd.done = bnx2x_vfop_qflr;
+	cmd.block = false;
+
+	switch (state) {
+	case BNX2X_VFOP_QFLR_CLR_VLAN:
+		/* vlan-clear-all: driver-only, don't consume credit */
+		vfop->state = BNX2X_VFOP_QFLR_CLR_MAC;
+		vfop->rc = bnx2x_vfop_vlan_delall_cmd(bp, vf, &cmd, qid, true);
+		if (vfop->rc)
+			goto op_err;
+		return;
+
+	case BNX2X_VFOP_QFLR_CLR_MAC:
+		/* mac-clear-all: driver only consume credit */
+		vfop->state = BNX2X_VFOP_QFLR_TERMINATE;
+		vfop->rc = bnx2x_vfop_mac_delall_cmd(bp, vf, &cmd, qid, true);
+		DP(BNX2X_MSG_IOV,
+		   "VF[%d] vfop->rc after bnx2x_vfop_mac_delall_cmd was %d",
+		   vf->abs_vfid, vfop->rc);
+		if (vfop->rc)
+			goto op_err;
+		return;
+
+	case BNX2X_VFOP_QFLR_TERMINATE:
+		qstate = &vfop->op_p->qctor.qstate;
+		memset(qstate , 0, sizeof(*qstate));
+		qstate->q_obj = &bnx2x_vfq(vf, qid, sp_obj);
+		vfop->state = BNX2X_VFOP_QFLR_DONE;
+
+		DP(BNX2X_MSG_IOV, "VF[%d] qstate during flr was %d",
+		   vf->abs_vfid, qstate->q_obj->state);
+
+		if (qstate->q_obj->state != BNX2X_Q_STATE_RESET) {
+			qstate->q_obj->state = BNX2X_Q_STATE_STOPPED;
+			qstate->cmd = BNX2X_Q_CMD_TERMINATE;
+			vfop->rc = bnx2x_queue_state_change(bp, qstate);
+			bnx2x_vfop_finalize(vf, vfop->rc, VFOP_VERIFY_PEND);
+		} else {
+			goto op_done;
+		}
+
+op_err:
+	BNX2X_ERR("QFLR[%d:%d] error: rc %d\n",
+		  vf->abs_vfid, qid, vfop->rc);
+op_done:
+	case BNX2X_VFOP_QFLR_DONE:
+		bnx2x_vfop_end(bp, vf, vfop);
+		return;
+	default:
+		bnx2x_vfop_default(state);
+	}
+op_pending:
+	return;
+}
+
+static int bnx2x_vfop_qflr_cmd(struct bnx2x *bp,
+			       struct bnx2x_virtf *vf,
+			       struct bnx2x_vfop_cmd *cmd,
+			       int qid)
+{
+	struct bnx2x_vfop *vfop = bnx2x_vfop_add(bp, vf);
+
+	if (vfop) {
+		vfop->args.qx.qid = qid;
+		bnx2x_vfop_opset(BNX2X_VFOP_QFLR_CLR_VLAN,
+				 bnx2x_vfop_qflr, cmd->done);
+		return bnx2x_vfop_transition(bp, vf, bnx2x_vfop_qflr,
+					     cmd->block);
+	}
+	return -ENOMEM;
+}
 
 /* VFOP multi-casts */
 static void bnx2x_vfop_mcast(struct bnx2x *bp, struct bnx2x_virtf *vf)
@@ -1433,8 +1531,203 @@ static void bnx2x_vf_free_resc(struct bnx2x *bp, struct bnx2x_virtf *vf)
 	vf->state = VF_FREE;
 }
 
-/* called by bnx2x_init_hw_func, returns the next ilt line */
+static void bnx2x_vf_flr_clnup_hw(struct bnx2x *bp, struct bnx2x_virtf *vf)
+{
+	u32 poll_cnt = bnx2x_flr_clnup_poll_count(bp);
+
+	/* DQ usage counter */
+	bnx2x_pretend_func(bp, HW_VF_HANDLE(bp, vf->abs_vfid));
+	bnx2x_flr_clnup_poll_hw_counter(bp, DORQ_REG_VF_USAGE_CNT,
+					"DQ VF usage counter timed out",
+					poll_cnt);
+	bnx2x_pretend_func(bp, BP_ABS_FUNC(bp));
+
+	/* FW cleanup command - poll for the results */
+	if (bnx2x_send_final_clnup(bp, (u8)FW_VF_HANDLE(vf->abs_vfid),
+				   poll_cnt))
+		BNX2X_ERR("VF[%d] Final cleanup timed-out\n", vf->abs_vfid);
+
+	/* ATC cleanup */
+
+	/* verify TX hw is flushed */
+	bnx2x_tx_hw_flushed(bp, poll_cnt);
+
+}
+
+static void bnx2x_vfop_flr(struct bnx2x *bp, struct bnx2x_virtf *vf)
+{
+	struct bnx2x_vfop *vfop = bnx2x_vfop_cur(bp, vf);
+	struct bnx2x_vfop_args_qx *qx = &vfop->args.qx;
+	enum bnx2x_vfop_flr_state state = vfop->state;
+	struct bnx2x_vfop_cmd cmd = {
+		.done = bnx2x_vfop_flr,
+		.block = false,
+	};
+
+	if (vfop->rc < 0)
+		goto op_err;
+
+	DP(BNX2X_MSG_IOV, "vf[%d] STATE: %d\n", vf->abs_vfid, state);
 
+	switch (state) {
+	case BNX2X_VFOP_FLR_QUEUES:
+		/* the cleanup operations are valid if and only if the VF
+		 * was first acquired.
+		 */
+		if (++(qx->qid) < vf_rxq_count(vf)) {
+			vfop->rc = bnx2x_vfop_qflr_cmd(bp, vf, &cmd,
+						       qx->qid);
+			if (vfop->rc)
+				goto op_err;
+			return;
+		}
+		/* remove multicasts */
+		vfop->state = BNX2X_VFOP_FLR_HW;
+		vfop->rc = bnx2x_vfop_mcast_cmd(bp, vf, &cmd, NULL,
+						0, true);
+		if (vfop->rc)
+			goto op_err;
+		return;
+	case BNX2X_VFOP_FLR_HW:
+
+		/* dispatch final cleanup and wait for HW queues to flush */
+		bnx2x_vf_flr_clnup_hw(bp, vf);
+
+		/* release VF resources */
+		bnx2x_vf_free_resc(bp, vf);
+
+		/* re-open the mailbox */
+		bnx2x_vf_enable_mbx(bp, vf->abs_vfid);
+
+		goto op_done;
+	default:
+		bnx2x_vfop_default(state);
+	}
+op_err:
+	BNX2X_ERR("VF[%d] FLR error: rc %d\n", vf->abs_vfid, vfop->rc);
+op_done:
+	vf->flr_clnup_stage = VF_FLR_ACK;
+	bnx2x_vfop_end(bp, vf, vfop);
+	bnx2x_unlock_vf_pf_channel(bp, vf, CHANNEL_TLV_FLR);
+}
+
+static int bnx2x_vfop_flr_cmd(struct bnx2x *bp,
+			      struct bnx2x_virtf *vf,
+			      vfop_handler_t done)
+{
+	struct bnx2x_vfop *vfop = bnx2x_vfop_add(bp, vf);
+	if (vfop) {
+		vfop->args.qx.qid = -1; /* loop */
+		bnx2x_vfop_opset(BNX2X_VFOP_FLR_QUEUES,
+				 bnx2x_vfop_flr, done);
+		return bnx2x_vfop_transition(bp, vf, bnx2x_vfop_flr, false);
+	}
+	return -ENOMEM;
+}
+
+void bnx2x_vf_flr_clnup(struct bnx2x *bp, struct bnx2x_virtf *prev_vf)
+{
+	int i = prev_vf ? prev_vf->index + 1 : 0;
+	struct bnx2x_virtf *vf;
+
+	/* find next VF to cleanup */
+next_vf_to_clean:
+	for (;
+	     i < BNX2X_NR_VIRTFN(bp) &&
+	     (bnx2x_vf(bp, i, state) != VF_RESET ||
+	      bnx2x_vf(bp, i, flr_clnup_stage) != VF_FLR_CLN);
+	     i++)
+		;
+
+	DP(BNX2X_MSG_IOV, "next vf to cleanup: %d. num of vfs: %d\n", i,
+	   BNX2X_NR_VIRTFN(bp));
+
+	if (i < BNX2X_NR_VIRTFN(bp)) {
+		vf = BP_VF(bp, i);
+
+		/* lock the vf pf channel */
+		bnx2x_lock_vf_pf_channel(bp, vf, CHANNEL_TLV_FLR);
+
+		/* invoke the VF FLR SM */
+		if (bnx2x_vfop_flr_cmd(bp, vf, bnx2x_vf_flr_clnup)) {
+			BNX2X_ERR("VF[%d]: FLR cleanup failed -ENOMEM\n",
+				  vf->abs_vfid);
+
+			/* mark the VF to be ACKED and continue */
+			vf->flr_clnup_stage = VF_FLR_ACK;
+			goto next_vf_to_clean;
+		}
+		return;
+	}
+
+	/* we are done, update vf records */
+	for_each_vf(bp, i) {
+		vf = BP_VF(bp, i);
+
+		if (vf->flr_clnup_stage != VF_FLR_ACK)
+			continue;
+
+		vf->flr_clnup_stage = VF_FLR_EPILOG;
+	}
+
+	/* Acknowledge the handled VFs.
+	 * We acknowledge all the VFs for which an FLR was requested, even
+	 * those we never opened, since the MCP will interrupt us again
+	 * immediately if we only ack some of the bits, resulting in an
+	 * endless loop. This can happen, for example, in KVM, where the
+	 * hypervisor sometimes issues an 'all ones' FLR request.
+	 */
+	DP(BNX2X_MSG_MCP, "DRV_STATUS_VF_DISABLED ACK for vfs 0x%x 0x%x\n",
+	   bp->vfdb->flrd_vfs[0], bp->vfdb->flrd_vfs[1]);
+	for (i = 0; i < FLRD_VFS_DWORDS; i++)
+		SHMEM2_WR(bp, drv_ack_vf_disabled[BP_FW_MB_IDX(bp)][i],
+			  bp->vfdb->flrd_vfs[i]);
+
+	bnx2x_fw_command(bp, DRV_MSG_CODE_VF_DISABLED_DONE, 0);
+
+	/* clear the acked bits - better yet if the MCP implemented
+	 * write to clear semantics
+	 */
+	for (i = 0; i < FLRD_VFS_DWORDS; i++)
+		SHMEM2_WR(bp, drv_ack_vf_disabled[BP_FW_MB_IDX(bp)][i], 0);
+}
+
+void bnx2x_vf_handle_flr_event(struct bnx2x *bp)
+{
+	int i;
+
+	/* Read FLR'd VFs */
+	for (i = 0; i < FLRD_VFS_DWORDS; i++)
+		bp->vfdb->flrd_vfs[i] = SHMEM2_RD(bp, mcp_vf_disabled[i]);
+
+	DP(BNX2X_MSG_MCP,
+	   "DRV_STATUS_VF_DISABLED received for vfs 0x%x 0x%x\n",
+	   bp->vfdb->flrd_vfs[0], bp->vfdb->flrd_vfs[1]);
+
+	for_each_vf(bp, i) {
+		struct bnx2x_virtf *vf = BP_VF(bp, i);
+		u32 reset = 0;
+
+		if (vf->abs_vfid < 32)
+			reset = bp->vfdb->flrd_vfs[0] & (1 << vf->abs_vfid);
+		else
+			reset = bp->vfdb->flrd_vfs[1] &
+				(1 << (vf->abs_vfid - 32));
+
+		if (reset) {
+			/* set as reset and ready for cleanup */
+			vf->state = VF_RESET;
+			vf->flr_clnup_stage = VF_FLR_CLN;
+
+			DP(BNX2X_MSG_IOV,
+			   "Initiating Final cleanup for VF %d\n",
+			   vf->abs_vfid);
+		}
+	}
+
+	/* do the FLR cleanup for all marked VFs*/
+	bnx2x_vf_flr_clnup(bp, NULL);
+}
 /* IOV global initialization routines  */
 void bnx2x_iov_init_dq(struct bnx2x *bp)
 {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h
index 3815b20..b4efa79 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h
@@ -697,9 +697,16 @@ int bnx2x_vfop_release_cmd(struct bnx2x *bp,
 void bnx2x_vf_release(struct bnx2x *bp, struct bnx2x_virtf *vf, bool block);
 int bnx2x_vf_idx_by_abs_fid(struct bnx2x *bp, u16 abs_vfid);
 u8 bnx2x_vf_max_queue_cnt(struct bnx2x *bp, struct bnx2x_virtf *vf);
+
+/* FLR routines */
+
 /* VF FLR helpers */
 int bnx2x_vf_flr_clnup_epilog(struct bnx2x *bp, u8 abs_vfid);
 void bnx2x_vf_enable_access(struct bnx2x *bp, u8 abs_vfid);
+
+/* Handles an FLR (or VF_DISABLE) notification from the MCP */
+void bnx2x_vf_handle_flr_event(struct bnx2x *bp);
+
 void bnx2x_add_tlv(struct bnx2x *bp, void *tlvs_list, u16 offset, u16 type,
 		   u16 length);
 void bnx2x_vfpf_prep(struct bnx2x *bp, struct vfpf_first_tlv *first_tlv,
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h
index 13c76df..0cf6e7f 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.h
@@ -307,6 +307,7 @@ enum channel_tlvs {
 	   CHANNEL_TLV_RELEASE,
 	   CHANNEL_TLV_PF_RELEASE_VF,
 	   CHANNEL_TLV_LIST_END,
+	   CHANNEL_TLV_FLR,
 	   CHANNEL_TLV_MAX
 };
 
-- 
1.7.9.GIT

^ permalink raw reply related

* [PATCH 18/22] bnx2x: Support of PF driver of a VF close request
From: Ariel Elior @ 2012-11-13 16:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Ariel Elior, Eilon Greenstein
In-Reply-To: <1352823451-27042-1-git-send-email-ariele@broadcom.com>

The 'close' command is the opposite of an 'init' request. The VF's
queues (if any are open) are closed and released by applying the
'q_teardown' flow to each of them, the VF's multicast filters are
removed, and the VF's interrupts are disabled. The request also moves
the VF state back to VF_ACQUIRED.

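The close flow implemented by bnx2x_vfop_close() below is a small state machine: the queue index starts at -1 and is pre-incremented, one queue is torn down per step, and once all queues are down the flow removes multicast filters and moves on to disabling the hardware. The following is a simplified, synchronous model of that sequencing only; the struct, the CLOSE_DONE state and the printf() calls are illustrative assumptions, whereas the driver re-enters each step from the completion of an asynchronous ramrod.

```c
#include <assert.h>
#include <stdio.h>

/* CLOSE_QUEUES and CLOSE_HW mirror enum bnx2x_vfop_close_state;
 * CLOSE_DONE is added for this model only.
 */
enum close_state { CLOSE_QUEUES, CLOSE_HW, CLOSE_DONE };

struct close_op {
	enum close_state state;
	int qid;         /* starts at -1 so the first ++qid yields queue 0 */
	int rxq_count;   /* stand-in for vf_rxq_count(vf) */
	int queues_down; /* number of queues torn down so far */
};

/* Runs one step and returns 1 while more steps remain. */
static int close_step(struct close_op *op)
{
	switch (op->state) {
	case CLOSE_QUEUES:
		if (++op->qid < op->rxq_count) {
			printf("tear down queue %d\n", op->qid);
			op->queues_down++;
			return 1;
		}
		printf("remove multicast filters\n");
		op->state = CLOSE_HW;
		return 1;
	case CLOSE_HW:
		printf("disable interrupts, clear queue table\n");
		op->state = CLOSE_DONE;
		return 0;
	default:
		return 0;
	}
}

int main(void)
{
	struct close_op op = { CLOSE_QUEUES, -1, 2, 0 };

	while (close_step(&op))
		;
	assert(op.state == CLOSE_DONE && op.queues_down == 2);
	return 0;
}
```

Starting qid at -1 lets a single pre-increment both advance and bounds-check the index on every re-entry, so a VF with no open queues falls straight through to the multicast and hardware steps.
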
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c |   96 +++++++++++++++++++++
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h |    4 +
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c  |   27 ++++++
 3 files changed, 127 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
index 2c1945e..71640f3 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.c
@@ -138,6 +138,11 @@ enum bnx2x_vfop_mcast_state {
 	   BNX2X_VFOP_MCAST_CHK_DONE
 };
 
+enum bnx2x_vfop_close_state {
+	   BNX2X_VFOP_CLOSE_QUEUES,
+	   BNX2X_VFOP_CLOSE_HW
+};
+
 enum bnx2x_vfop_rxmode_state {
 	   BNX2X_VFOP_RXMODE_CONFIG,
 	   BNX2X_VFOP_RXMODE_DONE
@@ -2302,6 +2307,28 @@ static void bnx2x_vf_qtbl_set_q(struct bnx2x *bp, u8 abs_vfid, u8 qid,
 	REG_WR(bp, reg, val);
 }
 
+static void bnx2x_vf_clr_qtbl(struct bnx2x *bp, struct bnx2x_virtf *vf)
+{
+	int i;
+
+	for_each_vfq(vf, i)
+		bnx2x_vf_qtbl_set_q(bp, vf->abs_vfid,
+				    vfq_qzone_id(vf, vfq_get(vf, i)), false);
+}
+
+static void bnx2x_vf_igu_disable(struct bnx2x *bp, struct bnx2x_virtf *vf)
+{
+	u32 val;
+
+	/* clear the VF configuration - pretend */
+	bnx2x_pretend_func(bp, HW_VF_HANDLE(bp, vf->abs_vfid));
+	val = REG_RD(bp, IGU_REG_VF_CONFIGURATION);
+	val &= ~(IGU_VF_CONF_MSI_MSIX_EN | IGU_VF_CONF_SINGLE_ISR_EN |
+		 IGU_VF_CONF_FUNC_EN | IGU_VF_CONF_PARENT_MASK);
+	REG_WR(bp, IGU_REG_VF_CONFIGURATION, val);
+	bnx2x_pretend_func(bp, BP_ABS_FUNC(bp));
+}
+
 
 inline
 u8 bnx2x_vf_max_queue_cnt(struct bnx2x *bp, struct bnx2x_virtf *vf)
@@ -2475,6 +2502,75 @@ int bnx2x_vf_init(struct bnx2x *bp, struct bnx2x_virtf *vf, dma_addr_t *sb_map)
 	return 0;
 }
 
+/* VFOP close (tear down the queues, delete mcasts and close HW) */
+static void bnx2x_vfop_close(struct bnx2x *bp, struct bnx2x_virtf *vf)
+{
+	struct bnx2x_vfop *vfop = bnx2x_vfop_cur(bp, vf);
+	struct bnx2x_vfop_args_qx *qx = &vfop->args.qx;
+	enum bnx2x_vfop_close_state state = vfop->state;
+	struct bnx2x_vfop_cmd cmd = {
+		.done = bnx2x_vfop_close,
+		.block = false,
+	};
+
+	if (vfop->rc < 0)
+		goto op_err;
+
+	DP(BNX2X_MSG_IOV, "vf[%d] STATE: %d\n", vf->abs_vfid, state);
+
+	switch (state) {
+	case BNX2X_VFOP_CLOSE_QUEUES:
+
+		if (++(qx->qid) < vf_rxq_count(vf)) {
+			vfop->rc = bnx2x_vfop_qdown_cmd(bp, vf, &cmd, qx->qid);
+			if (vfop->rc)
+				goto op_err;
+			return;
+		}
+
+		/* remove multicasts */
+		vfop->state = BNX2X_VFOP_CLOSE_HW;
+		vfop->rc = bnx2x_vfop_mcast_cmd(bp, vf, &cmd, NULL, 0, false);
+		if (vfop->rc)
+			goto op_err;
+		return;
+
+	case BNX2X_VFOP_CLOSE_HW:
+
+		/* disable the interrupts */
+		DP(BNX2X_MSG_IOV, "disabling igu\n");
+		bnx2x_vf_igu_disable(bp, vf);
+
+		/* disable the VF */
+		DP(BNX2X_MSG_IOV, "clearing qtbl\n");
+		bnx2x_vf_clr_qtbl(bp, vf);
+
+		goto op_done;
+	default:
+		bnx2x_vfop_default(state);
+	}
+op_err:
+	BNX2X_ERR("VF[%d] CLOSE error: rc %d\n", vf->abs_vfid, vfop->rc);
+op_done:
+	vf->state = VF_ACQUIRED;
+	DP(BNX2X_MSG_IOV, "set state to acquired\n");
+	bnx2x_vfop_end(bp, vf, vfop);
+}
+
+int bnx2x_vfop_close_cmd(struct bnx2x *bp,
+			 struct bnx2x_virtf *vf,
+			 struct bnx2x_vfop_cmd *cmd)
+{
+	struct bnx2x_vfop *vfop = bnx2x_vfop_add(bp, vf);
+	if (vfop) {
+		vfop->args.qx.qid = -1; /* loop */
+		bnx2x_vfop_opset(BNX2X_VFOP_CLOSE_QUEUES,
+				 bnx2x_vfop_close, cmd->done);
+		return bnx2x_vfop_transition(bp, vf, bnx2x_vfop_close,
+					     cmd->block);
+	}
+	return -ENOMEM;
+}
 void bnx2x_lock_vf_pf_channel(struct bnx2x *bp, struct bnx2x_virtf *vf,
 			      enum channel_tlvs tlv)
 {
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h
index 5c3e921..3c81029 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sriov.h
@@ -667,6 +667,10 @@ int bnx2x_vfop_rxmode_cmd(struct bnx2x *bp,
 			  struct bnx2x_vfop_cmd *cmd,
 			  int qid, unsigned long accept_flags);
 
+int bnx2x_vfop_close_cmd(struct bnx2x *bp,
+			 struct bnx2x_virtf *vf,
+			 struct bnx2x_vfop_cmd *cmd);
+
 int bnx2x_vf_idx_by_abs_fid(struct bnx2x *bp, u16 abs_vfid);
 u8 bnx2x_vf_max_queue_cnt(struct bnx2x *bp, struct bnx2x_virtf *vf);
 /* VF FLR helpers */
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
index 20c929a..acf5f54 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_vfpf.c
@@ -814,6 +814,30 @@ static void bnx2x_vf_mbx_teardown_q(struct bnx2x *bp, struct bnx2x_virtf *vf,
 		bnx2x_vf_mbx_resp(bp, vf);
 }
 
+/* Done handler for 'close' operation: send response and reopen the
+ * channel.
+ */
+static void bnx2x_vf_mbx_close_done(struct bnx2x *bp, struct bnx2x_virtf *vf)
+{
+	bnx2x_vf_mbx_resp(bp, vf);
+	bnx2x_vf_enable_mbx(bp, vf->abs_vfid);
+}
+
+static void bnx2x_vf_mbx_close_vf(struct bnx2x *bp, struct bnx2x_virtf *vf,
+				  struct bnx2x_vf_mbx *mbx)
+{
+	struct bnx2x_vfop_cmd cmd = {
+		.done = bnx2x_vf_mbx_close_done,
+		.block = false,
+	};
+
+	DP(BNX2X_MSG_IOV, "VF[%d] VF_CLOSE\n", vf->abs_vfid);
+
+	vf->op_rc = bnx2x_vfop_close_cmd(bp, vf, &cmd);
+	if (vf->op_rc)
+		bnx2x_vf_mbx_close_done(bp, vf);
+}
+
 /* dispatch request */
 static void bnx2x_vf_mbx_request(struct bnx2x *bp, struct bnx2x_virtf *vf,
 				  struct bnx2x_vf_mbx *mbx)
@@ -845,6 +869,9 @@ static void bnx2x_vf_mbx_request(struct bnx2x *bp, struct bnx2x_virtf *vf,
 		case CHANNEL_TLV_TEARDOWN_Q:
 			bnx2x_vf_mbx_teardown_q(bp, vf, mbx);
 			break;
+		case CHANNEL_TLV_CLOSE:
+			bnx2x_vf_mbx_close_vf(bp, vf, mbx);
+			break;
 		}
 
 	/* unknown TLV - this may belong to a VF driver from the future - a
-- 
1.7.9.GIT

^ permalink raw reply related
