Netdev List
* Re: [PATCH net-next] vmxnet3: fix skb truesize underestimation
From: David Miller @ 2011-10-14  2:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, sbhatewara
In-Reply-To: <1318541897.2533.33.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 13 Oct 2011 23:38:17 +0200

> vmxnet3 allocates a page per skb fragment. We must account
> PAGE_SIZE increments on skb->truesize, not the actual frag length.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.
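
As a rough illustration of the accounting change quoted above, here is a
minimal user-space sketch, not the driver code. MODEL_PAGE_SIZE and both
function names are hypothetical; it assumes a 4096-byte page and one page
held per fragment, as the patch description states for vmxnet3.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for illustration; the kernel's PAGE_SIZE depends on the arch. */
#define MODEL_PAGE_SIZE 4096u

/* Underestimate: charge only the bytes actually used in each fragment. */
unsigned int truesize_by_frag_len(const unsigned int *frag_len, size_t nfrags)
{
	unsigned int total = 0;

	for (size_t i = 0; i < nfrags; i++) {
		/* A fragment cannot use more than the page backing it. */
		assert(frag_len[i] <= MODEL_PAGE_SIZE);
		total += frag_len[i];
	}
	return total;
}

/* Charge a full page per fragment, since a whole page is held for each one. */
unsigned int truesize_by_page(size_t nfrags)
{
	return (unsigned int)nfrags * MODEL_PAGE_SIZE;
}
```

With two fragments of 1500 and 200 bytes, the first function charges far
less memory than the two pages that are really pinned, which is the
underestimation the patch title refers to.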

* Re: [PATCH net-next] niu: fix skb truesize underestimation
From: David Miller @ 2011-10-14  2:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1318545567.2533.46.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 14 Oct 2011 00:39:27 +0200

> Add a 'truesize' argument to niu_rx_skb_append(), filled with rcr_size
> by the caller to properly account frag sizes in skb->truesize
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> Please David double check this one as I am not very familiar with NIU
> code. Thanks !

It looks perfect!  And if it's not I'll soon find out :-)

Applied.

* Re: [PATCH net-next] ftmac100: fix skb truesize underestimation
From: David Miller @ 2011-10-14  2:28 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, ratbert
In-Reply-To: <1318540808.2533.27.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 13 Oct 2011 23:20:08 +0200

> ftmac100 allocates a page per skb fragment. We must account
> PAGE_SIZE increments on skb->truesize, not the actual frag length.
> 
> If frame is under 64 bytes, page is freed, so increase truesize only for
> bigger frames.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

* Re: [PATCH v7 0/8] Request for inclusion: tcp memory buffers
From: Valdis.Kletnieks @ 2011-10-14  2:55 UTC (permalink / raw)
  To: Glauber Costa
  Cc: David Miller, linux-kernel, akpm, lizf, kamezawa.hiroyu, ebiederm,
	paul, gthelen, netdev, linux-mm, kirill, avagin, devel
In-Reply-To: <4E9744A6.5010101@parallels.com>

On Fri, 14 Oct 2011 00:05:58 +0400, Glauber Costa said:
> On 10/14/2011 12:00 AM, David Miller wrote:

> > Make this evaluate into exactly the same exact code stream we have
> > now when the memory cgroup feature is not in use, which will be the
> > majority of users.
> 
> What exactly do you mean by "not in use"? Not compiled in or not
> actively being exercised? If you mean the latter, I would appreciate
> tips on how to achieve it.
> 
> Also, I kind of dispute the affirmation that !cgroup will encompass
> the majority of users, since cgroups is being enabled by default by
> most vendors. All systemd based systems use it extensively, for instance.

Yes, systemd requires a kernel that includes cgroups.  However, systemd does
*not* require the memory cgroup feature.  As a practical matter, if your patch
doesn't generate equivalent code for the "have cgroups, but no memory cgroup"
situation, it's a non-starter.

* Re: [PATCH net-next] niu: fix skb truesize underestimation
From: Eric Dumazet @ 2011-10-14  3:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111013.222659.12182837968152363.davem@davemloft.net>

On Thursday, 13 October 2011 at 22:26 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 14 Oct 2011 00:39:27 +0200
> 
> > Add a 'truesize' argument to niu_rx_skb_append(), filled with rcr_size
> > by the caller to properly account frag sizes in skb->truesize
> > 
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> > ---
> > Please David double check this one as I am not very familiar with NIU
> > code. Thanks !
> 
> It looks perfect!  And if it's not I'll soon find out :-)
> 

Thanks !

By the way, I noticed NIU uses a get_page() every time a chunk is
attached to a skb (only the last chunk of a page is given without the
get_page())

So I thought it might incur false sharing if a previous SKB using a
chunk from the same page is processed by another CPU.

But then I see that in niu_rbr_add_page(), right after the
alloc_page(), you also do the thing I was thinking of adding (perform all
needed get_page() calls in a single shot):

atomic_add(rp->rbr_blocks_per_page - 1,
	&compound_head(page)->_count);

So I am a bit lost here. Aren't you doing too many page->_count
increases?

Thanks !

* Re: [PATCH net-next] ftgmac100: fix skb truesize underestimation
From: Po-Yu Chuang @ 2011-10-14  3:50 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, ratbert
In-Reply-To: <20111013.222550.2221506906969662746.davem@davemloft.net>

Dear Eric and David,

On Fri, Oct 14, 2011 at 10:25 AM, David Miller <davem@davemloft.net> wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 13 Oct 2011 23:30:52 +0200
>
>> ftgmac100 allocates a page per skb fragment. We must account
>> PAGE_SIZE increments on skb->truesize, not the actual frag length.
>>
>> If frame is under 64 bytes, page is freed, and truesize adjusted.
>>
>> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>
> Applied.

Thank you.

best regards,
Po-Yu Chuang

* [PATCH net-next] tcp: use TCP_INIT_CWND in tcp_fixup_sndbuf()
From: Eric Dumazet @ 2011-10-14  4:24 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Initial cwnd being 10 (TCP_INIT_CWND) instead of 3, change
tcp_fixup_sndbuf() to get more than 16384 bytes (sysctl_tcp_wmem[1]) in
initial sk_sndbuf

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
I believe a similar change in tcp_fixup_rcvbuf() is needed too.
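
A minimal user-space sketch of the new computation, for illustration
only. MODEL_SKB_OVERHEAD is a hypothetical stand-in for the structure
sizes that SKB_TRUESIZE() adds in the kernel, and the function names and
the header size passed by the caller are assumptions, not kernel values.

```c
#include <assert.h>

#define MODEL_INIT_CWND 10	/* TCP_INIT_CWND, per the description above */

/* Hypothetical per-skb overhead; the real SKB_TRUESIZE() uses struct sizes. */
#define MODEL_SKB_OVERHEAD 512

int model_truesize(int payload)
{
	return payload + MODEL_SKB_OVERHEAD;
}

/* Returns the send buffer size after the fixup; never shrinks sndbuf. */
int model_fixup_sndbuf(int sndbuf, int mss_clamp, int max_header, int wmem_max)
{
	int sndmem = model_truesize(mss_clamp + max_header) * MODEL_INIT_CWND;

	if (sndbuf < sndmem)
		sndbuf = sndmem < wmem_max ? sndmem : wmem_max;
	return sndbuf;
}
```

Under these assumed numbers, a 16384-byte sk_sndbuf with an MSS clamp of
1460 and a 320-byte header allowance grows to 22920 bytes, unless the
upper limit (sysctl_tcp_wmem[2] in the patch) is smaller.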

 net/ipv4/tcp_input.c |    8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index c1653fe..1e848b2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -267,11 +267,9 @@ static void tcp_fixup_sndbuf(struct sock *sk)
 {
 	int sndmem = SKB_TRUESIZE(tcp_sk(sk)->rx_opt.mss_clamp + MAX_TCP_HEADER);
 
-	if (sk->sk_sndbuf < 3 * sndmem) {
-		sk->sk_sndbuf = 3 * sndmem;
-		if (sk->sk_sndbuf > sysctl_tcp_wmem[2])
-			sk->sk_sndbuf = sysctl_tcp_wmem[2];
-	}
+	sndmem *= TCP_INIT_CWND;
+	if (sk->sk_sndbuf < sndmem)
+		sk->sk_sndbuf = min(sndmem, sysctl_tcp_wmem[2]);
 }
 
 /* 2. Tuning advertised window (window_clamp, rcv_ssthresh)

* [PATCH net-next] bnx2x: Disable LRO on FCoE or iSCSI boot device
From: Michael Chan @ 2011-10-14  3:38 UTC (permalink / raw)
  To: davem; +Cc: netdev, dmitry, eilong

From: Dmitry Kravkov <dmitry@broadcom.com>

For an FCoE or iSCSI boot device, the networking side must stay "up" all
the time.  Otherwise, the FCoE/iSCSI interface driven by bnx2i/bnx2fc
will be reset and we'll lose the root file system.

If LRO is enabled, scripts that enable IP forwarding or bridging will
disable LRO and cause the device to be reset.  Disabling LRO on these
boot devices will prevent the reset.

Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 6486ab8..4960048 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -9794,6 +9794,7 @@ static int __devinit bnx2x_init_bp(struct bnx2x *bp)
 	int func;
 	int timer_interval;
 	int rc;
+	u32 cnic_boot_device;
 
 	mutex_init(&bp->port.phy_mutex);
 	mutex_init(&bp->fw_mb_mutex);
@@ -9840,8 +9841,11 @@ static int __devinit bnx2x_init_bp(struct bnx2x *bp)
 
 	bp->multi_mode = multi_mode;
 
+	cnic_boot_device =
+		!!SHMEM_RD(bp, func_mb[BP_FW_MB_IDX(bp)].iscsi_boot_signature);
+
 	/* Set TPA flags */
-	if (disable_tpa) {
+	if (disable_tpa || cnic_boot_device) {
 		bp->flags &= ~TPA_ENABLE_FLAG;
 		bp->dev->features &= ~NETIF_F_LRO;
 	} else {
-- 
1.7.1

* Re: [PATCH net-next] niu: fix skb truesize underestimation
From: David Miller @ 2011-10-14  4:34 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1318563231.2533.55.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 14 Oct 2011 05:33:51 +0200

> But then I see that in niu_rbr_add_page(), right after the
> alloc_page(), you also do the thing I was thinking of adding (perform all
> needed get_page() calls in a single shot):
> 
> atomic_add(rp->rbr_blocks_per_page - 1,
> 	&compound_head(page)->_count);
> 
> So I am a bit lost here. Aren't you doing too many page->_count
> increases?

It would be pretty amazing for a leak of this magnitude to exist for
so long. :-)

A page can be split into multiple blocks, each block is some power
of two in size.

The chip splits up "blocks" into smaller (also power of two)
fragments, and these fragments are what we en-tail to the SKBs.

So at the top level we give the chip blocks.  We try to make this
equal to PAGE_SIZE.  But if PAGE_SIZE is really large we limit the
block size to 1 << 15.  Note that it is only when we enforce this
block size limit that the compound_head(page)->_count atomic increment
will occur.  As long as PAGE_SIZE <= 1 << 15, rbr_blocks_per_page
will be 1.

When the chip takes a block and starts using it, it decides which
fragment size to use for that block.  Once a fragment size has been
chosen for a block, it will not change.

The fragment sizes the chip can use are stored in rp->rbr_sizes[].  We
always configure the chip to use 256 byte and 1024 byte blocks, then
depending upon the MTU and the PAGE_SIZE we'll optionally enable other
sizes such as 2048, 4096, and 8192.

When we get an RX packet the descriptor tells us the DMA address
and the fragment size in use for the block that the memory at
DMA address belongs to.

So the two separate page reference count grabs you see are handling
references for memory being chopped up at two different levels.

I can't see how we could optimize the intra-block refcounts any
further.  Part of the problem is that we don't know a priori what
fragment size the chip will use for a given block.
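
The block-level arithmetic described above can be sketched in user space
as follows. This is a model of the explanation only, not the niu driver
code; the function names are hypothetical, and the page size is passed in
as a parameter because PAGE_SIZE depends on the architecture.

```c
#include <assert.h>

#define MODEL_MAX_BLOCK_SHIFT 15	/* block size is capped at 1 << 15 */

/* Block size handed to the chip: PAGE_SIZE, unless that exceeds the cap. */
unsigned long model_block_size(unsigned long page_size)
{
	unsigned long cap = 1UL << MODEL_MAX_BLOCK_SHIFT;

	return page_size < cap ? page_size : cap;
}

/* Number of blocks one page is split into (1 unless the cap applies). */
unsigned long model_blocks_per_page(unsigned long page_size)
{
	assert(page_size != 0);
	return page_size / model_block_size(page_size);
}

/*
 * Extra page references to take up front. The allocation itself already
 * holds one reference, so only blocks_per_page - 1 more are needed.
 */
unsigned long model_extra_page_refs(unsigned long page_size)
{
	return model_blocks_per_page(page_size) - 1;
}
```

For a 4096-byte page this yields one block and no extra references, so
the bulk atomic_add() adds nothing. For a 65536-byte page it yields two
blocks and one extra reference.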

* [Bug] skb truesize does not update properly in many place
From: roy.qing.li @ 2011-10-14  5:33 UTC (permalink / raw)
  To: eric.dumazet, netdev; +Cc: roy.qing.li

Hi Eric Dumazet:

I see you are correcting the wrong skb truesize, I found the
same problem exists in many forms and many places, like:

After calling skb_fill_page_desc(), what should be updated 
to truesize.

pskb_expand_head(), __pskb_pull_tail()... do not update the
truesize.

...

-RongQing.Li

* Re: [Bug] skb truesize does not update properly in many place
From: Eric Dumazet @ 2011-10-14  5:48 UTC (permalink / raw)
  To: roy.qing.li; +Cc: netdev
In-Reply-To: <1318570381-4731-1-git-send-email-roy.qing.li@gmail.com>

On Friday, 14 October 2011 at 13:33 +0800, roy.qing.li@gmail.com
wrote:
> Hi Eric Dumazet:
> 
> I see you are correcting the wrong skb truesize, I found the
> same problem exists in many forms and many places, like:
> 
> After calling skb_fill_page_desc(), what should be updated 
> to truesize.
> 
> pskb_expand_head(), __pskb_pull_tail()... do not update the
> truesize.

All this is scheduled, but any help is appreciated.

I gave the general idea and patched some drivers, I hope other devs will
follow me in this work.

I am now focusing on the TCP pruning effect, which we can see with WIFI
drivers (and also with drivers using a full PAGE to store a 1500 byte
tcp frame), when a single packet loss happens on a session with a
large RTT.

All this truesize saga started because I was shocked by the following
"netstat -s" extract on my laptop after a few minutes of Internet stuff.

    848 packets collapsed in receive queue due to low socket buffer

* Re: [Bug] skb truesize does not update properly in many place
From: Eric Dumazet @ 2011-10-14  5:51 UTC (permalink / raw)
  To: roy.qing.li; +Cc: netdev
In-Reply-To: <1318571313.2533.77.camel@edumazet-laptop>

On Friday, 14 October 2011 at 07:48 +0200, Eric Dumazet wrote:

> I am now focusing on the TCP pruning effect, which we can see with WIFI
> drivers (and also with drivers using a full PAGE to store a 1500 byte
> tcp frame), when a single packet loss happens on a session with a
> large RTT.
> 

Here is what I am currently testing :

When one SKB has to be queued in out_of_order_queue,
possibly for a long time, try to reduce its truesize (by using
skb_copy_expand()) if skb->truesize is larger than
2*SKB_TRUESIZE(skb->len)

It seems to work very well, and should not happen in fast path.

Stay tuned
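
The threshold test described above can be modelled in user space like
this. It is a sketch of the condition only, not the patch being tested:
OOO_SKB_OVERHEAD is a hypothetical stand-in for the structure sizes in
SKB_TRUESIZE(), and both function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-skb overhead; the real SKB_TRUESIZE() uses struct sizes. */
#define OOO_SKB_OVERHEAD 512u

unsigned int ooo_skb_truesize(unsigned int len)
{
	return len + OOO_SKB_OVERHEAD;
}

/* True when the skb charges more than twice what a tight copy would need. */
bool ooo_should_shrink(unsigned int truesize, unsigned int len)
{
	return truesize > 2u * ooo_skb_truesize(len);
}
```

Under this assumed overhead, a 1500-byte frame stored in a full 4096-byte
page would be copied, while the same frame in a tightly sized buffer
would be queued as it is.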

* Re: [PATCH 1/4] ipv4: Fix pmtu propagating
From: Steffen Klassert @ 2011-10-14  5:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111013.135808.636341622629050648.davem@davemloft.net>

On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Thu, 13 Oct 2011 12:09:50 +0200
> 
> > At least it seems that raw_sendmsg() and ping_sendmsg() don't use
> > a cached route, they do the route lookup in any case. I don't see
> > where we check if we learned a new pmtu in this cases. 
> 
> A freshly looked up route should not have ->obsolete set.
> 
> That's why we don't do dst_check() in that part of the ip_output.c
> helper code you're modifying.
> 
> Please find out exactly why dst->obsolete is non-zero on a freshly
> looked up route.  It's unexpected.

Hm, on a slow path route lookup e.g. __mkroute_output() calls
rt_dst_alloc() which initializes dst->obsolete to -1. It seems
that ___dst_free() is the only function that ever changes the
initial obsolete value. After calling ___dst_free() dst->obsolete
is 2.

Btw. on a slow path route lookup, __mkroute_output() and friends
initialize the pmtu information via rt_set_nexthop(). How do we
check if this information is still valid if we get the route
via the routing hash cache? Do we need to check in this case?

The raw protocol uses ip4_datagram_connect() as its connect function.
ip4_datagram_connect() uses sk_dst_set() to cache the dst_entry on
the socket, so why don't we use this cached dst_entry in raw_sendmsg()
in the connected case?

* [PATCH] r8169: fix wrong eee setting for rlt8111evl
From: Hayes Wang @ 2011-10-14  6:14 UTC (permalink / raw)
  To: romieu; +Cc: netdev, linux-kernel, Hayes Wang

Correct the wrong parameter for setting EEE for RTL8111E-VL.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/r8169.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index c236670..27f7ebc 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2859,7 +2859,7 @@ static void rtl8168e_2_hw_phy_config(struct rtl8169_private *tp)
 	rtl_writephy(tp, 0x1f, 0x0004);
 	rtl_writephy(tp, 0x1f, 0x0007);
 	rtl_writephy(tp, 0x1e, 0x0020);
-	rtl_w1w0_phy(tp, 0x06, 0x0000, 0x0100);
+	rtl_w1w0_phy(tp, 0x15, 0x0000, 0x0100);
 	rtl_writephy(tp, 0x1f, 0x0002);
 	rtl_writephy(tp, 0x1f, 0x0000);
 	rtl_writephy(tp, 0x0d, 0x0007);
-- 
1.7.6.2

* [net-next 0/6][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2011-10-14  6:21 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

The following series contains updates to e1000e, if_link, ixgbe, igbvf
and igb.  This version of the series contains the following changes:

- e1000e: not sure what happened with Tuesday's pull, which included
  this fix, so re-posting it
- igb fix for timecompare_update and enable L4 timestamping
- igbvf final conversion to ndo_fix_features
- if_link/ixgbe add spoof checking feature

The following are changes since commit 7ae60b3f3b297b7f04025c93f1cb2275c3a1dfcd:
  sky2: fix skb truesize underestimation
and are available in the git repository at
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next.git

Bruce Allan (1):
  e1000e: locking bug introduced by commit 67fd4fcb

Greg Rose (2):
  if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
  ixgbe: Add new netdev op to turn spoof checking on or off per VF

Jacob Keller (2):
  igb: enable l4 timestamping for v2 event packets
  igb: fix timecompare_upate race condition

Michał Mirosław (1):
  igbvf: convert to ndo_fix_features

 drivers/net/ethernet/intel/e1000e/e1000.h      |    1 +
 drivers/net/ethernet/intel/e1000e/ich8lan.c    |   21 +++++---
 drivers/net/ethernet/intel/igb/igb_main.c      |   11 ++++-
 drivers/net/ethernet/intel/igbvf/ethtool.c     |   57 ------------------------
 drivers/net/ethernet/intel/igbvf/netdev.c      |   25 ++++++++--
 drivers/net/ethernet/intel/ixgbe/ixgbe.h       |    3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |   10 +++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |   48 +++++++++++++++++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h |    1 +
 include/linux/if_link.h                        |   10 ++++
 include/linux/netdevice.h                      |    3 +
 net/core/rtnetlink.c                           |   33 ++++++++++++-
 12 files changed, 139 insertions(+), 84 deletions(-)

-- 
1.7.6.4

* [net-next 2/6] if_link: Add additional parameter to IFLA_VF_INFO for spoof checking
From: Jeff Kirsher @ 2011-10-14  6:21 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1318573288-18286-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Greg Rose <gregory.v.rose@intel.com>

Add configuration setting for drivers to turn spoof checking on or off
for discrete VFs.

v2 - Fix indentation problem, wrap the ifla_vf_info structure in
     #ifdef __KERNEL__ to prevent user space from accessing and
     change function paramater for the spoof check setting netdev
     op from u8 to bool.
v3 - Preset spoof check setting to -1 so that user space tools such
     as ip can detect that the driver didn't report a spoofcheck
     setting.  Prevents incorrect display of spoof check settings
     for drivers that don't report it.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 include/linux/if_link.h   |   10 ++++++++++
 include/linux/netdevice.h |    3 +++
 net/core/rtnetlink.c      |   33 ++++++++++++++++++++++++++++++---
 3 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 0ee969a..c52d4b5 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -279,6 +279,7 @@ enum {
 	IFLA_VF_MAC,		/* Hardware queue specific attributes */
 	IFLA_VF_VLAN,
 	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
+	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
 	__IFLA_VF_MAX,
 };
 
@@ -300,13 +301,22 @@ struct ifla_vf_tx_rate {
 	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */
 };
 
+struct ifla_vf_spoofchk {
+	__u32 vf;
+	__u32 setting;
+};
+#ifdef __KERNEL__
+
+/* We don't want this structure exposed to user space */
 struct ifla_vf_info {
 	__u32 vf;
 	__u8 mac[32];
 	__u32 vlan;
 	__u32 qos;
 	__u32 tx_rate;
+	__u32 spoofchk;
 };
+#endif
 
 /* VF ports management section
  *
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 43b3298..0db1f5f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -781,6 +781,7 @@ struct netdev_tc_txq {
  * int (*ndo_set_vf_mac)(struct net_device *dev, int vf, u8* mac);
  * int (*ndo_set_vf_vlan)(struct net_device *dev, int vf, u16 vlan, u8 qos);
  * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
+ * int (*ndo_set_vf_spoofchk)(struct net_device *dev, int vf, bool setting);
  * int (*ndo_get_vf_config)(struct net_device *dev,
  *			    int vf, struct ifla_vf_info *ivf);
  * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
@@ -900,6 +901,8 @@ struct net_device_ops {
 						   int queue, u16 vlan, u8 qos);
 	int			(*ndo_set_vf_tx_rate)(struct net_device *dev,
 						      int vf, int rate);
+	int			(*ndo_set_vf_spoofchk)(struct net_device *dev,
+						       int vf, bool setting);
 	int			(*ndo_get_vf_config)(struct net_device *dev,
 						     int vf,
 						     struct ifla_vf_info *ivf);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 39f8dd6..9083e82 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -731,7 +731,8 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev)
 		size += num_vfs *
 			(nla_total_size(sizeof(struct ifla_vf_mac)) +
 			 nla_total_size(sizeof(struct ifla_vf_vlan)) +
-			 nla_total_size(sizeof(struct ifla_vf_tx_rate)));
+			 nla_total_size(sizeof(struct ifla_vf_tx_rate)) +
+			 nla_total_size(sizeof(struct ifla_vf_spoofchk)));
 		return size;
 	} else
 		return 0;
@@ -954,13 +955,27 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			struct ifla_vf_mac vf_mac;
 			struct ifla_vf_vlan vf_vlan;
 			struct ifla_vf_tx_rate vf_tx_rate;
+			struct ifla_vf_spoofchk vf_spoofchk;
+
+			/*
+			 * Not all SR-IOV capable drivers support the
+			 * spoofcheck query.  Preset to -1 so the user
+			 * space tool can detect that the driver didn't
+			 * report anything.
+			 */
+			ivi.spoofchk = -1;
 			if (dev->netdev_ops->ndo_get_vf_config(dev, i, &ivi))
 				break;
-			vf_mac.vf = vf_vlan.vf = vf_tx_rate.vf = ivi.vf;
+			vf_mac.vf =
+				vf_vlan.vf =
+				vf_tx_rate.vf =
+				vf_spoofchk.vf = ivi.vf;
+
 			memcpy(vf_mac.mac, ivi.mac, sizeof(ivi.mac));
 			vf_vlan.vlan = ivi.vlan;
 			vf_vlan.qos = ivi.qos;
 			vf_tx_rate.rate = ivi.tx_rate;
+			vf_spoofchk.setting = ivi.spoofchk;
 			vf = nla_nest_start(skb, IFLA_VF_INFO);
 			if (!vf) {
 				nla_nest_cancel(skb, vfinfo);
@@ -968,7 +983,10 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			}
 			NLA_PUT(skb, IFLA_VF_MAC, sizeof(vf_mac), &vf_mac);
 			NLA_PUT(skb, IFLA_VF_VLAN, sizeof(vf_vlan), &vf_vlan);
-			NLA_PUT(skb, IFLA_VF_TX_RATE, sizeof(vf_tx_rate), &vf_tx_rate);
+			NLA_PUT(skb, IFLA_VF_TX_RATE, sizeof(vf_tx_rate),
+				&vf_tx_rate);
+			NLA_PUT(skb, IFLA_VF_SPOOFCHK, sizeof(vf_spoofchk),
+				&vf_spoofchk);
 			nla_nest_end(skb, vf);
 		}
 		nla_nest_end(skb, vfinfo);
@@ -1202,6 +1220,15 @@ static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
 							      ivt->rate);
 			break;
 		}
+		case IFLA_VF_SPOOFCHK: {
+			struct ifla_vf_spoofchk *ivs;
+			ivs = nla_data(vf);
+			err = -EOPNOTSUPP;
+			if (ops->ndo_set_vf_spoofchk)
+				err = ops->ndo_set_vf_spoofchk(dev, ivs->vf,
+							       ivs->setting);
+			break;
+		}
 		default:
 			err = -EINVAL;
 			break;
-- 
1.7.6.4

* [net-next 1/6] e1000e: locking bug introduced by commit 67fd4fcb
From: Jeff Kirsher @ 2011-10-14  6:21 UTC (permalink / raw)
  To: davem; +Cc: Bruce Allan, netdev, gospo, sassmann
In-Reply-To: <1318573288-18286-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Bruce Allan <bruce.w.allan@intel.com>

Commit 67fd4fcb (e1000e: convert to stats64) added the ability to update
statistics more accurately and on-demand through the net_device_ops
.ndo_get_stats64 hook, but introduced a locking bug on 82577/8/9 when
linked at half-duplex (seen on kernels with CONFIG_DEBUG_ATOMIC_SLEEP=y and
CONFIG_PROVE_LOCKING=y).  The commit introduced code paths that caused a
mutex to be locked in atomic contexts, e.g. an rcu_read_lock is held when
irqbalance reads the stats from /sys/class/net/ethX/statistics causing the
mutex to be locked to read the Phy half-duplex statistics registers.

The mutex was originally introduced to prevent concurrent accesses of
resources (the NVM and Phy) shared by the driver, firmware and hardware
a few years back when there was an issue with the NVM getting corrupted.
It was later split into two mutexes - one for the NVM and one for the Phy
when it was determined the NVM, unlike the Phy, should not be protected by
the software/firmware/hardware semaphore (arbitration of which is done in
part with the SWFLAG bit in the EXTCNF_CTRL register).  This latter
semaphore should be sufficient to prevent resource contention of the Phy in
the driver (i.e. the mutex for Phy accesses is not needed), but to be sure
the mutex is replaced with an atomic bit flag which will warn if any
contention is possible.

Also add additional debug output to help determine when the sw/fw/hw
semaphore is owned by the firmware or hardware.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Reported-by: Francois Romieu <romieu@fr.zoreil.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
---
 drivers/net/ethernet/intel/e1000e/e1000.h   |    1 +
 drivers/net/ethernet/intel/e1000e/ich8lan.c |   21 +++++++++++++--------
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h
index 7877b9c..9fe18d1 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -469,6 +469,7 @@ struct e1000_info {
 enum e1000_state_t {
 	__E1000_TESTING,
 	__E1000_RESETTING,
+	__E1000_ACCESS_SHARED_RESOURCE,
 	__E1000_DOWN
 };
 
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index 4f70974..6a17c62 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -852,8 +852,6 @@ static void e1000_release_nvm_ich8lan(struct e1000_hw *hw)
 	mutex_unlock(&nvm_mutex);
 }
 
-static DEFINE_MUTEX(swflag_mutex);
-
 /**
  *  e1000_acquire_swflag_ich8lan - Acquire software control flag
  *  @hw: pointer to the HW structure
@@ -866,7 +864,12 @@ static s32 e1000_acquire_swflag_ich8lan(struct e1000_hw *hw)
 	u32 extcnf_ctrl, timeout = PHY_CFG_TIMEOUT;
 	s32 ret_val = 0;
 
-	mutex_lock(&swflag_mutex);
+	if (test_and_set_bit(__E1000_ACCESS_SHARED_RESOURCE,
+			     &hw->adapter->state)) {
+		WARN(1, "e1000e: %s: contention for Phy access\n",
+		     hw->adapter->netdev->name);
+		return -E1000_ERR_PHY;
+	}
 
 	while (timeout) {
 		extcnf_ctrl = er32(EXTCNF_CTRL);
@@ -878,7 +881,7 @@ static s32 e1000_acquire_swflag_ich8lan(struct e1000_hw *hw)
 	}
 
 	if (!timeout) {
-		e_dbg("SW/FW/HW has locked the resource for too long.\n");
+		e_dbg("SW has already locked the resource.\n");
 		ret_val = -E1000_ERR_CONFIG;
 		goto out;
 	}
@@ -898,7 +901,9 @@ static s32 e1000_acquire_swflag_ich8lan(struct e1000_hw *hw)
 	}
 
 	if (!timeout) {
-		e_dbg("Failed to acquire the semaphore.\n");
+		e_dbg("Failed to acquire the semaphore, FW or HW has it: "
+		      "FWSM=0x%8.8x EXTCNF_CTRL=0x%8.8x)\n",
+		      er32(FWSM), extcnf_ctrl);
 		extcnf_ctrl &= ~E1000_EXTCNF_CTRL_SWFLAG;
 		ew32(EXTCNF_CTRL, extcnf_ctrl);
 		ret_val = -E1000_ERR_CONFIG;
@@ -907,7 +912,7 @@ static s32 e1000_acquire_swflag_ich8lan(struct e1000_hw *hw)
 
 out:
 	if (ret_val)
-		mutex_unlock(&swflag_mutex);
+		clear_bit(__E1000_ACCESS_SHARED_RESOURCE, &hw->adapter->state);
 
 	return ret_val;
 }
@@ -932,7 +937,7 @@ static void e1000_release_swflag_ich8lan(struct e1000_hw *hw)
 		e_dbg("Semaphore unexpectedly released by sw/fw/hw\n");
 	}
 
-	mutex_unlock(&swflag_mutex);
+	clear_bit(__E1000_ACCESS_SHARED_RESOURCE, &hw->adapter->state);
 }
 
 /**
@@ -3139,7 +3144,7 @@ static s32 e1000_reset_hw_ich8lan(struct e1000_hw *hw)
 	msleep(20);
 
 	if (!ret_val)
-		mutex_unlock(&swflag_mutex);
+		clear_bit(__E1000_ACCESS_SHARED_RESOURCE, &hw->adapter->state);
 
 	if (ctrl & E1000_CTRL_PHY_RST) {
 		ret_val = hw->phy.ops.get_cfg_done(hw);
-- 
1.7.6.4
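
The commit message above describes replacing the Phy mutex with an atomic
bit flag that warns on contention instead of sleeping. A minimal
user-space sketch of that pattern follows, using C11 atomic_flag rather
than the kernel's test_and_set_bit(). The function names, the -1 return
value and the message text are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* One flag guarding the shared resource; clear means "free". */
static atomic_flag phy_busy = ATOMIC_FLAG_INIT;

/* Try to take ownership without blocking; warn and fail if already held. */
int model_acquire_phy(void)
{
	if (atomic_flag_test_and_set(&phy_busy)) {
		fprintf(stderr, "contention for Phy access\n");
		return -1;
	}
	return 0;
}

/* Drop ownership so that the next caller can acquire the resource. */
void model_release_phy(void)
{
	atomic_flag_clear(&phy_busy);
}
```

Because the acquire path never sleeps, it is safe to call from a context
where blocking is not allowed, at the cost of returning an error to the
caller when two users collide.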

* [net-next 3/6] ixgbe: Add new netdev op to turn spoof checking on or off per VF
From: Jeff Kirsher @ 2011-10-14  6:21 UTC (permalink / raw)
  To: davem; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1318573288-18286-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Greg Rose <gregory.v.rose@intel.com>

Implements the new netdev op to allow user configuration of spoof
checking on a per VF basis.

V2 - Change netdev spoof check op setting to bool

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h       |    3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c  |   10 ++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |   48 ++++++++++++++++++++---
 drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h |    1 +
 4 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index c1f76aa..6c4d693 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -130,6 +130,8 @@ struct vf_data_storage {
 	u16 pf_vlan; /* When set, guest VLAN config not allowed. */
 	u16 pf_qos;
 	u16 tx_rate;
+	u16 vlan_count;
+	u8 spoofchk_enabled;
 	struct pci_dev *vfdev;
 };
 
@@ -509,7 +511,6 @@ struct ixgbe_adapter {
 	int vf_rate_link_speed;
 	struct vf_macvlans vf_mvs;
 	struct vf_macvlans *mv_list;
-	bool antispoofing_enabled;
 
 	struct hlist_head fdir_filter_list;
 	union ixgbe_atr_input fdir_mask;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index f740a8e..fb7d884 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2816,6 +2816,7 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 	u32 vt_reg_bits;
 	u32 reg_offset, vf_shift;
 	u32 vmdctl;
+	int i;
 
 	if (!(adapter->flags & IXGBE_FLAG_SRIOV_ENABLED))
 		return;
@@ -2851,9 +2852,13 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
 	IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
 	/* Enable MAC Anti-Spoofing */
 	hw->mac.ops.set_mac_anti_spoofing(hw,
-					  (adapter->antispoofing_enabled =
-					   (adapter->num_vfs != 0)),
+					   (adapter->num_vfs != 0),
 					  adapter->num_vfs);
+	/* For VFs that have spoof checking turned off */
+	for (i = 0; i < adapter->num_vfs; i++) {
+		if (!adapter->vfinfo[i].spoofchk_enabled)
+			ixgbe_ndo_set_vf_spoofchk(adapter->netdev, i, false);
+	}
 }
 
 static void ixgbe_set_rx_buffer_len(struct ixgbe_adapter *adapter)
@@ -7277,6 +7282,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_set_vf_mac		= ixgbe_ndo_set_vf_mac,
 	.ndo_set_vf_vlan	= ixgbe_ndo_set_vf_vlan,
 	.ndo_set_vf_tx_rate	= ixgbe_ndo_set_vf_bw,
+	.ndo_set_vf_spoofchk    = ixgbe_ndo_set_vf_spoofchk,
 	.ndo_get_vf_config	= ixgbe_ndo_get_vf_config,
 	.ndo_get_stats64	= ixgbe_get_stats64,
 	.ndo_setup_tc		= ixgbe_setup_tc,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
index 468ddd0..db95731 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c
@@ -151,6 +151,8 @@ void ixgbe_enable_sriov(struct ixgbe_adapter *adapter,
 		/* Disable RSC when in SR-IOV mode */
 		adapter->flags2 &= ~(IXGBE_FLAG2_RSC_CAPABLE |
 				     IXGBE_FLAG2_RSC_ENABLED);
+		for (i = 0; i < adapter->num_vfs; i++)
+			adapter->vfinfo[i].spoofchk_enabled = true;
 		return;
 	}
 
@@ -620,7 +622,13 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 			       vf);
 			retval = -1;
 		} else {
+			if (add)
+				adapter->vfinfo[vf].vlan_count++;
+			else if (adapter->vfinfo[vf].vlan_count)
+				adapter->vfinfo[vf].vlan_count--;
 			retval = ixgbe_set_vf_vlan(adapter, add, vid, vf);
+			if (!retval && adapter->vfinfo[vf].spoofchk_enabled)
+				hw->mac.ops.set_vlan_anti_spoofing(hw, true, vf);
 		}
 		break;
 	case IXGBE_VF_SET_MACVLAN:
@@ -632,12 +640,8 @@ static int ixgbe_rcv_msg_from_vf(struct ixgbe_adapter *adapter, u32 vf)
 		 * greater than 0 will indicate the VF is setting a
 		 * macvlan MAC filter.
 		 */
-		if (index > 0 && adapter->antispoofing_enabled) {
-			hw->mac.ops.set_mac_anti_spoofing(hw, false,
-							  adapter->num_vfs);
-			hw->mac.ops.set_vlan_anti_spoofing(hw, false, vf);
-			adapter->antispoofing_enabled = false;
-		}
+		if (index > 0 && adapter->vfinfo[vf].spoofchk_enabled)
+			ixgbe_ndo_set_vf_spoofchk(adapter->netdev, vf, false);
 		retval = ixgbe_set_vf_macvlan(adapter, vf, index,
 					      (unsigned char *)(&msgbuf[1]));
 		break;
@@ -748,8 +752,9 @@ int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan, u8 qos)
 			goto out;
 		ixgbe_set_vmvir(adapter, vlan | (qos << VLAN_PRIO_SHIFT), vf);
 		ixgbe_set_vmolr(hw, vf, false);
-		if (adapter->antispoofing_enabled)
+		if (adapter->vfinfo[vf].spoofchk_enabled)
 			hw->mac.ops.set_vlan_anti_spoofing(hw, true, vf);
+		adapter->vfinfo[vf].vlan_count++;
 		adapter->vfinfo[vf].pf_vlan = vlan;
 		adapter->vfinfo[vf].pf_qos = qos;
 		dev_info(&adapter->pdev->dev,
@@ -768,6 +773,8 @@ int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int vf, u16 vlan, u8 qos)
 		ixgbe_set_vmvir(adapter, vlan, vf);
 		ixgbe_set_vmolr(hw, vf, true);
 		hw->mac.ops.set_vlan_anti_spoofing(hw, false, vf);
+		if (adapter->vfinfo[vf].vlan_count)
+			adapter->vfinfo[vf].vlan_count--;
 		adapter->vfinfo[vf].pf_vlan = 0;
 		adapter->vfinfo[vf].pf_qos = 0;
        }
@@ -877,6 +884,32 @@ int ixgbe_ndo_set_vf_bw(struct net_device *netdev, int vf, int tx_rate)
 	return 0;
 }
 
+int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(netdev);
+	int vf_target_reg = vf >> 3;
+	int vf_target_shift = vf % 8;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u32 regval;
+
+	adapter->vfinfo[vf].spoofchk_enabled = setting;
+
+	regval = IXGBE_READ_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg));
+	regval &= ~(1 << vf_target_shift);
+	regval |= (setting << vf_target_shift);
+	IXGBE_WRITE_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg), regval);
+
+	if (adapter->vfinfo[vf].vlan_count) {
+		vf_target_shift += IXGBE_SPOOF_VLANAS_SHIFT;
+		regval = IXGBE_READ_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg));
+		regval &= ~(1 << vf_target_shift);
+		regval |= (setting << vf_target_shift);
+		IXGBE_WRITE_REG(hw, IXGBE_PFVFSPOOF(vf_target_reg), regval);
+	}
+
+	return 0;
+}
+
 int ixgbe_ndo_get_vf_config(struct net_device *netdev,
 			    int vf, struct ifla_vf_info *ivi)
 {
@@ -888,5 +921,6 @@ int ixgbe_ndo_get_vf_config(struct net_device *netdev,
 	ivi->tx_rate = adapter->vfinfo[vf].tx_rate;
 	ivi->vlan = adapter->vfinfo[vf].pf_vlan;
 	ivi->qos = adapter->vfinfo[vf].pf_qos;
+	ivi->spoofchk = adapter->vfinfo[vf].spoofchk_enabled;
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
index 2781847..5a7e1eb 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.h
@@ -38,6 +38,7 @@ int ixgbe_ndo_set_vf_mac(struct net_device *netdev, int queue, u8 *mac);
 int ixgbe_ndo_set_vf_vlan(struct net_device *netdev, int queue, u16 vlan,
 			   u8 qos);
 int ixgbe_ndo_set_vf_bw(struct net_device *netdev, int vf, int tx_rate);
+int ixgbe_ndo_set_vf_spoofchk(struct net_device *netdev, int vf, bool setting);
 int ixgbe_ndo_get_vf_config(struct net_device *netdev,
 			    int vf, struct ifla_vf_info *ivi);
 void ixgbe_check_vf_rate_limit(struct ixgbe_adapter *adapter);
-- 
1.7.6.4
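
The per-VF register arithmetic in ixgbe_ndo_set_vf_spoofchk() can be tried out in userspace with a rough standalone model like the one below. It is only a sketch, not driver code: the pfvfspoof array stands in for the PFVFSPOOF register bank, NUM_SPOOF_REGS and the model_* names are hypothetical, and the value 8 for the VLAN anti-spoof shift is an assumption about IXGBE_SPOOF_VLANAS_SHIFT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: VLAN anti-spoof bits sit 8 bits above the MAC anti-spoof bits. */
#define SPOOF_VLANAS_SHIFT 8
/* Hypothetical bank size: 8 VFs per register, 64 VFs in total. */
#define NUM_SPOOF_REGS 8

/* Stand-in for the PFVFSPOOF registers; no real hardware access happens. */
static uint32_t pfvfspoof[NUM_SPOOF_REGS];

/* Set or clear the MAC (and, if the VF has VLANs, the VLAN) anti-spoof bit. */
static void model_set_vf_spoofchk(int vf, bool setting, bool has_vlans)
{
	int reg = vf >> 3;	/* which register holds this VF */
	int shift = vf % 8;	/* bit position inside that register */

	assert(vf >= 0 && reg < NUM_SPOOF_REGS);

	/* read-modify-write of the MAC anti-spoof bit */
	pfvfspoof[reg] &= ~(UINT32_C(1) << shift);
	pfvfspoof[reg] |= (uint32_t)setting << shift;

	if (has_vlans) {
		/* same bit, shifted up into the VLAN anti-spoof field */
		shift += SPOOF_VLANAS_SHIFT;
		pfvfspoof[reg] &= ~(UINT32_C(1) << shift);
		pfvfspoof[reg] |= (uint32_t)setting << shift;
	}
}
```

For example, VF 10 lands in register 1 at bit 2, and at bit 10 for the VLAN check under the assumed shift.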

^ permalink raw reply related

* [net-next 5/6] igb: fix timecompare_update race condition
From: Jeff Kirsher @ 2011-10-14  6:21 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1318573288-18286-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

This patch closes a possible race condition when timestamping using
the timecompare_update function as a method to detect clock skew of
the internal cycle counter. Because timecompare_update usually allows
skew detection no more than once a second, if ptpd or other software
performs a clock offset (for example, using the "date" command), there
is a small window of time where the clock skew will not match the
current kernel wall time. This patch forces the timecompare_update to
calculate skew every time we timestamp a packet, which removes the
possibility of this race condition.

Signed-off-by: Jacob E Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index c10cc71..8f3296d 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5581,7 +5581,15 @@ static void igb_systim_to_hwtstamp(struct igb_adapter *adapter,
 		regval <<= IGB_82580_TSYNC_SHIFT;
 
 	ns = timecounter_cyc2time(&adapter->clock, regval);
-	timecompare_update(&adapter->compare, ns);
+
+	/*
+	 * force timecompare_update to calculate the skew (even if
+	 * less than one second has passed since the last update) in
+	 * order to prevent the possibility that an offset has been
+	 * applied to the wall time. this ensures valid timestamps are
+	 * passed to the network stack.
+	 */
+	timecompare_update(&adapter->compare, 0);
 	memset(shhwtstamps, 0, sizeof(struct skb_shared_hwtstamps));
 	shhwtstamps->hwtstamp = ns_to_ktime(ns);
 	shhwtstamps->syststamp = timecompare_transform(&adapter->compare, ns);
-- 
1.7.6.4

^ permalink raw reply related

* [net-next 4/6] igb: enable l4 timestamping for v2 event packets
From: Jeff Kirsher @ 2011-10-14  6:21 UTC (permalink / raw)
  To: davem; +Cc: Jacob Keller, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1318573288-18286-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

When enabling hardware timestamping for PTP v2 event packets, the
software does not set up the queue for L4 packets, although layer 4
packets are valid for v2. This patch adds the flag which enables
setting up a queue and enabling UDP packet timestamping.

Signed-off-by: Jacob E Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 06109af..c10cc71 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -6268,6 +6268,7 @@ static int igb_hwtstamp_ioctl(struct net_device *netdev,
 		tsync_rx_ctl |= E1000_TSYNCRXCTL_TYPE_EVENT_V2;
 		config.rx_filter = HWTSTAMP_FILTER_PTP_V2_EVENT;
 		is_l2 = true;
+		is_l4 = true;
 		break;
 	default:
 		return -ERANGE;
-- 
1.7.6.4

^ permalink raw reply related

* [net-next 6/6] igbvf: convert to ndo_fix_features
From: Jeff Kirsher @ 2011-10-14  6:21 UTC (permalink / raw)
  To: davem; +Cc: Michał Mirosław, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1318573288-18286-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>

Private rx_csum flags are now a duplicate of netdev->features & NETIF_F_RXCSUM.
Removing them needs deeper surgery.

Things noticed:
 - HW VLAN acceleration probably can be toggled, but it's left as is
 - the resets on RX csum offload change can probably be avoided
 - there is A LOT of copy-and-pasted code here

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igbvf/ethtool.c |   57 ----------------------------
 drivers/net/ethernet/intel/igbvf/netdev.c  |   25 ++++++++++--
 2 files changed, 20 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/intel/igbvf/ethtool.c b/drivers/net/ethernet/intel/igbvf/ethtool.c
index 0ee8b68..2c25858 100644
--- a/drivers/net/ethernet/intel/igbvf/ethtool.c
+++ b/drivers/net/ethernet/intel/igbvf/ethtool.c
@@ -128,55 +128,6 @@ static int igbvf_set_pauseparam(struct net_device *netdev,
 	return -EOPNOTSUPP;
 }
 
-static u32 igbvf_get_rx_csum(struct net_device *netdev)
-{
-	struct igbvf_adapter *adapter = netdev_priv(netdev);
-	return !(adapter->flags & IGBVF_FLAG_RX_CSUM_DISABLED);
-}
-
-static int igbvf_set_rx_csum(struct net_device *netdev, u32 data)
-{
-	struct igbvf_adapter *adapter = netdev_priv(netdev);
-
-	if (data)
-		adapter->flags &= ~IGBVF_FLAG_RX_CSUM_DISABLED;
-	else
-		adapter->flags |= IGBVF_FLAG_RX_CSUM_DISABLED;
-
-	return 0;
-}
-
-static u32 igbvf_get_tx_csum(struct net_device *netdev)
-{
-	return (netdev->features & NETIF_F_IP_CSUM) != 0;
-}
-
-static int igbvf_set_tx_csum(struct net_device *netdev, u32 data)
-{
-	if (data)
-		netdev->features |= (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM);
-	else
-		netdev->features &= ~(NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM);
-	return 0;
-}
-
-static int igbvf_set_tso(struct net_device *netdev, u32 data)
-{
-	struct igbvf_adapter *adapter = netdev_priv(netdev);
-
-	if (data) {
-		netdev->features |= NETIF_F_TSO;
-		netdev->features |= NETIF_F_TSO6;
-	} else {
-		netdev->features &= ~NETIF_F_TSO;
-		netdev->features &= ~NETIF_F_TSO6;
-	}
-
-	dev_info(&adapter->pdev->dev, "TSO is %s\n",
-	         data ? "Enabled" : "Disabled");
-	return 0;
-}
-
 static u32 igbvf_get_msglevel(struct net_device *netdev)
 {
 	struct igbvf_adapter *adapter = netdev_priv(netdev);
@@ -507,14 +458,6 @@ static const struct ethtool_ops igbvf_ethtool_ops = {
 	.set_ringparam		= igbvf_set_ringparam,
 	.get_pauseparam		= igbvf_get_pauseparam,
 	.set_pauseparam		= igbvf_set_pauseparam,
-	.get_rx_csum            = igbvf_get_rx_csum,
-	.set_rx_csum            = igbvf_set_rx_csum,
-	.get_tx_csum		= igbvf_get_tx_csum,
-	.set_tx_csum		= igbvf_set_tx_csum,
-	.get_sg			= ethtool_op_get_sg,
-	.set_sg			= ethtool_op_set_sg,
-	.get_tso		= ethtool_op_get_tso,
-	.set_tso		= igbvf_set_tso,
 	.self_test		= igbvf_diag_test,
 	.get_sset_count		= igbvf_get_sset_count,
 	.get_strings		= igbvf_get_strings,
diff --git a/drivers/net/ethernet/intel/igbvf/netdev.c b/drivers/net/ethernet/intel/igbvf/netdev.c
index b3d760b..32b3044 100644
--- a/drivers/net/ethernet/intel/igbvf/netdev.c
+++ b/drivers/net/ethernet/intel/igbvf/netdev.c
@@ -2530,6 +2530,18 @@ static void igbvf_print_device_info(struct igbvf_adapter *adapter)
 	dev_info(&pdev->dev, "MAC: %d\n", hw->mac.type);
 }
 
+static int igbvf_set_features(struct net_device *netdev, u32 features)
+{
+	struct igbvf_adapter *adapter = netdev_priv(netdev);
+
+	if (features & NETIF_F_RXCSUM)
+		adapter->flags &= ~IGBVF_FLAG_RX_CSUM_DISABLED;
+	else
+		adapter->flags |= IGBVF_FLAG_RX_CSUM_DISABLED;
+
+	return 0;
+}
+
 static const struct net_device_ops igbvf_netdev_ops = {
 	.ndo_open                       = igbvf_open,
 	.ndo_stop                       = igbvf_close,
@@ -2545,6 +2557,7 @@ static const struct net_device_ops igbvf_netdev_ops = {
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller            = igbvf_netpoll,
 #endif
+	.ndo_set_features               = igbvf_set_features,
 };
 
 /**
@@ -2652,16 +2665,18 @@ static int __devinit igbvf_probe(struct pci_dev *pdev,
 
 	adapter->bd_number = cards_found++;
 
-	netdev->features = NETIF_F_SG |
+	netdev->hw_features = NETIF_F_SG |
 	                   NETIF_F_IP_CSUM |
+			   NETIF_F_IPV6_CSUM |
+			   NETIF_F_TSO |
+			   NETIF_F_TSO6 |
+			   NETIF_F_RXCSUM;
+
+	netdev->features = netdev->hw_features |
 	                   NETIF_F_HW_VLAN_TX |
 	                   NETIF_F_HW_VLAN_RX |
 	                   NETIF_F_HW_VLAN_FILTER;
 
-	netdev->features |= NETIF_F_IPV6_CSUM;
-	netdev->features |= NETIF_F_TSO;
-	netdev->features |= NETIF_F_TSO6;
-
 	if (pci_using_dac)
 		netdev->features |= NETIF_F_HIGHDMA;
 
-- 
1.7.6.4
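
The split between hw_features (user-toggleable) and features (currently active) can be modelled in userspace roughly as follows. This is a simplified sketch, not the networking core: the F_* bit values, struct model_dev and model_request() are hypothetical, and the masking in model_request() is only an approximation of what the core does before it calls ndo_set_features.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; the kernel's NETIF_F_* constants differ. */
#define F_SG		(1u << 0)
#define F_IP_CSUM	(1u << 1)
#define F_IPV6_CSUM	(1u << 2)
#define F_TSO		(1u << 3)
#define F_TSO6		(1u << 4)
#define F_RXCSUM	(1u << 5)
#define F_HW_VLAN_TX	(1u << 6)
#define FLAG_RX_CSUM_DISABLED (1u << 0)

struct model_dev {
	uint32_t hw_features;	/* bits the user may toggle */
	uint32_t features;	/* bits currently active */
	uint32_t flags;		/* private driver flags */
};

/* Mirrors the probe-time setup in the patch, with fewer VLAN bits. */
static void model_probe(struct model_dev *d)
{
	assert(d != NULL);
	d->hw_features = F_SG | F_IP_CSUM | F_IPV6_CSUM |
			 F_TSO | F_TSO6 | F_RXCSUM;
	d->features = d->hw_features | F_HW_VLAN_TX;
	d->flags = 0;
}

/* Mirrors igbvf_set_features(): keep the private flag in sync. */
static int model_set_features(struct model_dev *d, uint32_t features)
{
	if (features & F_RXCSUM)
		d->flags &= ~FLAG_RX_CSUM_DISABLED;
	else
		d->flags |= FLAG_RX_CSUM_DISABLED;
	return 0;
}

/* Simplified stand-in for the core: only hw_features bits may change. */
static int model_request(struct model_dev *d, uint32_t wanted)
{
	uint32_t next = (d->features & ~d->hw_features) |
			(wanted & d->hw_features);
	int err = model_set_features(d, next);

	if (!err)
		d->features = next;
	return err;
}
```

Requesting "everything off" in this model clears the toggleable bits and sets the private RX checksum flag, while the VLAN bit, which is not in hw_features, stays on.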

^ permalink raw reply related

* [iproute2] iproute2: Add new command to ip link to enable/disable VF spoof check
From: Jeff Kirsher @ 2011-10-14  6:31 UTC (permalink / raw)
  To: davem, shemminger; +Cc: Greg Rose, netdev, gospo, sassmann, Jeff Kirsher

From: Greg Rose <gregory.v.rose@intel.com>

Add ip link command parsing for VF spoof checking enable/disable

V2 - Fixed problem with parsing of dump info on kernels that don't
     support the spoof checking option and also wrapped the ifla_vf_info
     structure in #ifdef __KERNEL__ to prevent user space from directly
     accessing the structure
V3 - Improved parsing of vfinfo
V4 - Put Makefile back to proper list of subdirs
V5 - Remove struct ifla_vf_info, it is only used by the kernel
V6 - Make sure spoof check is reported by the driver - rtnl will set
     it to -1 to indicate driver didn't report a value.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 include/linux/if_link.h |    8 +++-----
 ip/ipaddress.c          |   19 +++++++++++++++++++
 ip/iplink.c             |   15 +++++++++++++++
 man/man8/ip.8           |    4 +++-
 4 files changed, 40 insertions(+), 6 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 304c44f..d3bc04c 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -277,6 +277,7 @@ enum {
 	IFLA_VF_MAC,		/* Hardware queue specific attributes */
 	IFLA_VF_VLAN,
 	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
+	IFLA_VF_SPOOFCHK,	/* Spoof Checking on/off switch */
 	__IFLA_VF_MAX,
 };
 
@@ -298,12 +299,9 @@ struct ifla_vf_tx_rate {
 	__u32 rate; /* Max TX bandwidth in Mbps, 0 disables throttling */
 };
 
-struct ifla_vf_info {
+struct ifla_vf_spoofchk {
 	__u32 vf;
-	__u8 mac[32];
-	__u32 vlan;
-	__u32 qos;
-	__u32 tx_rate;
+	__u32 setting;
 };
 
 /* VF ports management section
diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 85f05a2..2f2cabd 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -197,7 +197,9 @@ static void print_vfinfo(FILE *fp, struct rtattr *vfinfo)
 	struct ifla_vf_mac *vf_mac;
 	struct ifla_vf_vlan *vf_vlan;
 	struct ifla_vf_tx_rate *vf_tx_rate;
+	struct ifla_vf_spoofchk *vf_spoofchk;
 	struct rtattr *vf[IFLA_VF_MAX+1];
+	struct rtattr *tmp;
 	SPRINT_BUF(b1);
 
 	if (vfinfo->rta_type != IFLA_VF_INFO) {
@@ -211,6 +213,17 @@ static void print_vfinfo(FILE *fp, struct rtattr *vfinfo)
 	vf_vlan = RTA_DATA(vf[IFLA_VF_VLAN]);
 	vf_tx_rate = RTA_DATA(vf[IFLA_VF_TX_RATE]);
 
+	/* Check if the spoof checking vf info type is supported by
+	 * this kernel.
+	 */
+	tmp = (struct rtattr *)((char *)vf[IFLA_VF_TX_RATE] +
+			vf[IFLA_VF_TX_RATE]->rta_len);
+
+	if (tmp->rta_type != IFLA_VF_SPOOFCHK)
+		vf_spoofchk = NULL;
+	else
+		vf_spoofchk = RTA_DATA(vf[IFLA_VF_SPOOFCHK]);
+
 	fprintf(fp, "\n    vf %d MAC %s", vf_mac->vf,
 		ll_addr_n2a((unsigned char *)&vf_mac->mac,
 		ETH_ALEN, 0, b1, sizeof(b1)));
@@ -220,6 +233,12 @@ static void print_vfinfo(FILE *fp, struct rtattr *vfinfo)
 		fprintf(fp, ", qos %d", vf_vlan->qos);
 	if (vf_tx_rate->rate)
 		fprintf(fp, ", tx rate %d (Mbps)", vf_tx_rate->rate);
+	if (vf_spoofchk && vf_spoofchk->setting != -1) {
+		if (vf_spoofchk->setting)
+			fprintf(fp, ", spoof checking on");
+		else
+			fprintf(fp, ", spoof checking off");
+	}
 }
 
 int print_linkinfo(const struct sockaddr_nl *who,
diff --git a/ip/iplink.c b/ip/iplink.c
index 35e6dc6..ca1aaeb 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -71,7 +71,10 @@ void iplink_usage(void)
 	fprintf(stderr, "			  [ alias NAME ]\n");
 	fprintf(stderr, "	                  [ vf NUM [ mac LLADDR ]\n");
 	fprintf(stderr, "				   [ vlan VLANID [ qos VLAN-QOS ] ]\n");
+
 	fprintf(stderr, "				   [ rate TXRATE ] ] \n");
+
+	fprintf(stderr, "				   [ spoofchk { on | off} ] ] \n");
 	fprintf(stderr, "			  [ master DEVICE ]\n");
 	fprintf(stderr, "			  [ nomaster ]\n");
 	fprintf(stderr, "       ip link show [ DEVICE | group GROUP ]\n");
@@ -228,6 +231,18 @@ int iplink_parse_vf(int vf, int *argcp, char ***argvp,
 			ivt.vf = vf;
 			addattr_l(&req->n, sizeof(*req), IFLA_VF_TX_RATE, &ivt, sizeof(ivt));
 		
+		} else if (matches(*argv, "spoofchk") == 0) {
+			struct ifla_vf_spoofchk ivs;
+			NEXT_ARG();
+			if (matches(*argv, "on") == 0)
+				ivs.setting = 1;
+			else if (matches(*argv, "off") == 0)
+				ivs.setting = 0;
+			else
+				invarg("Invalid \"spoofchk\" value\n", *argv);
+			ivs.vf = vf;
+			addattr_l(&req->n, sizeof(*req), IFLA_VF_SPOOFCHK, &ivs, sizeof(ivs));
+
 		} else {
 			/* rewind arg */
 			PREV_ARG();
diff --git a/man/man8/ip.8 b/man/man8/ip.8
index 36431b6..a20eca7 100644
--- a/man/man8/ip.8
+++ b/man/man8/ip.8
@@ -100,7 +100,9 @@ ip \- show / manipulate routing, devices, policy routing and tunnels
 .B qos
 .IR VLAN-QOS " ] ] ["
 .B rate
-.IR TXRATE " ] |"
+.IR TXRATE " ] ["
+.B spoofchk { on | off }
+] |
 .br
 .B master
 .IR DEVICE
-- 
1.7.6.4
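
The on/off parsing and the -1 "driver did not report" convention from V6 can be exercised outside of ip(8) with a rough sketch like this. The model_* names are hypothetical, and the parser uses exact string comparison, whereas the real matches() helper also accepts abbreviations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct ifla_vf_spoofchk in the patch. */
struct model_vf_spoofchk {
	uint32_t vf;
	uint32_t setting;
};

/* Parse "on"/"off" for a given VF. Returns 0 on success, -1 on bad input. */
static int model_parse_spoofchk(const char *arg, int vf,
				struct model_vf_spoofchk *out)
{
	assert(arg != NULL && out != NULL);

	if (strcmp(arg, "on") == 0)
		out->setting = 1;
	else if (strcmp(arg, "off") == 0)
		out->setting = 0;
	else
		return -1;

	out->vf = (uint32_t)vf;
	return 0;
}

/*
 * Text appended to the VF line. A missing attribute (NULL) or a setting
 * of (u32)-1 means the kernel or driver reported nothing, so print nothing.
 */
static const char *model_describe_spoofchk(const struct model_vf_spoofchk *s)
{
	if (s == NULL || s->setting == UINT32_MAX)
		return "";
	return s->setting ? ", spoof checking on" : ", spoof checking off";
}
```

Note that the comparison against -1 in print_vfinfo() works on a __u32 field because -1 converts to the all-ones value; the sketch spells that out as UINT32_MAX.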

^ permalink raw reply related

* Re: [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes
From: Steffen Klassert @ 2011-10-14  6:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20111012.170805.2172804476308993385.davem@davemloft.net>

On Wed, Oct 12, 2011 at 05:08:05PM -0400, David Miller wrote:
> 
> You really can't do this, it's going to kill all of the memory savings from
> storing metrics in the inetpeer cache.
> 
> Every input route is going to have its metrics COW'd with this change.
> 

Ok, I missed this completely. So if input and output routes share the
inetpeer information and we don't want to copy, we could simply not use
the (learned) pmtu information on the inetpeer for input routes. So
for input routes, dst_mtu() could return dst->ops->default_mtu()
instead of the mtu information stored in the metrics. Are there other
(better) solutions?

^ permalink raw reply

* Re: [net-next 1/5] stmmac: add CHAINED descriptor mode support (V2)
From: Giuseppe CAVALLARO @ 2011-10-14  7:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, rayagond
In-Reply-To: <20111013.163946.859494070937160730.davem@davemloft.net>

On 10/13/2011 10:39 PM, David Miller wrote:
> From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
> Date: Wed, 12 Oct 2011 15:38:04 +0200
> 
>> +#if defined(CONFIG_STMMAC_RING)
>> +
>> +static unsigned int stmmac_jumbo_frm(struct stmmac_priv *priv,
>> +				     struct sk_buff *skb, int csum_insertion)
>> +{
> 
> This is not exactly what I meant.
> 
> In your original patch, two or three line snippets of code were conditionalized.
> 
> That's what I wanted you to do here.  Keep as much common code around as possible
> in the driver *.c file, but the small 2 or 3 line conditional parts are implemented
> in very small well contained inline functions implemented in a header file.
> 
> These small, 2 or 3 line, inline functions are where the ifdefs go.
> 
> I didn't mean to replicate all of the functions, in their entirety, into some
> header file.

This is what I wanted to do indeed. :-(

Where possible, I added new small functions like these (used in the main code):

static void stmmac_refill_desc3(int bfsize, struct dma_desc *p)
static void stmmac_init_desc3(int des3_as_data_buf, struct dma_desc *p)
static void stmmac_clean_desc3(struct dma_desc *p)

I guess this is what you actually wanted.

In other cases, I put two implementations of the same function,
specialized for ring and chained mode. This was the case for the enhanced
and normal descriptors. Instead of implementing new inline functions, I
directly moved the functions themselves into the header because they are
small enough.

For example

inline void enh_desc_release_tx_desc(struct dma_desc *p)
{
	memset(p, 0, offsetof(struct dma_desc, des2));
	p->des01.etx.second_address_chained = 1;
}

and
inline void enh_desc_release_tx_desc(struct dma_desc *p)
{
	int ter = p->des01.etx.end_ring;

	memset(p, 0, offsetof(struct dma_desc, des2));
	p->des01.etx.end_ring = ter;
}


Unfortunately, the jumbo frame function is big :-( and I agree with you that
it's not good to have this in the header.

At any rate, I'll try to reduce the code in the header as much as possible,
although this makes the driver's API more complex.

Thanks for your feedback.

Let me know if you have any other advice or comments.

Regards
Peppe

> 
> You might as well put the entire driver into a header file, then you can add
> all the ifdefs you want :-)
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
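
One possible reading of the suggestion in this thread, sketched as standalone C: the mode-specific two or three lines live in a small inline helper (which would sit in a header), and the loop that uses it stays common. Everything here is hypothetical and simplified, including struct model_desc, the MODEL_CHAINED macro and model_clean_tx(); only the two helper bodies follow the enh_desc_release_tx_desc() variants quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced descriptor; the real struct dma_desc is larger. */
struct model_desc {
	struct {
		unsigned int own:1;
		unsigned int end_ring:1;
		unsigned int second_address_chained:1;
	} des01;
	unsigned int des2;
};

/* Would live in a header: the only place where the ifdef appears. */
static inline void model_release_tx_desc(struct model_desc *p)
{
#if defined(MODEL_CHAINED)
	memset(p, 0, offsetof(struct model_desc, des2));
	p->des01.second_address_chained = 1;
#else
	unsigned int ter = p->des01.end_ring;	/* keep the ring terminator */

	memset(p, 0, offsetof(struct model_desc, des2));
	p->des01.end_ring = ter;
#endif
}

/* Common code: stays in the driver .c file, free of ifdefs. */
static void model_clean_tx(struct model_desc *ring, size_t n)
{
	size_t i;

	assert(ring != NULL);
	for (i = 0; i < n; i++)
		model_release_tx_desc(&ring[i]);
}
```

Built without MODEL_CHAINED this behaves like ring mode: the status bits are cleared, end_ring survives, and des2 is untouched.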

^ permalink raw reply

* Re: [net-next 2/5] stmmac: allow mtu bigger than 1500 in case of normal desc (V2).
From: Giuseppe CAVALLARO @ 2011-10-14  7:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, Deepak SIKRI
In-Reply-To: <1318449481.2644.11.camel@edumazet-laptop>

Hello Eric

On 10/12/2011 9:58 PM, Eric Dumazet wrote:
> Le mercredi 12 octobre 2011 à 15:38 +0200, Giuseppe CAVALLARO a écrit :
>> This patch allows to set the mtu bigger than 1500
>> in case of normal descriptors.
>> This is helping some SPEAr customers.
>>
>> Signed-off-by: Deepak SIKRI <deepak.sikri@st.com>
>> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
>> ---
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    6 +++---
>>  1 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index ba7af2c..de3e536 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -1357,17 +1357,17 @@ static void stmmac_set_rx_mode(struct net_device *dev)
>>  static int stmmac_change_mtu(struct net_device *dev, int new_mtu)
>>  {
>>  	struct stmmac_priv *priv = netdev_priv(dev);
>> -	int max_mtu;
>> +	int max_mtu = ETH_DATA_LEN;
> 
> Why are you setting max_mtu to ETH_DATA_LEN here ?
> 
>>  
>>  	if (netif_running(dev)) {
>>  		pr_err("%s: must be stopped to change its MTU\n", dev->name);
>>  		return -EBUSY;
>>  	}
>>  
>> -	if (priv->plat->has_gmac)
>> +	if (priv->plat->enh_desc)
>>  		max_mtu = JUMBO_LEN;
>>  	else
>> -		max_mtu = ETH_DATA_LEN;
>> +		max_mtu = BUF_SIZE_4KiB;
> 
> Since later you init to completely different values...


Hmm, yes, you are right. There is no need to initialize max_mtu.

Thanks! I'll rework the patch and send it again as V3.

Thx

Regards
Peppe

> 
> 
> 
> 
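
A rough userspace model of the MTU check without the redundant initializer, as discussed above. This is a sketch, not the V3 patch: model_change_mtu() and its parameters are hypothetical, and the numeric values assumed for JUMBO_LEN (9000), BUF_SIZE_4KiB (4096) and the minimum MTU (46) are assumptions, not taken from this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values; check the driver headers for the real ones. */
#define MODEL_JUMBO_LEN		9000
#define MODEL_BUF_SIZE_4KiB	4096
#define MODEL_MIN_MTU		46

/* Returns 0 and stores new_mtu in *mtu, or a negative errno value. */
static int model_change_mtu(bool running, bool enh_desc, int new_mtu, int *mtu)
{
	int max_mtu;	/* no initializer: both branches below assign it */

	assert(mtu != NULL);

	if (running)
		return -EBUSY;	/* interface must be stopped first */

	if (enh_desc)
		max_mtu = MODEL_JUMBO_LEN;
	else
		max_mtu = MODEL_BUF_SIZE_4KiB;

	if (new_mtu < MODEL_MIN_MTU || new_mtu > max_mtu)
		return -EINVAL;

	*mtu = new_mtu;
	return 0;
}
```

With normal descriptors the model accepts an MTU of 4000 but rejects 5000; with enhanced descriptors 5000 is accepted.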

^ permalink raw reply

