Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] fib_trie: only calc for the un-first node
From: David Miller @ 2013-10-10  4:17 UTC (permalink / raw)
  To: baker.kernel; +Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel
In-Reply-To: <1381203411-12047-1-git-send-email-baker.kernel@gmail.com>

From: baker.kernel@gmail.com
Date: Tue,  8 Oct 2013 11:36:51 +0800

> From: "baker.zhang" <baker.kernel@gmail.com>
> 
> This is a enhancement.
> 
> for the first node in fib_trie, newpos is 0, bit is 1.
> Only for the leaf or node with unmatched key need calc pos.
> 
> Signed-off-by: baker.zhang <baker.kernel@gmail.com>

The compiler is probably doing this transformation all by itself,
but applied anyways, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: gro: allow to build full sized skb
From: David Miller @ 2013-10-10  4:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1381248143.12191.53.camel@edumazet-glaptop.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 08 Oct 2013 09:02:23 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> skb_gro_receive() is currently limited to 16 or 17 MSS per GRO skb,
> typically 24616 bytes, because it fills up to MAX_SKB_FRAGS frags.
> 
> It's relatively easy to extend the skb using frag_list to allow
> more frags to be appended into the last sk_buff.
> 
> This still builds very efficient skbs, and allows reaching 45 MSS per
> skb.
> 
> (45 MSS GRO packet uses one skb plus a frag_list containing 2 additional
> sk_buff)
> 
> High speed TCP flows benefit from this extension by lowering TCP stack
> cpu usage (less packets stored in receive queue, less ACK packets
> processed)
> 
> Forwarding setups could be hurt, as such skbs will need to be
> linearized, although its not a new problem, as GRO could already
> provide skbs with a frag_list.
> 
> We could make the 65536 bytes threshold a tunable to mitigate this.
> 
> (First time we need to linearize skb in skb_needs_linearize(), we could
> lower the tunable to ~16*1460 so that following skb_gro_receive() calls
> build smaller skbs)
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH net-next] inet: includes a sock_common in request_sock
From: David Miller @ 2013-10-10  4:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1381357289.4971.35.camel@edumazet-glaptop.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 09 Oct 2013 15:21:29 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> TCP listener refactoring, part 5 :
> 
> We want to be able to insert request sockets (SYN_RECV) into main
> ehash table instead of the per listener hash table to allow RCU
> lookups and remove listener lock contention.
> 
> This patch includes the needed struct sock_common in front
> of struct request_sock
> 
> This means there is no more inet6_request_sock IPv6 specific
> structure.
> 
> Following inet_request_sock fields were renamed as they became
> macros to reference fields from struct sock_common.
> Prefix ir_ was chosen to avoid name collisions.
> 
> loc_port   -> ir_loc_port
> loc_addr   -> ir_loc_addr
> rmt_addr   -> ir_rmt_addr
> rmt_port   -> ir_rmt_port
> iif        -> ir_iif
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Looks good, applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH net-next] tcp: use ACCESS_ONCE() in tcp_update_pacing_rate()
From: David Miller @ 2013-10-10  4:18 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1381364092.4971.57.camel@edumazet-glaptop.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 09 Oct 2013 17:14:52 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> sk_pacing_rate is read by sch_fq packet scheduler at any time,
> with no synchronization, so make sure we update it in a
> sensible way. ACCESS_ONCE() is how we instruct compiler
> to not do stupid things, like using the memory location
> as a temporary variable.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Also applied, thanks a lot.

^ permalink raw reply

* [PATCH v2] Stmmac: fix a bug when the clk_csr is euqal to 0x0
From: Wan ZongShun @ 2013-10-10  4:40 UTC (permalink / raw)
  To: Giuseppe Cavallaro, David Miller, Wan ZongShun; +Cc: netdev

Hi Giuseppe,

According to your advice, I add this new field comments in stmmac.txt.
And I only make one patch for fix this issue and field comments.

Signed-off-by: Wan Zongshun <mcuos.com@gmail.com>

---
 Documentation/networking/stmmac.txt               | 3 +++
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
 include/linux/stmmac.h                            | 1 +
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/stmmac.txt
b/Documentation/networking/stmmac.txt
index 457b8bb..3ed0ea9 100644
--- a/Documentation/networking/stmmac.txt
+++ b/Documentation/networking/stmmac.txt
@@ -116,6 +116,7 @@ struct plat_stmmacenet_data {
     struct stmmac_mdio_bus_data *mdio_bus_data;
     struct stmmac_dma_cfg *dma_cfg;
     int clk_csr;
+    unsigned int dynamic_mdc_clk_en;
     int has_gmac;
     int enh_desc;
     int tx_coe;
@@ -148,6 +149,8 @@ Where:
        GMAC also enables the 4xPBL by default.
    o fixed_burst/mixed_burst/burst_len
  o clk_csr: fixed CSR Clock range selection.
+ o dynamic_mdc_clk_en: If it is set to >=1 MDC clk will be selected
dynamically,
+               or else you must set a fixed CSR Clock range to clk_src.
  o has_gmac: uses the GMAC core.
  o enh_desc: if sets the MAC will use the enhanced descriptor structure.
  o tx_coe: core is able to perform the tx csum in HW.
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 8d4ccd3..93fc6bc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2741,7 +2741,7 @@ struct stmmac_priv *stmmac_dvr_probe(struct
device *device,
      * set the MDC clock dynamically according to the csr actual
      * clock input.
      */
-    if (!priv->plat->clk_csr)
+    if (!!priv->plat->dynamic_mdc_clk_en)
         stmmac_clk_csr_set(priv);
     else
         priv->clk_csr = priv->plat->clk_csr;
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index bb5deb0..1b9f0b5 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -101,6 +101,7 @@ struct plat_stmmacenet_data {
     struct stmmac_mdio_bus_data *mdio_bus_data;
     struct stmmac_dma_cfg *dma_cfg;
     int clk_csr;
+    unsigned int dynamic_mdc_clk_en;
     int has_gmac;
     int enh_desc;
     int tx_coe;
-- 
1.8.1.2


-- 
Wan ZongShun.
www.mcuos.com

^ permalink raw reply related

* Re: [PATCH] net: sh_eth: Fix RX packets errors on R8A7740
From: Simon Horman @ 2013-10-10  5:29 UTC (permalink / raw)
  To: Nguyen Hong Ky; +Cc: David S. Miller, netdev, Ryusuke Sakato, Sergei Shtylyov
In-Reply-To: <CAK9_=nCdBjCgkwX1QHjNvLdXZ+5jxPDV_+XHqhG8dAo1fK=tVw@mail.gmail.com>

On Wed, Oct 09, 2013 at 02:38:17PM +0900, Nguyen Hong Ky wrote:
> On Wed, Oct 9, 2013 at 2:26 PM, Simon Horman <horms@verge.net.au> wrote:
> > On Wed, Oct 09, 2013 at 01:28:26PM +0900, Simon Horman wrote:
> >> On Mon, Oct 07, 2013 at 03:29:25PM +0900, Nguyen Hong Ky wrote:
> >> > This patch will fix RX packets errors when receiving big size
> >> > of data by set bit RNC = 1.
> >> >
> >> > RNC - Receive Enable Control
> >> >
> >> > 0: Upon completion of reception of one frame, the E-DMAC writes
> >> > the receive status to the descriptor and clears the RR bit in
> >> > EDRRR to 0.
> >> >
> >> > 1: Upon completion of reception of one frame, the E-DMAC writes
> >> > (writes back) the receive status to the descriptor. In addition,
> >> > the E-DMAC reads the next descriptor and prepares for reception
> >> > of the next frame.
> >> >
> >> > In addition, for get more stable when receiving packets, I set
> >> > maximum size for the transmit/receive FIFO and inserts padding
> >> > in receive data.
> >> >
> >> > Signed-off-by: Nguyen Hong Ky <nh-ky@jinso.co.jp>
> >>
> >> I realise that this patch has been applied but regardless
> >> I would like to acknowledge that it resolve a problem that
> >> I observed this afternoon.
> >>
> >> Without this patch I see the following on the console, many times:
> >>
> >> net eth0: Receive FIFO Overflow
> >>
> >> With this patch I do not see the warning message at all.
> >
> > Scratch that, I am still seeing the messages with this patch applied.
> >
> 
> Would you please confirm that it was patched in below function:
> 
> static struct sh_eth_cpu_data r8a7740_data = {...}

Thanks Ky-san.

I am seeing the problem on next-next
ba53742 ("tcp: use ACCESS_ONCE() in tcp_update_pacing_rate()")
which includes this patch.

However, it seems that the patch was applied to
sh7734_data instead of r8a7740_data.

I moved the changes from sh7734_data to r8a7740_data
and the FIFO Overflow messages seem to have disappeared.

I will post a patch to move things around accordingly.

^ permalink raw reply

* LOAN OFFER
From: info @ 2013-10-10  5:19 UTC (permalink / raw)



We are European Loan service,Do you need a Loan,contact Via  
(europanloan@foxmail.com)

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

^ permalink raw reply

* [PATCH] net: sh_eth: Correct fix for RX packet errors on R8A7740
From: Simon Horman @ 2013-10-10  5:51 UTC (permalink / raw)
  To: David S. Miller, netdev, linux-sh
  Cc: Magnus Damm, Sergei Shtylyov, Nguyen Hong Ky, Simon Horman
In-Reply-To: <1381384276-8077-1-git-send-email-horms+renesas@verge.net.au>

Nguyen Hong Ky posted a patch to correct RX packet errors on R8A7740 which
was applied as 2c6221e4a5aab417 ("net: sh_eth: Fix RX packets errors on
R8A7740"). Unfortunately sh_eth.c contains many similar instances
of struct sh_eth_cpu_data and the patch was miss-applied, updating
sh7734_data instead of r8a7740_data.

This patch corrects this problem by.
1. Reverting the change to sh7734_data and;
2. Applying the change to r8a7740_data.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
---
 drivers/net/ethernet/renesas/sh_eth.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index c7e1b8f..b57c278 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -620,16 +620,12 @@ static struct sh_eth_cpu_data sh7734_data = {
 	.eesr_err_check	= EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
 			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
 			  EESR_TDE | EESR_ECI,
-	.fdr_value	= 0x0000070f,
-	.rmcr_value	= 0x00000001,

 	.apr		= 1,
 	.mpr		= 1,
 	.tpauser	= 1,
 	.bculr		= 1,
 	.hw_swap	= 1,
-	.rpadir		= 1,
-	.rpadir_value   = 2 << 16,
 	.no_trimd	= 1,
 	.no_ade		= 1,
 	.tsu		= 1,
@@ -692,12 +688,16 @@ static struct sh_eth_cpu_data r8a7740_data = {
 	.eesr_err_check	= EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT |
 			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE |
 			  EESR_TDE | EESR_ECI,
+	.fdr_value	= 0x0000070f,
+	.rmcr_value	= 0x00000001,

 	.apr		= 1,
 	.mpr		= 1,
 	.tpauser	= 1,
 	.bculr		= 1,
 	.hw_swap	= 1,
+	.rpadir		= 1,
+	.rpadir_value   = 2 << 16,
 	.no_trimd	= 1,
 	.no_ade		= 1,
 	.tsu		= 1,
-- 
1.8.4

^ permalink raw reply related

* [GIT PULL] Renesas SH ethernet fix for v3.12
From: Simon Horman @ 2013-10-10  5:51 UTC (permalink / raw)
  To: David S. Miller, netdev, linux-sh
  Cc: Magnus Damm, Sergei Shtylyov, Nguyen Hong Ky, Simon Horman

Hi Dave,

please consider this fix for SH ethernet fix for v3.12.
I am sending it as a pull-request rather than a stand-alone
patch as it resolves the miss-application of a patch and I wanted
to eliminate the chance of further miss-application.

That said, I am happy for you to just apply the patch rather
than do a git pull. But you may want to check that the result
is the same as what is present in my tag.

The following changes since commit 5abbeea553c8260ed4e2ac4aae962aff800b6c6d:

  can: at91-can: fix device to driver data mapping for platform devices (2013-10-09 23:04:31 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas.git tags/renesas-sh_eth-fixes-for-v3.12

for you to fetch changes up to 31c827b594a0eb7c996510a93338b6096e8565e6:

  net: sh_eth: Correct fix for RX packet errors on R8A7740 (2013-10-10 14:42:15 +0900)

----------------------------------------------------------------
Renesas SH ethernet fix for v3.12

* Revised fix from Nguyen Hong Ky for
  RX packet errors on the R8A7740 SoC

----------------------------------------------------------------
Simon Horman (1):
      net: sh_eth: Correct fix for RX packet errors on R8A7740

 drivers/net/ethernet/renesas/sh_eth.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

^ permalink raw reply

* [PATCH net] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Fan Du @ 2013-10-10  5:51 UTC (permalink / raw)
  To: vyasevich, nhorman, steffen.klassert; +Cc: davem, netdev

igb/ixgbe have hardware sctp checksum support, when this feature is enabled
and also IPsec is armed to protect sctp traffic, ugly things happened as
xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum every thing
up and pack the 16bits result in the checksum field). The result is fail
establishment of sctp communication.

And I saw another point in this part of code, when IPsec is not armed,
sctp communication is good, however setting setting CHECKSUM_PARTIAL will
make xfrm_output compute dummy checksum values which will be overwritten by
hardware lately.

So this patch try to solve above two issues together.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
note:
  igb/ixgbe hardware is not handy on my side, so just build test only.

---
 net/sctp/output.c |   27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 0ac3a65..f0b9cc5 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -372,6 +372,16 @@ static void sctp_packet_set_owner_w(struct sk_buff *skb, struct sock *sk)
 	atomic_inc(&sk->sk_wmem_alloc);
 }
 
+static int is_xfrm_armed(struct dst_entry *dst)
+{
+#ifdef CONFIG_XFRM
+	/* If dst->xfrm is valid, this skb needs to be transformed */
+	return dst->xfrm != NULL;
+#else
+	return 0;
+#endif
+}
+
 /* All packets are sent to the network through this function from
  * sctp_outq_tail().
  *
@@ -536,20 +546,21 @@ int sctp_packet_transmit(struct sctp_packet *packet)
 	 * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
 	 */
 	if (!sctp_checksum_disable) {
-		if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
+		if ((!(dst->dev->features & NETIF_F_SCTP_CSUM)) ||
+			is_xfrm_armed(dst)) {
+
 			__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
 
 			/* 3) Put the resultant value into the checksum field in the
 			 *    common header, and leave the rest of the bits unchanged.
 			 */
 			sh->checksum = sctp_end_cksum(crc32);
-		} else {
-			/* no need to seed pseudo checksum for SCTP */
-			nskb->ip_summed = CHECKSUM_PARTIAL;
-			nskb->csum_start = (skb_transport_header(nskb) -
-			                    nskb->head);
-			nskb->csum_offset = offsetof(struct sctphdr, checksum);
-		}
+		} else
+			/* Mark skb as CHECKSUM_UNNECESSARY to let hardware compute
+			 * the checksum, and also avoid xfrm_output to do unceccessary
+			 * checksum.
+			 */
+			nskb->ip_summed = CHECKSUM_UNNECESSARY;
 	}
 
 	/* IP layer ECN support
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH net-next] openvswitch: fix vport-netdev unregister
From: Pravin Shelar @ 2013-10-10  6:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Jesse Gross, Jiri Pirko, dev@openvswitch.org,
	netdev
In-Reply-To: <CAMEtUuzVP0LQ_L2CrpoJgehP_bXwYhicW_HM2m-YOY6SN7RDxQ@mail.gmail.com>

On Wed, Oct 9, 2013 at 9:11 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Wed, Oct 9, 2013 at 8:02 PM, Pravin Shelar <pshelar@nicira.com> wrote:
>> On Tue, Oct 8, 2013 at 8:07 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>> The combination of two commits
>>>
>>> commit 8e4e1713e4
>>> ("openvswitch: Simplify datapath locking.")
>>>
>>> and
>>>
>>> commit 2537b4dd0a
>>> ("openvswitch:: link upper device for port devices")
>>>
>>> introduced a bug where upper_dev wasn't unlinked upon
>>> netdev_unregister notification
>>>
>>> The following steps:
>>>
>>>   modprobe openvswitch
>>>   ovs-dpctl add-dp test
>>>   ip tuntap add dev tap1 mode tap
>>>   ovs-dpctl add-if test tap1
>>>   ip tuntap del dev tap1 mode tap
>>>
>>> are causing multiple warnings:
>>>
>>> [   62.747557] gre: GRE over IPv4 demultiplexor driver
>>> [   62.749579] openvswitch: Open vSwitch switching datapath
>>> [   62.755087] device test entered promiscuous mode
>>> [   62.765911] device tap1 entered promiscuous mode
>>> [   62.766033] IPv6: ADDRCONF(NETDEV_UP): tap1: link is not ready
>>> [   62.769017] ------------[ cut here ]------------
>>> [   62.769022] WARNING: CPU: 1 PID: 3267 at net/core/dev.c:5501 rollback_registered_many+0x20f/0x240()
>>> [   62.769023] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
>>> [   62.769051] CPU: 1 PID: 3267 Comm: ip Not tainted 3.12.0-rc3+ #60
>>> [   62.769052] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
>>> [   62.769053]  0000000000000009 ffff8807f25cbd28 ffffffff8175e575 0000000000000006
>>> [   62.769055]  0000000000000000 ffff8807f25cbd68 ffffffff8105314c ffff8807f25cbd58
>>> [   62.769057]  ffff8807f2634000 ffff8807f25cbdc8 ffff8807f25cbd88 ffff8807f25cbdc8
>>> [   62.769059] Call Trace:
>>> [   62.769062]  [<ffffffff8175e575>] dump_stack+0x55/0x76
>>> [   62.769065]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
>>> [   62.769067]  [<ffffffff8105319a>] warn_slowpath_null+0x1a/0x20
>>> [   62.769069]  [<ffffffff8162a04f>] rollback_registered_many+0x20f/0x240
>>> [   62.769071]  [<ffffffff8162a101>] rollback_registered+0x31/0x40
>>> [   62.769073]  [<ffffffff8162a488>] unregister_netdevice_queue+0x58/0x90
>>> [   62.769075]  [<ffffffff8154f900>] __tun_detach+0x140/0x340
>>> [   62.769077]  [<ffffffff8154fb36>] tun_chr_close+0x36/0x60
>>> [   62.769080]  [<ffffffff811bddaf>] __fput+0xff/0x260
>>> [   62.769082]  [<ffffffff811bdf5e>] ____fput+0xe/0x10
>>> [   62.769084]  [<ffffffff8107b515>] task_work_run+0xb5/0xe0
>>> [   62.769087]  [<ffffffff810029b9>] do_notify_resume+0x59/0x80
>>> [   62.769089]  [<ffffffff813a41fe>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>>> [   62.769091]  [<ffffffff81770f5a>] int_signal+0x12/0x17
>>> [   62.769093] ---[ end trace 838756c62e156ffb ]---
>>> [   62.769481] ------------[ cut here ]------------
>>> [   62.769485] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
>>> [   62.769486] sysfs: can not remove 'master', no directory
>>> [   62.769486] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
>>> [   62.769514] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
>>> [   62.769515] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
>>> [   62.769518] Workqueue: events ovs_dp_notify_wq [openvswitch]
>>> [   62.769519]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
>>> [   62.769521]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b28
>>> [   62.769523]  0000000000000000 ffffffff81a87a1f ffff8807f2634000 ffff880037038500
>>> [   62.769525] Call Trace:
>>> [   62.769528]  [<ffffffff8175e575>] dump_stack+0x55/0x76
>>> [   62.769529]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
>>> [   62.769531]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
>>> [   62.769533]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
>>> [   62.769535]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
>>> [   62.769538]  [<ffffffff81631ef7>] __netdev_adjacent_dev_remove+0xf7/0x150
>>> [   62.769540]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
>>> [   62.769542]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
>>> [   62.769544]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
>>> [   62.769548]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
>>> [   62.769550]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
>>> [   62.769552]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
>>> [   62.769555]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
>>> [   62.769557]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
>>> [   62.769559]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
>>> [   62.769562]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
>>> [   62.769564]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
>>> [   62.769566]  [<ffffffff8107f44a>] kthread+0xea/0xf0
>>> [   62.769568]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
>>> [   62.769570]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
>>> [   62.769572]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
>>> [   62.769573] ---[ end trace 838756c62e156ffc ]---
>>> [   62.769574] ------------[ cut here ]------------
>>> [   62.769576] WARNING: CPU: 1 PID: 92 at fs/sysfs/inode.c:325 sysfs_hash_and_remove+0xa9/0xb0()
>>> [   62.769577] sysfs: can not remove 'upper_test', no directory
>>> [   62.769577] Modules linked in: openvswitch gre vxlan ip_tunnel libcrc32c ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc vhost_net macvtap macvlan vhost kvm_intel kvm dm_crypt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi hid_generic mxm_wmi eeepc_wmi asus_wmi sparse_keymap dm_multipath psmouse serio_raw usbhid hid parport_pc ppdev firewire_ohci lpc_ich firewire_core e1000e crc_itu_t binfmt_misc igb dca ptp pps_core mac_hid wmi lp parport i2o_config i2o_block video
>>> [   62.769603] CPU: 1 PID: 92 Comm: kworker/1:2 Tainted: G        W    3.12.0-rc3+ #60
>>> [   62.769604] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
>>> [   62.769606] Workqueue: events ovs_dp_notify_wq [openvswitch]
>>> [   62.769607]  0000000000000009 ffff880807ad3ac8 ffffffff8175e575 0000000000000006
>>> [   62.769609]  ffff880807ad3b18 ffff880807ad3b08 ffffffff8105314c ffff880807ad3b58
>>> [   62.769611]  0000000000000000 ffff880807ad3bd9 ffff8807f2634000 ffff880037038500
>>> [   62.769613] Call Trace:
>>> [   62.769615]  [<ffffffff8175e575>] dump_stack+0x55/0x76
>>> [   62.769617]  [<ffffffff8105314c>] warn_slowpath_common+0x8c/0xc0
>>> [   62.769619]  [<ffffffff81053236>] warn_slowpath_fmt+0x46/0x50
>>> [   62.769621]  [<ffffffff8123e7e9>] sysfs_hash_and_remove+0xa9/0xb0
>>> [   62.769622]  [<ffffffff81240e96>] sysfs_remove_link+0x26/0x30
>>> [   62.769624]  [<ffffffff81631f22>] __netdev_adjacent_dev_remove+0x122/0x150
>>> [   62.769627]  [<ffffffff81632037>] __netdev_adjacent_dev_unlink_lists+0x27/0x50
>>> [   62.769629]  [<ffffffff8163213a>] __netdev_adjacent_dev_unlink_neighbour+0x3a/0x50
>>> [   62.769631]  [<ffffffff8163218d>] netdev_upper_dev_unlink+0x3d/0x140
>>> [   62.769633]  [<ffffffffa033c2db>] netdev_destroy+0x4b/0x80 [openvswitch]
>>> [   62.769636]  [<ffffffffa033b696>] ovs_vport_del+0x46/0x60 [openvswitch]
>>> [   62.769638]  [<ffffffffa0335314>] ovs_dp_detach_port+0x44/0x60 [openvswitch]
>>> [   62.769640]  [<ffffffffa0336574>] ovs_dp_notify_wq+0xb4/0x150 [openvswitch]
>>> [   62.769642]  [<ffffffff81075c28>] process_one_work+0x1d8/0x6a0
>>> [   62.769644]  [<ffffffff81075bc8>] ? process_one_work+0x178/0x6a0
>>> [   62.769646]  [<ffffffff8107659b>] worker_thread+0x11b/0x370
>>> [   62.769648]  [<ffffffff81076480>] ? rescuer_thread+0x350/0x350
>>> [   62.769650]  [<ffffffff8107f44a>] kthread+0xea/0xf0
>>> [   62.769652]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
>>> [   62.769654]  [<ffffffff81770bac>] ret_from_fork+0x7c/0xb0
>>> [   62.769656]  [<ffffffff8107f360>] ? flush_kthread_worker+0x150/0x150
>>> [   62.769657] ---[ end trace 838756c62e156ffd ]---
>>> [   62.769724] device tap1 left promiscuous mode
>>>
>>> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
>>> ---
>>>  net/openvswitch/dp_notify.c    |    5 +++++
>>>  net/openvswitch/vport-netdev.c |   16 +++++++++++++---
>>>  net/openvswitch/vport-netdev.h |    1 +
>>>  3 files changed, 19 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
>>> index c323567..e9380bd 100644
>>> --- a/net/openvswitch/dp_notify.c
>>> +++ b/net/openvswitch/dp_notify.c
>>> @@ -88,6 +88,11 @@ static int dp_device_event(struct notifier_block *unused, unsigned long event,
>>>                 return NOTIFY_DONE;
>>>
>>>         if (event == NETDEV_UNREGISTER) {
>>> +               /* rx_handler_unregister and upper_dev_unlink immediately */
>>> +               if (dev->reg_state == NETREG_UNREGISTERING)
>>> +                       ovs_netdev_unlink_dev(vport);
>>> +
>>
>> Rather than doing vport destroy here, we can just unlink upper device
>> and let workq do rest of work.
>
> isn't it what it's doing?

I meant just call netdev_upper_dev_unlink() here in event handler and
rest of vport destroy can be done in workq.

> It does unregister and unlink immediately and schedules workq to do destroy.
> Obviously destroy cannot happen here, since we cannot grab ovs_lock here,
> since it would be in the wrong order vs the rest of the code.
>
>>
>> Thanks.
>>

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: fix vport-netdev unregister
From: Alexei Starovoitov @ 2013-10-10  6:26 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: David S. Miller, Jesse Gross, Jiri Pirko, dev@openvswitch.org,
	netdev
In-Reply-To: <CALnjE+pLon2DGc=MfrG7Tsu=p_uF9chK=TREJwK_TM2Um89p6Q@mail.gmail.com>

On Wed, Oct 9, 2013 at 11:07 PM, Pravin Shelar <pshelar@nicira.com> wrote:
> On Wed, Oct 9, 2013 at 9:11 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>> On Wed, Oct 9, 2013 at 8:02 PM, Pravin Shelar <pshelar@nicira.com> wrote:
>>> On Tue, Oct 8, 2013 at 8:07 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
>>>> The combination of two commits
>>>>
>>>> commit 8e4e1713e4
>>>> ("openvswitch: Simplify datapath locking.")
>>>>
>>>> and
>>>>
>>>> commit 2537b4dd0a
>>>> ("openvswitch:: link upper device for port devices")
>>>>
>>>> introduced a bug where upper_dev wasn't unlinked upon
>>>> netdev_unregister notification
>>>>
>>>> The following steps:
>>>>
>>>>   modprobe openvswitch
>>>>   ovs-dpctl add-dp test
>>>>   ip tuntap add dev tap1 mode tap
>>>>   ovs-dpctl add-if test tap1
>>>>   ip tuntap del dev tap1 mode tap
>>>>
>>>> are causing multiple warnings:
>>>> diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
>>>> index c323567..e9380bd 100644
>>>> --- a/net/openvswitch/dp_notify.c
>>>> +++ b/net/openvswitch/dp_notify.c
>>>> @@ -88,6 +88,11 @@ static int dp_device_event(struct notifier_block *unused, unsigned long event,
>>>>                 return NOTIFY_DONE;
>>>>
>>>>         if (event == NETDEV_UNREGISTER) {
>>>> +               /* rx_handler_unregister and upper_dev_unlink immediately */
>>>> +               if (dev->reg_state == NETREG_UNREGISTERING)
>>>> +                       ovs_netdev_unlink_dev(vport);
>>>> +
>>>
>>> Rather than doing vport destroy here, we can just unlink upper device
>>> and let workq do rest of work.
>>
>> isn't it what it's doing?
>
> I meant just call netdev_upper_dev_unlink() here in event handler and
> rest of vport destroy can be done in workq.

netdev_upper_dev_unlink() without netdev_rx_handler_unregister() ?!
that's dangerous.
If that is acceptable, then there was no reason to link them in the first place.

notifier asks to unregister. imo the only acceptable deferred task
here is to delay dev_put,
since ovs structures are still referring to it.

^ permalink raw reply

* [PATCH RFC 0/2] xfrm: Remove ancient sleeping code
From: Steffen Klassert @ 2013-10-10  6:33 UTC (permalink / raw)
  To: netdev

Does anyone still rely on the ancient sleeping when the SA is in
acquire state? It is disabled by default since more that five years,
but can cause indefinite task hangs if enabled and the needed state
does not get resolved.

We now queue packets to the policy if the states are not yet resolved
if we are in a code path that can not sleep. We could do this even in
the case we can sleep. As a bonus, we can remove the FLOWI_FLAG_CAN_SLEEP
flag because the only thing this flag does, is to notify xfrm that we are
in a codepath that can sleep.

The two RFC patches to remove the sleeping code are in reply to this
mail. I'd add this to the ipsec-next tree if there are no objections.

^ permalink raw reply

* [PATCH RFC 1/2] xfrm: Remove ancient sleeping when the SA is in acquire state
From: Steffen Klassert @ 2013-10-10  6:33 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20131010063301.GO7660@secunet.com>

We now queue packets to the policy if the states are not yet resolved,
this replaces the ancient sleeping code. Also the sleeping can cause
indefinite task hangs if the needed state does not get resolved.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/netns/xfrm.h |    2 --
 net/key/af_key.c         |    5 ++---
 net/xfrm/xfrm_policy.c   |   21 ++-------------------
 net/xfrm/xfrm_state.c    |   21 +--------------------
 4 files changed, 5 insertions(+), 44 deletions(-)

diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index 5299e69..bda8039 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -33,8 +33,6 @@ struct netns_xfrm {
 	struct hlist_head	state_gc_list;
 	struct work_struct	state_gc_work;
 
-	wait_queue_head_t	km_waitq;
-
 	struct list_head	policy_all;
 	struct hlist_head	*policy_byidx;
 	unsigned int		policy_idx_hmask;
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 9d58537..239289d 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1379,10 +1379,9 @@ static int pfkey_acquire(struct sock *sk, struct sk_buff *skb, const struct sadb
 		return 0;
 
 	spin_lock_bh(&x->lock);
-	if (x->km.state == XFRM_STATE_ACQ) {
+	if (x->km.state == XFRM_STATE_ACQ)
 		x->km.state = XFRM_STATE_ERROR;
-		wake_up(&net->xfrm.km_waitq);
-	}
+
 	spin_unlock_bh(&x->lock);
 	xfrm_state_put(x);
 	return 0;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index ed38d5d..b885c76 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1875,8 +1875,7 @@ static struct xfrm_dst *xfrm_create_dummy_bundle(struct net *net,
 	if (IS_ERR(xdst))
 		return xdst;
 
-	if (net->xfrm.sysctl_larval_drop || num_xfrms <= 0 ||
-	    (fl->flowi_flags & FLOWI_FLAG_CAN_SLEEP))
+	if (net->xfrm.sysctl_larval_drop || num_xfrms <= 0)
 		return xdst;
 
 	dst1 = &xdst->u.dst;
@@ -2051,7 +2050,6 @@ struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
 	u8 dir = policy_to_flow_dir(XFRM_POLICY_OUT);
 	int i, err, num_pols, num_xfrms = 0, drop_pols = 0;
 
-restart:
 	dst = NULL;
 	xdst = NULL;
 	route = NULL;
@@ -2131,23 +2129,8 @@ restart:
 
 			return make_blackhole(net, family, dst_orig);
 		}
-		if (fl->flowi_flags & FLOWI_FLAG_CAN_SLEEP) {
-			DECLARE_WAITQUEUE(wait, current);
 
-			add_wait_queue(&net->xfrm.km_waitq, &wait);
-			set_current_state(TASK_INTERRUPTIBLE);
-			schedule();
-			set_current_state(TASK_RUNNING);
-			remove_wait_queue(&net->xfrm.km_waitq, &wait);
-
-			if (!signal_pending(current)) {
-				dst_release(dst);
-				goto restart;
-			}
-
-			err = -ERESTART;
-		} else
-			err = -EAGAIN;
+		err = -EAGAIN;
 
 		XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTNOSTATES);
 		goto error;
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index b9c3f9e..14518eb 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -374,8 +374,6 @@ static void xfrm_state_gc_task(struct work_struct *work)
 
 	hlist_for_each_entry_safe(x, tmp, &gc_list, gclist)
 		xfrm_state_gc_destroy(x);
-
-	wake_up(&net->xfrm.km_waitq);
 }
 
 static inline unsigned long make_jiffies(long secs)
@@ -390,7 +388,6 @@ static enum hrtimer_restart xfrm_timer_handler(struct hrtimer * me)
 {
 	struct tasklet_hrtimer *thr = container_of(me, struct tasklet_hrtimer, timer);
 	struct xfrm_state *x = container_of(thr, struct xfrm_state, mtimer);
-	struct net *net = xs_net(x);
 	unsigned long now = get_seconds();
 	long next = LONG_MAX;
 	int warn = 0;
@@ -460,12 +457,8 @@ resched:
 	goto out;
 
 expired:
-	if (x->km.state == XFRM_STATE_ACQ && x->id.spi == 0) {
+	if (x->km.state == XFRM_STATE_ACQ && x->id.spi == 0)
 		x->km.state = XFRM_STATE_EXPIRED;
-		wake_up(&net->xfrm.km_waitq);
-		next = 2;
-		goto resched;
-	}
 
 	err = __xfrm_state_delete(x);
 	if (!err && x->id.spi)
@@ -637,7 +630,6 @@ restart:
 
 out:
 	spin_unlock_bh(&xfrm_state_lock);
-	wake_up(&net->xfrm.km_waitq);
 	return err;
 }
 EXPORT_SYMBOL(xfrm_state_flush);
@@ -950,8 +942,6 @@ static void __xfrm_state_insert(struct xfrm_state *x)
 	if (x->replay_maxage)
 		mod_timer(&x->rtimer, jiffies + x->replay_maxage);
 
-	wake_up(&net->xfrm.km_waitq);
-
 	net->xfrm.state_num++;
 
 	xfrm_hash_grow_check(net, x->bydst.next != NULL);
@@ -1655,16 +1645,12 @@ EXPORT_SYMBOL(km_state_notify);
 
 void km_state_expired(struct xfrm_state *x, int hard, u32 portid)
 {
-	struct net *net = xs_net(x);
 	struct km_event c;
 
 	c.data.hard = hard;
 	c.portid = portid;
 	c.event = XFRM_MSG_EXPIRE;
 	km_state_notify(x, &c);
-
-	if (hard)
-		wake_up(&net->xfrm.km_waitq);
 }
 
 EXPORT_SYMBOL(km_state_expired);
@@ -1707,16 +1693,12 @@ EXPORT_SYMBOL(km_new_mapping);
 
 void km_policy_expired(struct xfrm_policy *pol, int dir, int hard, u32 portid)
 {
-	struct net *net = xp_net(pol);
 	struct km_event c;
 
 	c.data.hard = hard;
 	c.portid = portid;
 	c.event = XFRM_MSG_POLEXPIRE;
 	km_policy_notify(pol, dir, &c);
-
-	if (hard)
-		wake_up(&net->xfrm.km_waitq);
 }
 EXPORT_SYMBOL(km_policy_expired);
 
@@ -2025,7 +2007,6 @@ int __net_init xfrm_state_init(struct net *net)
 	INIT_WORK(&net->xfrm.state_hash_work, xfrm_hash_resize);
 	INIT_HLIST_HEAD(&net->xfrm.state_gc_list);
 	INIT_WORK(&net->xfrm.state_gc_work, xfrm_state_gc_task);
-	init_waitqueue_head(&net->xfrm.km_waitq);
 	return 0;
 
 out_byspi:
-- 
1.7.9.5

^ permalink raw reply related

* [PATCH RFC 2/2] net: Remove FLOWI_FLAG_CAN_SLEEP
From: Steffen Klassert @ 2013-10-10  6:34 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20131010063301.GO7660@secunet.com>

FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the posibility
to sleep until the needed states are resolved. This code is gone,
so FLOWI_FLAG_CAN_SLEEP is not needed anymore.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 include/net/flow.h               |    3 +--
 include/net/ipv6.h               |    6 ++----
 include/net/route.h              |    8 +++-----
 net/dccp/ipv4.c                  |    2 +-
 net/dccp/ipv6.c                  |    8 ++++----
 net/decnet/dn_route.c            |    2 --
 net/ipv4/af_inet.c               |    2 +-
 net/ipv4/datagram.c              |    2 +-
 net/ipv4/raw.c                   |    2 +-
 net/ipv4/tcp_ipv4.c              |    2 +-
 net/ipv4/udp.c                   |    2 +-
 net/ipv6/af_inet6.c              |    2 +-
 net/ipv6/datagram.c              |    2 +-
 net/ipv6/inet6_connection_sock.c |    4 ++--
 net/ipv6/ip6_output.c            |   12 ++----------
 net/ipv6/ping.c                  |    2 +-
 net/ipv6/raw.c                   |    2 +-
 net/ipv6/syncookies.c            |    2 +-
 net/ipv6/tcp_ipv6.c              |    4 ++--
 net/ipv6/udp.c                   |    2 +-
 net/l2tp/l2tp_ip6.c              |    2 +-
 net/sctp/ipv6.c                  |    4 ++--
 22 files changed, 31 insertions(+), 46 deletions(-)

diff --git a/include/net/flow.h b/include/net/flow.h
index 65ce471..d23e7fa 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -20,8 +20,7 @@ struct flowi_common {
 	__u8	flowic_proto;
 	__u8	flowic_flags;
 #define FLOWI_FLAG_ANYSRC		0x01
-#define FLOWI_FLAG_CAN_SLEEP		0x02
-#define FLOWI_FLAG_KNOWN_NH		0x04
+#define FLOWI_FLAG_KNOWN_NH		0x02
 	__u32	flowic_secid;
 };
 
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index fe1c7f6..4f91706 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -710,11 +710,9 @@ void ip6_flush_pending_frames(struct sock *sk);
 
 int ip6_dst_lookup(struct sock *sk, struct dst_entry **dst, struct flowi6 *fl6);
 struct dst_entry *ip6_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
-				      const struct in6_addr *final_dst,
-				      bool can_sleep);
+				      const struct in6_addr *final_dst);
 struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
-					 const struct in6_addr *final_dst,
-					 bool can_sleep);
+					 const struct in6_addr *final_dst);
 struct dst_entry *ip6_blackhole_route(struct net *net,
 				      struct dst_entry *orig_dst);
 
diff --git a/include/net/route.h b/include/net/route.h
index 0ad8e01..a25fcca 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -247,14 +247,12 @@ static inline char rt_tos2priority(u8 tos)
 static inline void ip_route_connect_init(struct flowi4 *fl4, __be32 dst, __be32 src,
 					 u32 tos, int oif, u8 protocol,
 					 __be16 sport, __be16 dport,
-					 struct sock *sk, bool can_sleep)
+					 struct sock *sk)
 {
 	__u8 flow_flags = 0;
 
 	if (inet_sk(sk)->transparent)
 		flow_flags |= FLOWI_FLAG_ANYSRC;
-	if (can_sleep)
-		flow_flags |= FLOWI_FLAG_CAN_SLEEP;
 
 	flowi4_init_output(fl4, oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE,
 			   protocol, flow_flags, dst, src, dport, sport);
@@ -264,13 +262,13 @@ static inline struct rtable *ip_route_connect(struct flowi4 *fl4,
 					      __be32 dst, __be32 src, u32 tos,
 					      int oif, u8 protocol,
 					      __be16 sport, __be16 dport,
-					      struct sock *sk, bool can_sleep)
+					      struct sock *sk)
 {
 	struct net *net = sock_net(sk);
 	struct rtable *rt;
 
 	ip_route_connect_init(fl4, dst, src, tos, oif, protocol,
-			      sport, dport, sk, can_sleep);
+			      sport, dport, sk);
 
 	if (!dst || !src) {
 		rt = __ip_route_output_key(net, fl4);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index ebc54fe..751637a 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -75,7 +75,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	rt = ip_route_connect(fl4, nexthop, inet->inet_saddr,
 			      RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
 			      IPPROTO_DCCP,
-			      orig_sport, orig_dport, sk, true);
+			      orig_sport, orig_dport, sk);
 	if (IS_ERR(rt))
 		return PTR_ERR(rt);
 
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 6cf9f77..3d2574e 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -237,7 +237,7 @@ static int dccp_v6_send_response(struct sock *sk, struct request_sock *req)
 
 	final_p = fl6_update_dst(&fl6, np->opt, &final);
 
-	dst = ip6_dst_lookup_flow(sk, &fl6, final_p, false);
+	dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 	if (IS_ERR(dst)) {
 		err = PTR_ERR(dst);
 		dst = NULL;
@@ -302,7 +302,7 @@ static void dccp_v6_ctl_send_reset(struct sock *sk, struct sk_buff *rxskb)
 	security_skb_classify_flow(rxskb, flowi6_to_flowi(&fl6));
 
 	/* sk = NULL, but it is safe for now. RST socket required. */
-	dst = ip6_dst_lookup_flow(ctl_sk, &fl6, NULL, false);
+	dst = ip6_dst_lookup_flow(ctl_sk, &fl6, NULL);
 	if (!IS_ERR(dst)) {
 		skb_dst_set(skb, dst);
 		ip6_xmit(ctl_sk, skb, &fl6, NULL, 0);
@@ -513,7 +513,7 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk,
 		fl6.fl6_sport = inet_rsk(req)->loc_port;
 		security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
 
-		dst = ip6_dst_lookup_flow(sk, &fl6, final_p, false);
+		dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 		if (IS_ERR(dst))
 			goto out;
 	}
@@ -933,7 +933,7 @@ static int dccp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 
 	final_p = fl6_update_dst(&fl6, np->opt, &final);
 
-	dst = ip6_dst_lookup_flow(sk, &fl6, final_p, true);
+	dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 	if (IS_ERR(dst)) {
 		err = PTR_ERR(dst);
 		goto failure;
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index fe32388..ad2efa5 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1288,8 +1288,6 @@ int dn_route_output_sock(struct dst_entry __rcu **pprt, struct flowidn *fl, stru
 
 	err = __dn_route_output_key(pprt, fl, flags & MSG_TRYHARD);
 	if (err == 0 && fl->flowidn_proto) {
-		if (!(flags & MSG_DONTWAIT))
-			fl->flowidn_flags |= FLOWI_FLAG_CAN_SLEEP;
 		*pprt = xfrm_lookup(&init_net, *pprt,
 				    flowidn_to_flowi(fl), sk, 0);
 		if (IS_ERR(*pprt)) {
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 7a1874b..c3bf51e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1162,7 +1162,7 @@ static int inet_sk_reselect_saddr(struct sock *sk)
 	fl4 = &inet->cork.fl.u.ip4;
 	rt = ip_route_connect(fl4, daddr, 0, RT_CONN_FLAGS(sk),
 			      sk->sk_bound_dev_if, sk->sk_protocol,
-			      inet->inet_sport, inet->inet_dport, sk, false);
+			      inet->inet_sport, inet->inet_dport, sk);
 	if (IS_ERR(rt))
 		return PTR_ERR(rt);
 
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index b28e863..dcd4148 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -53,7 +53,7 @@ int ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	rt = ip_route_connect(fl4, usin->sin_addr.s_addr, saddr,
 			      RT_CONN_FLAGS(sk), oif,
 			      sk->sk_protocol,
-			      inet->inet_sport, usin->sin_port, sk, true);
+			      inet->inet_sport, usin->sin_port, sk);
 	if (IS_ERR(rt)) {
 		err = PTR_ERR(rt);
 		if (err == -ENETUNREACH)
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index a3fe534..6fd4d0c 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -573,7 +573,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	flowi4_init_output(&fl4, ipc.oif, sk->sk_mark, tos,
 			   RT_SCOPE_UNIVERSE,
 			   inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol,
-			   inet_sk_flowi_flags(sk) | FLOWI_FLAG_CAN_SLEEP |
+			   inet_sk_flowi_flags(sk) |
 			    (inet->hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
 			   daddr, saddr, 0, 0);
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index b14266b..6959be9 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -173,7 +173,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
 	rt = ip_route_connect(fl4, nexthop, inet->inet_saddr,
 			      RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
 			      IPPROTO_TCP,
-			      orig_sport, orig_dport, sk, true);
+			      orig_sport, orig_dport, sk);
 	if (IS_ERR(rt)) {
 		err = PTR_ERR(rt);
 		if (err == -ENETUNREACH)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 22462d94..70a9e28 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -966,7 +966,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		fl4 = &fl4_stack;
 		flowi4_init_output(fl4, ipc.oif, sk->sk_mark, tos,
 				   RT_SCOPE_UNIVERSE, sk->sk_protocol,
-				   inet_sk_flowi_flags(sk)|FLOWI_FLAG_CAN_SLEEP,
+				   inet_sk_flowi_flags(sk),
 				   faddr, saddr, dport, inet->inet_sport);
 
 		security_sk_classify_flow(sk, flowi4_to_flowi(fl4));
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 4966b12..cd3c181 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -666,7 +666,7 @@ int inet6_sk_rebuild_header(struct sock *sk)
 
 		final_p = fl6_update_dst(&fl6, np->opt, &final);
 
-		dst = ip6_dst_lookup_flow(sk, &fl6, final_p, false);
+		dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 		if (IS_ERR(dst)) {
 			sk->sk_route_caps = 0;
 			sk->sk_err_soft = -PTR_ERR(dst);
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 48b6bd2..0bc3297 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -171,7 +171,7 @@ ipv4_connected:
 	opt = flowlabel ? flowlabel->opt : np->opt;
 	final_p = fl6_update_dst(&fl6, opt, &final);
 
-	dst = ip6_dst_lookup_flow(sk, &fl6, final_p, true);
+	dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 	err = 0;
 	if (IS_ERR(dst)) {
 		err = PTR_ERR(dst);
diff --git a/net/ipv6/inet6_connection_sock.c b/net/ipv6/inet6_connection_sock.c
index e4311cb..b65ca8c 100644
--- a/net/ipv6/inet6_connection_sock.c
+++ b/net/ipv6/inet6_connection_sock.c
@@ -86,7 +86,7 @@ struct dst_entry *inet6_csk_route_req(struct sock *sk,
 	fl6->fl6_sport = inet_rsk(req)->loc_port;
 	security_req_classify_flow(req, flowi6_to_flowi(fl6));
 
-	dst = ip6_dst_lookup_flow(sk, fl6, final_p, false);
+	dst = ip6_dst_lookup_flow(sk, fl6, final_p);
 	if (IS_ERR(dst))
 		return NULL;
 
@@ -217,7 +217,7 @@ static struct dst_entry *inet6_csk_route_socket(struct sock *sk,
 
 	dst = __inet6_csk_dst_check(sk, np->dst_cookie);
 	if (!dst) {
-		dst = ip6_dst_lookup_flow(sk, fl6, final_p, false);
+		dst = ip6_dst_lookup_flow(sk, fl6, final_p);
 
 		if (!IS_ERR(dst))
 			__inet6_csk_dst_store(sk, dst, NULL, NULL);
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 3a692d5..7a0d3c4 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -937,7 +937,6 @@ EXPORT_SYMBOL_GPL(ip6_dst_lookup);
  *	@sk: socket which provides route info
  *	@fl6: flow to lookup
  *	@final_dst: final destination address for ipsec lookup
- *	@can_sleep: we are in a sleepable context
  *
  *	This function performs a route lookup on the given flow.
  *
@@ -945,8 +944,7 @@ EXPORT_SYMBOL_GPL(ip6_dst_lookup);
  *	error code.
  */
 struct dst_entry *ip6_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
-				      const struct in6_addr *final_dst,
-				      bool can_sleep)
+				      const struct in6_addr *final_dst)
 {
 	struct dst_entry *dst = NULL;
 	int err;
@@ -956,8 +954,6 @@ struct dst_entry *ip6_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
 		return ERR_PTR(err);
 	if (final_dst)
 		fl6->daddr = *final_dst;
-	if (can_sleep)
-		fl6->flowi6_flags |= FLOWI_FLAG_CAN_SLEEP;
 
 	return xfrm_lookup(sock_net(sk), dst, flowi6_to_flowi(fl6), sk, 0);
 }
@@ -968,7 +964,6 @@ EXPORT_SYMBOL_GPL(ip6_dst_lookup_flow);
  *	@sk: socket which provides the dst cache and route info
  *	@fl6: flow to lookup
  *	@final_dst: final destination address for ipsec lookup
- *	@can_sleep: we are in a sleepable context
  *
  *	This function performs a route lookup on the given flow with the
  *	possibility of using the cached route in the socket if it is valid.
@@ -979,8 +974,7 @@ EXPORT_SYMBOL_GPL(ip6_dst_lookup_flow);
  *	error code.
  */
 struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
-					 const struct in6_addr *final_dst,
-					 bool can_sleep)
+					 const struct in6_addr *final_dst)
 {
 	struct dst_entry *dst = sk_dst_check(sk, inet6_sk(sk)->dst_cookie);
 	int err;
@@ -992,8 +986,6 @@ struct dst_entry *ip6_sk_dst_lookup_flow(struct sock *sk, struct flowi6 *fl6,
 		return ERR_PTR(err);
 	if (final_dst)
 		fl6->daddr = *final_dst;
-	if (can_sleep)
-		fl6->flowi6_flags |= FLOWI_FLAG_CAN_SLEEP;
 
 	return xfrm_lookup(sock_net(sk), dst, flowi6_to_flowi(fl6), sk, 0);
 }
diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
index 18f19df..4ec91bb 100644
--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -144,7 +144,7 @@ int ping_v6_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	else if (!fl6.flowi6_oif)
 		fl6.flowi6_oif = np->ucast_oif;
 
-	dst = ip6_sk_dst_lookup_flow(sk, &fl6,  daddr, 1);
+	dst = ip6_sk_dst_lookup_flow(sk, &fl6,  daddr);
 	if (IS_ERR(dst))
 		return PTR_ERR(dst);
 	rt = (struct rt6_info *) dst;
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 58916bb..c74425d 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -866,7 +866,7 @@ static int rawv6_sendmsg(struct kiocb *iocb, struct sock *sk,
 		fl6.flowi6_oif = np->ucast_oif;
 	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
 
-	dst = ip6_dst_lookup_flow(sk, &fl6, final_p, true);
+	dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 	if (IS_ERR(dst)) {
 		err = PTR_ERR(dst);
 		goto out;
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index d703218..b0fbbfc 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -243,7 +243,7 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 		fl6.fl6_sport = inet_sk(sk)->inet_sport;
 		security_req_classify_flow(req, flowi6_to_flowi(&fl6));
 
-		dst = ip6_dst_lookup_flow(sk, &fl6, final_p, false);
+		dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 		if (IS_ERR(dst))
 			goto out_free;
 	}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 5c71501..b762603 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -258,7 +258,7 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 
 	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
 
-	dst = ip6_dst_lookup_flow(sk, &fl6, final_p, true);
+	dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 	if (IS_ERR(dst)) {
 		err = PTR_ERR(dst);
 		goto failure;
@@ -800,7 +800,7 @@ static void tcp_v6_send_response(struct sk_buff *skb, u32 seq, u32 ack, u32 win,
 	 * Underlying function will use this to retrieve the network
 	 * namespace
 	 */
-	dst = ip6_dst_lookup_flow(ctl_sk, &fl6, NULL, false);
+	dst = ip6_dst_lookup_flow(ctl_sk, &fl6, NULL);
 	if (!IS_ERR(dst)) {
 		skb_dst_set(buff, dst);
 		ip6_xmit(ctl_sk, buff, &fl6, NULL, tclass);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index f405815..a20b562 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1204,7 +1204,7 @@ do_udp_sendmsg:
 
 	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
 
-	dst = ip6_sk_dst_lookup_flow(sk, &fl6, final_p, true);
+	dst = ip6_sk_dst_lookup_flow(sk, &fl6, final_p);
 	if (IS_ERR(dst)) {
 		err = PTR_ERR(dst);
 		dst = NULL;
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index b8a6039..d1149aa 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -598,7 +598,7 @@ static int l2tp_ip6_sendmsg(struct kiocb *iocb, struct sock *sk,
 
 	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
 
-	dst = ip6_dst_lookup_flow(sk, &fl6, final_p, true);
+	dst = ip6_dst_lookup_flow(sk, &fl6, final_p);
 	if (IS_ERR(dst)) {
 		err = PTR_ERR(dst);
 		goto out;
diff --git a/net/sctp/ipv6.c b/net/sctp/ipv6.c
index e7b2d4f..2b3d523 100644
--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -263,7 +263,7 @@ static void sctp_v6_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
 	}
 
 	final_p = fl6_update_dst(fl6, np->opt, &final);
-	dst = ip6_dst_lookup_flow(sk, fl6, final_p, false);
+	dst = ip6_dst_lookup_flow(sk, fl6, final_p);
 	if (!asoc || saddr)
 		goto out;
 
@@ -320,7 +320,7 @@ static void sctp_v6_get_dst(struct sctp_transport *t, union sctp_addr *saddr,
 		fl6->saddr = baddr->v6.sin6_addr;
 		fl6->fl6_sport = baddr->v6.sin6_port;
 		final_p = fl6_update_dst(fl6, np->opt, &final);
-		dst = ip6_dst_lookup_flow(sk, fl6, final_p, false);
+		dst = ip6_dst_lookup_flow(sk, fl6, final_p);
 	}
 
 out:
-- 
1.7.9.5

^ permalink raw reply related

* [net-next 00/13][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2013-10-10  6:40 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains updates to i40e only.

Alex provides the majority of the patches against i40e, where he does
cleanup of the Tx and RX queues and to align the code with the known
good Tx/Rx queue code in the ixgbe driver.

Anjali provides an i40e patch to update link events to not print to
the log until the device is administratively up.

Catherine provides a patch to update the driver version.

The following are changes since commit b68656b22fd7a3e03087c11b2b45c15c0b328609:
  be2net: change the driver version number to 4.9.224.0
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Alexander Duyck (11):
  i40e: Drop unused completed stat
  i40e: Cleanup Tx buffer info layout
  i40e: Do not directly increment Tx next_to_use
  i40e: clean up Tx fast path
  i40e: Drop dead code and flags from Tx hotpath
  i40e: Add support for Tx byte queue limits
  i40e: Split bytes and packets from Rx/Tx stats
  i40e: Move q_vectors from pointer to array to array of pointers
  i40e: Replace ring container array with linked list
  i40e: Move rings from pointer to array to array of pointers
  i40e: Add support for 64 bit netstats

Anjali Singhai (1):
  i40e: Link code updates

Catherine Sullivan (1):
  i40e: Bump version

 drivers/net/ethernet/intel/i40e/i40e.h         |  11 +-
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 207 ++++++------
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  69 ++--
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 442 ++++++++++++++++---------
 drivers/net/ethernet/intel/i40e/i40e_txrx.c    | 358 ++++++++++----------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h    |  35 +-
 6 files changed, 650 insertions(+), 472 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* [net-next 02/13] i40e: Drop unused completed stat
From: Jeff Kirsher @ 2013-10-10  6:41 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, Jesse Brandeburg,
	Jeff Kirsher
In-Reply-To: <1381387271-29605-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

The Tx "completed" stat was part of the original rewrite for detecting
Tx hangs.  However some time ago in ixgbe I determined that we could
just use the packets stat instead.  Since then this stat was
removed from ixgbe and it serves no purpose in i40e so it can be
dropped.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 3 +--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c    | 3 +--
 drivers/net/ethernet/intel/i40e/i40e_txrx.h    | 1 -
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 8dbd91f..0b01c61 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -560,10 +560,9 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 				 vsi->tx_rings[i].tx_stats.bytes,
 				 vsi->tx_rings[i].tx_stats.restart_queue);
 			dev_info(&pf->pdev->dev,
-				 "    tx_rings[%i]: tx_stats: tx_busy = %lld, completed = %lld, tx_done_old = %lld\n",
+				 "    tx_rings[%i]: tx_stats: tx_busy = %lld, tx_done_old = %lld\n",
 				 i,
 				 vsi->tx_rings[i].tx_stats.tx_busy,
-				 vsi->tx_rings[i].tx_stats.completed,
 				 vsi->tx_rings[i].tx_stats.tx_done_old);
 			dev_info(&pf->pdev->dev,
 				 "    tx_rings[%i]: size = %i, dma = 0x%08lx\n",
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 49d2cfa..32c9aeb 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -346,8 +346,7 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
 		      cpu_to_le64(I40E_TX_DESC_DTYPE_DESC_DONE)))
 			break;
 
-		/* count the packet as being completed */
-		tx_ring->tx_stats.completed++;
+		/* clear next_to_watch to prevent false hangs */
 		tx_buf->next_to_watch = NULL;
 		tx_buf->time_stamp = 0;
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index b1d7722..2fbacaf 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -134,7 +134,6 @@ struct i40e_tx_queue_stats {
 	u64 bytes;
 	u64 restart_queue;
 	u64 tx_busy;
-	u64 completed;
 	u64 tx_done_old;
 };
 
-- 
1.8.3.1

^ permalink raw reply related

* [net-next 01/13] i40e: Link code updates
From: Jeff Kirsher @ 2013-10-10  6:40 UTC (permalink / raw)
  To: davem
  Cc: Anjali Singhai, netdev, gospo, sassmann, Jesse Brandeburg,
	Jeff Kirsher
In-Reply-To: <1381387271-29605-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Anjali Singhai <anjali.singhai@intel.com>

Link events should not print to the log until the device is
administratively up.

Signed-off-by: Anjali Singhai <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 221aa47..657babe 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3703,8 +3703,11 @@ static int i40e_up_complete(struct i40e_vsi *vsi)
 
 	if ((pf->hw.phy.link_info.link_info & I40E_AQ_LINK_UP) &&
 	    (vsi->netdev)) {
+		netdev_info(vsi->netdev, "NIC Link is Up\n");
 		netif_tx_start_all_queues(vsi->netdev);
 		netif_carrier_on(vsi->netdev);
+	} else if (vsi->netdev) {
+		netdev_info(vsi->netdev, "NIC Link is Down\n");
 	}
 	i40e_service_event_schedule(pf);
 
@@ -4153,8 +4156,9 @@ static void i40e_link_event(struct i40e_pf *pf)
 	if (new_link == old_link)
 		return;
 
-	netdev_info(pf->vsi[pf->lan_vsi]->netdev,
-		    "NIC Link is %s\n", (new_link ? "Up" : "Down"));
+	if (!test_bit(__I40E_DOWN, &pf->vsi[pf->lan_vsi]->state))
+		netdev_info(pf->vsi[pf->lan_vsi]->netdev,
+			    "NIC Link is %s\n", (new_link ? "Up" : "Down"));
 
 	/* Notify the base of the switch tree connected to
 	 * the link.  Floating VEBs are not notified.
-- 
1.8.3.1

^ permalink raw reply related

* [net-next 03/13] i40e: Cleanup Tx buffer info layout
From: Jeff Kirsher @ 2013-10-10  6:41 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, Jesse Brandeburg,
	Jeff Kirsher
In-Reply-To: <1381387271-29605-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

- drop the mapped_as_page u8 from the Tx buffer info as it was unused
- use the DMA unmap accessors for Tx DMA
- replace checks of DMA with checks of the unmap length to verify if an
  unmap is needed
- update the Tx buffer layout to make it consistent with igb, ixgbe

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 16 ++++++++--------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h | 13 ++++++-------
 2 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 32c9aeb..3bc3efa 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -195,20 +195,20 @@ static void i40e_fd_handle_status(struct i40e_ring *rx_ring, u32 qw, u8 prog_id)
 static inline void i40e_unmap_tx_resource(struct i40e_ring *ring,
 					  struct i40e_tx_buffer *tx_buffer)
 {
-	if (tx_buffer->dma) {
+	if (dma_unmap_len(tx_buffer, len)) {
 		if (tx_buffer->tx_flags & I40E_TX_FLAGS_MAPPED_AS_PAGE)
 			dma_unmap_page(ring->dev,
-				       tx_buffer->dma,
-				       tx_buffer->length,
+				       dma_unmap_addr(tx_buffer, dma),
+				       dma_unmap_len(tx_buffer, len),
 				       DMA_TO_DEVICE);
 		else
 			dma_unmap_single(ring->dev,
-					 tx_buffer->dma,
-					 tx_buffer->length,
+					 dma_unmap_addr(tx_buffer, dma),
+					 dma_unmap_len(tx_buffer, len),
 					 DMA_TO_DEVICE);
 	}
-	tx_buffer->dma = 0;
 	tx_buffer->time_stamp = 0;
+	dma_unmap_len_set(tx_buffer, len, 0);
 }
 
 /**
@@ -1554,9 +1554,9 @@ static void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 		}
 
 		tx_bi = &tx_ring->tx_bi[i];
-		tx_bi->length = buf_offset + size;
+		dma_unmap_len_set(tx_bi, len, buf_offset + size);
+		dma_unmap_addr_set(tx_bi, dma, dma);
 		tx_bi->tx_flags = tx_flags;
-		tx_bi->dma = dma;
 
 		tx_desc->buffer_addr = cpu_to_le64(dma + buf_offset);
 		tx_desc->cmd_type_offset_bsz = build_ctob(td_cmd, td_offset,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 2fbacaf..711f549 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -110,15 +110,14 @@
 #define I40E_TX_FLAGS_VLAN_SHIFT	16
 
 struct i40e_tx_buffer {
-	struct sk_buff *skb;
-	dma_addr_t dma;
-	unsigned long time_stamp;
-	u16 length;
-	u32 tx_flags;
 	struct i40e_tx_desc *next_to_watch;
+	unsigned long time_stamp;
+	struct sk_buff *skb;
 	unsigned int bytecount;
-	u16 gso_segs;
-	u8 mapped_as_page;
+	unsigned short gso_segs;
+	DEFINE_DMA_UNMAP_ADDR(dma);
+	DEFINE_DMA_UNMAP_LEN(len);
+	u32 tx_flags;
 };
 
 struct i40e_rx_buffer {
-- 
1.8.3.1

^ permalink raw reply related

* [net-next 04/13] i40e: Do not directly increment Tx next_to_use
From: Jeff Kirsher @ 2013-10-10  6:41 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, Jesse Brandeburg,
	Jeff Kirsher
In-Reply-To: <1381387271-29605-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Avoid directly incrementing next_to_use for multiple reasons.  The main
reason being that if we directly increment it then it can attain a state
where it is equal to the ring count.  Technically this is a state it
should not be able to reach but the way this is written it now can.

This patch pulls the value off into a register and then increments it
and writes back either the value or 0 depending on if the value is equal
to the ring count.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 46 ++++++++++++++++-------------
 1 file changed, 25 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 3bc3efa..e8db4dc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -73,11 +73,12 @@ int i40e_program_fdir_filter(struct i40e_fdir_data *fdir_data,
 		goto dma_fail;
 
 	/* grab the next descriptor */
-	fdir_desc = I40E_TX_FDIRDESC(tx_ring, tx_ring->next_to_use);
-	tx_buf = &tx_ring->tx_bi[tx_ring->next_to_use];
-	tx_ring->next_to_use++;
-	if (tx_ring->next_to_use == tx_ring->count)
-		tx_ring->next_to_use = 0;
+	i = tx_ring->next_to_use;
+	fdir_desc = I40E_TX_FDIRDESC(tx_ring, i);
+	tx_buf = &tx_ring->tx_bi[i];
+
+	i++;
+	tx_ring->next_to_use = (i < tx_ring->count) ? i : 0;
 
 	fdir_desc->qindex_flex_ptype_vsi = cpu_to_le32((fdir_data->q_index
 					     << I40E_TXD_FLTR_QW0_QINDEX_SHIFT)
@@ -134,11 +135,11 @@ int i40e_program_fdir_filter(struct i40e_fdir_data *fdir_data,
 	fdir_desc->fd_id = cpu_to_le32(fdir_data->fd_id);
 
 	/* Now program a dummy descriptor */
-	tx_desc = I40E_TX_DESC(tx_ring, tx_ring->next_to_use);
-	tx_buf = &tx_ring->tx_bi[tx_ring->next_to_use];
-	tx_ring->next_to_use++;
-	if (tx_ring->next_to_use == tx_ring->count)
-		tx_ring->next_to_use = 0;
+	i = tx_ring->next_to_use;
+	tx_desc = I40E_TX_DESC(tx_ring, i);
+
+	i++;
+	tx_ring->next_to_use = (i < tx_ring->count) ? i : 0;
 
 	tx_desc->buffer_addr = cpu_to_le64(dma);
 	td_cmd = I40E_TX_DESC_CMD_EOP |
@@ -148,9 +149,6 @@ int i40e_program_fdir_filter(struct i40e_fdir_data *fdir_data,
 	tx_desc->cmd_type_offset_bsz =
 		build_ctob(td_cmd, 0, I40E_FDIR_MAX_RAW_PACKET_LOOKUP, 0);
 
-	/* Mark the data descriptor to be watched */
-	tx_buf->next_to_watch = tx_desc;
-
 	/* Force memory writes to complete before letting h/w
 	 * know there are new descriptors to fetch.  (Only
 	 * applicable for weak-ordered memory model archs,
@@ -158,6 +156,9 @@ int i40e_program_fdir_filter(struct i40e_fdir_data *fdir_data,
 	 */
 	wmb();
 
+	/* Mark the data descriptor to be watched */
+	tx_buf->next_to_watch = tx_desc;
+
 	writel(tx_ring->next_to_use, tx_ring->tail);
 	return 0;
 
@@ -1143,6 +1144,7 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	struct tcphdr *th;
 	unsigned int hlen;
 	u32 flex_ptype, dtype_cmd;
+	u16 i;
 
 	/* make sure ATR is enabled */
 	if (!(pf->flags & I40E_FLAG_FDIR_ATR_ENABLED))
@@ -1182,10 +1184,11 @@ static void i40e_atr(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	tx_ring->atr_count = 0;
 
 	/* grab the next descriptor */
-	fdir_desc = I40E_TX_FDIRDESC(tx_ring, tx_ring->next_to_use);
-	tx_ring->next_to_use++;
-	if (tx_ring->next_to_use == tx_ring->count)
-		tx_ring->next_to_use = 0;
+	i = tx_ring->next_to_use;
+	fdir_desc = I40E_TX_FDIRDESC(tx_ring, i);
+
+	i++;
+	tx_ring->next_to_use = (i < tx_ring->count) ? i : 0;
 
 	flex_ptype = (tx_ring->queue_index << I40E_TXD_FLTR_QW0_QINDEX_SHIFT) &
 		      I40E_TXD_FLTR_QW0_QINDEX_MASK;
@@ -1481,15 +1484,16 @@ static void i40e_create_tx_ctx(struct i40e_ring *tx_ring,
 			       const u32 cd_tunneling, const u32 cd_l2tag2)
 {
 	struct i40e_tx_context_desc *context_desc;
+	int i = tx_ring->next_to_use;
 
 	if (!cd_type_cmd_tso_mss && !cd_tunneling && !cd_l2tag2)
 		return;
 
 	/* grab the next descriptor */
-	context_desc = I40E_TX_CTXTDESC(tx_ring, tx_ring->next_to_use);
-	tx_ring->next_to_use++;
-	if (tx_ring->next_to_use == tx_ring->count)
-		tx_ring->next_to_use = 0;
+	context_desc = I40E_TX_CTXTDESC(tx_ring, i);
+
+	i++;
+	tx_ring->next_to_use = (i < tx_ring->count) ? i : 0;
 
 	/* cpu_to_le32 and assign to struct fields */
 	context_desc->tunneling_params = cpu_to_le32(cd_tunneling);
-- 
1.8.3.1

^ permalink raw reply related

* [net-next 05/13] i40e: clean up Tx fast path
From: Jeff Kirsher @ 2013-10-10  6:41 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, Jesse Brandeburg,
	Jeff Kirsher
In-Reply-To: <1381387271-29605-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Sync the fast path for i40e_tx_map and i40e_clean_tx_irq so that they
are similar to igb and ixgbe.

- Only update the Tx descriptor ring in tx_map
- Make skb mapping always on the first buffer in the chain
- Drop the use of MAPPED_AS_PAGE Tx flag
- Only store flags on the first buffer_info structure

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 220 +++++++++++++++-------------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |   1 -
 2 files changed, 122 insertions(+), 99 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e8db4dc..e96de75 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -189,27 +189,30 @@ static void i40e_fd_handle_status(struct i40e_ring *rx_ring, u32 qw, u8 prog_id)
 }
 
 /**
- * i40e_unmap_tx_resource - Release a Tx buffer
+ * i40e_unmap_and_free_tx_resource - Release a Tx buffer
  * @ring:      the ring that owns the buffer
  * @tx_buffer: the buffer to free
  **/
-static inline void i40e_unmap_tx_resource(struct i40e_ring *ring,
-					  struct i40e_tx_buffer *tx_buffer)
+static void i40e_unmap_and_free_tx_resource(struct i40e_ring *ring,
+					    struct i40e_tx_buffer *tx_buffer)
 {
-	if (dma_unmap_len(tx_buffer, len)) {
-		if (tx_buffer->tx_flags & I40E_TX_FLAGS_MAPPED_AS_PAGE)
-			dma_unmap_page(ring->dev,
-				       dma_unmap_addr(tx_buffer, dma),
-				       dma_unmap_len(tx_buffer, len),
-				       DMA_TO_DEVICE);
-		else
+	if (tx_buffer->skb) {
+		dev_kfree_skb_any(tx_buffer->skb);
+		if (dma_unmap_len(tx_buffer, len))
 			dma_unmap_single(ring->dev,
 					 dma_unmap_addr(tx_buffer, dma),
 					 dma_unmap_len(tx_buffer, len),
 					 DMA_TO_DEVICE);
+	} else if (dma_unmap_len(tx_buffer, len)) {
+		dma_unmap_page(ring->dev,
+			       dma_unmap_addr(tx_buffer, dma),
+			       dma_unmap_len(tx_buffer, len),
+			       DMA_TO_DEVICE);
 	}
-	tx_buffer->time_stamp = 0;
+	tx_buffer->next_to_watch = NULL;
+	tx_buffer->skb = NULL;
 	dma_unmap_len_set(tx_buffer, len, 0);
+	/* tx_buffer must be completely set up in the transmit path */
 }
 
 /**
@@ -218,7 +221,6 @@ static inline void i40e_unmap_tx_resource(struct i40e_ring *ring,
  **/
 void i40e_clean_tx_ring(struct i40e_ring *tx_ring)
 {
-	struct i40e_tx_buffer *tx_buffer;
 	unsigned long bi_size;
 	u16 i;
 
@@ -227,13 +229,8 @@ void i40e_clean_tx_ring(struct i40e_ring *tx_ring)
 		return;
 
 	/* Free all the Tx ring sk_buffs */
-	for (i = 0; i < tx_ring->count; i++) {
-		tx_buffer = &tx_ring->tx_bi[i];
-		i40e_unmap_tx_resource(tx_ring, tx_buffer);
-		if (tx_buffer->skb)
-			dev_kfree_skb_any(tx_buffer->skb);
-		tx_buffer->skb = NULL;
-	}
+	for (i = 0; i < tx_ring->count; i++)
+		i40e_unmap_and_free_tx_resource(tx_ring, &tx_ring->tx_bi[i]);
 
 	bi_size = sizeof(struct i40e_tx_buffer) * tx_ring->count;
 	memset(tx_ring->tx_bi, 0, bi_size);
@@ -332,16 +329,18 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
 
 	tx_buf = &tx_ring->tx_bi[i];
 	tx_desc = I40E_TX_DESC(tx_ring, i);
+	i -= tx_ring->count;
 
-	for (; budget; budget--) {
-		struct i40e_tx_desc *eop_desc;
-
-		eop_desc = tx_buf->next_to_watch;
+	do {
+		struct i40e_tx_desc *eop_desc = tx_buf->next_to_watch;
 
 		/* if next_to_watch is not set then there is no work pending */
 		if (!eop_desc)
 			break;
 
+		/* prevent any other reads prior to eop_desc */
+		read_barrier_depends();
+
 		/* if the descriptor isn't done, no work yet to do */
 		if (!(eop_desc->cmd_type_offset_bsz &
 		      cpu_to_le64(I40E_TX_DESC_DTYPE_DESC_DONE)))
@@ -349,44 +348,67 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
 
 		/* clear next_to_watch to prevent false hangs */
 		tx_buf->next_to_watch = NULL;
-		tx_buf->time_stamp = 0;
 
-		/* set memory barrier before eop_desc is verified */
-		rmb();
+		/* update the statistics for this packet */
+		total_bytes += tx_buf->bytecount;
+		total_packets += tx_buf->gso_segs;
 
-		do {
-			i40e_unmap_tx_resource(tx_ring, tx_buf);
+		/* free the skb */
+		dev_kfree_skb_any(tx_buf->skb);
 
-			/* clear dtype status */
-			tx_desc->cmd_type_offset_bsz &=
-				~cpu_to_le64(I40E_TXD_QW1_DTYPE_MASK);
+		/* unmap skb header data */
+		dma_unmap_single(tx_ring->dev,
+				 dma_unmap_addr(tx_buf, dma),
+				 dma_unmap_len(tx_buf, len),
+				 DMA_TO_DEVICE);
 
-			if (likely(tx_desc == eop_desc)) {
-				eop_desc = NULL;
+		/* clear tx_buffer data */
+		tx_buf->skb = NULL;
+		dma_unmap_len_set(tx_buf, len, 0);
 
-				dev_kfree_skb_any(tx_buf->skb);
-				tx_buf->skb = NULL;
-
-				total_bytes += tx_buf->bytecount;
-				total_packets += tx_buf->gso_segs;
-			}
+		/* unmap remaining buffers */
+		while (tx_desc != eop_desc) {
 
 			tx_buf++;
 			tx_desc++;
 			i++;
-			if (unlikely(i == tx_ring->count)) {
-				i = 0;
+			if (unlikely(!i)) {
+				i -= tx_ring->count;
 				tx_buf = tx_ring->tx_bi;
 				tx_desc = I40E_TX_DESC(tx_ring, 0);
 			}
-		} while (eop_desc);
-	}
 
+			/* unmap any remaining paged data */
+			if (dma_unmap_len(tx_buf, len)) {
+				dma_unmap_page(tx_ring->dev,
+					       dma_unmap_addr(tx_buf, dma),
+					       dma_unmap_len(tx_buf, len),
+					       DMA_TO_DEVICE);
+				dma_unmap_len_set(tx_buf, len, 0);
+			}
+		}
+
+		/* move us one more past the eop_desc for start of next pkt */
+		tx_buf++;
+		tx_desc++;
+		i++;
+		if (unlikely(!i)) {
+			i -= tx_ring->count;
+			tx_buf = tx_ring->tx_bi;
+			tx_desc = I40E_TX_DESC(tx_ring, 0);
+		}
+
+		/* update budget accounting */
+		budget--;
+	} while (likely(budget));
+
+	i += tx_ring->count;
 	tx_ring->next_to_clean = i;
 	tx_ring->tx_stats.bytes += total_bytes;
 	tx_ring->tx_stats.packets += total_packets;
 	tx_ring->q_vector->tx.total_bytes += total_bytes;
 	tx_ring->q_vector->tx.total_packets += total_packets;
+
 	if (check_for_tx_hang(tx_ring) && i40e_check_tx_hang(tx_ring)) {
 		/* schedule immediate reset if we believe we hung */
 		dev_info(tx_ring->dev, "Detected Tx Unit Hang\n"
@@ -1515,68 +1537,71 @@ static void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 			struct i40e_tx_buffer *first, u32 tx_flags,
 			const u8 hdr_len, u32 td_cmd, u32 td_offset)
 {
-	struct skb_frag_struct *frag = &skb_shinfo(skb)->frags[0];
 	unsigned int data_len = skb->data_len;
 	unsigned int size = skb_headlen(skb);
-	struct device *dev = tx_ring->dev;
-	u32 paylen = skb->len - hdr_len;
-	u16 i = tx_ring->next_to_use;
+	struct skb_frag_struct *frag;
 	struct i40e_tx_buffer *tx_bi;
 	struct i40e_tx_desc *tx_desc;
-	u32 buf_offset = 0;
+	u16 i = tx_ring->next_to_use;
 	u32 td_tag = 0;
 	dma_addr_t dma;
 	u16 gso_segs;
 
-	dma = dma_map_single(dev, skb->data, size, DMA_TO_DEVICE);
-	if (dma_mapping_error(dev, dma))
-		goto dma_error;
-
 	if (tx_flags & I40E_TX_FLAGS_HW_VLAN) {
 		td_cmd |= I40E_TX_DESC_CMD_IL2TAG1;
 		td_tag = (tx_flags & I40E_TX_FLAGS_VLAN_MASK) >>
 			 I40E_TX_FLAGS_VLAN_SHIFT;
 	}
 
+	if (tx_flags & (I40E_TX_FLAGS_TSO | I40E_TX_FLAGS_FSO))
+		gso_segs = skb_shinfo(skb)->gso_segs;
+	else
+		gso_segs = 1;
+
+	/* multiply data chunks by size of headers */
+	first->bytecount = skb->len - hdr_len + (gso_segs * hdr_len);
+	first->gso_segs = gso_segs;
+	first->skb = skb;
+	first->tx_flags = tx_flags;
+
+	dma = dma_map_single(tx_ring->dev, skb->data, size, DMA_TO_DEVICE);
+
 	tx_desc = I40E_TX_DESC(tx_ring, i);
-	for (;;) {
-		while (size > I40E_MAX_DATA_PER_TXD) {
-			tx_desc->buffer_addr = cpu_to_le64(dma + buf_offset);
+	tx_bi = first;
+
+	for (frag = &skb_shinfo(skb)->frags[0];; frag++) {
+		if (dma_mapping_error(tx_ring->dev, dma))
+			goto dma_error;
+
+		/* record length, and DMA address */
+		dma_unmap_len_set(tx_bi, len, size);
+		dma_unmap_addr_set(tx_bi, dma, dma);
+
+		tx_desc->buffer_addr = cpu_to_le64(dma);
+
+		while (unlikely(size > I40E_MAX_DATA_PER_TXD)) {
 			tx_desc->cmd_type_offset_bsz =
 				build_ctob(td_cmd, td_offset,
 					   I40E_MAX_DATA_PER_TXD, td_tag);
 
-			buf_offset += I40E_MAX_DATA_PER_TXD;
-			size -= I40E_MAX_DATA_PER_TXD;
-
 			tx_desc++;
 			i++;
 			if (i == tx_ring->count) {
 				tx_desc = I40E_TX_DESC(tx_ring, 0);
 				i = 0;
 			}
-		}
 
-		tx_bi = &tx_ring->tx_bi[i];
-		dma_unmap_len_set(tx_bi, len, buf_offset + size);
-		dma_unmap_addr_set(tx_bi, dma, dma);
-		tx_bi->tx_flags = tx_flags;
+			dma += I40E_MAX_DATA_PER_TXD;
+			size -= I40E_MAX_DATA_PER_TXD;
 
-		tx_desc->buffer_addr = cpu_to_le64(dma + buf_offset);
-		tx_desc->cmd_type_offset_bsz = build_ctob(td_cmd, td_offset,
-							  size, td_tag);
+			tx_desc->buffer_addr = cpu_to_le64(dma);
+		}
 
 		if (likely(!data_len))
 			break;
 
-		size = skb_frag_size(frag);
-		data_len -= size;
-		buf_offset = 0;
-		tx_flags |= I40E_TX_FLAGS_MAPPED_AS_PAGE;
-
-		dma = skb_frag_dma_map(dev, frag, 0, size, DMA_TO_DEVICE);
-		if (dma_mapping_error(dev, dma))
-			goto dma_error;
+		tx_desc->cmd_type_offset_bsz = build_ctob(td_cmd, td_offset,
+							  size, td_tag);
 
 		tx_desc++;
 		i++;
@@ -1585,31 +1610,21 @@ static void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 			i = 0;
 		}
 
-		frag++;
-	}
-
-	tx_desc->cmd_type_offset_bsz |=
-		       cpu_to_le64((u64)I40E_TXD_CMD << I40E_TXD_QW1_CMD_SHIFT);
-
-	i++;
-	if (i == tx_ring->count)
-		i = 0;
+		size = skb_frag_size(frag);
+		data_len -= size;
 
-	tx_ring->next_to_use = i;
+		dma = skb_frag_dma_map(tx_ring->dev, frag, 0, size,
+				       DMA_TO_DEVICE);
 
-	if (tx_flags & (I40E_TX_FLAGS_TSO | I40E_TX_FLAGS_FSO))
-		gso_segs = skb_shinfo(skb)->gso_segs;
-	else
-		gso_segs = 1;
+		tx_bi = &tx_ring->tx_bi[i];
+	}
 
-	/* multiply data chunks by size of headers */
-	tx_bi->bytecount = paylen + (gso_segs * hdr_len);
-	tx_bi->gso_segs = gso_segs;
-	tx_bi->skb = skb;
+	tx_desc->cmd_type_offset_bsz =
+		build_ctob(td_cmd, td_offset, size, td_tag) |
+		cpu_to_le64((u64)I40E_TXD_CMD << I40E_TXD_QW1_CMD_SHIFT);
 
-	/* set the timestamp and next to watch values */
+	/* set the timestamp */
 	first->time_stamp = jiffies;
-	first->next_to_watch = tx_desc;
 
 	/* Force memory writes to complete before letting h/w
 	 * know there are new descriptors to fetch.  (Only
@@ -1618,16 +1633,27 @@ static void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	 */
 	wmb();
 
+	/* set next_to_watch value indicating a packet is present */
+	first->next_to_watch = tx_desc;
+
+	i++;
+	if (i == tx_ring->count)
+		i = 0;
+
+	tx_ring->next_to_use = i;
+
+	/* notify HW of packet */
 	writel(i, tx_ring->tail);
+
 	return;
 
 dma_error:
-	dev_info(dev, "TX DMA map failed\n");
+	dev_info(tx_ring->dev, "TX DMA map failed\n");
 
 	/* clear dma mappings for failed tx_bi map */
 	for (;;) {
 		tx_bi = &tx_ring->tx_bi[i];
-		i40e_unmap_tx_resource(tx_ring, tx_bi);
+		i40e_unmap_and_free_tx_resource(tx_ring, tx_bi);
 		if (tx_bi == first)
 			break;
 		if (i == 0)
@@ -1635,8 +1661,6 @@ dma_error:
 		i--;
 	}
 
-	dev_kfree_skb_any(skb);
-
 	tx_ring->next_to_use = i;
 }
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 711f549..2cb1338 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -103,7 +103,6 @@
 #define I40E_TX_FLAGS_FCCRC		(u32)(1 << 6)
 #define I40E_TX_FLAGS_FSO		(u32)(1 << 7)
 #define I40E_TX_FLAGS_TXSW		(u32)(1 << 8)
-#define I40E_TX_FLAGS_MAPPED_AS_PAGE	(u32)(1 << 9)
 #define I40E_TX_FLAGS_VLAN_MASK		0xffff0000
 #define I40E_TX_FLAGS_VLAN_PRIO_MASK	0xe0000000
 #define I40E_TX_FLAGS_VLAN_PRIO_SHIFT	29
-- 
1.8.3.1

^ permalink raw reply related

* [net-next 07/13] i40e: Add support for Tx byte queue limits
From: Jeff Kirsher @ 2013-10-10  6:41 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, Jesse Brandeburg,
	Jeff Kirsher
In-Reply-To: <1381387271-29605-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Implement BQL (byte queue limit) support in i40e.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 52dfac2..ad2818f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -240,6 +240,13 @@ void i40e_clean_tx_ring(struct i40e_ring *tx_ring)
 
 	tx_ring->next_to_use = 0;
 	tx_ring->next_to_clean = 0;
+
+	if (!tx_ring->netdev)
+		return;
+
+	/* cleanup Tx queue statistics */
+	netdev_tx_reset_queue(netdev_get_tx_queue(tx_ring->netdev,
+						  tx_ring->queue_index));
 }
 
 /**
@@ -436,6 +443,10 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
 		return true;
 	}
 
+	netdev_tx_completed_queue(netdev_get_tx_queue(tx_ring->netdev,
+						      tx_ring->queue_index),
+				  total_packets, total_bytes);
+
 #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
 	if (unlikely(total_packets && netif_carrier_ok(tx_ring->netdev) &&
 		     (I40E_DESC_UNUSED(tx_ring) >= TX_WAKE_THRESHOLD))) {
@@ -1602,6 +1613,10 @@ static void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 		build_ctob(td_cmd, td_offset, size, td_tag) |
 		cpu_to_le64((u64)I40E_TXD_CMD << I40E_TXD_QW1_CMD_SHIFT);
 
+	netdev_tx_sent_queue(netdev_get_tx_queue(tx_ring->netdev,
+						 tx_ring->queue_index),
+			     first->bytecount);
+
 	/* set the timestamp */
 	first->time_stamp = jiffies;
 
-- 
1.8.3.1

^ permalink raw reply related

* [net-next 06/13] i40e: Drop dead code and flags from Tx hotpath
From: Jeff Kirsher @ 2013-10-10  6:41 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, Jesse Brandeburg,
	Jeff Kirsher
In-Reply-To: <1381387271-29605-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Drop Tx flag and TXSW which is tested but never set.

As a result of this change we can drop a complicated check that always
resulted in the final result of i40e_tx_csum being equal to the
CHECKSUM_PARTIAL value.  As such we can replace the entire function call
with just a check for skb->summed == CHECKSUM_PARTIAL.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 31 +++++------------------------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |  1 -
 2 files changed, 5 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e96de75..52dfac2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1300,27 +1300,6 @@ static int i40e_tx_prepare_vlan_flags(struct sk_buff *skb,
 }
 
 /**
- * i40e_tx_csum - is checksum offload requested
- * @tx_ring:  ptr to the ring to send
- * @skb:      ptr to the skb we're sending
- * @tx_flags: the collected send information
- * @protocol: the send protocol
- *
- * Returns true if checksum offload is requested
- **/
-static bool i40e_tx_csum(struct i40e_ring *tx_ring, struct sk_buff *skb,
-			 u32 tx_flags, __be16 protocol)
-{
-	if ((skb->ip_summed != CHECKSUM_PARTIAL) &&
-	    !(tx_flags & I40E_TX_FLAGS_TXSW)) {
-		if (!(tx_flags & I40E_TX_FLAGS_HW_VLAN))
-			return false;
-	}
-
-	return skb->ip_summed == CHECKSUM_PARTIAL;
-}
-
-/**
  * i40e_tso - set up the tso context descriptor
  * @tx_ring:  ptr to the ring to send
  * @skb:      ptr to the skb we're sending
@@ -1785,16 +1764,16 @@ static netdev_tx_t i40e_xmit_frame_ring(struct sk_buff *skb,
 
 	skb_tx_timestamp(skb);
 
+	/* always enable CRC insertion offload */
+	td_cmd |= I40E_TX_DESC_CMD_ICRC;
+
 	/* Always offload the checksum, since it's in the data descriptor */
-	if (i40e_tx_csum(tx_ring, skb, tx_flags, protocol))
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
 		tx_flags |= I40E_TX_FLAGS_CSUM;
 
-	/* always enable offload insertion */
-	td_cmd |= I40E_TX_DESC_CMD_ICRC;
-
-	if (tx_flags & I40E_TX_FLAGS_CSUM)
 		i40e_tx_enable_csum(skb, tx_flags, &td_cmd, &td_offset,
 				    tx_ring, &cd_tunneling);
+	}
 
 	i40e_create_tx_ctx(tx_ring, cd_type_cmd_tso_mss,
 			   cd_tunneling, cd_l2tag2);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 2cb1338..e514247 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -102,7 +102,6 @@
 #define I40E_TX_FLAGS_IPV6		(u32)(1 << 5)
 #define I40E_TX_FLAGS_FCCRC		(u32)(1 << 6)
 #define I40E_TX_FLAGS_FSO		(u32)(1 << 7)
-#define I40E_TX_FLAGS_TXSW		(u32)(1 << 8)
 #define I40E_TX_FLAGS_VLAN_MASK		0xffff0000
 #define I40E_TX_FLAGS_VLAN_PRIO_MASK	0xe0000000
 #define I40E_TX_FLAGS_VLAN_PRIO_SHIFT	29
-- 
1.8.3.1

^ permalink raw reply related

* [net-next 08/13] i40e: Split bytes and packets from Rx/Tx stats
From: Jeff Kirsher @ 2013-10-10  6:41 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, Jesse Brandeburg,
	Jeff Kirsher
In-Reply-To: <1381387271-29605-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

This makes it so that the Tx and Rx byte and packet counts are
separated from the rest of the statistics.  This allows for better
isolation of these stats when we move them into the 64 bit statistics.

Simplify things by re-ordering how the stats display in ethtool.
Instead of displaying all of the Tx queues as a block, followed by all
the Rx queues, the new order is Tx[0], Rx[0], Tx[1], Rx[1], ..., Tx[n],
Rx[n].  This reduces the loops and cleans up the display for testing
purposes since it is very easy to verify if flow director is doing the
right thing as the Tx and Rx queue pair are shown in pairs.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  8 ++++----
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 14 +++++---------
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 12 ++++++++----
 drivers/net/ethernet/intel/i40e/i40e_txrx.c    | 12 ++++++------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h    |  8 +++++---
 5 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 0b01c61..2b6655b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -512,8 +512,8 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 				 vsi->rx_rings[i].ring_active);
 			dev_info(&pf->pdev->dev,
 				 "    rx_rings[%i]: rx_stats: packets = %lld, bytes = %lld, non_eop_descs = %lld\n",
-				 i, vsi->rx_rings[i].rx_stats.packets,
-				 vsi->rx_rings[i].rx_stats.bytes,
+				 i, vsi->rx_rings[i].stats.packets,
+				 vsi->rx_rings[i].stats.bytes,
 				 vsi->rx_rings[i].rx_stats.non_eop_descs);
 			dev_info(&pf->pdev->dev,
 				 "    rx_rings[%i]: rx_stats: alloc_rx_page_failed = %lld, alloc_rx_buff_failed = %lld\n",
@@ -556,8 +556,8 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 				 vsi->tx_rings[i].ring_active);
 			dev_info(&pf->pdev->dev,
 				 "    tx_rings[%i]: tx_stats: packets = %lld, bytes = %lld, restart_queue = %lld\n",
-				 i, vsi->tx_rings[i].tx_stats.packets,
-				 vsi->tx_rings[i].tx_stats.bytes,
+				 i, vsi->tx_rings[i].stats.packets,
+				 vsi->tx_rings[i].stats.bytes,
 				 vsi->tx_rings[i].tx_stats.restart_queue);
 			dev_info(&pf->pdev->dev,
 				 "    tx_rings[%i]: tx_stats: tx_busy = %lld, tx_done_old = %lld\n",
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 9a76b8c..8754c6fa 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -587,13 +587,11 @@ static void i40e_get_ethtool_stats(struct net_device *netdev,
 		data[i++] = (i40e_gstrings_net_stats[j].sizeof_stat ==
 			sizeof(u64)) ? *(u64 *)p : *(u32 *)p;
 	}
-	for (j = 0; j < vsi->num_queue_pairs; j++) {
-		data[i++] = vsi->tx_rings[j].tx_stats.packets;
-		data[i++] = vsi->tx_rings[j].tx_stats.bytes;
-	}
-	for (j = 0; j < vsi->num_queue_pairs; j++) {
-		data[i++] = vsi->rx_rings[j].rx_stats.packets;
-		data[i++] = vsi->rx_rings[j].rx_stats.bytes;
+	for (j = 0; j < vsi->num_queue_pairs; j++, i += 4) {
+		data[i] = vsi->tx_rings[j].stats.packets;
+		data[i + 1] = vsi->tx_rings[j].stats.bytes;
+		data[i + 2] = vsi->rx_rings[j].stats.packets;
+		data[i + 3] = vsi->rx_rings[j].stats.bytes;
 	}
 	if (vsi == pf->vsi[pf->lan_vsi]) {
 		for (j = 0; j < I40E_GLOBAL_STATS_LEN; j++) {
@@ -641,8 +639,6 @@ static void i40e_get_strings(struct net_device *netdev, u32 stringset,
 			p += ETH_GSTRING_LEN;
 			snprintf(p, ETH_GSTRING_LEN, "tx-%u.tx_bytes", i);
 			p += ETH_GSTRING_LEN;
-		}
-		for (i = 0; i < vsi->num_queue_pairs; i++) {
 			snprintf(p, ETH_GSTRING_LEN, "rx-%u.rx_packets", i);
 			p += ETH_GSTRING_LEN;
 			snprintf(p, ETH_GSTRING_LEN, "rx-%u.rx_bytes", i);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 657babe..d1b5bae 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -376,8 +376,12 @@ void i40e_vsi_reset_stats(struct i40e_vsi *vsi)
 	memset(&vsi->eth_stats_offsets, 0, sizeof(vsi->eth_stats_offsets));
 	if (vsi->rx_rings)
 		for (i = 0; i < vsi->num_queue_pairs; i++) {
+			memset(&vsi->rx_rings[i].stats, 0 ,
+			       sizeof(vsi->rx_rings[i].stats));
 			memset(&vsi->rx_rings[i].rx_stats, 0 ,
 			       sizeof(vsi->rx_rings[i].rx_stats));
+			memset(&vsi->tx_rings[i].stats, 0 ,
+			       sizeof(vsi->tx_rings[i].stats));
 			memset(&vsi->tx_rings[i].tx_stats, 0,
 			       sizeof(vsi->tx_rings[i].tx_stats));
 		}
@@ -708,14 +712,14 @@ void i40e_update_stats(struct i40e_vsi *vsi)
 		struct i40e_ring *p;
 
 		p = &vsi->rx_rings[q];
-		rx_b += p->rx_stats.bytes;
-		rx_p += p->rx_stats.packets;
+		rx_b += p->stats.bytes;
+		rx_p += p->stats.packets;
 		rx_buf += p->rx_stats.alloc_rx_buff_failed;
 		rx_page += p->rx_stats.alloc_rx_page_failed;
 
 		p = &vsi->tx_rings[q];
-		tx_b += p->tx_stats.bytes;
-		tx_p += p->tx_stats.packets;
+		tx_b += p->stats.bytes;
+		tx_p += p->stats.packets;
 		tx_restart += p->tx_stats.restart_queue;
 		tx_busy += p->tx_stats.tx_busy;
 	}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index ad2818f..3e73bc0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -305,14 +305,14 @@ static bool i40e_check_tx_hang(struct i40e_ring *tx_ring)
 	 * run the check_tx_hang logic with a transmit completion
 	 * pending but without time to complete it yet.
 	 */
-	if ((tx_ring->tx_stats.tx_done_old == tx_ring->tx_stats.packets) &&
+	if ((tx_ring->tx_stats.tx_done_old == tx_ring->stats.packets) &&
 	    tx_pending) {
 		/* make sure it is true for two checks in a row */
 		ret = test_and_set_bit(__I40E_HANG_CHECK_ARMED,
 				       &tx_ring->state);
 	} else {
 		/* update completed stats and disarm the hang check */
-		tx_ring->tx_stats.tx_done_old = tx_ring->tx_stats.packets;
+		tx_ring->tx_stats.tx_done_old = tx_ring->stats.packets;
 		clear_bit(__I40E_HANG_CHECK_ARMED, &tx_ring->state);
 	}
 
@@ -411,8 +411,8 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
 
 	i += tx_ring->count;
 	tx_ring->next_to_clean = i;
-	tx_ring->tx_stats.bytes += total_bytes;
-	tx_ring->tx_stats.packets += total_packets;
+	tx_ring->stats.bytes += total_bytes;
+	tx_ring->stats.packets += total_packets;
 	tx_ring->q_vector->tx.total_bytes += total_bytes;
 	tx_ring->q_vector->tx.total_packets += total_packets;
 
@@ -1075,8 +1075,8 @@ next_desc:
 	}
 
 	rx_ring->next_to_clean = i;
-	rx_ring->rx_stats.packets += total_rx_packets;
-	rx_ring->rx_stats.bytes += total_rx_bytes;
+	rx_ring->stats.packets += total_rx_packets;
+	rx_ring->stats.bytes += total_rx_bytes;
 	rx_ring->q_vector->rx.total_packets += total_rx_packets;
 	rx_ring->q_vector->rx.total_bytes += total_rx_bytes;
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index e514247..7f3f7e3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -126,17 +126,18 @@ struct i40e_rx_buffer {
 	unsigned int page_offset;
 };
 
-struct i40e_tx_queue_stats {
+struct i40e_queue_stats {
 	u64 packets;
 	u64 bytes;
+};
+
+struct i40e_tx_queue_stats {
 	u64 restart_queue;
 	u64 tx_busy;
 	u64 tx_done_old;
 };
 
 struct i40e_rx_queue_stats {
-	u64 packets;
-	u64 bytes;
 	u64 non_eop_descs;
 	u64 alloc_rx_page_failed;
 	u64 alloc_rx_buff_failed;
@@ -215,6 +216,7 @@ struct i40e_ring {
 	bool ring_active;		/* is ring online or not */
 
 	/* stats structs */
+	struct i40e_queue_stats	stats;
 	union {
 		struct i40e_tx_queue_stats tx_stats;
 		struct i40e_rx_queue_stats rx_stats;
-- 
1.8.3.1

^ permalink raw reply related

* [net-next 09/13] i40e: Move q_vectors from pointer to array to array of pointers
From: Jeff Kirsher @ 2013-10-10  6:41 UTC (permalink / raw)
  To: davem
  Cc: Alexander Duyck, netdev, gospo, sassmann, Jesse Brandeburg,
	Jeff Kirsher
In-Reply-To: <1381387271-29605-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Alexander Duyck <alexander.h.duyck@intel.com>

Allocate the q_vectors individually. The advantage to this is that it
allows for easier freeing and allocation.  In addition it makes it so
that we could do node specific allocations at some point in the future.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h         |   5 +-
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  11 +-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |   4 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 152 +++++++++++++++++--------
 4 files changed, 112 insertions(+), 60 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index b5252eb..789304e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -366,7 +366,7 @@ struct i40e_vsi {
 	u8  dtype;
 
 	/* List of q_vectors allocated to this VSI */
-	struct i40e_q_vector *q_vectors;
+	struct i40e_q_vector **q_vectors;
 	int num_q_vectors;
 	int base_vector;
 
@@ -422,8 +422,9 @@ struct i40e_q_vector {
 
 	u8 num_ringpairs;	/* total number of ring pairs in vector */
 
-	char name[IFNAMSIZ + 9];
 	cpumask_t affinity_mask;
+	struct rcu_head rcu;	/* to avoid race with update stats on free */
+	char name[IFNAMSIZ + 9];
 } ____cacheline_internodealigned_in_smp;
 
 /* lan device */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 2b6655b..44e3fa4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -586,15 +586,6 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 	dev_info(&pf->pdev->dev,
 		 "    max_frame = %d, rx_hdr_len = %d, rx_buf_len = %d dtype = %d\n",
 		 vsi->max_frame, vsi->rx_hdr_len, vsi->rx_buf_len, vsi->dtype);
-	if (vsi->q_vectors) {
-		for (i = 0; i < vsi->num_q_vectors; i++) {
-			dev_info(&pf->pdev->dev,
-				 "    q_vectors[%i]: base index = %ld\n",
-				 i, ((long int)*vsi->q_vectors[i].rx.ring-
-					(long int)*vsi->q_vectors[0].rx.ring)/
-					sizeof(struct i40e_ring));
-		}
-	}
 	dev_info(&pf->pdev->dev,
 		 "    num_q_vectors = %i, base_vector = %i\n",
 		 vsi->num_q_vectors, vsi->base_vector);
@@ -1995,7 +1986,7 @@ static ssize_t i40e_dbg_netdev_ops_write(struct file *filp,
 			goto netdev_ops_write_done;
 		}
 		for (i = 0; i < vsi->num_q_vectors; i++)
-			napi_schedule(&vsi->q_vectors[i].napi);
+			napi_schedule(&vsi->q_vectors[i]->napi);
 		dev_info(&pf->pdev->dev, "napi called\n");
 	} else {
 		dev_info(&pf->pdev->dev, "unknown command '%s'\n",
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 8754c6fa..bf607da 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -906,8 +906,8 @@ static int i40e_set_coalesce(struct net_device *netdev,
 	}
 
 	vector = vsi->base_vector;
-	q_vector = vsi->q_vectors;
-	for (i = 0; i < vsi->num_q_vectors; i++, vector++, q_vector++) {
+	for (i = 0; i < vsi->num_q_vectors; i++, vector++) {
+		q_vector = vsi->q_vectors[i];
 		q_vector->rx.itr = ITR_TO_REG(vsi->rx_itr_setting);
 		wr32(hw, I40E_PFINT_ITRN(0, vector - 1), q_vector->rx.itr);
 		q_vector->tx.itr = ITR_TO_REG(vsi->tx_itr_setting);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d1b5bae..a090815 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2358,8 +2358,8 @@ static void i40e_vsi_configure_msix(struct i40e_vsi *vsi)
 	 */
 	qp = vsi->base_queue;
 	vector = vsi->base_vector;
-	q_vector = vsi->q_vectors;
-	for (i = 0; i < vsi->num_q_vectors; i++, q_vector++, vector++) {
+	for (i = 0; i < vsi->num_q_vectors; i++, vector++) {
+		q_vector = vsi->q_vectors[i];
 		q_vector->rx.itr = ITR_TO_REG(vsi->rx_itr_setting);
 		q_vector->rx.latency_range = I40E_LOW_LATENCY;
 		wr32(hw, I40E_PFINT_ITRN(I40E_RX_ITR, vector - 1),
@@ -2439,7 +2439,7 @@ static void i40e_enable_misc_int_causes(struct i40e_hw *hw)
  **/
 static void i40e_configure_msi_and_legacy(struct i40e_vsi *vsi)
 {
-	struct i40e_q_vector *q_vector = vsi->q_vectors;
+	struct i40e_q_vector *q_vector = vsi->q_vectors[0];
 	struct i40e_pf *pf = vsi->back;
 	struct i40e_hw *hw = &pf->hw;
 	u32 val;
@@ -2558,7 +2558,7 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
 	int vector, err;
 
 	for (vector = 0; vector < q_vectors; vector++) {
-		struct i40e_q_vector *q_vector = &(vsi->q_vectors[vector]);
+		struct i40e_q_vector *q_vector = vsi->q_vectors[vector];
 
 		if (q_vector->tx.ring[0] && q_vector->rx.ring[0]) {
 			snprintf(q_vector->name, sizeof(q_vector->name) - 1,
@@ -2709,7 +2709,7 @@ static irqreturn_t i40e_intr(int irq, void *data)
 		i40e_flush(hw);
 
 		if (!test_bit(__I40E_DOWN, &pf->state))
-			napi_schedule(&pf->vsi[pf->lan_vsi]->q_vectors[0].napi);
+			napi_schedule(&pf->vsi[pf->lan_vsi]->q_vectors[0]->napi);
 	}
 
 	if (icr0 & I40E_PFINT_ICR0_ADMINQ_MASK) {
@@ -2785,7 +2785,7 @@ static irqreturn_t i40e_intr(int irq, void *data)
  **/
 static void map_vector_to_rxq(struct i40e_vsi *vsi, int v_idx, int r_idx)
 {
-	struct i40e_q_vector *q_vector = &(vsi->q_vectors[v_idx]);
+	struct i40e_q_vector *q_vector = vsi->q_vectors[v_idx];
 	struct i40e_ring *rx_ring = &(vsi->rx_rings[r_idx]);
 
 	rx_ring->q_vector = q_vector;
@@ -2803,7 +2803,7 @@ static void map_vector_to_rxq(struct i40e_vsi *vsi, int v_idx, int r_idx)
  **/
 static void map_vector_to_txq(struct i40e_vsi *vsi, int v_idx, int t_idx)
 {
-	struct i40e_q_vector *q_vector = &(vsi->q_vectors[v_idx]);
+	struct i40e_q_vector *q_vector = vsi->q_vectors[v_idx];
 	struct i40e_ring *tx_ring = &(vsi->tx_rings[t_idx]);
 
 	tx_ring->q_vector = q_vector;
@@ -2891,7 +2891,7 @@ static void i40e_netpoll(struct net_device *netdev)
 	pf->flags |= I40E_FLAG_IN_NETPOLL;
 	if (pf->flags & I40E_FLAG_MSIX_ENABLED) {
 		for (i = 0; i < vsi->num_q_vectors; i++)
-			i40e_msix_clean_rings(0, &vsi->q_vectors[i]);
+			i40e_msix_clean_rings(0, vsi->q_vectors[i]);
 	} else {
 		i40e_intr(pf->pdev->irq, netdev);
 	}
@@ -3077,14 +3077,14 @@ static void i40e_vsi_free_irq(struct i40e_vsi *vsi)
 			u16 vector = i + base;
 
 			/* free only the irqs that were actually requested */
-			if (vsi->q_vectors[i].num_ringpairs == 0)
+			if (vsi->q_vectors[i]->num_ringpairs == 0)
 				continue;
 
 			/* clear the affinity_mask in the IRQ descriptor */
 			irq_set_affinity_hint(pf->msix_entries[vector].vector,
 					      NULL);
 			free_irq(pf->msix_entries[vector].vector,
-				 &vsi->q_vectors[i]);
+				 vsi->q_vectors[i]);
 
 			/* Tear down the interrupt queue link list
 			 *
@@ -3168,6 +3168,38 @@ static void i40e_vsi_free_irq(struct i40e_vsi *vsi)
 }
 
 /**
+ * i40e_free_q_vector - Free memory allocated for specific interrupt vector
+ * @vsi: the VSI being configured
+ * @v_idx: Index of vector to be freed
+ *
+ * This function frees the memory allocated to the q_vector.  In addition if
+ * NAPI is enabled it will delete any references to the NAPI struct prior
+ * to freeing the q_vector.
+ **/
+static void i40e_free_q_vector(struct i40e_vsi *vsi, int v_idx)
+{
+	struct i40e_q_vector *q_vector = vsi->q_vectors[v_idx];
+	int r_idx;
+
+	if (!q_vector)
+		return;
+
+	/* disassociate q_vector from rings */
+	for (r_idx = 0; r_idx < q_vector->tx.count; r_idx++)
+		q_vector->tx.ring[r_idx]->q_vector = NULL;
+	for (r_idx = 0; r_idx < q_vector->rx.count; r_idx++)
+		q_vector->rx.ring[r_idx]->q_vector = NULL;
+
+	/* only VSI w/ an associated netdev is set up w/ NAPI */
+	if (vsi->netdev)
+		netif_napi_del(&q_vector->napi);
+
+	vsi->q_vectors[v_idx] = NULL;
+
+	kfree_rcu(q_vector, rcu);
+}
+
+/**
  * i40e_vsi_free_q_vectors - Free memory allocated for interrupt vectors
  * @vsi: the VSI being un-configured
  *
@@ -3178,24 +3210,8 @@ static void i40e_vsi_free_q_vectors(struct i40e_vsi *vsi)
 {
 	int v_idx;
 
-	for (v_idx = 0; v_idx < vsi->num_q_vectors; v_idx++) {
-		struct i40e_q_vector *q_vector = &vsi->q_vectors[v_idx];
-		int r_idx;
-
-		if (!q_vector)
-			continue;
-
-		/* disassociate q_vector from rings */
-		for (r_idx = 0; r_idx < q_vector->tx.count; r_idx++)
-			q_vector->tx.ring[r_idx]->q_vector = NULL;
-		for (r_idx = 0; r_idx < q_vector->rx.count; r_idx++)
-			q_vector->rx.ring[r_idx]->q_vector = NULL;
-
-		/* only VSI w/ an associated netdev is set up w/ NAPI */
-		if (vsi->netdev)
-			netif_napi_del(&q_vector->napi);
-	}
-	kfree(vsi->q_vectors);
+	for (v_idx = 0; v_idx < vsi->num_q_vectors; v_idx++)
+		i40e_free_q_vector(vsi, v_idx);
 }
 
 /**
@@ -3245,7 +3261,7 @@ static void i40e_napi_enable_all(struct i40e_vsi *vsi)
 		return;
 
 	for (q_idx = 0; q_idx < vsi->num_q_vectors; q_idx++)
-		napi_enable(&vsi->q_vectors[q_idx].napi);
+		napi_enable(&vsi->q_vectors[q_idx]->napi);
 }
 
 /**
@@ -3260,7 +3276,7 @@ static void i40e_napi_disable_all(struct i40e_vsi *vsi)
 		return;
 
 	for (q_idx = 0; q_idx < vsi->num_q_vectors; q_idx++)
-		napi_disable(&vsi->q_vectors[q_idx].napi);
+		napi_disable(&vsi->q_vectors[q_idx]->napi);
 }
 
 /**
@@ -4945,6 +4961,7 @@ static int i40e_vsi_mem_alloc(struct i40e_pf *pf, enum i40e_vsi_type type)
 {
 	int ret = -ENODEV;
 	struct i40e_vsi *vsi;
+	int sz_vectors;
 	int vsi_idx;
 	int i;
 
@@ -4970,14 +4987,14 @@ static int i40e_vsi_mem_alloc(struct i40e_pf *pf, enum i40e_vsi_type type)
 		vsi_idx = i;             /* Found one! */
 	} else {
 		ret = -ENODEV;
-		goto err_alloc_vsi;  /* out of VSI slots! */
+		goto unlock_pf;  /* out of VSI slots! */
 	}
 	pf->next_vsi = ++i;
 
 	vsi = kzalloc(sizeof(*vsi), GFP_KERNEL);
 	if (!vsi) {
 		ret = -ENOMEM;
-		goto err_alloc_vsi;
+		goto unlock_pf;
 	}
 	vsi->type = type;
 	vsi->back = pf;
@@ -4992,12 +5009,25 @@ static int i40e_vsi_mem_alloc(struct i40e_pf *pf, enum i40e_vsi_type type)
 
 	i40e_set_num_rings_in_vsi(vsi);
 
+	/* allocate memory for q_vector pointers */
+	sz_vectors = sizeof(struct i40e_q_vectors *) * vsi->num_q_vectors;
+	vsi->q_vectors = kzalloc(sz_vectors, GFP_KERNEL);
+	if (!vsi->q_vectors) {
+		ret = -ENOMEM;
+		goto err_vectors;
+	}
+
 	/* Setup default MSIX irq handler for VSI */
 	i40e_vsi_setup_irqhandler(vsi, i40e_msix_clean_rings);
 
 	pf->vsi[vsi_idx] = vsi;
 	ret = vsi_idx;
-err_alloc_vsi:
+	goto unlock_pf;
+
+err_vectors:
+	pf->next_vsi = i - 1;
+	kfree(vsi);
+unlock_pf:
 	mutex_unlock(&pf->switch_mutex);
 	return ret;
 }
@@ -5038,6 +5068,9 @@ static int i40e_vsi_clear(struct i40e_vsi *vsi)
 	i40e_put_lump(pf->qp_pile, vsi->base_queue, vsi->idx);
 	i40e_put_lump(pf->irq_pile, vsi->base_vector, vsi->idx);
 
+	/* free the ring and vector containers */
+	kfree(vsi->q_vectors);
+
 	pf->vsi[vsi->idx] = NULL;
 	if (vsi->idx < pf->next_vsi)
 		pf->next_vsi = vsi->idx;
@@ -5257,6 +5290,35 @@ static int i40e_init_msix(struct i40e_pf *pf)
 }
 
 /**
+ * i40e_alloc_q_vector - Allocate memory for a single interrupt vector
+ * @vsi: the VSI being configured
+ * @v_idx: index of the vector in the vsi struct
+ *
+ * We allocate one q_vector.  If allocation fails we return -ENOMEM.
+ **/
+static int i40e_alloc_q_vector(struct i40e_vsi *vsi, int v_idx)
+{
+	struct i40e_q_vector *q_vector;
+
+	/* allocate q_vector */
+	q_vector = kzalloc(sizeof(struct i40e_q_vector), GFP_KERNEL);
+	if (!q_vector)
+		return -ENOMEM;
+
+	q_vector->vsi = vsi;
+	q_vector->v_idx = v_idx;
+	cpumask_set_cpu(v_idx, &q_vector->affinity_mask);
+	if (vsi->netdev)
+		netif_napi_add(vsi->netdev, &q_vector->napi,
+			       i40e_napi_poll, vsi->work_limit);
+
+	/* tie q_vector and vsi together */
+	vsi->q_vectors[v_idx] = q_vector;
+
+	return 0;
+}
+
+/**
  * i40e_alloc_q_vectors - Allocate memory for interrupt vectors
  * @vsi: the VSI being configured
  *
@@ -5267,6 +5329,7 @@ static int i40e_alloc_q_vectors(struct i40e_vsi *vsi)
 {
 	struct i40e_pf *pf = vsi->back;
 	int v_idx, num_q_vectors;
+	int err;
 
 	/* if not MSIX, give the one vector only to the LAN VSI */
 	if (pf->flags & I40E_FLAG_MSIX_ENABLED)
@@ -5276,22 +5339,19 @@ static int i40e_alloc_q_vectors(struct i40e_vsi *vsi)
 	else
 		return -EINVAL;
 
-	vsi->q_vectors = kcalloc(num_q_vectors,
-				 sizeof(struct i40e_q_vector),
-				 GFP_KERNEL);
-	if (!vsi->q_vectors)
-		return -ENOMEM;
-
 	for (v_idx = 0; v_idx < num_q_vectors; v_idx++) {
-		vsi->q_vectors[v_idx].vsi = vsi;
-		vsi->q_vectors[v_idx].v_idx = v_idx;
-		cpumask_set_cpu(v_idx, &vsi->q_vectors[v_idx].affinity_mask);
-		if (vsi->netdev)
-			netif_napi_add(vsi->netdev, &vsi->q_vectors[v_idx].napi,
-				       i40e_napi_poll, vsi->work_limit);
+		err = i40e_alloc_q_vector(vsi, v_idx);
+		if (err)
+			goto err_out;
 	}
 
 	return 0;
+
+err_out:
+	while (v_idx--)
+		i40e_free_q_vector(vsi, v_idx);
+
+	return err;
 }
 
 /**
@@ -5958,7 +6018,7 @@ static int i40e_vsi_setup_vectors(struct i40e_vsi *vsi)
 	int ret = -ENOENT;
 	struct i40e_pf *pf = vsi->back;
 
-	if (vsi->q_vectors) {
+	if (vsi->q_vectors[0]) {
 		dev_info(&pf->pdev->dev, "VSI %d has existing q_vectors\n",
 			 vsi->seid);
 		return -EEXIST;
-- 
1.8.3.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox