Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: 3.5rc6 modprobe -r ip_vs oopses
From: Julian Anastasov @ 2012-07-11 20:59 UTC (permalink / raw)
  To: Dave Jones; +Cc: netdev, wensong, horms
In-Reply-To: <20120711190722.GA31185@redhat.com>


	Hello,

On Wed, 11 Jul 2012, Dave Jones wrote:

> Triggered during modprobe -r ip_vs
> (W taint flag came from something unrelated: 80211 trying >MAX_ORDER alloc failing).
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000750
> IP: [<ffffffffa4c9280b>] ip_vs_dst_event+0x18b/0x200 [ip_vs]
> PGD ae993067 PUD 35cbb067 PMD 0 
> Oops: 0000 [#1] SMP 
> CPU 2 
> 
> Pid: 11534, comm: modprobe Tainted: G        WC   3.5.0-rc6+ #78
>  Dell Inc.                 Precision WorkStation 490    
> /0DT031
> 
> RIP: 0010:[<ffffffffa4c9280b>]  [<ffffffffa4c9280b>] ip_vs_dst_event+0x18b/0x200 [ip_vs]
> RSP: 0000:ffff8800c92b7e28  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff880222fa22a0 RCX: 0000000000000006
> RDX: ffffffffa4caba50 RSI: 2222222222222222 RDI: 2222222222222222
> RBP: ffff8800c92b7e78 R08: 2222222222222222 R09: 2222222222222222
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82bec540
> R13: ffffffff82bec638 R14: ffffffff82bec540 R15: ffffffffa4caba40
> FS:  00007f652be40700(0000) GS:ffff880226a00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000750 CR3: 0000000035ea4000 CR4: 00000000000007e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process modprobe (pid: 11534, threadinfo ffff8800c92b6000, task ffff8801278d4900)
> Stack:
>  2222222222222222 0000000000000100 ffffffffa4caba50 ffffffffa4caa7a0
>  ffffffffa4ca8750 ffff880222fa22a0 ffffffffa4ca8750 ffffffff82bec638
>  ffffffff82bec540 0000000000000000 ffff8800c92b7eb8 ffffffff8156adcf
> 
> Call Trace:
>  [<ffffffff8156adcf>] unregister_netdevice_notifier+0x7f/0x100
>  [<ffffffffa4c96005>] ip_vs_control_cleanup+0x15/0x20 [ip_vs]
>  [<ffffffffa4ca33b2>] ip_vs_cleanup+0x41/0xc8f [ip_vs]
>  [<ffffffff810e0b8c>] sys_delete_module+0x19c/0x2c0
>  [<ffffffff816c1615>] ? sysret_check+0x22/0x5d
>  [<ffffffff8133c04e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff816c15e9>] system_call_fastpath+0x16/0x1b
> Code: 18 75 cd 4c 89 ef e8 35 48 00 00 eb c3 0f 1f 00 48 83 45 b8 01 48 81 
> 7d b8 00 01 00 00 0f 85 cd fe ff ff 49 8b 84 24 c0 0c 00 00 <4c> 8b a8 50 07 
> 00 00 48 05 50 07 00 00 49 39 c5 75 38 48 c7 c7 
> 
> RIP 
>  [<ffffffffa4c9280b>] ip_vs_dst_event+0x18b/0x200 [ip_vs]
>  RSP <ffff8800c92b7e28>
> CR2: 0000000000000750
> ---[ end trace 83bf880b95bd63b8 ]---
> 
> That's here...
> 
>         list_for_each_entry(dest, &net_ipvs(net)->dest_trash, n_list) {
>     226b:       4c 8b a8 50 07 00 00    mov    0x750(%rax),%r13
> 

	Yes, I got the same oops 4 days ago, we already
have a fix:

http://archive.linuxvirtualserver.org/html/lvs-devel/2012-07/msg00002.html
http://archive.linuxvirtualserver.org/html/lvs-devel/2012-07/msg00020.html

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* [PATCH 1/1] net: sched: add ipset ematch
From: Florian Westphal @ 2012-07-11 20:56 UTC (permalink / raw)
  To: netdev; +Cc: kadlec, Florian Westphal

Can be used to match packets against netfilter ip sets created via ipset(8).
skb->sk_iif is used as 'incoming interface', skb->dev is 'outgoing interface'.

Since ipset is usually called from netfilter, the ematch
initializes a fake xt_action_param, pulls the ip header into the
linear area and also sets skb->data to the IP header (otherwise
matching Layer 4 set types doesn't work).

Tested-by: Mr Dash Four <mr.dash.four@googlemail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/pkt_cls.h |    3 +-
 net/sched/Kconfig       |   10 ++++
 net/sched/Makefile      |    1 +
 net/sched/em_ipset.c    |  135 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 148 insertions(+), 1 deletions(-)
 create mode 100644 net/sched/em_ipset.c

diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index 38fbd4b..082eafa 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -453,7 +453,8 @@ enum {
 #define	TCF_EM_TEXT		5
 #define	TCF_EM_VLAN		6
 #define	TCF_EM_CANID		7
-#define	TCF_EM_MAX		7
+#define	TCF_EM_IPSET		8
+#define	TCF_EM_MAX		8
 
 enum {
 	TCF_EM_PROG_TC
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 4a5d2bd..62fb51f 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -517,6 +517,16 @@ config NET_EMATCH_CANID
 	  To compile this code as a module, choose M here: the
 	  module will be called em_canid.
 
+config NET_EMATCH_IPSET
+	tristate "IPset"
+	depends on NET_EMATCH && IP_SET
+	---help---
+	  Say Y here if you want to be able to classify packets based on
+	  ipset membership.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called em_ipset.
+
 config NET_CLS_ACT
 	bool "Actions"
 	---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index bcada75..978cbf0 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -56,3 +56,4 @@ obj-$(CONFIG_NET_EMATCH_U32)	+= em_u32.o
 obj-$(CONFIG_NET_EMATCH_META)	+= em_meta.o
 obj-$(CONFIG_NET_EMATCH_TEXT)	+= em_text.o
 obj-$(CONFIG_NET_EMATCH_CANID)	+= em_canid.o
+obj-$(CONFIG_NET_EMATCH_IPSET)	+= em_ipset.o
diff --git a/net/sched/em_ipset.c b/net/sched/em_ipset.c
new file mode 100644
index 0000000..3130320
--- /dev/null
+++ b/net/sched/em_ipset.c
@@ -0,0 +1,135 @@
+/*
+ * net/sched/em_ipset.c	ipset ematch
+ *
+ * Copyright (c) 2012 Florian Westphal <fw@strlen.de>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ */
+
+#include <linux/gfp.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/skbuff.h>
+#include <linux/netfilter/xt_set.h>
+#include <linux/ipv6.h>
+#include <net/ip.h>
+#include <net/pkt_cls.h>
+
+static int em_ipset_change(struct tcf_proto *tp, void *data, int data_len,
+			   struct tcf_ematch *em)
+{
+	struct xt_set_info *set = data;
+	ip_set_id_t index;
+
+	if (data_len != sizeof(*set))
+		return -EINVAL;
+
+	index = ip_set_nfnl_get_byindex(set->index);
+	if (index == IPSET_INVALID_ID)
+		return -ENOENT;
+
+	em->datalen = sizeof(*set);
+	em->data = (unsigned long)kmemdup(data, em->datalen, GFP_KERNEL);
+	if (em->data)
+		return 0;
+
+	ip_set_nfnl_put(index);
+	return -ENOMEM;
+}
+
+static void em_ipset_destroy(struct tcf_proto *p, struct tcf_ematch *em)
+{
+	const struct xt_set_info *set = (const void *) em->data;
+	if (set) {
+		ip_set_nfnl_put(set->index);
+		kfree((void *) em->data);
+	}
+}
+
+static int em_ipset_match(struct sk_buff *skb, struct tcf_ematch *em,
+			  struct tcf_pkt_info *info)
+{
+	struct ip_set_adt_opt opt;
+	struct xt_action_param acpar;
+	const struct xt_set_info *set = (const void *) em->data;
+	struct net_device *dev, *indev = NULL;
+	int ret, network_offset;
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		acpar.family = NFPROTO_IPV4;
+		if (!pskb_network_may_pull(skb, sizeof(struct iphdr)))
+			return 0;
+		acpar.thoff = ip_hdrlen(skb);
+		break;
+	case htons(ETH_P_IPV6):
+		acpar.family = NFPROTO_IPV6;
+		if (!pskb_network_may_pull(skb, sizeof(struct ipv6hdr)))
+			return 0;
+		/* doesn't call ipv6_find_hdr() because ipset doesn't use thoff, yet */
+		acpar.thoff = sizeof(struct ipv6hdr);
+		break;
+	default:
+		return 0;
+	}
+
+	acpar.hooknum = 0;
+
+	opt.family = acpar.family;
+	opt.dim = set->dim;
+	opt.flags = set->flags;
+	opt.cmdflags = 0;
+	opt.timeout = ~0u;
+
+	network_offset = skb_network_offset(skb);
+	skb_pull(skb, network_offset);
+
+	dev = skb->dev;
+
+	rcu_read_lock();
+
+	if (dev && skb->skb_iif)
+		indev = dev_get_by_index_rcu(dev_net(dev), skb->skb_iif);
+
+	acpar.in      = indev ? indev : dev;
+	acpar.out     = dev;
+
+	ret = ip_set_test(set->index, skb, &acpar, &opt);
+
+	rcu_read_unlock();
+
+	skb_push(skb, network_offset);
+	return ret;
+}
+
+static struct tcf_ematch_ops em_ipset_ops = {
+	.kind	  = TCF_EM_IPSET,
+	.change	  = em_ipset_change,
+	.destroy  = em_ipset_destroy,
+	.match	  = em_ipset_match,
+	.owner	  = THIS_MODULE,
+	.link	  = LIST_HEAD_INIT(em_ipset_ops.link)
+};
+
+static int __init init_em_ipset(void)
+{
+	return tcf_em_register(&em_ipset_ops);
+}
+
+static void __exit exit_em_ipset(void)
+{
+	tcf_em_unregister(&em_ipset_ops);
+}
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Florian Westphal <fw@strlen.de>");
+MODULE_DESCRIPTION("TC extended match for IP sets");
+
+module_init(init_em_ipset);
+module_exit(exit_em_ipset);
+
+MODULE_ALIAS_TCF_EMATCH(TCF_EM_IPSET);
-- 
1.7.3.4

^ permalink raw reply related

* Re: [RFC PATCH 09/10] ixgbe: Add support for displaying the number of Tx/Rx channels
From: Alexander Duyck @ 2012-07-11 21:00 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, davem, jeffrey.t.kirsher, edumazet, therbert,
	alexander.duyck
In-Reply-To: <1342030903.2613.52.camel@bwh-desktop.uk.solarflarecom.com>

On 07/11/2012 11:21 AM, Ben Hutchings wrote:
> On Fri, 2012-06-29 at 17:16 -0700, Alexander Duyck wrote:
>> This patch adds support for the ethtool get_channels operation.
>>
>> Since the ixgbe driver has to support DCB as well as the other modes the
>> assumption I made here is that the number of channels in DCB modes refers
>> to the number of queues per traffic class, not the number of queues total.
> [...]
>
> When MSI-X is enabled, a 'channel' is an MSI-X vector and the associated
> queues, i.e. total number of channels reported should be the total
> number of MSI-X vectors in use.  (That was my intended interpretation,
> anyway.  It may be that there is too much variation in the way queues
> and interrupts are associated for these operations to be defined in a
> general way.)
>
> Ben.
>
The problem with the MSI-X interpretation is that ixgbe has that type of
control reversed.  We base everything on the number of queues, and then
from that you can end up determining the number of MSI-X vectors.  So
for example we could tell ixgbe via this interface to generate 64
queues, but if the system only has 8 CPUs we would end up with 8 MSI-X
vectors each with 8 queues.

Also as I mentioned in the case of DCB things get even more
complicated.  We need to have a symmetric number of queues per traffic
class based on the way we currently have DCB implemented.  The way I saw
it I could go two routes, the first being to force channels to be a
multiple of TCs which would have been complicated to deal with, or the
simpler approach I chose which was to apply 'channel' to be per TC. 
This way if DCB is then disabled we can easily revert to the standard
interpretation which would mean we would only have as many queues as the
channels specified.

Thanks,

Alex

^ permalink raw reply

* Re: [RFC PATCH 07/10] ixgbe: Add function for setting XPS queue mapping
From: Alexander Duyck @ 2012-07-11 21:12 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, davem, jeffrey.t.kirsher, edumazet, therbert,
	alexander.duyck
In-Reply-To: <1342030507.2613.46.camel@bwh-desktop.uk.solarflarecom.com>

On 07/11/2012 11:15 AM, Ben Hutchings wrote:
> On Fri, 2012-06-29 at 17:16 -0700, Alexander Duyck wrote:
>> This change adds support for ixgbe to configure the XPS queue mapping on
>> load.  The result of this change is that on open we will now be resetting
>> the number of Tx queues, and then setting the default configuration for XPS
>> based on if ATR is enabled or disabled.
> [...]
>
> I didn't see where you're resetting the number of TX queues; was that
> actually added in an earlier patch?
>
> It seems strange to be resetting XPS configuration on open; normally net
> device configuration persists as long as the device is registered.
> Maybe only do this if the number of TX queues has to change?
>
> Ben.
>
Actually I am working on top of a set of patches for ixgbe that haven't
been submitted upstream.  In one of those patches I moved our call to
netif_set_real_num_tx_queues into ixgbe_open.  The call is only one or
two lines above the call I make to ixgbe_set_xps_mapping.

I will see what I can do about resetting the settings only when we
change the number of queues.

Thanks,

Alex

^ permalink raw reply

* [ethtool PATCH] ethtool: Resolve use of uninitialized memory in rxclass_get_dev_info
From: Alexander Duyck @ 2012-07-11 21:16 UTC (permalink / raw)
  To: netdev; +Cc: bhutchings, jeffrey.t.kirsher, alexander.duyck

The ethtool function for getting the rule count was not zeroing out the
data field before passing it to the kernel.  As a result the value started
uninitialized and was incorrectly returning a result indicating that
devices supported setting new rule indexes.  In order to correct this I am
adding a one line fix that sets data to zero before we pass the command to
the kernel.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 rxclass.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/rxclass.c b/rxclass.c
index 4d49aa6..e1633a8 100644
--- a/rxclass.c
+++ b/rxclass.c
@@ -207,6 +207,7 @@ static int rxclass_get_dev_info(struct cmd_context *ctx, __u32 *count,
 	int err;

 	nfccmd.cmd = ETHTOOL_GRXCLSRLCNT;
+	nfccmd.data = 0;
 	err = send_ioctl(ctx, &nfccmd);
 	*count = nfccmd.rule_cnt;
 	if (driver_select)

^ permalink raw reply related

* Re: [PATCH net-next] r8169: Remove rtl_ocpdr_cond
From: Francois Romieu @ 2012-07-11 22:00 UTC (permalink / raw)
  To: Hayes Wang; +Cc: netdev, linux-kernel
In-Reply-To: <1342009916-1519-1-git-send-email-hayeswang@realtek.com>

Hayes Wang <hayeswang@realtek.com> :
> No waiting is needed for mac_ocp_{write / read}. And the bit 31 of
> OCPDR would not change, so rtl_udelay_loop_wait_high always return
> false. That is, the r8168_mac_ocp_read always retuen ~0.

(testing with davem's 48ee3569f31d91084dc694fef5517eb782428083)

It seemed right at first sight but testing without firmware produces
unexpected results. While it is not exactly fast if I ask it to emit
packets with ping -qf, it turns really slow when I use the -l preload
option: at most a few hundreds pps.

An old pcie 8168b in the same computer behaves correctly with the same
kernel.

I'll try again tomorrow evening.

-- 
Ueimor

^ permalink raw reply

* limit allocation size in ieee802154_sock_sendmsg
From: Dave Jones @ 2012-07-11 22:12 UTC (permalink / raw)
  To: netdev

Sasha reported this last November (https://lkml.org/lkml/2011/11/22/41)
and I keep walking into it while fuzz testing.

Length of the packet is passed in from userspace, and passed down to the
page allocator unchecked. If the allocation size is bigger than MAX_ORDER,
we get this backtrace from the page allocator.

 WARNING: at mm/page_alloc.c:2298 __alloc_pages_nodemask+0x46d/0xb00()
 Call Trace:
  [<ffffffff81064b2f>] warn_slowpath_common+0x7f/0xc0
  [<ffffffff81064b8a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8116105d>] __alloc_pages_nodemask+0x46d/0xb00
  [<ffffffff8155ed3b>] ? __alloc_skb+0x4b/0x230
  [<ffffffff810d43dd>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff816ace3f>] kmalloc_large_node+0x66/0xab
  [<ffffffff811ad9b5>] __kmalloc_node_track_caller+0x1b5/0x200
  [<ffffffff81559701>] ? sock_alloc_send_pskb+0x271/0x3e0
  [<ffffffff8155ed68>] __alloc_skb+0x78/0x230
  [<ffffffff81559701>] sock_alloc_send_pskb+0x271/0x3e0
  [<ffffffff8156d331>] ? dev_getfirstbyhwtype+0x161/0x250
  [<ffffffff8156d1d0>] ? rps_may_expire_flow+0x220/0x220
  [<ffffffff81559885>] sock_alloc_send_skb+0x15/0x20
  [<ffffffffa4a73347>] dgram_sendmsg+0xe7/0x340 [af_802154]
  [<ffffffffa4a72044>] ieee802154_sock_sendmsg+0x14/0x20 [af_802154]
  [<ffffffff81553787>] sock_sendmsg+0x117/0x130
  [<ffffffff810cdb7d>] ? trace_hardirqs_off+0xd/0x10
  [<ffffffff816b81c7>] ? _raw_spin_unlock_irqrestore+0x77/0x80
  [<ffffffff8134312b>] ? debug_object_activate+0x12b/0x1a0
  [<ffffffff816b81c7>] ? _raw_spin_unlock_irqrestore+0x77/0x80
  [<ffffffff81556b60>] sys_sendto+0x140/0x190
  [<ffffffff816c5014>] ? bad_to_user+0x5e/0x80a
  [<ffffffff8133c04e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff816c15e9>] system_call_fastpath+0x16/0x1b

Signed-off-by: Dave Jones <davej@redhat.com>

diff --git a/net/ieee802154/af_ieee802154.c b/net/ieee802154/af_ieee802154.c
index 40e606f..1da228a 100644
--- a/net/ieee802154/af_ieee802154.c
+++ b/net/ieee802154/af_ieee802154.c
@@ -108,6 +108,9 @@ static int ieee802154_sock_sendmsg(struct kiocb *iocb, struct socket *sock,
 {
 	struct sock *sk = sock->sk;
 
+	if (len > MAX_ORDER_NR_PAGES * PAGE_SIZE)
+		return -EINVAL;
+
 	return sk->sk_prot->sendmsg(iocb, sk, msg, len);
 }
 

^ permalink raw reply related

* Re: UDP ordering when using multiple rx queue
From: Chris Friesen @ 2012-07-11 22:50 UTC (permalink / raw)
  To: Jean-Michel Hautbois; +Cc: netdev
In-Reply-To: <CAL8zT=g5nHd6FprhxFc21XTgfXeaokt4QN1fS4ekcNVkuvZE9g@mail.gmail.com>

On 07/11/2012 01:53 AM, Jean-Michel Hautbois wrote:
> On receiver side, I need to get the packets ordered, or the
> application will consider the packets are late (and then, lost).
> (Yes, the application is badly written on that specific part, but it
> is not mine :)).

Not the first such app I've seen.

> Several tests lead to a simple conclusion : when the NIC has only one
> RX queue, everything is ok (like be2net for instance), but when it has
> more than one RX queue, then I can have "lost packets".
> This is the case for bnx2x or mlx4 for instance.

This depends on the hardware.  The Intel NICs for example (and others, 
I'm just most familiar with them) support multiple queues but they do 
hardware hashing of "flows" such that a given flow will be routed to a 
specific queue and thus stay in order.

> Here are my questions :
> - Is it possible to force a driver to use only one rx queue, even if
> it can use more without reloading the driver (and this is feasible
> only when a parameter exists for that !) ?

This depends on the driver, but generally I would expect this to be a 
module parameter.

> - Is it possible to "force" the network stack to give the packets on
> the correct order (I would say no, as this is not specified in the
> protocol) ?

No, it's up to the hardware/driver.


> My only bet is the first one (forcing one rx queue).
> The last and desperate solution would be rewriting the application,
> not easy to make it accepted.

Depending on the hardware/driver you may be able to enable flow hashing.

Chris

^ permalink raw reply

* pull request: sfc-next 2012-07-11
From: Ben Hutchings @ 2012-07-11 23:13 UTC (permalink / raw)
  To: David Miller; +Cc: linux-net-drivers, netdev

[-- Attachment #1: Type: text/plain, Size: 2546 bytes --]

The following changes since commit c56bf6fe785abbd83751a462f0c7067f7145b97a:

  ipv6: fix a bad cast in ip6_dst_lookup_tail() (2012-07-06 00:23:41 -0700)

are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next.git for-davem

(commit 9a46fccdd28f9939c10f81748e1430afbb95138d)

1. 128-bit MMIO writes for TX push, by Stuart Hodgson.  We previously
   tried to do this with write-combining, but that turned out to be
   unsafe.  Now we use SSE (on x86_64 only).
2. Fix potential badness when running a self-test with SR-IOV enabled.
3. Fix calculation of some interface statistics that could run backward.
4. Miscellaneous cleanup.

Ben.

Ben Hutchings (10):
      sfc: Work around bogus 'uninitialised variable' warning
      sfc: Use generic DMA API, not PCI-DMA API
      sfc: Remove dead write to tso_state::packet_space
      sfc: Stop changing header offsets on TX
      sfc: Use strlcpy() to copy ethtool stats names
      sfc: Use dev_kfree_skb() in efx_end_loopback()
      sfc: Explain why efx_mcdi_exit_assertion() ignores result of efx_mcdi_rpc()
      sfc: Disable VF queues during register self-test
      sfc: Fix interface statistics running backward
      sfc: Correct some comments on enum reset_type

Stuart Hodgson (1):
      sfc: Implement 128-bit writes for efx_writeo_page

 drivers/net/ethernet/sfc/bitfield.h    |    5 ++
 drivers/net/ethernet/sfc/efx.c         |   10 ++--
 drivers/net/ethernet/sfc/enum.h        |    8 ++--
 drivers/net/ethernet/sfc/ethtool.c     |    2 +-
 drivers/net/ethernet/sfc/falcon.c      |   35 +++++++++++--
 drivers/net/ethernet/sfc/falcon_xmac.c |   12 ++--
 drivers/net/ethernet/sfc/filter.c      |    2 +-
 drivers/net/ethernet/sfc/io.h          |   51 +++++++++++++++++
 drivers/net/ethernet/sfc/mcdi.c        |   11 +++-
 drivers/net/ethernet/sfc/net_driver.h  |    9 ++-
 drivers/net/ethernet/sfc/nic.c         |   11 ++---
 drivers/net/ethernet/sfc/nic.h         |   18 ++++++
 drivers/net/ethernet/sfc/rx.c          |   22 ++++----
 drivers/net/ethernet/sfc/selftest.c    |   64 ++++++----------------
 drivers/net/ethernet/sfc/siena.c       |   37 ++++++++++---
 drivers/net/ethernet/sfc/tx.c          |   93 ++++++++++++++------------------
 16 files changed, 237 insertions(+), 153 deletions(-)

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* [PATCH net-next 01/11] sfc: Implement 128-bit writes for efx_writeo_page
From: Ben Hutchings @ 2012-07-11 23:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

Add support for writing a TX descriptor to the NIC in one PCIe
transaction on x86_64 machines.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/bitfield.h |    5 +++
 drivers/net/ethernet/sfc/io.h       |   51 +++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/sfc/bitfield.h b/drivers/net/ethernet/sfc/bitfield.h
index b26a954..8f8af3e 100644
--- a/drivers/net/ethernet/sfc/bitfield.h
+++ b/drivers/net/ethernet/sfc/bitfield.h
@@ -69,6 +69,10 @@
 	((width) == 32 ? ~((u32) 0) :		\
 	 (((((u32) 1) << (width))) - 1))
 
+typedef struct {
+        __le64 b, a;
+} efx_le_128;
+
 /* A doubleword (i.e. 4 byte) datatype - little-endian in HW */
 typedef union efx_dword {
 	__le32 u32[1];
@@ -83,6 +87,7 @@ typedef union efx_qword {
 
 /* An octword (eight-word, i.e. 16 byte) datatype - little-endian in HW */
 typedef union efx_oword {
+	efx_le_128 u128;
 	__le64 u64[2];
 	efx_qword_t qword[2];
 	__le32 u32[4];
diff --git a/drivers/net/ethernet/sfc/io.h b/drivers/net/ethernet/sfc/io.h
index 751d1ec..0868aef 100644
--- a/drivers/net/ethernet/sfc/io.h
+++ b/drivers/net/ethernet/sfc/io.h
@@ -57,10 +57,57 @@
  *   current state.
  */
 
+#ifdef CONFIG_X86_64
+#define EFX_USE_SSE_IO 1
+#endif
+
 #if BITS_PER_LONG == 64
 #define EFX_USE_QWORD_IO 1
 #endif
 
+#ifdef EFX_USE_SSE_IO
+
+static inline void _efx_writeo(struct efx_nic *efx, efx_le_128 value,
+			       unsigned int reg)
+{
+	unsigned long cr0;
+	efx_le_128 xmm_save[1];
+	void __iomem *addr = efx->membase + reg;
+
+	preempt_disable();
+
+	/* Save the xmm0 register to stack */
+	asm volatile(
+		"movq %%cr0,%0          ;\n\t"
+		"clts                   ;\n\t"
+		"movups %%xmm0,(%1)     ;\n\t"
+		: "=&r" (cr0)
+		: "r" (xmm_save)
+		: "memory");
+
+	/* First read the data into register xmm0
+	 * Then write this out to the address that was given
+	 */
+	asm volatile(
+		"movdqu %1,%%xmm0       ;\n\t"
+		"movdqa %%xmm0,%0       ;\n\t"
+		: "=m" (*(unsigned long __force *)addr)
+		: "m" (value)
+		: "memory");
+
+	/* Restore the xmm0 register */
+	asm volatile(
+		"sfence                 ;\n\t"
+		"movups (%1),%%xmm0     ;\n\t"
+		"movq   %0,%%cr0        ;\n\t"
+		:
+		: "r" (cr0), "r" (xmm_save)
+		: "memory");
+
+	preempt_enable();
+}
+#endif /* EFX_USE_SSE_IO */
+
 #ifdef EFX_USE_QWORD_IO
 static inline void _efx_writeq(struct efx_nic *efx, __le64 value,
 				  unsigned int reg)
@@ -235,6 +282,9 @@ static inline void _efx_writeo_page(struct efx_nic *efx, efx_oword_t *value,
 		   "writing register %x with " EFX_OWORD_FMT "\n", reg,
 		   EFX_OWORD_VAL(*value));
 
+#ifdef EFX_USE_SSE_IO
+	_efx_writeo(efx, value->u128, reg + 0);
+#else
 #ifdef EFX_USE_QWORD_IO
 	_efx_writeq(efx, value->u64[0], reg + 0);
 	_efx_writeq(efx, value->u64[1], reg + 8);
@@ -244,6 +294,7 @@ static inline void _efx_writeo_page(struct efx_nic *efx, efx_oword_t *value,
 	_efx_writed(efx, value->u32[2], reg + 8);
 	_efx_writed(efx, value->u32[3], reg + 12);
 #endif
+#endif
 }
 #define efx_writeo_page(efx, value, reg, page)				\
 	_efx_writeo_page(efx, value,					\
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 02/11] sfc: Work around bogus 'uninitialised variable' warning
From: Ben Hutchings @ 2012-07-11 23:15 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

With some gcc versions & optimisations, the compiler will warn that
'depth' in efx_filter_insert_filter() may be used without being
initialised, although this is not the case.

This is related to inlining of efx_filter_search(), which only has
one caller since commit 8db182f4a8a6e2dcb8b65905ea4af56210e65430
('sfc: Remove now-unused filter function').

Shut the compiler up by initialising it to 0.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/filter.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/sfc/filter.c b/drivers/net/ethernet/sfc/filter.c
index fea7f73..c3fd61f 100644
--- a/drivers/net/ethernet/sfc/filter.c
+++ b/drivers/net/ethernet/sfc/filter.c
@@ -662,7 +662,7 @@ s32 efx_filter_insert_filter(struct efx_nic *efx, struct efx_filter_spec *spec,
 	struct efx_filter_table *table = efx_filter_spec_table(state, spec);
 	struct efx_filter_spec *saved_spec;
 	efx_oword_t filter;
-	unsigned int filter_idx, depth;
+	unsigned int filter_idx, depth = 0;
 	u32 key;
 	int rc;
 
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 03/11] sfc: Use generic DMA API, not PCI-DMA API
From: Ben Hutchings @ 2012-07-11 23:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/efx.c        |   10 ++--
 drivers/net/ethernet/sfc/net_driver.h |    2 +-
 drivers/net/ethernet/sfc/nic.c        |    8 ++--
 drivers/net/ethernet/sfc/rx.c         |   22 ++++----
 drivers/net/ethernet/sfc/tx.c         |   83 ++++++++++++++++-----------------
 5 files changed, 62 insertions(+), 63 deletions(-)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index b95f2e1..70554a1 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1103,8 +1103,8 @@ static int efx_init_io(struct efx_nic *efx)
 	 * masks event though they reject 46 bit masks.
 	 */
 	while (dma_mask > 0x7fffffffUL) {
-		if (pci_dma_supported(pci_dev, dma_mask)) {
-			rc = pci_set_dma_mask(pci_dev, dma_mask);
+		if (dma_supported(&pci_dev->dev, dma_mask)) {
+			rc = dma_set_mask(&pci_dev->dev, dma_mask);
 			if (rc == 0)
 				break;
 		}
@@ -1117,10 +1117,10 @@ static int efx_init_io(struct efx_nic *efx)
 	}
 	netif_dbg(efx, probe, efx->net_dev,
 		  "using DMA mask %llx\n", (unsigned long long) dma_mask);
-	rc = pci_set_consistent_dma_mask(pci_dev, dma_mask);
+	rc = dma_set_coherent_mask(&pci_dev->dev, dma_mask);
 	if (rc) {
-		/* pci_set_consistent_dma_mask() is not *allowed* to
-		 * fail with a mask that pci_set_dma_mask() accepted,
+		/* dma_set_coherent_mask() is not *allowed* to
+		 * fail with a mask that dma_set_mask() accepted,
 		 * but just in case...
 		 */
 		netif_err(efx, probe, efx->net_dev,
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 0e57535..8a9f6d4 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -100,7 +100,7 @@ struct efx_special_buffer {
  * @len: Length of this fragment.
  *	This field is zero when the queue slot is empty.
  * @continuation: True if this fragment is not the end of a packet.
- * @unmap_single: True if pci_unmap_single should be used.
+ * @unmap_single: True if dma_unmap_single should be used.
  * @unmap_len: Length of this fragment to unmap
  */
 struct efx_tx_buffer {
diff --git a/drivers/net/ethernet/sfc/nic.c b/drivers/net/ethernet/sfc/nic.c
index 4a9a5be..287738d 100644
--- a/drivers/net/ethernet/sfc/nic.c
+++ b/drivers/net/ethernet/sfc/nic.c
@@ -308,8 +308,8 @@ efx_free_special_buffer(struct efx_nic *efx, struct efx_special_buffer *buffer)
 int efx_nic_alloc_buffer(struct efx_nic *efx, struct efx_buffer *buffer,
 			 unsigned int len)
 {
-	buffer->addr = pci_alloc_consistent(efx->pci_dev, len,
-					    &buffer->dma_addr);
+	buffer->addr = dma_alloc_coherent(&efx->pci_dev->dev, len,
+					  &buffer->dma_addr, GFP_ATOMIC);
 	if (!buffer->addr)
 		return -ENOMEM;
 	buffer->len = len;
@@ -320,8 +320,8 @@ int efx_nic_alloc_buffer(struct efx_nic *efx, struct efx_buffer *buffer,
 void efx_nic_free_buffer(struct efx_nic *efx, struct efx_buffer *buffer)
 {
 	if (buffer->addr) {
-		pci_free_consistent(efx->pci_dev, buffer->len,
-				    buffer->addr, buffer->dma_addr);
+		dma_free_coherent(&efx->pci_dev->dev, buffer->len,
+				  buffer->addr, buffer->dma_addr);
 		buffer->addr = NULL;
 	}
 }
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 243e91f..6d1c6cf 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -155,11 +155,11 @@ static int efx_init_rx_buffers_skb(struct efx_rx_queue *rx_queue)
 		rx_buf->len = skb_len - NET_IP_ALIGN;
 		rx_buf->flags = 0;
 
-		rx_buf->dma_addr = pci_map_single(efx->pci_dev,
+		rx_buf->dma_addr = dma_map_single(&efx->pci_dev->dev,
 						  skb->data, rx_buf->len,
-						  PCI_DMA_FROMDEVICE);
-		if (unlikely(pci_dma_mapping_error(efx->pci_dev,
-						   rx_buf->dma_addr))) {
+						  DMA_FROM_DEVICE);
+		if (unlikely(dma_mapping_error(&efx->pci_dev->dev,
+					       rx_buf->dma_addr))) {
 			dev_kfree_skb_any(skb);
 			rx_buf->u.skb = NULL;
 			return -EIO;
@@ -200,10 +200,10 @@ static int efx_init_rx_buffers_page(struct efx_rx_queue *rx_queue)
 				   efx->rx_buffer_order);
 		if (unlikely(page == NULL))
 			return -ENOMEM;
-		dma_addr = pci_map_page(efx->pci_dev, page, 0,
+		dma_addr = dma_map_page(&efx->pci_dev->dev, page, 0,
 					efx_rx_buf_size(efx),
-					PCI_DMA_FROMDEVICE);
-		if (unlikely(pci_dma_mapping_error(efx->pci_dev, dma_addr))) {
+					DMA_FROM_DEVICE);
+		if (unlikely(dma_mapping_error(&efx->pci_dev->dev, dma_addr))) {
 			__free_pages(page, efx->rx_buffer_order);
 			return -EIO;
 		}
@@ -247,14 +247,14 @@ static void efx_unmap_rx_buffer(struct efx_nic *efx,
 
 		state = page_address(rx_buf->u.page);
 		if (--state->refcnt == 0) {
-			pci_unmap_page(efx->pci_dev,
+			dma_unmap_page(&efx->pci_dev->dev,
 				       state->dma_addr,
 				       efx_rx_buf_size(efx),
-				       PCI_DMA_FROMDEVICE);
+				       DMA_FROM_DEVICE);
 		}
 	} else if (!(rx_buf->flags & EFX_RX_BUF_PAGE) && rx_buf->u.skb) {
-		pci_unmap_single(efx->pci_dev, rx_buf->dma_addr,
-				 rx_buf->len, PCI_DMA_FROMDEVICE);
+		dma_unmap_single(&efx->pci_dev->dev, rx_buf->dma_addr,
+				 rx_buf->len, DMA_FROM_DEVICE);
 	}
 }
 
diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index 94d0365..18860f2 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -36,15 +36,15 @@ static void efx_dequeue_buffer(struct efx_tx_queue *tx_queue,
 			       unsigned int *bytes_compl)
 {
 	if (buffer->unmap_len) {
-		struct pci_dev *pci_dev = tx_queue->efx->pci_dev;
+		struct device *dma_dev = &tx_queue->efx->pci_dev->dev;
 		dma_addr_t unmap_addr = (buffer->dma_addr + buffer->len -
 					 buffer->unmap_len);
 		if (buffer->unmap_single)
-			pci_unmap_single(pci_dev, unmap_addr, buffer->unmap_len,
-					 PCI_DMA_TODEVICE);
+			dma_unmap_single(dma_dev, unmap_addr, buffer->unmap_len,
+					 DMA_TO_DEVICE);
 		else
-			pci_unmap_page(pci_dev, unmap_addr, buffer->unmap_len,
-				       PCI_DMA_TODEVICE);
+			dma_unmap_page(dma_dev, unmap_addr, buffer->unmap_len,
+				       DMA_TO_DEVICE);
 		buffer->unmap_len = 0;
 		buffer->unmap_single = false;
 	}
@@ -138,7 +138,7 @@ efx_max_tx_len(struct efx_nic *efx, dma_addr_t dma_addr)
 netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 {
 	struct efx_nic *efx = tx_queue->efx;
-	struct pci_dev *pci_dev = efx->pci_dev;
+	struct device *dma_dev = &efx->pci_dev->dev;
 	struct efx_tx_buffer *buffer;
 	skb_frag_t *fragment;
 	unsigned int len, unmap_len = 0, fill_level, insert_ptr;
@@ -167,17 +167,17 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 	fill_level = tx_queue->insert_count - tx_queue->old_read_count;
 	q_space = efx->txq_entries - 1 - fill_level;
 
-	/* Map for DMA.  Use pci_map_single rather than pci_map_page
+	/* Map for DMA.  Use dma_map_single rather than dma_map_page
 	 * since this is more efficient on machines with sparse
 	 * memory.
 	 */
 	unmap_single = true;
-	dma_addr = pci_map_single(pci_dev, skb->data, len, PCI_DMA_TODEVICE);
+	dma_addr = dma_map_single(dma_dev, skb->data, len, PCI_DMA_TODEVICE);
 
 	/* Process all fragments */
 	while (1) {
-		if (unlikely(pci_dma_mapping_error(pci_dev, dma_addr)))
-			goto pci_err;
+		if (unlikely(dma_mapping_error(dma_dev, dma_addr)))
+			goto dma_err;
 
 		/* Store fields for marking in the per-fragment final
 		 * descriptor */
@@ -246,7 +246,7 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 		i++;
 		/* Map for DMA */
 		unmap_single = false;
-		dma_addr = skb_frag_dma_map(&pci_dev->dev, fragment, 0, len,
+		dma_addr = skb_frag_dma_map(dma_dev, fragment, 0, len,
 					    DMA_TO_DEVICE);
 	}
 
@@ -261,7 +261,7 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 
 	return NETDEV_TX_OK;
 
- pci_err:
+ dma_err:
 	netif_err(efx, tx_err, efx->net_dev,
 		  " TX queue %d could not map skb with %d bytes %d "
 		  "fragments for DMA\n", tx_queue->queue, skb->len,
@@ -284,11 +284,11 @@ netdev_tx_t efx_enqueue_skb(struct efx_tx_queue *tx_queue, struct sk_buff *skb)
 	/* Free the fragment we were mid-way through pushing */
 	if (unmap_len) {
 		if (unmap_single)
-			pci_unmap_single(pci_dev, unmap_addr, unmap_len,
-					 PCI_DMA_TODEVICE);
+			dma_unmap_single(dma_dev, unmap_addr, unmap_len,
+					 DMA_TO_DEVICE);
 		else
-			pci_unmap_page(pci_dev, unmap_addr, unmap_len,
-				       PCI_DMA_TODEVICE);
+			dma_unmap_page(dma_dev, unmap_addr, unmap_len,
+				       DMA_TO_DEVICE);
 	}
 
 	return rc;
@@ -684,20 +684,19 @@ static __be16 efx_tso_check_protocol(struct sk_buff *skb)
  */
 static int efx_tsoh_block_alloc(struct efx_tx_queue *tx_queue)
 {
-
-	struct pci_dev *pci_dev = tx_queue->efx->pci_dev;
+	struct device *dma_dev = &tx_queue->efx->pci_dev->dev;
 	struct efx_tso_header *tsoh;
 	dma_addr_t dma_addr;
 	u8 *base_kva, *kva;
 
-	base_kva = pci_alloc_consistent(pci_dev, PAGE_SIZE, &dma_addr);
+	base_kva = dma_alloc_coherent(dma_dev, PAGE_SIZE, &dma_addr, GFP_ATOMIC);
 	if (base_kva == NULL) {
 		netif_err(tx_queue->efx, tx_err, tx_queue->efx->net_dev,
 			  "Unable to allocate page for TSO headers\n");
 		return -ENOMEM;
 	}
 
-	/* pci_alloc_consistent() allocates pages. */
+	/* dma_alloc_coherent() allocates pages. */
 	EFX_BUG_ON_PARANOID(dma_addr & (PAGE_SIZE - 1u));
 
 	for (kva = base_kva; kva < base_kva + PAGE_SIZE; kva += TSOH_STD_SIZE) {
@@ -714,7 +713,7 @@ static int efx_tsoh_block_alloc(struct efx_tx_queue *tx_queue)
 /* Free up a TSO header, and all others in the same page. */
 static void efx_tsoh_block_free(struct efx_tx_queue *tx_queue,
 				struct efx_tso_header *tsoh,
-				struct pci_dev *pci_dev)
+				struct device *dma_dev)
 {
 	struct efx_tso_header **p;
 	unsigned long base_kva;
@@ -731,7 +730,7 @@ static void efx_tsoh_block_free(struct efx_tx_queue *tx_queue,
 			p = &(*p)->next;
 	}
 
-	pci_free_consistent(pci_dev, PAGE_SIZE, (void *)base_kva, base_dma);
+	dma_free_coherent(dma_dev, PAGE_SIZE, (void *)base_kva, base_dma);
 }
 
 static struct efx_tso_header *
@@ -743,11 +742,11 @@ efx_tsoh_heap_alloc(struct efx_tx_queue *tx_queue, size_t header_len)
 	if (unlikely(!tsoh))
 		return NULL;
 
-	tsoh->dma_addr = pci_map_single(tx_queue->efx->pci_dev,
+	tsoh->dma_addr = dma_map_single(&tx_queue->efx->pci_dev->dev,
 					TSOH_BUFFER(tsoh), header_len,
-					PCI_DMA_TODEVICE);
-	if (unlikely(pci_dma_mapping_error(tx_queue->efx->pci_dev,
-					   tsoh->dma_addr))) {
+					DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(&tx_queue->efx->pci_dev->dev,
+				       tsoh->dma_addr))) {
 		kfree(tsoh);
 		return NULL;
 	}
@@ -759,9 +758,9 @@ efx_tsoh_heap_alloc(struct efx_tx_queue *tx_queue, size_t header_len)
 static void
 efx_tsoh_heap_free(struct efx_tx_queue *tx_queue, struct efx_tso_header *tsoh)
 {
-	pci_unmap_single(tx_queue->efx->pci_dev,
+	dma_unmap_single(&tx_queue->efx->pci_dev->dev,
 			 tsoh->dma_addr, tsoh->unmap_len,
-			 PCI_DMA_TODEVICE);
+			 DMA_TO_DEVICE);
 	kfree(tsoh);
 }
 
@@ -892,13 +891,13 @@ static void efx_enqueue_unwind(struct efx_tx_queue *tx_queue)
 			unmap_addr = (buffer->dma_addr + buffer->len -
 				      buffer->unmap_len);
 			if (buffer->unmap_single)
-				pci_unmap_single(tx_queue->efx->pci_dev,
+				dma_unmap_single(&tx_queue->efx->pci_dev->dev,
 						 unmap_addr, buffer->unmap_len,
-						 PCI_DMA_TODEVICE);
+						 DMA_TO_DEVICE);
 			else
-				pci_unmap_page(tx_queue->efx->pci_dev,
+				dma_unmap_page(&tx_queue->efx->pci_dev->dev,
 					       unmap_addr, buffer->unmap_len,
-					       PCI_DMA_TODEVICE);
+					       DMA_TO_DEVICE);
 			buffer->unmap_len = 0;
 		}
 		buffer->len = 0;
@@ -954,9 +953,9 @@ static int tso_get_head_fragment(struct tso_state *st, struct efx_nic *efx,
 	int hl = st->header_len;
 	int len = skb_headlen(skb) - hl;
 
-	st->unmap_addr = pci_map_single(efx->pci_dev, skb->data + hl,
-					len, PCI_DMA_TODEVICE);
-	if (likely(!pci_dma_mapping_error(efx->pci_dev, st->unmap_addr))) {
+	st->unmap_addr = dma_map_single(&efx->pci_dev->dev, skb->data + hl,
+					len, DMA_TO_DEVICE);
+	if (likely(!dma_mapping_error(&efx->pci_dev->dev, st->unmap_addr))) {
 		st->unmap_single = true;
 		st->unmap_len = len;
 		st->in_len = len;
@@ -1008,7 +1007,7 @@ static int tso_fill_packet_with_fragment(struct efx_tx_queue *tx_queue,
 		buffer->continuation = !end_of_packet;
 
 		if (st->in_len == 0) {
-			/* Transfer ownership of the pci mapping */
+			/* Transfer ownership of the DMA mapping */
 			buffer->unmap_len = st->unmap_len;
 			buffer->unmap_single = st->unmap_single;
 			st->unmap_len = 0;
@@ -1181,18 +1180,18 @@ static int efx_enqueue_skb_tso(struct efx_tx_queue *tx_queue,
 
  mem_err:
 	netif_err(efx, tx_err, efx->net_dev,
-		  "Out of memory for TSO headers, or PCI mapping error\n");
+		  "Out of memory for TSO headers, or DMA mapping error\n");
 	dev_kfree_skb_any(skb);
 
  unwind:
 	/* Free the DMA mapping we were in the process of writing out */
 	if (state.unmap_len) {
 		if (state.unmap_single)
-			pci_unmap_single(efx->pci_dev, state.unmap_addr,
-					 state.unmap_len, PCI_DMA_TODEVICE);
+			dma_unmap_single(&efx->pci_dev->dev, state.unmap_addr,
+					 state.unmap_len, DMA_TO_DEVICE);
 		else
-			pci_unmap_page(efx->pci_dev, state.unmap_addr,
-				       state.unmap_len, PCI_DMA_TODEVICE);
+			dma_unmap_page(&efx->pci_dev->dev, state.unmap_addr,
+				       state.unmap_len, DMA_TO_DEVICE);
 	}
 
 	efx_enqueue_unwind(tx_queue);
@@ -1216,5 +1215,5 @@ static void efx_fini_tso(struct efx_tx_queue *tx_queue)
 
 	while (tx_queue->tso_headers_free != NULL)
 		efx_tsoh_block_free(tx_queue, tx_queue->tso_headers_free,
-				    tx_queue->efx->pci_dev);
+				    &tx_queue->efx->pci_dev->dev);
 }
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 04/11] sfc: Remove dead write to tso_state::packet_space
From: Ben Hutchings @ 2012-07-11 23:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

tso_state::packet_space is always set in tso_start_packet(); the
value set in tso_start() is not used, and is also incorrect.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/tx.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index 18860f2..cfa5f6d 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -926,7 +926,6 @@ static void tso_start(struct tso_state *st, const struct sk_buff *skb)
 	EFX_BUG_ON_PARANOID(tcp_hdr(skb)->syn);
 	EFX_BUG_ON_PARANOID(tcp_hdr(skb)->rst);
 
-	st->packet_space = st->full_packet_size;
 	st->out_len = skb->len - st->header_len;
 	st->unmap_len = 0;
 	st->unmap_single = false;
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 05/11] sfc: Stop changing header offsets on TX
From: Ben Hutchings @ 2012-07-11 23:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

There is nothing in the VLAN driver or core VLAN support that
invalidates the TCP and IP header offsets.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/tx.c |    9 ---------
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index cfa5f6d..9b225a7 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -651,17 +651,8 @@ static __be16 efx_tso_check_protocol(struct sk_buff *skb)
 	EFX_BUG_ON_PARANOID(((struct ethhdr *)skb->data)->h_proto !=
 			    protocol);
 	if (protocol == htons(ETH_P_8021Q)) {
-		/* Find the encapsulated protocol; reset network header
-		 * and transport header based on that. */
 		struct vlan_ethhdr *veh = (struct vlan_ethhdr *)skb->data;
 		protocol = veh->h_vlan_encapsulated_proto;
-		skb_set_network_header(skb, sizeof(*veh));
-		if (protocol == htons(ETH_P_IP))
-			skb_set_transport_header(skb, sizeof(*veh) +
-						 4 * ip_hdr(skb)->ihl);
-		else if (protocol == htons(ETH_P_IPV6))
-			skb_set_transport_header(skb, sizeof(*veh) +
-						 sizeof(struct ipv6hdr));
 	}
 
 	if (protocol == htons(ETH_P_IP)) {
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 06/11] sfc: Use strlcpy() to copy ethtool stats names
From: Ben Hutchings @ 2012-07-11 23:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

Fix CID 113703 in the Coverity report on Linux.

ethtool stats names are limited to 32 bytes including a null
terminator.  Use strlcpy() to ensure that we will always include the
null terminator even if a source string becomes longer than this.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/ethtool.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
index 03ded36..10536f9 100644
--- a/drivers/net/ethernet/sfc/ethtool.c
+++ b/drivers/net/ethernet/sfc/ethtool.c
@@ -453,7 +453,7 @@ static void efx_ethtool_get_strings(struct net_device *net_dev,
 	switch (string_set) {
 	case ETH_SS_STATS:
 		for (i = 0; i < EFX_ETHTOOL_NUM_STATS; i++)
-			strncpy(ethtool_strings[i].name,
+			strlcpy(ethtool_strings[i].name,
 				efx_ethtool_stats[i].name,
 				sizeof(ethtool_strings[i].name));
 		break;
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 07/11] sfc: Use dev_kfree_skb() in efx_end_loopback()
From: Ben Hutchings @ 2012-07-11 23:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

Fix CID 102619 in the Coverity report on Linux.

efx_end_loopback() iterates over an array of skb pointers of which
some may be null (if efx_begin_loopback() failed).  It should not use
dev_kfree_skb_irq(), which requires non-null pointers.  In practice
this is safe because it does not run in interrupt context and
therefore always ends up calling dev_kfree_skb(), which does allow
null pointers.  But we should make that explicit.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/selftest.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/sfc/selftest.c b/drivers/net/ethernet/sfc/selftest.c
index de4c006..ccc428f 100644
--- a/drivers/net/ethernet/sfc/selftest.c
+++ b/drivers/net/ethernet/sfc/selftest.c
@@ -488,7 +488,7 @@ static int efx_end_loopback(struct efx_tx_queue *tx_queue,
 		skb = state->skbs[i];
 		if (skb && !skb_shared(skb))
 			++tx_done;
-		dev_kfree_skb_any(skb);
+		dev_kfree_skb(skb);
 	}
 
 	netif_tx_unlock_bh(efx->net_dev);
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 08/11] sfc: Explain why efx_mcdi_exit_assertion() ignores result of efx_mcdi_rpc()
From: Ben Hutchings @ 2012-07-11 23:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

Fix CID 113952 in Coverity report on Linux.

This is the one instance where we don't, and shouldn't, check the
return code from efx_mcdi_rpc().  It wasn't immediately obvious to me
why we didn't, so I think an explanation is in order.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/mcdi.c |   11 ++++++++---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/sfc/mcdi.c b/drivers/net/ethernet/sfc/mcdi.c
index 17b6463..fc5e7bb 100644
--- a/drivers/net/ethernet/sfc/mcdi.c
+++ b/drivers/net/ethernet/sfc/mcdi.c
@@ -1001,12 +1001,17 @@ static void efx_mcdi_exit_assertion(struct efx_nic *efx)
 {
 	u8 inbuf[MC_CMD_REBOOT_IN_LEN];
 
-	/* Atomically reboot the mcfw out of the assertion handler */
+	/* If the MC is running debug firmware, it might now be
+	 * waiting for a debugger to attach, but we just want it to
+	 * reboot.  We set a flag that makes the command a no-op if it
+	 * has already done so.  We don't know what return code to
+	 * expect (0 or -EIO), so ignore it.
+	 */
 	BUILD_BUG_ON(MC_CMD_REBOOT_OUT_LEN != 0);
 	MCDI_SET_DWORD(inbuf, REBOOT_IN_FLAGS,
 		       MC_CMD_REBOOT_FLAGS_AFTER_ASSERTION);
-	efx_mcdi_rpc(efx, MC_CMD_REBOOT, inbuf, MC_CMD_REBOOT_IN_LEN,
-		     NULL, 0, NULL);
+	(void) efx_mcdi_rpc(efx, MC_CMD_REBOOT, inbuf, MC_CMD_REBOOT_IN_LEN,
+			    NULL, 0, NULL);
 }
 
 int efx_mcdi_handle_assertion(struct efx_nic *efx)
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 09/11] sfc: Disable VF queues during register self-test
From: Ben Hutchings @ 2012-07-11 23:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

Currently VF queues and drivers may remain active during this test.
This could cause memory corruption or spurious test failures.
Therefore we reset the port/function before running these tests on
Siena.

On Falcon this doesn't work: we have to do some additional
initialisation before some blocks will work again.  So refactor the
reset/register-test sequence into an efx_nic_type method so
efx_selftest() doesn't have to consider such quirks.

In the process, fix another minor bug: Siena does not have an
'invisible' reset and the self-test currently fails to push the PHY
configuration after resetting.  Passing RESET_TYPE_ALL to
efx_reset_{down,up}() fixes this.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/falcon.c     |   35 ++++++++++++++++--
 drivers/net/ethernet/sfc/net_driver.h |    7 +++-
 drivers/net/ethernet/sfc/nic.c        |    3 --
 drivers/net/ethernet/sfc/selftest.c   |   62 ++++++++-------------------------
 drivers/net/ethernet/sfc/siena.c      |   29 +++++++++++++--
 5 files changed, 76 insertions(+), 60 deletions(-)

diff --git a/drivers/net/ethernet/sfc/falcon.c b/drivers/net/ethernet/sfc/falcon.c
index 3a1ca2b..12b573a 100644
--- a/drivers/net/ethernet/sfc/falcon.c
+++ b/drivers/net/ethernet/sfc/falcon.c
@@ -25,9 +25,12 @@
 #include "io.h"
 #include "phy.h"
 #include "workarounds.h"
+#include "selftest.h"
 
 /* Hardware control for SFC4000 (aka Falcon). */
 
+static int falcon_reset_hw(struct efx_nic *efx, enum reset_type method);
+
 static const unsigned int
 /* "Large" EEPROM device: Atmel AT25640 or similar
  * 8 KB, 16-bit address, 32 B write block */
@@ -1034,10 +1037,34 @@ static const struct efx_nic_register_test falcon_b0_register_tests[] = {
 	  EFX_OWORD32(0x0003FF0F, 0x00000000, 0x00000000, 0x00000000) },
 };
 
-static int falcon_b0_test_registers(struct efx_nic *efx)
+static int
+falcon_b0_test_chip(struct efx_nic *efx, struct efx_self_tests *tests)
 {
-	return efx_nic_test_registers(efx, falcon_b0_register_tests,
-				      ARRAY_SIZE(falcon_b0_register_tests));
+	enum reset_type reset_method = RESET_TYPE_INVISIBLE;
+	int rc, rc2;
+
+	mutex_lock(&efx->mac_lock);
+	if (efx->loopback_modes) {
+		/* We need the 312 clock from the PHY to test the XMAC
+		 * registers, so move into XGMII loopback if available */
+		if (efx->loopback_modes & (1 << LOOPBACK_XGMII))
+			efx->loopback_mode = LOOPBACK_XGMII;
+		else
+			efx->loopback_mode = __ffs(efx->loopback_modes);
+	}
+	__efx_reconfigure_port(efx);
+	mutex_unlock(&efx->mac_lock);
+
+	efx_reset_down(efx, reset_method);
+
+	tests->registers =
+		efx_nic_test_registers(efx, falcon_b0_register_tests,
+				       ARRAY_SIZE(falcon_b0_register_tests))
+		? -1 : 1;
+
+	rc = falcon_reset_hw(efx, reset_method);
+	rc2 = efx_reset_up(efx, reset_method, rc == 0);
+	return rc ? rc : rc2;
 }
 
 /**************************************************************************
@@ -1818,7 +1845,7 @@ const struct efx_nic_type falcon_b0_nic_type = {
 	.get_wol = falcon_get_wol,
 	.set_wol = falcon_set_wol,
 	.resume_wol = efx_port_dummy_op_void,
-	.test_registers = falcon_b0_test_registers,
+	.test_chip = falcon_b0_test_chip,
 	.test_nvram = falcon_test_nvram,
 
 	.revision = EFX_REV_FALCON_B0,
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 8a9f6d4..55be2fd 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -68,6 +68,8 @@
 #define EFX_TXQ_TYPES		4
 #define EFX_MAX_TX_QUEUES	(EFX_TXQ_TYPES * EFX_MAX_CHANNELS)
 
+struct efx_self_tests;
+
 /**
  * struct efx_special_buffer - An Efx special buffer
  * @addr: CPU base address of the buffer
@@ -901,7 +903,8 @@ static inline unsigned int efx_port_num(struct efx_nic *efx)
  * @get_wol: Get WoL configuration from driver state
  * @set_wol: Push WoL configuration to the NIC
  * @resume_wol: Synchronise WoL state between driver and MC (e.g. after resume)
- * @test_registers: Test read/write functionality of control registers
+ * @test_chip: Test registers.  Should use efx_nic_test_registers(), and is
+ *	expected to reset the NIC.
  * @test_nvram: Test validity of NVRAM contents
  * @revision: Hardware architecture revision
  * @mem_map_size: Memory BAR mapped size
@@ -946,7 +949,7 @@ struct efx_nic_type {
 	void (*get_wol)(struct efx_nic *efx, struct ethtool_wolinfo *wol);
 	int (*set_wol)(struct efx_nic *efx, u32 type);
 	void (*resume_wol)(struct efx_nic *efx);
-	int (*test_registers)(struct efx_nic *efx);
+	int (*test_chip)(struct efx_nic *efx, struct efx_self_tests *tests);
 	int (*test_nvram)(struct efx_nic *efx);
 
 	int revision;
diff --git a/drivers/net/ethernet/sfc/nic.c b/drivers/net/ethernet/sfc/nic.c
index 287738d..326d799 100644
--- a/drivers/net/ethernet/sfc/nic.c
+++ b/drivers/net/ethernet/sfc/nic.c
@@ -124,9 +124,6 @@ int efx_nic_test_registers(struct efx_nic *efx,
 	unsigned address = 0, i, j;
 	efx_oword_t mask, imask, original, reg, buf;
 
-	/* Falcon should be in loopback to isolate the XMAC from the PHY */
-	WARN_ON(!LOOPBACK_INTERNAL(efx));
-
 	for (i = 0; i < n_regs; ++i) {
 		address = regs[i].address;
 		mask = imask = regs[i].mask;
diff --git a/drivers/net/ethernet/sfc/selftest.c b/drivers/net/ethernet/sfc/selftest.c
index ccc428f..96068d1 100644
--- a/drivers/net/ethernet/sfc/selftest.c
+++ b/drivers/net/ethernet/sfc/selftest.c
@@ -120,19 +120,6 @@ static int efx_test_nvram(struct efx_nic *efx, struct efx_self_tests *tests)
 	return rc;
 }
 
-static int efx_test_chip(struct efx_nic *efx, struct efx_self_tests *tests)
-{
-	int rc = 0;
-
-	/* Test register access */
-	if (efx->type->test_registers) {
-		rc = efx->type->test_registers(efx);
-		tests->registers = rc ? -1 : 1;
-	}
-
-	return rc;
-}
-
 /**************************************************************************
  *
  * Interrupt and event queue testing
@@ -699,8 +686,7 @@ int efx_selftest(struct efx_nic *efx, struct efx_self_tests *tests,
 {
 	enum efx_loopback_mode loopback_mode = efx->loopback_mode;
 	int phy_mode = efx->phy_mode;
-	enum reset_type reset_method = RESET_TYPE_INVISIBLE;
-	int rc_test = 0, rc_reset = 0, rc;
+	int rc_test = 0, rc_reset, rc;
 
 	efx_selftest_async_cancel(efx);
 
@@ -737,44 +723,26 @@ int efx_selftest(struct efx_nic *efx, struct efx_self_tests *tests,
 	 */
 	netif_device_detach(efx->net_dev);
 
-	mutex_lock(&efx->mac_lock);
-	if (efx->loopback_modes) {
-		/* We need the 312 clock from the PHY to test the XMAC
-		 * registers, so move into XGMII loopback if available */
-		if (efx->loopback_modes & (1 << LOOPBACK_XGMII))
-			efx->loopback_mode = LOOPBACK_XGMII;
-		else
-			efx->loopback_mode = __ffs(efx->loopback_modes);
-	}
-
-	__efx_reconfigure_port(efx);
-	mutex_unlock(&efx->mac_lock);
-
-	/* free up all consumers of SRAM (including all the queues) */
-	efx_reset_down(efx, reset_method);
-
-	rc = efx_test_chip(efx, tests);
-	if (rc && !rc_test)
-		rc_test = rc;
+	if (efx->type->test_chip) {
+		rc_reset = efx->type->test_chip(efx, tests);
+		if (rc_reset) {
+			netif_err(efx, hw, efx->net_dev,
+				  "Unable to recover from chip test\n");
+			efx_schedule_reset(efx, RESET_TYPE_DISABLE);
+			return rc_reset;
+		}
 
-	/* reset the chip to recover from the register test */
-	rc_reset = efx->type->reset(efx, reset_method);
+		if ((tests->registers < 0) && !rc_test)
+			rc_test = -EIO;
+	}
 
 	/* Ensure that the phy is powered and out of loopback
 	 * for the bist and loopback tests */
+	mutex_lock(&efx->mac_lock);
 	efx->phy_mode &= ~PHY_MODE_LOW_POWER;
 	efx->loopback_mode = LOOPBACK_NONE;
-
-	rc = efx_reset_up(efx, reset_method, rc_reset == 0);
-	if (rc && !rc_reset)
-		rc_reset = rc;
-
-	if (rc_reset) {
-		netif_err(efx, drv, efx->net_dev,
-			  "Unable to recover from chip test\n");
-		efx_schedule_reset(efx, RESET_TYPE_DISABLE);
-		return rc_reset;
-	}
+	__efx_reconfigure_port(efx);
+	mutex_unlock(&efx->mac_lock);
 
 	rc = efx_test_phy(efx, tests, flags);
 	if (rc && !rc_test)
diff --git a/drivers/net/ethernet/sfc/siena.c b/drivers/net/ethernet/sfc/siena.c
index 9f8d7ce..2354886 100644
--- a/drivers/net/ethernet/sfc/siena.c
+++ b/drivers/net/ethernet/sfc/siena.c
@@ -25,10 +25,12 @@
 #include "workarounds.h"
 #include "mcdi.h"
 #include "mcdi_pcol.h"
+#include "selftest.h"
 
 /* Hardware control for SFC9000 family including SFL9021 (aka Siena). */
 
 static void siena_init_wol(struct efx_nic *efx);
+static int siena_reset_hw(struct efx_nic *efx, enum reset_type method);
 

 static void siena_push_irq_moderation(struct efx_channel *channel)
@@ -154,10 +156,29 @@ static const struct efx_nic_register_test siena_register_tests[] = {
 	  EFX_OWORD32(0xFFFFFFFF, 0xFFFFFFFF, 0x00000007, 0x00000000) },
 };
 
-static int siena_test_registers(struct efx_nic *efx)
+static int siena_test_chip(struct efx_nic *efx, struct efx_self_tests *tests)
 {
-	return efx_nic_test_registers(efx, siena_register_tests,
-				      ARRAY_SIZE(siena_register_tests));
+	enum reset_type reset_method = reset_method;
+	int rc, rc2;
+
+	efx_reset_down(efx, reset_method);
+
+	/* Reset the chip immediately so that it is completely
+	 * quiescent regardless of what any VF driver does.
+	 */
+	rc = siena_reset_hw(efx, reset_method);
+	if (rc)
+		goto out;
+
+	tests->registers =
+		efx_nic_test_registers(efx, siena_register_tests,
+				       ARRAY_SIZE(siena_register_tests))
+		? -1 : 1;
+
+	rc = siena_reset_hw(efx, reset_method);
+out:
+	rc2 = efx_reset_up(efx, reset_method, rc == 0);
+	return rc ? rc : rc2;
 }
 
 /**************************************************************************
@@ -649,7 +670,7 @@ const struct efx_nic_type siena_a0_nic_type = {
 	.get_wol = siena_get_wol,
 	.set_wol = siena_set_wol,
 	.resume_wol = siena_init_wol,
-	.test_registers = siena_test_registers,
+	.test_chip = siena_test_chip,
 	.test_nvram = efx_mcdi_nvram_test_all,
 
 	.revision = EFX_REV_SIENA_A0,
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 10/11] sfc: Fix interface statistics running backward
From: Ben Hutchings @ 2012-07-11 23:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

Some interface statistics are computed in such a way that they can
sometimes decrease (and even underflow).  Since the computed value
will never be greater than the true value, we fix this by only storing
the computed value when it increases.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/falcon_xmac.c |   12 ++++++------
 drivers/net/ethernet/sfc/nic.h         |   18 ++++++++++++++++++
 drivers/net/ethernet/sfc/siena.c       |    8 ++++----
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/sfc/falcon_xmac.c b/drivers/net/ethernet/sfc/falcon_xmac.c
index 6106ef1..8333865 100644
--- a/drivers/net/ethernet/sfc/falcon_xmac.c
+++ b/drivers/net/ethernet/sfc/falcon_xmac.c
@@ -341,12 +341,12 @@ void falcon_update_stats_xmac(struct efx_nic *efx)
 	FALCON_STAT(efx, XgTxIpSrcErrPkt, tx_ip_src_error);
 
 	/* Update derived statistics */
-	mac_stats->tx_good_bytes =
-		(mac_stats->tx_bytes - mac_stats->tx_bad_bytes -
-		 mac_stats->tx_control * 64);
-	mac_stats->rx_bad_bytes =
-		(mac_stats->rx_bytes - mac_stats->rx_good_bytes -
-		 mac_stats->rx_control * 64);
+	efx_update_diff_stat(&mac_stats->tx_good_bytes,
+			     mac_stats->tx_bytes - mac_stats->tx_bad_bytes -
+			     mac_stats->tx_control * 64);
+	efx_update_diff_stat(&mac_stats->rx_bad_bytes,
+			     mac_stats->rx_bytes - mac_stats->rx_good_bytes -
+			     mac_stats->rx_control * 64);
 }
 
 void falcon_poll_xmac(struct efx_nic *efx)
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index f48ccf6..bab5cd9 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -294,6 +294,24 @@ extern bool falcon_xmac_check_fault(struct efx_nic *efx);
 extern int falcon_reconfigure_xmac(struct efx_nic *efx);
 extern void falcon_update_stats_xmac(struct efx_nic *efx);
 
+/* Some statistics are computed as A - B where A and B each increase
+ * linearly with some hardware counter(s) and the counters are read
+ * asynchronously.  If the counters contributing to B are always read
+ * after those contributing to A, the computed value may be lower than
+ * the true value by some variable amount, and may decrease between
+ * subsequent computations.
+ *
+ * We should never allow statistics to decrease or to exceed the true
+ * value.  Since the computed value will never be greater than the
+ * true value, we can achieve this by only storing the computed value
+ * when it increases.
+ */
+static inline void efx_update_diff_stat(u64 *stat, u64 diff)
+{
+	if ((s64)(diff - *stat) > 0)
+		*stat = diff;
+}
+
 /* Interrupts and test events */
 extern int efx_nic_init_interrupt(struct efx_nic *efx);
 extern void efx_nic_enable_interrupts(struct efx_nic *efx);
diff --git a/drivers/net/ethernet/sfc/siena.c b/drivers/net/ethernet/sfc/siena.c
index 2354886..6bafd21 100644
--- a/drivers/net/ethernet/sfc/siena.c
+++ b/drivers/net/ethernet/sfc/siena.c
@@ -458,8 +458,8 @@ static int siena_try_update_nic_stats(struct efx_nic *efx)
 
 	MAC_STAT(tx_bytes, TX_BYTES);
 	MAC_STAT(tx_bad_bytes, TX_BAD_BYTES);
-	mac_stats->tx_good_bytes = (mac_stats->tx_bytes -
-				    mac_stats->tx_bad_bytes);
+	efx_update_diff_stat(&mac_stats->tx_good_bytes,
+			     mac_stats->tx_bytes - mac_stats->tx_bad_bytes);
 	MAC_STAT(tx_packets, TX_PKTS);
 	MAC_STAT(tx_bad, TX_BAD_FCS_PKTS);
 	MAC_STAT(tx_pause, TX_PAUSE_PKTS);
@@ -492,8 +492,8 @@ static int siena_try_update_nic_stats(struct efx_nic *efx)
 	MAC_STAT(tx_ip_src_error, TX_IP_SRC_ERR_PKTS);
 	MAC_STAT(rx_bytes, RX_BYTES);
 	MAC_STAT(rx_bad_bytes, RX_BAD_BYTES);
-	mac_stats->rx_good_bytes = (mac_stats->rx_bytes -
-				    mac_stats->rx_bad_bytes);
+	efx_update_diff_stat(&mac_stats->rx_good_bytes,
+			     mac_stats->rx_bytes - mac_stats->rx_bad_bytes);
 	MAC_STAT(rx_packets, RX_PKTS);
 	MAC_STAT(rx_good, RX_GOOD_PKTS);
 	MAC_STAT(rx_bad, RX_BAD_FCS_PKTS);
-- 
1.7.7.6



-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* [PATCH net-next 11/11] sfc: Correct some comments on enum reset_type
From: Ben Hutchings @ 2012-07-11 23:19 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-net-drivers
In-Reply-To: <1342048389.2613.59.camel@bwh-desktop.uk.solarflarecom.com>

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
 drivers/net/ethernet/sfc/enum.h |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/sfc/enum.h b/drivers/net/ethernet/sfc/enum.h
index d725a8f..182dbe2 100644
--- a/drivers/net/ethernet/sfc/enum.h
+++ b/drivers/net/ethernet/sfc/enum.h
@@ -136,10 +136,10 @@ enum efx_loopback_mode {
  *
  * Reset methods are numbered in order of increasing scope.
  *
- * @RESET_TYPE_INVISIBLE: don't reset the PHYs or interrupts
- * @RESET_TYPE_ALL: reset everything but PCI core blocks
- * @RESET_TYPE_WORLD: reset everything, save & restore PCI config
- * @RESET_TYPE_DISABLE: disable NIC
+ * @RESET_TYPE_INVISIBLE: Reset datapath and MAC (Falcon only)
+ * @RESET_TYPE_ALL: Reset datapath, MAC and PHY
+ * @RESET_TYPE_WORLD: Reset as much as possible
+ * @RESET_TYPE_DISABLE: Reset datapath, MAC and PHY; leave NIC disabled
  * @RESET_TYPE_TX_WATCHDOG: reset due to TX watchdog
  * @RESET_TYPE_INT_ERROR: reset due to internal error
  * @RESET_TYPE_RX_RECOVERY: reset to recover from RX datapath errors
-- 
1.7.7.6


-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply related

* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-11 23:38 UTC (permalink / raw)
  To: Rick Jones; +Cc: nanditad, netdev, mattmathis, codel, ncardwell, David Miller
In-Reply-To: <4FFDC48D.2030606@hp.com>

On Wed, 2012-07-11 at 11:23 -0700, Rick Jones wrote:
> On 07/11/2012 08:11 AM, Eric Dumazet wrote:
> >
> >
> > Tests using a single TCP flow.
> >
> > Tests on 10Gbit links :
> >
> >
> > echo 16384 >/proc/sys/net/ipv4/tcp_limit_output_bytes
> > OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
> > tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
> > tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 16 tpci_snd_cwnd 79
> > tcpi_reordering 53 tcpi_total_retrans 0
> 
> I take it you hacked your local copy of netperf to emit those?  Or did I 
> leave some cruft behind in something I committed to the repository?
> 
Yep, its a netperf-2.5.0 with a one line change to output these TCP_INFO
bits

> What was the ultimate limiter on throughput?  I notice it didn't achieve 
> link-rate on either 10 GbE nor 1 GbE.
> 

My lab has one fast machine (source in this 10Gb test), and one slow
machine (Intel Q6600 quad core), both with ixgbe cards.

On Gigabit test, the receiver is a laptop.


>  > Thats the plan : limiting numer of bytes in Qdisc, not number of bytes
>  > in socket write queue.
> 
> So the SO_SNDBUF can still grow rather larger than necessary?  It is 
> just that TCP will be nice to the other flows by not dumping all of it 
> into the qdisc at once.  Latency seen by the application itself is then 
> unchanged since there will still be (potentially) as much stuff queued 
> in the SO_SNDBUF as before right?

Of course SO_SNDBUF can grow if autotuning is enabled.

I think there is a bit of misunderstanding about this patch and what it
does.

It only makes sure the amount of packets (from socket write queue) are
cloned in qdisc/device queue in a limited way, not "as much as allowed"

^ permalink raw reply

* Re: [PATCH v3 net-next] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-11 23:47 UTC (permalink / raw)
  To: Andi Kleen; +Cc: nanditad, netdev, codel, mattmathis, ncardwell, David Miller
In-Reply-To: <m2r4sirvmg.fsf@firstfloor.org>

On Wed, 2012-07-11 at 11:38 -0700, Andi Kleen wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
> > +
> > +		if (!sock_owned_by_user(sk)) {
> > +			if ((1 << sk->sk_state) &
> > +			    (TCPF_ESTABLISHED | TCPF_FIN_WAIT1 |
> > +			     TCPF_CLOSING | TCPF_CLOSE_WAIT))
> > +				tcp_write_xmit(sk,
> > +					       tcp_current_mss(sk),
> > +					       0, 0,
> > +					       GFP_ATOMIC);
> 
> Did you have any problems with the GFP_ATOMIC allocation failing here?
> I think you move some skb allocs from process context to ATOMIC, right?
> 
> It relies on the VM somehow catching up in another context.
> 
> May be interesting to try out under very high memory pressure.

AFAIK this patch has no effect on memory allocations, because :

At write() time, we use GFP_KERNEL allocations to build the socket write
queue.

And each skb is "precloned", that is allocated from skbuff_fclone_cache.

Then, when we select some skbs from write for transmission, we clone
them. And this cloning doesnt allocate memory, because we use the fast
cloning.

There are some pathological cases were we do allocate memory, but TSQ
should lower probabilities :

1) when we skb_clone(skb), we might need an allocation if the previous
clone is still in flight. This is a retransmit case, and TSQ only takes
care of transmits (of previously unsent skbs : their fast clone is
available)

2) When we need to split the skb because not enough cwnd is available.
   As TSQ tends to limit number of skbs in qdisc, we lower the
probabilities of such splits. Anyway a TSO split only need an sk_buff,
this is not a lot of memory.

3) On bulk sends, most transmits are triggered by incoming ACKS, in
softirq context, so already using GFP_ATOMIC "allocations", if
ever needed (see point 1)

Thanks

^ permalink raw reply

* Re: [RFC PATCH v2] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-11 23:49 UTC (permalink / raw)
  To: Rick Jones; +Cc: nanditad, netdev, mattmathis, codel, ncardwell, David Miller
In-Reply-To: <4FFDC985.6050805@hp.com>

On Wed, 2012-07-11 at 11:44 -0700, Rick Jones wrote:
> On 07/11/2012 08:11 AM, Eric Dumazet wrote:
> >
> > Some bench results about the choice of 128KB being the default value:
> 
> What were the starting/baseline figures?
> 
> >
> > Tests using a single TCP flow.
> >
> > Tests on 10Gbit links :
> >
> >
> > echo 16384 >/proc/sys/net/ipv4/tcp_limit_output_bytes
> > OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.99.2 (192.168.99.2) port 0 AF_INET
> > tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
> > tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 16 tpci_snd_cwnd 79
> > tcpi_reordering 53 tcpi_total_retrans 0
> > Local       Local       Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
> > Send Socket Send Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
> > Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
> > Final       Final                                             %     Method %      Method
> > 392360      392360      16384  20.00   1389.53    10^6bits/s  0.52  S      4.30   S      0.737   1.014   usec/KB
> 
> By the way, that double reporting of the local socket send size is fixed in:
> 
> ------------------------------------------------------------------------
> r516 | raj | 2012-01-05 15:48:52 -0800 (Thu, 05 Jan 2012) | 1 line
> 
> report the rsr_size_end in an omni stream test rather than a copy of the 
> lss_size_end
> 
> of netperf and later.  Also, any idea why the local socket send size got 
> so much larger with 1GbE than 10 GbE at that setting of 
> tcp_limit_output_bytes?

The 10Gb receiver is a net-next kernel, but the 1Gb receiver is a 2.6.38
ubuntu kernel. They probably have very different TCP behavior.

^ permalink raw reply

* Re: [PATCH v3 net-next] tcp: TCP Small Queues
From: Eric Dumazet @ 2012-07-12  0:00 UTC (permalink / raw)
  To: Nandita Dukkipati; +Cc: netdev, codel, ncardwell, David Miller, mattmathis
In-Reply-To: <CAB_+Fg496FgQZkJE8uf_4RX_wW+fz1p5Rq8rY=BuA=c6KPakpA@mail.gmail.com>

Note : netdev mailing list doesnt accept HTML ;)

On Wed, 2012-07-11 at 12:29 -0700, Nandita Dukkipati wrote:
> I have a couple of high level questions on the two solutions for
> maintain small queues at qdisc: 1) using codel's feedback versus 2)
> the approach in TSQ to explicitly limit bytes per-flow
> to tcp_limit_output_bytes.
> 
> 
> 1. multiple flows case: codel feedback (be it drop/ECN or direct
> feedback like in your patch with fq_codel: report congestion
> notification at enqueue time) seems to work in maintaining small
> queues regardless of the number of connections. TSQ probably won't
> achieve this goal when number of flows is large. Is this correct?


Problem with codel/fq_codel is that the congestion is delayed.

And my patch (signaling congestion to local tcp senders) raised several
issues.

I had then TSQ idea to have a more general mechanism (not depending on a
particular qdisc setup)

TCP Small Queue is effective even with a large number of flows because
with standard pfifo_fast, we can without TCP Small Queue store 1000
packets in Qdisc, each being 64KB. Thats 64 MBytes...

With TCP Small Queue, you can reach this level of occupancy using 500
flows (2 packets per flow)

Without TCP Small Queue, you can reach this level using 16 flows.

(This of course assumes TSO is on, but modern NIC are TSO enabled)

> 2. cwnd adjustment: codel feedback directly and explicitly controls
> snd_cwnd which in turn will also adjust the send buffer size to
> appropriate value. That's a nice property, which TSQ probably won't
> have because it doesn't explicitly adjust snd_cwnd. 
> 
> Considering these two points, why TSQ over the Codel feedback
> approach?

I dont think they compete. They are in fact complementary.

If you use codel/fq_codel + TSQ, you have both per flow limitation in
qdisc (TSQ) + sojourn time aware and multi flow aware feedback.

^ permalink raw reply

* [PATCH 1/2] tcp: Fix out of bounds access to tcpm_vals
From: Alexander Duyck @ 2012-07-12  0:18 UTC (permalink / raw)
  To: netdev; +Cc: davem, jeffrey.t.kirsher, alexander.duyck, Alexander Duyck

The recent patch "tcp: Maintain dynamic metrics in local cache." introduced
an out of bounds access due to what appears to be a typo.   I believe this
change should resolve the issue by replacing the access to RTAX_CWND with
TCP_METRIC_CWND.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

 net/ipv4/tcp_metrics.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_metrics.c b/net/ipv4/tcp_metrics.c
index 1fd83d3..5a38a2d 100644
--- a/net/ipv4/tcp_metrics.c
+++ b/net/ipv4/tcp_metrics.c
@@ -412,7 +412,7 @@ void tcp_update_metrics(struct sock *sk)
 				       max(tp->snd_cwnd >> 1, tp->snd_ssthresh));
 		if (!tcp_metric_locked(tm, TCP_METRIC_CWND)) {
 			val = tcp_metric_get(tm, TCP_METRIC_CWND);
-			tcp_metric_set(tm, RTAX_CWND, (val + tp->snd_cwnd) >> 1);
+			tcp_metric_set(tm, TCP_METRIC_CWND, (val + tp->snd_cwnd) >> 1);
 		}
 	} else {
 		/* Else slow start did not finish, cwnd is non-sense,

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox