Netdev List
* [PATCH RESENT net-next] net: remove function sk_reset_txq()
From: ZHAO Gang @ 2013-10-22  8:23 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

All sk_reset_txq() does is call sk_tx_queue_clear(), and
sk_reset_txq() is used only in sock.h, by dst_negative_advice().
Let dst_negative_advice() call sk_tx_queue_clear() directly so we
can remove the unneeded sk_reset_txq().

Signed-off-by: ZHAO Gang <gamerh2o@gmail.com>
---
Hope this time I don't mess it up. Sorry for the inconvenience.
---
 include/net/sock.h | 4 +---
 net/core/sock.c    | 6 ------
 2 files changed, 1 insertion(+), 9 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 86bb066..c93542f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1746,8 +1746,6 @@ sk_dst_get(struct sock *sk)
 	return dst;
 }
 
-void sk_reset_txq(struct sock *sk);
-
 static inline void dst_negative_advice(struct sock *sk)
 {
 	struct dst_entry *ndst, *dst = __sk_dst_get(sk);
@@ -1757,7 +1755,7 @@ static inline void dst_negative_advice(struct sock *sk)
 
 		if (ndst != dst) {
 			rcu_assign_pointer(sk->sk_dst_cache, ndst);
-			sk_reset_txq(sk);
+			sk_tx_queue_clear(sk);
 		}
 	}
 }
diff --git a/net/core/sock.c b/net/core/sock.c
index 440afdc..ab20ed9 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -475,12 +475,6 @@ discard_and_relse:
 }
 EXPORT_SYMBOL(sk_receive_skb);
 
-void sk_reset_txq(struct sock *sk)
-{
-	sk_tx_queue_clear(sk);
-}
-EXPORT_SYMBOL(sk_reset_txq);
-
 struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie)
 {
 	struct dst_entry *dst = __sk_dst_get(sk);
-- 
1.8.3.1


* Re: [virtio-net] BUG: sleeping function called from invalid context at kernel/mutex.c:616
From: Jason Wang @ 2013-10-22  8:35 UTC (permalink / raw)
  To: Fengguang Wu; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <20131020023418.GA6737@localhost>

[-- Attachment #1: Type: text/plain, Size: 3979 bytes --]

On 10/20/2013 10:34 AM, Fengguang Wu wrote:
> Greetings,
>
> I got the below dmesg and the first bad commit is
>
> commit 3ab098df35f8b98b6553edc2e40234af512ba877
> Author: Jason Wang <jasowang@redhat.com>
> Date:   Tue Oct 15 11:18:58 2013 +0800
>
>     virtio-net: don't respond to cpu hotplug notifier if we're not ready
>     
>     We're trying to re-configure the affinity unconditionally in cpu hotplug
>     callback. This may lead the issue during resuming from s3/s4 since
>     
>     - virt queues haven't been allocated at that time.
>     - it's unnecessary since thaw method will re-configure the affinity.
>     
>     Fix this issue by checking the config_enable and do nothing is we're not ready.
>     
>     The bug were introduced by commit 8de4b2f3ae90c8fc0f17eeaab87d5a951b66ee17
>     (virtio-net: reset virtqueue affinity when doing cpu hotplug).
>     
>     Cc: Rusty Russell <rusty@rustcorp.com.au>
>     Cc: Michael S. Tsirkin <mst@redhat.com>
>     Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>     Acked-by: Michael S. Tsirkin <mst@redhat.com>
>     Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
>     Signed-off-by: Jason Wang <jasowang@redhat.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> [  622.944441] CPU0 attaching NULL sched-domain.
> [  622.944446] CPU1 attaching NULL sched-domain.
> [  622.944485] CPU0 attaching NULL sched-domain.
> [  622.950795] BUG: sleeping function called from invalid context at kernel/mutex.c:616
> [  622.950796] in_atomic(): 1, irqs_disabled(): 1, pid: 10, name: migration/1
> [  622.950796] no locks held by migration/1/10.
> [  622.950798] CPU: 1 PID: 10 Comm: migration/1 Not tainted 3.12.0-rc5-wl-01249-gb91e82d #317
> [  622.950799] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [  622.950802]  0000000000000000 ffff88001d42dba0 ffffffff81a32f22 ffff88001bfb9c70
> [  622.950803]  ffff88001d42dbb0 ffffffff810edb02 ffff88001d42dc38 ffffffff81a396ed
> [  622.950805]  0000000000000046 ffff88001d42dbe8 ffffffff810e861d 0000000000000000
> [  622.950805] Call Trace:
> [  622.950810]  [<ffffffff81a32f22>] dump_stack+0x54/0x74
> [  622.950815]  [<ffffffff810edb02>] __might_sleep+0x112/0x114
> [  622.950817]  [<ffffffff81a396ed>] mutex_lock_nested+0x3c/0x3c6
> [  622.950818]  [<ffffffff810e861d>] ? up+0x39/0x3e
> [  622.950821]  [<ffffffff8153ea7c>] ? acpi_os_signal_semaphore+0x21/0x2d
> [  622.950824]  [<ffffffff81565ed1>] ? acpi_ut_release_mutex+0x5e/0x62
> [  622.950828]  [<ffffffff816d04ec>] virtnet_cpu_callback+0x33/0x87
> [  622.950830]  [<ffffffff81a42576>] notifier_call_chain+0x3c/0x5e
> [  622.950832]  [<ffffffff810e86a8>] __raw_notifier_call_chain+0xe/0x10
> [  622.950835]  [<ffffffff810c5556>] __cpu_notify+0x20/0x37
> [  622.950836]  [<ffffffff810c5580>] cpu_notify+0x13/0x15
> [  622.950838]  [<ffffffff81a237cd>] take_cpu_down+0x27/0x3a
> [  622.950841]  [<ffffffff81136289>] stop_machine_cpu_stop+0x93/0xf1
> [  622.950842]  [<ffffffff81136167>] cpu_stopper_thread+0xa0/0x12f
> [  622.950844]  [<ffffffff811361f6>] ? cpu_stopper_thread+0x12f/0x12f
> [  622.950847]  [<ffffffff81119710>] ? lock_release_holdtime.part.7+0xa3/0xa8
> [  622.950848]  [<ffffffff81135e4b>] ? cpu_stop_should_run+0x3f/0x47
> [  622.950850]  [<ffffffff810ea9b0>] smpboot_thread_fn+0x1c5/0x1e3
> [  622.950852]  [<ffffffff810ea7eb>] ? lg_global_unlock+0x67/0x67
> [  622.950854]  [<ffffffff810e36b7>] kthread+0xd8/0xe0
> [  622.950857]  [<ffffffff81a3bfad>] ? wait_for_common+0x12f/0x164
> [  622.950859]  [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124
> [  622.950861]  [<ffffffff81a45ffc>] ret_from_fork+0x7c/0xb0
> [  622.950862]  [<ffffffff810e35df>] ? kthread_create_on_node+0x124/0x124
> [  622.950876] smpboot: CPU 1 is now offline
> [  623.194556] SMP alternatives: lockdep: fixing up alternatives
> [  623.194559] smpboot: Booting Node 0 Processor 1 APIC 0x1
 
Thanks for testing, Fengguang. Could you please try the attached
patch to see if it works?

[-- Attachment #2: 0001-virtio-net-fix.patch --]
[-- Type: text/x-patch, Size: 1464 bytes --]

From 01e6c3f71c202aa02e4feda169e7cc9fb24193f5 Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Mon, 21 Oct 2013 20:39:09 +0800
Subject: [PATCH] virtio-net: fix

---
 drivers/net/virtio_net.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9fbdfcd..bbc9cb8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1118,11 +1118,6 @@ static int virtnet_cpu_callback(struct notifier_block *nfb,
 {
 	struct virtnet_info *vi = container_of(nfb, struct virtnet_info, nb);
 
-	mutex_lock(&vi->config_lock);
-
-	if (!vi->config_enable)
-		goto done;
-
 	switch(action & ~CPU_TASKS_FROZEN) {
 	case CPU_ONLINE:
 	case CPU_DOWN_FAILED:
@@ -1136,8 +1131,6 @@ static int virtnet_cpu_callback(struct notifier_block *nfb,
 		break;
 	}
 
-done:
-	mutex_unlock(&vi->config_lock);
 	return NOTIFY_OK;
 }
 
@@ -1699,6 +1692,8 @@ static int virtnet_freeze(struct virtio_device *vdev)
 	struct virtnet_info *vi = vdev->priv;
 	int i;
 
+	unregister_hotcpu_notifier(&vi->nb);
+
 	/* Prevent config work handler from accessing the device */
 	mutex_lock(&vi->config_lock);
 	vi->config_enable = false;
@@ -1747,6 +1742,10 @@ static int virtnet_restore(struct virtio_device *vdev)
 	virtnet_set_queues(vi, vi->curr_queue_pairs);
 	rtnl_unlock();
 
+	err = register_hotcpu_notifier(&vi->nb);
+	if (err)
+		return err;
+
 	return 0;
 }
 #endif
-- 
1.8.1.2
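
The approach taken in the patch above (no lock in the callback, and the
notifier unregistered for as long as the device is frozen) can be modelled
in plain userspace C. This is only a sketch under stated assumptions:
struct notifier, struct chain, chain_register(), chain_unregister(),
chain_call() and struct fake_dev are hypothetical stand-ins, not the
kernel's notifier API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a notifier block. */
struct notifier {
	void (*call)(struct notifier *nb, int event);
	struct notifier *next;
};

/* Hypothetical singly linked notifier chain. */
struct chain {
	struct notifier *head;
};

static void chain_register(struct chain *c, struct notifier *nb)
{
	assert(nb->call != NULL);
	nb->next = c->head;
	c->head = nb;
}

static void chain_unregister(struct chain *c, struct notifier *nb)
{
	struct notifier **pp;

	for (pp = &c->head; *pp; pp = &(*pp)->next) {
		if (*pp == nb) {
			*pp = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Invoke every registered callback; callbacks must not sleep or block. */
static void chain_call(struct chain *c, int event)
{
	struct notifier *nb;

	for (nb = c->head; nb; nb = nb->next)
		nb->call(nb, event);
}

/* Hypothetical device that counts how often it re-applies affinity. */
struct fake_dev {
	struct notifier nb;
	int affinity_updates;
};

/* Lock-free callback: it is only reachable while the device is registered. */
static void fake_dev_cpu_callback(struct notifier *nb, int event)
{
	struct fake_dev *dev =
		(struct fake_dev *)((char *)nb - offsetof(struct fake_dev, nb));

	(void)event;
	dev->affinity_updates++;
}

static void fake_dev_init(struct chain *c, struct fake_dev *dev)
{
	dev->nb.call = fake_dev_cpu_callback;
	dev->nb.next = NULL;
	dev->affinity_updates = 0;
	chain_register(c, &dev->nb);
}

/* Freeze: stop receiving events instead of checking a flag under a mutex. */
static void fake_dev_freeze(struct chain *c, struct fake_dev *dev)
{
	chain_unregister(c, &dev->nb);
}

/* Restore: start receiving events again once the device is ready. */
static void fake_dev_restore(struct chain *c, struct fake_dev *dev)
{
	chain_register(c, &dev->nb);
}
```

In this model an event delivered between fake_dev_freeze() and
fake_dev_restore() never reaches the callback, so the callback does not
need a sleeping lock to decide whether the device is ready.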




* [PATCH RESEND] packet: Deliver VLAN TPID to userspace
From: Atzm Watanabe @ 2013-10-22  8:39 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, Ben Hutchings

Since 802.1AD support was added, userspace packet receivers
(packet dumpers, software switches, and the like) need a way to know
the VLAN TPID in order to reassemble the original tagged frame.

Signed-off-by: Atzm Watanabe <atzm@stratosphere.co.jp>
---
struct tpacket_hdr_variant1 looks like it is allowed to grow,
since its length combined with struct tpacket3_hdr is explicit at
run time.

 include/uapi/linux/if_packet.h | 5 +++--
 net/packet/af_packet.c         | 8 ++++++--
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/if_packet.h b/include/uapi/linux/if_packet.h
index dbf0666..6e36e0a 100644
--- a/include/uapi/linux/if_packet.h
+++ b/include/uapi/linux/if_packet.h
@@ -83,7 +83,7 @@ struct tpacket_auxdata {
 	__u16		tp_mac;
 	__u16		tp_net;
 	__u16		tp_vlan_tci;
-	__u16		tp_padding;
+	__u16		tp_vlan_tpid;
 };
 
 /* Rx ring - header status */
@@ -132,12 +132,13 @@ struct tpacket2_hdr {
 	__u32		tp_sec;
 	__u32		tp_nsec;
 	__u16		tp_vlan_tci;
-	__u16		tp_padding;
+	__u16		tp_vlan_tpid;
 };
 
 struct tpacket_hdr_variant1 {
 	__u32	tp_rxhash;
 	__u32	tp_vlan_tci;
+	__u32	tp_vlan_tpid;
 };
 
 struct tpacket3_hdr {
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 2e8286b..fbcc882 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -895,9 +895,11 @@ static void prb_fill_vlan_info(struct tpacket_kbdq_core *pkc,
 {
 	if (vlan_tx_tag_present(pkc->skb)) {
 		ppd->hv1.tp_vlan_tci = vlan_tx_tag_get(pkc->skb);
+		ppd->hv1.tp_vlan_tpid = (__force __u32)ntohs(pkc->skb->vlan_proto);
 		ppd->tp_status = TP_STATUS_VLAN_VALID;
 	} else {
 		ppd->hv1.tp_vlan_tci = 0;
+		ppd->hv1.tp_vlan_tpid = 0;
 		ppd->tp_status = TP_STATUS_AVAILABLE;
 	}
 }
@@ -1836,11 +1838,12 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
 		h.h2->tp_nsec = ts.tv_nsec;
 		if (vlan_tx_tag_present(skb)) {
 			h.h2->tp_vlan_tci = vlan_tx_tag_get(skb);
+			h.h2->tp_vlan_tpid = ntohs(skb->vlan_proto);
 			status |= TP_STATUS_VLAN_VALID;
 		} else {
 			h.h2->tp_vlan_tci = 0;
+			h.h2->tp_vlan_tpid = 0;
 		}
-		h.h2->tp_padding = 0;
 		hdrlen = sizeof(*h.h2);
 		break;
 	case TPACKET_V3:
@@ -2788,11 +2791,12 @@ static int packet_recvmsg(struct kiocb *iocb, struct socket *sock,
 		aux.tp_net = skb_network_offset(skb);
 		if (vlan_tx_tag_present(skb)) {
 			aux.tp_vlan_tci = vlan_tx_tag_get(skb);
+			aux.tp_vlan_tpid = ntohs(skb->vlan_proto);
 			aux.tp_status |= TP_STATUS_VLAN_VALID;
 		} else {
 			aux.tp_vlan_tci = 0;
+			aux.tp_vlan_tpid = 0;
 		}
-		aux.tp_padding = 0;
 		put_cmsg(msg, SOL_PACKET, PACKET_AUXDATA, sizeof(aux), &aux);
 	}
 
-- 
1.8.1.5
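
As an illustration of how a userspace receiver might use the delivered
TPID, here is a minimal sketch that rebuilds the tagged frame from an
untagged one. It assumes tpid and tci are in host byte order (the patch
applies ntohs() to skb->vlan_proto before storing it); vlan_reinsert_tag(),
ETH_ADDRS_LEN and VLAN_TAG_LEN are hypothetical names, not part of
if_packet.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12	/* destination MAC + source MAC */
#define VLAN_TAG_LEN   4	/* 16-bit TPID + 16-bit TCI */

/*
 * Copy an untagged Ethernet frame into out, inserting a VLAN tag right
 * after the two MAC addresses. tpid and tci are host byte order and are
 * written in network byte order. Returns the new frame length, or 0 if
 * the input is too short or the output buffer is too small.
 */
static size_t vlan_reinsert_tag(const uint8_t *frame, size_t len,
				uint16_t tpid, uint16_t tci,
				uint8_t *out, size_t out_cap)
{
	assert(frame != NULL && out != NULL);

	if (len < ETH_ADDRS_LEN || out_cap < len + VLAN_TAG_LEN)
		return 0;

	memcpy(out, frame, ETH_ADDRS_LEN);
	out[ETH_ADDRS_LEN + 0] = (uint8_t)(tpid >> 8);
	out[ETH_ADDRS_LEN + 1] = (uint8_t)(tpid & 0xff);
	out[ETH_ADDRS_LEN + 2] = (uint8_t)(tci >> 8);
	out[ETH_ADDRS_LEN + 3] = (uint8_t)(tci & 0xff);
	memcpy(out + ETH_ADDRS_LEN + VLAN_TAG_LEN,
	       frame + ETH_ADDRS_LEN, len - ETH_ADDRS_LEN);

	return len + VLAN_TAG_LEN;
}
```

A receiver would call this only when the status word carries
TP_STATUS_VLAN_VALID, passing tp_vlan_tpid and tp_vlan_tci from the
auxdata or ring header.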


* [PATCH net] netpoll: fix rx_hook() interface by passing the skb
From: Antonio Quartulli @ 2013-10-22  8:48 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Antonio Quartulli
In-Reply-To: <20131022.025038.1046903740187748879.davem@davemloft.net>

Right now skb->data is passed to rx_hook() even if the skb
has not been linearised and without giving rx_hook() a way
to linearise it.

Change the rx_hook() interface and make it accept the skb
as argument. In this way users implementing rx_hook() can
perform all the needed operations to properly (and safely)
access the skb data.

Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
---
 include/linux/netpoll.h |  2 +-
 net/core/netpoll.c      | 10 ++++------
 2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/include/linux/netpoll.h b/include/linux/netpoll.h
index f3c7c24..5352160 100644
--- a/include/linux/netpoll.h
+++ b/include/linux/netpoll.h
@@ -24,7 +24,7 @@ struct netpoll {
 	struct net_device *dev;
 	char dev_name[IFNAMSIZ];
 	const char *name;
-	void (*rx_hook)(struct netpoll *, int, char *, int);
+	void (*rx_hook)(struct netpoll *np, struct sk_buff *skb, int offset);
 
 	union inet_addr local_ip, remote_ip;
 	bool ipv6;
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index fc75c9e..b415437 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -834,9 +834,8 @@ int __netpoll_rx(struct sk_buff *skb, struct netpoll_info *npinfo)
 			if (np->local_port && np->local_port != ntohs(uh->dest))
 				continue;
 
-			np->rx_hook(np, ntohs(uh->source),
-				       (char *)(uh+1),
-				       ulen - sizeof(struct udphdr));
+			np->rx_hook(np, skb,
+				    (unsigned char *)(uh + 1) - skb->data);
 			hits++;
 		}
 	} else {
@@ -872,9 +871,8 @@ int __netpoll_rx(struct sk_buff *skb, struct netpoll_info *npinfo)
 			if (np->local_port && np->local_port != ntohs(uh->dest))
 				continue;
 
-			np->rx_hook(np, ntohs(uh->source),
-				       (char *)(uh+1),
-				       ulen - sizeof(struct udphdr));
+			np->rx_hook(np, skb,
+				    (unsigned char *)(uh + 1) - skb->data);
 			hits++;
 		}
 #endif
-- 
1.8.4
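
To illustrate why a raw data pointer is not enough when the buffer is not
linear, here is a small userspace model. It is a sketch only: struct
fake_buf, fake_copy_bits() and fake_rx_hook() are hypothetical and merely
mimic the idea of a linear head followed by a fragment; they are not the
kernel's sk_buff API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical non-linear buffer: a linear head plus one fragment. */
struct fake_buf {
	const unsigned char *head;
	size_t head_len;
	const unsigned char *frag;
	size_t frag_len;
};

/*
 * Copy len bytes starting at offset into to, crossing from the head
 * into the fragment when needed. Returns 0 on success, -1 if the
 * requested range does not fit inside the buffer.
 */
static int fake_copy_bits(const struct fake_buf *b, size_t offset,
			  void *to, size_t len)
{
	unsigned char *dst = to;
	size_t total = b->head_len + b->frag_len;
	size_t n;

	if (offset > total || len > total - offset)
		return -1;

	if (offset < b->head_len) {
		n = b->head_len - offset;
		if (n > len)
			n = len;
		memcpy(dst, b->head + offset, n);
		dst += n;
		len -= n;
		offset += n;
	}
	if (len > 0)
		memcpy(dst, b->frag + (offset - b->head_len), len);

	return 0;
}

/*
 * Hook in the style of the new interface: it receives the buffer and an
 * offset, so it can do a bounds-checked copy instead of trusting a raw
 * pointer into the head.
 */
static int fake_rx_hook(const struct fake_buf *b, int offset,
			unsigned char *payload, size_t payload_len)
{
	assert(offset >= 0);
	return fake_copy_bits(b, (size_t)offset, payload, payload_len);
}
```

With the old style of interface the hook would get head + offset and a
length, and reading past head_len would run off the end of the head even
though the remaining bytes live in the fragment.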


* RE: [PATCH net] netpoll: fix rx_hook() interface by passing the skb
From: David Laight @ 2013-10-22  9:09 UTC (permalink / raw)
  To: Antonio Quartulli, David S. Miller; +Cc: netdev
In-Reply-To: <1382431715-3128-1-git-send-email-antonio@meshcoding.com>

> Subject: [PATCH net] netpoll: fix rx_hook() interface by passing the skb
> 
> Right now skb->data is passed to rx_hook() even if the skb
> has not been linearised and without giving rx_hook() a way
> to linearise it.
> 
> Change the rx_hook() interface and make it accept the skb
> as argument. In this way users implementing rx_hook() can
> perform all the needed operations to properly (and safely)
> access the skb data.
...
> -	void (*rx_hook)(struct netpoll *, int, char *, int);
> +	void (*rx_hook)(struct netpoll *np, struct sk_buff *skb, int offset);

You can't make that change without also changing the way that hooks are
registered, so that any existing modules fail to register their hooks.

	David


* Re: [PATCH net] netpoll: fix rx_hook() interface by passing the skb
From: Antonio Quartulli @ 2013-10-22 10:11 UTC (permalink / raw)
  To: David Laight; +Cc: David S. Miller, netdev
In-Reply-To: <AE90C24D6B3A694183C094C60CF0A2F6026B739B@saturn3.aculab.com>

[-- Attachment #1: Type: text/plain, Size: 921 bytes --]

On Tue, Oct 22, 2013 at 10:09:00AM +0100, David Laight wrote:
> > Subject: [PATCH net] netpoll: fix rx_hook() interface by passing the skb
> > 
> > Right now skb->data is passed to rx_hook() even if the skb
> > has not been linearised and without giving rx_hook() a way
> > to linearise it.
> > 
> > Change the rx_hook() interface and make it accept the skb
> > as argument. In this way users implementing rx_hook() can
> > perform all the needed operations to properly (and safely)
> > access the skb data.
> ...
> > -	void (*rx_hook)(struct netpoll *, int, char *, int);
> > +	void (*rx_hook)(struct netpoll *np, struct sk_buff *skb, int offset);
> 
> You can't do that change without changing the way that hooks are registered
> so that any existing modules will fail to register their hooks.

There is no hook registration in the kernel tree. All the users are outside.


-- 
Antonio Quartulli

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: Stale IPv6 address accumulation on linux 3.2.17
From: Hannes Frederic Sowa @ 2013-10-22 10:18 UTC (permalink / raw)
  To: Templin, Fred L; +Cc: netdev@vger.kernel.org
In-Reply-To: <2134F8430051B64F815C691A62D9831813520C@XCH-BLV-504.nw.nos.boeing.com>

Hi Fred!

On Mon, Oct 21, 2013 at 03:50:24PM +0000, Templin, Fred L wrote:
> On linux 3.2.17, I have a host that configures IPv6 addresses on
> an eth0 interface based on Router Advertisements received from an
> on-link linux box configured as an IPv6 router and running radvd.
> When the host gets an RA, it configures both an EUI-64-based IPv6
> address and an IPv6 privacy address, so it has two IPv6 addresses.
> But, if I leave the host up for long periods of time, it seems to
> accumulate additional IPv6 addresses - perhaps these are stale
> IPv6 privacy addresses?
> 
> Is this known behavior, and if so is there a way to turn it off?
> Or, perhaps this was a known bug that has been corrected in more
> recent linux kernel versions?

Could you send me the output of ip -6 a l?

Greetings,

  Hannes


* Re: [PATCH] ixgbe: Reduce memory consumption with larger page sizes
From: Jeff Kirsher @ 2013-10-22 10:23 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: netdev, benh
In-Reply-To: <20131022103757.162f1a79@kryten>

[-- Attachment #1: Type: text/plain, Size: 1030 bytes --]

On Tue, 2013-10-22 at 10:37 +1100, Anton Blanchard wrote:
> The ixgbe driver allocates pages for its receive rings. It currently
> uses 512 pages, regardless of page size. During receive handling it
> adds the unused part of the page back into the rx ring, avoiding the
> need for a new allocation.
> 
> On a ppc64 box with 64 threads and 64kB pages, we end up with
> 512 entries * 64 rx queues * 64kB = 2GB memory used. Even more of a
> concern is that we use up 2GB of IOMMU space in order to map all this
> memory.
> 
> The driver makes a number of decisions based on if PAGE_SIZE is less
> than 8kB, so use this as the breakpoint and only allocate 128 entries
> on 8kB or larger page sizes.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
> 
> Jeff: The breakpoint and the ring size I chose was pretty arbitrary,
> feel free to adjust as you see fit. Our main concern is we get that
> 2GB consumption down to something more reasonable :)

Thanks Anton, I will add your patch to my queue.
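
The arithmetic quoted above can be checked with a few lines of C. This is
a sketch that assumes one page per ring entry, as the quoted calculation
does; default_rx_entries() and rx_ring_bytes() are hypothetical helpers,
not functions from the ixgbe driver.

```c
#include <assert.h>
#include <stdint.h>

/* Ring size rule from the description: 512 entries below 8kB pages, else 128. */
static uint32_t default_rx_entries(uint64_t page_size)
{
	return page_size < 8192 ? 512 : 128;
}

/* Total receive-ring memory, assuming one page per ring entry. */
static uint64_t rx_ring_bytes(uint32_t entries, uint32_t queues,
			      uint64_t page_size)
{
	assert(queues > 0);
	return (uint64_t)entries * queues * page_size;
}
```

With 64 queues and 64kB pages, 512 entries give 2147483648 bytes (2GB),
while the 128 entries chosen by the new rule give 536870912 bytes (512MB).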

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [PATCH] igb: Add EEPROM IO stubs for iNVM
From: Jeff Kirsher @ 2013-10-22 10:45 UTC (permalink / raw)
  To: Marek Vasut
  Cc: netdev, e1000-devel, Carolyn Wyborny, Aaron Brown,
	David S. Miller
In-Reply-To: <1382412123-4782-1-git-send-email-marex@denx.de>

[-- Attachment #1: Type: text/plain, Size: 4607 bytes --]

On Tue, 2013-10-22 at 05:22 +0200, Marek Vasut wrote:
> Add stub functions for EEPROM operations in case where the i210 is
> used without external EEPROM. The EEPROM operations must not be set
> to NULL, since otherwise we will get a backtrace when attempting the
> command below. One such place to trigger this is from igb_ethtool.c
> igb_set_eeprom(), where hw->nvm.ops.write() is called without first
> checking if .write() is valid. By grepping through the code, there
> are more such occasions which assume .write() to be always valid.
> Thus, instead of polluting the code with checks, add stubs. I believe
> it'd be preferable to possibly even implement those functions, but
> my knowledge of the adapter is still limited and as far as I
> understand, the iNVM is programmable only once.
> 
> Command:
> 
> $ ethtool -E eth0 magic 0x157b8086 offset 6 value 0x1b
> 
> Backtrace:
> 
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> pgd = be7ac000
> [00000000] *pgd=4e6a6831, *pte=00000000, *ppte=00000000
> Internal error: Oops: 80000007 [#1] SMP ARM
> CPU: 2 PID: 59 Comm: ethtool Not tainted 3.12.0-rc6+ #8
> task: bf8f3600 ti: be73c000 task.ti: be73c000
> PC is at 0x0
> LR is at igb_set_eeprom+0x27c/0x3b4
> pc : [<00000000>]    lr : [<803bc780>]    psr: 20000013
> sp : be73dd80  ip : 00000000  fp : be73ddf4
> r10: 00000001  r9 : 00000003  r8 : be6d6000
> r7 : bfa64a38  r6 : be6d7000  r5 : bfa64000  r4 : be73de20
> r3 : be6d6000  r2 : 00000001  r1 : 00000003  r0 : bfa64a38
> Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> Control: 10c53c7d  Table: 4e7ac04a  DAC: 00000015
> Process ethtool (pid: 59, stack limit = 0xbe73c240)
> Stack: (0xbe73dd80 to 0xbe73e000)
> dd80: 803c7e2c 803c8554 00000000 00000000 00000000 803c827c 00000004 00000000
> dda0: 00000000 00000000 00010800 00080008 00000008 be73ddc0 800d1a34 8054db6c
> ddc0: be6d6000 00000003 00ad97c8 00000001 be73c000 bfa64000 be6d7000 80584d58
> dde0: be73de20 00ad97d8 be73de7c be73ddf8 80465e00 803bc510 be73de14 be6d7000
> de00: e4114bb3 00000000 be73de6c 0000000c 8055048c 80076b5c 00000002 00000000
> de20: 0000000c 157b8086 00000006 00000001 8094cac0 be73c000 00000000 00000000
> de40: be73de7c 00008946 8094cac0 7e8cfcf4 be73de98 00008946 8094cac0 7e8cfcf4
> de60: be73de98 be73c000 00000000 00000000 be73dee4 be73de80 80474f54 804656ac
> de80: 000000a8 00000200 be73dec4 be73de98 80077c7c 80076b5c 30687465 00000000
> dea0: 00000000 00000000 00ad97c8 00000000 00000000 00000000 be73c000 00008946
> dec0: fffffdfd 7e8cfcf4 7e8cfcf4 7e8cfcf4 bf18c020 00000000 be73df04 be73dee8
> dee0: 80449c18 80474ac8 80449b94 00008946 be6b0600 00000003 be73df74 be73df08
> df00: 800e6d10 80449ba0 be6b0600 00030002 be6b5f40 be6b5f40 be73df3c be73df28
> df20: 80554b2c 802b03e8 be6b5f6c be6b5f00 be73df5c be73c000 8000ea44 be73c000
> df40: 8000eab0 bf8f3600 00000001 00008946 00000003 00000000 7e8cfcf4 be6b0600
> df60: be73c000 00000000 be73dfa4 be73df78 800e72ac 800e6c98 be73df94 00000000
> df80: 80076b64 0002bd0c 00000000 0002bcc8 00000036 8000ebe4 00000000 be73dfa8
> dfa0: 8000ea20 800e7278 0002bd0c 00000000 00000003 00008946 7e8cfcf4 7e8cfcf4
> dfc0: 0002bd0c 00000000 0002bcc8 00000036 00000000 00000000 00000000 7e8cfb84
> dfe0: 7e8cfe65 7e8cfb78 0001201c 0004535c 20000010 00000003 00000000 00000000
> Backtrace:
> [<803bc504>] (igb_set_eeprom+0x0/0x3b4) from [<80465e00>] (dev_ethtool+0x760/0x1f68)
> [<804656a0>] (dev_ethtool+0x0/0x1f68) from [<80474f54>] (dev_ioctl+0x498/0x86c)
> [<80474abc>] (dev_ioctl+0x0/0x86c) from [<80449c18>] (sock_ioctl+0x84/0x258)
> [<80449b94>] (sock_ioctl+0x0/0x258) from [<800e6d10>] (do_vfs_ioctl+0x84/0x5e0)
>  r6:00000003 r5:be6b0600 r4:00008946 r3:80449b94
> [<800e6c8c>] (do_vfs_ioctl+0x0/0x5e0) from [<800e72ac>] (SyS_ioctl+0x40/0x68)
> [<800e726c>] (SyS_ioctl+0x0/0x68) from [<8000ea20>] (ret_fast_syscall+0x0/0x48)
>  r8:8000ebe4 r7:00000036 r6:0002bcc8 r5:00000000 r4:0002bd0c
> Code: bad PC value
> ---[ end trace 59379e9bf8fc8437 ]---
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Carolyn Wyborny <carolyn.wyborny@intel.com>
> Cc: Aaron Brown <aaron.f.brown@intel.com>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: David S. Miller <davem@davemloft.net>
> ---
>  drivers/net/ethernet/intel/igb/e1000_i210.c | 46 +++++++++++++++++++++++++++--
>  1 file changed, 43 insertions(+), 3 deletions(-)

Thanks Marek, I have added the patch to my queue.
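
The stub approach described in the quoted patch can be sketched in
userspace C as follows. The patch body is not shown in this thread, so
everything here is an assumption for illustration: struct fake_hw,
struct fake_nvm_ops, FAKE_ERR_NVM and the function names are hypothetical,
and the error value the real stubs return is not known from this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fake_hw;

/* Hypothetical ops table with a write operation, as in hw->nvm.ops. */
struct fake_nvm_ops {
	int (*write)(struct fake_hw *hw, uint16_t offset, uint16_t words,
		     const uint16_t *data);
};

struct fake_hw {
	struct fake_nvm_ops nvm_ops;
};

#define FAKE_ERR_NVM (-1)	/* hypothetical error code */

/* Stub: the operation is not supported, so report an error. */
static int fake_nvm_write_stub(struct fake_hw *hw, uint16_t offset,
			       uint16_t words, const uint16_t *data)
{
	(void)hw;
	(void)offset;
	(void)words;
	(void)data;
	return FAKE_ERR_NVM;
}

/* Init path for a part without external EEPROM: install stubs, never NULL. */
static void fake_init_invm_ops(struct fake_hw *hw)
{
	hw->nvm_ops.write = fake_nvm_write_stub;
}

/* Caller that, like the one described above, does not check for NULL. */
static int fake_set_eeprom(struct fake_hw *hw, uint16_t offset, uint16_t value)
{
	assert(hw != NULL);
	return hw->nvm_ops.write(hw, offset, 1, &value);
}
```

Because the pointer is always valid, every unchecked caller gets an error
return instead of jumping to address 0, which is the "PC is at 0x0" failure
in the quoted oops.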

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]


* Re: [E1000-devel] [PATCH net-next] e1000: fix wrong queue idx calculation
From: Jeff Kirsher @ 2013-10-22 10:46 UTC (permalink / raw)
  To: Hong Zhiguo; +Cc: davem, e1000-devel, netdev, Hong Zhiguo
In-Reply-To: <1382256924-12598-1-git-send-email-zhiguohong@tencent.com>

[-- Attachment #1: Type: text/plain, Size: 413 bytes --]

On Sun, 2013-10-20 at 16:15 +0800, Hong Zhiguo wrote:
> From: Hong Zhiguo <zhiguohong@tencent.com>
> 
> tx_ring and adapter->tx_ring are already of type "struct e1000_tx_ring *"
> 
> Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com>
> ---
>  drivers/net/ethernet/intel/e1000/e1000_main.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)

Thanks Hong, I have added your patch to my queue.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]



* Re: BUG: scheduling while atomic dev_set_promiscuity->__dev_notify_flags
From: Nicolas Dichtel @ 2013-10-22 11:52 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: netdev
In-Reply-To: <CAMEtUuy91zYJ=bj1dzfdqE8kqZ3rE1RgdR-PZYekSUg8_xoTBw@mail.gmail.com>

On 22/10/2013 03:04, Alexei Starovoitov wrote:
> Hi Nicolas,
>
> after commit 991fb3f74c "dev: always advertise rx_flags changes via netlink"
> I'm seeing 'sleeping in atomic' bug.
>
> Steps to reproduce:
> ip tuntap add dev tap1 mode tap
> ifconfig tap1 up
> tcpdump -nei tap1
> and in different terminal:
> ip tuntap del dev tap1 mode tap
>
> [  271.627994] device tap1 left promiscuous mode
> [  271.639897] BUG: sleeping function called from invalid context at mm/slub.c:940
> [  271.664491] in_atomic(): 1, irqs_disabled(): 0, pid: 3394, name: ip
> [  271.677525] INFO: lockdep is turned off.
> [  271.690503] CPU: 0 PID: 3394 Comm: ip Tainted: G        W    3.12.0-rc3+ #73
> [  271.703996] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
> [  271.731254]  ffffffff81a58506 ffff8807f0d57a58 ffffffff817544e5 ffff88082fa0f428
> [  271.760261]  ffff8808071f5f40 ffff8807f0d57a88 ffffffff8108bad1 ffffffff81110ff8
> [  271.790683]  0000000000000010 00000000000000d0 00000000000000d0 ffff8807f0d57af8
> [  271.822332] Call Trace:
> [  271.838234]  [<ffffffff817544e5>] dump_stack+0x55/0x76
> [  271.854446]  [<ffffffff8108bad1>] __might_sleep+0x181/0x240
> [  271.870836]  [<ffffffff81110ff8>] ? rcu_irq_exit+0x68/0xb0
> [  271.887076]  [<ffffffff811a80be>] kmem_cache_alloc_node+0x4e/0x2a0
> [  271.903368]  [<ffffffff810b4ddc>] ? vprintk_emit+0x1dc/0x5a0
> [  271.919716]  [<ffffffff81614d67>] ? __alloc_skb+0x57/0x2a0
> [  271.936088]  [<ffffffff810b4de0>] ? vprintk_emit+0x1e0/0x5a0
> [  271.952504]  [<ffffffff81614d67>] __alloc_skb+0x57/0x2a0
> [  271.968902]  [<ffffffff8163a0b2>] rtmsg_ifinfo+0x52/0x100
> [  271.985302]  [<ffffffff8162ac6d>] __dev_notify_flags+0xad/0xc0
> [  272.001642]  [<ffffffff8162ad0c>] __dev_set_promiscuity+0x8c/0x1c0
> [  272.017917]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
> [  272.033961]  [<ffffffff8162b109>] dev_set_promiscuity+0x29/0x50
> [  272.049855]  [<ffffffff8172e937>] packet_dev_mc+0x87/0xc0
> [  272.065494]  [<ffffffff81732052>] packet_notifier+0x1b2/0x380
> [  272.080915]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
> [  272.096009]  [<ffffffff81761c66>] notifier_call_chain+0x66/0x150
> [  272.110803]  [<ffffffff8108503e>] __raw_notifier_call_chain+0xe/0x10
> [  272.125468]  [<ffffffff81085056>] raw_notifier_call_chain+0x16/0x20
> [  272.139984]  [<ffffffff81620190>] call_netdevice_notifiers_info+0x40/0x70
> [  272.154523]  [<ffffffff816201d6>] call_netdevice_notifiers+0x16/0x20
> [  272.168552]  [<ffffffff816224c5>] rollback_registered_many+0x145/0x240
> [  272.182263]  [<ffffffff81622641>] rollback_registered+0x31/0x40
> [  272.195369]  [<ffffffff816229c8>] unregister_netdevice_queue+0x58/0x90
> [  272.208230]  [<ffffffff81547ca0>] __tun_detach+0x140/0x340
> [  272.220686]  [<ffffffff81547ed6>] tun_chr_close+0x36/0x60
>
> packet_notifier() does rcu_read_lock() before calling into packet_dev_mc() .
>
> Not sure how to fix it cleanly, other than disabling a notify here.
> Any suggestion?
I can't reproduce it. Can you send me your .config?
I will look more closely at the code.


Regards,
Nicolas


* RE: [PATCH net] netpoll: fix rx_hook() interface by passing the skb
From: David Laight @ 2013-10-22 12:46 UTC (permalink / raw)
  To: Antonio Quartulli; +Cc: David S. Miller, netdev
In-Reply-To: <20131022101127.GJ1544@neomailbox.net>

> Subject: Re: [PATCH net] netpoll: fix rx_hook() interface by passing the skb
> 
> On Tue, Oct 22, 2013 at 10:09:00AM +0100, David Laight wrote:
> > > Subject: [PATCH net] netpoll: fix rx_hook() interface by passing the skb
> > >
> > > Right now skb->data is passed to rx_hook() even if the skb
> > > has not been linearised and without giving rx_hook() a way
> > > to linearise it.
> > >
> > > Change the rx_hook() interface and make it accept the skb
> > > as argument. In this way users implementing rx_hook() can
> > > perform all the needed operations to properly (and safely)
> > > access the skb data.
> > ...
> > > -	void (*rx_hook)(struct netpoll *, int, char *, int);
> > > +	void (*rx_hook)(struct netpoll *np, struct sk_buff *skb, int offset);
> >
> > You can't do that change without changing the way that hooks are registered
> > so that any existing modules will fail to register their hooks.
> 
> There is no hook registration in the kernel tree. All the users are outside.

Looking at __netpoll_rx() I notice that there isn't an skb_pull for the
udp header.

Actually, I think the alignment rules effectively imply that iph->ihl
(the second byte) will always be in the first skb fragment so the
code could sensibly do a single skb_pull() that includes the udp header.

I can't remember which value you passed as 'offset' (and my mailer makes
it hard to find), but to ease the code changes the offset of the udp data
would make sense.
In that case you still need to pass the source port.
If you do rx_hook(np, source_port, skb, offset) then if anyone manages to
load an old module (or code that casts the assignment to rx_poll)
at least it won't go 'bang'.
Renaming the structure member will guarantee to generate compile errors.

	David





* Re: [PATCH net-next 0/2] Removal of struct esp_data
From: Steffen Klassert @ 2013-10-22 13:08 UTC (permalink / raw)
  To: David Miller; +Cc: mathias.krause, netdev, herbert
In-Reply-To: <20131018.135536.686066381481925652.davem@davemloft.net>

On Fri, Oct 18, 2013 at 01:55:36PM -0400, David Miller wrote:
> From: Mathias Krause <mathias.krause@secunet.com>
> Date: Fri, 18 Oct 2013 12:09:03 +0200
> 
> > This series removes one level of indirection when accessing the aead
> > crypto algorithm in ESP transforms by simply removing struct esp_data.
> > This results in smaller code and less memory usage per xfrm state.
> > 
> > Please apply!
> 
> No objections from me, I'll let Steffen pick this up.

I'm a bit hesitant about removing the padlen field. We resisted
several attempts to remove it in the past. It is currently unused,
but it provides the infrastructure for ESP padding as defined
in RFC 4303. However, RFC 4303 recommends the use of TFC padding
instead to conceal the actual length of the packet, so I'm not
sure what the actual use case for ESP padding is. I'll reconsider
this next week when I'm back at the office.


* Re: [PATCH] Revert "bridge: only expire the mdb entry when query is received"
From: Vladislav Yasevich @ 2013-10-22 13:10 UTC (permalink / raw)
  To: David Miller
  Cc: amwang, netdev@vger.kernel.org, bridge, LKML, Stephen Hemminger,
	linus.luessing
In-Reply-To: <20131021.184509.1933008514161772000.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 1789 bytes --]

On Mon, Oct 21, 2013 at 6:45 PM, David Miller <davem@davemloft.net> wrote:

> From: Linus Lüssing <linus.luessing@web.de>
> Date: Sun, 20 Oct 2013 00:58:57 +0200
>
> > While this commit was a good attempt to fix issues occuring when no
> > multicast querier is present, this commit still has two more issues:
> >
> > 1) There are cases where mdb entries do not expire even if there is a
> > querier present. The bridge will unnecessarily continue flooding
> > multicast packets on the according ports.
> >
> > 2) Never removing an mdb entry could be exploited for a Denial of
> > Service by an attacker on the local link, slowly, but steadily eating up
> > all memory.
> >
> > Actually, this commit became obsolete with
> > "bridge: disable snooping if there is no querier" (b00589af3b)
> > which included fixes for a few more cases.
> >
> > Therefore reverting the following commits (the commit stated in the
> > commit message plus three of its follow up fixes):
> >
> > ---
> > Revert "bridge: update mdb expiration timer upon reports."
> > This reverts commit f144febd93d5ee534fdf23505ab091b2b9088edc.
> > Revert "bridge: do not call setup_timer() multiple times"
> > This reverts commit 1faabf2aab1fdaa1ace4e8c829d1b9cf7bfec2f1.
> > Revert "bridge: fix some kernel warning in multicast timer"
> > This reverts commit c7e8e8a8f7a70b343ca1e0f90a31e35ab2d16de1.
> > Revert "bridge: only expire the mdb entry when query is received"
> > This reverts commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b.
> > ---
>
> Cong, and other bridge folks, please review this revert.
>

Makes sense and makes the implementation follow the spec more closely.
Looks like the issues seen before are resolved by the revert.

Reviewed-by: Vlad Yasevich <vyasevich@gmail.com>

[-- Attachment #2: Type: text/html, Size: 2524 bytes --]

^ permalink raw reply

* Re: [PATCH] Revert "bridge: only expire the mdb entry when query is received"
From: Vlad Yasevich @ 2013-10-22 13:13 UTC (permalink / raw)
  To: David Miller, linus.luessing
  Cc: stephen, netdev, bridge, linux-kernel, amwang
In-Reply-To: <20131021.184509.1933008514161772000.davem@davemloft.net>

On 10/21/2013 06:45 PM, David Miller wrote:
> From: Linus Lüssing <linus.luessing@web.de>
> Date: Sun, 20 Oct 2013 00:58:57 +0200
>
>> While this commit was a good attempt to fix issues occurring when no
>> multicast querier is present, this commit still has two more issues:
>>
>> 1) There are cases where mdb entries do not expire even if there is a
>> querier present. The bridge will unnecessarily continue flooding
>> multicast packets on the according ports.
>>
>> 2) Never removing an mdb entry could be exploited for a Denial of
>> Service by an attacker on the local link, slowly, but steadily eating up
>> all memory.
>>
>> Actually, this commit became obsolete with
>> "bridge: disable snooping if there is no querier" (b00589af3b)
>> which included fixes for a few more cases.
>>
>> Therefore reverting the following commits (the commit stated in the
>> commit message plus three of its follow up fixes):
>>
>> ---
>> Revert "bridge: update mdb expiration timer upon reports."
>> This reverts commit f144febd93d5ee534fdf23505ab091b2b9088edc.
>> Revert "bridge: do not call setup_timer() multiple times"
>> This reverts commit 1faabf2aab1fdaa1ace4e8c829d1b9cf7bfec2f1.
>> Revert "bridge: fix some kernel warning in multicast timer"
>> This reverts commit c7e8e8a8f7a70b343ca1e0f90a31e35ab2d16de1.
>> Revert "bridge: only expire the mdb entry when query is received"
>> This reverts commit 9f00b2e7cf241fa389733d41b615efdaa2cb0f5b.
>> ---
>
> Cong, and other bridge folks, please review this revert.
>

Makes sense and makes the implementation follow the spec more closely.
Looks like the issues seen before are resolved by the revert.

-vlad

^ permalink raw reply

* Re: [PATCH RFC 4/5] net:stmmac: fix jumbo frame handling.
From: Giuseppe CAVALLARO @ 2013-10-22 13:24 UTC (permalink / raw)
  To: Jimmy PERCHET; +Cc: netdev, jimmy.perchet
In-Reply-To: <52655640.4060405@parrot.com>

On 10/21/2013 6:28 PM, Jimmy PERCHET wrote:
> On 21/10/2013 15:40, Giuseppe CAVALLARO wrote:
>> On 10/16/2013 5:24 PM, Jimmy Perchet wrote:
>>> This patch addresses several issues which prevent jumbo frames from working properly:
>>> - the last descriptor of a jumbo frame was not closed
>>> - confusion about the descriptor's max buffer size
>>> - frags could not be jumbo
>>>
>>> Signed-off-by: Jimmy Perchet <jimmy.perchet@parrot.com>
>>
>>
>> Jimmy, thanks for this patch. Below are my first notes.
>
> Thanks a lot for this first review.

welcome

>
>> I'll continue to look at the patch to verify whether I missed
>> something. I kindly ask you, for the next version, to add
>> more comments, especially in the function that prepares the
>> tx desc, to help me review it.
>
> Sure ;)
>
> I hope to do v2 by next week.

OK, thanks. I'll try to help review v2 as well.

>
> I'm OK with most of your comments. Some additional
> notes below:
>
>>>    }
>>> @@ -81,7 +81,7 @@ static inline void ndesc_end_tx_desc_on_ring(struct dma_desc *p, int ter)
>>>
>>>    static inline void norm_set_tx_desc_len_on_ring(struct dma_desc *p, int len)
>>>    {
>>> -    if (unlikely(len > BUF_SIZE_2KiB)) {
>>> +    if (unlikely(len >= BUF_SIZE_2KiB)) {
>>
>> we cannot manage a size of 2048 on normal desc
>>
>> Please verify that this does not break backward compatibility.
>
> IMHO, this actually fixes the problem you think I am creating.
> In the current code, if len is equal to 2048, buffer1_size is set to 2048,
> which is wrong because the max size is actually 2047...

IIRC, for normal descriptors, the TBS2/1 are just 11 bits
so the max programmable size is 2047 (0x7ff).
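
To make the 11-bit limit concrete, here is a minimal standalone sketch
(not the stmmac code; TBS_BITS, TBS_MAX_LEN, struct tx_buf_sizes and
split_tx_len() are hypothetical names used only for illustration). It
shows why a length of 2048 has to spill into the second buffer:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not the stmmac definitions. */
#define TBS_BITS	11u
#define TBS_MAX_LEN	((1u << TBS_BITS) - 1u)	/* 2047 == 0x7ff */

struct tx_buf_sizes {
	uint32_t buf1;	/* value for the first buffer-size field */
	uint32_t buf2;	/* value for the second buffer-size field */
};

/* Split len across two buffers, each at most TBS_MAX_LEN bytes. */
static struct tx_buf_sizes split_tx_len(uint32_t len)
{
	struct tx_buf_sizes s;

	/* One descriptor can describe at most two full buffers. */
	assert(len <= 2u * TBS_MAX_LEN);

	if (len > TBS_MAX_LEN) {	/* same test as len >= 2048 */
		s.buf1 = TBS_MAX_LEN;
		s.buf2 = len - TBS_MAX_LEN;
	} else {
		s.buf1 = len;
		s.buf2 = 0;
	}
	return s;
}
```

With this, split_tx_len(2048) yields 2047 + 1, whereas 2048 masked to
11 bits (2048 & 0x7ff) is 0.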

>
>>
>>>            p->des01.etx.buffer1_size = BUF_SIZE_2KiB - 1;
>>>            p->des01.etx.buffer2_size = len - p->des01.etx.buffer1_size;
>>>        } else
>
>
>
>>>
>>>    static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
>>>    {
>>> @@ -103,13 +90,13 @@ static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
>>>        if (unlikely(priv->plat->has_gmac))
>>>            /* Fill DES3 in case of RING mode */
>>>            if (priv->dma_buf_sz >= BUF_SIZE_8KiB)
>>> -            p->des3 = p->des2 + BUF_SIZE_8KiB;
>>> +            p->des3 = p->des2 + BUF_SIZE_8KiB - 1;
>>
>> is it correct? can you check?
>
> The actual buffer's max size is 8191, so, in ring mode,
> the second buffer must start at p->des2 + 8191.
>
>>> -    priv->cur_tx++;
>>> +    priv->cur_tx += nb_desc;
>>
>> can we avoid using nb_desc?
> Actually, it is preparation for my 5th patch: I want to write cur_tx only once.
> I can split this out.

ok

>
>
>
> Best Regards,
> Jimmy
>
>

^ permalink raw reply

* RE: [PATCH] net-bnx2x: Fix byte order problem on NVRAM writes
From: Yuval Mintz @ 2013-10-22 13:30 UTC (permalink / raw)
  To: Nate Klein, netdev@vger.kernel.org
  Cc: Eilon Greenstein, linux-kernel@vger.kernel.org
In-Reply-To: <1382392621-8998-1-git-send-email-nxk@google.com>

> Tested:
>     ethtool -e eth0 raw on >first.nvram
>     ethtool -E eth0 <first.nvram
>     ethtool -e eth0 raw on >second.nvram
>     cmp first.nvram second.nvram || ethtool -E eth0 <second.nvram
>     (No output means pass.)

Hi Nate,

We've been aware of this `bug' for some time; we encountered it when
trying to fix the endian sparse warnings in the driver.

Sadly, there are existing user applications that assume this is
the driver's behaviour, i.e., those applications prepare their buffers in a
manner which assumes the endianness of the writes; changing this write will
cause those tools to break.
That's why we haven't fixed the issue before, and cannot support such a
fix. We're more than willing to document it somewhere, if that seems
useful to anyone.
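
A small userspace sketch of why this matters to existing tools (an
illustration only: htonl() stands in for the kernel's cpu_to_be32(), and
be32_bytes() is a hypothetical helper, not driver code). It shows the
byte layout a 32-bit value has in memory once converted to big-endian:

```c
#include <arpa/inet.h>	/* htonl(): userspace stand-in for cpu_to_be32() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store val in big-endian order into out[0..3]. */
static void be32_bytes(uint32_t val, uint8_t out[4])
{
	uint32_t be = htonl(val);	/* no-op on big-endian hosts */

	assert(out != NULL);
	memcpy(out, &be, sizeof(be));
}
```

be32_bytes(0x11223344, buf) always yields 11 22 33 44. On a little-endian
host the unconverted value is laid out as 44 33 22 11, so a tool that
arranged its buffer for one convention would see its data byte-swapped
under the other.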

Thanks,
Yuval

> ---
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
> b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
> index 8213cc8..35671fb 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_ethtool.c
> @@ -1549,7 +1549,7 @@ static int bnx2x_nvram_write_dword(struct bnx2x
> *bp, u32 offset, u32 val,
>  	REG_WR(bp, MCP_REG_MCPR_NVM_COMMAND,
> MCPR_NVM_COMMAND_DONE);
> 
>  	/* write the data */
> -	REG_WR(bp, MCP_REG_MCPR_NVM_WRITE, val);
> +	REG_WR(bp, MCP_REG_MCPR_NVM_WRITE, cpu_to_be32(val));
> 
>  	/* address of the NVRAM to write to */
>  	REG_WR(bp, MCP_REG_MCPR_NVM_ADDR,
> --

^ permalink raw reply

* Re: [PATCH RFC 3/5] net:stmmac: ensure we reclaim all dirty descriptors.
From: Giuseppe CAVALLARO @ 2013-10-22 13:33 UTC (permalink / raw)
  To: Eric Dumazet, Jimmy PERCHET; +Cc: netdev
In-Reply-To: <1382381375.3284.79.camel@edumazet-glaptop.roam.corp.google.com>

On 10/21/2013 8:49 PM, Eric Dumazet wrote:
> On Mon, 2013-10-21 at 11:32 -0700, Eric Dumazet wrote:
>> On Mon, 2013-10-21 at 15:10 +0200, Jimmy PERCHET wrote:
>>> Hello Peppe,
>>>
>>
>>> I can reproduce this problem by sending 9KiB jumbo frames on a 10Mbit/s link.
>>> If the socket's wmemory size is about 500KiB (or less), the transfer stalls.
>>> (I guess it is reproducible with 1500-byte frames by decreasing
>>> the socket's wmemory to 90KB.)
>>> Re-arming the timer fixes this behaviour.
>>>
>>> Here is my understanding of this issue:
>>> With 9KiB frames and 500KiB of wmemory, only 60 frames can be
>>> prepared in a row. That is below the tx coalescence threshold,
>>> so there will be no interrupt. When the tx coalescence timer
>>> expires (40ms later), only five descriptors have to be
>>> freed (9000*5 @ 10Mbit/s = 34ms), which is not enough to reach
>>> the socket's wake-up threshold. We get into a deadlock:
>>> * The socket is waiting for free buffers before performing a new transfer.
>>> * The driver is waiting for a new transfer before performing cleanup.
>>>
>>> Maybe it is not a real-life use case and is not worth
>>> a patch. What do you think?
>>>
>>
>> I think there is probably a bug in the driver, a race of some sort,
>> and it would be better to find it and fix it ;)
>>
>
> coalesce params should not be hardcoded, but depend on link speed and
> mtu.
>
> On 10Mbits, and MTU=9000 there is really no point using coalescing !

So the final patch could tune or disable tx coalescing according
to speed and MTU.

Indeed, I have already added something that can help with that.

We can tune tx_coal_frames and decide to set the IC bit
(interrupt on completion bit) in the frame to be transmitted.
This can be done via ethtool.

This should reduce the mitigation, so you can certainly tune everything
for low speed or jumbo frames. IIRC, you could even decide to disable
mitigation altogether. Jimmy, can you try this? In any case, let me know.
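
For reference, the arithmetic in Jimmy's example can be checked with a
small standalone sketch (frames_sent_in() is a hypothetical helper, and
10Mbit/s is assumed to mean 10,000,000 bit/s):

```c
#include <assert.h>
#include <stdint.h>

/* Whole frames the link can send within timer_ms at link_bps. */
static uint64_t frames_sent_in(uint32_t frame_bytes, uint64_t link_bps,
			       uint32_t timer_ms)
{
	uint64_t bits_per_frame = (uint64_t)frame_bytes * 8u;
	uint64_t bits_in_window;

	assert(frame_bytes > 0);
	bits_in_window = link_bps * timer_ms / 1000u;
	return bits_in_window / bits_per_frame;
}
```

frames_sent_in(9000, 10000000, 40) gives 5, matching the five descriptors
Jimmy mentions; with 1500-byte frames the same 40ms window covers 33
frames.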

Peppe

^ permalink raw reply

* [net-next v2 00/14][pull request] Intel Wired LAN Driver Updates
From: Jeff Kirsher @ 2013-10-22 14:22 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, gospo, sassmann

This series contains updates to i40e only.

Jesse provides 6 patches against i40e.  The first reduces CPU
utilization by dropping the read-flush from the hot path.  The next
couple of patches resolve coverity issues reported by Hannes Frederic
Sowa <hannes@stressinduktion.org>.  Then Jesse refactors i40e to clean up
functions which used cpu_to_xxx(foo), which caused a lot of line wrapping.

Mitch provides 2 i40e patches.  The first fixes a panic when tx_rings[0]
is not allocated; the second corrects a math error when
assigning MSI-X vectors to VFs.  The vectors-per-vf value reported
by the hardware is already, conveniently, one less than the actual
value.

Shannon provides 5 patches against i40e.  His first patch corrects a
number of little bugs in the error handling of irq setup, most of
which ended up panicking the kernel.  Next he fixes the overactive
IRQ issue seen in testing and allows the use of the legacy interrupt.
Shannon then provides a cleanup of the arguments declared at the
beginning of each function.  Then he provides a patch to make sure
that there are really rings and queues before trying to dump
information in them.  Lastly he simplifies the code by using an
already existing variable.

Catherine provides an i40e patch to bump the version.

v2:
 - Remove unneeded parentheses in patch 3 based on feedback from
   Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
 - Fix patch description for patch 11 based on feedback from
   Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>

The following are changes since commit bda301c9dc545f81bd70c1eecb8572bfc5eb524c:
  Merge branch 'sit_tso'
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master

Catherine Sullivan (1):
  i40e: Bump version

Jesse Brandeburg (6):
  i40e: do not flush after re-enabling interrupts
  i40e: debugfs fixups
  i40e: clamp debugfs nvm read command
  i40e: fix use of untrusted scalar value warning
  i40e: fix sign extension issue
  i40e: refactor fdir setup function

Mitch Williams (2):
  i40e: don't free nonexistent rings
  i40e: assign correct vector to VF

Shannon Nelson (5):
  i40e: fixup legacy interrupt handling
  i40e: tweaking icr0 handling for legacy irq
  i40e: reorder block declarations in debugfs
  i40e: check vsi ptrs before dumping them
  i40e: use pf_id for pf function id in qtx_ctl

 drivers/net/ethernet/intel/i40e/i40e.h             |   1 +
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c     | 135 ++++++++++++---------
 drivers/net/ethernet/intel/i40e/i40e_main.c        |  50 ++++----
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |  83 ++++++-------
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |   4 +-
 5 files changed, 146 insertions(+), 127 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* [net-next v2 01/14] i40e: do not flush after re-enabling interrupts
From: Jeff Kirsher @ 2013-10-22 14:22 UTC (permalink / raw)
  To: davem; +Cc: Jesse Brandeburg, netdev, gospo, sassmann, Jeff Kirsher
In-Reply-To: <1382451757-9817-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

The hot path doesn't need a read-flush after interrupt enable, and this
flush causes a lot of extra CPU utilization.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 --
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index fbe7fe2..69ed801 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -2560,7 +2560,7 @@ void i40e_irq_dynamic_enable(struct i40e_vsi *vsi, int vector)
 	      I40E_PFINT_DYN_CTLN_CLEARPBA_MASK |
 	      (I40E_ITR_NONE << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT);
 	wr32(hw, I40E_PFINT_DYN_CTLN(vector - 1), val);
-	i40e_flush(hw);
+	/* skip the flush */
 }
 
 /**
@@ -2709,6 +2709,7 @@ static int i40e_vsi_enable_irq(struct i40e_vsi *vsi)
 		i40e_irq_dynamic_enable_icr0(pf);
 	}
 
+	i40e_flush(&pf->hw);
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index dc89e72..fbc40cd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -559,8 +559,6 @@ static void i40e_update_dynamic_itr(struct i40e_q_vector *q_vector)
 	i40e_set_new_dynamic_itr(&q_vector->tx);
 	if (old_itr != q_vector->tx.itr)
 		wr32(hw, reg_addr, q_vector->tx.itr);
-
-	i40e_flush(hw);
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related

* [net-next v2 03/14] i40e: assign correct vector to VF
From: Jeff Kirsher @ 2013-10-22 14:22 UTC (permalink / raw)
  To: davem
  Cc: Mitch Williams, netdev, gospo, sassmann, sergei.shtylyov,
	Jesse Brandeburg, Jeff Kirsher
In-Reply-To: <1382451757-9817-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

Correct a math error when assigning MSI-X vectors to VFs. The vectors-per-vf
value reported by the hardware is already, conveniently, one less than the
actual value.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
v2:
 - Removed unnecessary parentheses
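
Illustration only (a standalone sketch, not driver code;
vf_lnklstn_index() and old_vf_lnklstn_index() are hypothetical names, and
vector_id is assumed to be >= 1 on this path). With n taken as the
hardware-reported vectors-per-VF value, and assuming each VF uses vector
ids 1..n, the old formula lets neighbouring VFs share an index:

```c
#include <assert.h>
#include <stdint.h>

/* Index as computed after this patch. */
static uint32_t vf_lnklstn_index(uint32_t n, uint32_t vf_id,
				 uint32_t vector_id)
{
	assert(vector_id >= 1);
	return n * vf_id + (vector_id - 1);
}

/* Index as computed before this patch. */
static uint32_t old_vf_lnklstn_index(uint32_t n, uint32_t vf_id,
				     uint32_t vector_id)
{
	assert(n >= 1 && vector_id >= 1);
	return (n - 1) * vf_id + (vector_id - 1);
}
```

For n = 4, the old formula maps both (VF 0, vector 4) and (VF 1, vector 1)
to index 3; the new one gives 3 and 4.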
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 8967e58..35f4909 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -251,7 +251,7 @@ static void i40e_config_irq_link_list(struct i40e_vf *vf, u16 vsi_idx,
 		reg_idx = I40E_VPINT_LNKLST0(vf->vf_id);
 	else
 		reg_idx = I40E_VPINT_LNKLSTN(
-			    ((pf->hw.func_caps.num_msix_vectors_vf - 1)
+					   (pf->hw.func_caps.num_msix_vectors_vf
 					      * vf->vf_id) + (vector_id - 1));
 
 	if (vecmap->rxq_map == 0 && vecmap->txq_map == 0) {
-- 
1.8.3.1

^ permalink raw reply related

* [net-next v2 02/14] i40e: don't free nonexistent rings
From: Jeff Kirsher @ 2013-10-22 14:22 UTC (permalink / raw)
  To: davem
  Cc: Mitch Williams, netdev, gospo, sassmann, Jesse Brandeburg,
	Jeff Kirsher
In-Reply-To: <1382451757-9817-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Mitch Williams <mitch.a.williams@intel.com>

Not all VSIs have rings! Check to see if rings were actually allocated before
freeing them.

This prevents a panic when tx_rings[0] is not allocated.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 69ed801..a8c18fa 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5160,11 +5160,12 @@ static s32 i40e_vsi_clear_rings(struct i40e_vsi *vsi)
 {
 	int i;
 
-	for (i = 0; i < vsi->alloc_queue_pairs; i++) {
-		kfree_rcu(vsi->tx_rings[i], rcu);
-		vsi->tx_rings[i] = NULL;
-		vsi->rx_rings[i] = NULL;
-	}
+	if (vsi->tx_rings[0])
+		for (i = 0; i < vsi->alloc_queue_pairs; i++) {
+			kfree_rcu(vsi->tx_rings[i], rcu);
+			vsi->tx_rings[i] = NULL;
+			vsi->rx_rings[i] = NULL;
+		}
 
 	return 0;
 }
-- 
1.8.3.1

^ permalink raw reply related

* [net-next v2 04/14] i40e: fixup legacy interrupt handling
From: Jeff Kirsher @ 2013-10-22 14:22 UTC (permalink / raw)
  To: davem
  Cc: Shannon Nelson, netdev, gospo, sassmann, Jesse Brandeburg,
	Jeff Kirsher
In-Reply-To: <1382451757-9817-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Shannon Nelson <shannon.nelson@intel.com>

There were a number of little bugs in the error handling of irq setup, most of
which ended up panicking the kernel, and are addressed by this patch, along with
a couple of formatting issues.

Legacy interrupts (including MSI) are used only in the case of failure to
allocate MSI-X interrupts.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
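Illustration only: a standalone model of the fallback order (the FLAG_*
values, enum irq_mode and pick_irq_mode() are hypothetical; the real code
also clears several feature flags when MSI-X fails). msix_err and msi_err
stand for the results of the two init attempts:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_MSIX_ENABLED	(1u << 0)
#define FLAG_MSI_ENABLED	(1u << 1)

enum irq_mode { IRQ_MSIX, IRQ_MSI, IRQ_LEGACY };

/* Pick the interrupt mode, clearing the flag of each mode that failed. */
static enum irq_mode pick_irq_mode(uint32_t *flags, int msix_err, int msi_err)
{
	assert(flags != NULL);

	/* If this flag stayed set after a failure, MSI would never be tried. */
	if ((*flags & FLAG_MSIX_ENABLED) && msix_err)
		*flags &= ~FLAG_MSIX_ENABLED;

	if (!(*flags & FLAG_MSIX_ENABLED) &&
	    (*flags & FLAG_MSI_ENABLED) && msi_err)
		*flags &= ~FLAG_MSI_ENABLED;

	if (*flags & FLAG_MSIX_ENABLED)
		return IRQ_MSIX;
	if (*flags & FLAG_MSI_ENABLED)
		return IRQ_MSI;
	return IRQ_LEGACY;
}
```

Starting with both flags set each time, pick_irq_mode(&flags, 1, 0)
returns IRQ_MSI and pick_irq_mode(&flags, 1, 1) returns IRQ_LEGACY.
---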
 drivers/net/ethernet/intel/i40e/i40e_main.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index a8c18fa..270190a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4615,7 +4615,8 @@ static void i40e_fdir_setup(struct i40e_pf *pf)
 	bool new_vsi = false;
 	int err, i;
 
-	if (!(pf->flags & (I40E_FLAG_FDIR_ENABLED|I40E_FLAG_FDIR_ATR_ENABLED)))
+	if (!(pf->flags & (I40E_FLAG_FDIR_ENABLED |
+			   I40E_FLAG_FDIR_ATR_ENABLED)))
 		return;
 
 	pf->atr_sample_rate = I40E_DEFAULT_ATR_SAMPLE_RATE;
@@ -5435,7 +5436,8 @@ static void i40e_init_interrupt_scheme(struct i40e_pf *pf)
 	if (pf->flags & I40E_FLAG_MSIX_ENABLED) {
 		err = i40e_init_msix(pf);
 		if (err) {
-			pf->flags &= ~(I40E_FLAG_RSS_ENABLED	   |
+			pf->flags &= ~(I40E_FLAG_MSIX_ENABLED	   |
+					I40E_FLAG_RSS_ENABLED	   |
 					I40E_FLAG_MQ_ENABLED	   |
 					I40E_FLAG_DCB_ENABLED	   |
 					I40E_FLAG_SRIOV_ENABLED	   |
@@ -5450,14 +5452,17 @@ static void i40e_init_interrupt_scheme(struct i40e_pf *pf)
 
 	if (!(pf->flags & I40E_FLAG_MSIX_ENABLED) &&
 	    (pf->flags & I40E_FLAG_MSI_ENABLED)) {
+		dev_info(&pf->pdev->dev, "MSIX not available, trying MSI\n");
 		err = pci_enable_msi(pf->pdev);
 		if (err) {
-			dev_info(&pf->pdev->dev,
-				 "MSI init failed (%d), trying legacy.\n", err);
+			dev_info(&pf->pdev->dev, "MSI init failed - %d\n", err);
 			pf->flags &= ~I40E_FLAG_MSI_ENABLED;
 		}
 	}
 
+	if (!(pf->flags & (I40E_FLAG_MSIX_ENABLED | I40E_FLAG_MSI_ENABLED)))
+		dev_info(&pf->pdev->dev, "MSIX and MSI not available, falling back to Legacy IRQ\n");
+
 	/* track first vector for misc interrupts */
 	err = i40e_get_lump(pf, pf->irq_pile, 1, I40E_PILE_VALID_BIT-1);
 }
@@ -6110,8 +6115,9 @@ static int i40e_vsi_setup_vectors(struct i40e_vsi *vsi)
 		goto vector_setup_out;
 	}
 
-	vsi->base_vector = i40e_get_lump(pf, pf->irq_pile,
-					 vsi->num_q_vectors, vsi->idx);
+	if (vsi->num_q_vectors)
+		vsi->base_vector = i40e_get_lump(pf, pf->irq_pile,
+						 vsi->num_q_vectors, vsi->idx);
 	if (vsi->base_vector < 0) {
 		dev_info(&pf->pdev->dev,
 			 "failed to get q tracking for VSI %d, err=%d\n",
-- 
1.8.3.1

^ permalink raw reply related

* [net-next v2 05/14] i40e: debugfs fixups
From: Jeff Kirsher @ 2013-10-22 14:22 UTC (permalink / raw)
  To: davem
  Cc: Jesse Brandeburg, netdev, gospo, sassmann, Hannes Frederic Sowa,
	Jeff Kirsher
In-Reply-To: <1382451757-9817-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

Fix debugfs issues identified by the coverity checker and reported by
Hannes Frederic Sowa.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 38 ++++++++++++++++++--------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 19e248f..304f39d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -2019,21 +2019,35 @@ static const struct file_operations i40e_dbg_netdev_ops_fops = {
  **/
 void i40e_dbg_pf_init(struct i40e_pf *pf)
 {
-	struct dentry *pfile __attribute__((unused));
+	struct dentry *pfile;
 	const char *name = pci_name(pf->pdev);
+	const struct device *dev = &pf->pdev->dev;
 
 	pf->i40e_dbg_pf = debugfs_create_dir(name, i40e_dbg_root);
-	if (pf->i40e_dbg_pf) {
-		pfile = debugfs_create_file("command", 0600, pf->i40e_dbg_pf,
-					    pf, &i40e_dbg_command_fops);
-		pfile = debugfs_create_file("dump", 0600, pf->i40e_dbg_pf, pf,
-					    &i40e_dbg_dump_fops);
-		pfile = debugfs_create_file("netdev_ops", 0600, pf->i40e_dbg_pf,
-					    pf, &i40e_dbg_netdev_ops_fops);
-	} else {
-		dev_info(&pf->pdev->dev,
-			 "debugfs entry for %s failed\n", name);
-	}
+	if (!pf->i40e_dbg_pf)
+		return;
+
+	pfile = debugfs_create_file("command", 0600, pf->i40e_dbg_pf, pf,
+				    &i40e_dbg_command_fops);
+	if (!pfile)
+		goto create_failed;
+
+	pfile = debugfs_create_file("dump", 0600, pf->i40e_dbg_pf, pf,
+				    &i40e_dbg_dump_fops);
+	if (!pfile)
+		goto create_failed;
+
+	pfile = debugfs_create_file("netdev_ops", 0600, pf->i40e_dbg_pf, pf,
+				    &i40e_dbg_netdev_ops_fops);
+	if (!pfile)
+		goto create_failed;
+
+	return;
+
+create_failed:
+	dev_info(dev, "debugfs dir/file for %s failed\n", name);
+	debugfs_remove_recursive(pf->i40e_dbg_pf);
+	return;
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related

