Netdev List


* e1000e, 3.4.1, , jumbo frames are not working
From: Denys Fedoryshchenko @ 2012-06-08 14:59 UTC (permalink / raw)
  To: jeffrey.t.kirsher, rkagan, stable, jesse.brandeburg,
	bruce.w.allan, carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
	peter.p.waskiewicz.jr, alexander.h.duyck, john.ronciak, dnelson,
	e1000-devel, netdev, linux-kernel

I just tried to enable a larger MTU on the interface and it failed, even
after disabling both of the offloads mentioned in dmesg.

05:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit 
Ethernet Controller (Copper) (rev 01)
05:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit 
Ethernet Controller (Copper) (rev 01)

L2TP ~ # ethtool -i eth0
driver: e1000e
version: 1.9.5-k
firmware-version: 1.0-0
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes

Here is what I did:
ifconfig eth1 mtu 2000
got dmesg:
[ 9160.679354] e1000e 0000:05:00.0: eth0: Jumbo frames cannot be 
enabled when both receive checksum offload and receive hashing are 
enabled.  Disable one of the receive offload features before enabling 
jumbos.

ethtool -K eth1 rxhash off
dmesg:
[ 9194.208856] e1000e 0000:05:00.1: eth1: Reset adapter
[ 9197.295425] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow 
Control: None

Lost connectivity for a few seconds, then I tried again:
ifconfig eth1 mtu 2000

dmesg:
[ 9207.797616] e1000e 0000:05:00.0: eth0: Jumbo frames cannot be 
enabled when both receive checksum offload and receive hashing are 
enabled.  Disable one of the receive offload features before enabling 
jumbos.

ethtool -K eth1 rx off
[ 9222.398034] e1000e 0000:05:00.1: eth1: Reset adapter
[ 9225.497550] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow 
Control: None

Again, tried: ifconfig eth1 mtu 2000
dmesg:
[ 9254.795445] e1000e 0000:05:00.0: eth0: Jumbo frames cannot be 
enabled when both receive checksum offload and receive hashing are 
enabled.  Disable one of the receive offload features before enabling 
jumbos.

L2TP ~ # ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: off
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: off

L2TP ~ # ethtool -d eth1
MAC Registers
-------------
0x00000: CTRL (Device control register)  0x40144241
       Endian mode (buffers):             little
       Link reset:                        normal
       Set link up:                       1
       Invert Loss-Of-Signal:             no
       Receive flow control:              disabled
       Transmit flow control:             disabled
       VLAN mode:                         enabled
       Auto speed detect:                 disabled
       Speed select:                      1000Mb/s
       Force speed:                       no
       Force duplex:                      no
0x00008: STATUS (Device status register) 0x02080787
       Duplex:                            full
       Link up:                           link config
       TBI mode:                          disabled
       Link speed:                        1000Mb/s
       Bus type:                          PCI Express
       Port number:                       1
0x00100: RCTL (Receive control register) 0x04008002
       Receiver:                          enabled
       Store bad packets:                 disabled
       Unicast promiscuous:               disabled
       Multicast promiscuous:             disabled
       Long packet:                       disabled
       Descriptor minimum threshold size: 1/2
       Broadcast accept mode:             accept
       VLAN filter:                       disabled
       Canonical form indicator:          disabled
       Discard pause frames:              filtered
       Pass MAC control frames:           don't pass
       Receive buffer size:               2048
0x02808: RDLEN (Receive desc length)     0x00001000
0x02810: RDH   (Receive desc head)       0x00000046
0x02818: RDT   (Receive desc tail)       0x00000040
0x02820: RDTR  (Receive delay timer)     0x00000020
0x00400: TCTL (Transmit ctrl register)   0x3103F0FA
       Transmitter:                       enabled
       Pad short packets:                 enabled
       Software XOFF Transmission:        disabled
       Re-transmit on late collision:     enabled
0x03808: TDLEN (Transmit desc length)    0x00001000
0x03810: TDH   (Transmit desc head)      0x00000098
0x03818: TDT   (Transmit desc tail)      0x00000099
0x03820: TIDV  (Transmit delay timer)    0x00000008
PHY type:                                unknown
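
For readers following the report, the rule stated in the dmesg text can be
written down as a small predicate. This is only an illustrative sketch and
not the e1000e driver's code: the flag names, the 1500-byte threshold, and
the function name are all assumptions made for the example.

```c
#include <stdbool.h>

/* Hypothetical feature bits for this sketch; not the kernel's NETIF_F_* values. */
#define SKETCH_F_RXCSUM (1u << 0)	/* receive checksum offload */
#define SKETCH_F_RXHASH (1u << 1)	/* receive hashing */

#define SKETCH_STD_MTU 1500		/* assumed standard Ethernet payload size */

/*
 * Return true if the MTU change is acceptable under the rule in the
 * dmesg text: a jumbo MTU is refused only when BOTH receive offloads
 * are enabled at the same time.
 */
static bool sketch_mtu_allowed(unsigned int features, int new_mtu)
{
	bool jumbo = new_mtu > SKETCH_STD_MTU;
	bool both = (features & SKETCH_F_RXCSUM) && (features & SKETCH_F_RXHASH);

	return !(jumbo && both);
}
```

Under this reading, with rx-checksumming and receive-hashing both reported
as off, a request for MTU 2000 would be expected to pass, which is why the
repeated refusal looks like a bug.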


---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.


* [PATCH net-next] af_unix: speedup /proc/net/unix
From: Eric Dumazet @ 2012-06-08 15:03 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Steven Whitehouse, Pavel Emelyanov

From: Eric Dumazet <edumazet@google.com>

/proc/net/unix has quadratic behavior, and can hold unix_table_lock for
a while if a high number of unix sockets are alive. (90 ms for 200k
sockets...)

We already have a hash table, so it's quite easy to use it.

The problem is that unbound sockets are still hashed in a single hash slot
(unix_socket_table[UNIX_HASH_SIZE]).

This patch also spreads unbound sockets over 256 hash slots, to speed up
both /proc/net/unix and unix_diag.

Time to read /proc/net/unix with 200k unix sockets :
(time dd if=/proc/net/unix of=/dev/null bs=4k)

before : 520 secs
after : 2 secs
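
As a rough userspace illustration of the spreading described above, the
slot selection can be sketched as follows. It mirrors the arithmetic of
unix_sockets_unbound() in the diff below, but the names are hypothetical
and the function returns an index rather than a list head.

```c
#include <stdint.h>

#define SKETCH_HASH_SIZE 256	/* mirrors UNIX_HASH_SIZE in the patch */

/*
 * Fold a pointer value into one of the upper 256 slots, i.e. an index
 * in the range [SKETCH_HASH_SIZE, 2 * SKETCH_HASH_SIZE).
 */
static unsigned int sketch_unbound_slot(const void *addr)
{
	uintptr_t hash = (uintptr_t)addr;

	hash ^= hash >> 16;	/* mix higher bits into the low half */
	hash ^= hash >> 8;	/* then into the low byte */
	hash %= SKETCH_HASH_SIZE;
	return SKETCH_HASH_SIZE + (unsigned int)hash;
}
```

Slots below SKETCH_HASH_SIZE stay reserved for bound sockets, which is
what lets a simple comparison against the table midpoint tell the two
kinds apart.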

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
---
 include/net/af_unix.h |    3 -
 net/unix/af_unix.c    |  110 +++++++++++++++++++++++-----------------
 net/unix/diag.c       |    6 +-
 3 files changed, 70 insertions(+), 49 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 2ee33da..b5f8988 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -14,10 +14,11 @@ extern struct sock *unix_get_socket(struct file *filp);
 extern struct sock *unix_peer_get(struct sock *);
 
 #define UNIX_HASH_SIZE	256
+#define UNIX_HASH_BITS	8
 
 extern unsigned int unix_tot_inflight;
 extern spinlock_t unix_table_lock;
-extern struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
+extern struct hlist_head unix_socket_table[2 * UNIX_HASH_SIZE];
 
 struct unix_address {
 	atomic_t	refcnt;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 641f2e4..cf83f6b 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -115,15 +115,24 @@
 #include <net/checksum.h>
 #include <linux/security.h>
 
-struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
+struct hlist_head unix_socket_table[2 * UNIX_HASH_SIZE];
 EXPORT_SYMBOL_GPL(unix_socket_table);
 DEFINE_SPINLOCK(unix_table_lock);
 EXPORT_SYMBOL_GPL(unix_table_lock);
 static atomic_long_t unix_nr_socks;
 
-#define unix_sockets_unbound	(&unix_socket_table[UNIX_HASH_SIZE])
 
-#define UNIX_ABSTRACT(sk)	(unix_sk(sk)->addr->hash != UNIX_HASH_SIZE)
+static struct hlist_head *unix_sockets_unbound(void *addr)
+{
+	unsigned long hash = (unsigned long)addr;
+
+	hash ^= hash >> 16;
+	hash ^= hash >> 8;
+	hash %= UNIX_HASH_SIZE;
+	return &unix_socket_table[UNIX_HASH_SIZE + hash];
+}
+
+#define UNIX_ABSTRACT(sk)	(unix_sk(sk)->addr->hash < UNIX_HASH_SIZE)
 
 #ifdef CONFIG_SECURITY_NETWORK
 static void unix_get_secdata(struct scm_cookie *scm, struct sk_buff *skb)
@@ -645,7 +654,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock)
 	INIT_LIST_HEAD(&u->link);
 	mutex_init(&u->readlock); /* single task reading lock */
 	init_waitqueue_head(&u->peer_wait);
-	unix_insert_socket(unix_sockets_unbound, sk);
+	unix_insert_socket(unix_sockets_unbound(sk), sk);
 out:
 	if (sk == NULL)
 		atomic_long_dec(&unix_nr_socks);
@@ -2239,47 +2248,58 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 }
 
 #ifdef CONFIG_PROC_FS
-static struct sock *first_unix_socket(int *i)
-{
-	for (*i = 0; *i <= UNIX_HASH_SIZE; (*i)++) {
-		if (!hlist_empty(&unix_socket_table[*i]))
-			return __sk_head(&unix_socket_table[*i]);
-	}
-	return NULL;
-}
 
-static struct sock *next_unix_socket(int *i, struct sock *s)
-{
-	struct sock *next = sk_next(s);
-	/* More in this chain? */
-	if (next)
-		return next;
-	/* Look for next non-empty chain. */
-	for ((*i)++; *i <= UNIX_HASH_SIZE; (*i)++) {
-		if (!hlist_empty(&unix_socket_table[*i]))
-			return __sk_head(&unix_socket_table[*i]);
-	}
-	return NULL;
-}
+#define BUCKET_SPACE (BITS_PER_LONG - (UNIX_HASH_BITS + 1) - 1)
+
+#define get_bucket(x) ((x) >> BUCKET_SPACE)
+#define get_offset(x) ((x) & ((1L << BUCKET_SPACE) - 1))
+#define set_bucket_offset(b, o) ((b) << BUCKET_SPACE | (o))
 
 struct unix_iter_state {
 	struct seq_net_private p;
-	int i;
 };
 
-static struct sock *unix_seq_idx(struct seq_file *seq, loff_t pos)
+static struct sock *unix_from_bucket(struct seq_file *seq, loff_t *pos)
 {
-	struct unix_iter_state *iter = seq->private;
-	loff_t off = 0;
-	struct sock *s;
+	unsigned long offset = get_offset(*pos);
+	unsigned long bucket = get_bucket(*pos);
+	struct sock *sk;
+	unsigned long count = 0;
 
-	for (s = first_unix_socket(&iter->i); s; s = next_unix_socket(&iter->i, s)) {
-		if (sock_net(s) != seq_file_net(seq))
+	for (sk = sk_head(&unix_socket_table[bucket]); sk; sk = sk_next(sk)) {
+		if (sock_net(sk) != seq_file_net(seq))
 			continue;
-		if (off == pos)
-			return s;
-		++off;
+		if (++count == offset)
+			break;
 	}
+
+	return sk;
+}
+
+static struct sock *unix_next_socket(struct seq_file *seq,
+				     struct sock *sk,
+				     loff_t *pos)
+{
+	unsigned long bucket;
+
+	while (sk > (struct sock *)SEQ_START_TOKEN) {
+		sk = sk_next(sk);
+		if (!sk)
+			goto next_bucket;
+		if (sock_net(sk) == seq_file_net(seq))
+			return sk;
+	}
+
+	do {
+		sk = unix_from_bucket(seq, pos);
+		if (sk)
+			return sk;
+
+next_bucket:
+		bucket = get_bucket(*pos) + 1;
+		*pos = set_bucket_offset(bucket, 1);
+	} while (bucket < ARRAY_SIZE(unix_socket_table));
+
 	return NULL;
 }
 
@@ -2287,22 +2307,20 @@ static void *unix_seq_start(struct seq_file *seq, loff_t *pos)
 	__acquires(unix_table_lock)
 {
 	spin_lock(&unix_table_lock);
-	return *pos ? unix_seq_idx(seq, *pos - 1) : SEQ_START_TOKEN;
+
+	if (!*pos)
+		return SEQ_START_TOKEN;
+
+	if (get_bucket(*pos) >= ARRAY_SIZE(unix_socket_table))
+		return NULL;
+
+	return unix_next_socket(seq, NULL, pos);
 }
 
 static void *unix_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-	struct unix_iter_state *iter = seq->private;
-	struct sock *sk = v;
 	++*pos;
-
-	if (v == SEQ_START_TOKEN)
-		sk = first_unix_socket(&iter->i);
-	else
-		sk = next_unix_socket(&iter->i, sk);
-	while (sk && (sock_net(sk) != seq_file_net(seq)))
-		sk = next_unix_socket(&iter->i, sk);
-	return sk;
+	return unix_next_socket(seq, v, pos);
 }
 
 static void unix_seq_stop(struct seq_file *seq, void *v)
diff --git a/net/unix/diag.c b/net/unix/diag.c
index 47d3002..7e8a24b 100644
--- a/net/unix/diag.c
+++ b/net/unix/diag.c
@@ -195,7 +195,9 @@ static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	num = s_num = cb->args[1];
 
 	spin_lock(&unix_table_lock);
-	for (slot = s_slot; slot <= UNIX_HASH_SIZE; s_num = 0, slot++) {
+	for (slot = s_slot;
+	     slot < ARRAY_SIZE(unix_socket_table);
+	     s_num = 0, slot++) {
 		struct sock *sk;
 		struct hlist_node *node;
 
@@ -228,7 +230,7 @@ static struct sock *unix_lookup_by_ino(int ino)
 	struct sock *sk;
 
 	spin_lock(&unix_table_lock);
-	for (i = 0; i <= UNIX_HASH_SIZE; i++) {
+	for (i = 0; i < ARRAY_SIZE(unix_socket_table); i++) {
 		struct hlist_node *node;
 
 		sk_for_each(sk, node, &unix_socket_table[i])
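
The seq_file part of the patch above packs a bucket index and a 1-based
offset within that bucket into the file position, so a restarted read can
jump straight to the right bucket instead of walking every socket from the
start. A minimal userspace sketch of that encoding, with hypothetical names
and plain functions in place of the macros, might look like this:

```c
#include <limits.h>

#define SKETCH_HASH_BITS 8	/* mirrors UNIX_HASH_BITS in the patch */
#define SKETCH_BITS_PER_LONG ((int)(sizeof(long) * CHAR_BIT))
/* Low bits hold the offset; the sign bit is left unused. */
#define SKETCH_BUCKET_SPACE (SKETCH_BITS_PER_LONG - (SKETCH_HASH_BITS + 1) - 1)

/* Extract the bucket index stored in the high bits of pos. */
static unsigned long sketch_get_bucket(unsigned long pos)
{
	return pos >> SKETCH_BUCKET_SPACE;
}

/* Extract the offset within the bucket stored in the low bits of pos. */
static unsigned long sketch_get_offset(unsigned long pos)
{
	return pos & ((1UL << SKETCH_BUCKET_SPACE) - 1);
}

/* Combine a bucket index and an offset into a single position value. */
static unsigned long sketch_set_bucket_offset(unsigned long bucket,
					      unsigned long offset)
{
	return (bucket << SKETCH_BUCKET_SPACE) | offset;
}
```

With 2 * 256 buckets the index needs UNIX_HASH_BITS + 1 bits, which is
where the "+ 1" in the shift computation comes from.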


* Re: [PATCH] bonding: Fix corrupted queue_mapping
From: Tom Herbert @ 2012-06-08 15:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1339140238.6001.42.camel@edumazet-glaptop>

> I must say I don't understand dev_pick_tx() anymore.
>
> It seems to ignore skb->queue_mapping (unless the device provides its own
> ndo_select_queue() and this function is aware of skb->queue_mapping, as
> correctly done in ixgbe)
>
> So commit fff3269907897ee (tcp: reflect SYN queue_mapping into SYNACK
> packets) works on ixgbe, but probably not on other multiqueue devices.
>
> This sounds like a regression to me.
>
Maybe the fundamental issue is that the queue mappings only allow for
one level of multi queue device.  It might be better if bonding didn't
have one and dev_pick_tx did the right thing (use xps on bonding
maybe).

Tom

>
>


* Re: [PATCH] bonding: Fix corrupted queue_mapping
From: Eric Dumazet @ 2012-06-08 15:11 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Miller, netdev
In-Reply-To: <CA+mtBx9xpy6Ly=3kiBc+sdHqFfsO5KRK8O=VAXTb3rofAgm2ow@mail.gmail.com>

On Fri, 2012-06-08 at 08:04 -0700, Tom Herbert wrote:

> Maybe the fundamental issue is that the queue mappings only allow for
> one level of multi queue device.  It might be better if bonding didn't
> have one and dev_pick_tx did the right thing (use xps on bonding
> maybe).

Bonding misuses the multiqueue infrastructure to divert frames to selected
slaves, or maybe I am wrong.


* Re: [PATCH] usbnet: Activate the halt interrupt endpoint to fix endless "XactErr" error
From: Huajun Li @ 2012-06-08 15:24 UTC (permalink / raw)
  To: Alan Stern; +Cc: Ming Lei, David S. Miller, linux-usb, netdev
In-Reply-To: <Pine.LNX.4.44L0.1206080940350.1360-100000@iolanthe.rowland.org>

On Fri, Jun 8, 2012 at 9:43 PM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Fri, 8 Jun 2012, Huajun Li wrote:
>
>> > If so, it looks like a mistaken value is returned from the host
>> > controller driver, but I am not sure if your device is buggy. What is
>> > your host controller?
>> >
>> Nothing related to the HC.
>> I tried to find out the endpoint state, and found it was halted. I think
>> this is the root cause.
>
> No, it isn't.  Endpoint halt causes a -EPIPE error, not -EPROTO.
> -EPROTO indicates that the device's firmware has crashed.
>
>> What's your opinion on handling the "-EPROTO" error in usbnet.c?
>> Please check usbnet.c again: when "-EPROTO" occurs, it just prints an
>> error msg and re-submits the interrupt URB, which then causes endless
>> "XactErr" error msgs.
>
> One possibility is to wait for a little while before resubmitting the
> URB, and after 10 failures in a row, attempt a reset.
>
Alan, thanks for your proposal.
You mean reset the device after 10 failures, right?

BTW, I once tried sleeping for several seconds before submitting the first
interrupt URB, but it did not work.
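
To make the proposal quoted above concrete, here is a minimal sketch of the
bookkeeping it implies: delay before resubmitting after a failure, and ask
for a reset after 10 failures in a row. This is not usbnet code; the enum,
struct, and function names are hypothetical, and the actual delay and reset
would be done by the caller.

```c
#include <stdbool.h>

#define SKETCH_MAX_FAILURES 10	/* "10 failures in a row" from the proposal */

enum sketch_action {
	SKETCH_RESUBMIT,	/* success: resubmit right away */
	SKETCH_DELAY_RESUBMIT,	/* failure: wait a little, then resubmit */
	SKETCH_RESET_DEVICE,	/* too many consecutive failures */
};

struct sketch_intr_state {
	unsigned int consecutive_failures;
};

/* Decide what to do after one interrupt URB completion. */
static enum sketch_action sketch_on_completion(struct sketch_intr_state *st,
					       bool failed)
{
	if (!failed) {
		st->consecutive_failures = 0;
		return SKETCH_RESUBMIT;
	}

	if (++st->consecutive_failures >= SKETCH_MAX_FAILURES) {
		/* Start counting afresh once a reset has been requested. */
		st->consecutive_failures = 0;
		return SKETCH_RESET_DEVICE;
	}

	return SKETCH_DELAY_RESUBMIT;
}
```

Any successful completion clears the counter, so only an unbroken run of
errors would lead to a reset.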


* RE: e1000e, 3.4.1, , jumbo frames are not working
From: Allan, Bruce W @ 2012-06-08 15:38 UTC (permalink / raw)
  To: Denys Fedoryshchenko, Kirsher, Jeffrey T, rkagan@parallels.com,
	stable@vger.kernel.org, Brandeburg, Jesse, Wyborny, Carolyn,
	Skidmore, Donald C, Rose, Gregory V, Waskiewicz Jr, Peter P,
	Duyck, Alexander H, Ronciak, John, dnelson@redhat.com,
	e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <87ed034d58606650514bdd3b88b0b002@visp.net.lb>

> -----Original Message-----
> From: Denys Fedoryshchenko [mailto:denys@visp.net.lb]
> Sent: Friday, June 08, 2012 7:59 AM
> To: Kirsher, Jeffrey T; rkagan@parallels.com; stable@vger.kernel.org;
> Brandeburg, Jesse; Allan, Bruce W; Wyborny, Carolyn; Skidmore, Donald
> C; Rose, Gregory V; Waskiewicz Jr, Peter P; Duyck, Alexander H;
> Ronciak, John; dnelson@redhat.com; e1000-devel@lists.sourceforge.net;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: e1000e, 3.4.1, , jumbo frames are not working
> 

Thanks for the report.  We are aware of the issue and have already begun
working on a patch to resolve the issue.

Bruce.



* Re: Deadlock, L2TP over IP are not working, 3.4.1
From: Benjamin LaHaise @ 2012-06-08 15:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Francois Romieu, Denys Fedoryshchenko, davem, netdev,
	linux-kernel
In-Reply-To: <1339134438.6001.13.camel@edumazet-glaptop>

On Fri, Jun 08, 2012 at 07:47:18AM +0200, Eric Dumazet wrote:
> I have no idea how many l2tp_eth devices are setup at once in typical
> conf.

Depends on the usage scenario.  L2TP is commonly used for terminating 
customer connections by wholesale ISPs.  In that kind of edge routing 
use-case, tens of thousands of interfaces are easily possible.

		-ben
-- 
"Thought is the essence of where you are now."


* Re: 3.5.0+ - Linus GIT - WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xeb/0x15f()
From: Miles Lane @ 2012-06-08 15:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: LKML, Andrew Morton, Wim Van Sebroeck, Jay Cliburn, Chris Snook,
	netdev, Huang Xiong
In-Reply-To: <1339053257.26966.100.camel@edumazet-glaptop>

On Thu, Jun 7, 2012 at 3:14 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-06-07 at 08:39 +0200, Eric Dumazet wrote:
>> On Thu, 2012-06-07 at 02:16 -0400, Miles Lane wrote:
>> > WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xeb/0x15f()
>> > Hardware name: UL50VT
>> > NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
>> > Modules linked in: hfsplus hfs vfat msdos fat snd_hrtimer ipv6
>> > snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep
>> > snd_pcm_oss snd_seq_dummy snd_mixer_oss uvcvideo videobuf2_core
>> > snd_pcm videodev snd_seq_oss snd_seq_midi snd_rawmidi media
>> > snd_seq_midi_event acpi_cpufreq videobuf2_vmalloc videobuf2_memops
>> > snd_seq iwlwifi snd_timer snd_seq_device asus_laptop mac80211
>> > sparse_keymap snd cfg80211 coretemp soundcore psmouse snd_page_alloc
>> > rtc_cmos mperf processor evdev rfkill battery led_class input_polldev
>> > ac i915 nouveau sr_mod cdrom sd_mod ehci_hcd atl1c uhci_hcd intel_agp
>> > ttm usbcore intel_gtt usb_common drm_kms_helper thermal video
>> > thermal_sys hwmon button
>> > Pid: 3025, comm: hud-service Not tainted 3.5.0-rc1+ #128
>> > Call Trace:
>> >  <IRQ>  [<ffffffff8102d42f>] warn_slowpath_common+0x7e/0x97
>> >  [<ffffffff8102d4dc>] warn_slowpath_fmt+0x41/0x43
>> >  [<ffffffff81360f1c>] dev_watchdog+0xeb/0x15f
>> >  [<ffffffff8103af44>] run_timer_softirq+0x20e/0x356
>> >  [<ffffffff8103ae7e>] ? run_timer_softirq+0x148/0x356
>> >  [<ffffffff81360e31>] ? netif_tx_unlock+0x57/0x57
>> >  [<ffffffff810344f8>] __do_softirq+0x103/0x239
>> >  [<ffffffff8107122a>] ? clockevents_program_event+0x9c/0xb9
>> >  [<ffffffff8140a4cc>] call_softirq+0x1c/0x30
>> >  [<ffffffff81003bb9>] do_softirq+0x37/0x82
>> >  [<ffffffff81034888>] irq_exit+0x4c/0xb1
>> >  [<ffffffff8101ba71>] smp_apic_timer_interrupt+0x76/0x84
>> >  [<ffffffff81409adc>] apic_timer_interrupt+0x6c/0x80
>> >  <EOI>  [<ffffffff81105161>] ? fget_raw_light+0x4c/0x7d
>> >  [<ffffffff81105161>] ? fget_raw_light+0x4c/0x7d
>> >  [<ffffffff8111153b>] sys_fcntl+0x23/0x53b
>> >  [<ffffffff81004b68>] ? print_context_stack+0x44/0xb1
>> >  [<ffffffff81408fe2>] system_call_fastpath+0x16/0x1b
>> > ---[ end trace c1f284d9c873031d ]---
>>
>> CC netdev and Huang Xiong
>>
>> Atheros drivers are known to have buggy tx completion, it's incredible...
>>
>> You could try the following patch, not a 'perfect' solution, but a fix.
>
> And if you feel lucky, you could try the following one as well, a step
> in the right direction:
>
>  drivers/net/ethernet/atheros/atl1c/atl1c_main.c |   86 ++++----------
>  1 file changed, 30 insertions(+), 56 deletions(-)
>
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 9cc1570..44940f4 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -1528,6 +1528,16 @@ static inline void atl1c_clear_phy_int(struct atl1c_adapter *adapter)
>        spin_unlock(&adapter->mdio_lock);
>  }
>
> +static inline u16 atl1c_tpd_avail(const struct atl1c_tpd_ring *tpd_ring)
> +{
> +       u16 next_to_use = tpd_ring->next_to_use;
> +       u16 next_to_clean = atomic_read(&tpd_ring->next_to_clean);
> +
> +       return (u16)(next_to_clean > next_to_use) ?
> +               (next_to_clean - next_to_use - 1) :
> +               (tpd_ring->count + next_to_clean - next_to_use - 1);
> +}
> +
>  static bool atl1c_clean_tx_irq(struct atl1c_adapter *adapter,
>                                enum atl1c_trans_queue type)
>  {
> @@ -1551,10 +1561,14 @@ static bool atl1c_clean_tx_irq(struct atl1c_adapter *adapter,
>                atomic_set(&tpd_ring->next_to_clean, next_to_clean);
>        }
>
> +       spin_lock(&adapter->tx_lock);
> +
>        if (netif_queue_stopped(adapter->netdev) &&
> -                       netif_carrier_ok(adapter->netdev)) {
> +           netif_carrier_ok(adapter->netdev) &&
> +           atl1c_tpd_avail(tpd_ring) >= tpd_ring->count / 4)
>                netif_wake_queue(adapter->netdev);
> -       }
> +
> +       spin_unlock(&adapter->tx_lock);
>
>        return true;
>  }
> @@ -1856,20 +1870,6 @@ static void atl1c_netpoll(struct net_device *netdev)
>  }
>  #endif
>
> -static inline u16 atl1c_tpd_avail(struct atl1c_adapter *adapter, enum atl1c_trans_queue type)
> -{
> -       struct atl1c_tpd_ring *tpd_ring = &adapter->tpd_ring[type];
> -       u16 next_to_use = 0;
> -       u16 next_to_clean = 0;
> -
> -       next_to_clean = atomic_read(&tpd_ring->next_to_clean);
> -       next_to_use   = tpd_ring->next_to_use;
> -
> -       return (u16)(next_to_clean > next_to_use) ?
> -               (next_to_clean - next_to_use - 1) :
> -               (tpd_ring->count + next_to_clean - next_to_use - 1);
> -}
> -
>  /*
>  * get next usable tpd
>  * Note: should call atl1c_tdp_avail to make sure
> @@ -1899,24 +1899,6 @@ atl1c_get_tx_buffer(struct atl1c_adapter *adapter, struct atl1c_tpd_desc *tpd)
>                        (struct atl1c_tpd_desc *)tpd_ring->desc];
>  }
>
> -/* Calculate the transmit packet descript needed*/
> -static u16 atl1c_cal_tpd_req(const struct sk_buff *skb)
> -{
> -       u16 tpd_req;
> -       u16 proto_hdr_len = 0;
> -
> -       tpd_req = skb_shinfo(skb)->nr_frags + 1;
> -
> -       if (skb_is_gso(skb)) {
> -               proto_hdr_len = skb_transport_offset(skb) + tcp_hdrlen(skb);
> -               if (proto_hdr_len < skb_headlen(skb))
> -                       tpd_req++;
> -               if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
> -                       tpd_req++;
> -       }
> -       return tpd_req;
> -}
> -
>  static int atl1c_tso_csum(struct atl1c_adapter *adapter,
>                          struct sk_buff *skb,
>                          struct atl1c_tpd_desc **tpd,
> @@ -2099,10 +2081,10 @@ static void atl1c_tx_map(struct atl1c_adapter *adapter,
>        buffer_info->skb = skb;
>  }
>
> -static void atl1c_tx_queue(struct atl1c_adapter *adapter, struct sk_buff *skb,
> -                          struct atl1c_tpd_desc *tpd, enum atl1c_trans_queue type)
> +static void atl1c_tx_queue(const struct atl1c_adapter *adapter,
> +                          const struct atl1c_tpd_ring *tpd_ring,
> +                          enum atl1c_trans_queue type)
>  {
> -       struct atl1c_tpd_ring *tpd_ring = &adapter->tpd_ring[type];
>        u16 reg;
>
>        reg = type == atl1c_trans_high ? REG_TPD_PRI1_PIDX : REG_TPD_PRI0_PIDX;
> @@ -2113,35 +2095,19 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
>                                          struct net_device *netdev)
>  {
>        struct atl1c_adapter *adapter = netdev_priv(netdev);
> -       unsigned long flags;
> -       u16 tpd_req = 1;
>        struct atl1c_tpd_desc *tpd;
>        enum atl1c_trans_queue type = atl1c_trans_normal;
> +       const struct atl1c_tpd_ring *tpd_ring = &adapter->tpd_ring[type];
>
>        if (test_bit(__AT_DOWN, &adapter->flags)) {
>                dev_kfree_skb_any(skb);
>                return NETDEV_TX_OK;
>        }
>
> -       tpd_req = atl1c_cal_tpd_req(skb);
> -       if (!spin_trylock_irqsave(&adapter->tx_lock, flags)) {
> -               if (netif_msg_pktdata(adapter))
> -                       dev_info(&adapter->pdev->dev, "tx locked\n");
> -               return NETDEV_TX_LOCKED;
> -       }
> -
> -       if (atl1c_tpd_avail(adapter, type) < tpd_req) {
> -               /* no enough descriptor, just stop queue */
> -               netif_stop_queue(netdev);
> -               spin_unlock_irqrestore(&adapter->tx_lock, flags);
> -               return NETDEV_TX_BUSY;
> -       }
> -
>        tpd = atl1c_get_tpd(adapter, type);
>
>        /* do TSO and check sum */
>        if (atl1c_tso_csum(adapter, skb, &tpd, type) != 0) {
> -               spin_unlock_irqrestore(&adapter->tx_lock, flags);
>                dev_kfree_skb_any(skb);
>                return NETDEV_TX_OK;
>        }
> @@ -2160,9 +2126,17 @@ static netdev_tx_t atl1c_xmit_frame(struct sk_buff *skb,
>                tpd->word1 |= 1 << TPD_ETH_TYPE_SHIFT; /* Ethernet frame */
>
>        atl1c_tx_map(adapter, skb, tpd, type);
> -       atl1c_tx_queue(adapter, skb, tpd, type);
> +       atl1c_tx_queue(adapter, tpd_ring, type);
> +
> +       if (atl1c_tpd_avail(tpd_ring) < MAX_SKB_FRAGS + 4) {
> +               unsigned long flags;
> +
> +               spin_lock_irqsave(&adapter->tx_lock, flags);
> +               if (atl1c_tpd_avail(tpd_ring) < MAX_SKB_FRAGS + 4)
> +                       netif_stop_queue(netdev);
> +               spin_unlock_irqrestore(&adapter->tx_lock, flags);
> +       }
>
> -       spin_unlock_irqrestore(&adapter->tx_lock, flags);
>        return NETDEV_TX_OK;
>  }
>
>
>

With this patch applied to Linus' GIT tree (updated last night), I get
this warning:

[  187.346706] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xeb/0x15f()
[  187.346709] Hardware name: UL50VT
[  187.346712] NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
[  187.346825] Modules linked in: snd_hrtimer ipv6
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep
snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss
snd_seq_midi snd_rawmidi uvcvideo videobuf2_core videodev acpi_cpufreq
media snd_seq_midi_event iwlwifi videobuf2_vmalloc snd_seq mac80211
asus_laptop rtc_cmos cfg80211 videobuf2_memops sparse_keymap snd_timer
snd_seq_device mperf battery ac led_class psmouse snd input_polldev
coretemp processor soundcore snd_page_alloc rfkill evdev acpi_call(O)
i915 nouveau fbcon tileblit ttm font bitblit softcursor drm_kms_helper
intel_agp intel_gtt drm agpgart sd_mod sr_mod cdrom uhci_hcd fb fbdev
i2c_algo_bit ehci_hcd i2c_core cfbcopyarea atl1c usbcore mxm_wmi
cfbimgblt usb_common cfbfillrect thermal video backlight thermal_sys
hwmon wmi button
[  187.346832] Pid: 2899, comm: compiz Tainted: G           O 3.5.0-rc1+ #131
[  187.346834] Call Trace:
[  187.346843]  <IRQ>  [<ffffffff8102d42f>] warn_slowpath_common+0x7e/0x97
[  187.346847]  [<ffffffff8102d4dc>] warn_slowpath_fmt+0x41/0x43
[  187.346863]  [<ffffffff81302028>] dev_watchdog+0xeb/0x15f
[  187.346869]  [<ffffffff8103af44>] run_timer_softirq+0x20e/0x356
[  187.346873]  [<ffffffff8103ae7e>] ? run_timer_softirq+0x148/0x356
[  187.346878]  [<ffffffff81301f3d>] ? netif_tx_unlock+0x57/0x57
[  187.346883]  [<ffffffff810344f8>] __do_softirq+0x103/0x239
[  187.346889]  [<ffffffff8107123a>] ? clockevents_program_event+0x9c/0xb9
[  187.346894]  [<ffffffff813ab38c>] call_softirq+0x1c/0x30
[  187.346899]  [<ffffffff81003bb9>] do_softirq+0x37/0x82
[  187.346903]  [<ffffffff81034888>] irq_exit+0x4c/0xb1
[  187.346909]  [<ffffffff8101ba71>] smp_apic_timer_interrupt+0x76/0x84
[  187.346914]  [<ffffffff813aa99c>] apic_timer_interrupt+0x6c/0x80
[  187.346921]  <EOI>  [<ffffffff813a9ec7>] ? sysret_check+0x1b/0x56
[  187.346925] ---[ end trace 954b24373ae625e3 ]---

^ permalink raw reply

* Re: [PATCH] usbnet: Activate the halt interrupt endpoint to fix endless "XactErr" error
From: Huajun Li @ 2012-06-08 15:54 UTC (permalink / raw)
  To: Ming Lei; +Cc: David S. Miller, linux-usb, netdev
In-Reply-To: <CACVXFVN0Y0p+WXUhJco6EKCdjx6Wg-guJLe79Bmj2K9KQGKToA@mail.gmail.com>

On Fri, Jun 8, 2012 at 9:56 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> On Fri, Jun 8, 2012 at 2:24 PM, Huajun Li <huajun.li.lee@gmail.com> wrote:
>> On Fri, Jun 8, 2012 at 1:22 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>>> If so, looks mistaken value is returned from the host controller driver,
>>> but not sure if your device is buggy. What is your host controller?
>>>
>> Nothing related to HC.
>> I tried to find out the endpoint state, but found it was halt. I think
>> this is the root cause.
>
> I mean that your HCD should not return -EPROTO if only the interrupt
> endpoint's HALT feature is set, and it should return -EPIPE.
>
>>
>>> Also, suppose your device is buggy; then the correct solution should
>>> be adding a quirk for the driver to clear halt before the 1st submit of
>>> the status urb.
>>>
>> I ever worked out a patch just as you said and it could work.
>> However, if this can be fixed by common framework just like usbnet.c,
>> and there is no sideeffect, then why not.
>
> There is side effect, at least sending out the command of
> clear feature(HALT) is mistaken in logic if  -EPROTO is returned for
> the endpoint.
>
>>>
>>> I just have tried to switch configuration by sysfs interface on the g_multi
>>> and don't trigger the error.
>>>
>> The driver is common one, but not just for a specific device.
>
> The problem is that your device is a specific buggy device, and the interrupt
> endpoint shouldn't be set HALT after SetConfiguration(), see 9.4.5 of USB 2.0
> spec.
>
> So it is reasonable to add a quirk to fix the problem for the device, that has
> document benefits, also considered that the device is a very specific case.
>

If we add a quirk to fix this issue, we need to copy usbnet_cdc_bind() and
write a new xxx_cdc_bind() just to let it clear the halt. What's your
proposal?

BTW, the device works well if I modprobe the cdc_ether driver after
changing the configuration.

>>
>>>>
>>>>> Is the "XactErr" msg printed just after switching to cdc_ether interface
>>>>> by changing configuration?
>>>>>
>>>>
>>>> Yes, just as I mentioned in my original email.
>>>> And it did not work even I removed the driver and re-installed it.
>>>>
>>>>>> Maybe this is a common issue, so fix it by activating the endpoint
>>>>>> once the error occurs.
>>>>>>
>>>>>> Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
>>>>>> ---
>>>>>>  drivers/net/usb/usbnet.c   |   33 +++++++++++++++++++++++++++++++++
>>>>>>  include/linux/usb/usbnet.h |   15 ++++++++-------
>>>>>>  2 files changed, 41 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>>>>>> index 9f58330..f13922b 100644
>>>>>> --- a/drivers/net/usb/usbnet.c
>>>>>> +++ b/drivers/net/usb/usbnet.c
>>>>>> @@ -537,6 +537,11 @@ static void intr_complete (struct urb *urb)
>>>>>>                          "intr shutdown, code %d\n", status);
>>>>>>                return;
>>>>>>
>>>>>> +       case -EPIPE:
>>>>>> +       case -EPROTO:
>>>>>
>>>>> It is good to handle EPIPE error here, but looks it is no sense to
>>>>> clear halt for
>>>>> bus transfer failure. At least, no clear halt is done for returning -EPROTO from
>>>>> rx/tx transfer currently.
>>>>
>>>> Just as I said above, there is different issue can cause -EPROTO, at
>>>> least, for my case it is because the interrupt endpoint is not active.
>>>> If the error occurs, the driver need try to fix it instead of just
>>>> printing an error msg.
>>>
>>> One problem in your patch is that if the  -EPROTO is caused by bad cable
>>> or interference, clean halt may not be sent to device successfully, and will
>>> cause -EPROTO further.
>>
>> What's your opinion to handle "-EPROTO" error in usbnet.c?
>> Please check usbnet.c again: when "-EPROTO" occurs, it just prints an
>> error msg and re-submits the interrupt URB, which then causes endless
>> "XactErr" error msgs.
>
> Yes, it should be bug, but clear feature(HALT) is not correct for the situation.
>
>>
>> At least, this patch lets the driver try to fix the error before
>> resubmit the URB.
>>
>>>
>>>>
>>>>>
>>>>>> +               usbnet_defer_kevent(dev, EVENT_STS_HALT);
>>>>>> +               return;
>>>>>> +
>>>>>>        /* NOTE:  not throttling like RX/TX, since this endpoint
>>>>>>         * already polls infrequently
>>>>>>         */
>>>>>> @@ -967,6 +972,34 @@ fail_halt:
>>>>>>                }
>>>>>>        }
>>>>>>
>>>>>> +       if (test_bit(EVENT_STS_HALT, &dev->flags)) {
>>>>>> +               unsigned pipe;
>>>>>> +               struct usb_endpoint_descriptor *desc;
>>>>>> +
>>>>>> +               desc = &dev->status->desc;
>>>>>> +               pipe = usb_rcvintpipe(dev->udev,
>>>>>> +                       desc->bEndpointAddress & USB_ENDPOINT_NUMBER_MASK);
>>>>>> +               status = usb_autopm_get_interface(dev->intf);
>>>>>> +               if (status < 0)
>>>>>> +                       goto fail_sts;
>>>>>> +               status = usb_clear_halt(dev->udev, pipe);
>>>>>> +               usb_autopm_put_interface(dev->intf);
>>>>>> +
>>>>>> +               if (status < 0 && status != -EPIPE && status != -ESHUTDOWN) {
>>>>>> +fail_sts:
>>>>>> +                       netdev_err(dev->net,
>>>>>> +                               "can't clear intr halt, status %d\n", status);
>>>>>> +               } else {
>>>>>> +                       clear_bit(EVENT_STS_HALT, &dev->flags);
>>>>>> +                       memset(dev->interrupt->transfer_buffer, 0,
>>>>>> +                               dev->interrupt->transfer_buffer_length);
>>>>>
>>>>> The above is not necessary.
>>>>
>>>> Ming, do you mean the above one line, or others ?
>>>
>>> Yes, it is the above line.
>>>
>>
>> Then not sure whether the buffer will be tainted without this line.
>
> It isn't necessary,  the buffer should include valid data if URB->status
> returns zero.
>

That's great, thanks.

>
> Thanks,
> --
> Ming Lei

^ permalink raw reply

* Re: Deadlock, L2TP over IP are not working, 3.4.1
From: Denys Fedoryshchenko @ 2012-06-08 15:56 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Eric Dumazet, Francois Romieu, davem, netdev, linux-kernel
In-Reply-To: <20120608154106.GD5024@kvack.org>

On 2012-06-08 18:41, Benjamin LaHaise wrote:
> On Fri, Jun 08, 2012 at 07:47:18AM +0200, Eric Dumazet wrote:
>> I have no idea how many l2tp_eth devices are setup at once in 
>> typical
>> conf.
>
> Depends on the usage scenario.  L2TP is commonly used for terminating
> customer connections by wholesale ISPs.  In that kind of edge routing
> use-case, tens of thousands of interfaces are easily possible.
>
> 		-ben
In my case it is a few hundred. I am not sure that is a typical case, but I
would really like this setup to give me good performance. Since Linux usually
runs on PCs, I don't think they will scale beyond 2-3 Gbps. Also, L2TP
pseudowires (l2tp_eth) are usually terminated not directly at end users but
at LACs, so I think it is up to a thousand.

Right now I am routing around 700 Mbps over it, and I noticed some problem
with pskb_expand_head, but that's a separate question I will ask if I am
not able to sort it out myself.


---
Denys Fedoryshchenko, Network Engineer, Virtual ISP S.A.L.

^ permalink raw reply

* Re: Deadlock, L2TP over IP are not working, 3.4.1
From: Eric Dumazet @ 2012-06-08 16:04 UTC (permalink / raw)
  To: Denys Fedoryshchenko
  Cc: Benjamin LaHaise, Francois Romieu, davem, netdev, linux-kernel
In-Reply-To: <67d8beaf421874815145fdefc69b3366@visp.net.lb>

On Fri, 2012-06-08 at 18:56 +0300, Denys Fedoryshchenko wrote:

> Right now i am routing around 700Mbps over there, and noticed some 
> problem with pskb_expand_head,
> but that's separate question i will ask, if i will not be able to sort 
> it out by myself.
> 

Before spending too much time on this, make sure you use a kernel
including commit 617c8c11236 ( skb: avoid unnecessary reallocations in
__skb_cow )

^ permalink raw reply

* Re: [PATCH] usbnet: Activate the halt interrupt endpoint to fix endless "XactErr" error
From: Alan Stern @ 2012-06-08 16:13 UTC (permalink / raw)
  To: Huajun Li
  Cc: Ming Lei, David S. Miller, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CA+v9cxag9NV2ud+oupyziLN3nLgkgj+kyTcaOnEjYNjqXtM4Bw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Fri, 8 Jun 2012, Huajun Li wrote:

> > One possibility is to wait for a little while before resubmitting the
> > URB, and after 10 failures in a row, attempt a reset.
> >
> Alan, thanks for your proposal.
> You mean reset the device after 10 failures, right ?

Yes.  But I don't know if it will help your problem.
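
The retry-then-reset idea quoted above could be sketched roughly as follows.
This is a standalone userspace model, not usbnet code: intr_state,
intr_next_action() and the INTR_* names are hypothetical, and only the
threshold of 10 comes from the suggestion itself.

```c
/* Threshold taken from the suggestion above; everything else is illustrative. */
#define INTR_MAX_FAILURES 10

enum intr_action {
	INTR_RESUBMIT,		/* success: resubmit the URB right away */
	INTR_RESUBMIT_DELAYED,	/* failure: wait a little, then resubmit */
	INTR_RESET		/* too many failures in a row: try a reset */
};

struct intr_state {
	unsigned int failures;	/* consecutive failed completions */
};

/* Decide what to do after one interrupt URB completion. */
static enum intr_action intr_next_action(struct intr_state *st, int status)
{
	if (status == 0) {
		st->failures = 0;	/* any success ends the failure run */
		return INTR_RESUBMIT;
	}
	if (++st->failures >= INTR_MAX_FAILURES) {
		st->failures = 0;	/* start counting afresh after the reset */
		return INTR_RESET;
	}
	return INTR_RESUBMIT_DELAYED;
}
```

In a real driver the delayed resubmit and the reset would have to be deferred
to process context; the sketch only models the counting.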

> BTW, I ever tried to sleep several seconds before submitting the 1st
> interrupt URB, but it did not work.

As Ming Lei said, it sounds like your device isn't working right.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] bonding: Fix corrupted queue_mapping
From: John Fastabend @ 2012-06-08 16:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Tom Herbert, David Miller, netdev
In-Reply-To: <1339168272.6001.116.camel@edumazet-glaptop>

On 6/8/2012 8:11 AM, Eric Dumazet wrote:
> On Fri, 2012-06-08 at 08:04 -0700, Tom Herbert wrote:
>
>> Maybe the fundamental issue is that the queue mappings only allow for
>> one level of multi queue device.  It might be better if bonding didn't
>> have one and dev_pick_tx did the right thing (use xps on bonding
>> maybe).
>
> bonding misuses multiqueue infrastructure to divert frames on selected
> slaves, or maybe I am wrong.
>

This is right. See bond_slave_override(): there the slaves' queue_ids
are mapped to skb->queue_mapping via the TX_QUEUE_OVERRIDE param.
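
As a rough illustration of that override, here is a standalone userspace
model. It is a sketch under assumptions, not the bonding driver's code: the
struct and pick_override_slave() are hypothetical, and the rule that a
queue_mapping of 0 or a slave that is down falls back to the normal transmit
policy is my reading of the driver.

```c
#include <stddef.h>

/* Simplified stand-in for a bonding slave. */
struct slave {
	unsigned short queue_id;	/* 0 means no queue assigned */
	int up;				/* non-zero if the slave can transmit */
};

/*
 * Return the index of the slave selected by queue_mapping, or -1 when the
 * bond's default transmit policy should be used instead.
 */
static int pick_override_slave(const struct slave *slaves, size_t n,
			       unsigned short queue_mapping)
{
	size_t i;

	if (!queue_mapping)
		return -1;	/* no override requested */

	for (i = 0; i < n; i++) {
		if (slaves[i].queue_id == queue_mapping)
			return slaves[i].up ? (int)i : -1;
	}
	return -1;		/* no slave owns this queue id */
}
```

The point of the model is that queue_mapping here selects a slave rather than
a hardware TX queue.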

^ permalink raw reply

* [PATCH] l2tp: fix a race in l2tp_ip_sendmsg()
From: Eric Dumazet @ 2012-06-08 16:25 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, James Chapman, Denys Fedoryshchenko

From: Eric Dumazet <edumazet@google.com>

Commit 081b1b1bb27f (l2tp: fix l2tp_ip_sendmsg() route handling) added
a race, in case IP route cache is disabled.

In this case, we should not do the dst_release(&rt->dst), since it'll
free the dst immediately, instead of waiting a RCU grace period.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: James Chapman <jchapman@katalix.com>
Cc: Denys Fedoryshchenko <denys@visp.net.lb>
---
 net/l2tp/l2tp_ip.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 70614e7..61d8b75 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -464,10 +464,12 @@ static int l2tp_ip_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m
 					   sk->sk_bound_dev_if);
 		if (IS_ERR(rt))
 			goto no_route;
-		if (connected)
+		if (connected) {
 			sk_setup_caps(sk, &rt->dst);
-		else
-			dst_release(&rt->dst); /* safe since we hold rcu_read_lock */
+		} else {
+			skb_dst_set(skb, &rt->dst);
+			goto xmit;
+		}
 	}
 
 	/* We dont need to clone dst here, it is guaranteed to not disappear.
@@ -475,6 +477,7 @@ static int l2tp_ip_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *m
 	 */
 	skb_dst_set_noref(skb, &rt->dst);
 
+xmit:
 	/* Queue the packet to IP for output */
 	rc = ip_queue_xmit(skb, &inet->cork.fl);
 	rcu_read_unlock();

^ permalink raw reply related

* Re: 3.5.0+ - Linus GIT - WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xeb/0x15f()
From: Miles Lane @ 2012-06-08 16:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: LKML, Andrew Morton, Wim Van Sebroeck, Jay Cliburn, Chris Snook,
	netdev, Huang Xiong
In-Reply-To: <1339051157.26966.97.camel@edumazet-glaptop>

On Thu, Jun 7, 2012 at 2:39 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2012-06-07 at 02:16 -0400, Miles Lane wrote:
>> WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xeb/0x15f()
>> Hardware name: UL50VT
>> NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
>> Modules linked in: hfsplus hfs vfat msdos fat snd_hrtimer ipv6
>> snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep
>> snd_pcm_oss snd_seq_dummy snd_mixer_oss uvcvideo videobuf2_core
>> snd_pcm videodev snd_seq_oss snd_seq_midi snd_rawmidi media
>> snd_seq_midi_event acpi_cpufreq videobuf2_vmalloc videobuf2_memops
>> snd_seq iwlwifi snd_timer snd_seq_device asus_laptop mac80211
>> sparse_keymap snd cfg80211 coretemp soundcore psmouse snd_page_alloc
>> rtc_cmos mperf processor evdev rfkill battery led_class input_polldev
>> ac i915 nouveau sr_mod cdrom sd_mod ehci_hcd atl1c uhci_hcd intel_agp
>> ttm usbcore intel_gtt usb_common drm_kms_helper thermal video
>> thermal_sys hwmon button
>> Pid: 3025, comm: hud-service Not tainted 3.5.0-rc1+ #128
>> Call Trace:
>>  <IRQ>  [<ffffffff8102d42f>] warn_slowpath_common+0x7e/0x97
>>  [<ffffffff8102d4dc>] warn_slowpath_fmt+0x41/0x43
>>  [<ffffffff81360f1c>] dev_watchdog+0xeb/0x15f
>>  [<ffffffff8103af44>] run_timer_softirq+0x20e/0x356
>>  [<ffffffff8103ae7e>] ? run_timer_softirq+0x148/0x356
>>  [<ffffffff81360e31>] ? netif_tx_unlock+0x57/0x57
>>  [<ffffffff810344f8>] __do_softirq+0x103/0x239
>>  [<ffffffff8107122a>] ? clockevents_program_event+0x9c/0xb9
>>  [<ffffffff8140a4cc>] call_softirq+0x1c/0x30
>>  [<ffffffff81003bb9>] do_softirq+0x37/0x82
>>  [<ffffffff81034888>] irq_exit+0x4c/0xb1
>>  [<ffffffff8101ba71>] smp_apic_timer_interrupt+0x76/0x84
>>  [<ffffffff81409adc>] apic_timer_interrupt+0x6c/0x80
>>  <EOI>  [<ffffffff81105161>] ? fget_raw_light+0x4c/0x7d
>>  [<ffffffff81105161>] ? fget_raw_light+0x4c/0x7d
>>  [<ffffffff8111153b>] sys_fcntl+0x23/0x53b
>>  [<ffffffff81004b68>] ? print_context_stack+0x44/0xb1
>>  [<ffffffff81408fe2>] system_call_fastpath+0x16/0x1b
>> ---[ end trace c1f284d9c873031d ]---
>
> CC netdev and Huang Xiong
>
> Atheros drivers are known to have buggy tx completion, it's incredible...
>
> You could try following patch, not a 'perfect' solution, but a fix.
>
> Thanks
>
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index 9cc1570..31224f3 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -1551,10 +1551,12 @@ static bool atl1c_clean_tx_irq(struct atl1c_adapter *adapter,
>                atomic_set(&tpd_ring->next_to_clean, next_to_clean);
>        }
>
> +       spin_lock(&adapter->tx_lock);
>        if (netif_queue_stopped(adapter->netdev) &&
>                        netif_carrier_ok(adapter->netdev)) {
>                netif_wake_queue(adapter->netdev);
>        }
> +       spin_unlock(&adapter->tx_lock);
>
>        return true;
>  }
>
>
>
>

I tested this patch as well and got the following (identical to the
warning with the other patch you sent):

[  704.534177] atl1c 0000:04:00.0: atl1c: eth0 NIC Link is Up<100 Mbps
Full Duplex>
[  714.346649] ------------[ cut here ]------------
[  714.346670] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xeb/0x15f()
[  714.346674] Hardware name: UL50VT
[  714.346679] NETDEV WATCHDOG: eth0 (atl1c): transmit queue 0 timed out
[  714.346854] Modules linked in: snd_hrtimer ipv6
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep
snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss
snd_seq_midi snd_rawmidi snd_seq_midi_event acpi_cpufreq snd_seq
iwlwifi uvcvideo snd_timer videobuf2_core snd_seq_device mac80211
videodev snd media videobuf2_vmalloc videobuf2_memops coretemp psmouse
cfg80211 soundcore snd_page_alloc rtc_cmos ac mperf battery processor
asus_laptop evdev sparse_keymap rfkill led_class input_polldev
acpi_call(O) i915 nouveau ttm fbcon tileblit font bitblit softcursor
drm_kms_helper drm fb fbdev intel_agp sr_mod cdrom ehci_hcd uhci_hcd
sd_mod cfbcopyarea i2c_algo_bit intel_gtt agpgart i2c_core mxm_wmi
atl1c usbcore cfbimgblt cfbfillrect usb_common thermal video backlight
thermal_sys hwmon wmi button
[  714.346864] Pid: 3230, comm: unity-panel-ser Tainted: G           O
3.5.0-rc1+ #132
[  714.346867] Call Trace:
[  714.346882]  <IRQ>  [<ffffffff8102d42f>] warn_slowpath_common+0x7e/0x97
[  714.346890]  [<ffffffff8102d4dc>] warn_slowpath_fmt+0x41/0x43
[  714.346914]  [<ffffffff81302028>] dev_watchdog+0xeb/0x15f
[  714.346923]  [<ffffffff8103af44>] run_timer_softirq+0x20e/0x356
[  714.346930]  [<ffffffff8103ae7e>] ? run_timer_softirq+0x148/0x356
[  714.346938]  [<ffffffff81301f3d>] ? netif_tx_unlock+0x57/0x57
[  714.346946]  [<ffffffff810344f8>] __do_softirq+0x103/0x239
[  714.346954]  [<ffffffff81071246>] ? clockevents_program_event+0x9c/0xb9
[  714.346964]  [<ffffffff813ab38c>] call_softirq+0x1c/0x30
[  714.346971]  [<ffffffff81003bb9>] do_softirq+0x37/0x82
[  714.346977]  [<ffffffff81034888>] irq_exit+0x4c/0xb1
[  714.346987]  [<ffffffff8101ba71>] smp_apic_timer_interrupt+0x76/0x84
[  714.346994]  [<ffffffff813aa99c>] apic_timer_interrupt+0x6c/0x80
[  714.347006]  <EOI>  [<ffffffff813a9ec7>] ? sysret_check+0x1b/0x56
[  714.347011] ---[ end trace 8a2db16274f46b16 ]---

^ permalink raw reply

* Re: 3.5.0+ - Linus GIT - WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0xeb/0x15f()
From: Eric Dumazet @ 2012-06-08 16:33 UTC (permalink / raw)
  To: Miles Lane
  Cc: LKML, Andrew Morton, Wim Van Sebroeck, Jay Cliburn, Chris Snook,
	netdev, Huang Xiong
In-Reply-To: <CAHFgRy-u3Xh=1XjrZi2x+=6bBdQ=Ve8i6jer9ppjYEWun70hxQ@mail.gmail.com>

On Fri, 2012-06-08 at 12:26 -0400, Miles Lane wrote:

> I tested this patch as well and got the following (identical to the
> warning with the other patch you sent):

Is this card working on a previous kernel ?

If yes, you probably should do a bisection.

^ permalink raw reply

* Re: [PATCH] usbnet: Activate the halt interrupt endpoint to fix endless "XactErr" error
From: Huajun Li @ 2012-06-08 16:36 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ming Lei, David S. Miller, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <Pine.LNX.4.44L0.1206081212150.1360-100000-IYeN2dnnYyZXsRXLowluHWD2FQJk+8+b@public.gmane.org>

On Sat, Jun 9, 2012 at 12:13 AM, Alan Stern <stern-nwvwT67g6+6dFdvTe/nMLpVzexx5G7lz@public.gmane.org> wrote:
> On Fri, 8 Jun 2012, Huajun Li wrote:
>
>> > One possibility is to wait for a little while before resubmitting the
>> > URB, and after 10 failures in a row, attempt a reset.
>> >
>> Alan, thanks for your proposal.
>> You mean reset the device after 10 failures, right ?
>
> Yes.  But I don't know if it will help your problem.
>

usb_queue_reset_device() can make it work again; I have tried this.

I also tried usb_clear_halt() and found that it made the device work too,
so I drafted this patch.

>> BTW, I ever tried to sleep several seconds before submitting the 1st
>> interrupt URB, but it did not work.
>
> As Ming Lei said, it sounds like your device isn't working right.
>

Yes.
However, I did not find an elegant way to fix the issue with a quirk,
at least based on the current usbnet/cdc_ether framework.

^ permalink raw reply

* BUG (?) multicast loopback (IP6SKB_FORWARDED)
From: maxd @ 2012-06-08 16:48 UTC (permalink / raw)
  To: netdev

Hi guys,
I found a probably wrong behaviour while doing some tests with multicast 
routing on IPv6 with kernel 2.6.29. I will try to describe what's wrong in the 
code in the following. I will use the latest kernel sources (3.5-rc1)
as reference source code (line numbers are taken there).
Let's assume a scenario with a node with two network interfaces acting as a 
multicast router. The router receives the message on one interface and needs to 
forward it on the other interface. Looking at the packet flow inside the 
kernel, we notice that

in ip6mr.c, line 2282, a flag is set:
IP6CB(skb)->flags |= IP6SKB_FORWARDED;

After this, a multicast packet can be looped back (see line 124 in
ip6_output.c, where function ip6_dev_loopback_xmit is called).
The packet is hence reinjected in the stack.

The packet is processed by function ipv6_rcv (ip6_input.c), and then by 
ipv6_mc_input (ip6_input.c).

In ipv6_rcv, line 82, the previously set flag is cleared
memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));

In ipv6_mc_input, line 268, the flag is checked to determine if the packet 
has been already forwarded. Since the flag has been cleared, the kernel cannot 
determine that the packet has been looped back, and will hence try to forward 
it again.

Trying to forward a looped-back packet causes wrong behaviour in the
multicast routing protocol (PIM): the kernel believes that a multicast message
has been received from a wrong interface (line 1993 in ip6mr.c), discards the
message (this explains why the packet does not loop forever) and triggers the
transmission of an ASSERT message. Basically, the node ends up sending an
ASSERT message because of a looped-back packet.
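
The sequence described above can be reproduced in a small userspace model.
This is only a sketch: struct pkt, FWD_FLAG and the three helpers are
hypothetical stand-ins for the skb control block, IP6SKB_FORWARDED, and the
forward, receive and check steps; they are not the kernel functions.

```c
#include <string.h>

#define FWD_FLAG 0x1	/* stand-in for IP6SKB_FORWARDED; the value is arbitrary */

struct pkt_cb {
	unsigned int flags;
};

struct pkt {
	struct pkt_cb cb;	/* stand-in for the skb control block */
};

/* Forwarding path: mark the packet as already forwarded. */
static void forward(struct pkt *p)
{
	p->cb.flags |= FWD_FLAG;
}

/* Receive path: the control block is wiped on entry, as in the memset above. */
static void receive(struct pkt *p)
{
	memset(&p->cb, 0, sizeof(p->cb));
}

/* Multicast input check: forward only if the flag is not set. */
static int would_forward_again(const struct pkt *p)
{
	return !(p->cb.flags & FWD_FLAG);
}
```

Calling forward() and then receive() on the same packet leaves
would_forward_again() true, which is the double forwarding attempt described
in the analysis.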

WDYT? Is my analysis correct? Which is the best way to fix this issue?

Thanks,
Massimiliano D'Angelo

^ permalink raw reply

* VALE, a Virtual Local Ethernet. http://info.iet.unipi.it/~luigi/vale/
From: Luigi Rizzo @ 2012-06-08 17:17 UTC (permalink / raw)
  To: netdev

We have just completed a netmap extension that lets you build a
local high-speed switch called VALE, which I think can be very useful.

	http://info.iet.unipi.it/~luigi/vale/

VALE is a software Virtual Local Ethernet whose ports are accessible
using the netmap API. Designed to be used as the interconnect between
virtual machines (or as a fast local bus), it works as a learning
bridge and supports speeds of up to 20 Mpps with short frames, and
an aggregate 70 Gbit/s with 1514-byte packets. The VALE paper
contains more details and performance measurements.

VALE is implemented as a small extension of the netmap module, and
is available for FreeBSD and Linux. The source code includes a
backend for qemu and KVM, so you can use VALE to interconnect virtual
machines launching them with

	qemu -net nic -net netmap,ifname=vale0 ...
	qemu -net nic -net netmap,ifname=vale1 ...
	...

Processes can talk to a VALE switch too, so you can use the pkt-gen
or bridge tools that are part of the netmap distribution, or even
the pcap.c module that maps libpcap calls into netmap equivalents.
This lets you use VALE for all sorts of pcap-based applications.

More details, code, bootable images on the VALE page,

	http://info.iet.unipi.it/~luigi/vale/

feedback welcome, as usual.

cheers
luigi

^ permalink raw reply

* ethtool 3.4 released
From: Ben Hutchings @ 2012-06-08 17:04 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 741 bytes --]

ethtool version 3.4 has been released.

Home page: https://ftp.kernel.org/pub/software/network/ethtool/
Download link:
https://ftp.kernel.org/pub/software/network/ethtool/ethtool-3.4.tar.gz

Release notes:

	* Cleanup: Merge RX NFC options
	* Doc: Improve description of RX NFC options
	* Doc: Add ntuple to the -K option in the man page
	* Feature: Show time stamping capabilities (-T option)
	* Feature: Dump plug-in module EEPROM (-m option)
	* Feature: Show and change all generic net device features
	  (-k and -K options)

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply

* Re: [PATCH 1/1 v2] Ethtool: Add EEE support
From: Ben Hutchings @ 2012-06-08 17:19 UTC (permalink / raw)
  To: Yuval Mintz; +Cc: netdev, eilong, peppe.cavallaro
In-Reply-To: <1339010124-23413-1-git-send-email-yuvalmin@broadcom.com>

On Wed, 2012-06-06 at 22:15 +0300, Yuval Mintz wrote:
> This patch adds 2 new ethtool commands which can be
> used to manipulate network interfaces' support in
> EEE.
> 
> Output of 'get' has the following form:
> 
> 	EEE Settings for p2p1:
> 		EEE status: enabled - active
> 		Tx LPI: 1000 (us)
> 		Supported EEE link modes:  10000baseT/Full
> 		Advertised EEE link modes:  10000baseT/Full
> 		Link partner advertised EEE link modes:  10000baseT/Full
> 
> Thanks goes to Giuseppe Cavallaro for his original patch.
[...]

You're supposed to actually run the self-tests (make check):

make[1]: Entering directory `/home/bwh/src/ethtool'
E: ethtool --get-eee devname foo returns 1
FAIL: test-cmdline

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: BUG (?) multicast loopback (IP6SKB_FORWARDED)
From: Eric Dumazet @ 2012-06-08 17:23 UTC (permalink / raw)
  To: maxd@inwind.it; +Cc: netdev
In-Reply-To: <7813214.2267681339174116704.JavaMail.defaultUser@defaultHost>

On Fri, 2012-06-08 at 18:48 +0200, maxd@inwind.it wrote:
> Hi guys,
> I found a probably wrong behaviour while doing some tests with multicast 
> routing on IPv6 with kernel 2.6.29. I will try to describe what's wrong in the 
> code in the following. I will use the latest kernel sources (3.5-rc1)
> as reference source code (line numbers are taken there).
> Let's assume a scenario with a node with two network interfaces acting as a 
> multicast router. The router receives the message on one interface and needs to 
> forward it on the other interface. Looking at the packet flow inside the 
> kernel, we notice that
> 
> in ip6mr.c, line 2282, a flag is set:
> IP6CB(skb)->flags |= IP6SKB_FORWARDED;
> 
> After this, a multicast packet can be looped back (see line 124 in
> ip6_output.c, where function ip6_dev_loopback_xmit is called).
> The packet is hence reinjected in the stack.
> 
> The packet is processed by function ipv6_rcv (ip6_input.c), and then by 
> ipv6_mc_input (ip6_input.c).
> 
> In ipv6_rcv, line 82, the previously set flag is cleared
> memset(IP6CB(skb), 0, sizeof(struct inet6_skb_parm));
> 
> In ipv6_mc_input, line 268, the flag is checked to determine if the packet 
> has been already forwarded. Since the flag has been cleared, the kernel cannot 
> determine that the packet has been looped back, and will hence try to forward 
> it again.
> 
> Trying to forward a looped back packet determines a wrong behaviour of the 
> multicast routing protocol (PIM): the kernel believes that a multicast message 
> has been received from a wrong interface (line 1993 in ip6mr.c), discard the 
> message (this explains why the packet does not loop forever) and triggers the 
> transmission of an ASSERT message. Basically, the node ends up sending an 
> ASSERT message because of a looped back packet. 
> 
> WDYT? Is my analysis correct? Which is the best way to fix this issue?

I guess your analysis is correct, try to revert commit
6b7fdc3ae18a0598a999156b62d55ea55220e00f ([IPV6]: Clean skb cb on IPv6
input) ?

^ permalink raw reply

* Re: [PATCH ethtool] ethtool: fix to display support for KX4 and KX PHY
From: Ben Hutchings @ 2012-06-08 17:23 UTC (permalink / raw)
  To: Ajit Khaparde; +Cc: netdev
In-Reply-To: <20120606200336.GA21620@akhaparde-VBox>

Applied, thanks.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* [net-next PATCH 00/02] net/ipv4: Add support for new tunnel type VTI.
From: Saurabh @ 2012-06-08 17:32 UTC (permalink / raw)
  To: netdev

Introduction:
Virtual tunnel interface is a way to represent policy-based IPsec tunnels as virtual interfaces in Linux. This is similar to Cisco's VTI (virtual tunnel interface) and Juniper's representation of secure tunnel (st.xx). The advantage of representing an IPsec tunnel as an interface is that it is possible to plug IPsec tunnels into the routing protocol infrastructure of a router. Therefore it becomes possible to influence the packet path by toggling the link state of the tunnel or based on routing metrics.

Overview:
The Linux kernel does not natively support IPsec as an interface. Also, a secure interface assumes an IPsec policy 4-tuple of {dst-ip-any, src-ip-any, dst-port-any, src-port-any}. Applying this 4-tuple in Linux would result in all traffic matching the IPsec policy. What is needed is a tunnel distinguisher. The Linux kernel skbuff has fwmark, which is used for policy-based routing (PBR). Linux kernel version 2.6.35 enhanced SPD/SADB to use fwmark as part of the IPsec policy. strongSwan has also introduced support for this kernel feature with version 4.5.0. We can therefore use the fwmark as the distinguisher for the tunnel interface. We can also create a lightweight tunnel kernel module (vti) to give the notion of an interface to the rest of the kernel routing system. The tunnel module does not do any encapsulation/decapsulation. The kernel's xfrm modules still do the ESP encryption/decryption.

Usage:
ip tunnel add sti15 mode vti remote 12.0.0.1 local 12.0.0.3 ikey 15
or
ip link add sti15 type vti key 15 remote 12.0.0.1 local 12.0.0.3

Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
Reviewed-by: Stephen Hemminger <shemminger@vyatta.com>

---

^ permalink raw reply

* [net-next PATCH 01/02] net/ipv4: VTI support rx-path hook in xfrm4_mode_tunnel.
From: Saurabh @ 2012-06-08 17:32 UTC (permalink / raw)
  To: netdev



Add hook for rx-path xfrm4_mode_tunnel for VTI tunnel module.

Signed-off-by: Saurabh Mohan <saurabh.mohan@vyatta.com>
Reviewed-by: Stephen Hemminger <shemminger@vyatta.com>

---
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index e0a55df..04214c0 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1475,6 +1475,8 @@ extern int xfrm4_output(struct sk_buff *skb);
 extern int xfrm4_output_finish(struct sk_buff *skb);
 extern int xfrm4_tunnel_register(struct xfrm_tunnel *handler, unsigned short family);
 extern int xfrm4_tunnel_deregister(struct xfrm_tunnel *handler, unsigned short family);
+extern int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel *handler);
+extern int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel *handler);
 extern int xfrm6_extract_header(struct sk_buff *skb);
 extern int xfrm6_extract_input(struct xfrm_state *x, struct sk_buff *skb);
 extern int xfrm6_rcv_spi(struct sk_buff *skb, int nexthdr, __be32 spi);
diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
index ed4bf11..4fc2944 100644
--- a/net/ipv4/xfrm4_mode_tunnel.c
+++ b/net/ipv4/xfrm4_mode_tunnel.c
@@ -15,6 +15,68 @@
 #include <net/ip.h>
 #include <net/xfrm.h>
 
+/*
+ * Informational hook. The decap is still done here.
+ */
+static struct xfrm_tunnel __rcu *rcv_notify_handlers __read_mostly;
+static DEFINE_MUTEX(xfrm4_mode_tunnel_input_mutex);
+
+int xfrm4_mode_tunnel_input_register(struct xfrm_tunnel *handler)
+{
+	struct xfrm_tunnel __rcu **pprev;
+	struct xfrm_tunnel *t;
+
+	int ret = -EEXIST;
+	int priority = handler->priority;
+
+	mutex_lock(&xfrm4_mode_tunnel_input_mutex);
+
+	for (pprev = &rcv_notify_handlers;
+		(t = rcu_dereference_protected(*pprev,
+		lockdep_is_held(&xfrm4_mode_tunnel_input_mutex))) != NULL;
+		pprev = &t->next) {
+		if (t->priority > priority)
+			break;
+		if (t->priority == priority)
+			goto err;
+
+	}
+
+	handler->next = *pprev;
+	rcu_assign_pointer(*pprev, handler);
+
+	ret = 0;
+
+err:
+	mutex_unlock(&xfrm4_mode_tunnel_input_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(xfrm4_mode_tunnel_input_register);
+
+int xfrm4_mode_tunnel_input_deregister(struct xfrm_tunnel *handler)
+{
+	struct xfrm_tunnel __rcu **pprev;
+	struct xfrm_tunnel *t;
+	int ret = -ENOENT;
+
+	mutex_lock(&xfrm4_mode_tunnel_input_mutex);
+	for (pprev = &rcv_notify_handlers;
+		(t = rcu_dereference_protected(*pprev,
+		lockdep_is_held(&xfrm4_mode_tunnel_input_mutex))) != NULL;
+		pprev = &t->next) {
+		if (t == handler) {
+			*pprev = handler->next;
+			ret = 0;
+			break;
+		}
+	}
+	mutex_unlock(&xfrm4_mode_tunnel_input_mutex);
+	synchronize_net();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(xfrm4_mode_tunnel_input_deregister);
+
 static inline void ipip_ecn_decapsulate(struct sk_buff *skb)
 {
 	struct iphdr *inner_iph = ipip_hdr(skb);
@@ -64,8 +126,14 @@ static int xfrm4_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
 	return 0;
 }
 
+#define for_each_input_rcu(head, handler)	\
+	for (handler = rcu_dereference(head);	\
+		handler != NULL;		\
+		handler = rcu_dereference(handler->next))
+
 static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 {
+	struct xfrm_tunnel *handler;
 	int err = -EINVAL;
 
 	if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPIP)
@@ -74,6 +142,10 @@ static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
 	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
 		goto out;
 
+	/* The handlers do not consume the skb. */
+	for_each_input_rcu(rcv_notify_handlers, handler)
+		handler->handler(skb);
+
 	if (skb_cloned(skb) &&
 	    (err = pskb_expand_head(skb, 0, 0, GFP_ATOMIC)))
 		goto out;
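
The list handling in the patch above can be illustrated with a small
userspace sketch. It is not the kernel code: struct notify_handler,
handler_register(), handler_deregister(), handlers_notify() and
count_packet() are hypothetical names chosen for illustration, and the
mutex and RCU protection used by the patch are left out, so the sketch is
only valid single-threaded. It shows the same pointer-to-pointer walk that
keeps handlers sorted by ascending priority, rejects a duplicate priority
with -EEXIST, and returns -ENOENT when asked to remove an unknown handler.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the fields of struct xfrm_tunnel
 * that the patch uses: a callback, a next pointer and a priority. */
struct notify_handler {
	int (*handler)(void *pkt);
	struct notify_handler *next;
	int priority;
};

/* Head of the handler list (the patch guards its list with a mutex for
 * writers and RCU for readers; this sketch has no locking at all). */
static struct notify_handler *handlers;

/* Insert h so the list stays sorted by ascending priority.
 * A handler with the same priority already present means -EEXIST. */
static int handler_register(struct notify_handler *h)
{
	struct notify_handler **pprev;
	struct notify_handler *t;

	for (pprev = &handlers; (t = *pprev) != NULL; pprev = &t->next) {
		if (t->priority > h->priority)
			break;
		if (t->priority == h->priority)
			return -EEXIST;
	}

	/* pprev points at the link that must now point to h. */
	h->next = *pprev;
	*pprev = h;
	return 0;
}

/* Unlink h if it is on the list, otherwise return -ENOENT. */
static int handler_deregister(struct notify_handler *h)
{
	struct notify_handler **pprev;
	struct notify_handler *t;

	for (pprev = &handlers; (t = *pprev) != NULL; pprev = &t->next) {
		if (t == h) {
			*pprev = h->next;
			return 0;
		}
	}
	return -ENOENT;
}

/* Call every registered handler in list order without consuming pkt.
 * Returns the number of handlers that were called. */
static int handlers_notify(void *pkt)
{
	struct notify_handler *t;
	int called = 0;

	for (t = handlers; t != NULL; t = t->next) {
		t->handler(pkt);
		called++;
	}
	return called;
}

/* Example callback: treats pkt as an int counter and increments it. */
static int count_packet(void *pkt)
{
	++*(int *)pkt;
	return 0;
}
```

Because the walk carries the address of the link rather than the previous
node, inserting or removing at the head needs no special case, which is
presumably why the patch follows the same pattern.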
