Netdev List

* Re: [PATCH] pktgen: Remove a dangerous debug print.
From: Ben Greear @ 2010-10-27 20:38 UTC
  To: Nelson Elhage; +Cc: Robert Olsson, linux-kernel, netdev, Eugene Teo
In-Reply-To: <1288206788-21063-1-git-send-email-nelhage@ksplice.com>

On 10/27/2010 12:13 PM, Nelson Elhage wrote:
> We were allocating an arbitrarily-large buffer on the stack, which would allow a
> buggy or malicious userspace program to overflow the kernel stack.
>
> Since the debug printk() was just printing exactly the text passed from
> userspace, it's probably just as easy for anyone who might use it to augment (or
> just strace(1)) the program writing to the pktgen file, so let's just not bother
> trying to print the whole buffer.

Maybe just allocate that buffer on the heap instead of stack?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
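Ben's suggestion can be sketched in userspace C, assuming the problem is a stack array sized by the caller-supplied byte count. The helper name `dup_user_text` is hypothetical. `malloc()`, `memcpy()` and `free()` stand in for what a kernel version would presumably do with `kmalloc()`, `copy_from_user()` and `kfree()`. This is a sketch of the pattern, not the pktgen code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy `count` caller-supplied bytes into a NUL-terminated heap buffer
 * instead of a stack array sized by `count` (hypothetical helper).
 * Returns NULL on size overflow or allocation failure; the caller
 * must free() the result.
 */
char *dup_user_text(const char *src, size_t count)
{
    char *buf;

    assert(src != NULL || count == 0);
    if (count == SIZE_MAX)          /* count + 1 would wrap around */
        return NULL;
    buf = malloc(count + 1);
    if (buf == NULL)
        return NULL;
    if (count != 0)
        memcpy(buf, src, count);
    buf[count] = '\0';
    return buf;
}
```

The size of the copy is then limited by what the allocator can provide rather than by the fixed, small stack.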

* Re: [PATCH] ipv6: addrconf: clear IPv6 addresses and routes when losing link
From: Maciej Żenczykowski @ 2010-10-27 20:39 UTC
  To: Lorenzo Colitti; +Cc: netdev, Brian Haley
In-Reply-To: <AANLkTinA+znFn4txgq_2=njuX+eCTNJW+n_psbMYkU9d@mail.gmail.com>

> The current privacy address comes back because it's a time-based hash.

I would guess it uses jiffies (or some other equally fine-grained
time, or just pulls from a random entropy source).

I see no reason why the second time around it would generate the same
privacy address.
In other words: are you absolutely certain of this?  It's certainly
not how I would expect it to function.

v2.6.36/net/ipv6/addrconf.c
#ifdef CONFIG_IPV6_PRIVACY
#include <linux/random.h>
#endif

and

http://lxr.linux.no/linux+v2.6.36/net/ipv6/addrconf.c#L1608

static int __ipv6_regen_rndid(struct inet6_dev *idev) {
        get_random_bytes(idev->rndid, sizeof(idev->rndid));
        idev->rndid[0] &= ~0x02;

certainly seem to point towards it being totally random.
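The two quoted lines can be mirrored in userspace C to show what they do. The identifier is 8 random bytes with 0x02 cleared in the first byte. That bit is the universal/local bit of an EUI-64-style interface identifier, and clearing it marks the identifier as not globally unique. In this sketch, `make_rndid()` and `RNDID_LEN` are hypothetical names. The random bytes are passed in by the caller, where the kernel calls `get_random_bytes()`. The rest of `__ipv6_regen_rndid()` is not reproduced.

```c
#include <assert.h>
#include <string.h>

#define RNDID_LEN 8   /* 64-bit interface identifier */

/* Copy the random bytes and clear the universal/local bit (0x02). */
void make_rndid(unsigned char id[RNDID_LEN],
                const unsigned char random_bytes[RNDID_LEN])
{
    assert(id != NULL && random_bytes != NULL);
    memcpy(id, random_bytes, RNDID_LEN);
    id[0] &= (unsigned char)~0x02;
}
```

Nothing in these lines depends on time, so two calls with fresh random input would not be expected to produce the same identifier.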

> I think the old ones are gone. Still, I think it's better that
> connections from 1 day ago don't work any more (the default for
> privacy addresses is 1 day), than if all new and all old connections
> don't work any more.

No, the default is _no more_ than 1 day preferred and 7 days valid.
Those are upper limits, though; the actual values are the
preferred/valid lifetimes taken from the RA (capped at whatever the
corresponding max-lifetime sysctls are set to).

* [RFC PATCH 0/1] vhost: Reduce TX used buffer signal for performance
From: Shirley Ma @ 2010-10-27 21:00 UTC
  To: mst@redhat.com, David Miller; +Cc: netdev, kvm, linux-kernel

This patch changes vhost TX used-buffer signaling to the guest from one
signal per buffer to one signal per 3/4 of the ring size. I tried
different batch sizes, such as 4, 16, 1/4 of the ring and 1/2 of the
ring, and found that larger batches work best for message sizes between
256 bytes and 4K with the netperf TCP_STREAM test, so I picked 3/4 of the
ring size for signaling.
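The batch size in the patch is `vq->num - (vq->num >> 2)`, which is 3/4 of the ring rounded up. The C sketch below shows that arithmetic. It also includes a hypothetical `count_signals()` helper. That helper counts how many guest notifications a run of completed buffers would produce when signaling once per full batch. Leftover buffers stay pending and are not counted.

```c
#include <assert.h>

/* Flush threshold used by the patch: num - num/4, i.e. ceil(3/4 * num). */
unsigned int pend_threshold(unsigned int ring_num)
{
    return ring_num - (ring_num >> 2);
}

/* Notifications for `packets` completions when signaling per full batch. */
unsigned int count_signals(unsigned int packets, unsigned int ring_num)
{
    unsigned int threshold, pending = 0, signals = 0, i;

    assert(ring_num > 0);               /* guarantees threshold >= 1 */
    threshold = pend_threshold(ring_num);
    for (i = 0; i < packets; i++) {
        if (++pending == threshold) {   /* batch full: one notification */
            signals++;
            pending = 0;
        }
    }
    return signals;
}
```

For a 256-entry ring the threshold is 192. At that threshold, 1000 completed buffers would produce 5 notifications, with 40 buffers still pending. Signaling one by one would produce 1000.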

I tested both UDP and TCP performance with a 2-vCPU guest. The 60-second
netperf runs below show guest-to-host performance.

TCP_STREAM

Message size	Guest CPU(%)	BW (Mb/s)
		before:after	before:after

256		57.84:58.42	1678.47:1908.75
512		68.68:60.21	1844.18:3387.33
1024		68.01:58.70	1945.14:3384.72
2048		65.36:54.25	2342.45:3799.31
4096		63.25:54.62	3307.11:4451.78
8192		59.57:57.89	6038.64:6694.04

UDP_STREAM

Message size	Guest CPU(%)	BW (Mb/s)
		before:after	before:after

1024		49.64:26.69	1161.0:1687.6
2048		49.88:29.25	2326.8:2850.9
4096		49.59:29.15	3871.1:4880.3
8192		46.09:32.66	6822.9:7825.1
16K		42.90:34.96	11347.1:11767.4
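The tables can be read as relative changes using a trivial C helper (hypothetical name `pct_change()`). For example, the 512-byte TCP_STREAM row works out to roughly +84% bandwidth and about -12% guest CPU.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
double pct_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```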

For larger message sizes, the 60-second results remain almost the same. I
guess signaling does not play a big role in large-message transmission.

Shirley

* Re: kernel panic in fib_rules_lookup [2.6.27.7 vendor-patched]
From: Joe Buehler @ 2010-10-27 21:01 UTC
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1288103614.2622.0.camel@edumazet-laptop>

Eric, tests with your fixes and with the smp barrier corrections are
looking positive.  More testing is needed before a final verdict is in
however -- the load test crashed after 8 hours due to an application bug
that we're working on fixing now.

Joe Buehler

* Re: kernel panic in fib_rules_lookup [2.6.27.7 vendor-patched]
From: Eric Dumazet @ 2010-10-27 21:05 UTC
  To: Joe Buehler; +Cc: netdev
In-Reply-To: <4CC89333.1040407@cox.net>

On Wednesday, 27 October 2010 at 17:01 -0400, Joe Buehler wrote:
> Eric, tests with your fixes and with the smp barrier corrections are
> looking positive.  More testing is needed before a final verdict is in
> however -- the load test crashed after 8 hours due to an application bug
> that we're working on fixing now.
> 

Excellent! Thanks for the report.

* Re: [RFC PATCH 0/1] vhost: Reduce TX used buffer signal for performance
From: Shirley Ma @ 2010-10-27 21:05 UTC
  To: mst@redhat.com; +Cc: David Miller, netdev, kvm, linux-kernel
In-Reply-To: <1288213257.17571.25.camel@localhost.localdomain>

This patch changes vhost TX used-buffer signaling to the guest from one
signal per buffer to one signal per up to 3/4 of the vring size. This
improves vhost TX bandwidth and CPU utilization for message sizes from
256 bytes to 8K without inducing any regression.

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---

 drivers/vhost/net.c   |   19 ++++++++++++++++++-
 drivers/vhost/vhost.c |   31 +++++++++++++++++++++++++++++++
 drivers/vhost/vhost.h |    3 +++
 3 files changed, 52 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 4b4da5b..bd1ba71 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -198,7 +198,24 @@ static void handle_tx(struct vhost_net *net)
 		if (err != len)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
-		vhost_add_used_and_signal(&net->dev, vq, head, 0);
+		/*
+		 * If the pending array could not be allocated, signal used
+		 * buffers one by one; otherwise, signal them once 3/4 of the
+		 * ring size is pending, to reduce CPU utilization.
+		 */
+		if (unlikely(!vq->pend))
+			vhost_add_used_and_signal(&net->dev, vq, head, 0);
+		else {
+			vq->pend[vq->num_pend].id = head;
+			vq->pend[vq->num_pend].len = 0;
+			++vq->num_pend;
+			if (vq->num_pend == (vq->num - (vq->num >> 2))) {
+				vhost_add_used_and_signal_n(&net->dev, vq,
+							    vq->pend,
+							    vq->num_pend);
+				vq->num_pend = 0;
+			}
+		}
 		total_len += len;
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 94701ff..47696d2 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -170,6 +170,16 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->call_ctx = NULL;
 	vq->call = NULL;
 	vq->log_ctx = NULL;
+	/* signal pending used buffers */
+	if (vq->pend) {
+		if (vq->num_pend != 0) {
+			vhost_add_used_and_signal_n(dev, vq, vq->pend,
+						    vq->num_pend);
+			vq->num_pend = 0;
+		}
+		kfree(vq->pend);
+	}
+	vq->pend = NULL;
 }
 
 static int vhost_worker(void *data)
@@ -273,7 +283,13 @@ long vhost_dev_init(struct vhost_dev *dev,
 		dev->vqs[i].heads = NULL;
 		dev->vqs[i].dev = dev;
 		mutex_init(&dev->vqs[i].mutex);
+		dev->vqs[i].num_pend = 0;
+		dev->vqs[i].pend = NULL;
 		vhost_vq_reset(dev, dev->vqs + i);
+		/* signal 3/4 of ring size used buffers */
+		dev->vqs[i].pend = kmalloc((dev->vqs[i].num -
+					   (dev->vqs[i].num >> 2)) *
+					   sizeof *dev->vqs[i].pend, GFP_KERNEL);
 		if (dev->vqs[i].handle_kick)
 			vhost_poll_init(&dev->vqs[i].poll,
 					dev->vqs[i].handle_kick, POLLIN, dev);
@@ -599,6 +615,21 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 			r = -EINVAL;
 			break;
 		}
+		if (vq->num != s.num) {
+			/* signal used buffers first */
+			if (vq->pend) {
+				if (vq->num_pend != 0) {
+					vhost_add_used_and_signal_n(vq->dev, vq,
+								    vq->pend,
+								    vq->num_pend);
+					vq->num_pend = 0;
+				}
+				kfree(vq->pend);
+			}
+			/* realloc pending used buffers size */
+			vq->pend = kmalloc((s.num - (s.num >> 2)) *
+					   sizeof *vq->pend, GFP_KERNEL);
+		}
 		vq->num = s.num;
 		break;
 	case VHOST_SET_VRING_BASE:
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 073d06a..78949c0 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -108,6 +108,9 @@ struct vhost_virtqueue {
 	/* Log write descriptors */
 	void __user *log_base;
 	struct vhost_log *log;
+	/* delay multiple used buffers to signal once */
+	int num_pend;
+	struct vring_used_elem *pend;
 };
 
 struct vhost_dev {

* Re: [PATCH] e1000e: add netpoll support for MSI/MSI-X IRQ modes
From: Jeff Kirsher @ 2010-10-27 21:12 UTC
  To: David Miller
  Cc: dongdong.deng@windriver.com, jesse@nicira.com, Allan, Bruce W,
	Duyck, Alexander H, Wyborny, Carolyn, Skidmore, Donald C,
	Rose, Gregory V, Waskiewicz Jr, Peter P, Ronciak, John,
	e1000-devel@lists.sourceforge.net, netdev@vger.kernel.org
In-Reply-To: <20101027.112025.102538977.davem@davemloft.net>

On Wed, 2010-10-27 at 11:20 -0700, David Miller wrote:
> Intel folks, I assume you guys will look at this and integrate it.
> 
> Thanks.

Correct.  I will add this to my queue.

* Re: [PATCH] tunnels: Fix tunnels change rcu protection
From: David Miller @ 2010-10-27 21:21 UTC
  To: eric.dumazet; +Cc: xemul, netdev
In-Reply-To: <1288206372.2658.13.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 27 Oct 2010 21:06:12 +0200

> On Wednesday, 27 October 2010 at 21:02 +0200, Eric Dumazet wrote:
> 
> 
>> Hmm, maybe we should allocate a "struct ip_tunnel_parm" instead of using
>> an embedded one (in struct ip_tunnel), and stick an rcu_head in it to
>> delay its freeing...
>> 
> 
> I forgot to Ack your patch, of course.
> 
> We can implement something better when net-next-2.6 re-opens.
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

* Re: [PATCH] bonding: Fix lockdep warning after bond_vlan_rx_register()
From: David Miller @ 2010-10-27 21:21 UTC
  To: jarkao2; +Cc: eric.dumazet, netdev, jesse, fubar
In-Reply-To: <20101027170822.GA1902@del.dom.local>

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Wed, 27 Oct 2010 19:08:22 +0200

> [ Full info at netdev: Wed, 27 Oct 2010 12:24:30 +0200
>   Subject: [BUG net-2.6 vlan/bonding] lockdep splats ]
> 
> Use BH variant of write_lock(&bond->lock) (as elsewhere in bond_main)
> to prevent this dependency.
> 
> Fixes commit f35188faa0fbabefac476536994f4b6f3677380f [v2.6.36]
> 
> Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
> Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Jay Vosburgh <fubar@us.ibm.com>

Applied.

* Re: [PATCH v2] ehea: Fixing statistics
From: David Miller @ 2010-10-27 21:21 UTC
  To: eric.dumazet; +Cc: leitao, netdev
In-Reply-To: <1288205610.2658.2.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 27 Oct 2010 20:53:30 +0200

> On Wednesday, 27 October 2010 at 14:45 -0400, Breno Leitao wrote:
>> (Applied over Eric's "ehea: fix use after free" patch)
>> 
>> Currently ehea stats are broken. The byte counters come from
>> the hardware, while the packet counters come from the device
>> driver. Also, the device driver counters are reset during the
>> down process, while the hardware counters aren't, causing some
>> weird numbers.
>> 
>> This patch consolidates the packet and byte counters in the
>> device driver.
>> 
>> Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
> 
> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

* Re: [RFC][net-next-2.6 PATCH v2] 8021q: set hard_header_len when VLAN offload features are toggled
From: John Fastabend @ 2010-10-27 21:40 UTC
  To: Jesse Gross; +Cc: netdev@vger.kernel.org, bhutchings@solarflare.com
In-Reply-To: <AANLkTi=9tL5yrVGWsOagcgyKte5z8R9ADdz5n-Uf2Lsw@mail.gmail.com>

On 10/26/2010 7:05 PM, Jesse Gross wrote:
> On Tue, Oct 26, 2010 at 2:59 PM, John Fastabend
> <john.r.fastabend@intel.com> wrote:
>> Toggling the vlan tx|rx hw offloads needs to set the hard_header_len
>> as well otherwise we end up using LL_RESERVED_SPACE incorrectly.
>> This results in pskb_expand_head() being used unnecessarily.
>>
>> This adds a check in vlan_transfer_features() to catch the
>> ETH_FLAG_TXVLAN flag and set the header length. This requires drivers
>> to add ETH_FLAG_TXVLAN to vlan_features.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> 
> I think this addresses all of the original problems.  However, I don't
> think that we want to have drivers claim to support vlan offloading as
> a feature for vlan packets.  That implies some type of QinQ
> functionality to me.  In addition, if the vlan device claims to
> support offloading and a second vlan device is stacked on top of it,
> then the two will clobber skb->vlan_tci.  It's probably simpler to
> just keep track of whether vlan offloading is currently enabled so we
> can find out whether it changed.
> 

Agreed. Rather than trying to be clever, this is probably the easiest approach.

--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -334,6 +334,12 @@ static void vlan_transfer_features(struct net_device *dev,
        vlandev->features &= ~dev->vlan_features;
        vlandev->features |= dev->features & dev->vlan_features;
        vlandev->gso_max_size = dev->gso_max_size;
+
+       if (dev->features & NETIF_F_HW_VLAN_TX)
+               vlandev->hard_header_len = dev->hard_header_len;
+       else
+               vlandev->hard_header_len = dev->hard_header_len + VLAN_HLEN;
+
 #if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
        vlandev->fcoe_ddp_xid = dev->fcoe_ddp_xid;
 #endif
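The revised hunk sets the VLAN device's header length from the current offload state instead of adjusting it by ±VLAN_HLEN on each toggle. Below is a userspace C sketch of the same rule. The function name is hypothetical, and a plain `bool` stands in for the `NETIF_F_HW_VLAN_TX` test on `dev->features`.

```c
#include <assert.h>
#include <stdbool.h>

#define VLAN_HLEN 4   /* 802.1Q tag length in bytes */

/*
 * With TX VLAN offload the NIC inserts the tag, so no extra headroom is
 * needed; otherwise software inserts it and needs VLAN_HLEN more bytes.
 */
unsigned short vlan_hard_header_len(unsigned short lower_hard_header_len,
                                    bool hw_vlan_tx)
{
    assert(lower_hard_header_len <= 0xffff - VLAN_HLEN);
    return hw_vlan_tx ? lower_hard_header_len
                      : (unsigned short)(lower_hard_header_len + VLAN_HLEN);
}
```

Because the result is derived from the current flags, calling it repeatedly cannot drift. The earlier toggle-based version needed `old_features` to get this right. For an Ethernet lower device (14-byte header), the result is 14 with offload and 18 without.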


>> ---
>>
>>  net/8021q/vlan.c |   10 ++++++++++
>>  1 files changed, 10 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>> index 05b867e..825011b 100644
>> --- a/net/8021q/vlan.c
>> +++ b/net/8021q/vlan.c
>> @@ -334,6 +334,16 @@ static void vlan_transfer_features(struct net_device *dev,
>>        vlandev->features &= ~dev->vlan_features;
>>        vlandev->features |= dev->features & dev->vlan_features;
>>        vlandev->gso_max_size = dev->gso_max_size;
>> +
>> +       /* is ETH_FLAGS_TXVLAN being toggled */
>> +       if ((vlandev->features & ETH_FLAG_TXVLAN) ^
>> +           (old_features & ETH_FLAG_TXVLAN)) {
>> +               if (vlandev->features & ETH_FLAG_TXVLAN)
>> +                       vlandev->hard_header_len -= VLAN_HLEN;
>> +               else
>> +                       vlandev->hard_header_len += VLAN_HLEN;
>> +       }
> 
> The correct flag for dev->features is NETIF_F_HW_VLAN_TX.
> ETH_FLAGS_TXVLAN is an ethtool construct (that happens to have the
> same value).
> 
> Thanks.


* [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
From: Shirley Ma @ 2010-10-27 21:58 UTC
  To: mst@redhat.com; +Cc: David Miller, netdev, kvm, linux-kernel

This patch changes vhost TX used-buffer guest signaling from one signal
per buffer to one signal per 3/4 of the vring size. This improves vhost
TX bandwidth and CPU utilization for message sizes from 256 bytes to 8K
without inducing any regression.


Signed-off-by: Shirley Ma <xma@us.ibm.com>
---
 drivers/vhost/net.c   |   20 +++++++++++++++++++-
 drivers/vhost/vhost.c |   31 +++++++++++++++++++++++++++++++
 drivers/vhost/vhost.h |    3 +++
 3 files changed, 53 insertions(+), 1 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 4b4da5b..45e07cd 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -128,6 +128,7 @@ static void handle_tx(struct vhost_net *net)
 	int err, wmem;
 	size_t hdr_size;
 	struct socket *sock;
+	int max_pend = vq->num - (vq->num >> 2); 
 
 	sock = rcu_dereference_check(vq->private_data,
 				     lockdep_is_held(&vq->mutex));
@@ -198,7 +199,24 @@ static void handle_tx(struct vhost_net *net)
 		if (err != len)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
-		vhost_add_used_and_signal(&net->dev, vq, head, 0);
+		/*
+		 * If the pending array could not be allocated, signal used
+		 * buffers one by one; otherwise, signal them once 3/4 of the
+		 * ring size is pending, to reduce CPU utilization.
+		 */
+		if (unlikely(!vq->pend))
+			vhost_add_used_and_signal(&net->dev, vq, head, 0);
+		else {
+			vq->pend[vq->num_pend].id = head;
+			vq->pend[vq->num_pend].len = 0;
+			++vq->num_pend;
+			if (vq->num_pend == max_pend) {
+				vhost_add_used_and_signal_n(&net->dev, vq,
+							    vq->pend,
+							    vq->num_pend);
+				vq->num_pend = 0;
+			}
+		}
 		total_len += len;
 		if (unlikely(total_len >= VHOST_NET_WEIGHT)) {
 			vhost_poll_queue(&vq->poll);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 94701ff..9486a25 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -170,6 +170,16 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->call_ctx = NULL;
 	vq->call = NULL;
 	vq->log_ctx = NULL;
+	/* signal pending used buffers */
+	if (vq->pend) {
+		if (vq->num_pend != 0) {
+			vhost_add_used_and_signal_n(dev, vq, vq->pend,
+						    vq->num_pend);
+			vq->num_pend = 0;
+		}
+		kfree(vq->pend);
+	}
+	vq->pend = NULL;
 }
 
 static int vhost_worker(void *data)
@@ -273,7 +283,13 @@ long vhost_dev_init(struct vhost_dev *dev,
 		dev->vqs[i].heads = NULL;
 		dev->vqs[i].dev = dev;
 		mutex_init(&dev->vqs[i].mutex);
+		dev->vqs[i].num_pend = 0;
+		dev->vqs[i].pend = NULL;
 		vhost_vq_reset(dev, dev->vqs + i);
+		/* signal 3/4 of ring size used buffers */
+		dev->vqs[i].pend = kmalloc((dev->vqs[i].num -
+					   (dev->vqs[i].num >> 2)) *
+					   sizeof *dev->vqs[i].pend, GFP_KERNEL);
 		if (dev->vqs[i].handle_kick)
 			vhost_poll_init(&dev->vqs[i].poll,
 					dev->vqs[i].handle_kick, POLLIN, dev);
@@ -599,6 +615,21 @@ static long vhost_set_vring(struct vhost_dev *d, int ioctl, void __user *argp)
 			r = -EINVAL;
 			break;
 		}
+		if (vq->num != s.num) {
+			/* signal used buffers first */
+			if (vq->pend) {
+				if (vq->num_pend != 0) {
+					vhost_add_used_and_signal_n(vq->dev, vq,
+								    vq->pend,
+								    vq->num_pend);
+					vq->num_pend = 0;
+				}
+				kfree(vq->pend);
+			}
+			/* realloc pending used buffers size */
+			vq->pend = kmalloc((s.num - (s.num >> 2)) *
+					   sizeof *vq->pend, GFP_KERNEL);
+		}
 		vq->num = s.num;
 		break;
 	case VHOST_SET_VRING_BASE:
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 073d06a..78949c0 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -108,6 +108,9 @@ struct vhost_virtqueue {
 	/* Log write descriptors */
 	void __user *log_base;
 	struct vhost_log *log;
+	/* delay multiple used buffers to signal once */
+	int num_pend;
+	struct vring_used_elem *pend;
 };
 
 struct vhost_dev {
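The accumulate-and-flush idea in this patch can be modelled in userspace C. This is a sketch under stated assumptions:

- `struct used_elem` stands in for `struct vring_used_elem`.
- `signal_n()` stands in for `vhost_add_used_and_signal_n()` and only counts notifications.
- All other names are hypothetical.

The sketch takes the one-by-one path when the pending array is NULL. It also includes an explicit `batch_flush()`. In the hunks above, pending entries are flushed only when the threshold is reached, on reset, and on ring resize.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct vring_used_elem: used head id and written length. */
struct used_elem {
    unsigned int id;
    unsigned int len;
};

/* Hypothetical accumulator modelled on vq->pend / vq->num_pend. */
struct pend_batch {
    struct used_elem *pend;   /* NULL: no array, signal one by one */
    unsigned int num_pend;    /* entries currently pending */
    unsigned int max_pend;    /* flush threshold, 3/4 of the ring */
    unsigned int signals;     /* guest notifications sent so far */
};

/* Stand-in for vhost_add_used_and_signal_n(): n entries, one notification. */
static void signal_n(struct pend_batch *b, const struct used_elem *e,
                     unsigned int n)
{
    (void)e;
    (void)n;
    b->signals++;
}

/* Returns 0 on success, -1 if the pending array could not be allocated. */
int batch_init(struct pend_batch *b, unsigned int ring_num)
{
    assert(ring_num > 0);
    b->max_pend = ring_num - (ring_num >> 2);
    b->num_pend = 0;
    b->signals = 0;
    b->pend = malloc(b->max_pend * sizeof *b->pend);
    return b->pend != NULL ? 0 : -1;
}

/* Record one completed TX head; notify once the batch is full. */
void batch_add(struct pend_batch *b, unsigned int head)
{
    if (b->pend == NULL) {
        struct used_elem e = { head, 0 };
        signal_n(b, &e, 1);
        return;
    }
    b->pend[b->num_pend].id = head;
    b->pend[b->num_pend].len = 0;
    if (++b->num_pend == b->max_pend) {
        signal_n(b, b->pend, b->num_pend);
        b->num_pend = 0;
    }
}

/* Notify for anything still pending, e.g. when the TX ring goes idle. */
void batch_flush(struct pend_batch *b)
{
    if (b->pend != NULL && b->num_pend != 0) {
        signal_n(b, b->pend, b->num_pend);
        b->num_pend = 0;
    }
}

/* Flush, then release the array (compare the vhost_vq_reset() hunk). */
void batch_destroy(struct pend_batch *b)
{
    batch_flush(b);
    free(b->pend);
    b->pend = NULL;
}
```

A caller would presumably invoke `batch_flush()` when the TX loop finds no more work. Otherwise, fewer than `max_pend` completions could stay unsignaled until the next reset.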



* IPV6 raw socket denies bind(2)
From: Jan Engelhardt @ 2010-10-27 22:01 UTC
  To: netdev; +Cc: David S. Miller, Eric Dumazet

[Doublepost: linux-netdev does not exist. Sigh. "netdev" is totally 
nonstandard to all the other linux-* list names. :-?]

Hi,

I was trying out raw sockets and stumbled into a case where I cannot
call bind(2) on an AF_INET6, SOCK_RAW socket. Apparently, I am triggering
this particular code path that means absolutely nothing to me:

# ./rawtest

kernel pseudo callgraph:
static int rawv6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
addr_len >= SIN6_LEN_RFC2133
addr_type != IPV6_ADDR_MAPPED
sk->sk_state == TCP_CLOSE
addr_type != IPV6_ADDR_ANY
!(addr_type & IPV6_ADDR_LINKLOCAL)
!(addr_type & IPV6_ADDR_MULTICAST)
dev == NULL
ipv6_chk_addr does not have any addresses to loop through (wtf? checked with printk.)
=> going -EADDRNOTAVAIL

At this point I have no idea why ipv6_chk_addr does not run through 
its loop. No devices in the hash bucket or something. Happens in 
2.6.36. I hope somebody can shed some light here.

---Userspace testcase---
#include <sys/socket.h>
#include <stdio.h>
#include <string.h>
#include <netinet/udp.h>
#include <netinet/in.h>
#include <netinet/ip6.h>
#include <arpa/inet.h>
#include <stdlib.h>

int main(void)
{
	struct sockaddr_in6 src = {};
	int sk;

	sk = socket(AF_INET6, SOCK_RAW, IPPROTO_UDP);
	memset(&src, 0, sizeof(src));
	inet_pton(AF_INET6, "::1", &src);
	src.sin6_family = AF_INET6;

	if (bind(sk, (void *)&src, sizeof(src)) < 0) {
		perror("bind");
		abort();
	}
	return 0;
}
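One thing worth checking in the testcase is that inet_pton() is passed `&src` rather than `&src.sin6_addr`. As a result, the 16 address bytes land at the start of struct sockaddr_in6, over sin6_family, sin6_port, sin6_flowinfo and the first half of sin6_addr. On Linux this leaves sin6_addr as 0:0:0:1:: rather than ::1. That address is not configured locally, which would be consistent with ipv6_chk_addr() finding nothing and bind() returning EADDRNOTAVAIL. The C sketch below contrasts the two calls; both helper names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill `sa` with ::1, writing the address into sin6_addr. */
int fill_loopback(struct sockaddr_in6 *sa)
{
    assert(sa != NULL);
    memset(sa, 0, sizeof(*sa));
    sa->sin6_family = AF_INET6;
    return inet_pton(AF_INET6, "::1", &sa->sin6_addr) == 1 ? 0 : -1;
}

/* Reproduces the testcase's call: the address bytes start at offset 0. */
int fill_like_testcase(struct sockaddr_in6 *sa)
{
    assert(sa != NULL);
    memset(sa, 0, sizeof(*sa));
    if (inet_pton(AF_INET6, "::1", sa) != 1)
        return -1;
    sa->sin6_family = AF_INET6;
    return 0;
}
```

With the first helper, `bind(sk, (struct sockaddr *)&src, sizeof(src))` would be given the loopback address. With the second, an IN6_IS_ADDR_LOOPBACK() check on sin6_addr fails.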

* [GIT] Networking
From: David Miller @ 2010-10-27 22:05 UTC
  To: torvalds; +Cc: akpm, netdev, linux-kernel


Hey, I'll resolve the verify_iovec() issue this evening so that we can
wrap that sucker up.  But for now here's some fallout fixing changes
as well as some other misc stuff:

1) dev_can_checksum() doesn't handle nested VLANs properly; also,
   generic checksum capability does not imply FCoE checksumming.
   Both from Ben Hutchings.

2) typhoon driver fails to wait for RX mode commands to finish,
   also use new VLAN accel interfaces.  From David Dillow.

3) Tunnel transmit recursion limit too low, increase to 10.

4) Fix tms380tr build failure on x86-64 due to too large udelay().

5) ipv6 TPROXY needs to check CAP_NET_ADMIN just like ipv4, from
   Balazs Scheidler.

6) Add caif-u5500 driver, from Amarnath Revanna.

7) Add tscan1 CAN driver, from Andre B. Oliveira.

8) Missed MTU updates in ip6_tunnel, from Anders Franzen.

9) Lots of missing __rcu annotations, from Eric Dumazet.

10) Fix bonding lockdep spew, from Jarek Poplawski.

11) Fix slhc double-export of symbol, from Denis Kirjanov.

12) cxgb4 too-early-queue access crash fix from Dimitris Michailidis,
    also use new VLAN accel interfaces.

13) mlx4_en out-of-bounds array access fix from Eli Cohen.

14) IPv6 temporary address handling fixes from Glenn Wurster.

15) Fix RX crashes in gianfar, from Jarek Poplawski.

16) Fix ipv6 defrag dependencies with ip6tables and tproxy, from KOVACS Krisztian.

17) Missing CONFIG_SYSCTL checks in ipv6 reasm netfilter code.

18) Tunnel RCU protection fix from Pavel Emelyanov.

19) be2net bug fixes from Somnath Kotur (calling netif_carrier_off() too
    early, UDP packet handling, and worker thread destruction and
    scheduling bugs).

20) qlcnic can use invalid VLAN ids, from Sony Chacko.

21) Accidental exporting of static functions in l2tp, from Stephen Rothwell.

22) Toss lazy workqueue from connector, from Tejun Heo.

23) Final function staticization round from Stephen Hemminger.

Please pull, thanks a lot!

The following changes since commit 12ba8d1e9262ce81a695795410bd9ee5c9407ba1:

  fix braino in fs: do not assign default i_ino in new_inode (2010-10-26 20:25:45 -0700)

are available in the git repository at:
  master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.git master

Amarnath Revanna (3):
      caif-u5500: Adding shared memory include
      caif-u5500: CAIF shared memory mailbox interface
      caif-u5500: Build config for CAIF shared mem driver

Anders Franzen (1):
      ip6_tunnel dont update the mtu on the route.

Andre B. Oliveira (1):
      can: tscan1: add driver for TS-CAN1 boards

Balazs Scheidler (1):
      tproxy: Add missing CAP_NET_ADMIN check to ipv6 side

Ben Greear (2):
      ath9k: Properly initialize ath_common->cc_lock.
      ath5k: Properly initialize ath_common->cc_lock.

Ben Hutchings (2):
      net: Fix some corner cases in dev_can_checksum()
      net: NETIF_F_HW_CSUM does not imply FCoE CRC offload

Breno Leitao (1):
      ehea: Fixing statistics

Christian Lamparter (4):
      carl9170: fix async command buffer leak
      mac80211: don't sanitize invalid rates
      carl9170: fix memory leak issue in async cmd macro wrappers
      carl9170: fix scheduling while atomic

David Dillow (2):
      typhoon: wait for RX mode commands to finish
      typhoon: update to new VLAN acceleration model

David S. Miller (4):
      net: Increase xmit RECURSION_LIMIT to 10.
      tms380tr: Use mdelay() in tms380tr_wait().
      netfilter: Add missing CONFIG_SYSCTL checks in ipv6's nf_conntrack_reasm.c
      Merge branch 'master' of git://git.kernel.org/.../linville/wireless-2.6

Denis Kirjanov (1):
      slhc: Don't export symbols twice

Dimitris Michailidis (2):
      cxgb4: fix crash due to manipulating queues before registration
      cxgb4: update to utilize the newer VLAN infrastructure

Divy Le Ray (1):
      cxgb3: fix device opening error path

Don Fry (1):
      iwlwifi: quiet a noisy printk

Eli Cohen (1):
      mlx4_en: Fix out of bounds array access

Eric Dumazet (17):
      netlink: fix netlink_change_ngroups()
      vlan: rcu annotations
      ipv6: ip6_ptr rcu annotations
      net/802: add __rcu annotations
      tunnels: add _rcu annotations
      rps: add __rcu annotations
      net_ns: add __rcu annotations
      net: add __rcu annotation to sk_filter
      ipv4: add __rcu annotations to ip_ra_chain
      fib: fix fib_nl_newrule()
      fib_hash: fix rcu sparse and logical errors
      ipv4: add __rcu annotations to routes.c
      net: add __rcu annotations to protocol
      tunnels: add __rcu annotations
      fib_rules: __rcu annotates ctarget
      inetpeer: __rcu annotations
      ehea: fix use after free

Felix Fietkau (3):
      ath9k: fix crash in ath_update_survey_stats
      ath9k: fix handling of rate control probe frames
      ath9k: resume aggregation immediately after a hardware reset

Glenn Wurster (2):
      IPv6: Create temporary address if none exists.
      IPv6: Temp addresses are immediately deleted.

Grazvydas Ignotas (1):
      wl1251: fix module names

Guo-Fu Tseng (1):
      jme: Support WoL after shutdown

Harvey Harrison (3):
      vmxnet3: remove set_flag_le{16,64} helpers
      vmxnet3: annotate hwaddr members as __iomem pointers
      vmxnet3: fix typo setting confPA

Jarek Poplawski (2):
      gianfar: Fix crashes on RX path (Was Re: [Bugme-new] [Bug 19692] New: linux-2.6.36-rc5 crash with gianfar ethernet at full line rate traffic)
      bonding: Fix lockdep warning after bond_vlan_rx_register()

Joe Perches (1):
      drivers/atm/eni.c: Remove multiple uses of KERN_<level>

Joshua Hoke (1):
      macb: Don't re-enable interrupts while in polling mode

Julia Lawall (3):
      drivers/net/sb1000.c: delete double assignment
      drivers/net/typhoon.c: delete double assignment
      drivers/isdn: delete double assignment

KOVACS Krisztian (1):
      netfilter: fix module dependency issues with IPv6 defragmentation, ip6tables and xt_TPROXY

Luis R. Rodriguez (2):
      cfg80211: fix regression on processing country IEs
      ath9k_hw: Fix TX carrier leakage for IEEE compliance on AR9003 2.2

Marc Kleine-Budde (12):
      can: at91_can: use correct bit to enable CAN_CTRLMODE_3_SAMPLES
      can: at91_can: fix reception of extended frames
      can: at91_can: fix use after free of priv
      can: at91_can: fix compiler warning in at91_irq_err_state
      can: at91_can: fix section mismatch warning
      can: at91_can: implement and use at91_get_berr_counter
      can: at91_can: set bittiming in chip_start
      can: at91_can: convert readl, writel their __raw pendants
      can: at91_can: convert dev_<level> printing to netdev_<level>
      can: at91_can: add KBUILD_MODNAME to bittiming constant
      can: flexcan: fix use after free of priv
      can: mcp251x: fix reception of standard RTR frames

Masayuki Ohtake (1):
      can: Topcliff: Add PCH_CAN driver.

Nicolas Kaiser (1):
      drivers/net: sgiseeq: fix return on error

Paul Gortmaker (1):
      pktgen: clean up handling of local/transient counter vars

Pavel Emelyanov (1):
      tunnels: Fix tunnels change rcu protection

Rafael J. Wysocki (1):
      tg3: Do not call device_set_wakeup_enable() under spin_lock_bh

Rafał Miłecki (1):
      b43: N-PHY: fix infinite-loop-typo

Rajkumar Manoharan (1):
      mac80211: Fix ibss station got expired immediately

Randy Dunlap (1):
      pch_can: depends on PCI

Ron Mercer (1):
      qlge: bugfix: Restoring the vlan setting.

Senthil Balasubramanian (1):
      ath9k_hw: Fix divide by zero cases in paprd.

Somnath Kotur (3):
      be2net: Call netif_carier_off() after register_netdev()
      be2net: Fix CSO for UDP packets
      be2net: Schedule/Destroy worker thread in probe()/remove() rather than open()/close()

Sony Chacko (2):
      qlcnic: reduce rx ring size
      qlcnic: define valid vlan id range

Stephen Rothwell (1):
      l2tp: static functions should not be exported

Tejun Heo (2):
      connector: remove lazy workqueue creation
      mac80211: cancel restart_work explicitly instead of depending on flush_scheduled_work()

Ursula Braun (1):
      ipv6: fix refcnt problem related to POSTDAD state

amit salecha (1):
      qlcnic: fix mac learning

sjur.brandeland@stericsson.com (1):
      caif-u5500: CAIF shared memory transport protocol

stephen hemminger (12):
      mlx4: make functions local and remove dead code.
      l2tp: make local function static
      benet: remove dead code
      benet: make be_poll_rx local
      atl1c: make functions static
      atlx: make local functions/data static
      phylib: make local function static
      vxge: make functions local and remove dead code
      qlge: make local functions static
      qlge: disable unsed dump code
      bnx2x: make local function static and remove dead code
      e1000: make e1000_reinit_safe local

 Documentation/networking/phy.txt                   |   18 -
 drivers/atm/eni.c                                  |    7 +-
 drivers/connector/cn_queue.c                       |   75 +-
 drivers/connector/connector.c                      |    9 +-
 drivers/isdn/hardware/mISDN/mISDNinfineon.c        |    2 +-
 drivers/isdn/hisax/l3_1tr6.c                       |    6 +-
 drivers/net/atl1c/atl1c.h                          |    2 -
 drivers/net/atl1c/atl1c_main.c                     |    6 +-
 drivers/net/atlx/atl1.c                            |   12 +-
 drivers/net/atlx/atl1.h                            |    9 +-
 drivers/net/atlx/atlx.c                            |    4 +
 drivers/net/benet/be_cmds.c                        |   36 -
 drivers/net/benet/be_cmds.h                        |    2 -
 drivers/net/benet/be_main.c                        |   49 +-
 drivers/net/bnx2x/bnx2x.h                          |    5 -
 drivers/net/bnx2x/bnx2x_cmn.c                      |    3 +-
 drivers/net/bnx2x/bnx2x_cmn.h                      |   55 -
 drivers/net/bnx2x/bnx2x_init_ops.h                 |   34 +-
 drivers/net/bnx2x/bnx2x_link.c                     |  137 +--
 drivers/net/bnx2x/bnx2x_link.h                     |   15 -
 drivers/net/bnx2x/bnx2x_main.c                     |   55 +-
 drivers/net/bonding/bond_main.c                    |    4 +-
 drivers/net/caif/Kconfig                           |    7 +
 drivers/net/caif/Makefile                          |    4 +
 drivers/net/caif/caif_shm_u5500.c                  |  129 ++
 drivers/net/caif/caif_shmcore.c                    |  744 ++++++++++
 drivers/net/can/Kconfig                            |    8 +
 drivers/net/can/Makefile                           |    1 +
 drivers/net/can/at91_can.c                         |   95 +-
 drivers/net/can/flexcan.c                          |    3 +-
 drivers/net/can/mcp251x.c                          |    3 +
 drivers/net/can/pch_can.c                          | 1463 ++++++++++++++++++++
 drivers/net/can/sja1000/Kconfig                    |   12 +
 drivers/net/can/sja1000/Makefile                   |    1 +
 drivers/net/can/sja1000/tscan1.c                   |  216 +++
 drivers/net/cxgb3/cxgb3_main.c                     |    8 +-
 drivers/net/cxgb4/cxgb4.h                          |    1 -
 drivers/net/cxgb4/cxgb4_main.c                     |   33 +-
 drivers/net/cxgb4/sge.c                            |   23 +-
 drivers/net/e1000/e1000_main.c                     |    2 +-
 drivers/net/ehea/ehea.h                            |    2 +
 drivers/net/ehea/ehea_main.c                       |   42 +-
 drivers/net/gianfar.c                              |    6 +-
 drivers/net/jme.c                                  |   45 +-
 drivers/net/macb.c                                 |   27 +-
 drivers/net/mlx4/icm.c                             |   28 +-
 drivers/net/mlx4/icm.h                             |    2 -
 drivers/net/mlx4/port.c                            |   11 +
 drivers/net/phy/phy.c                              |   13 +-
 drivers/net/phy/phy_device.c                       |   19 +-
 drivers/net/qlcnic/qlcnic.h                        |    7 +-
 drivers/net/qlcnic/qlcnic_ethtool.c                |   23 +-
 drivers/net/qlcnic/qlcnic_main.c                   |   19 +-
 drivers/net/qlge/qlge.h                            |   12 +-
 drivers/net/qlge/qlge_main.c                       |   24 +-
 drivers/net/qlge/qlge_mpi.c                        |    6 +-
 drivers/net/sb1000.c                               |    6 +-
 drivers/net/sgiseeq.c                              |    2 +-
 drivers/net/slhc.c                                 |   15 +-
 drivers/net/tg3.c                                  |   10 +-
 drivers/net/tokenring/tms380tr.c                   |    2 +-
 drivers/net/typhoon.c                              |   92 +-
 drivers/net/vmxnet3/upt1_defs.h                    |    8 +-
 drivers/net/vmxnet3/vmxnet3_defs.h                 |    6 +-
 drivers/net/vmxnet3/vmxnet3_drv.c                  |   22 +-
 drivers/net/vmxnet3/vmxnet3_ethtool.c              |   14 +-
 drivers/net/vmxnet3/vmxnet3_int.h                  |   19 +-
 drivers/net/vxge/vxge-config.c                     |  332 ++++--
 drivers/net/vxge/vxge-config.h                     |  227 +---
 drivers/net/vxge/vxge-ethtool.c                    |    2 +-
 drivers/net/vxge/vxge-main.c                       |   64 +-
 drivers/net/vxge/vxge-main.h                       |   59 +-
 drivers/net/vxge/vxge-traffic.c                    |  101 +--
 drivers/net/vxge/vxge-traffic.h                    |  134 --
 drivers/net/wireless/ath/ath5k/base.c              |    1 +
 .../net/wireless/ath/ath9k/ar9003_2p2_initvals.h   |  191 ++-
 drivers/net/wireless/ath/ath9k/ar9003_paprd.c      |   14 +-
 drivers/net/wireless/ath/ath9k/beacon.c            |    2 +-
 drivers/net/wireless/ath/ath9k/init.c              |    1 +
 drivers/net/wireless/ath/ath9k/main.c              |    7 +-
 drivers/net/wireless/ath/ath9k/xmit.c              |    8 +-
 drivers/net/wireless/ath/carl9170/cmd.h            |   51 +-
 drivers/net/wireless/ath/carl9170/main.c           |    2 +-
 drivers/net/wireless/ath/carl9170/usb.c            |   25 +-
 drivers/net/wireless/b43/phy_n.c                   |    2 +-
 drivers/net/wireless/iwlwifi/iwl-agn-tx.c          |    3 +-
 drivers/net/wireless/wl1251/Makefile               |    8 +-
 include/linux/connector.h                          |    8 -
 include/linux/netdevice.h                          |   18 +-
 include/linux/phy.h                                |   12 -
 include/net/caif/caif_shm.h                        |   26 +
 include/net/dst.h                                  |    2 +-
 include/net/fib_rules.h                            |    2 +-
 include/net/garp.h                                 |    2 +-
 include/net/inetpeer.h                             |    2 +-
 include/net/ip.h                                   |    4 +-
 include/net/ip6_tunnel.h                           |    2 +-
 include/net/ipip.h                                 |    6 +-
 include/net/net_namespace.h                        |    2 +-
 include/net/protocol.h                             |    4 +-
 include/net/sock.h                                 |    2 +-
 include/net/xfrm.h                                 |    4 +-
 net/802/garp.c                                     |   18 +-
 net/802/stp.c                                      |    4 +-
 net/8021q/vlan.c                                   |    6 +-
 net/core/dev.c                                     |   38 +-
 net/core/fib_rules.c                               |   21 +-
 net/core/filter.c                                  |    4 +-
 net/core/net-sysfs.c                               |   20 +-
 net/core/net_namespace.c                           |    4 +-
 net/core/pktgen.c                                  |   30 +-
 net/core/sock.c                                    |    2 +-
 net/core/sysctl_net_core.c                         |    3 +-
 net/ipv4/fib_hash.c                                |   36 +-
 net/ipv4/gre.c                                     |    5 +-
 net/ipv4/inetpeer.c                                |  138 ++-
 net/ipv4/ip_gre.c                                  |    1 +
 net/ipv4/ip_sockglue.c                             |   10 +-
 net/ipv4/ipip.c                                    |    1 +
 net/ipv4/protocol.c                                |    8 +-
 net/ipv4/route.c                                   |   75 +-
 net/ipv4/tunnel4.c                                 |   29 +-
 net/ipv4/udp.c                                     |    2 +-
 net/ipv6/addrconf.c                                |   16 +-
 net/ipv6/ip6_tunnel.c                              |    2 +
 net/ipv6/ipv6_sockglue.c                           |    4 +
 net/ipv6/netfilter/Kconfig                         |    5 +
 net/ipv6/netfilter/Makefile                        |    5 +-
 net/ipv6/netfilter/nf_conntrack_reasm.c            |    5 +-
 net/ipv6/protocol.c                                |    8 +-
 net/ipv6/raw.c                                     |    2 +-
 net/ipv6/sit.c                                     |    1 +
 net/ipv6/tunnel6.c                                 |   24 +-
 net/ipv6/udp.c                                     |    2 +-
 net/l2tp/l2tp_core.c                               |   53 +-
 net/l2tp/l2tp_core.h                               |   33 -
 net/l2tp/l2tp_ip.c                                 |    2 +-
 net/mac80211/ibss.c                                |    1 +
 net/mac80211/main.c                                |    8 +-
 net/mac80211/rate.c                                |    3 +
 net/netfilter/Kconfig                              |    2 +
 net/netfilter/xt_TPROXY.c                          |   10 +-
 net/netfilter/xt_socket.c                          |   12 +-
 net/netlink/af_netlink.c                           |   65 +-
 net/wireless/reg.c                                 |    2 +-
 145 files changed, 4008 insertions(+), 1822 deletions(-)
 create mode 100644 drivers/net/caif/caif_shm_u5500.c
 create mode 100644 drivers/net/caif/caif_shmcore.c
 create mode 100644 drivers/net/can/pch_can.c
 create mode 100644 drivers/net/can/sja1000/tscan1.c
 create mode 100644 include/net/caif/caif_shm.h

^ permalink raw reply

* [patch] fix stack overflow in pktgen_if_write()
From: Dan Carpenter @ 2010-10-27 22:12 UTC (permalink / raw)
  To: nelhage
  Cc: Eric Dumazet, David S. Miller, Robert Olsson, Andy Shevchenko,
	netdev
In-Reply-To: <1288206788-21063-1-git-send-email-nelhage@ksplice.com>

Nelson Elhage says he was able to oops both amd64 and i386 test 
machines with 8k writes to the pktgen file.  Let's just allocate the
buffer on the heap instead of on the stack.

This can only be triggered by root so there are no security issues here.

Reported-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: Dan Carpenter <error27@gmail.com>
---
I saw this on twitter.  Hi Nelson, could you test this?

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 2c0df0f..b5d3c70 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -887,12 +887,14 @@ static ssize_t pktgen_if_write(struct file *file,
 	i += len;
 
 	if (debug) {
-		char tb[count + 1];
-		if (copy_from_user(tb, user_buffer, count))
-			return -EFAULT;
-		tb[count] = 0;
+		char *tb;
+
+		tb = strndup_user(user_buffer, count + 1);
+		if (IS_ERR(tb))
+			return PTR_ERR(tb);
 		printk(KERN_DEBUG "pktgen: %s,%lu  buffer -:%s:-\n", name,
 		       (unsigned long)count, tb);
+		kfree(tb);
 	}
 
 	if (!strcmp(name, "min_pkt_size")) {

^ permalink raw reply related

* RE: [PATCH net-next] ixgbe: fix stats handling
From: Tantilov, Emil S @ 2010-10-27 22:35 UTC (permalink / raw)
  To: Eric Dumazet, David Miller, Waskiewicz Jr, Peter P,
	Kirsher, Jeffrey T
  Cc: netdev
In-Reply-To: <1286799439.2737.21.camel@edumazet-laptop>

>-----Original Message-----
>From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>Sent: Monday, October 11, 2010 5:17 AM
>To: David Miller; Waskiewicz Jr, Peter P; Tantilov, Emil S; Kirsher, Jeffrey T
>Cc: netdev
>Subject: [PATCH net-next] ixgbe: fix stats handling
>
>Hi
>
>I am sending this patch for Intel people review/test and acknowledge.
>
>Thanks !
>
>[PATCH net-next] ixgbe: fix stats handling
>
>Current ixgbe stats have the following problems:
>
>- Not 64 bit safe (on 32bit arches)
>
>- Not safe in ixgbe_clean_rx_irq() :
>   All CPUs dirty a common location (netdev->stats.rx_bytes &
>   netdev->stats.rx_packets) without proper synchronization.
>   This slows down multiqueue operations a bit and can miss some
>   updates.
>
>Fixes :
>
>Implement ndo_get_stats64() method to provide accurate 64bit rx|tx
>bytes/packets counters, using 64bit safe infrastructure.
>
>ixgbe_get_ethtool_stats() also uses this infrastructure to provide 64bit
>safe counters.
>
>Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
>CC: Peter Waskiewicz <peter.p.waskiewicz.jr@intel.com>
>CC: Emil Tantilov <emil.s.tantilov@intel.com>
>CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>---
> drivers/net/ixgbe/ixgbe.h         |    3 +-
> drivers/net/ixgbe/ixgbe_ethtool.c |   29 +++++++++++---------
> drivers/net/ixgbe/ixgbe_main.c    |   40 +++++++++++++++++++++++++---
> 3 files changed, 56 insertions(+), 16 deletions(-)
>
>diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
>index a8c47b0..944d9e2 100644
>--- a/drivers/net/ixgbe/ixgbe.h
>+++ b/drivers/net/ixgbe/ixgbe.h
>@@ -180,8 +180,9 @@ struct ixgbe_ring {
> 					 */
>
> 	struct ixgbe_queue_stats stats;
>-	unsigned long reinit_state;
>+	struct u64_stats_sync syncp;
> 	int numa_node;
>+	unsigned long reinit_state;
> 	u64 rsc_count;			/* stat for coalesced packets */
> 	u64 rsc_flush;			/* stats for flushed packets */
> 	u32 restart_queue;		/* track tx queue restarts */
>diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c b/drivers/net/ixgbe/ixgbe_ethtool.c
>index d4ac943..3c7f15d 100644
>--- a/drivers/net/ixgbe/ixgbe_ethtool.c
>+++ b/drivers/net/ixgbe/ixgbe_ethtool.c
>@@ -999,12 +999,11 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
>                                     struct ethtool_stats *stats, u64 *data)
> {
> 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
>-	u64 *queue_stat;
>-	int stat_count = sizeof(struct ixgbe_queue_stats) / sizeof(u64);
> 	struct rtnl_link_stats64 temp;
> 	const struct rtnl_link_stats64 *net_stats;
>-	int j, k;
>-	int i;
>+	unsigned int start;
>+	struct ixgbe_ring *ring;
>+	int i, j;
> 	char *p = NULL;
>
> 	ixgbe_update_stats(adapter);
>@@ -1025,16 +1024,22 @@ static void ixgbe_get_ethtool_stats(struct net_device *netdev,
> 		           sizeof(u64)) ? *(u64 *)p : *(u32 *)p;
> 	}
> 	for (j = 0; j < adapter->num_tx_queues; j++) {
>-		queue_stat = (u64 *)&adapter->tx_ring[j]->stats;
>-		for (k = 0; k < stat_count; k++)
>-			data[i + k] = queue_stat[k];
>-		i += k;
>+		ring = adapter->tx_ring[j];
>+		do {
>+			start = u64_stats_fetch_begin_bh(&ring->syncp);
>+			data[i]   = ring->stats.packets;
>+			data[i+1] = ring->stats.bytes;
>+		} while (u64_stats_fetch_retry_bh(&ring->syncp, start));
>+		i += 2;
> 	}
> 	for (j = 0; j < adapter->num_rx_queues; j++) {
>-		queue_stat = (u64 *)&adapter->rx_ring[j]->stats;
>-		for (k = 0; k < stat_count; k++)
>-			data[i + k] = queue_stat[k];
>-		i += k;
>+		ring = adapter->rx_ring[j];
>+		do {
>+			start = u64_stats_fetch_begin_bh(&ring->syncp);
>+			data[i]   = ring->stats.packets;
>+			data[i+1] = ring->stats.bytes;
>+		} while (u64_stats_fetch_retry_bh(&ring->syncp, start));
>+		i += 2;
> 	}
> 	if (adapter->flags & IXGBE_FLAG_DCB_ENABLED) {
> 		for (j = 0; j < MAX_TX_PACKET_BUFFERS; j++) {
>diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
>index 95dbf60..1efbcde 100644
>--- a/drivers/net/ixgbe/ixgbe_main.c
>+++ b/drivers/net/ixgbe/ixgbe_main.c
>@@ -824,8 +824,10 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>
> 	tx_ring->total_bytes += total_bytes;
> 	tx_ring->total_packets += total_packets;
>+	u64_stats_update_begin(&tx_ring->syncp);
> 	tx_ring->stats.packets += total_packets;
> 	tx_ring->stats.bytes += total_bytes;
>+	u64_stats_update_end(&tx_ring->syncp);
> 	return count < tx_ring->work_limit;
> }
>
>@@ -1172,7 +1174,6 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
> 			       int *work_done, int work_to_do)
> {
> 	struct ixgbe_adapter *adapter = q_vector->adapter;
>-	struct net_device *netdev = adapter->netdev;
> 	struct pci_dev *pdev = adapter->pdev;
> 	union ixgbe_adv_rx_desc *rx_desc, *next_rxd;
> 	struct ixgbe_rx_buffer *rx_buffer_info, *next_buffer;
>@@ -1298,8 +1299,10 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
> 					rx_ring->rsc_count++;
> 				rx_ring->rsc_flush++;
> 			}
>+			u64_stats_update_begin(&rx_ring->syncp);
> 			rx_ring->stats.packets++;
> 			rx_ring->stats.bytes += skb->len;
>+			u64_stats_update_end(&rx_ring->syncp);
> 		} else {
> 			if (rx_ring->flags & IXGBE_RING_RX_PS_ENABLED) {
> 				rx_buffer_info->skb = next_buffer->skb;
>@@ -1375,8 +1378,6 @@ next_desc:
>
> 	rx_ring->total_packets += total_rx_packets;
> 	rx_ring->total_bytes += total_rx_bytes;
>-	netdev->stats.rx_bytes += total_rx_bytes;
>-	netdev->stats.rx_packets += total_rx_packets;
>
> 	return cleaned;
> }
>@@ -6559,6 +6560,38 @@ static void ixgbe_netpoll(struct net_device *netdev)
> }
> #endif
>
>+static struct rtnl_link_stats64 *ixgbe_get_stats64(struct net_device *netdev,
>+						   struct rtnl_link_stats64 *stats)
>+{
>+	struct ixgbe_adapter *adapter = netdev_priv(netdev);
>+	int i;
>+
>+	/* accurate rx/tx bytes/packets stats */
>+	dev_txq_stats_fold(netdev, stats);
>+	for (i = 0; i < adapter->num_rx_queues; i++) {
>+		struct ixgbe_ring *ring = adapter->rx_ring[i];
>+		u64 bytes, packets;
>+		unsigned int start;
>+
>+		do {
>+			start = u64_stats_fetch_begin_bh(&ring->syncp);
>+			packets = ring->stats.packets;
>+			bytes   = ring->stats.bytes;
>+		} while (u64_stats_fetch_retry_bh(&ring->syncp, start));
>+		stats->rx_packets += packets;
>+		stats->rx_bytes   += bytes;
>+	}
>+
>+	/* following stats updated by ixgbe_watchdog_task() */
>+	stats->multicast	= netdev->stats.multicast;
>+	stats->rx_errors	= netdev->stats.rx_errors;
>+	stats->rx_length_errors	= netdev->stats.rx_length_errors;
>+	stats->rx_crc_errors	= netdev->stats.rx_crc_errors;
>+	stats->rx_missed_errors	= netdev->stats.rx_missed_errors;
>+	return stats;
>+}
>+
>+
> static const struct net_device_ops ixgbe_netdev_ops = {
> 	.ndo_open		= ixgbe_open,
> 	.ndo_stop		= ixgbe_close,
>@@ -6578,6 +6611,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
> 	.ndo_set_vf_vlan	= ixgbe_ndo_set_vf_vlan,
> 	.ndo_set_vf_tx_rate	= ixgbe_ndo_set_vf_bw,
> 	.ndo_get_vf_config	= ixgbe_ndo_get_vf_config,
>+	.ndo_get_stats64	= ixgbe_get_stats64,
> #ifdef CONFIG_NET_POLL_CONTROLLER
> 	.ndo_poll_controller	= ixgbe_netpoll,
> #endif
>
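The syncp field in the patch follows the usual sequence-counter pattern: the writer makes the counter odd while it updates packets and bytes, and readers retry until they see the same even value before and after reading. Below is a minimal userspace sketch of that protocol using C11 atomics, assuming a single writer per ring; struct ring_stats and the ring_stats_*() helpers are hypothetical stand-ins, not the kernel API. In the kernel the counters stay plain u64 and, as we understand it, struct u64_stats_sync only carries a seqcount on 32-bit SMP builds. Here the payload is _Atomic so that C11 does not treat the concurrent reads as a data race, which makes this an illustration of the retry protocol rather than a drop-in equivalent.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a per-ring stats block. */
struct ring_stats {
	atomic_uint seq;            /* even: stable, odd: writer active */
	_Atomic uint64_t packets;
	_Atomic uint64_t bytes;
};

void ring_stats_init(struct ring_stats *s)
{
	atomic_init(&s->seq, 0);
	atomic_init(&s->packets, 0);
	atomic_init(&s->bytes, 0);
}

/* Writer side (exactly one writer per ring assumed), playing the role
 * of u64_stats_update_begin()/u64_stats_update_end(). */
void ring_stats_add(struct ring_stats *s, uint64_t pkts, uint64_t len)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
	uint64_t p = atomic_load_explicit(&s->packets, memory_order_relaxed);
	uint64_t b = atomic_load_explicit(&s->bytes, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed); /* odd */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->packets, p + pkts, memory_order_relaxed);
	atomic_store_explicit(&s->bytes, b + len, memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release); /* even */
}

/* Reader side: retry until both counters were read under the same even
 * sequence value, like the fetch_begin/fetch_retry loop in the patch. */
void ring_stats_read(struct ring_stats *s, uint64_t *pkts, uint64_t *len)
{
	unsigned int start, end;

	do {
		start = atomic_load_explicit(&s->seq, memory_order_acquire);
		*pkts = atomic_load_explicit(&s->packets, memory_order_relaxed);
		*len = atomic_load_explicit(&s->bytes, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((start & 1u) || start != end);
}
```

A reader running concurrently with ring_stats_add() may loop a few times, but it never returns a packets count from one update paired with a bytes count from another.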

Eric,

We are seeing intermittent hangs on ia32 arch which seem to point to this patch:

BUG: unable to handle kernel NULL pointer dereference at 00000040
 IP: [<f7f6b537>] ixgbe_get_stats64+0x47/0x120 [ixgbe]
 *pdpt = 000000002dc83001 *pde = 000000032d7e5067
 Oops: 0000 [#2] SMP
 last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
 Modules linked in: act_skbedit cls_u32 sch_multiq ixgbe mdio netconsole configfs autofs4 8021q garp stp llc sunrpc ipv6 ext3 jbd dm_mirror dm_region_hash dm_log dm_round_robin dm_multipath power_meter hwmon sg ses enclosure dcdbas pcspkr serio_raw iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core bnx2 ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix megaraid_sas dm_mod [last unloaded: mperf]

 Pid: 1939, comm: irqbalance Tainted: G      D W   2.6.36-rc7-upstream-net-next-2.6-ixgbe-queue-i386-g55e1a84 #1 09CGW2/PowerEdge T610
 EIP: 0060:[<f7f6b537>] EFLAGS: 00010206 CPU: 0
 EIP is at ixgbe_get_stats64+0x47/0x120 [ixgbe]
 EAX: 00000000 EBX: ecc45e4c ECX: ebea0400 EDX: 00000000
 ESI: ebea0000 EDI: 00000018 EBP: f7f846a0 ESP: ecc45d88
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 Process irqbalance (pid: 1939, ti=ecc44000 task=efc63a50 task.ti=ecc44000)
 Stack:
 ecc45e4c 00000000 ebea0400 ebea0000 ecc45e4c ebea0000 ecc45f04 f7f846a0
 <0> c0750593 ebea0000 edfec340 ebea0000 000002b5 c075063a edfec340 c0993eec
 <0> ebe78000 0000021c 00000000 00000009 00000000 00000000 00000000 00000000
 Call Trace:
 [<c0750593>] ? dev_get_stats+0x33/0xc0
 [<c075063a>] ? dev_seq_printf_stats+0x1a/0x180
 [<c07507aa>] ? dev_seq_show+0xa/0x20
 [<c052398f>] ? seq_read+0x22f/0x3d0
 [<c0523760>] ? seq_read+0x0/0x3d0
 [<c054fdde>] ? proc_reg_read+0x5e/0x90
 [<c054fd80>] ? proc_reg_read+0x0/0x90
 [<c050a1dd>] ? vfs_read+0x9d/0x160
 [<c049d4ef>] ? audit_syscall_entry+0x20f/0x230
 [<c050a971>] ? sys_read+0x41/0x70
 [<c0409cdf>] ? sysenter_do_call+0x12/0x28
 Code: 60 4f 7e c8 8b 44 24 08 8b b8 20 06 00 00 85 ff 7e 63 c7 44 24 04 00 00 00 00 66 90 8b 54 24 04 8b 4c 24 08 8b 84 91 00 05 00 00 <8b> 50 40 eb 06 8d 74 26 00 89 ca f6 c2 01 0f 85 ae 00 00 00 8b
 EIP: [<f7f6b537>] ixgbe_get_stats64+0x47/0x120 [ixgbe] SS:ESP 0068:ecc45d88
 CR2: 0000000000000040
 ---[ end trace 51ea89f4e57f54f1 ]---

Emil

^ permalink raw reply

* Re: [patch] fix stack overflow in pktgen_if_write()
From: Dan Carpenter @ 2010-10-27 22:40 UTC (permalink / raw)
  To: nelhage
  Cc: Eric Dumazet, David S. Miller, Robert Olsson, Andy Shevchenko,
	netdev
In-Reply-To: <20101027221234.GN6062@bicker>

On Thu, Oct 28, 2010 at 12:12:35AM +0200, Dan Carpenter wrote:
> -		char tb[count + 1];
> -		if (copy_from_user(tb, user_buffer, count))
> -			return -EFAULT;
> -		tb[count] = 0;
> +		char *tb;
> +
> +		tb = strndup_user(user_buffer, count + 1);

Crap...  This should be memdup_user().

Sorry about that.  I'll send v2.

regards,
dan carpenter

> +		if (IS_ERR(tb))
> +			return PTR_ERR(tb);
>  		printk(KERN_DEBUG "pktgen: %s,%lu  buffer -:%s:-\n", name,
>  		       (unsigned long)count, tb);
> +		kfree(tb);
>  	}
>  
>  	if (!strcmp(name, "min_pkt_size")) {

^ permalink raw reply

* [patch v2] fix stack overflow in pktgen_if_write()
From: Dan Carpenter @ 2010-10-27 22:43 UTC (permalink / raw)
  To: nelhage
  Cc: Eric Dumazet, David S. Miller, Robert Olsson, Andy Shevchenko,
	netdev
In-Reply-To: <20101027221234.GN6062@bicker>

Nelson Elhage says he was able to oops both amd64 and i386 test 
machines with 8k writes to the pktgen file.  Let's just allocate the
buffer on the heap instead of on the stack.

This can only be triggered by root so there are no security issues here.

Reported-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: Dan Carpenter <error27@gmail.com>
---
I saw this on twitter.  Hi Nelson, could you test this?

V2:  strndup_user() => memdup_user()  

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 2c0df0f..b5d3c70 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -887,12 +887,14 @@ static ssize_t pktgen_if_write(struct file *file,
 	i += len;
 
 	if (debug) {
-		char tb[count + 1];
-		if (copy_from_user(tb, user_buffer, count))
-			return -EFAULT;
-		tb[count] = 0;
+		char *tb;
+
+		tb = memdup_user(user_buffer, count + 1);
+		if (IS_ERR(tb))
+			return PTR_ERR(tb);
 		printk(KERN_DEBUG "pktgen: %s,%lu  buffer -:%s:-\n", name,
 		       (unsigned long)count, tb);
+		kfree(tb);
 	}
 
 	if (!strcmp(name, "min_pkt_size")) {


^ permalink raw reply related

* Re: [RFC][net-next-2.6 PATCH v2] 8021q: set hard_header_len when VLAN offload features are toggled
From: Jesse Gross @ 2010-10-27 23:04 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev@vger.kernel.org, bhutchings@solarflare.com
In-Reply-To: <4CC89C3A.7000209@intel.com>

On Wed, Oct 27, 2010 at 2:40 PM, John Fastabend
<john.r.fastabend@intel.com> wrote:
> On 10/26/2010 7:05 PM, Jesse Gross wrote:
>> On Tue, Oct 26, 2010 at 2:59 PM, John Fastabend
>> <john.r.fastabend@intel.com> wrote:
>>> Toggling the vlan tx|rx hw offloads needs to set the hard_header_len
>>> as well otherwise we end up using LL_RESERVED_SPACE incorrectly.
>>> This results in pskb_expand_head() being used unnecessarily.
>>>
>>> This adds a check in vlan_transfer_features to catch the ETH_FLAG_TXVLAN
>>> flag and set the header length. This requires drivers to add the
>>> ETH_FLAG_TXVLAN to vlan_features.
>>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>
>> I think this addresses all of the original problems.  However, I don't
>> think that we want to have drivers claim to support vlan offloading as
>> a feature for vlan packets.  That implies some type of QinQ
>> functionality to me.  In addition, if the vlan device claims to
>> support offloading and a second vlan device is stacked on top of it,
>> then the two will clobber skb->vlan_tci.  It's probably simpler to
>> just keep track of whether vlan offloading is currently enabled so we
>> can find out whether it changed.
>>
>
> Agreed. Rather than trying to be clever, this is probably the easiest.
>
> --- a/net/8021q/vlan.c
> +++ b/net/8021q/vlan.c
> @@ -334,6 +334,12 @@ static void vlan_transfer_features(struct net_device *dev,
>        vlandev->features &= ~dev->vlan_features;
>        vlandev->features |= dev->features & dev->vlan_features;
>        vlandev->gso_max_size = dev->gso_max_size;
> +
> +       if (dev->features & NETIF_F_HW_VLAN_TX)
> +               vlandev->hard_header_len = dev->hard_header_len;
> +       else
> +               vlandev->hard_header_len = dev->hard_header_len + VLAN_HLEN;
> +

Great, that's even simpler than I was thinking.

I think this series is ready to go.
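The rule in John's hunk can be checked with a little arithmetic. The sketch below reimplements in userspace, as we understand it, the 2.6.36-era LL_RESERVED_SPACE() rounding; ll_reserved_space() and vlan_hard_header_len() are hypothetical helpers, and the constants (ETH_HLEN 14, VLAN_HLEN 4, HH_DATA_MOD 16) are assumed to match the kernel headers of that era.

```c
#include <stdbool.h>

#define ETH_HLEN     14   /* Ethernet header length */
#define VLAN_HLEN     4   /* 802.1Q tag length */
#define HH_DATA_MOD  16   /* alignment used by LL_RESERVED_SPACE() */

/* Assumed mirror of LL_RESERVED_SPACE(): round
 * (hard_header_len + needed_headroom) down to a multiple of
 * HH_DATA_MOD, then add one more HH_DATA_MOD. */
unsigned int ll_reserved_space(unsigned int hard_header_len,
                               unsigned int needed_headroom)
{
	return ((hard_header_len + needed_headroom) & ~(HH_DATA_MOD - 1u)) +
	       HH_DATA_MOD;
}

/* The rule from the patch: with TX VLAN offload the NIC inserts the tag,
 * so the VLAN device needs no extra room; otherwise the tag is pushed in
 * software and needs VLAN_HLEN more bytes of header. */
unsigned int vlan_hard_header_len(unsigned int real_hard_header_len,
                                  bool hw_vlan_tx)
{
	return hw_vlan_tx ? real_hard_header_len
	                  : real_hard_header_len + VLAN_HLEN;
}
```

Without TX offload, a stale 14-byte hard_header_len makes the reserved headroom round to 16 bytes, too little for an 18-byte tagged header, so a later headroom expansion such as pskb_expand_head() may be needed; advertising 18 rounds up to 32 and avoids it.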

^ permalink raw reply

* Re: [patch v2] fix stack overflow in pktgen_if_write()
From: Nelson Elhage @ 2010-10-27 23:06 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Eric Dumazet, David S. Miller, Robert Olsson, Andy Shevchenko,
	netdev
In-Reply-To: <20101027224302.GQ6062@bicker>

You want to add a trailing NUL, or else printk will read off the end of the
buffer.

Also, by memdup()ing count + 1 bytes, you're technically reading one more byte
than userspace asked for, which could in principle lead to a spurious EFAULT.

- Nelson
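A userspace sketch of the copy Nelson describes, under stated assumptions: dup_write_buffer() is a hypothetical helper, and malloc()/memcpy() stand in for kmalloc()/copy_from_user(). It reads exactly count bytes, never count + 1, and appends the terminating NUL that printk's %s needs. On the kernel side the same shape would presumably be kmalloc(count + 1, GFP_KERNEL), copy_from_user(tb, user_buffer, count), tb[count] = 0, and kfree(tb) after the printk().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace analogue of the debug-print copy in
 * pktgen_if_write(). The buffer lives on the heap rather than in a
 * count-sized array on the stack. Returns NULL on failure; the caller
 * frees the result. */
char *dup_write_buffer(const char *user_buffer, size_t count)
{
	char *tb;

	if (count == SIZE_MAX)          /* count + 1 would overflow */
		return NULL;
	tb = malloc(count + 1);
	if (!tb)
		return NULL;
	memcpy(tb, user_buffer, count); /* read only what the writer supplied */
	tb[count] = '\0';               /* terminate for string printing */
	return tb;
}
```

Unlike memdup_user(user_buffer, count + 1), the kernel version of this never touches the byte after the caller's data, so a buffer ending exactly at the end of a mapping cannot produce a spurious -EFAULT.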

On Thu, Oct 28, 2010 at 12:43:02AM +0200, Dan Carpenter wrote:
> Nelson Elhage says he was able to oops both amd64 and i386 test 
> machines with 8k writes to the pktgen file.  Let's just allocate the
> buffer on the heap instead of on the stack.
> 
> This can only be triggered by root so there are no security issues here.
> 
> Reported-by: Nelson Elhage <nelhage@ksplice.com>
> Signed-off-by: Dan Carpenter <error27@gmail.com>
> ---
> I saw this on twitter.  Hi Nelson, could you test this?
> 
> V2:  strndup_user() => memdup_user()  
> 
> diff --git a/net/core/pktgen.c b/net/core/pktgen.c
> index 2c0df0f..b5d3c70 100644
> --- a/net/core/pktgen.c
> +++ b/net/core/pktgen.c
> @@ -887,12 +887,14 @@ static ssize_t pktgen_if_write(struct file *file,
>  	i += len;
>  
>  	if (debug) {
> -		char tb[count + 1];
> -		if (copy_from_user(tb, user_buffer, count))
> -			return -EFAULT;
> -		tb[count] = 0;
> +		char *tb;
> +
> +		tb = memdup_user(user_buffer, count + 1);
> +		if (IS_ERR(tb))
> +			return PTR_ERR(tb);
>  		printk(KERN_DEBUG "pktgen: %s,%lu  buffer -:%s:-\n", name,
>  		       (unsigned long)count, tb);
> +		kfree(tb);
>  	}
>  
>  	if (!strcmp(name, "min_pkt_size")) {
> 

^ permalink raw reply

* Re: IPV6 raw socket denies bind(2)
From: Brian Haley @ 2010-10-27 23:54 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: netdev, David S. Miller, Eric Dumazet
In-Reply-To: <alpine.LNX.2.01.1010280000260.7820@obet.zrqbmnf.qr>

On 10/27/2010 06:01 PM, Jan Engelhardt wrote:
> int main(void)
> {
> 	struct sockaddr_in6 src = {};
> 	int sk;
> 
> 	sk = socket(AF_INET6, SOCK_RAW, IPPROTO_UDP);
> 	memset(&src, 0, sizeof(src));
> 	inet_pton(AF_INET6, "::1", &src);
> 	src.sin6_family = AF_INET6;
> 
> 	if (bind(sk, (void *)&src, sizeof(src)) < 0) {
> 		perror("bind");
> 		abort();
> 	}
> 	return 0;
> }

You're trashing the sockaddr, try this patch:

< 	inet_pton(AF_INET6, "::1", &src);
---
> 	inet_pton(AF_INET6, "::1", &src.sin6_addr);

-Brian
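For reference, a corrected version of Jan's test along the lines of Brian's fix. inet_pton() writes 16 bytes, so it must target sin6_addr; passing &src overwrites sin6_family, sin6_port and sin6_flowinfo and leaves sin6_addr holding something other than ::1. make_sin6() and bind_raw6() are hypothetical helper names, and opening a SOCK_RAW socket normally requires CAP_NET_RAW.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill an IPv6 socket address from text. Returns 0 on success,
 * -1 if the text is not a valid IPv6 address. */
int make_sin6(struct sockaddr_in6 *sa, const char *text)
{
	memset(sa, 0, sizeof(*sa));
	sa->sin6_family = AF_INET6;
	/* Point inet_pton() at sin6_addr, not at the start of the struct. */
	if (inet_pton(AF_INET6, text, &sa->sin6_addr) != 1)
		return -1;
	return 0;
}

/* Open a raw IPv6/UDP socket bound to the given address.
 * Returns the descriptor, or -1 with errno set. */
int bind_raw6(const char *text)
{
	struct sockaddr_in6 src;
	int sk, saved;

	if (make_sin6(&src, text) < 0) {
		errno = EINVAL;
		return -1;
	}
	sk = socket(AF_INET6, SOCK_RAW, IPPROTO_UDP);
	if (sk < 0)
		return -1;
	if (bind(sk, (struct sockaddr *)&src, sizeof(src)) < 0) {
		saved = errno;          /* keep bind()'s error across close() */
		close(sk);
		errno = saved;
		return -1;
	}
	return sk;
}
```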

^ permalink raw reply

* Re: [PATCH 2.6.36/stable v2] vlan: Fix crash when hwaccel rx pkt for non-existant vlan.
From: Jesse Gross @ 2010-10-28  0:11 UTC (permalink / raw)
  To: Ben Greear; +Cc: netdev
In-Reply-To: <1288112797-21550-1-git-send-email-greearb@candelatech.com>

On Tue, Oct 26, 2010 at 10:06 AM, Ben Greear <greearb@candelatech.com> wrote:
> The vlan_hwaccel_do_receive code expected skb->dev to always
> be a vlan device, but if the NIC was promisc, and the VLAN
> for a particular VID was not configured, then this method
> could receive a packet where skb->dev was NOT a vlan
> device.  This caused access of bad memory and a crash.
>
> Signed-off-by: Ben Greear <greearb@candelatech.com>
> ---
> v1 -> v2:  Simplify patch..no need for setting pkt-type, etc.
>
> :100644 100644 0eb96f7... 0687b6c... M  net/8021q/vlan_core.c
> :100644 100644 660dd41... 5dc45b9... M  net/core/dev.c
>  net/8021q/vlan_core.c |    3 +++
>  net/core/dev.c        |    5 +++--
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
> index 0eb96f7..0687b6c 100644
> --- a/net/8021q/vlan_core.c
> +++ b/net/8021q/vlan_core.c
> @@ -43,6 +43,9 @@ int vlan_hwaccel_do_receive(struct sk_buff *skb)
>        struct net_device *dev = skb->dev;
>        struct vlan_rx_stats     *rx_stats;
>
> +       if (!is_vlan_dev(dev))
> +               return 0;
> +
>        skb->dev = vlan_dev_info(dev)->real_dev;
>        netif_nit_deliver(skb);
>

What if we dropped any packet with a tag in skb->vlan_tci before it
gets to the bridge hooks?  That would accomplish the original goal of
getting packets to tcpdump while preventing them from making it to
places where they aren't expected.  It will provide the same behavior
as earlier kernels.

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 660dd41..5dc45b9 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2828,8 +2828,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>        if (!netdev_tstamp_prequeue)
>                net_timestamp_check(skb);
>
> -       if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
> -               return NET_RX_SUCCESS;
> +       if (vlan_tx_tag_present(skb))
> +               /* This method cannot fail at this time. */
> +               vlan_hwaccel_do_receive(skb);

This is correct but it's not a bugfix, so I'm not sure that it should
go to -stable.  It's already been fixed for 2.6.37.

^ permalink raw reply

* Re: [PATCH 2.6.36/stable v2] vlan: Fix crash when hwaccel rx pkt for non-existant vlan.
From: Ben Greear @ 2010-10-28  0:15 UTC (permalink / raw)
  To: Jesse Gross; +Cc: netdev
In-Reply-To: <AANLkTi=EHVBSNNmsts4xTVZ2DGTBD92mHnpP0e5ZYEx1@mail.gmail.com>

On 10/27/2010 05:11 PM, Jesse Gross wrote:
> On Tue, Oct 26, 2010 at 10:06 AM, Ben Greear<greearb@candelatech.com>  wrote:
>> The vlan_hwaccel_do_receive code expected skb->dev to always
>> be a vlan device, but if the NIC was promisc, and the VLAN
>> for a particular VID was not configured, then this method
>> could receive a packet where skb->dev was NOT a vlan
>> device.  This caused access of bad memory and a crash.
>>
>> Signed-off-by: Ben Greear<greearb@candelatech.com>
>> ---
>> v1 ->  v2:  Simplify patch..no need for setting pkt-type, etc.
>>
>> :100644 100644 0eb96f7... 0687b6c... M  net/8021q/vlan_core.c
>> :100644 100644 660dd41... 5dc45b9... M  net/core/dev.c
>>   net/8021q/vlan_core.c |    3 +++
>>   net/core/dev.c        |    5 +++--
>>   2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c
>> index 0eb96f7..0687b6c 100644
>> --- a/net/8021q/vlan_core.c
>> +++ b/net/8021q/vlan_core.c
>> @@ -43,6 +43,9 @@ int vlan_hwaccel_do_receive(struct sk_buff *skb)
>>         struct net_device *dev = skb->dev;
>>         struct vlan_rx_stats     *rx_stats;
>>
>> +       if (!is_vlan_dev(dev))
>> +               return 0;
>> +
>>         skb->dev = vlan_dev_info(dev)->real_dev;
>>         netif_nit_deliver(skb);
>>
>
> What if we dropped any packet with a tag in skb->vlan_tci before it
> gets to the bridge hooks?  That would accomplish the original goal of
> getting packets to tcpdump while preventing them from making it to
> places where they aren't expected.  It will provide the same behavior
> as earlier kernels.

The VLAN code has changed a lot since I messed with it last, so
there very well may be better ways to fix this than what I came up
with.  Please propose a patch if you have a suggestion.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* multi-machine simultaneous kernel panic in tcp_transmit_kcb
From: Doug Hughes @ 2010-10-28  1:04 UTC (permalink / raw)
  To: netdev

Three machines panicked within one minute of each other (odd in itself,
but not the root of the question).

Two of them were running:
2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64 
x86_64 GNU/Linux
(I have a screenshot from the KVM console.)
All are CentOS 5.4.

The third is a Xen instance running 2.6.18-128.1.14.el5xen #1 SMP Wed Jun 17 07:10:16 
EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

That is a slightly older kernel, but it crashed within one minute of the other two.
Since it's a Xen guest, I have a text traceback:

  Pid: 0, comm: swapper Not tainted 2.6.18-128.1.14.el5xen #1
RIP: e030:[<ffffffff8040e077>]  [<ffffffff8040e077>] pskb_copy+0x133/0x1b1
RSP: e02b:ffffffff8066ade0  EFLAGS: 00010282
RAX: ffff8800325fa120 RBX: ffff8800434f5780 RCX: ffff88006d311930
RDX: 656363612f647074 RSI: ffff8800325fa130 RDI: 0000000000000002
RBP: ffff8800549aa680 R08: 7ffffffffffffffe R09: 0000000000000000
R10: ffff8800434f5780 R11: 00000000000000c8 R12: 0000000000000220
R13: ffff8800549aa680 R14: 0000000000000000 R15: ffffffffff578000
FS:  00002b84514af260(0000) GS:ffffffff805ba000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process swapper (pid: 0, threadinfo ffffffff8062a000, task ffffffff804e0a80)
Stack:  ffffffff802886d9  ffff88006ad3d280  0000000000000001  ffffffff80222485
  ffff880025665380  000000017d0f80ab  0000000000000001  ffff88006ad3d280
  ffff8800549aa680  00000000ffffff8f
Call Trace:
<IRQ>  [<ffffffff802886d9>] rebalance_tick+0x18b/0x3d4
  [<ffffffff80222485>] tcp_transmit_skb+0x73/0x667
  [<ffffffff8043903a>] tcp_retransmit_skb+0x53d/0x638
  [<ffffffff8043a569>] tcp_write_timer+0x0/0x68e
  [<ffffffff8043a9d6>] tcp_write_timer+0x46d/0x68e
  [<ffffffff80291f8b>] run_timer_softirq+0x13f/0x1c6
  [<ffffffff802130d6>] __do_softirq+0x8d/0x13b
  [<ffffffff80260da4>] call_softirq+0x1c/0x278
  [<ffffffff8026e0be>] do_softirq+0x31/0x98
  [<ffffffff8026df39>] do_IRQ+0xec/0xf5
  [<ffffffff803a7b94>] evtchn_do_upcall+0x13b/0x1fb
  [<ffffffff802608d6>] do_hypervisor_callback+0x1e/0x2c
<EOI>  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
  [<ffffffff802063aa>] hypercall_page+0x3aa/0x1000
  [<ffffffff8026f511>] raw_safe_halt+0x84/0xa8
  [<ffffffff8026ca52>] xen_idle+0x38/0x4a
  [<ffffffff8024b0d8>] cpu_idle+0x97/0xba
  [<ffffffff80634b09>] start_kernel+0x21f/0x224
  [<ffffffff806341e5>] _sinittext+0x1e5/0x1eb


Code: 48 8b 02 25 00 40 02 00 48 3d 00 40 02 00 75 04 48 8b 52 10
RIP  [<ffffffff8040e077>] pskb_copy+0x133/0x1b1
  RSP <ffffffff8066ade0>
<0>Kernel panic - not syncing: Fatal exception


---

The first four lines of the call trace on the Xen and non-Xen machines
are identical except for the addresses.

In fact, they match up to the ninth line, where they start to diverge
slightly.

On one machine, the last entry in the kernel log before the crash was an
"nfs server not responding" message, but those happen sporadically and
often enough that I don't suspect it's related.

Given how it looks, this seemed like an appropriate question for netdev
(a Google search turned up nothing).

