Netdev List
* Re: multi-machine simultaneous kernel panic in tcp_transmit_kcb
From: Ben Hutchings @ 2010-10-28 20:42 UTC (permalink / raw)
  To: Doug Hughes; +Cc: netdev
In-Reply-To: <4CC8CC0A.5000705@will.to>

Doug Hughes wrote:
> 3 machines within 1 minute of each other (odd, by itself, but not the 
> root of the question).
> 
> 2 of this:
> 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64 
> x86_64 GNU/Linux
> (I have a screen shot on the kvm)
> all Cent 5.4
[...]
 
Please don't ask netdev to support an old distribution kernel.  Try
asking on CentOS support forums or buy support from Red Hat.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* ethtool: missing implementation of n_priv_flags
From: Brandeburg, Jesse @ 2010-10-28 20:34 UTC (permalink / raw)
  To: jgarzik@redhat.com
  Cc: netdev@vger.kernel.org, Wyborny, Carolyn, Brandeburg, Jesse

I was just looking at implementing the driver private flags to add a new feature to 
a driver.

It appears that nothing in the core ethtool.c ever accesses or prints n_priv_flags, 
even if a driver assigns it (which none in the kernel currently do).

Is this just an oversight?


* Re: [PATCH] cxgb4vf: fix crash due to manipulating queues before registration
From: David Miller @ 2010-10-28 20:21 UTC (permalink / raw)
  To: leedom; +Cc: netdev
In-Reply-To: <1288297007-663-1-git-send-email-leedom@chelsio.com>

From: Casey Leedom <leedom@chelsio.com>
Date: Thu, 28 Oct 2010 13:16:47 -0700

> Before commit "net: allocate tx queues in register_netdevice"
> netif_tx_stop_all_queues and related functions could be used between
> device allocation and registration but now only after registration.
> cxgb4 has such a call before registration and crashes now.  Move it
> after register_netdev.
> 
> Signed-off-by: Casey Leedom <leedom@chelsio.com>

Why are you manipulating the queue at all here?

The queue state is "don't care" at this point in time,
and has no meaning until ->open() is invoked.


* [PATCH] cxgb4vf: fix crash due to manipulating queues before registration
From: Casey Leedom @ 2010-10-28 20:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, Casey Leedom

Before commit "net: allocate tx queues in register_netdevice"
netif_tx_stop_all_queues and related functions could be used between
device allocation and registration but now only after registration.
cxgb4 has such a call before registration and crashes now.  Move it
after register_netdev.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
---
 drivers/net/cxgb4vf/cxgb4vf_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb4vf/cxgb4vf_main.c b/drivers/net/cxgb4vf/cxgb4vf_main.c
index 555ecc5..b9d92a5 100644
--- a/drivers/net/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/cxgb4vf/cxgb4vf_main.c
@@ -2600,7 +2600,6 @@ static int __devinit cxgb4vf_pci_probe(struct pci_dev *pdev,
 		pi->xact_addr_filt = -1;
 		pi->rx_offload = RX_CSO;
 		netif_carrier_off(netdev);
-		netif_tx_stop_all_queues(netdev);
 		netdev->irq = pdev->irq;
 
 		netdev->features = (NETIF_F_SG | NETIF_F_TSO | NETIF_F_TSO6 |
@@ -2661,6 +2660,7 @@ static int __devinit cxgb4vf_pci_probe(struct pci_dev *pdev,
 			continue;
 		}
 
+		netif_tx_stop_all_queues(netdev);
 		set_bit(pidx, &adapter->registered_device_map);
 	}
 	if (adapter->registered_device_map == 0) {
-- 
1.7.0.4



* Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
From: Shirley Ma @ 2010-10-28 20:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, netdev, kvm, linux-kernel
In-Reply-To: <1288294355.11251.20.camel@localhost.localdomain>

On Thu, 2010-10-28 at 12:32 -0700, Shirley Ma wrote:
> I also found a big TX regression between the old and new guest. With the
> old guest I can get almost 11 Gb/s for a 2K message size, but with the
> new guest kernel I only get 3.5 Gb/s with the patch and the same
> host.
> I will dig into why.

The regression is in the guest kernel, not in this patch. I tested a 2.6.31
kernel; its performance is already below 2 Gb/s for a 2K message size.
I will resubmit the patch for review.

I will start testing from the 2.6.30 kernel to figure out when the TX
regression was introduced in virtio_net. Any suggestions as to which guest
kernels I should test to track down this regression?

Thanks
Shirley



* Re: [Just for fun] loopback: avoid softirq on most transmits
From: David Miller @ 2010-10-28 20:09 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1288295326.2711.35.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 28 Oct 2010 21:48:46 +0200

> With the introduction of the xmit_recursion percpu variable, it's pretty
> cheap to check our recursion level in loopback transmit, and avoid
> raising softirq.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> tbench faster by 4%, sorry I couldn't resist...

Hehehe :-)

Maybe even that limit is low enough to prevent stack overflow
situations, even when doing NFS over loopback to a RAID volume using
XFS as the filesystem, which seems to be the standard stack usage
stress test.

But really, just like DST iteration, we should probably make these
things more iterative.

The cool thing about loopback is that we have a trigger for the cases
we care about, release_sock().

So we could have something like:

1) lock_sock() sets "local cpu will run release_sock()" mark.

2) netif_rx() checks mark, if set it puts SKB on "release_sock()
   local cpu work queue"

3) release_sock() retains mark, and runs SKB queue until empty.
   Once SKB work queue is empty, mark is cleared.

Anyways, just an idea.
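
As a rough illustration of the three steps above, here is a user-space toy
model. It is only a sketch under stated assumptions: `struct cpu_ctx`,
`toy_lock_sock()`, `toy_netif_rx()` and `toy_release_sock()` are hypothetical
names, packets are plain integers, and there is no real per-CPU state or
locking involved.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_LEN 16

/* Hypothetical stand-in for per-CPU state; not a kernel structure. */
struct cpu_ctx {
	bool release_pending;       /* "local cpu will run release_sock()" mark */
	int queue[TOY_QUEUE_LEN];   /* deferred packets (toy: just ids) */
	size_t head, tail;          /* consume at head, append at tail */
	int delivered;              /* packets processed from the deferred queue */
	int softirq_raised;         /* packets that took the softirq path */
};

/* Step 1: taking the socket lock sets the mark. */
void toy_lock_sock(struct cpu_ctx *c)
{
	c->release_pending = true;
}

/* Step 2: the receive path defers to the local queue while the mark is set. */
void toy_netif_rx(struct cpu_ctx *c, int pkt)
{
	if (c->release_pending && c->tail < TOY_QUEUE_LEN)
		c->queue[c->tail++] = pkt;
	else
		c->softirq_raised++;    /* fall back to the softirq path */
}

/* Step 3: run the queue iteratively until empty, then clear the mark. */
void toy_release_sock(struct cpu_ctx *c)
{
	while (c->head < c->tail) {
		int pkt = c->queue[c->head++];

		c->delivered++;
		/* Processing a packet may queue more work; emulate one reply. */
		if (pkt == 1)
			toy_netif_rx(c, 2);
	}
	c->head = c->tail = 0;
	c->release_pending = false;
}
```

The point of the loop in `toy_release_sock()` is that follow-up packets are
handled by iteration on one stack frame rather than by nested calls.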


* Re: [Security] TIPC security issues
From: Paul Gortmaker @ 2010-10-28 19:51 UTC (permalink / raw)
  To: David Miller
  Cc: torvalds, drosenberg, jon.maloy, allan.stephens, netdev, security
In-Reply-To: <20101027.105047.183059900.davem@davemloft.net>

[Re: [Security] TIPC security issues] On 27/10/2010 (Wed 10:50) David Miller wrote:

> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Wed, 27 Oct 2010 10:37:46 -0700
> 
> > If you _really_ care deeply, then some packet-oriented protocol can
> > just have its own private packet size limit (which would be way less
> > than 2GB), and then just look at the total size and say "oh, the total
> > size is bigger than my limit, so I'll just error out". Then, the fact
> > that verify_iovec() may have truncated the message to 2GB-1 doesn't
> > matter at all.
> > 
> > (Practically speaking, I bet all packet-oriented protocols already
> > have a limit that is enforced by simply allocation patterns, so I
> > don't think it's actually a problem even now)
> 
> This is, as it turns out, effectively what the TIPC socket layer
> already does.
> 
> Most of the send calls that propagate down to this code adding up the
> iov_len lengths get passed a maximum packet size.
> 

In keeping with this idea, perhaps this is a better solution for getting
an immediate fix to the tipc part of this issue than the previous
patches I'd sent?  I can see some immediate advantages to this:

   -it adds checks that arguably should have been there since day
    one, since it is always best to check for garbage input ASAP.

   -it is a much smaller change, and thus easier to review and have
    confidence in

   -by being smaller and clearer, it lends itself better for being
    directly cherry picked onto the -stable release(s).

We'll still need to clean up the mishmash of variable types being
used in the tipc internals, but at least we can then do that in
a development cycle, and we won't have to inflict those bigger
cleanup changesets back onto GregKH.

Paul.

----

From 3fb200c1b27cf5cde668888ab85cffb1e9c6314f Mon Sep 17 00:00:00 2001
From: Allan Stephens <Allan.Stephens@windriver.com>
Date: Thu, 28 Oct 2010 07:58:24 -0400
Subject: [PATCH] tipc: Fix security hole exploitable by excessive send requests

Add checks to TIPC's socket send routines to promptly detect and
abort attempts to send more than 66,000 bytes in a single TIPC
message, or more than 2**31-1 bytes in a single TIPC byte stream
request.  This prevents excessively large size_t based inputs from
reaching internal tipc routines that currently use int values where
they risk being truncated or incorrectly wrapped.

The three checks are added to send_msg(), send_packet() and
send_stream() -- all of which are entered via proto_ops .sendmsg, which
in turn already checked for msg_iovlen > UIO_MAXIOV [in net/socket.c],
so there is no need to repeat that specific test in these new checks.

Reported-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 include/linux/tipc.h |    2 +-
 net/tipc/socket.c    |   10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/include/linux/tipc.h b/include/linux/tipc.h
index d10614b..1fd2889 100644
--- a/include/linux/tipc.h
+++ b/include/linux/tipc.h
@@ -101,7 +101,7 @@ static inline unsigned int tipc_node(__u32 addr)
  * Limiting values for messages
  */
 
-#define TIPC_MAX_USER_MSG_SIZE	66000
+#define TIPC_MAX_USER_MSG_SIZE	66000U
 
 /*
  * Message importance levels
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 33217fc..3562cf9 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -542,6 +542,8 @@ static int send_msg(struct kiocb *iocb, struct socket *sock,
 	if (unlikely((m->msg_namelen < sizeof(*dest)) ||
 		     (dest->family != AF_TIPC)))
 		return -EINVAL;
+	if (total_len > TIPC_MAX_USER_MSG_SIZE)
+		return -EMSGSIZE;
 
 	if (iocb)
 		lock_sock(sk);
@@ -649,6 +651,9 @@ static int send_packet(struct kiocb *iocb, struct socket *sock,
 	if (unlikely(dest))
 		return send_msg(iocb, sock, m, total_len);
 
+	if (total_len > TIPC_MAX_USER_MSG_SIZE)
+		return -EMSGSIZE;
+
 	if (iocb)
 		lock_sock(sk);
 
@@ -733,6 +738,11 @@ static int send_stream(struct kiocb *iocb, struct socket *sock,
 		goto exit;
 	}
 
+	if (total_len > (unsigned)INT_MAX) {
+		res = -EMSGSIZE;
+		goto exit;
+	}
+
 	/*
 	 * Send each iovec entry using one or more messages
 	 *
-- 
1.7.3.1
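
For readers who want to see the idea behind these checks in isolation, the
following user-space sketch rejects oversized `size_t` lengths before they are
narrowed to `int`. The helper names `toy_check_msg_len()` and
`toy_check_stream_len()` are hypothetical and not part of TIPC; only the
66000-byte limit and the INT_MAX bound are taken from the patch above.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define TOY_MAX_USER_MSG_SIZE 66000U  /* mirrors TIPC_MAX_USER_MSG_SIZE above */

/* Hypothetical helper: validate a message-oriented send length. */
int toy_check_msg_len(size_t total_len)
{
	if (total_len > TOY_MAX_USER_MSG_SIZE)
		return -EMSGSIZE;
	return (int)total_len;  /* safe: the value is known to fit in an int */
}

/* Hypothetical helper: validate a stream-oriented send length. */
int toy_check_stream_len(size_t total_len)
{
	if (total_len > (size_t)INT_MAX)
		return -EMSGSIZE;
	return (int)total_len;
}
```

Doing the comparison while the value is still a `size_t` means a huge request
cannot be truncated or wrapped into a small or negative `int` further down.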



* [Just for fun] loopback: avoid softirq on most transmits
From: Eric Dumazet @ 2010-10-28 19:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

With the introduction of the xmit_recursion percpu variable, it's pretty
cheap to check our recursion level in loopback transmit, and avoid
raising softirq.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
tbench faster by 4%, sorry I couldn't resist...

 drivers/net/loopback.c    |   13 +++++++++++--
 include/linux/netdevice.h |    3 +++
 net/core/dev.c            |    2 +-
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 2d9663a..5bd73c0 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -74,7 +74,7 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 				 struct net_device *dev)
 {
 	struct pcpu_lstats *lb_stats;
-	int len;
+	int len, res;
 
 	skb_orphan(skb);
 
@@ -84,7 +84,16 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 	lb_stats = this_cpu_ptr(dev->lstats);
 
 	len = skb->len;
-	if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {
+
+	/*
+	 * avoid raising softirq if our recursion level is low
+	 */
+	if (likely(__this_cpu_read(xmit_recursion) <= 2))
+		res = netif_receive_skb(skb);
+	else
+		res = netif_rx(skb);
+
+	if (likely(res == NET_RX_SUCCESS)) {
 		u64_stats_update_begin(&lb_stats->syncp);
 		lb_stats->bytes += len;
 		lb_stats->packets++;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 072652d..918330b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1741,6 +1741,9 @@ extern void dev_kfree_skb_any(struct sk_buff *skb);
 extern int		netif_rx(struct sk_buff *skb);
 extern int		netif_rx_ni(struct sk_buff *skb);
 #define HAVE_NETIF_RECEIVE_SKB 1
+
+DECLARE_PER_CPU(int, xmit_recursion);
+
 extern int		netif_receive_skb(struct sk_buff *skb);
 extern gro_result_t	dev_gro_receive(struct napi_struct *napi,
 					struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 35dfb83..aadf09b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2208,7 +2208,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	return rc;
 }
 
-static DEFINE_PER_CPU(int, xmit_recursion);
+DEFINE_PER_CPU(int, xmit_recursion);
 #define RECURSION_LIMIT 10
 
 /**
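
The decision made in loopback_xmit() above can be modelled in user space
roughly as follows. This is a toy sketch, not kernel code: every `toy_*` name
is hypothetical, a plain global stands in for the per-CPU counter, and an
array stands in for the backlog that netif_rx() would feed.

```c
#include <stddef.h>

#define TOY_DIRECT_LIMIT 2   /* same threshold the patch compares against */
#define TOY_BACKLOG_LEN 8

int toy_xmit_recursion;             /* stand-in for the per-CPU counter */
int toy_backlog[TOY_BACKLOG_LEN];   /* packets deferred for later processing */
size_t toy_backlog_len;
int toy_direct_count;               /* packets handled on the caller's stack */

void toy_xmit(int pkt, int replies_left);

/* Direct receive: runs immediately and may transmit a reply (nesting). */
void toy_receive_direct(int pkt, int replies_left)
{
	toy_direct_count++;
	if (replies_left > 0)
		toy_xmit(pkt + 1, replies_left - 1);
}

/* Driver transmit: go direct while shallow, otherwise defer. */
void toy_loopback_xmit(int pkt, int replies_left)
{
	if (toy_xmit_recursion <= TOY_DIRECT_LIMIT)
		toy_receive_direct(pkt, replies_left);
	else if (toy_backlog_len < TOY_BACKLOG_LEN)
		toy_backlog[toy_backlog_len++] = pkt;   /* deferred path */
}

/* Core transmit: tracks nesting depth around the driver call. */
void toy_xmit(int pkt, int replies_left)
{
	toy_xmit_recursion++;
	toy_loopback_xmit(pkt, replies_left);
	toy_xmit_recursion--;
}
```

With a chain of replies, the first two transmits are processed directly and
the third is deferred, which bounds how deep the stack can grow.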




* Re: [PATCH] ip_gre: fix fallback tunnel setup
From: Eric Dumazet @ 2010-10-28 19:34 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <4CC9CF06.1090806@parallels.com>

On Thursday, 28 October 2010 at 23:29 +0400, Pavel Emelyanov wrote:

> Indeed. I missed the fact that the gre driver uses ndo_init for
> all devices, including the fb one :(
> 
> Acked-by: Pavel Emelyanov <xemul@openvz.org>

Well, I only discovered this just now; it was not that obvious. Maybe we
should use a similar setup for all fb tunnels as well...

Thanks !




* Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
From: Shirley Ma @ 2010-10-28 19:32 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, netdev, kvm, linux-kernel
In-Reply-To: <20101028052021.GD5599@redhat.com>

On Thu, 2010-10-28 at 07:20 +0200, Michael S. Tsirkin wrote:
> My concern is this can delay signalling for unlimited time.
> Could you pls test this with guests that do not have
> 2b5bbe3b8bee8b38bdc27dd9c0270829b6eb7eeb
> b0c39dbdc204006ef3558a66716ff09797619778
> that is 2.6.31 and older? 

The patch only delays signaling for an unlimited time when there are no
TX packets to transmit. I thought TX signaling only notifies the guest to
release the used buffers; is there anything else besides this?

I tested a rhel5u5 guest (2.6.18 kernel) and it works fine. I checked the
logs of the two commits, and I don't think this patch could cause any issue
without these two patches.

I also found a big TX regression between the old and new guest. With the old
guest I can get almost 11 Gb/s for a 2K message size, but with the
new guest kernel I only get 3.5 Gb/s with the patch and the same host.
I will dig into why.

thanks
Shirley



* Re: [PATCH] ip_gre: fix fallback tunnel setup
From: Pavel Emelyanov @ 2010-10-28 19:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <1288293079.2711.19.camel@edumazet-laptop>

On 10/28/2010 11:11 PM, Eric Dumazet wrote:
> On Thursday, 28 October 2010 at 20:47 +0200, Eric Dumazet wrote:
>> On Thursday, 28 October 2010 at 11:41 -0700, David Miller wrote:
>>> I am still able to revert this I think without screwing up
>>> publicly visible history, so I will double check and do the
>>> revert if I can.
>>
>> Cool, I'll provide a patch in a couple of minutes, when tested.
>>
> 
> I believe the right fix is this one. Pavel, what do you think?

Indeed. I missed the fact that the gre driver uses ndo_init for
all devices, including the fb one :(

Acked-by: Pavel Emelyanov <xemul@openvz.org>

> With your patch, we allocate the per_cpu data twice for the fallback
> tunnel, thus leaking memory.

Yup...

> Thanks


* [PATCH] ip_gre: fix fallback tunnel setup
From: Eric Dumazet @ 2010-10-28 19:11 UTC (permalink / raw)
  To: David Miller; +Cc: xemul, netdev
In-Reply-To: <1288291679.2711.1.camel@edumazet-laptop>

On Thursday, 28 October 2010 at 20:47 +0200, Eric Dumazet wrote:
> On Thursday, 28 October 2010 at 11:41 -0700, David Miller wrote:
> > I am still able to revert this I think without screwing up
> > publicly visible history, so I will double check and do the
> > revert if I can.
> 
> Cool, I'll provide a patch in a couple of minutes, when tested.
> 

I believe the right fix is this one. Pavel, what do you think?

With your patch, we allocate the per_cpu data twice for the fallback
tunnel, thus leaking memory.

Thanks

Note: free_percpu(NULL) is legal

[PATCH] ip_gre: fix fallback tunnel setup

Before making the fallback tunnel visible to lookups, we should make
sure it is completely set up, i.e. ipgre_tunnel_init() has been called
and the tstats per_cpu pointer has been allocated.

Move rcu_assign_pointer(ign->tunnels_wc[0], tunnel); from
ipgre_fb_tunnel_init() to ipgre_init_net().

Based on a patch from Pavel Emelyanov

Reported-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/ip_gre.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 01087e0..70ff77f 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1325,7 +1325,6 @@ static void ipgre_fb_tunnel_init(struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
 	struct iphdr *iph = &tunnel->parms.iph;
-	struct ipgre_net *ign = net_generic(dev_net(dev), ipgre_net_id);
 
 	tunnel->dev = dev;
 	strcpy(tunnel->parms.name, dev->name);
@@ -1336,7 +1335,6 @@ static void ipgre_fb_tunnel_init(struct net_device *dev)
 	tunnel->hlen		= sizeof(struct iphdr) + 4;
 
 	dev_hold(dev);
-	rcu_assign_pointer(ign->tunnels_wc[0], tunnel);
 }
 
 
@@ -1383,10 +1381,12 @@ static int __net_init ipgre_init_net(struct net *net)
 	if ((err = register_netdev(ign->fb_tunnel_dev)))
 		goto err_reg_dev;
 
+	rcu_assign_pointer(ign->tunnels_wc[0],
+			   netdev_priv(ign->fb_tunnel_dev));
 	return 0;
 
 err_reg_dev:
-	free_netdev(ign->fb_tunnel_dev);
+	ipgre_dev_free(ign->fb_tunnel_dev);
 err_alloc_dev:
 	return err;
 }
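
The ordering rule behind this fix (finish initialization, then publish the
pointer) can be illustrated outside the kernel with C11 atomics. This is an
analogy only: a release store and an acquire load stand in for
rcu_assign_pointer() and the reader side, and every `toy_*` name and field is
hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct toy_tunnel {
	int hlen;
	int *tstats;   /* stands in for the per-CPU stats pointer */
};

/* Slot that lookups read; NULL means "no fallback tunnel yet". */
_Atomic(struct toy_tunnel *) toy_fallback_slot;

int toy_stats_storage;

/* Complete all setup; nothing is visible to readers yet. */
void toy_tunnel_init(struct toy_tunnel *t)
{
	t->hlen = 24;                    /* arbitrary value for the sketch */
	t->tstats = &toy_stats_storage;
}

/* Publish only after init: the release store orders the writes above it. */
void toy_publish(struct toy_tunnel *t)
{
	atomic_store_explicit(&toy_fallback_slot, t, memory_order_release);
}

/* Reader: sees either NULL or a fully initialized tunnel. */
struct toy_tunnel *toy_lookup(void)
{
	return atomic_load_explicit(&toy_fallback_slot, memory_order_acquire);
}
```

A reader that gets a non-NULL pointer from `toy_lookup()` can rely on
`tstats` being set, which is the property the original ordering did not give.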




* pci_function_reset from driver
From: Rajesh Borundia @ 2010-10-28 18:52 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, Ameen Rahman, Anirban Chakraborty,
	Amit Salecha

Hi David,

For a device that supports FLR (Function Level Reset), can I issue an FLR
on that device from the driver?

For example, in the kdump case it is recommended to reset the device
after the crash kernel has loaded. Can we issue an FLR from the driver in
such a case (by calling a function like pci_reset_function)?

Rajesh




* NULL pointer dereference at netxen_nic_probe+0x813/0x9a0
From: Bjorn Helgaas @ 2010-10-28 18:50 UTC (permalink / raw)
  To: Amit Kumar Salecha; +Cc: netdev, linux-kernel

This is on current Linus upstream as of this morning (8128057)
on an HP DL785:

QLogic/NetXen Network Driver v4.0.74
netxen_nic 0000:07:00.0: PCI INT A -> GSI 30 (level, low) -> IRQ 30
netxen_nic 0000:07:00.0: setting latency timer to 64
netxen_nic 0000:07:00.0: 2MB memory map
netxen_nic 0000:07:00.0: loading firmware from flash
netxen_nic 0000:07:00.0: using 64-bit dma mask
kernel: Quad Gig LP Board S/N TI9ABK0266  Chip rev 0x42
netxen_nic 0000:07:00.0: firmware v4.0.520 [legacy]
netxen_nic 0000:07:00.0: irq 72 for MSI/MSI-X
netxen_nic 0000:07:00.0: irq 73 for MSI/MSI-X
netxen_nic 0000:07:00.0: irq 74 for MSI/MSI-X
netxen_nic 0000:07:00.0: irq 75 for MSI/MSI-X
netxen_nic 0000:07:00.0: using msi-x interrupts
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff8160afda>] netxen_nic_probe+0x813/0x9a0
PGD 0 
Oops: 0002 [#1] SMP 
last sysfs file: 
CPU 0 
Modules linked in:

Pid: 1650, comm: work_for_cpu Not tainted 2.6.36-07338-g8128057 #269 /ProLiant DL785 G5   
RIP: 0010:[<ffffffff8160afda>]  [<ffffffff8160afda>] netxen_nic_probe+0x813/0x9a0
RSP: 0018:ffff8806138abe30  EFLAGS: 00010246
RAX: 0000000000000010 RBX: ffff8806139126c0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880613895616 RDI: ffff880613912000
RBP: ffff8806138abe90 R08: 0000000000000000 R09: ffff8806138abb80
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880613912000
R13: ffff8812174f7000 R14: ffff880613912000 R15: ffff8812174f7000
FS:  0000000000000000(0000) GS:ffff8800cfa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 0000000001c07000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process work_for_cpu (pid: 1650, threadinfo ffff8806138aa000, task ffff880616f12be0)
Stack:
 ffff8812174f7090 0000000000000246 ffff8806138abe90 ffff8812174f7000
 00008806138abfd8 0000000000000282 68cd0025b30068cc ffff880c17439d30
 ffff8812174f7090 ffff8812174f7000 ffff8812174f7208 0000000000000000
Call Trace:
 [<ffffffff81203696>] local_pci_probe+0x48/0x91
 [<ffffffff81052bae>] ? do_work_for_cpu+0x0/0x26
 [<ffffffff81052bc1>] do_work_for_cpu+0x13/0x26
 [<ffffffff81052bae>] ? do_work_for_cpu+0x0/0x26
 [<ffffffff81057a7b>] kthread+0x81/0x89
 [<ffffffff81003854>] kernel_thread_helper+0x4/0x10
 [<ffffffff810579fa>] ? kthread+0x0/0x89
 [<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10
Code: 00 eb 15 49 8d bf 90 00 00 00 48 c7 c6 1b 2e aa 81 31 c0 e8 c0 4e cd ff 4c 89 f7 e8 d6 bb ee ff 49 8b 96 00 03 00 00 48 8d 42 10 <f0> 80 4a 10 01 4c 89 f7 e8 a3 7e ed ff 85 c0 41 89 c4 74 2a 49 
RIP  [<ffffffff8160afda>] netxen_nic_probe+0x813/0x9a0
 RSP <ffff8806138abe30>
CR2: 0000000000000010
---[ end trace 059c7071bbf8de1f ]---


* Re: [PATCH] ip_gre: fix percpu stats accounting
From: David Miller @ 2010-10-28 18:49 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xemul, netdev
In-Reply-To: <1288291679.2711.1.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 28 Oct 2010 20:47:59 +0200

> On Thursday, 28 October 2010 at 11:41 -0700, David Miller wrote:
>> I am still able to revert this I think without screwing up
>> publicly visible history, so I will double check and do the
>> revert if I can.
> 
> Cool, I'll provide a patch in a couple of minutes, when tested.

Thanks.


* Re: [Security] TIPC security issues
From: David Miller @ 2010-10-28 18:49 UTC (permalink / raw)
  To: andy.grover
  Cc: torvalds, jon.maloy, netdev, drosenberg, security, allan.stephens,
	rds-devel
In-Reply-To: <4CC9C4B0.50404@oracle.com>

From: Andy Grover <andy.grover@oracle.com>
Date: Thu, 28 Oct 2010 11:45:04 -0700

> Yes that's right, it's to map a memory region that will be the target
> of an RDMA operation. I don't know why struct rds_iovec was used
> instead of struct iovec, but I think we're stuck, since it's part of
> our socket API.
> 
> I'll send DaveM patches to fix those two immediately-identified
> problems today, and we'll take a good long look at the rest of the
> code for further issues.

FWIW, I would strongly suggest that you copy the iovecs into the
kernel before parsing them like sys_sendmsg() and sys_recvmsg() do in
net/socket.c as part of these fixes.
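
A minimal user-space sketch of that copy-then-parse pattern follows. It
assumes a hypothetical `struct toy_iovec` and helper
`toy_snapshot_and_sum()`; memcpy() merely stands in for copy_from_user(), and
the size limits are chosen for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_VECS 8

struct toy_iovec {
	uint64_t addr;
	uint64_t bytes;
};

/*
 * Copy the caller-controlled array into private storage first, then
 * validate and sum the private copy. Later changes to the caller's
 * array cannot affect values that were already checked.
 * snap must have room for TOY_MAX_VECS entries.
 */
int toy_snapshot_and_sum(const struct toy_iovec *user, size_t nr,
			 struct toy_iovec *snap, uint64_t *total)
{
	uint64_t sum = 0;
	size_t i;

	if (nr > TOY_MAX_VECS)
		return -EINVAL;
	memcpy(snap, user, nr * sizeof(*snap));   /* stand-in for copy_from_user() */

	for (i = 0; i < nr; i++) {
		if (snap[i].bytes > UINT32_MAX)
			return -EINVAL;
		sum += snap[i].bytes;   /* at most 8 * UINT32_MAX, fits in 64 bits */
	}
	*total = sum;
	return 0;
}
```

All later processing would then use `snap`, never the caller's array again.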


* Re: [Security] TIPC security issues
From: Andy Grover @ 2010-10-28 18:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, jon.maloy, netdev, drosenberg, security,
	allan.stephens, RDS Devel
In-Reply-To: <AANLkTikdh-kqroCDA5EsjSMTLjViqe_U=hxM1L6U4Ppb@mail.gmail.com>

On 10/28/2010 08:32 AM, Linus Torvalds wrote:
> Heh. We apparently have _another_ iovec overflow in networking. This time rds.
>
> Reported by Thomas Pollet <thomas.pollet@gmail.com>: look at
> net/rds/rdma.c around line 490. It doesn't use the regular iovec code,
> instead it cooks its own, and has a few problems with overflow.
>
> It gathers the number of pages into an "unsigned int", and for each
> entry in its own rds_iovec it will check that the size is < UINT_MAX,
> and then generate the number of pages for that entry. With the whole
> "unaligned address adds one" logic, it means that each entry can get
> (UINT_MAX >> PAGE_SHIFT)+1 pages.

FWIW, neither the signedness issue nor the failure to check whether the
iovec changed is present in 2.6.36; both were only introduced in ff87e97.

> And how many entries can we have? Apparently that is capped to
> UINT_MAX too. So add all those up, and they can easily overflow the
> unsigned int page counter.
>
> So this time fixing verify_iovec() doesn't help, because rds just
> cooks its own, and this is using a totally different interface: it
> seems to hook into sendmsg, but it looks like it uses the ancillary
> data objects and passes in its own magical iovec rather than use any
> "normal" iovec thing. I don't know the code, I may be totally off.

Yes that's right, it's to map a memory region that will be the target of 
an RDMA operation. I don't know why struct rds_iovec was used instead of 
struct iovec, but I think we're stuck, since it's part of our socket API.

I'll send DaveM patches to fix those two immediately-identified problems 
today, and we'll take a good long look at the rest of the code for 
further issues.
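
To make the arithmetic in the report concrete, here is a small user-space
sketch. It assumes 4 KiB pages and a 32-bit `unsigned int`; the helper names
`toy_pages_in_vec()` and `toy_add_pages()` are hypothetical and are not the
functions in net/rds/rdma.c.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/*
 * Pages touched by [addr, addr + bytes). The caller must pass a
 * non-zero length and a range that does not wrap around.
 */
uint64_t toy_pages_in_vec(uint64_t addr, uint64_t bytes)
{
	uint64_t first = addr >> TOY_PAGE_SHIFT;
	uint64_t last = (addr + bytes - 1) >> TOY_PAGE_SHIFT;

	return last - first + 1;
}

/* Add to an unsigned int counter, refusing instead of wrapping. */
bool toy_add_pages(unsigned int *total, uint64_t pages)
{
	if (pages > (uint64_t)UINT_MAX - *total)
		return false;
	*total += (unsigned int)pages;
	return true;
}
```

Under those assumptions an unaligned entry of UINT_MAX bytes spans
(UINT_MAX >> 12) + 1 pages, so 4096 such entries wrap a plain 32-bit sum
back to zero, while the checked helper refuses the last addition.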

Regards -- Andy


* Re: [PATCH] ip_gre: fix percpu stats accounting
From: Eric Dumazet @ 2010-10-28 18:47 UTC (permalink / raw)
  To: David Miller; +Cc: xemul, netdev
In-Reply-To: <20101028.114102.112602886.davem@davemloft.net>

On Thursday, 28 October 2010 at 11:41 -0700, David Miller wrote:
> I am still able to revert this I think without screwing up
> publicly visible history, so I will double check and do the
> revert if I can.

Cool, I'll provide a patch in a couple of minutes, when tested.

Thanks




* Re: [PATCH] ip_gre: fix percpu stats accounting
From: David Miller @ 2010-10-28 18:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xemul, netdev
In-Reply-To: <1288291138.2711.0.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 28 Oct 2010 20:38:58 +0200

> On Thursday, 28 October 2010 at 10:29 -0700, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Thu, 28 Oct 2010 18:33:54 +0200
>> 
>> > On Thursday, 28 October 2010 at 20:07 +0400, Pavel Emelyanov wrote:
>> >> commit e985aad7 (ip_gre: percpu stats accounting) forgot the fallback
>> >> tunnel case (gre0).
>> >> 
>> >> This is the 4th part of the "foo: fix percpu stats accounting" series ;)
>> >> 
>> >> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
>> >> 
>> > 
>> > Indeed, right you are ;)
>> > 
>> > Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>> 
>> Applied.
> 
> 
> Hmm, actually the patch is buggy, sorry...

I am still able to revert this I think without screwing up
publicly visible history, so I will double check and do the
revert if I can.


* Re: [PATCH] ip_gre: fix percpu stats accounting
From: Eric Dumazet @ 2010-10-28 18:38 UTC (permalink / raw)
  To: David Miller; +Cc: xemul, netdev
In-Reply-To: <20101028.102917.246527148.davem@davemloft.net>

On Thursday, 28 October 2010 at 10:29 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 28 Oct 2010 18:33:54 +0200
> 
> > On Thursday, 28 October 2010 at 20:07 +0400, Pavel Emelyanov wrote:
> >> commit e985aad7 (ip_gre: percpu stats accounting) forgot the fallback
> >> tunnel case (gre0).
> >> 
> >> This is the 4th part of the "foo: fix percpu stats accounting" series ;)
> >> 
> >> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
> >> 
> > 
> > Indeed, right you are ;)
> > 
> > Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Applied.


Hmm, actually the patch is buggy, sorry...





* Re: [net-next] stmmac: enable/disable rx/tx in the core with a single write.
From: David Miller @ 2010-10-28 18:38 UTC (permalink / raw)
  To: peppe.cavallaro; +Cc: netdev, armando.visconti
In-Reply-To: <1288069094-25365-1-git-send-email-peppe.cavallaro@st.com>

From: Giuseppe CAVALLARO <peppe.cavallaro@st.com>
Date: Tue, 26 Oct 2010 06:58:14 +0200

> From: avisconti <armando.visconti@st.com>
> 
> This patch enables and disables the rx and tx bits in the MAC control reg
> by using a single write operation.
> This also solves a possible problem (spotted on SPEAr platforms) at 10Mbps
> where two consecutive writes to a MAC control register can take more than
> 4 phy_clk cycles.
> 
> Signed-off-by: Armando Visconti <armando.visconti@st.com>
> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>

Applied, thanks.
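
For illustration, the single-write approach described in the quoted commit
message might look like the sketch below. The register is simulated in
memory, and the bit positions and `toy_*` names are hypothetical; they are
not the stmmac driver's definitions.

```c
#include <stdint.h>

#define TOY_MAC_RX_ENABLE (1u << 2)   /* hypothetical bit positions */
#define TOY_MAC_TX_ENABLE (1u << 3)

struct toy_mac {
	uint32_t control;     /* simulated MAC control register */
	unsigned int writes;  /* number of register writes issued */
};

/* Simulated register write; counts how many writes reach the "hardware". */
void toy_reg_write(struct toy_mac *mac, uint32_t value)
{
	mac->control = value;
	mac->writes++;
}

/* Set or clear both enable bits with one read-modify-write. */
void toy_mac_set_rx_tx(struct toy_mac *mac, int enable)
{
	uint32_t value = mac->control;

	if (enable)
		value |= TOY_MAC_RX_ENABLE | TOY_MAC_TX_ENABLE;
	else
		value &= ~(TOY_MAC_RX_ENABLE | TOY_MAC_TX_ENABLE);
	toy_reg_write(mac, value);
}
```

Because both bits change in the same write, there are never two back-to-back
writes to the control register for one enable or disable operation.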


* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
From: David Miller @ 2010-10-28 18:37 UTC (permalink / raw)
  To: torvalds; +Cc: netdev, drosenberg, jon.maloy, allan.stephens
In-Reply-To: <AANLkTinq5iYU3A41vLHLLWVpoS-A4mB-RYrrWYCGKk+-@mail.gmail.com>

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu, 28 Oct 2010 11:33:56 -0700

> On Thu, Oct 28, 2010 at 11:22 AM, David Miller <davem@davemloft.net> wrote:
>>
>> -       int tot_len = 0;
>> +       size_t tot_len = 0;
> 
> I would actually keep "tot_len" as an "int".
 ...
>> +int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
>>  {
>>        int size, ct;
>> -       long err;
>> +       size_t err;
> 
> Same thing here. Making "err" be an "int" is actually the right thing
> to do, because then it matches the return type (iow, if it was any
> other type, there would be an implicit cast, and if it didn't fit in
> "int", that would be a bug anyway).

Yep, agreed on all counts, I'll make those changes.


* Re: [PATCH] net: Limit socket I/O iovec total length to INT_MAX.
From: Linus Torvalds @ 2010-10-28 18:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, drosenberg, jon.maloy, allan.stephens
In-Reply-To: <20101028.112231.232747062.davem@davemloft.net>

On Thu, Oct 28, 2010 at 11:22 AM, David Miller <davem@davemloft.net> wrote:
>
> -       int tot_len = 0;
> +       size_t tot_len = 0;

I would actually keep "tot_len" as an "int".

The whole point of this:

> +               if (len > INT_MAX - tot_len)
> +                       len = INT_MAX - tot_len;
> +
>                tot_len += len;

Is that "tot_len" can _never_ become larger than INT_MAX, because we
never add a "len" that would make it bigger than that.

So "len" itself should be the correct unsigned size_t (so that the
"len > INT_MAX - tot_len" thing is done as an unsigned comparison),
but "tot_len" itself is very much designed to fit in "int".
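
A stand-alone sketch of that clamping logic, with the types arranged as
described, could look like this. `struct toy_iov` and `toy_clamp_total()` are
hypothetical names, and writing the truncated length back into the vector is
an assumption of the sketch rather than something shown in the quoted hunk.

```c
#include <limits.h>
#include <stddef.h>

struct toy_iov {
	size_t len;
};

/* Clamp each segment so the running total never exceeds INT_MAX. */
int toy_clamp_total(struct toy_iov *iov, size_t count)
{
	int tot_len = 0;   /* by construction always in [0, INT_MAX] */
	size_t i;

	for (i = 0; i < count; i++) {
		size_t len = iov[i].len;   /* unsigned, so the compare is unsigned */

		if (len > (size_t)(INT_MAX - tot_len))
			len = (size_t)(INT_MAX - tot_len);
		iov[i].len = len;          /* record the truncated length */
		tot_len += (int)len;
	}
	return tot_len;
}
```

Since `INT_MAX - tot_len` is never negative, the cast to `size_t` is safe and
the addition into the `int` total cannot overflow.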

> +int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
>  {
>        int size, ct;
> -       long err;
> +       size_t err;

Same thing here. Making "err" be an "int" is actually the right thing
to do, because then it matches the return type (iow, if it was any
other type, there would be an implicit cast, and if it didn't fit in
"int", that would be a bug anyway).

                     Linus


* Re: [PATCH] net: atarilance - flags should be unsigned long
From: David Miller @ 2010-10-28 18:35 UTC (permalink / raw)
  To: geert; +Cc: akpm, netdev, linux-kernel, linux-m68k
In-Reply-To: <alpine.DEB.2.00.1010282030530.29788@ayla.of.borg>

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: Thu, 28 Oct 2010 20:31:53 +0200 (CEST)

> drivers/net/atarilance.c: In function ‘addr_accessible’:
> drivers/net/atarilance.c:413: warning: comparison of distinct pointer types lacks a cast
> drivers/net/atarilance.c:450: warning: comparison of distinct pointer types lacks a cast
> 
> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>

Applied, thanks.


* Re: [PATCH 1/1] netxen: fix kdump
From: David Miller @ 2010-10-28 18:34 UTC (permalink / raw)
  To: amit.salecha; +Cc: netdev, ameen.rahman, anirban.chakraborty, rajesh.borundia
In-Reply-To: <1288169510-4655-1-git-send-email-amit.salecha@qlogic.com>

From: Amit Kumar Salecha <amit.salecha@qlogic.com>
Date: Wed, 27 Oct 2010 01:51:50 -0700

> From: Rajesh Borundia <rajesh.borundia@qlogic.com>
> 
> Reset the whole hw instead of freeing hw resources
> consumed by each pci function.
> 
> Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
> Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>

Applied, thanks.


