Netdev List
* Re: [PATCH ethtool 0/7] Update RX n-tuple filtering
From: Jeff Garzik @ 2010-10-28 22:02 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, linux-net-drivers
In-Reply-To: <1285278428.7794.27.camel@achroite.uk.solarflarecom.com>

On 09/23/2010 05:47 PM, Ben Hutchings wrote:
> This patch series brings ethtool up to date with my recent changes to RX
> n-tuple filtering in the kernel.
>
> Ben.
>
> Ben Hutchings (7):
>    ethtool-copy.h: sync with net-next
>    ethtool: Generalise cmdline_info::unwanted_val to a "seen" flag or
>      bitmask
>    ethtool: Fix RX n-tuple masks and documentation
>    ethtool: Add MAC parameter type based on the parse_sopass() function
>    ethtool: Add Ethernet-level RX n-tuple filtering and 'clear' action
>    ethtool: Update sfc register dump
>    ethtool: Add my authorship and Solarflare copyright notice
>
>   AUTHORS        |    1 +
>   ethtool-copy.h |  228 +++++++++++++++++++++++++++++++++---------------
>   ethtool.8      |   79 ++++++++++++-----
>   ethtool.c      |  265 +++++++++++++++++++++++++++++++++++++++++---------------
>   sfc.c          |   25 +++---

Looks good... will push into this release (ethtool 2.6.36)




* Re: ethtool: missing implementation of n_priv_flags
From: Jeff Garzik @ 2010-10-28 21:59 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Brandeburg, Jesse, netdev@vger.kernel.org, Wyborny, Carolyn
In-Reply-To: <20101028215417.GL15074@solarflare.com>

On Thu, Oct 28, 2010 at 10:54:18PM +0100, Ben Hutchings wrote:
> Brandeburg, Jesse wrote:
> [...]
> > We'll take a shot at an implementation in the ethtool proper and post it 
> > (hopefully soon).  I imagine it will just be printed when one runs the 
> > command 
> > # ethtool ethX
> > 
> > and the set side will probably be implemented as part of -s
> > 
> > # ethtool -s ethX pflag [0-0xFFFFFFFF]
> 
> This is crap.  Use ETHTOOL_GSTRINGS with string_set = ETH_SS_PRIV_FLAGS
> to get the flag names, then convert that array into an array of struct
> cmdline_info and parse the flags by name.

Indeed.  It was intended to be a flexible interface where a driver can
easily pass arbitrary text-named flags to the user for setting/clearing.

If e1000e has a special Intel-specific feature that makes the NIC
process packets more rapidly, you could select the string "go_faster" in
the ethtool private flags interface.  The ethtool utility reads the
strings, which determine the flags exported for that network interface.

	Jeff
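
As an illustration of the by-name parsing discussed in this thread, here is a minimal sketch in plain C. It assumes the flag names have already been fetched (in ethtool that would be through ETHTOOL_GSTRINGS with string_set = ETH_SS_PRIV_FLAGS). The helper name parse_priv_flags, the FLAG_NAME_LEN constant, the "on"/"off" syntax and the second flag name "low_power" are hypothetical and are not taken from ethtool itself.

```c
#include <stdint.h>
#include <string.h>

#define FLAG_NAME_LEN 32 /* same size as ETH_GSTRING_LEN in the kernel headers */

/*
 * Hypothetical helper: apply "name on|off" pairs to a 32-bit flag word.
 * names[i] is the string for bit i, as a driver would export it.
 * Returns 0 on success, -1 on an unknown name or a malformed argument list.
 * *flags is only updated when the whole argument list parses cleanly.
 */
static int parse_priv_flags(const char names[][FLAG_NAME_LEN],
                            unsigned int n_names,
                            const char *const argv[], int argc,
                            uint32_t *flags)
{
    uint32_t result = *flags;

    if (n_names > 32)
        return -1;

    for (int i = 0; i < argc; i += 2) {
        unsigned int bit;

        if (i + 1 >= argc)
            return -1; /* a name without a value */

        /* Look the name up in the driver-provided string table. */
        for (bit = 0; bit < n_names; bit++)
            if (strncmp(names[bit], argv[i], FLAG_NAME_LEN) == 0)
                break;
        if (bit == n_names)
            return -1; /* unknown flag name */

        if (strcmp(argv[i + 1], "on") == 0)
            result |= UINT32_C(1) << bit;
        else if (strcmp(argv[i + 1], "off") == 0)
            result &= ~(UINT32_C(1) << bit);
        else
            return -1;
    }

    *flags = result;
    return 0;
}
```

With a table of { "go_faster", "low_power" }, the arguments "go_faster on low_power off" set bit 0 and clear bit 1, so the user never has to deal with a raw hex mask.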





* Re: ethtool: missing implementation of n_priv_flags
From: Ben Hutchings @ 2010-10-28 21:54 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: jgarzik@redhat.com, netdev@vger.kernel.org, Wyborny, Carolyn
In-Reply-To: <alpine.WNT.2.00.1010281405001.11080@jbrandeb-desk1.amr.corp.intel.com>

Brandeburg, Jesse wrote:
[...]
> We'll take a shot at an implementation in the ethtool proper and post it 
> (hopefully soon).  I imagine it will just be printed when one runs the 
> command 
> # ethtool ethX
> 
> and the set side will probably be implemented as part of -s
> 
> # ethtool -s ethX pflag [0-0xFFFFFFFF]

This is crap.  Use ETHTOOL_GSTRINGS with string_set = ETH_SS_PRIV_FLAGS
to get the flag names, then convert that array into an array of struct
cmdline_info and parse the flags by name.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: ethtool: missing implementation of n_priv_flags
From: Jeff Garzik @ 2010-10-28 21:53 UTC (permalink / raw)
  To: Brandeburg, Jesse; +Cc: netdev@vger.kernel.org, Wyborny, Carolyn
In-Reply-To: <F169D4F5E1F1974DBFAFABF47F60C10AA5E60671@orsmsx507.amr.corp.intel.com>

On Thu, Oct 28, 2010 at 01:34:34PM -0700, Brandeburg, Jesse wrote:
> Was just looking at implementing the driver private flags to add a new feature to 
> a driver.
> 
> It appears that nothing in the core ethtool.c ever accesses or prints n_priv_flags, 
> even if a driver assigns it (which none in the kernel currently do)
> 
> Is this just an oversight?

It was waiting for the first user, basically...  I created an example
kernel patch for one Intel driver, but you guys didn't seem interested
in it at the time.  Glad to see it's getting some attention.

	Jeff





* Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
From: Shirley Ma @ 2010-10-28 21:40 UTC (permalink / raw)
  To: Sridhar Samudrala
  Cc: Michael S. Tsirkin, David Miller, netdev, kvm, linux-kernel
In-Reply-To: <1288299878.30131.2.camel@sridhar.beaverton.ibm.com>

On Thu, 2010-10-28 at 14:04 -0700, Sridhar Samudrala wrote:
> It would be some change in virtio-net driver that may have improved the
> latency of small messages which in turn would have reduced the bandwidth
> as TCP could not accumulate and send large packets.

I will check for any latency improvement patch in virtio_net. If that's
the case, would it be good to have a tunable parameter that benefits
both bandwidth and latency workloads?

Shirley 


* Re: [PATCH] cxgb4vf: fix crash due to manipulating queues before registration
From: Casey Leedom @ 2010-10-28 21:16 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20101028.132122.245402585.davem@davemloft.net>

On Oct 28, 2010, at 1:21 PM, David Miller wrote:

> From: Casey Leedom <leedom@chelsio.com>
> Date: Thu, 28 Oct 2010 13:16:47 -0700
> 
>> Before commit "net: allocate tx queues in register_netdevice"
>> netif_tx_stop_all_queues and related functions could be used between
>> device allocation and registration but now only after registration.
>> cxgb4 has such a call before registration and crashes now.  Move it
>> after register_netdev.
>> 
>> Signed-off-by: Casey Leedom <leedom@chelsio.com>
> 
> Why are you manipulating the queue at all here?
> 
> The queue state is "don't care" at this point in time,
> and has no meaning until ->open() is invoked.

True.  This driver was modeled on cxgb4, and we wanted to make sure that
the cxgb4 crash fix was ported to cxgb4vf.  Let me sync up with the other
Chelsio maintainers.

Casey



* Re: ethtool: missing implementation of n_priv_flags
From: Brandeburg, Jesse @ 2010-10-28 21:10 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: jgarzik@redhat.com, netdev@vger.kernel.org, Wyborny, Carolyn
In-Reply-To: <20101028210024.GK15074@solarflare.com>

On Thu, 28 Oct 2010, Ben Hutchings wrote:
> Brandeburg, Jesse wrote:
> > Was just looking at implementing the driver private flags to add a new feature to 
> > a driver.
> > 
> > It appears that nothing in the core ethtool.c ever accesses or prints n_priv_flags, 
> > even if a driver assigns it (which none in the kernel currently do)
> [...]
> 
> You mean net/core/ethtool.c?  Or the ethtool utility?

The ethtool utility (the app that would print the 1-1 match between bits 
and strings)
 
> If you set it in your get_drvinfo() operation then it will be copied out.
> As for the ethtool utility, I've implemented some generic flag parsing and
> printing functionality so it shouldn't be too hard to use that for private
> flags.

Right, I see the code where it gets copied into drvinfo in the kernel but 
nothing ever prints it on the userspace side.

We'll take a shot at an implementation in the ethtool proper and post it 
(hopefully soon).  I imagine it will just be printed when one runs the 
command 
# ethtool ethX

and the set side will probably be implemented as part of -s

# ethtool -s ethX pflag [0-0xFFFFFFFF]



* Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
From: Sridhar Samudrala @ 2010-10-28 21:04 UTC (permalink / raw)
  To: Shirley Ma; +Cc: Michael S. Tsirkin, David Miller, netdev, kvm, linux-kernel
In-Reply-To: <1288296835.11251.24.camel@localhost.localdomain>

On Thu, 2010-10-28 at 13:13 -0700, Shirley Ma wrote:
> On Thu, 2010-10-28 at 12:32 -0700, Shirley Ma wrote:
> > Also I found a big TX regression for old guest and new guest. For old
> > guest, I am able to get almost 11Gb/s for 2K message size, but for the
> > new guest kernel, I can only get 3.5 Gb/s with the patch and same host.
> > I will dig it why.
> 
> The regression is from guest kernel, not from this patch. Tested 2.6.31
> kernel, it's performance is less than 2Gb/s for 2K message size already.
> I will resubmit the patch for review. 
> 
> I will start to test from 2.6.30 kernel to figure it when TX regression
> induced in virtio_net. Any suggestion which guest kernel I should test
> to figure out this regression?

It could be some change in the virtio-net driver that improved the
latency of small messages, which in turn reduced the bandwidth because
TCP could no longer accumulate and send large packets.

Thanks
Sridhar



* Re: ethtool: missing implementation of n_priv_flags
From: Ben Hutchings @ 2010-10-28 21:00 UTC (permalink / raw)
  To: Brandeburg, Jesse
  Cc: jgarzik@redhat.com, netdev@vger.kernel.org, Wyborny, Carolyn
In-Reply-To: <F169D4F5E1F1974DBFAFABF47F60C10AA5E60671@orsmsx507.amr.corp.intel.com>

Brandeburg, Jesse wrote:
> Was just looking at implementing the driver private flags to add a new feature to 
> a driver.
> 
> It appears that nothing in the core ethtool.c ever accesses or prints n_priv_flags, 
> even if a driver assigns it (which none in the kernel currently do)
[...]

You mean net/core/ethtool.c?  Or the ethtool utility?

If you set it in your get_drvinfo() operation then it will be copied out.
As for the ethtool utility, I've implemented some generic flag parsing and
printing functionality so it shouldn't be too hard to use that for private
flags.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: multi-machine simultaneous kernel panic in tcp_transmit_kcb
From: Ben Hutchings @ 2010-10-28 20:42 UTC (permalink / raw)
  To: Doug Hughes; +Cc: netdev
In-Reply-To: <4CC8CC0A.5000705@will.to>

Doug Hughes wrote:
> 3 machines within 1 minute of each other (odd, by itself, but not the 
> root of the question).
> 
> 2 of this:
> 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64 
> x86_64 GNU/Linux
> (I have a screen shot on the kvm)
> all Cent 5.4
[...]
 
Please don't ask netdev to support an old distribution kernel.  Try
asking on CentOS support forums or buy support from Red Hat.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* ethtool: missing implementation of n_priv_flags
From: Brandeburg, Jesse @ 2010-10-28 20:34 UTC (permalink / raw)
  To: jgarzik@redhat.com
  Cc: netdev@vger.kernel.org, Wyborny, Carolyn, Brandeburg, Jesse

Was just looking at implementing the driver private flags to add a new feature to 
a driver.

It appears that nothing in the core ethtool.c ever accesses or prints n_priv_flags, 
even if a driver assigns it (which none in the kernel currently do)

Is this just an oversight?


* Re: [PATCH] cxgb4vf: fix crash due to manipulating queues before registration
From: David Miller @ 2010-10-28 20:21 UTC (permalink / raw)
  To: leedom; +Cc: netdev
In-Reply-To: <1288297007-663-1-git-send-email-leedom@chelsio.com>

From: Casey Leedom <leedom@chelsio.com>
Date: Thu, 28 Oct 2010 13:16:47 -0700

> Before commit "net: allocate tx queues in register_netdevice"
> netif_tx_stop_all_queues and related functions could be used between
> device allocation and registration but now only after registration.
> cxgb4 has such a call before registration and crashes now.  Move it
> after register_netdev.
> 
> Signed-off-by: Casey Leedom <leedom@chelsio.com>

Why are you manipulating the queue at all here?

The queue state is "don't care" at this point in time,
and has no meaning until ->open() is invoked.


* [PATCH] cxgb4vf: fix crash due to manipulating queues before registration
From: Casey Leedom @ 2010-10-28 20:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, Casey Leedom

Before commit "net: allocate tx queues in register_netdevice"
netif_tx_stop_all_queues and related functions could be used between
device allocation and registration but now only after registration.
cxgb4 has such a call before registration and crashes now.  Move it
after register_netdev.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
---
 drivers/net/cxgb4vf/cxgb4vf_main.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/cxgb4vf/cxgb4vf_main.c b/drivers/net/cxgb4vf/cxgb4vf_main.c
index 555ecc5..b9d92a5 100644
--- a/drivers/net/cxgb4vf/cxgb4vf_main.c
+++ b/drivers/net/cxgb4vf/cxgb4vf_main.c
@@ -2600,7 +2600,6 @@ static int __devinit cxgb4vf_pci_probe(struct pci_dev *pdev,
 		pi->xact_addr_filt = -1;
 		pi->rx_offload = RX_CSO;
 		netif_carrier_off(netdev);
-		netif_tx_stop_all_queues(netdev);
 		netdev->irq = pdev->irq;
 
 		netdev->features = (NETIF_F_SG | NETIF_F_TSO | NETIF_F_TSO6 |
@@ -2661,6 +2660,7 @@ static int __devinit cxgb4vf_pci_probe(struct pci_dev *pdev,
 			continue;
 		}
 
+		netif_tx_stop_all_queues(netdev);
 		set_bit(pidx, &adapter->registered_device_map);
 	}
 	if (adapter->registered_device_map == 0) {
-- 
1.7.0.4



* Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
From: Shirley Ma @ 2010-10-28 20:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, netdev, kvm, linux-kernel
In-Reply-To: <1288294355.11251.20.camel@localhost.localdomain>

On Thu, 2010-10-28 at 12:32 -0700, Shirley Ma wrote:
> Also I found a big TX regression for old guest and new guest. For old
> guest, I am able to get almost 11Gb/s for 2K message size, but for the
> new guest kernel, I can only get 3.5 Gb/s with the patch and same host.
> I will dig it why.

The regression is from the guest kernel, not from this patch. I tested
the 2.6.31 kernel; its performance is already less than 2Gb/s for 2K
message size. I will resubmit the patch for review.

I will start testing from the 2.6.30 kernel to figure out when the TX
regression was introduced in virtio_net. Any suggestions on which guest
kernel I should test to track down this regression?

Thanks
Shirley



* Re: [Just for fun] loopback: avoid softirq on most transmits
From: David Miller @ 2010-10-28 20:09 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1288295326.2711.35.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 28 Oct 2010 21:48:46 +0200

> With the introduction of xmit_recursion percpu variable, its pretty
> cheap to check our recursion level in loopback transmit, and avoid
> raising softirq.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
> tbench faster by 4%, sorry I couldnt resist...

Hehehe :-)

Maybe that limit is even low enough to prevent stack overflow, even
when doing NFS over loopback to a RAID volume with XFS as the
filesystem, which seems to be the standard stack usage stress test.

But really, just like DST iteration, we should probably make these
things more iterative.

The cool thing about loopback is that we have a trigger for the cases
we care about, release_sock().

So we could have something like:

1) lock_sock() sets "local cpu will run release_sock()" mark.

2) netif_rx() checks mark, if set it puts SKB on "release_sock()
   local cpu work queue"

3) release_sock() retains mark, and runs SKB queue until empty.
   Once SKB work queue is empty, mark is cleared.

Anyways, just an idea.
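
The three steps above can be sketched as a small user-space simulation. This is only a model of the idea, not kernel code: struct cpu_ctx, the sim_* function names, QUEUE_MAX and the "packet N triggers a reply N-1" delivery rule are all hypothetical, and integers stand in for SKBs.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_MAX 16 /* illustrative bound for the per-cpu work queue */

/* Per-cpu state in this model. */
struct cpu_ctx {
    bool mark;            /* "local cpu will run release_sock()" */
    int queue[QUEUE_MAX]; /* deferred packets (ring buffer) */
    size_t head;          /* next slot to dequeue */
    size_t count;         /* number of queued packets */
    int delivered;        /* packets handed to the receive path */
    int softirqs;         /* times the fallback path was taken */
};

static void deliver(struct cpu_ctx *c, int pkt);

/* Step 1: taking the socket lock sets the mark. */
static void sim_lock_sock(struct cpu_ctx *c)
{
    c->mark = true;
}

/* Step 2: with the mark set, defer the packet instead of raising softirq. */
static void sim_netif_rx(struct cpu_ctx *c, int pkt)
{
    if (c->mark && c->count < QUEUE_MAX) {
        c->queue[(c->head + c->count) % QUEUE_MAX] = pkt;
        c->count++;
        return;
    }
    c->softirqs++; /* fallback: the usual backlog + softirq path */
    deliver(c, pkt);
}

/* Receive path of the model: packet N > 0 provokes a reply N - 1. */
static void deliver(struct cpu_ctx *c, int pkt)
{
    c->delivered++;
    if (pkt > 0)
        sim_netif_rx(c, pkt - 1);
}

/* Step 3: keep the mark while draining, clear it once the queue is empty. */
static void sim_release_sock(struct cpu_ctx *c)
{
    while (c->count > 0) {
        int pkt = c->queue[c->head];

        c->head = (c->head + 1) % QUEUE_MAX;
        c->count--;
        deliver(c, pkt); /* a reply is queued, not processed recursively */
    }
    c->mark = false;
}
```

Because the mark stays set while the queue is drained, replies generated during delivery are appended to the queue and handled by the same loop, so the chain of packets is processed iteratively rather than by nesting calls.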


* Re: [Security] TIPC security issues
From: Paul Gortmaker @ 2010-10-28 19:51 UTC (permalink / raw)
  To: David Miller
  Cc: torvalds, drosenberg, jon.maloy, allan.stephens, netdev, security
In-Reply-To: <20101027.105047.183059900.davem@davemloft.net>

[Re: [Security] TIPC security issues] On 27/10/2010 (Wed 10:50) David Miller wrote:

> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Wed, 27 Oct 2010 10:37:46 -0700
> 
> > If you _really_ care deeply, then some packet-oriented protocol can
> > just have its own private packet size limit (which would be way less
> > than 2GB), and then just look at the total size and say "oh, the total
> > size is bigger than my limit, so I'll just error out". Then, the fact
> > that verify_iovec() may have truncated the message to 2GB-1 doesn't
> > matter at all.
> > 
> > (Practically speaking, I bet all packet-oriented protocols already
> > have a limit that is enforced by simply allocation patterns, so I
> > don't think it's actually a problem even now)
> 
> This is, as it turns out, effectively what the TIPC socket layer
> already does.
> 
> Most of the send calls that propagate down to this code adding up the
> iov_len lengths gets passed a maximum packet size.
> 

In keeping with this idea, perhaps this is a better solution for getting
an immediate fix to the tipc part of this issue than the previous
patches I'd sent?  I can see some immediate advantages to this:

   -it adds checks that arguably should have been there since day
    one, since it is always best to check for garbage input ASAP.

   -it is a much smaller change, and thus easier to review and have
    confidence in

   -by being smaller and clearer, it lends itself better for being
    directly cherry picked onto the -stable release(s).

We'll still need to clean up the mishmash of variable types being
used in the tipc internals, but at least we can then do that in
a development cycle, and we won't have to inflict those bigger
cleanup changesets back onto GregKH.

Paul.

----

From 3fb200c1b27cf5cde668888ab85cffb1e9c6314f Mon Sep 17 00:00:00 2001
From: Allan Stephens <Allan.Stephens@windriver.com>
Date: Thu, 28 Oct 2010 07:58:24 -0400
Subject: [PATCH] tipc: Fix security hole exploitable by excessive send requests

Add checks to TIPC's socket send routines to promptly detect and
abort attempts to send more than 66,000 bytes in a single TIPC
message, or more than 2**31-1 bytes in a single TIPC byte stream
request.  This prevents excessively large size_t based inputs from
reaching internal tipc routines that currently use int values where
they risk being truncated or incorrectly wrapped.

The three checks are added to send_msg(), send_packet() and
send_stream() -- all of which are entered via proto_ops .sendmsg, which
in turn already checked for msg_iovlen > UIO_MAXIOV [in net/socket.c],
so there is no need to repeat that specific test in these new checks.

Reported-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 include/linux/tipc.h |    2 +-
 net/tipc/socket.c    |   10 ++++++++++
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/include/linux/tipc.h b/include/linux/tipc.h
index d10614b..1fd2889 100644
--- a/include/linux/tipc.h
+++ b/include/linux/tipc.h
@@ -101,7 +101,7 @@ static inline unsigned int tipc_node(__u32 addr)
  * Limiting values for messages
  */
 
-#define TIPC_MAX_USER_MSG_SIZE	66000
+#define TIPC_MAX_USER_MSG_SIZE	66000U
 
 /*
  * Message importance levels
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 33217fc..3562cf9 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -542,6 +542,8 @@ static int send_msg(struct kiocb *iocb, struct socket *sock,
 	if (unlikely((m->msg_namelen < sizeof(*dest)) ||
 		     (dest->family != AF_TIPC)))
 		return -EINVAL;
+	if (total_len > TIPC_MAX_USER_MSG_SIZE)
+		return -EMSGSIZE;
 
 	if (iocb)
 		lock_sock(sk);
@@ -649,6 +651,9 @@ static int send_packet(struct kiocb *iocb, struct socket *sock,
 	if (unlikely(dest))
 		return send_msg(iocb, sock, m, total_len);
 
+	if (total_len > TIPC_MAX_USER_MSG_SIZE)
+		return -EMSGSIZE;
+
 	if (iocb)
 		lock_sock(sk);
 
@@ -733,6 +738,11 @@ static int send_stream(struct kiocb *iocb, struct socket *sock,
 		goto exit;
 	}
 
+	if (total_len > (unsigned)INT_MAX) {
+		res = -EMSGSIZE;
+		goto exit;
+	}
+
 	/*
 	 * Send each iovec entry using one or more messages
 	 *
-- 
1.7.3.1
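
For readers who want to try the boundary conditions outside the kernel, the checks in the patch above can be restated as a stand-alone sketch. The function names check_msg_len and check_stream_len and the macro MAX_USER_MSG_SIZE are hypothetical; only the limit values (66000 and INT_MAX) and the -EMSGSIZE result come from the patch.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>

#define MAX_USER_MSG_SIZE 66000U /* same value as TIPC_MAX_USER_MSG_SIZE */

/* Message-oriented sends: reject anything above the per-message limit. */
static int check_msg_len(size_t total_len)
{
    if (total_len > MAX_USER_MSG_SIZE)
        return -EMSGSIZE;
    return 0;
}

/*
 * Stream sends: reject lengths that would not fit in the int values
 * used further down the call chain.
 */
static int check_stream_len(size_t total_len)
{
    if (total_len > (unsigned int)INT_MAX)
        return -EMSGSIZE;
    return 0;
}
```

The point of doing the comparison on the size_t value is that the length is rejected before any narrowing to int can take place.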



* [Just for fun] loopback: avoid softirq on most transmits
From: Eric Dumazet @ 2010-10-28 19:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

With the introduction of the xmit_recursion percpu variable, it's pretty
cheap to check our recursion level in loopback transmit, and avoid
raising softirq.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
tbench faster by 4%, sorry I couldn't resist...

 drivers/net/loopback.c    |   13 +++++++++++--
 include/linux/netdevice.h |    3 +++
 net/core/dev.c            |    2 +-
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 2d9663a..5bd73c0 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -74,7 +74,7 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 				 struct net_device *dev)
 {
 	struct pcpu_lstats *lb_stats;
-	int len;
+	int len, res;
 
 	skb_orphan(skb);
 
@@ -84,7 +84,16 @@ static netdev_tx_t loopback_xmit(struct sk_buff *skb,
 	lb_stats = this_cpu_ptr(dev->lstats);
 
 	len = skb->len;
-	if (likely(netif_rx(skb) == NET_RX_SUCCESS)) {
+
+	/*
+	 * avoid raising softirq if our recursion level is low
+	 */
+	if (likely(__this_cpu_read(xmit_recursion) <= 2))
+		res = netif_receive_skb(skb);
+	else
+		res = netif_rx(skb);
+
+	if (likely(res == NET_RX_SUCCESS)) {
 		u64_stats_update_begin(&lb_stats->syncp);
 		lb_stats->bytes += len;
 		lb_stats->packets++;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 072652d..918330b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1741,6 +1741,9 @@ extern void dev_kfree_skb_any(struct sk_buff *skb);
 extern int		netif_rx(struct sk_buff *skb);
 extern int		netif_rx_ni(struct sk_buff *skb);
 #define HAVE_NETIF_RECEIVE_SKB 1
+
+DECLARE_PER_CPU(int, xmit_recursion);
+
 extern int		netif_receive_skb(struct sk_buff *skb);
 extern gro_result_t	dev_gro_receive(struct napi_struct *napi,
 					struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 35dfb83..aadf09b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2208,7 +2208,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	return rc;
 }
 
-static DEFINE_PER_CPU(int, xmit_recursion);
+DEFINE_PER_CPU(int, xmit_recursion);
 #define RECURSION_LIMIT 10
 
 /**




* Re: [PATCH] ip_gre: fix fallback tunnel setup
From: Eric Dumazet @ 2010-10-28 19:34 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <4CC9CF06.1090806@parallels.com>

On Thursday, 28 October 2010 at 23:29 +0400, Pavel Emelyanov wrote:

> Indeed. I missed the fact, that the gre driver uses ndo_init for
> all devices including the fb one :(
> 
> Acked-by: Pavel Emelyanov <xemul@openvz.org>

Well, I only discovered this just now; it was not that obvious. Maybe we
should use a similar setup for all fb tunnels as well...

Thanks !




* Re: [RFC PATCH 1/1] vhost: TX used buffer guest signal accumulation
From: Shirley Ma @ 2010-10-28 19:32 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: David Miller, netdev, kvm, linux-kernel
In-Reply-To: <20101028052021.GD5599@redhat.com>

On Thu, 2010-10-28 at 07:20 +0200, Michael S. Tsirkin wrote:
> My concern is this can delay signalling for unlimited time.
> Could you pls test this with guests that do not have
> 2b5bbe3b8bee8b38bdc27dd9c0270829b6eb7eeb
> b0c39dbdc204006ef3558a66716ff09797619778
> that is 2.6.31 and older? 

The patch only delays signaling for an unlimited time when there is no
TX packet to transmit. I thought TX signaling only notifies the guest to
release the used buffers; is there anything else besides this?

I tested a rhel5u5 guest (2.6.18 kernel), and it works fine. I checked
the logs of the two commits, and I don't think this patch could cause
any issue without them.

Also I found a big TX regression between the old guest and the new
guest. For the old guest, I am able to get almost 11Gb/s for 2K message
size, but for the new guest kernel, I can only get 3.5 Gb/s with the
patch and the same host. I will dig into why.

thanks
Shirley



* Re: [PATCH] ip_gre: fix fallback tunnel setup
From: Pavel Emelyanov @ 2010-10-28 19:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <1288293079.2711.19.camel@edumazet-laptop>

On 10/28/2010 11:11 PM, Eric Dumazet wrote:
> On Thursday, 28 October 2010 at 20:47 +0200, Eric Dumazet wrote:
>> On Thursday, 28 October 2010 at 11:41 -0700, David Miller wrote:
>>> I am still able to revert this I think without screwing up
>>> publicly visible history, so I will double check and do the
>>> revert if I can.
>>
>> Cool, I'll provide a patch in a couple of minutes, when tested.
>>
> 
> I believe the right fix is this one, Pavel what do you think ?

Indeed. I missed the fact that the gre driver uses ndo_init for
all devices including the fb one :(

Acked-by: Pavel Emelyanov <xemul@openvz.org>

> With your patch, we allocate the per_cpu data twice for the fallback
> tunnel, thus leaking memory.

Yup...

> Thanks


* [PATCH] ip_gre: fix fallback tunnel setup
From: Eric Dumazet @ 2010-10-28 19:11 UTC (permalink / raw)
  To: David Miller; +Cc: xemul, netdev
In-Reply-To: <1288291679.2711.1.camel@edumazet-laptop>

On Thursday, 28 October 2010 at 20:47 +0200, Eric Dumazet wrote:
> On Thursday, 28 October 2010 at 11:41 -0700, David Miller wrote:
> > I am still able to revert this I think without screwing up
> > publicly visible history, so I will double check and do the
> > revert if I can.
> 
> Cool, I'll provide a patch in a couple of minutes, when tested.
> 

I believe the right fix is this one. Pavel, what do you think?

With your patch, we allocate the per_cpu data twice for the fallback
tunnel, thus leaking memory.

Thanks

Note: free_percpu(NULL) is legal

[PATCH] ip_gre: fix fallback tunnel setup

Before making the fallback tunnel visible to lookups, we should make
sure it is completely set up, i.e. ipgre_tunnel_init() has been called
and the tstats per_cpu pointer has been allocated.

Move rcu_assign_pointer(ign->tunnels_wc[0], tunnel); from
ipgre_fb_tunnel_init() to ipgre_init_net()

Based on a patch from Pavel Emelyanov

Reported-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/ip_gre.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 01087e0..70ff77f 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1325,7 +1325,6 @@ static void ipgre_fb_tunnel_init(struct net_device *dev)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
 	struct iphdr *iph = &tunnel->parms.iph;
-	struct ipgre_net *ign = net_generic(dev_net(dev), ipgre_net_id);
 
 	tunnel->dev = dev;
 	strcpy(tunnel->parms.name, dev->name);
@@ -1336,7 +1335,6 @@ static void ipgre_fb_tunnel_init(struct net_device *dev)
 	tunnel->hlen		= sizeof(struct iphdr) + 4;
 
 	dev_hold(dev);
-	rcu_assign_pointer(ign->tunnels_wc[0], tunnel);
 }
 
 
@@ -1383,10 +1381,12 @@ static int __net_init ipgre_init_net(struct net *net)
 	if ((err = register_netdev(ign->fb_tunnel_dev)))
 		goto err_reg_dev;
 
+	rcu_assign_pointer(ign->tunnels_wc[0],
+			   netdev_priv(ign->fb_tunnel_dev));
 	return 0;
 
 err_reg_dev:
-	free_netdev(ign->fb_tunnel_dev);
+	ipgre_dev_free(ign->fb_tunnel_dev);
 err_alloc_dev:
 	return err;
 }
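
The ordering this patch enforces (finish initialization, then publish the pointer) can be shown with a user-space analogue. This sketch uses C11 atomics with release/acquire ordering as a stand-in for rcu_assign_pointer() and rcu_dereference(); it is not the kernel implementation. The names struct tunnel, fallback_slot, tunnel_init, publish_fallback and lookup_fallback, and the value 24 for hlen, are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Cut-down stand-in for struct ip_tunnel. */
struct tunnel {
    int hlen;
    long *tstats; /* plays the role of the per-cpu stats pointer */
};

/* Plays the role of ign->tunnels_wc[0]; NULL until published. */
static _Atomic(struct tunnel *) fallback_slot;
static long stats_storage;

/* Everything a reader may touch is set up here. */
static void tunnel_init(struct tunnel *t)
{
    t->hlen = 24; /* illustrative value */
    t->tstats = &stats_storage;
}

/* Initialize first, publish last, with release semantics. */
static void publish_fallback(struct tunnel *t)
{
    tunnel_init(t);
    atomic_store_explicit(&fallback_slot, t, memory_order_release);
}

/* Readers see either NULL or a fully initialized tunnel. */
static struct tunnel *lookup_fallback(void)
{
    return atomic_load_explicit(&fallback_slot, memory_order_acquire);
}
```

If the store to fallback_slot happened before tunnel_init(), a concurrent reader could pick up the pointer while tstats was still unset, which is the window the patch closes.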




* pci_function_reset from driver
From: Rajesh Borundia @ 2010-10-28 18:52 UTC (permalink / raw)
  To: David Miller
  Cc: netdev@vger.kernel.org, Ameen Rahman, Anirban Chakraborty,
	Amit Salecha

Hi David,

For a device that supports FLR, can I issue an FLR on that device
from the driver?

For example, in the kdump case it is recommended to reset the device
after the crash kernel is loaded. Can we issue an FLR from the driver
in such a case (by calling some function like pci_reset_function)?

Rajesh




* NULL pointer dereference at netxen_nic_probe+0x813/0x9a0
From: Bjorn Helgaas @ 2010-10-28 18:50 UTC (permalink / raw)
  To: Amit Kumar Salecha; +Cc: netdev, linux-kernel

This is on current Linus upstream as of this morning (8128057)
on an HP DL785:

QLogic/NetXen Network Driver v4.0.74
netxen_nic 0000:07:00.0: PCI INT A -> GSI 30 (level, low) -> IRQ 30
netxen_nic 0000:07:00.0: setting latency timer to 64
netxen_nic 0000:07:00.0: 2MB memory map
netxen_nic 0000:07:00.0: loading firmware from flash
netxen_nic 0000:07:00.0: using 64-bit dma mask
kernel: Quad Gig LP Board S/N TI9ABK0266  Chip rev 0x42
netxen_nic 0000:07:00.0: firmware v4.0.520 [legacy]
netxen_nic 0000:07:00.0: irq 72 for MSI/MSI-X
netxen_nic 0000:07:00.0: irq 73 for MSI/MSI-X
netxen_nic 0000:07:00.0: irq 74 for MSI/MSI-X
netxen_nic 0000:07:00.0: irq 75 for MSI/MSI-X
netxen_nic 0000:07:00.0: using msi-x interrupts
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff8160afda>] netxen_nic_probe+0x813/0x9a0
PGD 0 
Oops: 0002 [#1] SMP 
last sysfs file: 
CPU 0 
Modules linked in:

Pid: 1650, comm: work_for_cpu Not tainted 2.6.36-07338-g8128057 #269 /ProLiant DL785 G5   
RIP: 0010:[<ffffffff8160afda>]  [<ffffffff8160afda>] netxen_nic_probe+0x813/0x9a0
RSP: 0018:ffff8806138abe30  EFLAGS: 00010246
RAX: 0000000000000010 RBX: ffff8806139126c0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880613895616 RDI: ffff880613912000
RBP: ffff8806138abe90 R08: 0000000000000000 R09: ffff8806138abb80
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880613912000
R13: ffff8812174f7000 R14: ffff880613912000 R15: ffff8812174f7000
FS:  0000000000000000(0000) GS:ffff8800cfa00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 0000000001c07000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process work_for_cpu (pid: 1650, threadinfo ffff8806138aa000, task ffff880616f12be0)
Stack:
 ffff8812174f7090 0000000000000246 ffff8806138abe90 ffff8812174f7000
 00008806138abfd8 0000000000000282 68cd0025b30068cc ffff880c17439d30
 ffff8812174f7090 ffff8812174f7000 ffff8812174f7208 0000000000000000
Call Trace:
 [<ffffffff81203696>] local_pci_probe+0x48/0x91
 [<ffffffff81052bae>] ? do_work_for_cpu+0x0/0x26
 [<ffffffff81052bc1>] do_work_for_cpu+0x13/0x26
 [<ffffffff81052bae>] ? do_work_for_cpu+0x0/0x26
 [<ffffffff81057a7b>] kthread+0x81/0x89
 [<ffffffff81003854>] kernel_thread_helper+0x4/0x10
 [<ffffffff810579fa>] ? kthread+0x0/0x89
 [<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10
Code: 00 eb 15 49 8d bf 90 00 00 00 48 c7 c6 1b 2e aa 81 31 c0 e8 c0 4e cd ff 4c 89 f7 e8 d6 bb ee ff 49 8b 96 00 03 00 00 48 8d 42 10 <f0> 80 4a 10 01 4c 89 f7 e8 a3 7e ed ff 85 c0 41 89 c4 74 2a 49 
RIP  [<ffffffff8160afda>] netxen_nic_probe+0x813/0x9a0
 RSP <ffff8806138abe30>
CR2: 0000000000000010
---[ end trace 059c7071bbf8de1f ]---


* Re: [PATCH] ip_gre: fix percpu stats accounting
From: David Miller @ 2010-10-28 18:49 UTC (permalink / raw)
  To: eric.dumazet; +Cc: xemul, netdev
In-Reply-To: <1288291679.2711.1.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 28 Oct 2010 20:47:59 +0200

> On Thursday, 28 October 2010 at 11:41 -0700, David Miller wrote:
>> I am still able to revert this I think without screwing up
>> publicly visible history, so I will double check and do the
>> revert if I can.
> 
> Cool, I'll provide a patch in a couple of minutes, when tested.

Thanks.


* Re: [Security] TIPC security issues
From: David Miller @ 2010-10-28 18:49 UTC (permalink / raw)
  To: andy.grover
  Cc: torvalds, jon.maloy, netdev, drosenberg, security, allan.stephens,
	rds-devel
In-Reply-To: <4CC9C4B0.50404@oracle.com>

From: Andy Grover <andy.grover@oracle.com>
Date: Thu, 28 Oct 2010 11:45:04 -0700

> Yes that's right, it's to map a memory region that will be the target
> of an RDMA operation. I don't know why struct rds_iovec was used
> instead of struct iovec, but I think we're stuck, since it's part of
> our socket API.
> 
> I'll send DaveM patches to fix those two immediately-identified
> problems today, and we'll take a good long look at the rest of the
> code for further issues.

FWIW, I would strongly suggest that you copy the iovecs into the
kernel before parsing them like sys_sendmsg() and sys_recvmsg() do in
net/socket.c as part of these fixes.
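
The reason for copying first can be illustrated with a user-space sketch: the vector is snapshotted once, and all validation and length arithmetic then run on the snapshot, so later changes by the caller cannot invalidate a check that already passed. Here memcpy() stands in for copy_from_user(), and struct seg, MAX_SEGS, snapshot_and_sum and the chosen error codes are hypothetical rather than taken from net/socket.c or the RDS code.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEGS 8 /* illustrative bound on the number of segments */

struct seg {
    void *base;
    size_t len;
};

/*
 * Copy the caller's vector into 'local' (which must hold MAX_SEGS
 * entries), then validate and sum using only the private copy.
 * Returns the total length, or a negative errno value.
 */
static long snapshot_and_sum(const struct seg *user_vec, size_t nsegs,
                             struct seg *local)
{
    size_t total = 0;

    if (nsegs > MAX_SEGS)
        return -EMSGSIZE;

    /* Stand-in for copy_from_user(): one copy, before any parsing. */
    memcpy(local, user_vec, nsegs * sizeof(*local));

    for (size_t i = 0; i < nsegs; i++) {
        /* Keep the running total within INT_MAX, without overflowing. */
        if (local[i].len > (size_t)INT_MAX - total)
            return -EINVAL;
        total += local[i].len;
    }
    return (long)total;
}
```

After the call, code that walks the segments should keep using 'local' rather than going back to the caller's memory.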


