Netdev List

* Re: [PATCH] Crash in tun
From: Max Krasnyansky @ 2012-07-19 17:57 UTC
  To: David Miller; +Cc: mikulas, eric.dumazet, netdev
In-Reply-To: <20120719.104721.1649022907880598997.davem@davemloft.net>

On 07/19/2012 10:47 AM, David Miller wrote:
> From: Max Krasnyansky <maxk@qualcomm.com>
> Date: Thu, 19 Jul 2012 10:44:01 -0700
> 
>> btw I don't remember now who added the socket business to tun_struct and why.
> 
> Is GIT really so broken on your computer that you can't find the
> answer to this question in like 5 seconds as I just did?

No. I'm just too lazy these days. Too much surfing I guess :).

> commit 33dccbb050bbe35b88ca8cf1228dcf3e4d4b3554
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Thu Feb 5 21:25:32 2009 -0800
> 
>     tun: Limit amount of queued packets per device
> <snip>     
>     This patch attempts to apply the same bandaid to the tuntap device.
>     It creates a pseudo-socket object which is used to account our
>     packets just as a normal socket does for UDP.  Of course things
>     are a little complex because we're actually reinjecting traffic
>     back into the stack rather than out of the stack.

Thanks for the info. Overall it definitely makes sense, but it still feels like a bit of overkill,
i.e. that we need to allocate a socket just for accounting. I guess all the involved
skb primitives are heavily based on that, though. If there are other use cases like this, maybe
it makes sense to factor the accounting stuff out of the socket struct?

Max
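
For illustration only, here is a minimal userspace sketch of what the per-device accounting in Herbert's commit amounts to, written as a stand-alone object in the spirit of Max's question. All names (tx_account, tx_charge, tx_uncharge, tx_writeable) are hypothetical. The thresholds assume the usual behavior of sock_alloc_send_pskb() (refuse once the charged bytes reach sk_sndbuf) and sock_writeable() (writable below half of sk_sndbuf); the sketch is single-threaded, whereas the kernel uses atomics.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-alone accounting object; not a kernel API. */
struct tx_account {
	size_t wmem_alloc;	/* bytes currently charged (cf. sk_wmem_alloc) */
	size_t sndbuf;		/* cap (cf. sk_sndbuf) */
};

/* Charge a buffer before queueing it.  Like a non-blocking
 * sock_alloc_send_pskb(), refuse with -EAGAIN once the cap is reached;
 * a single charge may overshoot the cap. */
static int tx_charge(struct tx_account *a, size_t truesize)
{
	if (a->wmem_alloc >= a->sndbuf)
		return -EAGAIN;
	a->wmem_alloc += truesize;
	return 0;
}

/* Called when the buffer is freed (cf. sock_wfree()). */
static void tx_uncharge(struct tx_account *a, size_t truesize)
{
	assert(a->wmem_alloc >= truesize);
	a->wmem_alloc -= truesize;
}

/* Writable only below half the cap (cf. sock_writeable()), which gives
 * some hysteresis so a poller is not woken for every freed buffer. */
static bool tx_writeable(const struct tx_account *a)
{
	return a->wmem_alloc < a->sndbuf / 2;
}
```

Nothing in these three operations needs the rest of struct sock, which is roughly the factoring Max is asking about; whether the skb helpers could be decoupled that easily is a separate question.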

* Re: [PATCH v3 0/7] TCP Fast Open client
From: David Miller @ 2012-07-19 17:56 UTC
  To: ycheng; +Cc: hkchu, edumazet, ncardwell, sivasankar, netdev
In-Reply-To: <1342716191-19196-1-git-send-email-ycheng@google.com>

From: Yuchung Cheng <ycheng@google.com>
Date: Thu, 19 Jul 2012 09:43:04 -0700

> ChangeLog since v2:
>   - Added seqlock to update Fast Open metrics
>   - Move TCP magic code in inet_wait_for_connect() to inet_stream_connect()
>   - Move up MSG_FASTOPEN macro for better header formatting
> 
> ChangeLog since v1:
>   - Reduce tons of code by storing Fast Open stats in the TCP metrics :)
>   - Clarify the purpose of using an experimental option in patch 1/7

Ok I've applied this series and am doing build testing, if nothing
falls out from that I'll push it out to net-next.

Thanks a lot for doing this work.

* Re: [PATCH] mlx4_en: map entire pages to increase throughput
From: David Miller @ 2012-07-19 17:53 UTC
  To: cascardo; +Cc: netdev, yevgenyp, ogerlitz, amirv, brking, leitao, klebers
In-Reply-To: <1342458113-10384-1-git-send-email-cascardo@linux.vnet.ibm.com>

From: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Date: Mon, 16 Jul 2012 14:01:53 -0300

> In its receive path, mlx4_en driver maps each page chunk that it pushes
> to the hardware and unmaps it when pushing it up the stack. This limits
> throughput to about 3Gbps on a Power7 8-core machine.
> 
> One solution is to map the entire allocated page at once. However, this
> requires that we keep track of every page fragment we give to a
> descriptor. We also need to work with the discipline that all fragments will
> be released (in the sense that they will no longer be reused by the driver)
> in the order they were allocated to the driver.
>
> This requires that we don't reuse any fragments; every single one of
> them must be reallocated. We do that by releasing all the fragments that
> are processed, and only after we finish processing the descriptors do we
> start the refill.
> 
> We also must somehow guarantee that we either refill all fragments in a
> descriptor or none at all, without resorting to giving up a page
> fragment that we would have already given. Otherwise, we would break the
> discipline of only releasing the fragments in the order they were
> allocated.
> 
> This has passed page allocation fault injections (restricted to the
> driver by using required-start and required-end) and device hotplug
> while 16 TCP streams were able to deliver more than 9Gbps.
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>

I have not seen any reasonable objections to this patch, so I have
applied it to net-next, thanks!
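
A minimal sketch of the fragment discipline described above, assuming a hypothetical fixed geometry of 4 fragments per page and ignoring real DMA: the page is mapped once, each descriptor gets all of its fragments or none, and fragments come back in allocation order, so the page can be unmapped as soon as its last fragment is released. Names are illustrative, not mlx4_en's.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical geometry: one page split into 4 equal RX fragments. */
#define FRAGS_PER_PAGE 4

/* One receive page that is DMA-mapped once for all of its fragments. */
struct rx_page {
	bool mapped;		/* stands in for a single dma_map_page() */
	int next_alloc;		/* index of the next fragment to hand out */
	int next_release;	/* index of the next fragment expected back */
};

static void rx_page_map(struct rx_page *p)
{
	p->mapped = true;
	p->next_alloc = 0;
	p->next_release = 0;
}

/* Give n fragments to one descriptor, or none at all: if the page cannot
 * supply all n, nothing is handed out and the state is left untouched. */
static bool rx_page_take(struct rx_page *p, int n)
{
	if (!p->mapped || p->next_alloc + n > FRAGS_PER_PAGE)
		return false;
	p->next_alloc += n;
	return true;
}

/* Fragments must come back in the order they were handed out, so once the
 * last fragment of the page is released, all of them have been, and the
 * page can be unmapped in one go. */
static void rx_page_release(struct rx_page *p, int frag)
{
	assert(frag == p->next_release && frag < p->next_alloc);
	p->next_release++;
	if (p->next_release == FRAGS_PER_PAGE)
		p->mapped = false;	/* stands in for the single dma_unmap_page() */
}
```

Switching to a fresh page when the current one cannot satisfy a descriptor is left out of the sketch.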

* Re: [PATCH net-next v2] ipv4: tcp: remove per net tcp_sock
From: David Miller @ 2012-07-19 17:51 UTC
  To: eric.dumazet; +Cc: netdev, therbert, wsommerfeld
In-Reply-To: <1342720197.2626.4624.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 19 Jul 2012 19:49:57 +0200

> On Thu, 2012-07-19 at 10:36 -0700, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Thu, 19 Jul 2012 19:34:03 +0200
>> 
>> > v2 : move unicast_sock out of ip_send_unicast_reply() body
>> >      init sk_refcnt to 1, in case some driver get/put a reference on
>> >      socket.
>> 
>> The compiler seems much happier with this, applied, thanks Eric :-)
> 
> Maybe I should install your compiler ;) What is the version you
> currently use ?

On x86-64 I use:

[davem@dokdo net-next]$ gcc --version
gcc (GCC) 4.7.0 20120507 (Red Hat 4.7.0-5)
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and on sparc64 I use gcc-4.7.1 vanilla.

* Re: [PATCH] sfc: initialize dynamic sysfs attributes for lockdep
From: David Miller @ 2012-07-19 17:51 UTC
  To: bhutchings; +Cc: mschmidt, netdev, linux-net-drivers
In-Reply-To: <1342718345.2617.46.camel@bwh-desktop.uk.solarflarecom.com>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Thu, 19 Jul 2012 18:19:05 +0100

> On Thu, 2012-07-19 at 19:04 +0200, Michal Schmidt wrote:
>> Dynamically allocated sysfs attributes must be initialized using
>> sysfs_attr_init(), otherwise lockdep complains:
>> BUG: key <address> not in .data!
>>
>> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
> 
> Acked-by: Ben Hutchings <bhutchings@solarflare.com>

Applied.

* Re: [PATCH] bridge: update documentation references
From: David Miller @ 2012-07-19 17:50 UTC
  To: shemminger; +Cc: netdev
In-Reply-To: <20120719100107.088e5bec@s6510.linuxnetplumber.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Thu, 19 Jul 2012 10:01:07 -0700

> Update the references to bridge utilities and web pages
> to their current locations.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

* Re: [PATCH v2] net: e100: ucode is optional in some cases
From: David Miller @ 2012-07-19 17:50 UTC
  To: bjorn
  Cc: netdev, jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
	peter.p.waskiewicz.jr, alexander.h.duyck, john.ronciak,
	e1000-devel
In-Reply-To: <1342715320-990-1-git-send-email-bjorn@mork.no>

From: Bjørn Mork <bjorn@mork.no>
Date: Thu, 19 Jul 2012 18:28:40 +0200

>   commit 9ac32e1b firmware: convert e100 driver to request_firmware()
> 
> did a straight conversion of the in-driver ucode to external
> files.  This introduced the possibility of the driver failing
> to enable an interface due to missing ucode. There was no
> evaluation of the importance of the ucode at the time.
> 
> Based on comments in earlier versions of this driver, and in
> the source code for the FreeBSD fxp driver, we can assume that
> the ucode implements the "CPU Cycle Saver" feature on supported
> adapters.  Although generally wanted, this is an optional
> feature. The ucode source is not available, preventing it from
> being included in free distributions. This creates unnecessary
> problems for the end users. Doing a network install based on a
> free distribution installer requires the user to download and
> insert the ucode into the installer.
> 
> Making the ucode optional when possible improves the user
> experience and driver usability.
> 
> The ucode for some adapters includes a bugfix, making it
> essential.  We continue to fail for these adapters unless the
> ucode is available.
> 
> Signed-off-by: Bjørn Mork <bjorn@mork.no>
> ---
> v2: removed URLs from the patch, converting them to generic
>     descriptions of the sources of information

Applied.
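
The decision the e100 patch describes can be sketched as follows. The struct and function names are hypothetical, and the firmware request result is passed in as an errno value; this is not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical adapter description; not the e100 driver's real types. */
struct adapter {
	bool ucode_has_bugfix;	/* ucode is essential on this revision */
};

/* Decide what to do after a firmware request.  'fw_err' is 0 if the
 * ucode was found, or a negative errno such as -ENOENT if it is missing. */
static int setup_ucode(const struct adapter *a, int fw_err, bool *use_ucode)
{
	if (fw_err == 0) {
		*use_ucode = true;	/* load it, e.g. for "CPU Cycle Saver" */
		return 0;
	}
	*use_ucode = false;
	if (a->ucode_has_bugfix)
		return fw_err;		/* essential: keep failing as before */
	return 0;			/* optional feature: bring the interface up without it */
}
```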

* Re: [PATCH net-next] asix: AX88172A driver depends on phylib
From: David Miller @ 2012-07-19 17:50 UTC
  To: christian.riesch; +Cc: netdev, fengguang.wu, kernel-janitors
In-Reply-To: <1342699339-13871-1-git-send-email-christian.riesch@omicron.at>

From: Christian Riesch <christian.riesch@omicron.at>
Date: Thu, 19 Jul 2012 14:02:19 +0200

> Since commit 16626b0cc3d5afe250850f96759b241f8a403b52 the asix
> driver depends on the phylib. Select phylib when the asix driver is
> selected.
> 
> Reported-by: Fengguang Wu <fengguang.wu@intel.com>
> Cc: kernel-janitors@vger.kernel.org
> Signed-off-by: Christian Riesch <christian.riesch@omicron.at>

Applied.

* Re: [PATCH net-next 2/2] asix: Add support for programming the EEPROM
From: David Miller @ 2012-07-19 17:50 UTC
  To: christian.riesch; +Cc: netdev, allan, kernel, grundler
In-Reply-To: <1342693387-17945-2-git-send-email-christian.riesch@omicron.at>

From: Christian Riesch <christian.riesch@omicron.at>
Date: Thu, 19 Jul 2012 12:23:07 +0200

> This patch adds the asix_set_eeprom() function to provide support for
> programming the configuration EEPROM via ethtool.
> 
> Signed-off-by: Christian Riesch <christian.riesch@omicron.at>

Applied.

* Re: [PATCH net-next 1/2] asix: Rework reading from EEPROM
From: David Miller @ 2012-07-19 17:50 UTC
  To: christian.riesch; +Cc: netdev, allan, kernel, grundler
In-Reply-To: <1342693387-17945-1-git-send-email-christian.riesch@omicron.at>

From: Christian Riesch <christian.riesch@omicron.at>
Date: Thu, 19 Jul 2012 12:23:06 +0200

> The current code for reading the EEPROM via ethtool in the asix
> driver has a few issues. It cannot handle odd-length values
> (accesses must be aligned at 16-bit boundaries) and interprets the
> offset provided by ethtool as a 16-bit word offset instead of as a byte offset.
> 
> The new code for asix_get_eeprom() introduced by this patch is
> modeled after the code in
> drivers/net/ethernet/atheros/atl1e/atl1e_ethtool.c
> and provides read access to the entire EEPROM with arbitrary
> offsets and lengths.
> 
> Signed-off-by: Christian Riesch <christian.riesch@omicron.at>

Applied.
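
A sketch of the byte-offset approach on top of a word-only EEPROM: round the byte range out to whole words, read them, then copy starting from the right byte. It assumes little-endian byte order within each 16-bit word and uses a hypothetical 64-word array in place of the driver's USB register reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EEPROM_WORDS 64

/* Hypothetical word-addressed EEPROM: only whole 16-bit words can be read. */
static uint16_t eeprom_words[EEPROM_WORDS];

static uint16_t read_word(size_t word_index)
{
	return eeprom_words[word_index];
}

/* Copy 'len' bytes starting at byte 'offset'; either may be odd. */
static void eeprom_read_bytes(uint8_t *out, size_t offset, size_t len)
{
	uint8_t buf[2 * EEPROM_WORDS];
	size_t first, last, i;

	if (len == 0)
		return;
	first = offset / 2;			/* word holding the first byte */
	last = (offset + len - 1) / 2;		/* word holding the last byte */
	assert(last < EEPROM_WORDS);

	for (i = first; i <= last; i++) {
		uint16_t w = read_word(i);

		/* Assumed little-endian: low byte comes first. */
		buf[2 * (i - first)] = w & 0xff;
		buf[2 * (i - first) + 1] = w >> 8;
	}
	/* Skip the leading byte if the offset was odd. */
	memcpy(out, buf + (offset & 1), len);
}
```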

* Re: [PATCHv1] net: stmmac: Add ip version to dts bindings
From: David Miller @ 2012-07-19 17:50 UTC
  To: sr
  Cc: dinguyen, netdev, dinh.linux, peppe.cavallaro, shiraz.hashim,
	deepak.sikri, pavel, arnd
In-Reply-To: <201207190925.58789.sr@denx.de>

From: Stefan Roese <sr@denx.de>
Date: Thu, 19 Jul 2012 09:25:58 +0200

> On Thursday 19 July 2012 01:28:26 dinguyen@altera.com wrote:
>> From: Dinh Nguyen <dinguyen@altera.com>
>> 
>> Because there are multiple variants of the stmmac/dwmac driver, the
>> dts bindings should be updated to include the version of the IP used.
>> 
>> Signed-off-by: Dinh Nguyen <dinguyen@altera.com>
> 
> Acked-by: Stefan Roese <sr@denx.de>

Applied.

* Re: [PATCH] cxgb3: Set vlan_feature on net_device
From: David Miller @ 2012-07-19 17:50 UTC
  To: brenohl; +Cc: divy, netdev
In-Reply-To: <1342639748-16276-1-git-send-email-brenohl@br.ibm.com>

From: brenohl@br.ibm.com
Date: Wed, 18 Jul 2012 14:29:08 -0500

> The cxgb3 interface performs poorly when VLAN is set. On my current
> setup, a PowerLinux 7R2, I am able to get around 7 Gbps on a TCP_STREAM
> (8 instances, 4k message).
> With this patch, I am able to reach 9.5 Gbps.
> 
> Signed-off-by: Breno Leitao <brenohl@br.ibm.com>

Applied.

* Re: [PATCH net-next v2] ipv4: tcp: remove per net tcp_sock
From: Eric Dumazet @ 2012-07-19 17:49 UTC
  To: David Miller; +Cc: netdev, therbert, wsommerfeld
In-Reply-To: <20120719.103618.2037668049773125073.davem@davemloft.net>

On Thu, 2012-07-19 at 10:36 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 19 Jul 2012 19:34:03 +0200
> 
> > v2 : move unicast_sock out of ip_send_unicast_reply() body
> >      init sk_refcnt to 1, in case some driver get/put a reference on
> >      socket.
> 
> The compiler seems much happier with this, applied, thanks Eric :-)

Maybe I should install your compiler ;) What is the version you
currently use ?

* Re: [PATCH net-next] ipx: move peII functions
From: David Miller @ 2012-07-19 17:49 UTC
  To: shemminger; +Cc: acme, netdev
In-Reply-To: <20120718120948.67611bdd@s6510.linuxnetplumber.net>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 18 Jul 2012 12:09:48 -0700

> The Ethernet II wrapper is only used by the IPX protocol; it may once
> have been used by Appletalk, but it is not currently. Therefore it makes
> sense to move it to the IPX dust bin and drop the exports.
> 
> Build tested only.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

* Re: [PATCH V2] ipv6: fix incorrect route 'expires' value passed to userspace
From: David Miller @ 2012-07-19 17:49 UTC
  To: lw; +Cc: netdev, shemminger
In-Reply-To: <50076AD3.1060604@cn.fujitsu.com>

From: Li Wei <lw@cn.fujitsu.com>
Date: Thu, 19 Jul 2012 10:02:59 +0800

> 
> When userspace uses RTM_GETROUTE to dump the route table, an already
> expired route entry always gets an 'expires' value (2147157)
> calculated based on INT_MAX.
>
> The cause of this problem is the following statement:
> 	rt->dst.expires - jiffies < INT_MAX
> The left-hand side is unsigned long, so INT_MAX is converted to
> unsigned long as well, and a small negative difference wraps to a
> huge value that is considered greater than INT_MAX.
>
> This patch fixes this by using the same trick as the time_after macro
> to avoid the 'unsigned long' type promotion and to deal with jiffies
> wrapping.
>
> We also need to fix rtnl_put_cacheinfo(), which uses
> jiffies_to_clock_t() (which takes an unsigned long parameter) to
> convert jiffies to clock_t, so that it handles negative expires.
> 
> Signed-off-by: Li Wei <lw@cn.fujitsu.com>

Your patch has been corrupted by your email client and therefore will
not apply cleanly.

I think this isn't the first time your patch submissions have
had this problem, and if so then you should do the necessary
work to prevent it with more certainty in the future, as this
makes a lot of extra work for other people.
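
The promotion problem in Li Wei's description can be reproduced in userspace. This is a minimal sketch with hypothetical function names; the fix casts the unsigned difference to long the way time_after() does. That conversion is implementation-defined in ISO C, but gcc defines it as two's-complement wrapping and the kernel relies on it throughout.

```c
#include <assert.h>
#include <limits.h>

/* Remaining lifetime in jiffies, clamped to INT_MAX.  Buggy form: the
 * unsigned subtraction wraps for an already-expired entry, the comparison
 * against INT_MAX fails, and INT_MAX is reported instead. */
static long expires_buggy(unsigned long expires, unsigned long now)
{
	if (expires - now < INT_MAX)
		return (long)(expires - now);
	return INT_MAX;
}

/* time_after()-style fix: interpret the difference as signed, which also
 * copes with the jiffies counter wrapping around. */
static long expires_fixed(unsigned long expires, unsigned long now)
{
	long delta = (long)(expires - now);

	if (delta < INT_MAX)
		return delta;	/* negative means already expired */
	return INT_MAX;
}
```

For an entry that expired 100 jiffies ago, expires_buggy() returns INT_MAX while expires_fixed() returns -100, which a caller can then clamp or report as expired.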

* Re: [PATCH] Crash in tun
From: David Miller @ 2012-07-19 17:47 UTC
  To: maxk; +Cc: mikulas, eric.dumazet, netdev
In-Reply-To: <50084761.1030602@qualcomm.com>

From: Max Krasnyansky <maxk@qualcomm.com>
Date: Thu, 19 Jul 2012 10:44:01 -0700

> btw I don't remember now who added the socket business to tun_struct and why.

Is GIT really so broken on your computer that you can't find the
answer to this question in like 5 seconds as I just did?

commit 33dccbb050bbe35b88ca8cf1228dcf3e4d4b3554
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu Feb 5 21:25:32 2009 -0800

    tun: Limit amount of queued packets per device
    
    Unlike a normal socket path, the tuntap device send path does
    not have any accounting.  This means that the user-space sender
    may be able to pin down arbitrary amounts of kernel memory by
    continuing to send data to an end-point that is congested.
    
    Even when this isn't an issue because of limited queueing at
    most end points, this can also be a problem because its only
    response to congestion is packet loss.  That is, when those
    local queues at the end-point fill up, the tuntap device will
    start wasting system time because it will continue to send
    data there which simply gets dropped straight away.
    
    Of course one could argue that everybody should do congestion
    control end-to-end, unfortunately there are people in this world
    still hooked on UDP, and they don't appear to be going away
    anywhere fast.  In fact, we've always helped them by performing
    accounting in our UDP code, the sole purpose of which is to
    provide congestion feedback other than through packet loss.
    
    This patch attempts to apply the same bandaid to the tuntap device.
    It creates a pseudo-socket object which is used to account our
    packets just as a normal socket does for UDP.  Of course things
    are a little complex because we're actually reinjecting traffic
    back into the stack rather than out of the stack.
    
    The stack complexities however should have been resolved by preceding
    patches.  So this one can simply start using skb_set_owner_w.
    
    For now the accounting is essentially disabled by default for
    backwards compatibility.  In particular, we set the cap to INT_MAX.
    This is so that existing applications don't get confused by the
    sudden arrival of EAGAIN errors.
    
    In future we may wish (or be forced to) do this by default.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 15d6763..0476549 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -64,6 +64,7 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
+#include <net/sock.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -95,6 +96,8 @@ struct tun_file {
 	wait_queue_head_t	read_wait;
 };
 
+struct tun_sock;
+
 struct tun_struct {
 	struct tun_file		*tfile;
 	unsigned int 		flags;
@@ -107,12 +110,24 @@ struct tun_struct {
 	struct fasync_struct	*fasync;
 
 	struct tap_filter       txflt;
+	struct sock		*sk;
+	struct socket		socket;
 
 #ifdef TUN_DEBUG
 	int debug;
 #endif
 };
 
+struct tun_sock {
+	struct sock		sk;
+	struct tun_struct	*tun;
+};
+
+static inline struct tun_sock *tun_sk(struct sock *sk)
+{
+	return container_of(sk, struct tun_sock, sk);
+}
+
 static int tun_attach(struct tun_struct *tun, struct file *file)
 {
 	struct tun_file *tfile = file->private_data;
@@ -461,7 +476,8 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
 {
 	struct tun_file *tfile = file->private_data;
 	struct tun_struct *tun = __tun_get(tfile);
-	unsigned int mask = POLLOUT | POLLWRNORM;
+	struct sock *sk = tun->sk;
+	unsigned int mask = 0;
 
 	if (!tun)
 		return POLLERR;
@@ -473,6 +489,11 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
 	if (!skb_queue_empty(&tun->readq))
 		mask |= POLLIN | POLLRDNORM;
 
+	if (sock_writeable(sk) ||
+	    (!test_and_set_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags) &&
+	     sock_writeable(sk)))
+		mask |= POLLOUT | POLLWRNORM;
+
 	if (tun->dev->reg_state != NETREG_REGISTERED)
 		mask = POLLERR;
 
@@ -482,66 +503,35 @@ static unsigned int tun_chr_poll(struct file *file, poll_table * wait)
 
 /* prepad is the amount to reserve at front.  len is length after that.
  * linear is a hint as to how much to copy (usually headers). */
-static struct sk_buff *tun_alloc_skb(size_t prepad, size_t len, size_t linear,
-				     gfp_t gfp)
+static inline struct sk_buff *tun_alloc_skb(struct tun_struct *tun,
+					    size_t prepad, size_t len,
+					    size_t linear, int noblock)
 {
+	struct sock *sk = tun->sk;
 	struct sk_buff *skb;
-	unsigned int i;
-
-	skb = alloc_skb(prepad + len, gfp|__GFP_NOWARN);
-	if (skb) {
-		skb_reserve(skb, prepad);
-		skb_put(skb, len);
-		return skb;
-	}
+	int err;
 
 	/* Under a page?  Don't bother with paged skb. */
 	if (prepad + len < PAGE_SIZE)
-		return NULL;
+		linear = len;
 
-	/* Start with a normal skb, and add pages. */
-	skb = alloc_skb(prepad + linear, gfp);
+	skb = sock_alloc_send_pskb(sk, prepad + linear, len - linear, noblock,
+				   &err);
 	if (!skb)
-		return NULL;
+		return ERR_PTR(err);
 
 	skb_reserve(skb, prepad);
 	skb_put(skb, linear);
-
-	len -= linear;
-
-	for (i = 0; i < MAX_SKB_FRAGS; i++) {
-		skb_frag_t *f = &skb_shinfo(skb)->frags[i];
-
-		f->page = alloc_page(gfp|__GFP_ZERO);
-		if (!f->page)
-			break;
-
-		f->page_offset = 0;
-		f->size = PAGE_SIZE;
-
-		skb->data_len += PAGE_SIZE;
-		skb->len += PAGE_SIZE;
-		skb->truesize += PAGE_SIZE;
-		skb_shinfo(skb)->nr_frags++;
-
-		if (len < PAGE_SIZE) {
-			len = 0;
-			break;
-		}
-		len -= PAGE_SIZE;
-	}
-
-	/* Too large, or alloc fail? */
-	if (unlikely(len)) {
-		kfree_skb(skb);
-		skb = NULL;
-	}
+	skb->data_len = len - linear;
+	skb->len += len - linear;
 
 	return skb;
 }
 
 /* Get packet from user space buffer */
-static __inline__ ssize_t tun_get_user(struct tun_struct *tun, struct iovec *iv, size_t count)
+static __inline__ ssize_t tun_get_user(struct tun_struct *tun,
+				       struct iovec *iv, size_t count,
+				       int noblock)
 {
 	struct tun_pi pi = { 0, cpu_to_be16(ETH_P_IP) };
 	struct sk_buff *skb;
@@ -573,9 +563,11 @@ static __inline__ ssize_t tun_get_user(struct tun_struct *tun, struct iovec *iv,
 			return -EINVAL;
 	}
 
-	if (!(skb = tun_alloc_skb(align, len, gso.hdr_len, GFP_KERNEL))) {
-		tun->dev->stats.rx_dropped++;
-		return -ENOMEM;
+	skb = tun_alloc_skb(tun, align, len, gso.hdr_len, noblock);
+	if (IS_ERR(skb)) {
+		if (PTR_ERR(skb) != -EAGAIN)
+			tun->dev->stats.rx_dropped++;
+		return PTR_ERR(skb);
 	}
 
 	if (skb_copy_datagram_from_iovec(skb, 0, iv, len)) {
@@ -661,7 +653,8 @@ static __inline__ ssize_t tun_get_user(struct tun_struct *tun, struct iovec *iv,
 static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
 			      unsigned long count, loff_t pos)
 {
-	struct tun_struct *tun = tun_get(iocb->ki_filp);
+	struct file *file = iocb->ki_filp;
+	struct tun_struct *tun = file->private_data;
 	ssize_t result;
 
 	if (!tun)
@@ -669,7 +662,8 @@ static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
 
 	DBG(KERN_INFO "%s: tun_chr_write %ld\n", tun->dev->name, count);
 
-	result = tun_get_user(tun, (struct iovec *) iv, iov_length(iv, count));
+	result = tun_get_user(tun, (struct iovec *)iv, iov_length(iv, count),
+			      file->f_flags & O_NONBLOCK);
 
 	tun_put(tun);
 	return result;
@@ -828,11 +822,40 @@ static struct rtnl_link_ops tun_link_ops __read_mostly = {
 	.validate	= tun_validate,
 };
 
+static void tun_sock_write_space(struct sock *sk)
+{
+	struct tun_struct *tun;
+
+	if (!sock_writeable(sk))
+		return;
+
+	if (sk->sk_sleep && waitqueue_active(sk->sk_sleep))
+		wake_up_interruptible_sync(sk->sk_sleep);
+
+	if (!test_and_clear_bit(SOCK_ASYNC_NOSPACE, &sk->sk_socket->flags))
+		return;
+
+	tun = container_of(sk, struct tun_sock, sk)->tun;
+	kill_fasync(&tun->fasync, SIGIO, POLL_OUT);
+}
+
+static void tun_sock_destruct(struct sock *sk)
+{
+	dev_put(container_of(sk, struct tun_sock, sk)->tun->dev);
+}
+
+static struct proto tun_proto = {
+	.name		= "tun",
+	.owner		= THIS_MODULE,
+	.obj_size	= sizeof(struct tun_sock),
+};
 
 static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 {
+	struct sock *sk;
 	struct tun_struct *tun;
 	struct net_device *dev;
+	struct tun_file *tfile = file->private_data;
 	int err;
 
 	dev = __dev_get_by_name(net, ifr->ifr_name);
@@ -885,14 +908,31 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 		tun->flags = flags;
 		tun->txflt.count = 0;
 
+		err = -ENOMEM;
+		sk = sk_alloc(net, AF_UNSPEC, GFP_KERNEL, &tun_proto);
+		if (!sk)
+			goto err_free_dev;
+
+		/* This ref count is for tun->sk. */
+		dev_hold(dev);
+		sock_init_data(&tun->socket, sk);
+		sk->sk_write_space = tun_sock_write_space;
+		sk->sk_destruct = tun_sock_destruct;
+		sk->sk_sndbuf = INT_MAX;
+		sk->sk_sleep = &tfile->read_wait;
+
+		tun->sk = sk;
+		container_of(sk, struct tun_sock, sk)->tun = tun;
+
 		tun_net_init(dev);
 
 		if (strchr(dev->name, '%')) {
 			err = dev_alloc_name(dev, dev->name);
 			if (err < 0)
-				goto err_free_dev;
+				goto err_free_sk;
 		}
 
+		err = -EINVAL;
 		err = register_netdevice(tun->dev);
 		if (err < 0)
 			goto err_free_dev;
@@ -928,6 +968,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 	strcpy(ifr->ifr_name, tun->dev->name);
 	return 0;
 
+ err_free_sk:
+	sock_put(sk);
  err_free_dev:
 	free_netdev(dev);
  failed:
@@ -1012,6 +1054,7 @@ static int tun_chr_ioctl(struct inode *inode, struct file *file,
 	struct tun_struct *tun;
 	void __user* argp = (void __user*)arg;
 	struct ifreq ifr;
+	int sndbuf;
 	int ret;
 
 	if (cmd == TUNSETIFF || _IOC_TYPE(cmd) == 0x89)
@@ -1151,6 +1194,22 @@ static int tun_chr_ioctl(struct inode *inode, struct file *file,
 		ret = dev_set_mac_address(tun->dev, &ifr.ifr_hwaddr);
 		rtnl_unlock();
 		break;
+
+	case TUNGETSNDBUF:
+		sndbuf = tun->sk->sk_sndbuf;
+		if (copy_to_user(argp, &sndbuf, sizeof(sndbuf)))
+			ret = -EFAULT;
+		break;
+
+	case TUNSETSNDBUF:
+		if (copy_from_user(&sndbuf, argp, sizeof(sndbuf))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		tun->sk->sk_sndbuf = sndbuf;
+		break;
+
 	default:
 		ret = -EINVAL;
 		break;
@@ -1218,8 +1277,10 @@ static int tun_chr_close(struct inode *inode, struct file *file)
 		__tun_detach(tun);
 
 		/* If desireable, unregister the netdevice. */
-		if (!(tun->flags & TUN_PERSIST))
+		if (!(tun->flags & TUN_PERSIST)) {
+			sock_put(tun->sk);
 			unregister_netdevice(tun->dev);
+		}
 
 		rtnl_unlock();
 	}
diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index c8f8d59..c03c10d 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -1988,6 +1988,8 @@ COMPATIBLE_IOCTL(TUNSETGROUP)
 COMPATIBLE_IOCTL(TUNGETFEATURES)
 COMPATIBLE_IOCTL(TUNSETOFFLOAD)
 COMPATIBLE_IOCTL(TUNSETTXFILTER)
+COMPATIBLE_IOCTL(TUNGETSNDBUF)
+COMPATIBLE_IOCTL(TUNSETSNDBUF)
 /* Big V */
 COMPATIBLE_IOCTL(VT_SETMODE)
 COMPATIBLE_IOCTL(VT_GETMODE)
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 8529f57..049d6c9 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -46,6 +46,8 @@
 #define TUNSETOFFLOAD  _IOW('T', 208, unsigned int)
 #define TUNSETTXFILTER _IOW('T', 209, unsigned int)
 #define TUNGETIFF      _IOR('T', 210, unsigned int)
+#define TUNGETSNDBUF   _IOR('T', 211, int)
+#define TUNSETSNDBUF   _IOW('T', 212, int)
 
 /* TUNSETIFF ifr flags */
 #define IFF_TUN		0x0001
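
The tun_sock/tun_sk() pair in the diff is the usual embed-and-container_of pattern: given a pointer to an embedded member, subtract the member's offset to recover the enclosing object. This userspace sketch uses a simplified container_of() (no type checking) and hypothetical stub structs standing in for struct sock and struct tun_sock. In the real code struct sock has to be the first member, because sk_alloc() allocates tun_proto.obj_size bytes and hands back a pointer to the start of that allocation as a struct sock.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct sock_stub {		/* stands in for struct sock */
	int sk_sndbuf;
};

struct tun_struct_stub;		/* opaque here, like in the patch */

struct tun_sock_stub {		/* mirrors struct tun_sock */
	struct sock_stub sk;	/* must come first, see above */
	struct tun_struct_stub *tun;
};

/* Mirrors tun_sk(): recover the tun_sock from its embedded sock. */
static struct tun_sock_stub *tun_sk_stub(struct sock_stub *sk)
{
	return container_of(sk, struct tun_sock_stub, sk);
}
```

Since sk sits at offset 0 the subtraction is a no-op here, but container_of() works the same for members at any offset.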

* Re: [PATCH] Crash in tun
From: Max Krasnyansky @ 2012-07-19 17:44 UTC
  To: Mikulas Patocka; +Cc: Eric Dumazet, netdev, davem
In-Reply-To: <alpine.DEB.2.00.1207191746170.7550@artax.karlin.mff.cuni.cz>

On 07/19/2012 09:13 AM, Mikulas Patocka wrote:
> 
> 
> On Thu, 19 Jul 2012, Eric Dumazet wrote:
> 
>> Hi Mikulas
>>
>> A fix for this problem is : http://patchwork.ozlabs.org/patch/170440/
> 
> If you call tun_free_netdev because of a jump to an error label
> err_free_sk, your patch still calls it with NULL file, causing a memory 
> corruption and a possible crash.
> 
> Your patch doesn't fix sockets_in_use underflow.
> 
> Maybe we can commit this patch --- it introduces a new flag 
> SOCK_EXTERNALLY_ALLOCATED to work around both problems. (it looks much
> nicer than my previous patch with file = (void *)1).


I definitely like this second version better. Less hacky and all.

btw I don't remember now who added the socket business to tun_struct and why.
It seems to be messy in general. The original version, back when I was still paying attention to it,
didn't have sockets at all, only char and net devices. It was much cleaner.

Max

> ---
> 
> tun: fix a crash bug and a memory leak
> 
> This patch fixes a crash
> tun_chr_close -> netdev_run_todo -> tun_free_netdev -> sk_release_kernel ->
> sock_release -> iput(SOCK_INODE(sock))
> introduced by commit 1ab5ecb90cb6a3df1476e052f76a6e8f6511cb3d
> 
> The problem is that this socket is embedded in struct tun_struct and has
> no inode, so iput is called on an invalid inode, which modifies invalid
> memory and may cause a crash.
> 
> sock_release also decrements sockets_in_use; this causes a bug where the
> "sockets: used" field in /proc/*/net/sockstat keeps decreasing when
> tun devices are created and closed.
> 
> This patch introduces a flag SOCK_EXTERNALLY_ALLOCATED that instructs
> sock_release to not free the inode and not decrement sockets_in_use,
> fixing both memory corruption and sockets_in_use underflow.
> 
> It should be backported to 3.3 and 3.4 stable.
> 
> Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
> Cc: stable@kernel.org
> 
> ---
>  drivers/net/tun.c   |    3 +++
>  include/linux/net.h |    1 +
>  net/socket.c        |    3 +++
>  3 files changed, 7 insertions(+)
> 
> Index: linux-3.4.5-fast/drivers/net/tun.c
> ===================================================================
> --- linux-3.4.5-fast.orig/drivers/net/tun.c	2012-07-19 17:55:16.000000000 +0200
> +++ linux-3.4.5-fast/drivers/net/tun.c	2012-07-19 17:58:30.000000000 +0200
> @@ -358,6 +358,8 @@ static void tun_free_netdev(struct net_d
>  {
>  	struct tun_struct *tun = netdev_priv(dev);
>  
> +	BUG_ON(!test_bit(SOCK_EXTERNALLY_ALLOCATED, &tun->socket.flags));
> +
>  	sk_release_kernel(tun->socket.sk);
>  }
>  
> @@ -1115,6 +1117,7 @@ static int tun_set_iff(struct net *net,
>  		tun->flags = flags;
>  		tun->txflt.count = 0;
>  		tun->vnet_hdr_sz = sizeof(struct virtio_net_hdr);
> +		set_bit(SOCK_EXTERNALLY_ALLOCATED, &tun->socket.flags);
>  
>  		err = -ENOMEM;
>  		sk = sk_alloc(&init_net, AF_UNSPEC, GFP_KERNEL, &tun_proto);
> Index: linux-3.4.5-fast/include/linux/net.h
> ===================================================================
> --- linux-3.4.5-fast.orig/include/linux/net.h	2012-07-19 17:54:31.000000000 +0200
> +++ linux-3.4.5-fast/include/linux/net.h	2012-07-19 17:55:03.000000000 +0200
> @@ -72,6 +72,7 @@ struct net;
>  #define SOCK_NOSPACE		2
>  #define SOCK_PASSCRED		3
>  #define SOCK_PASSSEC		4
> +#define SOCK_EXTERNALLY_ALLOCATED 5
>  
>  #ifndef ARCH_HAS_SOCKET_TYPES
>  /**
> Index: linux-3.4.5-fast/net/socket.c
> ===================================================================
> --- linux-3.4.5-fast.orig/net/socket.c	2012-07-19 17:56:55.000000000 +0200
> +++ linux-3.4.5-fast/net/socket.c	2012-07-19 17:57:50.000000000 +0200
> @@ -522,6 +522,9 @@ void sock_release(struct socket *sock)
>  	if (rcu_dereference_protected(sock->wq, 1)->fasync_list)
>  		printk(KERN_ERR "sock_release: fasync list not empty!\n");
>  
> +	if (test_bit(SOCK_EXTERNALLY_ALLOCATED, &sock->flags))
> +		return;
> +
>  	percpu_sub(sockets_in_use, 1);
>  	if (!sock->file) {
>  		iput(SOCK_INODE(sock));
> 
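
A simplified userspace model of what the SOCK_EXTERNALLY_ALLOCATED check changes in sock_release(): a socket embedded in another structure was never counted in sockets_in_use and has no inode, so release must skip both steps. The counters, helpers and stub struct are hypothetical stand-ins; the real function also releases the protocol ops and checks the fasync list before this point.

```c
#include <assert.h>
#include <stdbool.h>

#define SOCK_EXTERNALLY_ALLOCATED 5	/* bit number, as in the patch */

static int sockets_in_use;	/* stands in for the per-cpu counter */
static int inode_puts;		/* counts simulated iput() calls */

struct socket_stub {
	unsigned long flags;
	bool has_file;
};

static void set_flag(unsigned long *flags, int bit)
{
	*flags |= 1UL << bit;
}

static bool test_flag(unsigned long flags, int bit)
{
	return (flags & (1UL << bit)) != 0;
}

/* Simplified shape of sock_release() after the patch. */
static void sock_release_stub(struct socket_stub *sock)
{
	if (test_flag(sock->flags, SOCK_EXTERNALLY_ALLOCATED))
		return;		/* embedded: no inode, never counted */
	sockets_in_use--;
	if (!sock->has_file)
		inode_puts++;	/* real code: iput(SOCK_INODE(sock)) */
}
```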

* [PATCH] net: Fix warnings in dst_ops.h
From: David Miller @ 2012-07-19 17:43 UTC
  To: netdev


include/net/dst_ops.h:28:20: warning: ‘struct sock’ declared inside parameter list

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst_ops.h |    1 +
 1 file changed, 1 insertion(+)

diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index d079fc6..2f26dfb 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -8,6 +8,7 @@ struct dst_entry;
 struct kmem_cachep;
 struct net_device;
 struct sk_buff;
+struct sock;
 
 struct dst_ops {
 	unsigned short		family;
-- 
1.7.10.4
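
Why the one-line forward declaration silences the warning: a struct tag first mentioned inside a prototype's parameter list has scope limited to that prototype, so it names a different type from the struct sock defined elsewhere. A minimal sketch, with a hypothetical hook standing in for whichever dst_ops member triggered the warning.

```c
#include <assert.h>

/* Without this line, 'struct sock' in the hook below would be a new type
 * scoped to that parameter list (gcc: "declared inside parameter list"),
 * incompatible with the struct sock defined further down. */
struct sock;

struct dst_ops_stub {
	/* Hypothetical hook taking a pointer to the incomplete type. */
	int (*hook)(struct sock *sk);
};

/* The full definition can come later, or in another header. */
struct sock {
	int id;
};

static int read_id(struct sock *sk)
{
	return sk->id;
}
```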


* Re: [PATCH v3 2/7] net-tcp: Fast Open client - cookie cache
From: Eric Dumazet @ 2012-07-19 17:41 UTC
  To: Yuchung Cheng; +Cc: davem, hkchu, edumazet, ncardwell, sivasankar, netdev
In-Reply-To: <1342716191-19196-3-git-send-email-ycheng@google.com>

On Thu, 2012-07-19 at 09:43 -0700, Yuchung Cheng wrote:
> With help from Eric Dumazet, add Fast Open metrics in tcp metrics cache.
> The basic ones are MSS and the cookies. A later patch will cache more to
> handle unfriendly middleboxes.
> 
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> ---
>  include/net/tcp.h      |    4 +++
>  net/ipv4/tcp_metrics.c |   51 ++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 55 insertions(+), 0 deletions(-)

Acked-by: Eric Dumazet <edumazet@google.com>
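
The cache described above stores the MSS and the Fast Open cookie for each destination. The sketch below shows only that shape, as a direct-mapped table keyed by IPv4 address. It is not how net/ipv4/tcp_metrics.c is organized, and the names, table size and maximum cookie length are all assumptions. It is also single-threaded. The v3 cover letter mentions a seqlock for concurrent updates to these metrics.

```c
#include <stdint.h>
#include <string.h>

#define FO_CACHE_SLOTS 64	/* hypothetical table size */
#define FO_COOKIE_MAX  16	/* hypothetical maximum cookie length */

/* One cached entry per destination: the MSS and the Fast Open cookie. */
struct fo_entry {
	uint32_t daddr;		/* IPv4 destination; 0 marks an empty slot */
	uint16_t mss;
	uint8_t  cookie_len;
	uint8_t  cookie[FO_COOKIE_MAX];
};

static struct fo_entry fo_cache[FO_CACHE_SLOTS];

/* Multiplicative hash of the destination address to a slot index. */
static unsigned int fo_slot(uint32_t daddr)
{
	return (unsigned int)((daddr * 2654435761u) % FO_CACHE_SLOTS);
}

/* Store (or overwrite) the MSS and cookie learned for a destination. */
static int fo_cache_set(uint32_t daddr, uint16_t mss,
			const uint8_t *cookie, size_t len)
{
	struct fo_entry *e = &fo_cache[fo_slot(daddr)];

	if (daddr == 0 || len > FO_COOKIE_MAX)
		return -1;
	e->daddr = daddr;	/* a colliding destination is simply evicted */
	e->mss = mss;
	e->cookie_len = (uint8_t)len;
	memcpy(e->cookie, cookie, len);
	return 0;
}

/* Copy out the cached values. Returns 0 on a hit and -1 on a miss. */
static int fo_cache_get(uint32_t daddr, uint16_t *mss,
			uint8_t *cookie, size_t *len)
{
	const struct fo_entry *e = &fo_cache[fo_slot(daddr)];

	if (daddr == 0 || e->daddr != daddr)
		return -1;
	*mss = e->mss;
	*len = e->cookie_len;
	memcpy(cookie, e->cookie, e->cookie_len);
	return 0;
}
```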

* Re: [PATCH net-next v2] ipv4: tcp: remove per net tcp_sock
From: David Miller @ 2012-07-19 17:36 UTC
  To: eric.dumazet; +Cc: netdev, therbert, wsommerfeld
In-Reply-To: <1342719243.2626.4571.camel@edumazet-glaptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 19 Jul 2012 19:34:03 +0200

> v2 : move unicast_sock out of ip_send_unicast_reply() body
>      init sk_refcnt to 1, in case some driver gets/puts a reference on
>      the socket.

The compiler seems much happier with this, applied, thanks Eric :-)

* [PATCH net-next v2] ipv4: tcp: remove per net tcp_sock
From: Eric Dumazet @ 2012-07-19 17:34 UTC
  To: David Miller; +Cc: netdev, Tom Herbert, Bill Sommerfeld

From: Eric Dumazet <edumazet@google.com>

tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
per network namespace.

This leads to bad behavior on multiqueue NICs, because many CPUs
contend for the socket lock, and once the socket lock is acquired, extra
false sharing on various socket fields slows down the operations.

To better resist attacks, we use a per-CPU socket. Each CPU can
run without contention, using memory from its local node.
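
The per-CPU idea can be illustrated in userspace with C11 `_Thread_local` storage. Each thread gets a private copy of the context, so it needs no lock and avoids the false sharing of one shared object. This is an analogy, not the kernel mechanism. get_cpu_var() also disables preemption until the matching put_cpu_var(), and a thread-local variable does not need that step. `struct reply_ctx` and `reply_ctx_prepare()` are hypothetical:

```c
#include <stdint.h>

/* Hypothetical reply context. Each thread has a private copy, which is the
 * userspace analogue of a per-CPU variable. No lock is needed because no
 * other thread can see this copy. */
struct reply_ctx {
	uint32_t priority;	/* per-call field, rewritten on every use */
	uint32_t queue;		/* per-call field, copied from the request */
	unsigned long replies;	/* per-thread statistic, never reset */
};

static _Thread_local struct reply_ctx reply_ctx;

/* Fill the calling thread's context for one reply and return it. */
static struct reply_ctx *reply_ctx_prepare(uint32_t priority, uint32_t queue)
{
	struct reply_ctx *ctx = &reply_ctx;

	/* The context is reused, so every per-call field is reset here,
	 * much as the patch reinitializes sk_write_queue and sk_sndbuf. */
	ctx->priority = priority;
	ctx->queue = queue;
	ctx->replies++;
	return ctx;
}
```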

Additional features:

1) We also mirror the queue_mapping of the incoming skb, so that
answers use the same queue if possible.

2) Setting the SOCK_USE_WRITE_QUEUE socket flag speeds up sock_wfree().

3) We now limit the number of in-flight RST/ACK [1] packets
per CPU, instead of per namespace, and we honor the sysctl_wmem_default
limit dynamically. (Prior to this patch, the sysctl_wmem_default value was
copied at boot time, so any later change would not affect the tcp_sock
limit.)
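
Item 3 contrasts copying a tunable once with reading it at the point of use. The patch reads it at the point of use by assigning sk->sk_sndbuf = sysctl_wmem_default on every call. A minimal sketch of the difference follows. It assumes a hypothetical global `wmem_default` in place of sysctl_wmem_default and uses a simplified byte-count check in place of the real send-buffer accounting:

```c
/* Hypothetical tunable standing in for sysctl_wmem_default. */
static int wmem_default = 1000;

struct sender {
	int snapshot_limit;	/* old scheme: copied once at setup */
	int in_flight;		/* bytes currently queued */
};

static void sender_init(struct sender *s)
{
	s->snapshot_limit = wmem_default;
	s->in_flight = 0;
}

/* Old scheme: later changes to wmem_default are never seen. */
static int may_send_snapshot(const struct sender *s, int len)
{
	return s->in_flight + len <= s->snapshot_limit;
}

/* New scheme: the tunable is read at the moment of use. */
static int may_send_dynamic(const struct sender *s, int len)
{
	return s->in_flight + len <= wmem_default;
}
```

When an administrator later lowers the tunable, only may_send_dynamic() observes the new limit. The snapshot keeps enforcing the value from setup time.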


[1] These packets are only generated when no socket was matched for
the incoming packet.

Reported-by: Bill Sommerfeld <wsommerfeld@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
---
v2 : move unicast_sock out of ip_send_unicast_reply() body
     init sk_refcnt to 1, in case some driver gets/puts a reference on
     the socket.

 include/net/ip.h         |    2 -
 include/net/netns/ipv4.h |    1 
 net/ipv4/ip_output.c     |   50 +++++++++++++++++++++++--------------
 net/ipv4/tcp_ipv4.c      |    8 ++---
 4 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index ec5cfde..bd5e444 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -158,7 +158,7 @@ static inline __u8 ip_reply_arg_flowi_flags(const struct ip_reply_arg *arg)
 	return (arg->flags & IP_REPLY_ARG_NOSRCCHECK) ? FLOWI_FLAG_ANYSRC : 0;
 }
 
-void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb, __be32 daddr,
+void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
 			   __be32 saddr, const struct ip_reply_arg *arg,
 			   unsigned int len);
 
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 2e089a9..d909c7f 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -38,7 +38,6 @@ struct netns_ipv4 {
 	struct sock		*fibnl;
 
 	struct sock		**icmp_sk;
-	struct sock		*tcp_sock;
 	struct inet_peer_base	*peers;
 	struct tcpm_hash_bucket	*tcp_metrics_hash;
 	unsigned int		tcp_metrics_hash_mask;
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index cc52679..c528f84 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1463,20 +1463,33 @@ static int ip_reply_glue_bits(void *dptr, char *to, int offset,
 
 /*
  *	Generic function to send a packet as reply to another packet.
- *	Used to send TCP resets so far.
+ *	Used to send some TCP resets/acks so far.
  *
- *	Should run single threaded per socket because it uses the sock
- *     	structure to pass arguments.
+ *	Use a fake percpu inet socket to avoid false sharing and contention.
  */
-void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb, __be32 daddr,
+static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = {
+	.sk = {
+		.__sk_common = {
+			.skc_refcnt = ATOMIC_INIT(1),
+		},
+		.sk_wmem_alloc	= ATOMIC_INIT(1),
+		.sk_allocation	= GFP_ATOMIC,
+		.sk_flags	= (1UL << SOCK_USE_WRITE_QUEUE),
+	},
+	.pmtudisc = IP_PMTUDISC_WANT,
+};
+
+void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
 			   __be32 saddr, const struct ip_reply_arg *arg,
 			   unsigned int len)
 {
-	struct inet_sock *inet = inet_sk(sk);
 	struct ip_options_data replyopts;
 	struct ipcm_cookie ipc;
 	struct flowi4 fl4;
 	struct rtable *rt = skb_rtable(skb);
+	struct sk_buff *nskb;
+	struct sock *sk;
+	struct inet_sock *inet;
 
 	if (ip_options_echo(&replyopts.opt.opt, skb))
 		return;
@@ -1494,38 +1507,39 @@ void ip_send_unicast_reply(struct sock *sk, struct sk_buff *skb, __be32 daddr,
 
 	flowi4_init_output(&fl4, arg->bound_dev_if, 0,
 			   RT_TOS(arg->tos),
-			   RT_SCOPE_UNIVERSE, sk->sk_protocol,
+			   RT_SCOPE_UNIVERSE, ip_hdr(skb)->protocol,
 			   ip_reply_arg_flowi_flags(arg),
 			   daddr, saddr,
 			   tcp_hdr(skb)->source, tcp_hdr(skb)->dest);
 	security_skb_classify_flow(skb, flowi4_to_flowi(&fl4));
-	rt = ip_route_output_key(sock_net(sk), &fl4);
+	rt = ip_route_output_key(net, &fl4);
 	if (IS_ERR(rt))
 		return;
 
-	/* And let IP do all the hard work.
+	inet = &get_cpu_var(unicast_sock);
 
-	   This chunk is not reenterable, hence spinlock.
-	   Note that it uses the fact, that this function is called
-	   with locally disabled BH and that sk cannot be already spinlocked.
-	 */
-	bh_lock_sock(sk);
 	inet->tos = arg->tos;
+	sk = &inet->sk;
 	sk->sk_priority = skb->priority;
 	sk->sk_protocol = ip_hdr(skb)->protocol;
 	sk->sk_bound_dev_if = arg->bound_dev_if;
+	sock_net_set(sk, net);
+	__skb_queue_head_init(&sk->sk_write_queue);
+	sk->sk_sndbuf = sysctl_wmem_default;
 	ip_append_data(sk, &fl4, ip_reply_glue_bits, arg->iov->iov_base, len, 0,
 		       &ipc, &rt, MSG_DONTWAIT);
-	if ((skb = skb_peek(&sk->sk_write_queue)) != NULL) {
+	nskb = skb_peek(&sk->sk_write_queue);
+	if (nskb) {
 		if (arg->csumoffset >= 0)
-			*((__sum16 *)skb_transport_header(skb) +
-			  arg->csumoffset) = csum_fold(csum_add(skb->csum,
+			*((__sum16 *)skb_transport_header(nskb) +
+			  arg->csumoffset) = csum_fold(csum_add(nskb->csum,
 								arg->csum));
-		skb->ip_summed = CHECKSUM_NONE;
+		nskb->ip_summed = CHECKSUM_NONE;
+		skb_set_queue_mapping(nskb, skb_get_queue_mapping(skb));
 		ip_push_pending_frames(sk, &fl4);
 	}
 
-	bh_unlock_sock(sk);
+	put_cpu_var(unicast_sock);
 
 	ip_rt_put(rt);
 }
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d9caf5c..d7d2fa5 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -688,7 +688,7 @@ static void tcp_v4_send_reset(struct sock *sk, struct sk_buff *skb)
 
 	net = dev_net(skb_dst(skb)->dev);
 	arg.tos = ip_hdr(skb)->tos;
-	ip_send_unicast_reply(net->ipv4.tcp_sock, skb, ip_hdr(skb)->saddr,
+	ip_send_unicast_reply(net, skb, ip_hdr(skb)->saddr,
 			      ip_hdr(skb)->daddr, &arg, arg.iov[0].iov_len);
 
 	TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS);
@@ -771,7 +771,7 @@ static void tcp_v4_send_ack(struct sk_buff *skb, u32 seq, u32 ack,
 	if (oif)
 		arg.bound_dev_if = oif;
 	arg.tos = tos;
-	ip_send_unicast_reply(net->ipv4.tcp_sock, skb, ip_hdr(skb)->saddr,
+	ip_send_unicast_reply(net, skb, ip_hdr(skb)->saddr,
 			      ip_hdr(skb)->daddr, &arg, arg.iov[0].iov_len);
 
 	TCP_INC_STATS_BH(net, TCP_MIB_OUTSEGS);
@@ -2624,13 +2624,11 @@ EXPORT_SYMBOL(tcp_prot);
 
 static int __net_init tcp_sk_init(struct net *net)
 {
-	return inet_ctl_sock_create(&net->ipv4.tcp_sock,
-				    PF_INET, SOCK_RAW, IPPROTO_TCP, net);
+	return 0;
 }
 
 static void __net_exit tcp_sk_exit(struct net *net)
 {
-	inet_ctl_sock_destroy(net->ipv4.tcp_sock);
 }
 
 static void __net_exit tcp_sk_exit_batch(struct list_head *net_exit_list)

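In the reply path above, arg->csum carries a partial 32-bit one's complement sum supplied by the caller. The patch combines it with nskb->csum using csum_add() and then folds the result into the 16-bit checksum field with csum_fold(). The following is a minimal userspace sketch of that arithmetic as described in RFC 1071. The `ocsum_*` names are hypothetical. The kernel's helpers use the __wsum/__sum16 types, and some are architecture-optimized. This sketch builds host-order 16-bit words from big-endian byte pairs:

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit one's complement addition: a carry out of bit 31 wraps around. */
static uint32_t ocsum_add(uint32_t a, uint32_t b)
{
	uint32_t res = a + b;

	return res + (res < a);
}

/* Fold a 32-bit partial sum to 16 bits and complement it, giving the
 * value that is stored in the checksum field. */
static uint16_t ocsum_fold(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Partial sum of a buffer taken as 16-bit big-endian words. */
static uint32_t ocsum_partial(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum = ocsum_add(sum, ((uint32_t)buf[i] << 8) | buf[i + 1]);
	if (len & 1)
		sum = ocsum_add(sum, (uint32_t)buf[len - 1] << 8);
	return sum;
}
```

For the 20-byte IPv4 header `45 00 00 73 00 00 40 00 40 11 00 00 c0 a8 00 01 c0 a8 00 c7` (checksum field zeroed), folding the full sum gives 0xb861. Summing the two 10-byte halves separately and combining them with ocsum_add() gives the same value. This is why a precomputed partial sum can be added in at the end.
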
* Re: [PATCH v3] ipv4: use seqlock for nh_exceptions
From: David Miller @ 2012-07-19 17:30 UTC
  To: ja; +Cc: netdev
In-Reply-To: <1342642535-2545-1-git-send-email-ja@ssi.bg>

From: Julian Anastasov <ja@ssi.bg>
Date: Wed, 18 Jul 2012 23:15:35 +0300

> 	Use global seqlock for the nh_exceptions. Call
> fnhe_oldest with the right hash chain. Correct the diff
> value for dst_set_expires.
> 
> v2: after suggestions from Eric Dumazet:
> * get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
> * continue daddr search in rt_bind_exception
> 
> v3:
> * remove the daddr check before seqlock in rt_bind_exception
> * restart lookup in rt_bind_exception on detected seqlock change,
> as suggested by David Miller
> 
> Signed-off-by: Julian Anastasov <ja@ssi.bg>

Applied, thanks a lot Julian.
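
In v3, rt_bind_exception restarts its lookup when it detects a seqlock change. The reader-side retry pattern behind that can be sketched in userspace with C11 atomics. This is not the kernel's seqlock_t, whose write side also takes a spinlock and whose readers use read_seqbegin()/read_seqretry(). The struct and function names are hypothetical, and the sketch assumes a single writer:

```c
#include <stdatomic.h>
#include <stdint.h>

/* A minimal seqlock protecting a two-field record. */
struct seq_record {
	atomic_uint seq;		/* even: stable, odd: write in progress */
	atomic_uint_least32_t daddr;
	atomic_uint_least32_t expires;
};

/* Writer: make the sequence odd, update the data, then make it even again. */
static void seq_write(struct seq_record *r, uint32_t daddr, uint32_t expires)
{
	unsigned int s = atomic_load_explicit(&r->seq, memory_order_relaxed);

	atomic_store_explicit(&r->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->daddr, daddr, memory_order_relaxed);
	atomic_store_explicit(&r->expires, expires, memory_order_relaxed);
	atomic_store_explicit(&r->seq, s + 2, memory_order_release);
}

/* Reader: retry whenever a write overlapped the read, that is, when the
 * sequence was odd or changed between the two loads. */
static void seq_read(struct seq_record *r, uint32_t *daddr, uint32_t *expires)
{
	unsigned int s1, s2;

	do {
		s1 = atomic_load_explicit(&r->seq, memory_order_acquire);
		*daddr = atomic_load_explicit(&r->daddr, memory_order_relaxed);
		*expires = atomic_load_explicit(&r->expires, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&r->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
}
```

Readers never block the writer. Any read that overlaps an update is simply discarded and repeated.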

* Re: [PATCH net-next] ipv4: tcp: remove per net tcp_sock
From: Eric Dumazet @ 2012-07-19 17:22 UTC
  To: David Miller; +Cc: netdev, therbert, wsommerfeld
In-Reply-To: <20120719.101225.2227860476502999476.davem@davemloft.net>

On Thu, 2012-07-19 at 10:12 -0700, David Miller wrote:

> Just give this thing a unique name and define it right before the function.

Yep, will do that.

I will also init sk_refcnt to one, just in case a driver wants to take a
reference on the socket (and release it later).
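
Starting the count at one matters because the socket is a static per-CPU object. A get/put pair from a driver must never bring its count to zero, since dropping the last reference would run a release path meant for dynamically allocated objects. The following is a minimal sketch with hypothetical names, in which a flag records that the release path would have run:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object in static storage whose release path must never run. */
struct static_obj {
	atomic_int refcnt;
	bool freed;		/* set if the last reference was ever dropped */
};

static void obj_get(struct static_obj *o)
{
	atomic_fetch_add(&o->refcnt, 1);
}

/* Drop a reference. Whoever drops the last one runs the release path. */
static void obj_put(struct static_obj *o)
{
	if (atomic_fetch_sub(&o->refcnt, 1) == 1)
		o->freed = true;	/* stands in for the real release routine */
}

/* Starting at 1 means the owner permanently holds one reference. */
static struct static_obj reply_obj = { .refcnt = 1 };
/* For contrast: a count that starts at 0 is "released" by one get/put. */
static struct static_obj zero_obj = { .refcnt = 0 };
```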

* Re: [PATCH] sfc: initialize dynamic sysfs attributes for lockdep
From: Ben Hutchings @ 2012-07-19 17:19 UTC
  To: Michal Schmidt; +Cc: netdev, Solarflare linux maintainers
In-Reply-To: <1342717485-24034-1-git-send-email-mschmidt@redhat.com>

On Thu, 2012-07-19 at 19:04 +0200, Michal Schmidt wrote:
> Dynamically allocated sysfs attributes must be initialized using
> sysfs_attr_init(), otherwise lockdep complains:
> BUG: key <address> not in .data!
>
> Signed-off-by: Michal Schmidt <mschmidt@redhat.com>

Acked-by: Ben Hutchings <bhutchings@solarflare.com>

> ---
>  drivers/net/ethernet/sfc/mcdi_mon.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/ethernet/sfc/mcdi_mon.c b/drivers/net/ethernet/sfc/mcdi_mon.c
> index fb7f65b..1d552f0 100644
> --- a/drivers/net/ethernet/sfc/mcdi_mon.c
> +++ b/drivers/net/ethernet/sfc/mcdi_mon.c
> @@ -222,6 +222,7 @@ efx_mcdi_mon_add_attr(struct efx_nic *efx, const char *name,
>  	attr->index = index;
>  	attr->type = type;
>  	attr->limit_value = limit_value;
> +	sysfs_attr_init(&attr->dev_attr.attr);
>  	attr->dev_attr.attr.name = attr->name;
>  	attr->dev_attr.attr.mode = S_IRUGO;
>  	attr->dev_attr.show = reader;

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

* Re: [PATCH net-next] ipv4: tcp: remove per net tcp_sock
From: David Miller @ 2012-07-19 17:12 UTC
  To: eric.dumazet; +Cc: netdev, therbert, wsommerfeld
In-Reply-To: <20120719.101155.1970854797296520147.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 19 Jul 2012 10:11:55 -0700 (PDT)

> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 19 Jul 2012 19:07:47 +0200
> 
>> On Thu, 2012-07-19 at 08:45 -0700, David Miller wrote:
>>> From: David Miller <davem@davemloft.net>
>>> Date: Thu, 19 Jul 2012 08:35:44 -0700 (PDT)
>>> 
>>> > Looks great, applied, thanks Eric.
>>> 
>>> I take that back, it doesn't build:
>>> 
>>> net/ipv4/ip_output.c: In function ‘ip_send_unicast_reply’:
>>> net/ipv4/ip_output.c:1481:1: error: section attribute cannot be specified for local variables
>>> net/ipv4/ip_output.c:1481:1: error: section attribute cannot be specified for local variables
>>> net/ipv4/ip_output.c:1481:1: error: declaration of ‘__pcpu_unique_unicast_sock’ with no linkage follows extern declaration
>>> net/ipv4/ip_output.c:1481:1: note: previous declaration of ‘__pcpu_unique_unicast_sock’ was here
>>> net/ipv4/ip_output.c:1481:9: error: section attribute cannot be specified for local variables
>>> net/ipv4/ip_output.c:1481:9: error: weak declaration of ‘unicast_sock’ must be public
>> 
>> Strange, it builds on my machines, and I got a nice performance boost.
>> 
>> Apparently your arch doesn't handle the
>> 
>> static DEFINE_PER_CPU(struct inet_sock, unicast_sock)
>> 
>> in the function body?
> 
> It's x86-64, standard Fedora 17 install.

Just give this thing a unique name and define it right before the function.
