Netdev List
 help / color / mirror / Atom feed
* Re: tx checksum offload in rtl8168evl disabled in driver
From: Francois Romieu @ 2013-10-03 23:01 UTC (permalink / raw)
  To: jason.morgan; +Cc: netdev, hayeswang
In-Reply-To: <OF3E5567E2.6AC62D58-ON80257BF9.004ED219-80257BF9.004F713D@aveillant.com>

jason.morgan@aveillant.com <jason.morgan@aveillant.com> :
[...]
> I'm at 517Mbps and I've found that there seems to be a cpu bottleneck.

Which kernel ?

> I'm using 2k to 4k frames with a rtl8168evl.
> I've found this message
> http://www.spinics.net/lists/netdev/msg216530.html
[...]
> However the message thread, above indicates that this is not a problem and 
> can be changed to make tx-checksum offload possible.
> 
> However we are using a newer chip to the on in the message thread.  I've 
> tried to find other, more recent citations without success.
> 
> So, why is it still turned off ?

It has been disabled since d58d46b5d85139d18eb939aa7279c160bab70484 ("r8169:
jumbo fixes"). Patch was submitted as a RFC on 2011/07/17 and Hayes was
explicitely requested to comment on the jumbo part if necessary. Patch was
submitted for inclusion on 2011/09/22.

Tx checksumming and jumbo are mutually exclusive in Realtek's driver as well.

It seems no recent gigabit chipset reliably supports it.

> What will be the effect of turning it on (changing false to true, in the 
> driver line) for our chip ?

YMMV.

Hayes may elaborate.

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH v2 net-next] fix unsafe set_memory_rw from softirq
From: Eric Dumazet @ 2013-10-03 23:02 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, netdev, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Daniel Borkmann, Paul E. McKenney, Xi Wang, x86,
	Eric Dumazet, Heiko Carstens, linux-kernel
In-Reply-To: <1380840466-3822-1-git-send-email-ast@plumgrid.com>

On Thu, 2013-10-03 at 15:47 -0700, Alexei Starovoitov wrote:
> on x86 system with net.core.bpf_jit_enable = 1
> 

> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -644,7 +644,9 @@ void sk_filter_release_rcu(struct rcu_head *rcu)
>  	struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);
>  
>  	bpf_jit_free(fp);
> +#if !defined(CONFIG_X86_64) /* x86_64 has a deferred free */
>  	kfree(fp);
> +#endif

Sorry this is not very nice.

Make bpf_jit_free(fp) a bool ?  true : caller must free, false : caller
must not free ?

if (bpf_jit_free(fp))
	kfree(fp);

Or move the kfree() in bpf_jit_free()

^ permalink raw reply

* Re: [PATCH v2 net-next] fix unsafe set_memory_rw from softirq
From: Eric Dumazet @ 2013-10-03 23:07 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, netdev, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Daniel Borkmann, Paul E. McKenney, Xi Wang, x86,
	Eric Dumazet, Heiko Carstens, linux-kernel
In-Reply-To: <1380840466-3822-1-git-send-email-ast@plumgrid.com>

On Thu, 2013-10-03 at 15:47 -0700, Alexei Starovoitov wrote:

> @@ -722,7 +725,8 @@ EXPORT_SYMBOL_GPL(sk_unattached_filter_destroy);
>  int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
>  {
>  	struct sk_filter *fp, *old_fp;
> -	unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
> +	unsigned int fsize = max(sizeof(struct sock_filter) * fprog->len,
> +				 sizeof(struct work_struct));
>  	int err;
>  
>  	if (sock_flag(sk, SOCK_FILTER_LOCKED))

Thats broken, as we might copy more data from user than expected,
and eventually trigger EFAULT :

if (copy_from_user(fp->insns, fprog->filter, fsize)) {

^ permalink raw reply

* Re: [PATCH v2 net-next] fix unsafe set_memory_rw from softirq
From: Alexei Starovoitov @ 2013-10-03 23:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, netdev, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Daniel Borkmann, Paul E. McKenney, Xi Wang, x86,
	Eric Dumazet, Heiko Carstens, linux-kernel
In-Reply-To: <1380841333.19002.259.camel@edumazet-glaptop.roam.corp.google.com>

On Thu, Oct 3, 2013 at 4:02 PM, Eric Dumazet <erdnetdev@gmail.com> wrote:
> On Thu, 2013-10-03 at 15:47 -0700, Alexei Starovoitov wrote:
>> on x86 system with net.core.bpf_jit_enable = 1
>>
>
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -644,7 +644,9 @@ void sk_filter_release_rcu(struct rcu_head *rcu)
>>       struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);
>>
>>       bpf_jit_free(fp);
>> +#if !defined(CONFIG_X86_64) /* x86_64 has a deferred free */
>>       kfree(fp);
>> +#endif
>
> Sorry this is not very nice.
>
> Make bpf_jit_free(fp) a bool ?  true : caller must free, false : caller
> must not free ?
>
> if (bpf_jit_free(fp))
>         kfree(fp);
>
> Or move the kfree() in bpf_jit_free()

I think it's cleaner too, just didn't want to touch all architectures.
Will do then.

^ permalink raw reply

* Re: [PATCH v2 net-next] fix unsafe set_memory_rw from softirq
From: Alexei Starovoitov @ 2013-10-03 23:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, netdev, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Daniel Borkmann, Paul E. McKenney, Xi Wang, x86,
	Eric Dumazet, Heiko Carstens, linux-kernel
In-Reply-To: <1380841666.19002.262.camel@edumazet-glaptop.roam.corp.google.com>

On Thu, Oct 3, 2013 at 4:07 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-10-03 at 15:47 -0700, Alexei Starovoitov wrote:
>
>> @@ -722,7 +725,8 @@ EXPORT_SYMBOL_GPL(sk_unattached_filter_destroy);
>>  int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
>>  {
>>       struct sk_filter *fp, *old_fp;
>> -     unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
>> +     unsigned int fsize = max(sizeof(struct sock_filter) * fprog->len,
>> +                              sizeof(struct work_struct));
>>       int err;
>>
>>       if (sock_flag(sk, SOCK_FILTER_LOCKED))
>
> Thats broken, as we might copy more data from user than expected,
> and eventually trigger EFAULT :
>
> if (copy_from_user(fp->insns, fprog->filter, fsize)) {

yes. will fix.

^ permalink raw reply

* Re: Ideas on why using WPA2 encryption speeds up many TCP connections?
From: Ben Greear @ 2013-10-03 23:19 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, linux-wireless@vger.kernel.org
In-Reply-To: <524DC2B8.905@candelatech.com>

I was seeing an un-expectedly bad wifi train rates, so I changed to ath9k-rate-control,
rebooted, and re-ran the tests.  Throughput is much improved.  I really hope it wasn't just
a reboot that fixed it, but too burned out to do more tests today.

I still see better TCP throughput with WPA2 when using 25-200 stations/streams.

For anyone who wants to wade through some big automated reports, see the links
near the top of this page (suggestions for improving those reports are welcome):

http://www.candelatech.com/lf_wifi_examples.php

(The 600 station reports are a bit dated and were done in a fairly busy
  wifi environment.  We'll re-run those sometime soon-ish.)

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH RFC 01/77] PCI/MSI: Fix return value when populate_msi_sysfs() failed
From: Jon Mason @ 2013-10-04  0:59 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Alexander Gordeev, linux-kernel, Bjorn Helgaas, Ralf Baechle,
	Michael Ellerman, Benjamin Herrenschmidt, Martin Schwidefsky,
	Ingo Molnar, Tejun Heo, Dan Williams, Andy King, Matt Porter,
	stable, linux-pci, linux-mips, linuxppc-dev, linux390, linux-s390,
	x86, linux-ide, iss_storagedev, linux-nvme, linux-rdma, netdev,
	e1000-devel, linux-driver, Solarflare linux maintainers
In-Reply-To: <1380836781.3419.17.camel@bwh-desktop.uk.level5networks.com>

On Thu, Oct 03, 2013 at 10:46:21PM +0100, Ben Hutchings wrote:
> On Wed, 2013-10-02 at 17:39 -0700, Jon Mason wrote:
> > On Wed, Oct 02, 2013 at 12:48:17PM +0200, Alexander Gordeev wrote:
> > > Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
> > 
> > Since you are changing the behavior of the msix_capability_init
> > function on populate_msi_sysfs error, a comment describing why in this
> > commit would be nice.
> [...]
> 
> This function was already treating that error as fatal, and freeing the
> MSIs.  The change in behaviour is that it now returns the error code in
> this case, rather than 0.  This is obviously correct and properly
> described by the one-line summary.

If someone dumb, like me, is looking at this commit and trying to
figure out what is happening, having ANY commit message is good.  "Fix
the return value" doesn't tell me anything.  Documenting what issue(s)
would've been seen had the error case been encountered and what will
now been seen would be very nice.

> 
> Ben.
> 
> -- 
> Ben Hutchings, Staff Engineer, Solarflare
> Not speaking for my employer; that's the marketing department's job.
> They asked us to note that Solarflare product names are trademarked.
> 

^ permalink raw reply

* [PATCH v3 net-next] fix unsafe set_memory_rw from softirq
From: Alexei Starovoitov @ 2013-10-04  2:24 UTC (permalink / raw)
  To: David S. Miller
  Cc: Benjamin Herrenschmidt, Heiko Carstens, Eric Dumazet,
	Paul Mackerras, H. Peter Anvin, sparclinux, Nicolas Dichtel,
	Alexei Starovoitov, linux-s390, Russell King, x86, James Morris,
	Ingo Molnar, Alexey Kuznetsov, Paul E. McKenney, Xi Wang,
	Matt Evans, Thomas Gleixner, linux-arm-kernel, Stelian Nirlu,
	Nicolas Schichan, Hideaki YOSHIFUJI, netdev, linux-kernel,
	Mircea Gherzan <m

on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[   56.766097]  Possible unsafe locking scenario:
[   56.766097]
[   56.780146]        CPU0
[   56.786807]        ----
[   56.793188]   lock(&(&vb->lock)->rlock);
[   56.799593]   <Interrupt>
[   56.805889]     lock(&(&vb->lock)->rlock);
[   56.812266]
[   56.812266]  *** DEADLOCK ***
[   56.812266]
[   56.830670] 1 lock held by ksoftirqd/1/13:
[   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380
[   56.849757]
[   56.849757] stack backtrace:
[   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
[   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
[   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
[   56.923006] Call Trace:
[   56.929532]  [<ffffffff8175a145>] dump_stack+0x55/0x76
[   56.936067]  [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208
[   56.942445]  [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50
[   56.948932]  [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150
[   56.955470]  [<ffffffff810ccb52>] mark_lock+0x282/0x2c0
[   56.961945]  [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50
[   56.968474]  [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50
[   56.975140]  [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90
[   56.981942]  [<ffffffff810cef72>] lock_acquire+0x92/0x1d0
[   56.988745]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   56.995619]  [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50
[   57.002493]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   57.009447]  [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380
[   57.016477]  [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380
[   57.023607]  [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460
[   57.030818]  [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10
[   57.037896]  [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0
[   57.044789]  [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0
[   57.051720]  [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40
[   57.058727]  [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40
[   57.065577]  [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30
[   57.072338]  [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0
[   57.078962]  [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0
[   57.085373]  [<ffffffff81058245>] run_ksoftirqd+0x35/0x70

cannot reuse jited filter memory, since it's readonly,
so use original bpf insns memory to hold work_struct

defer kfree of sk_filter until jit completed freeing

tested on x86_64 and i386

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 arch/arm/net/bpf_jit_32.c       |    1 +
 arch/powerpc/net/bpf_jit_comp.c |    1 +
 arch/s390/net/bpf_jit_comp.c    |    4 +++-
 arch/sparc/net/bpf_jit_comp.c   |    1 +
 arch/x86/net/bpf_jit_comp.c     |   20 +++++++++++++++-----
 include/linux/filter.h          |   11 +++++++++--
 net/core/filter.c               |   11 +++++++----
 7 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index f50d223..99b44e0 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -930,4 +930,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
 	if (fp->bpf_func != sk_run_filter)
 		module_free(NULL, fp->bpf_func);
+	kfree(fp);
 }
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index bf56e33..2345bdb 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -691,4 +691,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
 	if (fp->bpf_func != sk_run_filter)
 		module_free(NULL, fp->bpf_func);
+	kfree(fp);
 }
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 7092392..a5df511 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -881,7 +881,9 @@ void bpf_jit_free(struct sk_filter *fp)
 	struct bpf_binary_header *header = (void *)addr;
 
 	if (fp->bpf_func == sk_run_filter)
-		return;
+		goto free_filter;
 	set_memory_rw(addr, header->pages);
 	module_free(NULL, header);
+free_filter:
+	kfree(fp);
 }
diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
index 9c7be59..218b6b2 100644
--- a/arch/sparc/net/bpf_jit_comp.c
+++ b/arch/sparc/net/bpf_jit_comp.c
@@ -808,4 +808,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
 	if (fp->bpf_func != sk_run_filter)
 		module_free(NULL, fp->bpf_func);
+	kfree(fp);
 }
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 79c216a..1396a0a 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -772,13 +772,23 @@ out:
 	return;
 }
 
+static void bpf_jit_free_deferred(struct work_struct *work)
+{
+	struct sk_filter *fp = container_of((void *)work, struct sk_filter,
+					    insns);
+	unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
+	struct bpf_binary_header *header = (void *)addr;
+
+	set_memory_rw(addr, header->pages);
+	module_free(NULL, header);
+	kfree(fp);
+}
+
 void bpf_jit_free(struct sk_filter *fp)
 {
 	if (fp->bpf_func != sk_run_filter) {
-		unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
-		struct bpf_binary_header *header = (void *)addr;
-
-		set_memory_rw(addr, header->pages);
-		module_free(NULL, header);
+		struct work_struct *work = (struct work_struct *)fp->insns;
+		INIT_WORK(work, bpf_jit_free_deferred);
+		schedule_work(work);
 	}
 }
diff --git a/include/linux/filter.h b/include/linux/filter.h
index a6ac848..5d66cd9 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -25,15 +25,20 @@ struct sk_filter
 {
 	atomic_t		refcnt;
 	unsigned int         	len;	/* Number of filter blocks */
+	struct rcu_head		rcu;
 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
 					    const struct sock_filter *filter);
-	struct rcu_head		rcu;
+	/* insns start right after bpf_func, so that sk_run_filter() fetches
+	 * first insn from the same cache line that was used to call into
+	 * sk_run_filter()
+	 */
 	struct sock_filter     	insns[0];
 };
 
 static inline unsigned int sk_filter_len(const struct sk_filter *fp)
 {
-	return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
+	return max(fp->len * sizeof(struct sock_filter),
+		   sizeof(struct work_struct)) + sizeof(*fp);
 }
 
 extern int sk_filter(struct sock *sk, struct sk_buff *skb);
@@ -67,11 +72,13 @@ static inline void bpf_jit_dump(unsigned int flen, unsigned int proglen,
 }
 #define SK_RUN_FILTER(FILTER, SKB) (*FILTER->bpf_func)(SKB, FILTER->insns)
 #else
+#include <linux/slab.h>
 static inline void bpf_jit_compile(struct sk_filter *fp)
 {
 }
 static inline void bpf_jit_free(struct sk_filter *fp)
 {
+	kfree(fp);
 }
 #define SK_RUN_FILTER(FILTER, SKB) sk_run_filter(SKB, FILTER->insns)
 #endif
diff --git a/net/core/filter.c b/net/core/filter.c
index 6438f29..ad5eaba 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -644,7 +644,6 @@ void sk_filter_release_rcu(struct rcu_head *rcu)
 	struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);
 
 	bpf_jit_free(fp);
-	kfree(fp);
 }
 EXPORT_SYMBOL(sk_filter_release_rcu);
 
@@ -677,13 +676,15 @@ int sk_unattached_filter_create(struct sk_filter **pfp,
 {
 	struct sk_filter *fp;
 	unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
+	unsigned int sk_fsize = max_t(u32, fsize, sizeof(struct work_struct))
+		+ sizeof(*fp);
 	int err;
 
 	/* Make sure new filter is there and in the right amounts. */
 	if (fprog->filter == NULL)
 		return -EINVAL;
 
-	fp = kmalloc(fsize + sizeof(*fp), GFP_KERNEL);
+	fp = kmalloc(sk_fsize, GFP_KERNEL);
 	if (!fp)
 		return -ENOMEM;
 	memcpy(fp->insns, fprog->filter, fsize);
@@ -723,6 +724,8 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 {
 	struct sk_filter *fp, *old_fp;
 	unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
+	unsigned int sk_fsize = max_t(u32, fsize, sizeof(struct work_struct))
+		+ sizeof(*fp);
 	int err;
 
 	if (sock_flag(sk, SOCK_FILTER_LOCKED))
@@ -732,11 +735,11 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 	if (fprog->filter == NULL)
 		return -EINVAL;
 
-	fp = sock_kmalloc(sk, fsize+sizeof(*fp), GFP_KERNEL);
+	fp = sock_kmalloc(sk, sk_fsize, GFP_KERNEL);
 	if (!fp)
 		return -ENOMEM;
 	if (copy_from_user(fp->insns, fprog->filter, fsize)) {
-		sock_kfree_s(sk, fp, fsize+sizeof(*fp));
+		sock_kfree_s(sk, fp, sk_fsize);
 		return -EFAULT;
 	}
 
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH v2 net-next] fix unsafe set_memory_rw from softirq
From: Alexei Starovoitov @ 2013-10-04  2:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, netdev, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Daniel Borkmann, Paul E. McKenney, Xi Wang, x86,
	Eric Dumazet, Heiko Carstens, linux-kernel
In-Reply-To: <CAMEtUuz4Qyg1s7CocNotTYXcn1odcbT5nek1T66dd+dK3ukfTw@mail.gmail.com>

On Thu, Oct 3, 2013 at 4:11 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Thu, Oct 3, 2013 at 4:07 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Thu, 2013-10-03 at 15:47 -0700, Alexei Starovoitov wrote:
>>
>>> @@ -722,7 +725,8 @@ EXPORT_SYMBOL_GPL(sk_unattached_filter_destroy);
>>>  int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
>>>  {
>>>       struct sk_filter *fp, *old_fp;
>>> -     unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
>>> +     unsigned int fsize = max(sizeof(struct sock_filter) * fprog->len,
>>> +                              sizeof(struct work_struct));
>>>       int err;
>>>
>>>       if (sock_flag(sk, SOCK_FILTER_LOCKED))
>>
>> Thats broken, as we might copy more data from user than expected,
>> and eventually trigger EFAULT :
>>
>> if (copy_from_user(fp->insns, fprog->filter, fsize)) {
>
> yes. will fix.

tested on x86_64/i386 only
with tcpdump and netsniff 1-4k filter size.
Thank you for careful review.

^ permalink raw reply

* Re: [PATCH v2.41 5/5] datapath: Add basic MPLS support to kernel
From: Pravin Shelar @ 2013-10-04  2:46 UTC (permalink / raw)
  To: Simon Horman
  Cc: dev@openvswitch.org, netdev, Jesse Gross, Ben Pfaff, Ravi K,
	Isaku Yamahata, Joe Stringer
In-Reply-To: <20131003002052.GA13111@verge.net.au>

On Wed, Oct 2, 2013 at 5:20 PM, Simon Horman <horms@verge.net.au> wrote:
> On Wed, Oct 02, 2013 at 11:03:57AM -0700, Pravin Shelar wrote:
>> On Mon, Sep 30, 2013 at 11:47 PM, Simon Horman <horms@verge.net.au> wrote:
>> > Allow datapath to recognize and extract MPLS labels into flow keys
>> > and execute actions which push, pop, and set labels on packets.
>> >
>> > Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.
>> >
>> > Cc: Ravi K <rkerur@gmail.com>
>> > Cc: Leo Alterman <lalterman@nicira.com>
>> > Cc: Isaku Yamahata <yamahata@valinux.co.jp>
>> > Cc: Joe Stringer <joe@wand.net.nz>
>> > Signed-off-by: Simon Horman <horms@verge.net.au>
>> >
>> > ---
>> >
>> > +
>> > +       /* this hack needed to get regular skb_gso_segment() */
>> > +#ifdef HAVE___SKB_GSO_SEGMENT
>> > +#undef __skb_gso_segment
>> > +       skb_gso = __skb_gso_segment(skb, features, tx_path);
>> > +#else
>> > +#undef skb_gso_segment
>> > +       skb_gso = skb_gso_segment(skb, features);
>> > +#endif
>> > +
>>
>> We can get rid of #ifdefs by just using different name for
>> rpl___skb_gso_segment(), something like mpls_vlan_skb_gso_segment().
>> The way it is done for tnl-gso.
>
> Thanks.
>
> The reason that I had the code arranged this way was so that
> calls to __skb_gso_segment() would go via rpl___skb_gso_segment()
> on kernels older than v3.11. In particular calls outside of gso.c.
>
> On closer examination the only such case is in ovs_dp_upcall().
> Currently there should be no need to perform MPLS GSO segmentation in that
> case because MPLS GSO segmentation can only be needed after actions are
> applied.
>
> However, I am concerned that it may be necessary later when
> recirculation is introduced as in that case an upcall may occur
> on a packet which has had actions applied.

good point.

currently we define __skb_gso_segment using skb_gso_segemt(). You have
reversed it. Is there any reason?
if you keep it as it is, it can simplify code a bit.

^ permalink raw reply

* RE: tx checksum offload in rtl8168evl disabled in driver
From: hayeswang @ 2013-10-04  3:10 UTC (permalink / raw)
  To: 'Francois Romieu', jason.morgan; +Cc: netdev, 'nic_swsd'
In-Reply-To: <20131003230120.GA25047@electric-eye.fr.zoreil.com>

Francois Romieu [mailto:romieu@fr.zoreil.com] 
[...]
> > I'm using 2k to 4k frames with a rtl8168evl.
> > I've found this message
> > http://www.spinics.net/lists/netdev/msg216530.html
> [...]
> > However the message thread, above indicates that this is 
> not a problem and 
> > can be changed to make tx-checksum offload possible.
> > 
> > However we are using a newer chip to the on in the message 
> thread.  I've 
> > tried to find other, more recent citations without success.
> > 
> > So, why is it still turned off ?
> 
> It has been disabled since 
> d58d46b5d85139d18eb939aa7279c160bab70484 ("r8169:
> jumbo fixes"). Patch was submitted as a RFC on 2011/07/17 and 
> Hayes was
> explicitely requested to comment on the jumbo part if 
> necessary. Patch was
> submitted for inclusion on 2011/09/22.
> 
> Tx checksumming and jumbo are mutually exclusive in Realtek's 
> driver as well.
> 
> It seems no recent gigabit chipset reliably supports it.
> 
> > What will be the effect of turning it on (changing false to 
> true, in the 
> > driver line) for our chip ?

Since RTL8111E, the chips support a feature which we call "tx early".
For jumbo frame, the hw starts to send a packet before getting the whole data.
However, the checksum has to be calculated after the whole data are fetched.
Therefore, the jumbo frame and the tx checksum couldn't be enable at the same
time, otherwise a packet with incorrect checksum would be sent.
 
Best Regards,
Hayes

^ permalink raw reply

* [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
From: Masatake YAMATO @ 2013-10-04  4:05 UTC (permalink / raw)
  To: netdev; +Cc: yamato

ip link has ability to show extra information of net work device if
kernel provides sunh information. With this patch veth driver can
provide its peer ifindex information to ip command via netlink
interface.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
---
 drivers/net/veth.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index eee1f19..54187b9 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -434,6 +434,25 @@ static const struct nla_policy veth_policy[VETH_INFO_MAX + 1] = {
 	[VETH_INFO_PEER]	= { .len = sizeof(struct ifinfomsg) },
 };
 
+static size_t veth_get_size(const struct net_device *dev)
+{
+	return nla_total_size(sizeof(u64)) + /* VETH_INFO_PEER */
+		0;
+}
+
+static int veth_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	struct net_device *peer = rtnl_dereference(priv->peer);
+	u64 peer_ifindex;
+
+	peer_ifindex = peer ? peer->ifindex : 0;
+	if (nla_put_u64(skb, VETH_INFO_PEER, peer_ifindex))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
 static struct rtnl_link_ops veth_link_ops = {
 	.kind		= DRV_NAME,
 	.priv_size	= sizeof(struct veth_priv),
@@ -443,6 +462,8 @@ static struct rtnl_link_ops veth_link_ops = {
 	.dellink	= veth_dellink,
 	.policy		= veth_policy,
 	.maxtype	= VETH_INFO_MAX,
+	.get_size	= veth_get_size,
+	.fill_info	= veth_fill_info,
 };
 
 /*
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH] ip: Showing peer of veth type dev in ip link (ip cmd side)
From: Masatake YAMATO @ 2013-10-04  4:06 UTC (permalink / raw)
  To: netdev; +Cc: yamato

Implement print_opt method to veth to show peer ifindex
as ethtool -S does.

A patch submitted with following subject is needed:

   veth: Showing peer of veth type dev in ip link (kernel side)

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
---
 ip/link_veth.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/ip/link_veth.c b/ip/link_veth.c
index 7730f39..bd84815 100644
--- a/ip/link_veth.c
+++ b/ip/link_veth.c
@@ -11,6 +11,7 @@
  */
 
 #include <string.h>
+#include <inttypes.h>
 #include <net/if.h>
 #include <linux/veth.h>
 
@@ -57,7 +58,22 @@ static int veth_parse_opt(struct link_util *lu, int argc, char **argv,
 	return argc - 1 - err;
 }
 
+static void veth_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
+{
+	if (!tb)
+		return;
+
+	if (tb[VETH_INFO_PEER] &&
+	    RTA_PAYLOAD(tb[VETH_INFO_PEER]) < sizeof(__u64))
+		return;
+
+	fprintf(f, "peer_ifindex %"PRIu64,
+		(uint64_t)rta_getattr_u64(tb[VETH_INFO_PEER]));
+}
+
 struct link_util veth_link_util = {
 	.id = "veth",
+	.maxattr	= VETH_INFO_MAX,
 	.parse_opt = veth_parse_opt,
+	.print_opt = veth_print_opt,
 };
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH v3 net-next] fix unsafe set_memory_rw from softirq
From: Alexei Starovoitov @ 2013-10-04  4:11 UTC (permalink / raw)
  To: David S. Miller
  Cc: Daniel Borkmann, Eric Dumazet, linux-arm-kernel, linuxppc-dev,
	linux-s390, netdev

on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[   56.766097]  Possible unsafe locking scenario:
[   56.766097]
[   56.780146]        CPU0
[   56.786807]        ----
[   56.793188]   lock(&(&vb->lock)->rlock);
[   56.799593]   <Interrupt>
[   56.805889]     lock(&(&vb->lock)->rlock);
[   56.812266]
[   56.812266]  *** DEADLOCK ***
[   56.812266]
[   56.830670] 1 lock held by ksoftirqd/1/13:
[   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380
[   56.849757]
[   56.849757] stack backtrace:
[   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
[   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
[   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
[   56.923006] Call Trace:
[   56.929532]  [<ffffffff8175a145>] dump_stack+0x55/0x76
[   56.936067]  [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208
[   56.942445]  [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50
[   56.948932]  [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150
[   56.955470]  [<ffffffff810ccb52>] mark_lock+0x282/0x2c0
[   56.961945]  [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50
[   56.968474]  [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50
[   56.975140]  [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90
[   56.981942]  [<ffffffff810cef72>] lock_acquire+0x92/0x1d0
[   56.988745]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   56.995619]  [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50
[   57.002493]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   57.009447]  [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380
[   57.016477]  [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380
[   57.023607]  [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460
[   57.030818]  [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10
[   57.037896]  [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0
[   57.044789]  [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0
[   57.051720]  [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40
[   57.058727]  [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40
[   57.065577]  [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30
[   57.072338]  [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0
[   57.078962]  [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0
[   57.085373]  [<ffffffff81058245>] run_ksoftirqd+0x35/0x70

cannot reuse jited filter memory, since it's readonly,
so use original bpf insns memory to hold work_struct

defer kfree of sk_filter until jit completed freeing

tested on x86_64 and i386

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
---
 arch/arm/net/bpf_jit_32.c       |    1 +
 arch/powerpc/net/bpf_jit_comp.c |    1 +
 arch/s390/net/bpf_jit_comp.c    |    4 +++-
 arch/sparc/net/bpf_jit_comp.c   |    1 +
 arch/x86/net/bpf_jit_comp.c     |   20 +++++++++++++++-----
 include/linux/filter.h          |   11 +++++++++--
 net/core/filter.c               |   11 +++++++----
 7 files changed, 37 insertions(+), 12 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index f50d223..99b44e0 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -930,4 +930,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
 	if (fp->bpf_func != sk_run_filter)
 		module_free(NULL, fp->bpf_func);
+	kfree(fp);
 }
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index bf56e33..2345bdb 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -691,4 +691,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
 	if (fp->bpf_func != sk_run_filter)
 		module_free(NULL, fp->bpf_func);
+	kfree(fp);
 }
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 7092392..a5df511 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -881,7 +881,9 @@ void bpf_jit_free(struct sk_filter *fp)
 	struct bpf_binary_header *header = (void *)addr;
 
 	if (fp->bpf_func == sk_run_filter)
-		return;
+		goto free_filter;
 	set_memory_rw(addr, header->pages);
 	module_free(NULL, header);
+free_filter:
+	kfree(fp);
 }
diff --git a/arch/sparc/net/bpf_jit_comp.c b/arch/sparc/net/bpf_jit_comp.c
index 9c7be59..218b6b2 100644
--- a/arch/sparc/net/bpf_jit_comp.c
+++ b/arch/sparc/net/bpf_jit_comp.c
@@ -808,4 +808,5 @@ void bpf_jit_free(struct sk_filter *fp)
 {
 	if (fp->bpf_func != sk_run_filter)
 		module_free(NULL, fp->bpf_func);
+	kfree(fp);
 }
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 79c216a..1396a0a 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -772,13 +772,23 @@ out:
 	return;
 }
 
+static void bpf_jit_free_deferred(struct work_struct *work)
+{
+	struct sk_filter *fp = container_of((void *)work, struct sk_filter,
+					    insns);
+	unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
+	struct bpf_binary_header *header = (void *)addr;
+
+	set_memory_rw(addr, header->pages);
+	module_free(NULL, header);
+	kfree(fp);
+}
+
 void bpf_jit_free(struct sk_filter *fp)
 {
 	if (fp->bpf_func != sk_run_filter) {
-		unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
-		struct bpf_binary_header *header = (void *)addr;
-
-		set_memory_rw(addr, header->pages);
-		module_free(NULL, header);
+		struct work_struct *work = (struct work_struct *)fp->insns;
+		INIT_WORK(work, bpf_jit_free_deferred);
+		schedule_work(work);
 	}
 }
diff --git a/include/linux/filter.h b/include/linux/filter.h
index a6ac848..5d66cd9 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -25,15 +25,20 @@ struct sk_filter
 {
 	atomic_t		refcnt;
 	unsigned int         	len;	/* Number of filter blocks */
+	struct rcu_head		rcu;
 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
 					    const struct sock_filter *filter);
-	struct rcu_head		rcu;
+	/* insns start right after bpf_func, so that sk_run_filter() fetches
+	 * first insn from the same cache line that was used to call into
+	 * sk_run_filter()
+	 */
 	struct sock_filter     	insns[0];
 };
 
 static inline unsigned int sk_filter_len(const struct sk_filter *fp)
 {
-	return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
+	return max(fp->len * sizeof(struct sock_filter),
+		   sizeof(struct work_struct)) + sizeof(*fp);
 }
 
 extern int sk_filter(struct sock *sk, struct sk_buff *skb);
@@ -67,11 +72,13 @@ static inline void bpf_jit_dump(unsigned int flen, unsigned int proglen,
 }
 #define SK_RUN_FILTER(FILTER, SKB) (*FILTER->bpf_func)(SKB, FILTER->insns)
 #else
+#include <linux/slab.h>
 static inline void bpf_jit_compile(struct sk_filter *fp)
 {
 }
 static inline void bpf_jit_free(struct sk_filter *fp)
 {
+	kfree(fp);
 }
 #define SK_RUN_FILTER(FILTER, SKB) sk_run_filter(SKB, FILTER->insns)
 #endif
diff --git a/net/core/filter.c b/net/core/filter.c
index 6438f29..ad5eaba 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -644,7 +644,6 @@ void sk_filter_release_rcu(struct rcu_head *rcu)
 	struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);
 
 	bpf_jit_free(fp);
-	kfree(fp);
 }
 EXPORT_SYMBOL(sk_filter_release_rcu);
 
@@ -677,13 +676,15 @@ int sk_unattached_filter_create(struct sk_filter **pfp,
 {
 	struct sk_filter *fp;
 	unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
+	unsigned int sk_fsize = max_t(u32, fsize, sizeof(struct work_struct))
+		+ sizeof(*fp);
 	int err;
 
 	/* Make sure new filter is there and in the right amounts. */
 	if (fprog->filter == NULL)
 		return -EINVAL;
 
-	fp = kmalloc(fsize + sizeof(*fp), GFP_KERNEL);
+	fp = kmalloc(sk_fsize, GFP_KERNEL);
 	if (!fp)
 		return -ENOMEM;
 	memcpy(fp->insns, fprog->filter, fsize);
@@ -723,6 +724,8 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 {
 	struct sk_filter *fp, *old_fp;
 	unsigned int fsize = sizeof(struct sock_filter) * fprog->len;
+	unsigned int sk_fsize = max_t(u32, fsize, sizeof(struct work_struct))
+		+ sizeof(*fp);
 	int err;
 
 	if (sock_flag(sk, SOCK_FILTER_LOCKED))
@@ -732,11 +735,11 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 	if (fprog->filter == NULL)
 		return -EINVAL;
 
-	fp = sock_kmalloc(sk, fsize+sizeof(*fp), GFP_KERNEL);
+	fp = sock_kmalloc(sk, sk_fsize, GFP_KERNEL);
 	if (!fp)
 		return -ENOMEM;
 	if (copy_from_user(fp->insns, fprog->filter, fsize)) {
-		sock_kfree_s(sk, fp, fsize+sizeof(*fp));
+		sock_kfree_s(sk, fp, sk_fsize);
 		return -EFAULT;
 	}
 
-- 
1.7.9.5

^ permalink raw reply related

* Re: [stable 3.0] add some CVE fixes
From: Ben Hutchings @ 2013-10-04  4:18 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Greg KH, stable, netdev
In-Reply-To: <524DC16D.90805@suse.cz>

[-- Attachment #1: Type: text/plain, Size: 1260 bytes --]

On Thu, 2013-10-03 at 21:11 +0200, Jiri Slaby wrote:
> On 10/03/2013 08:44 PM, Greg KH wrote:
> > On Thu, Oct 03, 2013 at 11:20:28AM +0200, Jiri Slaby wrote:
> >> Plus the backports that are replied to this mail?
> > 
> > I don't see any backports, did you forget to send them?
> 
> I don't think so, they were sent and this is a log of one of them:
> OK. Log says:
> Sendmail: /usr/sbin/sendmail -f jslaby@suse.cz -i stable@vger.kernel.org
> jslaby@suse.cz
> From: Jiri Slaby <jslaby@suse.cz>
> To: <stable@vger.kernel.org>
> Cc: jslaby@suse.cz
> Subject: [PATCH 4/4] Tools: hv: verify origin of netlink connector message
> Date: Thu,  3 Oct 2013 11:23:50 +0200
> Message-Id: <1380792230-27255-4-git-send-email-jslaby@suse.cz>
> X-Mailer: git-send-email 1.8.4
> In-Reply-To: <1380792230-27255-1-git-send-email-jslaby@suse.cz>
> References: <524D36DC.5070506@suse.cz>
>  <1380792230-27255-1-git-send-email-jslaby@suse.cz>
> 
> Result: OK
> 
> Could you check your spam folder? Or I can bounce them directly to you?

They didn't hit the list either.  If they're applicable to 3.2 as well
then could you send them this way?

Ben.

-- 
Ben Hutchings
Life is like a sewer:
what you get out of it depends on what you put into it.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply

* Re: [PATCH] xen-netback: Don't destroy the netdev until the vif is shut
From: David Miller @ 2013-10-04  4:22 UTC (permalink / raw)
  To: Paul.Durrant; +Cc: netdev, wei.liu2


Can one of you do -stable backports of this change for v3.11 and any
other active -stable branch where this fix is relevant?

I gave it a shot, and it got outside my realm of expertise.

Thanks!

^ permalink raw reply

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
From: Eric Dumazet @ 2013-10-04  4:49 UTC (permalink / raw)
  To: Masatake YAMATO; +Cc: netdev
In-Reply-To: <1380859529-32351-1-git-send-email-yamato@redhat.com>

On Fri, 2013-10-04 at 13:05 +0900, Masatake YAMATO wrote:
> ip link has ability to show extra information of net work device if
> kernel provides sunh information. With this patch veth driver can
> provide its peer ifindex information to ip command via netlink
> interface.

> +
> +	peer_ifindex = peer ? peer->ifindex : 0;
> +	if (nla_put_u64(skb, VETH_INFO_PEER, peer_ifindex))
> +		return -EMSGSIZE;
> +

ifindex are int, so you should use nla_put_u32()

^ permalink raw reply

* Re: [PATCH RFC 06/77] PCI/MSI: Factor out pci_get_msi_cap() interface
From: Alexander Gordeev @ 2013-10-04  5:13 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: linux-kernel, Bjorn Helgaas, Ralf Baechle, Michael Ellerman,
	Benjamin Herrenschmidt, Martin Schwidefsky, Ingo Molnar,
	Tejun Heo, Dan Williams, Andy King, Jon Mason, Matt Porter,
	linux-pci, linux-mips, linuxppc-dev, linux390, linux-s390, x86,
	linux-ide, iss_storagedev, linux-nvme, linux-rdma, netdev,
	e1000-devel, linux-dr
In-Reply-To: <1380837174.3419.21.camel@bwh-desktop.uk.level5networks.com>

On Thu, Oct 03, 2013 at 10:52:54PM +0100, Ben Hutchings wrote:
> On Wed, 2013-10-02 at 12:48 +0200, Alexander Gordeev wrote:
> >  #ifndef CONFIG_PCI_MSI
> > +static inline int pci_get_msi_cap(struct pci_dev *dev)
> > +{
> > +	return -1;
> [...]
> 
> Shouldn't this also return -EINVAL?

Yep, all inliners here are better to return -EINVAL.
Will do unless someone speaks out against.

> Ben.

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply

* Re: [PATCH v3 net-next] fix unsafe set_memory_rw from softirq
From: Eric Dumazet @ 2013-10-04  5:16 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Daniel Borkmann, Eric Dumazet, linux-arm-kernel,
	linuxppc-dev, linux-s390, netdev
In-Reply-To: <1380859875-31025-1-git-send-email-ast@plumgrid.com>

On Thu, 2013-10-03 at 21:11 -0700, Alexei Starovoitov wrote:

> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index a6ac848..5d66cd9 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -25,15 +25,20 @@ struct sk_filter
>  {
>  	atomic_t		refcnt;
>  	unsigned int         	len;	/* Number of filter blocks */
> +	struct rcu_head		rcu;
>  	unsigned int		(*bpf_func)(const struct sk_buff *skb,
>  					    const struct sock_filter *filter);
> -	struct rcu_head		rcu;
> +	/* insns start right after bpf_func, so that sk_run_filter() fetches
> +	 * first insn from the same cache line that was used to call into
> +	 * sk_run_filter()
> +	 */
>  	struct sock_filter     	insns[0];
>  };
>  
>  static inline unsigned int sk_filter_len(const struct sk_filter *fp)
>  {
> -	return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
> +	return max(fp->len * sizeof(struct sock_filter),
> +		   sizeof(struct work_struct)) + sizeof(*fp);
>  }

I would use for include/linux/filter.h this (untested) diff :

(Note the include <linux/workqueue.h>)

I also remove your comment about cache lines, since there is nothing
to align stuff on a cache line boundary.

diff --git a/include/linux/filter.h b/include/linux/filter.h
index a6ac848..281b05c 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -6,6 +6,7 @@
 
 #include <linux/atomic.h>
 #include <linux/compat.h>
+#include <linux/workqueue.h>
 #include <uapi/linux/filter.h>
 
 #ifdef CONFIG_COMPAT
@@ -25,15 +26,20 @@ struct sk_filter
 {
 	atomic_t		refcnt;
 	unsigned int         	len;	/* Number of filter blocks */
+	struct rcu_head		rcu;
 	unsigned int		(*bpf_func)(const struct sk_buff *skb,
 					    const struct sock_filter *filter);
-	struct rcu_head		rcu;
-	struct sock_filter     	insns[0];
+	union {
+		struct work_struct	work;
+		struct sock_filter	insns[0];
+	};
 };
 
-static inline unsigned int sk_filter_len(const struct sk_filter *fp)
+static inline unsigned int sk_filter_size(const struct sk_filter *fp,
+					  unsigned int proglen)
 {
-	return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
+	return max(sizeof(*fp),
+		   offsetof(struct sk_filter, insns[proglen]));
 }
 
 extern int sk_filter(struct sock *sk, struct sk_buff *skb);



This way, you can use sk_filter_size(fp, fprog->len)
instead of doing the max() games in sk_attach_filter() and
sk_unattached_filter_create()

Other than that, I think your patch is fine.

^ permalink raw reply related

* Re: [PATCH 1/1] hso: fix problem with wrong status code sent by OPTION GTM601 during RING indication
From: Dr. H. Nikolaus Schaller @ 2013-10-04  6:22 UTC (permalink / raw)
  To: David Miller
  Cc: j.dumon-x9gZzRpC1QbQT0dZR+AlfA,
	marek.belisko-Re5JQEeQqe8AvxtiuMwx3w,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20131003.160049.199880461500344692.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>


Am 03.10.2013 um 22:00 schrieb David Miller:

> From: "Dr. H. Nikolaus Schaller" <hns-xXXSsgcRVICgSpxsJD1C4w@public.gmane.org>
> Date: Thu, 3 Oct 2013 21:40:34 +0200
> 
>> I have made the bug observation from debug log that this bit is set in a response
>> each time the modem has a RING message. It might be specific to this modem
>> and firmware version, i.e. a firmware bug. But we have no means to verify or even
>> change it in the firmware.
>> 
>> I.e. the driver must handle it in a better way.
>> 
>> Because the notification is rejected by the driver, the driver will hang up and the
>> whole modem connection breaks down.
>> 
>> With this patch, the problem was never observed again in ~2 years.
>> 
>> I'd hope the maintainer of this driver can shed some light on it.
> 
> I think if you suspect the bit is set in response to RING messages
> then you should define a macro so that the number is not just magic,
> and put a descriptrive comment above the macro definition for that
> bit.

Good idea! I will resend the patch.

--- hns--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH v3 net-next] fix unsafe set_memory_rw from softirq
From: Alexei Starovoitov @ 2013-10-04  6:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexei Starovoitov, David S. Miller, Daniel Borkmann,
	Eric Dumazet, linux-arm-kernel, linuxppc-dev, linux-s390, netdev
In-Reply-To: <1380863798.3564.12.camel@edumazet-glaptop.roam.corp.google.com>

On Thu, Oct 3, 2013 at 10:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-10-03 at 21:11 -0700, Alexei Starovoitov wrote:
>
> -static inline unsigned int sk_filter_len(const struct sk_filter *fp)
> +static inline unsigned int sk_filter_size(const struct sk_filter *fp,
> +                                         unsigned int proglen)
>  {
> -       return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
> +       return max(sizeof(*fp),
> +                  offsetof(struct sk_filter, insns[proglen]));
>  }

indeed that's cleaner.
Like this then:
-static inline unsigned int sk_filter_len(const struct sk_filter *fp)
+static inline unsigned int sk_filter_size(unsigned int proglen)
 {
-       return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
+       return max(sizeof(struct sk_filter),
+                  offsetof(struct sk_filter, insns[proglen]));
 }

testing it... will send v4 shortly

^ permalink raw reply

* Re: [PATCH v2.41 5/5] datapath: Add basic MPLS support to kernel
From: Simon Horman @ 2013-10-04  6:40 UTC (permalink / raw)
  To: Pravin Shelar
  Cc: dev@openvswitch.org, netdev, Jesse Gross, Ben Pfaff, Ravi K,
	Isaku Yamahata, Joe Stringer
In-Reply-To: <CALnjE+o6TK_2OGbWjZ4EwXMnXw5bzMQH2PeTMJM6nJQV+V147Q@mail.gmail.com>

On Thu, Oct 03, 2013 at 07:46:46PM -0700, Pravin Shelar wrote:
> On Wed, Oct 2, 2013 at 5:20 PM, Simon Horman <horms@verge.net.au> wrote:
> > On Wed, Oct 02, 2013 at 11:03:57AM -0700, Pravin Shelar wrote:
> >> On Mon, Sep 30, 2013 at 11:47 PM, Simon Horman <horms@verge.net.au> wrote:
> >> > Allow datapath to recognize and extract MPLS labels into flow keys
> >> > and execute actions which push, pop, and set labels on packets.
> >> >
> >> > Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.
> >> >
> >> > Cc: Ravi K <rkerur@gmail.com>
> >> > Cc: Leo Alterman <lalterman@nicira.com>
> >> > Cc: Isaku Yamahata <yamahata@valinux.co.jp>
> >> > Cc: Joe Stringer <joe@wand.net.nz>
> >> > Signed-off-by: Simon Horman <horms@verge.net.au>
> >> >
> >> > ---
> >> >
> >> > +
> >> > +       /* this hack needed to get regular skb_gso_segment() */
> >> > +#ifdef HAVE___SKB_GSO_SEGMENT
> >> > +#undef __skb_gso_segment
> >> > +       skb_gso = __skb_gso_segment(skb, features, tx_path);
> >> > +#else
> >> > +#undef skb_gso_segment
> >> > +       skb_gso = skb_gso_segment(skb, features);
> >> > +#endif
> >> > +
> >>
> >> We can get rid of #ifdefs by just using different name for
> >> rpl___skb_gso_segment(), something like mpls_vlan_skb_gso_segment().
> >> The way it is done for tnl-gso.
> >
> > Thanks.
> >
> > The reason that I had the code arranged this way was so that
> > calls to __skb_gso_segment() would go via rpl___skb_gso_segment()
> > on kernels older than v3.11. In particular calls outside of gso.c.
> >
> > On closer examination the only such case is in ovs_dp_upcall().
> > Currently there should be no need to perform MPLS GSO segmentation in that
> > case because MPLS GSO segmentation can only be needed after actions are
> > applied.
> >
> > However, I am concerned that it may be necessary later when
> > recirculation is introduced as in that case an upcall may occur
> > on a packet which has had actions applied.
> 
> good point.
> 
> currently we define __skb_gso_segment using skb_gso_segemt(). You have
> reversed it. Is there any reason?
> if you keep it as it is, it can simplify code a bit.

Thanks. I believe that I wanted to use the real __skb_gso_segment()
to back the compat version of __skb_gso_segment(). I can't recall
specifically why and that change seems to be orthogonal to the
MPLS patches.

I have switched things around as you suggested and the code
looks much cleaner. Thanks for the suggestion.

^ permalink raw reply

* Re: [PATCH v3 net-next] fix unsafe set_memory_rw from softirq
From: Heiko Carstens @ 2013-10-04  6:47 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, Paul Mackerras, H. Peter Anvin, sparclinux,
	Nicolas Dichtel, linux-s390, Russell King, x86, James Morris,
	Ingo Molnar, Alexey Kuznetsov, Paul E. McKenney, Xi Wang,
	Matt Evans, Thomas Gleixner, linux-arm-kernel, Stelian Nirlu,
	Nicolas Schichan, Hideaki YOSHIFUJI, netdev, linux-kernel,
	David S. Miller, Mircea Gherzan, Daniel Borkmann,
	Martin Schwidefsky
In-Reply-To: <1380853446-30537-1-git-send-email-ast@plumgrid.com>

On Thu, Oct 03, 2013 at 07:24:06PM -0700, Alexei Starovoitov wrote:
> diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
> index 7092392..a5df511 100644
> --- a/arch/s390/net/bpf_jit_comp.c
> +++ b/arch/s390/net/bpf_jit_comp.c
> @@ -881,7 +881,9 @@ void bpf_jit_free(struct sk_filter *fp)
>  	struct bpf_binary_header *header = (void *)addr;
> 
>  	if (fp->bpf_func == sk_run_filter)
> -		return;
> +		goto free_filter;
>  	set_memory_rw(addr, header->pages);
>  	module_free(NULL, header);
> +free_filter:
> +	kfree(fp);
>  }

For the s390 part:

Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>

^ permalink raw reply

* Re: [PATCH net-next] tcp: rcvbuf autotuning improvements
From: Michael Dalton @ 2013-10-04  6:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Daniel Borkmann, davem, netdev, Francesco Fusco, ycheng,
	Neal Cardwell, Eric Northup
In-Reply-To: <1380805381.19002.206.camel@edumazet-glaptop.roam.corp.google.com>

Thanks Eric,

I believe this issue may be related to one that I encountered
recently - poor performance with MTU-sized packets in
virtio_net when mergeable receive buffers are enabled. Performance was
quite low relative to virtio_net where mergeable receive buffers are
disabled and MTU-sized packets are received. The issue can be reliably
reproduced via netperf TCP_STREAM when mergeable receive buffers is
enabled but GRO is disabled (to force MTU-sized packets on receive).

I found the root cause was the memory allocation strategy employed for
virtio_net -- when mergeable receive buffers are enabled, every
receive ring packet buffer is allocated using a full page via the page
allocator, so the SKB truesize is 4096 + skb header +
128 (GOOD_COPY_LEN). This means that there is >100% overhead
(true_size / number of bytes actually used to store packet data) for
 MTU-sized packets, impacting TCP.

The issue can be resolved by switching mergeable receive packet's
packet allocation to use netdev_alloc_frag(), allocating MTU-sized (or
slightly larger) buffers, and handling the rare edge case where the
number of frags exceeds SKB_MAX_FRAGS (occurs for extremely large
GRO'd packets and is permitted by the virtio specification) by using
the SKB frag list. I will update this thread with a patch when one is
ready, hopefully in the next few days. Thanks!

Best,

Mike

On Thu, Oct 3, 2013 at 6:03 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2013-10-03 at 09:56 +0200, Daniel Borkmann wrote:
>> This is a complementary patch for commit 6ae705323 ("tcp: sndbuf
>> autotuning improvements") that fixes a performance regression on
>> receiver side in setups with low to mid latency, high throughput,
>> and senders with TSO/GSO off (receivers w/ default settings).
>>
>> The following measurements in Mbit/s were done for 60sec w/ netperf
>> on virtio w/ TSO/GSO off:
>>
>> (ms)    1)              2)              3)
>>   0     2762.11         1150.32         2906.17
>>  10     1083.61          538.89         1091.03
>>  25      471.81          313.18          474.60
>>  50      242.33          187.84          242.36
>>  75      162.14          134.45          161.95
>> 100      121.55          101.96          121.49
>> 150       80.64           57.75           80.48
>> 200       58.97           54.11           59.90
>> 250       47.10           46.92           47.31
>>
>> Same setup w/ TSO/GSO on:
>>
>> (ms)    1)              2)              3)
>>   0     12225.91        12366.89        16514.37
>>  10      1526.64         1525.79         2176.63
>>  25       655.13          647.79          871.52
>>  50       338.51          377.88          439.46
>>  75       246.49          278.46          295.62
>> 100       210.93          207.56          217.34
>> 150       127.88          129.56          141.33
>> 200        94.95           94.50          107.29
>> 250        67.39           73.88           88.35
>>
>> Similarly as in 6ae705323, we fixed up power-of-two rounding and
>> took cached mss into account, thus bringing per_mss calculations
>> closer to each other, the rest stays as is.
>>
>> We also renamed tcp_fixup_rcvbuf() to tcp_rcvbuf_expand() to be
>> consistent with tcp_sndbuf_expand().
>>
>> While we do think that 6ae705323b71 is the right way to go, also
>> this follow-up seems necessary to restore performance for
>> receivers.
>
> Hmm, I think you based this patch on some virtio requirements.
>
> I would rather fix virtio, because virtio has poor truesize/payload
> ratio.
>
> Michael Dalton is working on this right now.
>
> Really I don't understand how 'fixing' initial rcvbuf could explain such
> difference in a 60 second transfert.
>
> Normally, if autotuning was working, the first sk_rcvbuf value would
> only matter in the very beginning of a flow (maybe one, two or even
> three RTT)
>
> It looks like you only need to set sk_rcvbuf to tcp_rmem[2],
> so you probably have to fix the autotuning, or virtio to give normal
> skbs, not fat ones ;)
>
>
> Thanks
>
>

^ permalink raw reply

* LOAN OFFER
From: 005 @ 2013-10-04  7:02 UTC (permalink / raw)
  To: Recipients


 Do You need  Business Loan or Personal if Yes?Name,Amount,Country,Loan Durations,Phone Number Email;changloanoffer@vip.se.com

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox