Netdev List


* Re: [PATCH v2 net-next 4/6] net: ethernet: ti: cpsw: add CBS Qdisc offload
From: Ivan Khoronzhuk @ 2018-06-15 18:15 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: grygorii.strashko, davem, corbet, akpm, netdev, linux-doc,
	linux-kernel, linux-omap, vinicius.gomes, henrik,
	jesus.sanchez-palencia, p-varis, spatton, francois.ozog, yogeshs,
	nsekhar, andrew
In-Reply-To: <20180614080902.GA8377@apalos>

On Thu, Jun 14, 2018 at 11:09:02AM +0300, Ilias Apalodimas wrote:
[...]

>> +				 "Speed was changed, CBS sahper speeds are changed!");
>typo here, should be shaper

Corrected in v3

-- 
Regards,
Ivan Khoronzhuk


* Re: [PATCH iproute2 v3] ipaddress: strengthen check on 'label' input
From: Stephen Hemminger @ 2018-06-15 18:19 UTC (permalink / raw)
  To: Patrick Talbert; +Cc: netdev
In-Reply-To: <1528984017-19490-1-git-send-email-ptalbert@redhat.com>

On Thu, 14 Jun 2018 15:46:57 +0200
Patrick Talbert <ptalbert@redhat.com> wrote:

> As mentioned in the ip-address man page, an address label must
> be equal to the device name or prefixed by the device name
> followed by a colon. Currently the only check on this input is
> to see if the device name appears at the beginning of the label
> string.
> 
> This commit adds an additional check to ensure label == dev or
> continues with a colon.
> 
> Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
> Suggested-by: Stephen Hemminger <stephen@networkplumber.org>

Sure applied
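
For readers following along, the stricter check described in the commit message can be sketched in a few lines of C. This is only a userspace illustration of the rule (label equals the device name, or continues with a colon); it is not the iproute2 code, and label_matches_dev() is a hypothetical name chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not the iproute2 function: returns true when
 * `label` is exactly `dev`, or is `dev` followed by a colon.
 */
bool label_matches_dev(const char *dev, const char *label)
{
	size_t len;

	assert(dev != NULL && label != NULL);
	len = strlen(dev);

	/* The label must start with the device name... */
	if (strncmp(dev, label, len) != 0)
		return false;

	/* ...and then either end there or continue with ':'. */
	return label[len] == '\0' || label[len] == ':';
}
```

With this rule "eth0" and "eth0:1" are accepted for device eth0, while "eth01" is rejected even though it begins with the device name.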


* Re: [iproute2 1/1] rdma: sync some IP headers with glibc
From: Stephen Hemminger @ 2018-06-15 18:19 UTC (permalink / raw)
  To: Hoang Le; +Cc: jon.maloy, maloy, ying.xue, netdev, tipc-discussion
In-Reply-To: <1528862996-7045-1-git-send-email-hoang.h.le@dektech.com.au>

On Wed, 13 Jun 2018 11:09:56 +0700
Hoang Le <hoang.h.le@dektech.com.au> wrote:

> In commit 9a362cc71a45, a new userspace header
>   (i.e. rdma/rdma_user_cm.h -> linux/in6.h)
> is included before the kernel space header
>   (i.e. utils.h -> resolv.h -> netinet/in.h).
> 
> This leaves some IP headers out of sync, and the build fails with
> redefinition errors for some IP structs.
> 
> This commit just reorders the includes to keep them in sync.
> 
> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>

Sure applied


* KASAN: use-after-free Write in free_htab_elem
From: syzbot @ 2018-06-15 18:40 UTC (permalink / raw)
  To: ast, daniel, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    f0dc7f9c6dd9 Merge git://git.kernel.org/pub/scm/linux/kern..
git tree:       bpf-next
console output: https://syzkaller.appspot.com/x/log.txt?x=11dad428400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=fa9c20c48788d1c1
dashboard link: https://syzkaller.appspot.com/bug?extid=ce67d3e4fa77eedee964
compiler:       gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+ce67d3e4fa77eedee964@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in atomic_dec include/asm-generic/atomic-instrumented.h:114 [inline]
BUG: KASAN: use-after-free in free_htab_elem+0x23/0x40 kernel/bpf/sockmap.c:224
Write of size 4 at addr ffff8801b3dce648 by task syz-executor1/8114

CPU: 0 PID: 8114 Comm: syz-executor1 Not tainted 4.17.0+ #39
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
  print_address_description+0x6c/0x20b mm/kasan/report.c:256
  kasan_report_error mm/kasan/report.c:354 [inline]
  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
  check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
  kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278
  atomic_dec include/asm-generic/atomic-instrumented.h:114 [inline]
  free_htab_elem+0x23/0x40 kernel/bpf/sockmap.c:224
  bpf_tcp_close+0x8c1/0xf80 kernel/bpf/sockmap.c:273
  inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427
  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:459
  __sock_release+0xd7/0x260 net/socket.c:603
  sock_close+0x19/0x20 net/socket.c:1186
  __fput+0x353/0x890 fs/file_table.c:209
  ____fput+0x15/0x20 fs/file_table.c:243
  task_work_run+0x1e4/0x290 kernel/task_work.c:113
  exit_task_work include/linux/task_work.h:22 [inline]
  do_exit+0x1aee/0x2730 kernel/exit.c:865
  do_group_exit+0x16f/0x430 kernel/exit.c:968
  get_signal+0x886/0x1960 kernel/signal.c:2468
  do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
  exit_to_usermode_loop+0x2cf/0x360 arch/x86/entry/common.c:162
  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
  do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:293
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x455b29
Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f3bbf323ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 000000000072bf78 RCX: 0000000000455b29
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bf78
RBP: 000000000072bf78 R08: 0000000000000000 R09: 000000000072bf50
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd7addc2cf R14: 00007f3bbf3249c0 R15: 0000000000000001

Allocated by task 8104:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
  kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
  kmalloc include/linux/slab.h:513 [inline]
  kzalloc include/linux/slab.h:706 [inline]
  sock_hash_alloc+0x20d/0x6a0 kernel/bpf/sockmap.c:2003
  find_and_alloc_map kernel/bpf/syscall.c:129 [inline]
  map_create+0x393/0x1010 kernel/bpf/syscall.c:453
  __do_sys_bpf kernel/bpf/syscall.c:2351 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:2328 [inline]
  __x64_sys_bpf+0x303/0x510 kernel/bpf/syscall.c:2328
  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 2131:
  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
  set_track mm/kasan/kasan.c:460 [inline]
  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
  __cache_free mm/slab.c:3498 [inline]
  kfree+0xd9/0x260 mm/slab.c:3813
  sock_hash_free+0x51c/0x6e0 kernel/bpf/sockmap.c:2098
  bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
  process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
  worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
  kthread+0x345/0x410 kernel/kthread.c:240
  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

The buggy address belongs to the object at ffff8801b3dce540
  which belongs to the cache kmalloc-512 of size 512
The buggy address is located 264 bytes inside of
  512-byte region [ffff8801b3dce540, ffff8801b3dce740)
The buggy address belongs to the page:
page:ffffea0006cf7380 count:1 mapcount:0 mapping:ffff8801da800940 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea0006caccc8 ffffea0006f57b08 ffff8801da800940
raw: 0000000000000000 ffff8801b3dce040 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8801b3dce500: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
  ffff8801b3dce580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff8801b3dce600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                               ^
  ffff8801b3dce680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8801b3dce700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
==================================================================
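
The address arithmetic in the report can be checked by hand. The sketch below is a userspace illustration only: the demo_* helpers are hypothetical names, and the 8-byte shadow granularity is an assumption about the KASAN mode in use here. It recomputes the "264 bytes inside" figure and the position of the shadow byte on the row marked with ">".

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shadow granularity: one shadow byte per 8 bytes of memory. */
#define DEMO_SHADOW_GRANULE 8u

/* Byte offset of `addr` inside the object that starts at `base`. */
uint64_t demo_offset_in_object(uint64_t base, uint64_t addr)
{
	assert(addr >= base);
	return addr - base;
}

/* Index of the shadow byte for `addr` within a dump row starting at `row`. */
uint64_t demo_shadow_index(uint64_t row, uint64_t addr)
{
	assert(addr >= row);
	return (addr - row) / DEMO_SHADOW_GRANULE;
}
```

Feeding in the values from the report, 0xffff8801b3dce648 minus 0xffff8801b3dce540 is 0x108, i.e. 264, and under the 8-byte assumption the access falls on shadow byte index 9 of the row starting at ffff8801b3dce600, which appears to be where the caret points.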


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.


* [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Mathieu Malaterre @ 2018-06-15 18:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: Mathieu Malaterre, Eric Dumazet, linux-kernel, netdev

This reverts commit 88078d98d1bb085d72af8437707279e203524fa5.

It causes regressions for people using chips driven by the sungem
driver. Suspicion is that the skb->csum value isn't being adjusted
properly.

Symptoms as seen on G4+sungem are:

[   34.023281] eth0: hw csum failure
[   34.023438] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #2
[   34.023618] Call Trace:
[   34.023707] [dffedbd0] [c069ddac] __skb_checksum_complete+0xf0/0x108 (unreliable)
[   34.023948] [dffedbf0] [c0777a70] tcp_v4_rcv+0x604/0xe00
[   34.024118] [dffedc70] [c0731624] ip_local_deliver_finish+0xa8/0x3c4
[   34.024315] [dffedcb0] [c0732430] ip_local_deliver+0xf0/0x154
[   34.024493] [dffedcf0] [c07328dc] ip_rcv+0x448/0x774
[   34.024653] [dffedd50] [c06aeae0] __netif_receive_skb_core+0x5e8/0x1184
[   34.024857] [dffedde0] [c06bba20] napi_gro_receive+0x160/0x22c
[   34.025044] [dffede10] [e14b2590] gem_poll+0x7fc/0x1ac0 [sungem]
[   34.025228] [dffedee0] [c06bacf0] net_rx_action+0x34c/0x618
[   34.025402] [dffedf60] [c07fd27c] __do_softirq+0x16c/0x5f0
[   34.025575] [dffedfd0] [c0064c7c] irq_exit+0x110/0x1a8
[   34.025738] [dffedff0] [c0016170] call_do_irq+0x24/0x3c
[   34.025903] [c0cf7e80] [c0009a84] do_IRQ+0x98/0x1a0
[   34.026055] [c0cf7eb0] [c001b474] ret_from_except+0x0/0x14
[   34.026225] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
                   LR = arch_cpu_idle+0x30/0x78
[   34.026510] [c0cf7f70] [c0cf6000] 0xc0cf6000 (unreliable)
[   34.026682] [c0cf7f80] [c00a3868] do_idle+0xc4/0x158
[   34.026835] [c0cf7fb0] [c00a3ab0] cpu_startup_entry+0x20/0x28
[   34.027013] [c0cf7fc0] [c0998820] start_kernel+0x47c/0x490
[   34.027181] [c0cf7ff0] [00003444] 0x3444

See commit 7ce5a27f2ef8 ("Revert "net: Handle CHECKSUM_COMPLETE more
adequately in pskb_trim_rcsum()."") for previous reference.

Link: https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-June/174444.html
Reported-by: Meelis Roos <mroos@linux.ee>
Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Cc: Eric Dumazet <edumazet@google.com>
---
 include/linux/skbuff.h |  5 +++--
 net/core/skbuff.c      | 14 --------------
 2 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c86885954994..cbc753a3e41c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3134,7 +3134,6 @@ static inline void *skb_push_rcsum(struct sk_buff *skb, unsigned int len)
 	return skb->data;
 }
 
-int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len);
 /**
  *	pskb_trim_rcsum - trim received skb and update checksum
  *	@skb: buffer to trim
@@ -3148,7 +3147,9 @@ static inline int pskb_trim_rcsum(struct sk_buff *skb, unsigned int len)
 {
 	if (likely(len >= skb->len))
 		return 0;
-	return pskb_trim_rcsum_slow(skb, len);
+	if (skb->ip_summed == CHECKSUM_COMPLETE)
+		skb->ip_summed = CHECKSUM_NONE;
+	return __pskb_trim(skb, len);
 }
 
 static inline int __skb_trim_rcsum(struct sk_buff *skb, unsigned int len)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c642304f178c..360293d1baf3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1840,20 +1840,6 @@ int ___pskb_trim(struct sk_buff *skb, unsigned int len)
 }
 EXPORT_SYMBOL(___pskb_trim);
 
-/* Note : use pskb_trim_rcsum() instead of calling this directly
- */
-int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
-{
-	if (skb->ip_summed == CHECKSUM_COMPLETE) {
-		int delta = skb->len - len;
-
-		skb->csum = csum_sub(skb->csum,
-				     skb_checksum(skb, len, delta, 0));
-	}
-	return __pskb_trim(skb, len);
-}
-EXPORT_SYMBOL(pskb_trim_rcsum_slow);
-
 /**
  *	__pskb_pull_tail - advance tail of skb header
  *	@skb: buffer to reallocate
-- 
2.11.0


* Re: [PATCH] Revert "net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends"
From: Eric Dumazet @ 2018-06-15 19:14 UTC (permalink / raw)
  To: Mathieu Malaterre, David S. Miller; +Cc: Eric Dumazet, linux-kernel, netdev
In-Reply-To: <20180615185645.8921-1-malat@debian.org>



On 06/15/2018 11:56 AM, Mathieu Malaterre wrote:
> This reverts commit 88078d98d1bb085d72af8437707279e203524fa5.
> 
> It causes regressions for people using chips driven by the sungem
> driver. Suspicion is that the skb->csum value isn't being adjusted
> properly.
> 
> Symptoms as seen on G4+sungem are:
> 
> [   34.023281] eth0: hw csum failure
> [   34.023438] CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0+ #2
> [   34.023618] Call Trace:
> [   34.023707] [dffedbd0] [c069ddac] __skb_checksum_complete+0xf0/0x108 (unreliable)
> [   34.023948] [dffedbf0] [c0777a70] tcp_v4_rcv+0x604/0xe00
> [   34.024118] [dffedc70] [c0731624] ip_local_deliver_finish+0xa8/0x3c4
> [   34.024315] [dffedcb0] [c0732430] ip_local_deliver+0xf0/0x154
> [   34.024493] [dffedcf0] [c07328dc] ip_rcv+0x448/0x774
> [   34.024653] [dffedd50] [c06aeae0] __netif_receive_skb_core+0x5e8/0x1184
> [   34.024857] [dffedde0] [c06bba20] napi_gro_receive+0x160/0x22c
> [   34.025044] [dffede10] [e14b2590] gem_poll+0x7fc/0x1ac0 [sungem]
> [   34.025228] [dffedee0] [c06bacf0] net_rx_action+0x34c/0x618
> [   34.025402] [dffedf60] [c07fd27c] __do_softirq+0x16c/0x5f0
> [   34.025575] [dffedfd0] [c0064c7c] irq_exit+0x110/0x1a8
> [   34.025738] [dffedff0] [c0016170] call_do_irq+0x24/0x3c
> [   34.025903] [c0cf7e80] [c0009a84] do_IRQ+0x98/0x1a0
> [   34.026055] [c0cf7eb0] [c001b474] ret_from_except+0x0/0x14
> [   34.026225] --- interrupt: 501 at arch_cpu_idle+0x30/0x78
>                    LR = arch_cpu_idle+0x30/0x78
> [   34.026510] [c0cf7f70] [c0cf6000] 0xc0cf6000 (unreliable)
> [   34.026682] [c0cf7f80] [c00a3868] do_idle+0xc4/0x158
> [   34.026835] [c0cf7fb0] [c00a3ab0] cpu_startup_entry+0x20/0x28
> [   34.027013] [c0cf7fc0] [c0998820] start_kernel+0x47c/0x490
> [   34.027181] [c0cf7ff0] [00003444] 0x3444
> 
> See commit 7ce5a27f2ef8 ("Revert "net: Handle CHECKSUM_COMPLETE more
> adequately in pskb_trim_rcsum()."") for previous reference.

This fix seems to hide a bug in csum functions on this architecture.

Or a bug on this NIC when receiving a small packet (less than 60 bytes).
Maybe the padding bytes are not included in NIC provided csum, and not 0.
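
To make the padding hypothesis concrete, here is a small userspace sketch. It is not the kernel's csum_sub()/skb_checksum() code: it uses a plain 16-bit ones' complement sum over big-endian words, and the demo_* names are hypothetical. It shows that subtracting the checksum of the trimmed tail recovers the checksum of the head only when the tail was actually part of the original sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
uint16_t demo_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Ones' complement sum of `len` bytes taken as big-endian 16-bit words.
 * Sketch only: the accumulator is not protected against very long inputs.
 */
uint16_t demo_csum_bytes(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(p != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (i < len)
		sum += (uint32_t)p[i] << 8;	/* odd last byte, zero-padded */
	return demo_csum_fold(sum);
}

/* Ones' complement subtraction: a - b is a + ~b. */
uint16_t demo_csum_sub(uint16_t a, uint16_t b)
{
	return demo_csum_fold((uint32_t)a + (uint16_t)~b);
}
```

If the sum the hardware reported never covered the (non-zero) padding, subtracting the padding's checksum in software produces a value that no longer matches the remaining data, which would be consistent with the "hw csum failure" splat. Whether that is what happens on sungem is still only a guess.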


* RE: [PATCH 2/3] net: phy: vitesse: Add support for VSC73xx
From: Woojung.Huh @ 2018-06-15 19:24 UTC (permalink / raw)
  To: f.fainelli, linus.walleij, andrew, vivien.didelot, UNGLinuxDriver
  Cc: netdev, openwrt-devel, lede-dev, juhosg
In-Reply-To: <87156f76-449c-1ec3-e7fa-776c2fddc992@gmail.com>

Hi Florian,

> On 06/14/2018 05:35 AM, Linus Walleij wrote:
> > The VSC7385, VSC7388, VSC7395 and VSC7398 are integrated
> > switch/router chips for 5+1 or 8-port switches/routers. When
> > managed directly by Linux using DSA we need to do a special
> > set-up "dance" on the PHY. Unfortunately these sequences
> > switch the PHY to undocumented pages named 2a30 and 52b6
> > and do undocumented things. The reference manual also
> > describes them only as opaque sequences. This is a best
> > effort to integrate it anyway.
> >
> > Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> 
> Probably as good as it can get given the information you have access to.
> Maybe the guys at Microchip could help, adding them.

Microchip completed the acquisition of Microsemi last month. It will take some time to get access to the right data.
Hope we can help soon.

Thanks.
Woojung


* Re: [PATCH bpf 0/2] Two bpf fixes
From: Alexei Starovoitov @ 2018-06-15 19:31 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: ast, netdev
In-Reply-To: <20180615003048.3219-1-daniel@iogearbox.net>

On Fri, Jun 15, 2018 at 02:30:46AM +0200, Daniel Borkmann wrote:
> First one is a panic I ran into while testing the second
> one where we got several syzkaller reports. Series here
> fixes both.
> 
> Thanks!

Applied, thanks.

The second patch looks dubious to me though.
Nothing in the kernel tree checks the return value of set_memory_ro()
and my understanding is that it can fail only when part of a huge page
is being marked and pages have to be split. In bpf case I don't think
it's ever the case, so the patch is silencing purely theoretical
syzbot splat that can happen with artificial error injection.
I bet we're still going to see this splat in set_memory_rw.
imo the better fix would have been to drop WARN_ON from both.



* Re: [PATCH 07/17] net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
From: David Woodhouse @ 2018-06-15 20:00 UTC (permalink / raw)
  To: Eric Dumazet, Elena Reshetova, netdev
  Cc: Krzysztof Mazur, Kevin Darbyshire-Bryant, 3chas3, Mathias Kresin
In-Reply-To: <1529070283.27158.46.camel@infradead.org>

On Fri, 2018-06-15 at 14:44 +0100, David Woodhouse wrote:
> 
> > Or simply use a new field in ATM_SKB(skb) to remember a stable
> > truesize used in both sides (add/sub)
> 
> Right, that was my second suggestion ("copy the accounted value...").
> 
> It's a bit of a hack, and I think that actually *using* sock_wfree()
> instead of what's currently in atm_pop_raw() would be the better
> solution. Does anyone remember why we didn't do that in the first
> place?

That does end up being quite hairy. I don't think it's worth doing.

This should probably suffice to fix it...

Kevin, this is going to conflict with the ifx_atm_alloc_skb() hack in
the tree you're working on, but that needs to be killed with fire
anyway. It's utterly pointless as discussed.



From 3368eaeb0a2f09138894dde0f26f879e5228005a Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw2@infradead.org>
Date: Fri, 15 Jun 2018 20:49:20 +0100
Subject: [PATCH] atm: Preserve value of skb->truesize when accounting to vcc

There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
for certain skbs. Ideally it would cover ATM too. It doesn't. Just
stashing the accounted value and using it in atm_pop_raw() is probably
the easiest way to cope.

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
---
 include/linux/atmdev.h | 15 +++++++++++++++
 net/atm/br2684.c       |  3 +--
 net/atm/clip.c         |  3 +--
 net/atm/common.c       |  3 +--
 net/atm/lec.c          |  3 +--
 net/atm/mpc.c          |  3 +--
 net/atm/pppoatm.c      |  3 +--
 net/atm/raw.c          |  4 ++--
 8 files changed, 23 insertions(+), 14 deletions(-)

diff --git a/include/linux/atmdev.h b/include/linux/atmdev.h
index 0c27515d2cf6..8124815eb121 100644
--- a/include/linux/atmdev.h
+++ b/include/linux/atmdev.h
@@ -214,6 +214,7 @@ struct atmphy_ops {
 struct atm_skb_data {
 	struct atm_vcc	*vcc;		/* ATM VCC */
 	unsigned long	atm_options;	/* ATM layer options */
+	unsigned int	acct_truesize;  /* truesize accounted to vcc */
 };
 
 #define VCC_HTABLE_SIZE 32
@@ -241,6 +242,20 @@ void vcc_insert_socket(struct sock *sk);
 
 void atm_dev_release_vccs(struct atm_dev *dev);
 
+static inline void atm_account_tx(struct atm_vcc *vcc, struct sk_buff *skb)
+{
+	/*
+	 * Because ATM skbs may not belong to a sock (and we don't
+	 * necessarily want to), skb->truesize may be adjusted,
+	 * escaping the hack in pskb_expand_head() which avoids
+	 * doing so for some cases. So stash the value of truesize
+	 * at the time we accounted it, and atm_pop_raw() can use
+	 * that value later, in case it changes.
+	 */
+	refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
+	ATM_SKB(skb)->acct_truesize = skb->truesize;
+	ATM_SKB(skb)->atm_options = vcc->atm_options;
+}
 
 static inline void atm_force_charge(struct atm_vcc *vcc,int truesize)
 {
diff --git a/net/atm/br2684.c b/net/atm/br2684.c
index 4e111196f902..bc21f8e8daf2 100644
--- a/net/atm/br2684.c
+++ b/net/atm/br2684.c
@@ -252,8 +252,7 @@ static int br2684_xmit_vcc(struct sk_buff *skb, struct net_device *dev,
 
 	ATM_SKB(skb)->vcc = atmvcc = brvcc->atmvcc;
 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, atmvcc, atmvcc->dev);
-	refcount_add(skb->truesize, &sk_atm(atmvcc)->sk_wmem_alloc);
-	ATM_SKB(skb)->atm_options = atmvcc->atm_options;
+	atm_account_tx(atmvcc, skb);
 	dev->stats.tx_packets++;
 	dev->stats.tx_bytes += skb->len;
 
diff --git a/net/atm/clip.c b/net/atm/clip.c
index 65f706e4344c..60920a42f640 100644
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -381,8 +381,7 @@ static netdev_tx_t clip_start_xmit(struct sk_buff *skb,
 		memcpy(here, llc_oui, sizeof(llc_oui));
 		((__be16 *) here)[3] = skb->protocol;
 	}
-	refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
-	ATM_SKB(skb)->atm_options = vcc->atm_options;
+	atm_account_tx(vcc, skb);
 	entry->vccs->last_use = jiffies;
 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, vcc, vcc->dev);
 	old = xchg(&entry->vccs->xoff, 1);	/* assume XOFF ... */
diff --git a/net/atm/common.c b/net/atm/common.c
index 8a4f99114cd2..9e812c782a37 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -630,10 +630,9 @@ int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t size)
 		goto out;
 	}
 	pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
-	refcount_add(skb->truesize, &sk->sk_wmem_alloc);
+	atm_account_tx(vcc, skb);
 
 	skb->dev = NULL; /* for paths shared with net_device interfaces */
-	ATM_SKB(skb)->atm_options = vcc->atm_options;
 	if (!copy_from_iter_full(skb_put(skb, size), size, &m->msg_iter)) {
 		kfree_skb(skb);
 		error = -EFAULT;
diff --git a/net/atm/lec.c b/net/atm/lec.c
index a3d93a1bb133..d7cc165e24e0 100644
--- a/net/atm/lec.c
+++ b/net/atm/lec.c
@@ -179,9 +179,8 @@ lec_send(struct atm_vcc *vcc, struct sk_buff *skb)
 	struct net_device *dev = skb->dev;
 
 	ATM_SKB(skb)->vcc = vcc;
-	ATM_SKB(skb)->atm_options = vcc->atm_options;
+	atm_account_tx(vcc, skb);
 
-	refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
 	if (vcc->send(vcc, skb) < 0) {
 		dev->stats.tx_dropped++;
 		return;
diff --git a/net/atm/mpc.c b/net/atm/mpc.c
index 5677147209e8..db9a1838687c 100644
--- a/net/atm/mpc.c
+++ b/net/atm/mpc.c
@@ -555,8 +555,7 @@ static int send_via_shortcut(struct sk_buff *skb, struct mpoa_client *mpc)
 					sizeof(struct llc_snap_hdr));
 	}
 
-	refcount_add(skb->truesize, &sk_atm(entry->shortcut)->sk_wmem_alloc);
-	ATM_SKB(skb)->atm_options = entry->shortcut->atm_options;
+	atm_account_tx(entry->shortcut, skb);
 	entry->shortcut->send(entry->shortcut, skb);
 	entry->packets_fwded++;
 	mpc->in_ops->put(entry);
diff --git a/net/atm/pppoatm.c b/net/atm/pppoatm.c
index 21d9d341a619..af8c4b38b746 100644
--- a/net/atm/pppoatm.c
+++ b/net/atm/pppoatm.c
@@ -350,8 +350,7 @@ static int pppoatm_send(struct ppp_channel *chan, struct sk_buff *skb)
 		return 1;
 	}
 
-	refcount_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
-	ATM_SKB(skb)->atm_options = ATM_SKB(skb)->vcc->atm_options;
+	atm_account_tx(vcc, skb);
 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n",
 		 skb, ATM_SKB(skb)->vcc, ATM_SKB(skb)->vcc->dev);
 	ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
diff --git a/net/atm/raw.c b/net/atm/raw.c
index ee10e8d46185..b3ba44aab0ee 100644
--- a/net/atm/raw.c
+++ b/net/atm/raw.c
@@ -35,8 +35,8 @@ static void atm_pop_raw(struct atm_vcc *vcc, struct sk_buff *skb)
 	struct sock *sk = sk_atm(vcc);
 
 	pr_debug("(%d) %d -= %d\n",
-		 vcc->vci, sk_wmem_alloc_get(sk), skb->truesize);
-	WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
+		 vcc->vci, sk_wmem_alloc_get(sk), ATM_SKB(skb)->acct_truesize);
+	WARN_ON(refcount_sub_and_test(ATM_SKB(skb)->acct_truesize, &sk->sk_wmem_alloc));
 	dev_kfree_skb_any(skb);
 	sk->sk_write_space(sk);
 }
-- 
2.17.0

-- 
dwmw2
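
The idea behind the patch can be modelled in userspace. The sketch below is only an illustration under stated assumptions: struct demo_skb, struct demo_vcc and the demo_* functions are hypothetical stand-ins, not the kernel structures or the helpers from the patch, and a plain counter replaces the refcount. It charges the truesize at transmit time, stashes that value, and uses the stashed value on release so the counter balances even if truesize changed in between.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins, not the kernel structures. */
struct demo_skb {
	unsigned int truesize;		/* may change after accounting */
	unsigned int acct_truesize;	/* value charged at transmit time */
};

struct demo_vcc {
	unsigned int wmem_alloc;	/* bytes currently charged */
};

/* Charge the skb to the vcc and remember exactly what was charged. */
void demo_account_tx(struct demo_vcc *vcc, struct demo_skb *skb)
{
	vcc->wmem_alloc += skb->truesize;
	skb->acct_truesize = skb->truesize;
}

/* Release the charge using the stashed value, not the current truesize. */
void demo_pop(struct demo_vcc *vcc, struct demo_skb *skb)
{
	assert(vcc->wmem_alloc >= skb->acct_truesize);
	vcc->wmem_alloc -= skb->acct_truesize;
}
```

Subtracting the current truesize instead would leave the counter off by however much the skb grew or shrank between the two calls, which is the imbalance the patch is meant to avoid.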


* Re: [PATCH net-next,RFC 00/13] New fast forwarding path
From: Tom Herbert @ 2018-06-15 20:12 UTC (permalink / raw)
  To: David Miller
  Cc: Pablo Neira Ayuso, netfilter-devel,
	Linux Kernel Network Developers, Steffen Klassert
In-Reply-To: <20180614.165834.338565136334574983.davem@davemloft.net>

On Thu, Jun 14, 2018 at 4:58 PM, David Miller <davem@davemloft.net> wrote:
> From: Tom Herbert <tom@herbertland.com>
> Date: Thu, 14 Jun 2018 13:52:03 -0700
>
>> IIRC, there was a similar proposal a while back that wanted to bundle
>> packets of the same flow together (without doing GRO) so that they
>> could be processed by various functions by looking at just one
>> representative packet in the group. The concept had some promise, but
>> in the end it created quite a bit of complexity since at some point
>> the packet bundle needed to be undone to go back to processing the
>> individual packets.
>
> You're probably talking about Edward Cree's SKB list stuff, and as
> per his presentation at netconf 2 weeks ago he plans to revitalize
> it given how Spectre et al. gives cause to reevaluate all bulking
> techniques.

The use case for that will be an interesting question. GSO/GRO solves
the problem for TCP and this extends to nearly all cases where TCP is
in an encapsulated packet. Super efficient forwarding can be done in
XDP/BPF (without needing overhead of GSO/GRO). That pretty much leaves
UDP as non-encapsulation end protocol, which I guess these days pretty
much means QUIC :-) I am still interested to see if we can implement
GSO/GRO for QUIC (via a generic GSO/GRO BPF function so we don't
hardcode any QUIC protocol or other application protocols in kernel).

Tom


* ethtool 4.17 released
From: John W. Linville @ 2018-06-15 20:00 UTC (permalink / raw)
  To: netdev

ethtool version 4.17 has been released.

Home page: https://www.kernel.org/pub/software/network/ethtool/
Download link:
https://www.kernel.org/pub/software/network/ethtool/ethtool-4.17.tar.xz

Release notes:

	* Fix: In ethtool.8, remove superfluous and incorrect \c.
	* Fix: fix uninitialized return value
	* Fix: fix RING_VF assignment
	* Fix: remove unused global variable
	* Fix: several fixes in do_gregs()
	* Fix: correctly free hkey when get_stringset() fails
	* Fix: remove unreachable code
	* Fix: fix stack clash in do_get_phy_tunable and do_set_phy_tunable
	* Feature: Add register dump support for MICROCHIP LAN78xx

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.


* Re: WARNING: kmalloc bug in memdup_user (3)
From: Daniel Borkmann @ 2018-06-15 20:24 UTC (permalink / raw)
  To: syzbot, ast, linux-kernel, netdev, syzkaller-bugs
In-Reply-To: <0000000000005ff424056c5476bf@google.com>

On 05/16/2018 05:35 PM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    c5c7d7f3c451 Merge branch 'bpf-sock-hashmap'
> git tree:       bpf-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1626ae37800000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=10c4dc62055b68f5
> dashboard link: https://syzkaller.appspot.com/bug?extid=0f92a17b0706231d0a09
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?x=126a5197800000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1598c477800000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+0f92a17b0706231d0a09@syzkaller.appspotmail.com

#syz fix: bpf: fix sock hashmap kmalloc warning


* Re: [PATCH 0/2] leds: drop led_trigger_rename_static()
From: Jacek Anaszewski @ 2018-06-15 20:49 UTC (permalink / raw)
  To: Uwe Kleine-König, Pavel Machek, Wolfgang Grandegger,
	Marc Kleine-Budde, David S. Miller, netdev
  Cc: linux-leds, linux-can, kernel
In-Reply-To: <20180518085333.26187-1-u.kleine-koenig@pengutronix.de>

Hi,

I need a CAN or net maintainer's ack for merging this
set via the LED tree. It's been awaiting feedback for a few weeks
now and it is blocking another set depending on it.


On 05/18/2018 10:53 AM, Uwe Kleine-König wrote:
> Hello,
> 
> initially I prepared a patch to fix the broken things in
> led_trigger_rename_static(), but given that there is only the
> can-trigger driver that makes use of this function and the netdev
> trigger implements a superset of the can-trigger, removing both the
> can-trigger and the broken function seems sensible.
> 
> Best regards
> Uwe
> 
> Uwe Kleine-König (2):
>    can: drop led trigger support
>    leds: remove unused function led_trigger_rename_static()
> 
>   drivers/leds/led-triggers.c           |  13 ---
>   drivers/net/can/Kconfig               |  11 --
>   drivers/net/can/Makefile              |   2 -
>   drivers/net/can/at91_can.c            |  10 --
>   drivers/net/can/c_can/c_can.c         |  11 --
>   drivers/net/can/dev.c                 |   5 -
>   drivers/net/can/flexcan.c             |   8 --
>   drivers/net/can/ifi_canfd/ifi_canfd.c |   9 --
>   drivers/net/can/led.c                 | 143 --------------------------
>   drivers/net/can/m_can/m_can.c         |   9 --
>   drivers/net/can/rcar/rcar_can.c       |   8 --
>   drivers/net/can/rcar/rcar_canfd.c     |   7 --
>   drivers/net/can/rx-offload.c          |   2 -
>   drivers/net/can/sja1000/sja1000.c     |  15 +--
>   drivers/net/can/spi/hi311x.c          |   8 --
>   drivers/net/can/spi/mcp251x.c         |  10 --
>   drivers/net/can/sun4i_can.c           |   7 --
>   drivers/net/can/ti_hecc.c             |   9 --
>   drivers/net/can/usb/mcba_usb.c        |   7 --
>   drivers/net/can/usb/usb_8dev.c        |  11 --
>   drivers/net/can/xilinx_can.c          |   9 --
>   include/linux/can/dev.h               |  10 --
>   include/linux/can/led.h               |  54 ----------
>   include/linux/leds.h                  |  18 ----
>   24 files changed, 1 insertion(+), 395 deletions(-)
>   delete mode 100644 drivers/net/can/led.c
>   delete mode 100644 include/linux/can/led.h
> 

-- 
Best regards,
Jacek Anaszewski


* Re: [PATCH 07/17] net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
From: Kevin Darbyshire-Bryant @ 2018-06-15 20:49 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Eric Dumazet, Elena Reshetova, netdev@vger.kernel.org,
	Krzysztof Mazur, 3chas3@gmail.com, Mathias Kresin
In-Reply-To: <00520334446ffa4671513bb42ebeeecfab4107e7.camel@infradead.org>


> On 15 Jun 2018, at 21:00, David Woodhouse <dwmw2@infradead.org> wrote:
> 
> On Fri, 2018-06-15 at 14:44 +0100, David Woodhouse wrote:
>> 
>>> Or simply use a new field in ATM_SKB(skb) to remember a stable
>>> truesize used in both sides (add/sub)
>> 
>> Right, that was my second suggestion ("copy the accounted value...").
>> 
>> It's a bit of a hack, and I think that actually *using* sock_wfree()
>> instead of what's currently in atm_pop_raw() would be the better
>> solution. Does anyone remember why we didn't do that in the first
>> place?
> 
> That does end up being quite hairy. I don't think it's worth doing.
> 
> This should probably suffice to fix it...
> 
> Kevin this is going to conflict with the ifx_atm_alloc_skb() hack in
> the tree you're working on, but that needs to be killed with fire
> anyway. It's utterly pointless as discussed.

I had already done so as part of the last pastebin debug info round :-)

As regards your patch… MAGIC!  Works an absolute treat.  Will get that submitted along with the ‘nuke ifx_atm_alloc_skb’ patch to OpenWrt tomorrow.  For now, maybe my brain will let me sleep :-)

Thank you soooooo much for your help & patience.

Tested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>

> 
> 
> From 3368eaeb0a2f09138894dde0f26f879e5228005a Mon Sep 17 00:00:00 2001
> From: David Woodhouse <dwmw2@infradead.org>
> Date: Fri, 15 Jun 2018 20:49:20 +0100
> Subject: [PATCH] atm: Preserve value of skb->truesize when accounting to vcc
> 
> There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
> for certain skbs. Ideally it would cover ATM too. It doesn't. Just
> stashing the accounted value and using it in atm_pop_raw() is probably
> the easiest way to cope.
> 
> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
> ---
> include/linux/atmdev.h | 15 +++++++++++++++
> net/atm/br2684.c       |  3 +--
> net/atm/clip.c         |  3 +--
> net/atm/common.c       |  3 +--
> net/atm/lec.c          |  3 +--
> net/atm/mpc.c          |  3 +--
> net/atm/pppoatm.c      |  3 +--
> net/atm/raw.c          |  4 ++--
> 8 files changed, 23 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/atmdev.h b/include/linux/atmdev.h
> index 0c27515d2cf6..8124815eb121 100644
> --- a/include/linux/atmdev.h
> +++ b/include/linux/atmdev.h
> @@ -214,6 +214,7 @@ struct atmphy_ops {
> struct atm_skb_data {
> 	struct atm_vcc	*vcc;		/* ATM VCC */
> 	unsigned long	atm_options;	/* ATM layer options */
> +	unsigned int	acct_truesize;  /* truesize accounted to vcc */
> };
> 
> #define VCC_HTABLE_SIZE 32
> @@ -241,6 +242,20 @@ void vcc_insert_socket(struct sock *sk);
> 
> void atm_dev_release_vccs(struct atm_dev *dev);
> 
> +static inline void atm_account_tx(struct atm_vcc *vcc, struct sk_buff *skb)
> +{
> +	/*
> +	 * Because ATM skbs may not belong to a sock (and we don't
> +	 * necessarily want to), skb->truesize may be adjusted,
> +	 * escaping the hack in pskb_expand_head() which avoids
> +	 * doing so for some cases. So stash the value of truesize
> +	 * at the time we accounted it, and atm_pop_raw() can use
> +	 * that value later, in case it changes.
> +	 */
> +	refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
> +	ATM_SKB(skb)->acct_truesize = skb->truesize;
> +	ATM_SKB(skb)->atm_options = vcc->atm_options;
> +}
> 
> static inline void atm_force_charge(struct atm_vcc *vcc,int truesize)
> {
> diff --git a/net/atm/br2684.c b/net/atm/br2684.c
> index 4e111196f902..bc21f8e8daf2 100644
> --- a/net/atm/br2684.c
> +++ b/net/atm/br2684.c
> @@ -252,8 +252,7 @@ static int br2684_xmit_vcc(struct sk_buff *skb, struct net_device *dev,
> 
> 	ATM_SKB(skb)->vcc = atmvcc = brvcc->atmvcc;
> 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, atmvcc, atmvcc->dev);
> -	refcount_add(skb->truesize, &sk_atm(atmvcc)->sk_wmem_alloc);
> -	ATM_SKB(skb)->atm_options = atmvcc->atm_options;
> +	atm_account_tx(atmvcc, skb);
> 	dev->stats.tx_packets++;
> 	dev->stats.tx_bytes += skb->len;
> 
> diff --git a/net/atm/clip.c b/net/atm/clip.c
> index 65f706e4344c..60920a42f640 100644
> --- a/net/atm/clip.c
> +++ b/net/atm/clip.c
> @@ -381,8 +381,7 @@ static netdev_tx_t clip_start_xmit(struct sk_buff *skb,
> 		memcpy(here, llc_oui, sizeof(llc_oui));
> 		((__be16 *) here)[3] = skb->protocol;
> 	}
> -	refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
> -	ATM_SKB(skb)->atm_options = vcc->atm_options;
> +	atm_account_tx(vcc, skb);
> 	entry->vccs->last_use = jiffies;
> 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n", skb, vcc, vcc->dev);
> 	old = xchg(&entry->vccs->xoff, 1);	/* assume XOFF ... */
> diff --git a/net/atm/common.c b/net/atm/common.c
> index 8a4f99114cd2..9e812c782a37 100644
> --- a/net/atm/common.c
> +++ b/net/atm/common.c
> @@ -630,10 +630,9 @@ int vcc_sendmsg(struct socket *sock, struct msghdr *m, size_t size)
> 		goto out;
> 	}
> 	pr_debug("%d += %d\n", sk_wmem_alloc_get(sk), skb->truesize);
> -	refcount_add(skb->truesize, &sk->sk_wmem_alloc);
> +	atm_account_tx(vcc, skb);
> 
> 	skb->dev = NULL; /* for paths shared with net_device interfaces */
> -	ATM_SKB(skb)->atm_options = vcc->atm_options;
> 	if (!copy_from_iter_full(skb_put(skb, size), size, &m->msg_iter)) {
> 		kfree_skb(skb);
> 		error = -EFAULT;
> diff --git a/net/atm/lec.c b/net/atm/lec.c
> index a3d93a1bb133..d7cc165e24e0 100644
> --- a/net/atm/lec.c
> +++ b/net/atm/lec.c
> @@ -179,9 +179,8 @@ lec_send(struct atm_vcc *vcc, struct sk_buff *skb)
> 	struct net_device *dev = skb->dev;
> 
> 	ATM_SKB(skb)->vcc = vcc;
> -	ATM_SKB(skb)->atm_options = vcc->atm_options;
> +	atm_account_tx(vcc, skb);
> 
> -	refcount_add(skb->truesize, &sk_atm(vcc)->sk_wmem_alloc);
> 	if (vcc->send(vcc, skb) < 0) {
> 		dev->stats.tx_dropped++;
> 		return;
> diff --git a/net/atm/mpc.c b/net/atm/mpc.c
> index 5677147209e8..db9a1838687c 100644
> --- a/net/atm/mpc.c
> +++ b/net/atm/mpc.c
> @@ -555,8 +555,7 @@ static int send_via_shortcut(struct sk_buff *skb, struct mpoa_client *mpc)
> 					sizeof(struct llc_snap_hdr));
> 	}
> 
> -	refcount_add(skb->truesize, &sk_atm(entry->shortcut)->sk_wmem_alloc);
> -	ATM_SKB(skb)->atm_options = entry->shortcut->atm_options;
> +	atm_account_tx(entry->shortcut, skb);
> 	entry->shortcut->send(entry->shortcut, skb);
> 	entry->packets_fwded++;
> 	mpc->in_ops->put(entry);
> diff --git a/net/atm/pppoatm.c b/net/atm/pppoatm.c
> index 21d9d341a619..af8c4b38b746 100644
> --- a/net/atm/pppoatm.c
> +++ b/net/atm/pppoatm.c
> @@ -350,8 +350,7 @@ static int pppoatm_send(struct ppp_channel *chan, struct sk_buff *skb)
> 		return 1;
> 	}
> 
> -	refcount_add(skb->truesize, &sk_atm(ATM_SKB(skb)->vcc)->sk_wmem_alloc);
> -	ATM_SKB(skb)->atm_options = ATM_SKB(skb)->vcc->atm_options;
> +	atm_account_tx(vcc, skb);
> 	pr_debug("atm_skb(%p)->vcc(%p)->dev(%p)\n",
> 		 skb, ATM_SKB(skb)->vcc, ATM_SKB(skb)->vcc->dev);
> 	ret = ATM_SKB(skb)->vcc->send(ATM_SKB(skb)->vcc, skb)
> diff --git a/net/atm/raw.c b/net/atm/raw.c
> index ee10e8d46185..b3ba44aab0ee 100644
> --- a/net/atm/raw.c
> +++ b/net/atm/raw.c
> @@ -35,8 +35,8 @@ static void atm_pop_raw(struct atm_vcc *vcc, struct sk_buff *skb)
> 	struct sock *sk = sk_atm(vcc);
> 
> 	pr_debug("(%d) %d -= %d\n",
> -		 vcc->vci, sk_wmem_alloc_get(sk), skb->truesize);
> -	WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
> +		 vcc->vci, sk_wmem_alloc_get(sk), ATM_SKB(skb)->acct_truesize);
> +	WARN_ON(refcount_sub_and_test(ATM_SKB(skb)->acct_truesize, &sk->sk_wmem_alloc));
> 	dev_kfree_skb_any(skb);
> 	sk->sk_write_space(sk);
> }
> --
> 2.17.0
> 
> --
> dwmw2

Tested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply

* Re: [PATCH 07/17] net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
From: David Woodhouse @ 2018-06-15 20:57 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant
  Cc: Eric Dumazet, Elena Reshetova, netdev@vger.kernel.org,
	Krzysztof Mazur, 3chas3@gmail.com, Mathias Kresin
In-Reply-To: <32BB74EA-234B-484F-B981-9F3D0027FD82@darbyshire-bryant.me.uk>

On Fri, 2018-06-15 at 20:49 +0000, Kevin Darbyshire-Bryant wrote:
> 
> > That does end up being quite hairy. I don't think it's worth doing.
> > 
> > This should probably suffice to fix it...
> > 
> > Kevin this is going to conflict with the ifx_atm_alloc_skb() hack in
> > the tree you're working on, but that needs to be killed with fire
> > anyway. It's utterly pointless as discussed.
> 
> I had already done so as part of the last pastebin debug info round :-)
> 
> As regards your patch… MAGIC!  Works an absolute treat.  Will get
> that submitted along with the ‘nuke ifx_atm_alloc_skb’ patch to
> OpenWrt tomorrow.  For now, maybe my brain will let me sleep :-)
> 
> Thank you soooooo much for your help & patience.
> 
> Tested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>

Thanks. In the morning please could I trouble you to test the other
variants that you can manage — PPPoA with llc-encap, as well as br2684
and PPPoE over that?
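
For anyone following the thread, here is a minimal user-space sketch of why
stashing the accounted value keeps the counter balanced. Everything in it
(fake_skb, fake_sock, account_tx, the pop helpers, the 768/256 byte figures
and the starting count of 1) is a hypothetical stand-in chosen for
illustration, not the kernel definitions or the patch itself.

```c
#include <stdio.h>

/* Hypothetical stand-ins for sk_buff and sock; not the kernel structures. */
struct fake_skb {
	unsigned int truesize;      /* may change while the skb is in flight */
	unsigned int acct_truesize; /* value charged at transmit time */
};

struct fake_sock {
	unsigned int wmem_alloc;    /* models sk_wmem_alloc */
};

/* Charge the socket and remember exactly how much was charged. */
static void account_tx(struct fake_sock *sk, struct fake_skb *skb)
{
	sk->wmem_alloc += skb->truesize;
	skb->acct_truesize = skb->truesize;
}

/* Uncharge by whatever truesize happens to be at pop time. */
static void pop_using_truesize(struct fake_sock *sk, const struct fake_skb *skb)
{
	sk->wmem_alloc -= skb->truesize;
}

/* Uncharge by the value stashed when the skb was accounted. */
static void pop_using_acct(struct fake_sock *sk, const struct fake_skb *skb)
{
	sk->wmem_alloc -= skb->acct_truesize;
}

/*
 * Run one charge/uncharge cycle in which truesize grows by `growth`
 * between the two steps, and return the counter left on the socket.
 * The model starts the counter at 1 (an assumption of this sketch).
 */
static unsigned int residual_after_cycle(int use_stashed, unsigned int growth)
{
	struct fake_sock sk = { .wmem_alloc = 1 };
	struct fake_skb skb = { .truesize = 768, .acct_truesize = 0 };

	account_tx(&sk, &skb);
	skb.truesize += growth;	/* e.g. the head was expanded in between */

	if (use_stashed)
		pop_using_acct(&sk, &skb);
	else
		pop_using_truesize(&sk, &skb);

	return sk.wmem_alloc;
}
```

In this model residual_after_cycle(1, 256) returns the counter to its
starting value, while residual_after_cycle(0, 256) subtracts more than was
added and the unsigned counter wraps.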

^ permalink raw reply

* Re: [PATCH bpf 0/2] Two bpf fixes
From: Daniel Borkmann @ 2018-06-15 21:16 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: ast, netdev
In-Reply-To: <20180615193143.6dyizhdq345lgwf2@ast-mbp.dhcp.thefacebook.com>

On 06/15/2018 09:31 PM, Alexei Starovoitov wrote:
> On Fri, Jun 15, 2018 at 02:30:46AM +0200, Daniel Borkmann wrote:
>> First one is a panic I ran into while testing the second
>> one where we got several syzkaller reports. Series here
>> fixes both.
>>
>> Thanks!
> 
> Applied, thanks.
> 
> The second patch looks dubious to me though.
> Nothing in the kernel tree checks the return value of set_memory_ro()
> and my understanding is that it can fail only when part of a huge page
> is being marked and pages have to be split. In bpf case I don't think
> it's ever the case, so the patch is silencing purely theoretical
> syzbot splat that can happen with artificial error injection.
> I bet we're still going to see this splat in set_memory_rw.
> imo the better fix would have been to drop WARN_ON from both.

I think it should be pretty unlikely to trigger them in the real world;
we have had them in place for ~4yrs now as fixed-builtin and I haven't
heard of any issues with it so far aside from the syzkaller splats, which
triggered it a total of 54 times via fault injection, fwiw.
Dropping the second warn doesn't make sense actually, since if we ever
run into it there's no option to recover, so we would want to know
where it breaks first.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH v3 16/27] docs: Fix more broken references
From: Stephen Boyd @ 2018-06-15 21:42 UTC (permalink / raw)
  To: Linux Doc Mailing List, Mauro Carvalho Chehab
  Cc: linux-hwmon, devicetree, alsa-devel, linux-samsung-soc,
	linux-mediatek, Jonathan Corbet, netdev, linux-pm, linux-mmc,
	linux-kernel, dri-devel, Mauro Carvalho Chehab, linux-rockchip,
	linux-usb, intel-wired-lan, Mauro Carvalho Chehab, linux-fsdevel,
	linux-clk, linux-arm-kernel
In-Reply-To: <e1bf52a721005b2017434acc54ec5ddc152d6fe4.1528990947.git.mchehab+samsung@kernel.org>

Quoting Mauro Carvalho Chehab (2018-06-14 09:09:01)
> As we move stuff around, some doc references are broken. Fix some of
> them via this script:
>         ./scripts/documentation-file-ref-check --fix
> 
> Manually checked that produced results are valid.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
> ---
>  .../devicetree/bindings/clock/st/st,clkgen.txt |  8 ++++----
>  .../devicetree/bindings/clock/ti/gate.txt      |  2 +-
>  .../devicetree/bindings/clock/ti/interface.txt |  2 +-

Acked-by: Stephen Boyd <sboyd@kernel.org>

^ permalink raw reply

* Re: KASAN: use-after-free Write in free_htab_elem
From: Daniel Borkmann @ 2018-06-15 21:42 UTC (permalink / raw)
  To: syzbot, ast, linux-kernel, netdev, syzkaller-bugs, john.fastabend
In-Reply-To: <000000000000394fd9056eb28b51@google.com>

On 06/15/2018 08:40 PM, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    f0dc7f9c6dd9 Merge git://git.kernel.org/pub/scm/linux/kern..
> git tree:       bpf-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=11dad428400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=fa9c20c48788d1c1
> dashboard link: https://syzkaller.appspot.com/bug?extid=ce67d3e4fa77eedee964
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+ce67d3e4fa77eedee964@syzkaller.appspotmail.com

(John, if you do the respin, it would be great to double-check all the open
reports on sock{map,hash} and annotate them with syzkaller Reported-by tags.
Looks like on https://syzkaller.appspot.com/ we have 5 in total that may
all be related to your series of fixes. 2 of the reports have the
wrong syz-fix tag, though, so we should clarify tracking them a bit. Thx)

> ==================================================================
> BUG: KASAN: use-after-free in atomic_dec include/asm-generic/atomic-instrumented.h:114 [inline]
> BUG: KASAN: use-after-free in free_htab_elem+0x23/0x40 kernel/bpf/sockmap.c:224
> Write of size 4 at addr ffff8801b3dce648 by task syz-executor1/8114
> 
> CPU: 0 PID: 8114 Comm: syz-executor1 Not tainted 4.17.0+ #39
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  print_address_description+0x6c/0x20b mm/kasan/report.c:256
>  kasan_report_error mm/kasan/report.c:354 [inline]
>  kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
>  check_memory_region_inline mm/kasan/kasan.c:260 [inline]
>  check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
>  kasan_check_write+0x14/0x20 mm/kasan/kasan.c:278
>  atomic_dec include/asm-generic/atomic-instrumented.h:114 [inline]
>  free_htab_elem+0x23/0x40 kernel/bpf/sockmap.c:224
>  bpf_tcp_close+0x8c1/0xf80 kernel/bpf/sockmap.c:273
>  inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427
>  inet6_release+0x50/0x70 net/ipv6/af_inet6.c:459
>  __sock_release+0xd7/0x260 net/socket.c:603
>  sock_close+0x19/0x20 net/socket.c:1186
>  __fput+0x353/0x890 fs/file_table.c:209
>  ____fput+0x15/0x20 fs/file_table.c:243
>  task_work_run+0x1e4/0x290 kernel/task_work.c:113
>  exit_task_work include/linux/task_work.h:22 [inline]
>  do_exit+0x1aee/0x2730 kernel/exit.c:865
>  do_group_exit+0x16f/0x430 kernel/exit.c:968
>  get_signal+0x886/0x1960 kernel/signal.c:2468
>  do_signal+0x9c/0x21c0 arch/x86/kernel/signal.c:816
>  exit_to_usermode_loop+0x2cf/0x360 arch/x86/entry/common.c:162
>  prepare_exit_to_usermode arch/x86/entry/common.c:197 [inline]
>  syscall_return_slowpath arch/x86/entry/common.c:268 [inline]
>  do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:293
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x455b29
> Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007f3bbf323ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: fffffffffffffe00 RBX: 000000000072bf78 RCX: 0000000000455b29
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bf78
> RBP: 000000000072bf78 R08: 0000000000000000 R09: 000000000072bf50
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffd7addc2cf R14: 00007f3bbf3249c0 R15: 0000000000000001
> 
> Allocated by task 8104:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
>  set_track mm/kasan/kasan.c:460 [inline]
>  kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
>  kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
>  kmalloc include/linux/slab.h:513 [inline]
>  kzalloc include/linux/slab.h:706 [inline]
>  sock_hash_alloc+0x20d/0x6a0 kernel/bpf/sockmap.c:2003
>  find_and_alloc_map kernel/bpf/syscall.c:129 [inline]
>  map_create+0x393/0x1010 kernel/bpf/syscall.c:453
>  __do_sys_bpf kernel/bpf/syscall.c:2351 [inline]
>  __se_sys_bpf kernel/bpf/syscall.c:2328 [inline]
>  __x64_sys_bpf+0x303/0x510 kernel/bpf/syscall.c:2328
>  do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Freed by task 2131:
>  save_stack+0x43/0xd0 mm/kasan/kasan.c:448
>  set_track mm/kasan/kasan.c:460 [inline]
>  __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
>  kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
>  __cache_free mm/slab.c:3498 [inline]
>  kfree+0xd9/0x260 mm/slab.c:3813
>  sock_hash_free+0x51c/0x6e0 kernel/bpf/sockmap.c:2098
>  bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
>  process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
>  worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
>  kthread+0x345/0x410 kernel/kthread.c:240
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
> 
> The buggy address belongs to the object at ffff8801b3dce540
>  which belongs to the cache kmalloc-512 of size 512
> The buggy address is located 264 bytes inside of
>  512-byte region [ffff8801b3dce540, ffff8801b3dce740)
> The buggy address belongs to the page:
> page:ffffea0006cf7380 count:1 mapcount:0 mapping:ffff8801da800940 index:0x0
> flags: 0x2fffc0000000100(slab)
> raw: 02fffc0000000100 ffffea0006caccc8 ffffea0006f57b08 ffff8801da800940
> raw: 0000000000000000 ffff8801b3dce040 0000000100000006 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  ffff8801b3dce500: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>  ffff8801b3dce580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>> ffff8801b3dce600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                               ^
>  ffff8801b3dce680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff8801b3dce700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> ==================================================================
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.

^ permalink raw reply

* Re: [PATCH bpf v2] xdp: Fix handling of devmap in generic XDP
From: Daniel Borkmann @ 2018-06-15 22:27 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Toshiaki Makita; +Cc: Alexei Starovoitov, netdev
In-Reply-To: <20180614113302.30472d4e@redhat.com>

On 06/14/2018 11:33 AM, Jesper Dangaard Brouer wrote:
> On Thu, 14 Jun 2018 18:00:22 +0900
> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:
>> On 2018/06/14 17:49, Jesper Dangaard Brouer wrote:
>>> On Thu, 14 Jun 2018 11:07:42 +0900
>>> Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> wrote:
>>>   
>>>> Commit 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue") changed
>>>> the return value type of __devmap_lookup_elem() from struct net_device *
>>>> to struct bpf_dtab_netdev * but forgot to modify generic XDP code
>>>> accordingly.
>>>> Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct
>>>> net_device is expected, then skb->dev was set to invalid value.
>>>>
>>>> v2:
>>>> - Fix compiler warning without CONFIG_BPF_SYSCALL.
>>>>
>>>> Fixes: 67f29e07e131 ("bpf: devmap introduce dev_map_enqueue")
>>>> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>  
>>>
>>> Thanks for catching this!
>>>
>>> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

Applied to bpf, thanks Toshiaki!

^ permalink raw reply

* pull-request: bpf 2018-06-16
From: Daniel Borkmann @ 2018-06-15 23:06 UTC (permalink / raw)
  To: davem; +Cc: daniel, ast, netdev

Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a panic in devmap handling in generic XDP where return type
   of __devmap_lookup_elem() got changed recently but generic XDP
   code missed the related update, from Toshiaki.

2) Fix a panic when BPF progs are loaded that include BPF to BPF
   calls when JIT is enabled where we would later bail out via error
   path w/o dropping kallsyms, and another one to silence syzkaller
   splats from locking prog read-only, from Daniel.

3) Fix a bug in test_offload.py BPF selftest which must not assume
   that the underlying system has no BPF progs loaded prior to test,
   and one in bpftool to fix accuracy of program load time, from Jakub.

4) Fix a bug in bpftool's probe for availability of the bpf(2)
   BPF_TASK_FD_QUERY subcommand, from Yonghong.

5) Fix a regression in AF_XDP's XDP_SKB receive path where queue
   id check got erroneously removed, from Björn.

6) Fix missing state cleanup in BPF's xfrm tunnel test, from William.

7) Check tunnel type more accurately in BPF's tunnel collect metadata
   kselftest, from Jian.

8) Fix missing Kconfig fragments for BPF kselftests, from Anders.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!

----------------------------------------------------------------

The following changes since commit 6892286e9c09925780fe2cb6db3585b56b71fe8e:

  tcp: Do not reload skb pointer after skb_gro_receive(). (2018-06-11 20:00:56 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to 6d5fc1957989266006db6ef3dfb9159b42cf0189:

  xdp: Fix handling of devmap in generic XDP (2018-06-15 23:47:15 +0200)

----------------------------------------------------------------
Alexei Starovoitov (1):
      Merge branch 'bpf-fixes'

Anders Roxell (1):
      selftests: bpf: config: add config fragments

Björn Töpel (1):
      xsk: re-add queue id check for XDP_SKB path

Daniel Borkmann (3):
      Merge branch 'bpf-misc-fixes'
      bpf: fix panic in prog load calls cleanup
      bpf: reject any prog that failed read-only lock

Jakub Kicinski (2):
      tools: bpftool: improve accuracy of load time
      selftests/bpf: test offloads even with BPF programs present

Jian Wang (1):
      bpf, selftest: check tunnel type more accurately

Toshiaki Makita (1):
      xdp: Fix handling of devmap in generic XDP

William Tu (1):
      bpf, selftests: delete xfrm tunnel when test exits.

Yonghong Song (1):
      tools/bpftool: fix a bug in bpftool perf

 include/linux/bpf.h                         | 12 +++++
 include/linux/filter.h                      | 79 +++++++++++++++++++++--------
 kernel/bpf/core.c                           | 69 ++++++++++++++++++++++---
 kernel/bpf/devmap.c                         | 14 +++++
 kernel/bpf/syscall.c                        | 12 ++---
 net/core/filter.c                           | 21 ++------
 net/xdp/xsk.c                               |  3 ++
 tools/bpf/bpftool/perf.c                    |  5 +-
 tools/bpf/bpftool/prog.c                    |  4 +-
 tools/testing/selftests/bpf/config          | 10 ++++
 tools/testing/selftests/bpf/test_offload.py | 12 ++++-
 tools/testing/selftests/bpf/test_tunnel.sh  | 26 +++++-----
 12 files changed, 195 insertions(+), 72 deletions(-)

^ permalink raw reply

* [PATCH iproute2-next v3] ip-xfrm: Add support for OUTPUT_MARK
From: Subash Abhinov Kasiviswanathan @ 2018-06-16  2:32 UTC (permalink / raw)
  To: lorenzo, netdev, stephen, dsahern, steffen.klassert
  Cc: Subash Abhinov Kasiviswanathan

This patch adds support for OUTPUT_MARK in xfrm state to exercise the
functionality added by kernel commit 077fbac405bf
("net: xfrm: support setting an output mark.").

Sample output-

(with mark and output-mark)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        mark 0x10000/0x3ffff output-mark 0x20000
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(with mark only)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        mark 0x10000/0x3ffff
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(with output-mark only)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        output-mark 0x20000
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

(no mark and output-mark)
src 192.168.1.1 dst 192.168.1.2
        proto esp spi 0x00004321 reqid 0 mode tunnel
        replay-window 0 flag af-unspec
        auth-trunc xcbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b211 96
        enc cbc(aes) 0x3ed0af408cf5dcbf5d5d9a5fa806b233
        anti-replay context: seq 0x0, oseq 0x0, bitmap 0x00000000

v1->v2: Moved the XFRMA_OUTPUT_MARK print after XFRMA_MARK in
xfrm_xfrma_print() as mentioned by Lorenzo

v2->v3: Fix one help formatting error as mentioned by Lorenzo.
Keep mark and output-mark on the same line and add man page info as
mentioned by David.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
---
 ip/ipxfrm.c        | 17 ++++++++++++++++-
 ip/xfrm_state.c    |  9 +++++++++
 man/man8/ip-xfrm.8 | 11 +++++++++++
 3 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/ip/ipxfrm.c b/ip/ipxfrm.c
index 12c2f72..17ab4ab 100644
--- a/ip/ipxfrm.c
+++ b/ip/ipxfrm.c
@@ -637,6 +637,13 @@ static void xfrm_tmpl_print(struct xfrm_user_tmpl *tmpls, int len,
 	}
 }
 
+static void xfrm_output_mark_print(struct rtattr *tb[], FILE *fp)
+{
+	__u32 output_mark = rta_getattr_u32(tb[XFRMA_OUTPUT_MARK]);
+
+	fprintf(fp, "output-mark 0x%x", output_mark);
+}
+
 int xfrm_parse_mark(struct xfrm_mark *mark, int *argcp, char ***argvp)
 {
 	int argc = *argcp;
@@ -677,7 +684,15 @@ void xfrm_xfrma_print(struct rtattr *tb[], __u16 family,
 		struct rtattr *rta = tb[XFRMA_MARK];
 		struct xfrm_mark *m = RTA_DATA(rta);
 
-		fprintf(fp, "\tmark %#x/%#x", m->v, m->m);
+		fprintf(fp, "\tmark %#x/%#x ", m->v, m->m);
+
+		if (tb[XFRMA_OUTPUT_MARK])
+			xfrm_output_mark_print(tb, fp);
+		fprintf(fp, "%s", _SL_);
+	} else if (tb[XFRMA_OUTPUT_MARK]) {
+		fprintf(fp, "\t");
+
+		xfrm_output_mark_print(tb, fp);
 		fprintf(fp, "%s", _SL_);
 	}
 
diff --git a/ip/xfrm_state.c b/ip/xfrm_state.c
index 85d959c..913e9fa 100644
--- a/ip/xfrm_state.c
+++ b/ip/xfrm_state.c
@@ -61,6 +61,7 @@ static void usage(void)
 	fprintf(stderr, "        [ flag FLAG-LIST ] [ sel SELECTOR ] [ LIMIT-LIST ] [ encap ENCAP ]\n");
 	fprintf(stderr, "        [ coa ADDR[/PLEN] ] [ ctx CTX ] [ extra-flag EXTRA-FLAG-LIST ]\n");
 	fprintf(stderr, "        [ offload [dev DEV] dir DIR ]\n");
+	fprintf(stderr, "        [ output-mark OUTPUT-MARK ]\n");
 	fprintf(stderr, "Usage: ip xfrm state allocspi ID [ mode MODE ] [ mark MARK [ mask MASK ] ]\n");
 	fprintf(stderr, "        [ reqid REQID ] [ seq SEQ ] [ min SPI max SPI ]\n");
 	fprintf(stderr, "Usage: ip xfrm state { delete | get } ID [ mark MARK [ mask MASK ] ]\n");
@@ -322,6 +323,7 @@ static int xfrm_state_modify(int cmd, unsigned int flags, int argc, char **argv)
 		struct xfrm_user_sec_ctx sctx;
 		char    str[CTX_BUF_SIZE];
 	} ctx = {};
+	__u32 output_mark = 0;
 
 	while (argc > 0) {
 		if (strcmp(*argv, "mode") == 0) {
@@ -437,6 +439,10 @@ static int xfrm_state_modify(int cmd, unsigned int flags, int argc, char **argv)
 				invarg("value after \"offload dir\" is invalid", *argv);
 				is_offload = false;
 			}
+		} else if (strcmp(*argv, "output-mark") == 0) {
+			NEXT_ARG();
+			if (get_u32(&output_mark, *argv, 0))
+				invarg("value after \"output-mark\" is invalid", *argv);
 		} else {
 			/* try to assume ALGO */
 			int type = xfrm_algotype_getbyname(*argv);
@@ -720,6 +726,9 @@ static int xfrm_state_modify(int cmd, unsigned int flags, int argc, char **argv)
 		}
 	}
 
+	if (output_mark)
+		addattr32(&req.n, sizeof(req.buf), XFRMA_OUTPUT_MARK, output_mark);
+
 	if (rtnl_open_byproto(&rth, 0, NETLINK_XFRM) < 0)
 		exit(1);
 
diff --git a/man/man8/ip-xfrm.8 b/man/man8/ip-xfrm.8
index 988cc6a..839e06a 100644
--- a/man/man8/ip-xfrm.8
+++ b/man/man8/ip-xfrm.8
@@ -59,6 +59,8 @@ ip-xfrm \- transform configuration
 .IR CTX " ]"
 .RB "[ " extra-flag
 .IR EXTRA-FLAG-LIST " ]"
+.RB "[ " output-mark
+.IR OUTPUT-MARK " ]"
 
 .ti -8
 .B "ip xfrm state allocspi"
@@ -537,6 +539,15 @@ encapsulates packets with protocol
 .RI "using source port " SPORT ", destination port "  DPORT
 .RI ", and original address " OADDR "."
 
+.TP
+.I MARK
+used to match xfrm policies and states
+
+.TP
+.I OUTPUT-MARK
+used to set the output mark to influence the routing
+of the packets emitted by the state
+
 .sp
 .PP
 .TS
-- 
1.9.1
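
A small stand-alone sketch of the line layout the sample output above
describes, for readers who want to check the four cases without building
iproute2. The struct mark_info and format_mark_line() names are hypothetical
and not part of iproute2; the sketch also assumes the line terminator is a
plain newline rather than the _SL_ macro used in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the two optional attributes. */
struct mark_info {
	bool has_mark;
	uint32_t v;		/* mark value */
	uint32_t m;		/* mark mask */
	bool has_output_mark;
	uint32_t output_mark;
};

/*
 * Write the mark/output-mark line into buf, keeping both on one line.
 * Mirrors the branch structure in the patch: "mark" first (followed by a
 * space), then the optional "output-mark"; if only the output mark is
 * present it gets the line to itself; if neither is present, nothing is
 * written. Returns what snprintf() returns, or 0 for the empty case.
 */
static int format_mark_line(char *buf, size_t len, const struct mark_info *mi)
{
	if (mi->has_mark && mi->has_output_mark)
		return snprintf(buf, len, "\tmark %#x/%#x output-mark 0x%x\n",
				(unsigned int)mi->v, (unsigned int)mi->m,
				(unsigned int)mi->output_mark);
	if (mi->has_mark)
		return snprintf(buf, len, "\tmark %#x/%#x \n",
				(unsigned int)mi->v, (unsigned int)mi->m);
	if (mi->has_output_mark)
		return snprintf(buf, len, "\toutput-mark 0x%x\n",
				(unsigned int)mi->output_mark);
	if (len > 0)
		buf[0] = '\0';
	return 0;
}
```

Collapsing each case into a single snprintf() call is a choice made for the
sketch only; the patch itself prints the pieces with separate fprintf() calls.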

^ permalink raw reply related

* Re: [PATCH net] net/multicast: clean change record if add new INCLUDE group
From: Hangbin Liu @ 2018-06-16  3:10 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Paolo Abeni, Stefano Brivio, Daniel Borkmann,
	WANG Cong, hideaki.yoshifuji
In-Reply-To: <1528871551-17879-1-git-send-email-liuhangbin@gmail.com>

Self-NAK on this patch; there is an issue I need to fix.

On Wed, Jun 13, 2018 at 02:32:31PM +0800, Hangbin Liu wrote:
> Based on RFC3376 5.1 and RFC3810 6.1:
>    If no interface
>    state existed for that multicast address before the change (i.e., the
>    change consisted of creating a new per-interface record), or if no
>    state exists after the change (i.e., the change consisted of deleting
>    a per-interface record), then the "non-existent" state is considered
>    to have a filter mode of INCLUDE and an empty source list.
> 
> Which means a new multicast group should start with state IN(). That is
> exactly what we did with ip_mc_join_group()/ipv6_sock_mc_join(), which
> adds a group with state EX() and init crcount to mc_qrv. The kernel will
> send a TO_EX() report message after adding group. This is what IGMPv3/MLDv2
> ASM(Any-Source Multicast) mode should look like.
> 
> But for IGMPv3/MLDv2 SSM JOIN_SOURCE_GROUP mode, we split the group
> joining into two steps. First step we join the group like ASM, i.e. via
> ip_mc_join_group()/ipv6_sock_mc_join(). So the state changes from IN() to EX().
> 
> Then we add the Source-specific address with INCLUDE mode. So the state
> changes from EX() to IN(A).
> 
> Before the first step sends a group change record, we finished the second step.
> So we will only send the second change record. i.e. TO_IN(A)
> 
> According to the RFC, we should actually send an ALLOW(A) message for
> SSM JOIN_SOURCE_GROUP, as the state should mimic the 'IN() to IN(A)' transition.
> 
> The issue was exposed by commit a052517a8ff65 ("net/multicast: should not send
> source list records when have filter mode change"). Before this commit we will
> send both ALLOW(A) and TO_IN(A). After this commit we only send TO_IN(A).
> 
> Fix it by adding a is_new key to clean the crcount when we add a new
> INCLUDE SSM group.
> 
> Fixes: a052517a8ff65 ("net/multicast: should not send source list records when have filter mode change")
> Reviewed-by: Paolo Abeni <pabeni@redhat.com>
> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  include/linux/igmp.h     |  2 +-
>  include/net/ipv6.h       |  2 +-
>  net/ipv4/igmp.c          | 27 ++++++++++++++++++++++++++-
>  net/ipv4/ip_sockglue.c   |  8 ++++++--
>  net/ipv6/ipv6_sockglue.c |  4 +++-
>  net/ipv6/mcast.c         | 25 ++++++++++++++++++++++++-
>  6 files changed, 61 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/igmp.h b/include/linux/igmp.h
> index f823185..32cb02b 100644
> --- a/include/linux/igmp.h
> +++ b/include/linux/igmp.h
> @@ -112,7 +112,7 @@ extern int ip_mc_join_group(struct sock *sk, struct ip_mreqn *imr);
>  extern int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr);
>  extern void ip_mc_drop_socket(struct sock *sk);
>  extern int ip_mc_source(int add, int omode, struct sock *sk,
> -		struct ip_mreq_source *mreqs, int ifindex);
> +		struct ip_mreq_source *mreqs, int ifindex, bool is_new);
>  extern int ip_mc_msfilter(struct sock *sk, struct ip_msfilter *msf,int ifindex);
>  extern int ip_mc_msfget(struct sock *sk, struct ip_msfilter *msf,
>  		struct ip_msfilter __user *optval, int __user *optlen);
> diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> index 836f31a..754c5cb 100644
> --- a/include/net/ipv6.h
> +++ b/include/net/ipv6.h
> @@ -1065,7 +1065,7 @@ struct group_source_req;
>  struct group_filter;
>  
>  int ip6_mc_source(int add, int omode, struct sock *sk,
> -		  struct group_source_req *pgsr);
> +		  struct group_source_req *pgsr, bool is_new);
>  int ip6_mc_msfilter(struct sock *sk, struct group_filter *gsf);
>  int ip6_mc_msfget(struct sock *sk, struct group_filter *gsf,
>  		  struct group_filter __user *optval, int __user *optlen);
> diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
> index b26a81a..8d6ecc3 100644
> --- a/net/ipv4/igmp.c
> +++ b/net/ipv4/igmp.c
> @@ -2249,8 +2249,27 @@ int ip_mc_leave_group(struct sock *sk, struct ip_mreqn *imr)
>  }
>  EXPORT_SYMBOL(ip_mc_leave_group);
>  
> +static void ip_mc_clear_cr(struct in_device *in_dev, __be32 pmca)
> +{
> +#ifdef CONFIG_IP_MULTICAST
> +	struct ip_mc_list *pmc;
> +
> +	rcu_read_lock();
> +	for_each_pmc_rcu(in_dev, pmc) {
> +		if (pmca == pmc->multiaddr)
> +			break;
> +	}
> +	if (pmc) {
> +		spin_lock_bh(&pmc->lock);
> +		pmc->crcount = 0;
> +		spin_unlock_bh(&pmc->lock);
> +	}
> +	rcu_read_unlock();
> +#endif
> +}
> +
>  int ip_mc_source(int add, int omode, struct sock *sk, struct
> -	ip_mreq_source *mreqs, int ifindex)
> +	ip_mreq_source *mreqs, int ifindex, bool is_new)
>  {
>  	int err;
>  	struct ip_mreqn imr;
> @@ -2301,6 +2320,12 @@ int ip_mc_source(int add, int omode, struct sock *sk, struct
>  		ip_mc_del_src(in_dev, &mreqs->imr_multiaddr, pmc->sfmode, 0,
>  			NULL, 0);
>  		pmc->sfmode = omode;
> +		/* Based on RFC3376 5.1, for newly added INCLUDE SSM, we should
> +		 * not send filter-mode change record as the mode should be
> +		 * from IN() to IN(A).
> +		 */
> +		if (is_new)
> +			ip_mc_clear_cr(in_dev, mreqs->imr_multiaddr);
>  	}
>  
>  	psl = rtnl_dereference(pmc->sflist);
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index 57bbb06..8d8c0cd 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -962,6 +962,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
>  	case IP_DROP_SOURCE_MEMBERSHIP:
>  	{
>  		struct ip_mreq_source mreqs;
> +		bool is_new = false;
>  		int omode, add;
>  
>  		if (optlen != sizeof(struct ip_mreq_source))
> @@ -987,11 +988,12 @@ static int do_ip_setsockopt(struct sock *sk, int level,
>  				break;
>  			omode = MCAST_INCLUDE;
>  			add = 1;
> +			is_new = true;
>  		} else /* IP_DROP_SOURCE_MEMBERSHIP */ {
>  			omode = MCAST_INCLUDE;
>  			add = 0;
>  		}
> -		err = ip_mc_source(add, omode, sk, &mreqs, 0);
> +		err = ip_mc_source(add, omode, sk, &mreqs, 0, is_new);
>  		break;
>  	}
>  	case MCAST_JOIN_GROUP:
> @@ -1027,6 +1029,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
>  		struct group_source_req greqs;
>  		struct ip_mreq_source mreqs;
>  		struct sockaddr_in *psin;
> +		bool is_new = false;
>  		int omode, add;
>  
>  		if (optlen != sizeof(struct group_source_req))
> @@ -1065,12 +1068,13 @@ static int do_ip_setsockopt(struct sock *sk, int level,
>  			greqs.gsr_interface = mreq.imr_ifindex;
>  			omode = MCAST_INCLUDE;
>  			add = 1;
> +			is_new = true;
>  		} else /* MCAST_LEAVE_SOURCE_GROUP */ {
>  			omode = MCAST_INCLUDE;
>  			add = 0;
>  		}
>  		err = ip_mc_source(add, omode, sk, &mreqs,
> -				   greqs.gsr_interface);
> +				   greqs.gsr_interface, is_new);
>  		break;
>  	}
>  	case MCAST_MSFILTER:
> diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> index 4d780c7..36e7c40 100644
> --- a/net/ipv6/ipv6_sockglue.c
> +++ b/net/ipv6/ipv6_sockglue.c
> @@ -695,6 +695,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
>  	case MCAST_UNBLOCK_SOURCE:
>  	{
>  		struct group_source_req greqs;
> +		bool is_new = false;
>  		int omode, add;
>  
>  		if (optlen < sizeof(struct group_source_req))
> @@ -725,11 +726,12 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
>  				break;
>  			omode = MCAST_INCLUDE;
>  			add = 1;
> +			is_new = true;
>  		} else /* MCAST_LEAVE_SOURCE_GROUP */ {
>  			omode = MCAST_INCLUDE;
>  			add = 0;
>  		}
> -		retv = ip6_mc_source(add, omode, sk, &greqs);
> +		retv = ip6_mc_source(add, omode, sk, &greqs, is_new);
>  		break;
>  	}
>  	case MCAST_MSFILTER:
> diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
> index 793159d..f508a1c 100644
> --- a/net/ipv6/mcast.c
> +++ b/net/ipv6/mcast.c
> @@ -315,8 +315,25 @@ void ipv6_sock_mc_close(struct sock *sk)
>  	rtnl_unlock();
>  }
>  
> +static void ip6_mc_clear_cr(struct inet6_dev *idev, const struct in6_addr *pmca)
> +{
> +	struct ifmcaddr6 *pmc;
> +
> +	read_lock_bh(&idev->lock);
> +	for (pmc = idev->mc_list; pmc; pmc = pmc->next) {
> +		if (ipv6_addr_equal(pmca, &pmc->mca_addr))
> +			break;
> +	}
> +	if (pmc) {
> +		spin_lock_bh(&pmc->mca_lock);
> +		pmc->mca_crcount = 0;
> +		spin_unlock_bh(&pmc->mca_lock);
> +	}
> +	read_unlock_bh(&idev->lock);
> +}
> +
>  int ip6_mc_source(int add, int omode, struct sock *sk,
> -	struct group_source_req *pgsr)
> +	struct group_source_req *pgsr, bool is_new)
>  {
>  	struct in6_addr *source, *group;
>  	struct ipv6_mc_socklist *pmc;
> @@ -365,6 +382,12 @@ int ip6_mc_source(int add, int omode, struct sock *sk,
>  		ip6_mc_add_src(idev, group, omode, 0, NULL, 0);
>  		ip6_mc_del_src(idev, group, pmc->sfmode, 0, NULL, 0);
>  		pmc->sfmode = omode;
> +		/* Based on RFC3810 6.1, for newly added INCLUDE SSM, we
> +		 * should not send filter-mode change record as the mode
> +		 * should be from IN() to IN(A).
> +		 */
> +		if (is_new)
> +			ip6_mc_clear_cr(idev, group);
>  	}
>  
>  	write_lock(&pmc->sflock);
> -- 
> 2.5.5
> 
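For readers following the quoted commit message, here is a minimal user-space
sketch of the two-step SSM join and of which record type ends up being
reported. It is a toy model under stated assumptions, not the kernel code:
every name in it (group_state, join_group, add_include_source, next_record,
SKETCH_QRV) is hypothetical, and the only things taken from the patch are the
idea of a crcount that is armed on a filter-mode change and the is_new flag
that clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names throughout; this is a model, not kernel code. */
enum filter_mode { MODE_INCLUDE, MODE_EXCLUDE };
enum record_type { REC_TO_IN, REC_TO_EX, REC_ALLOW };

#define SKETCH_QRV 2u /* illustrative value standing in for mc_qrv */

struct group_state {
	enum filter_mode mode;
	unsigned int crcount; /* filter-mode change reports still pending */
};

/* Step 1: ASM-style join, IN() -> EX(); arms the change-record counter. */
static void join_group(struct group_state *g)
{
	g->mode = MODE_EXCLUDE;
	g->crcount = SKETCH_QRV;
}

/* Step 2: add source A in INCLUDE mode, EX() -> IN(A). */
static void add_include_source(struct group_state *g, bool is_new)
{
	if (g->mode != MODE_INCLUDE) {
		g->mode = MODE_INCLUDE;
		g->crcount = SKETCH_QRV; /* mode flip re-arms the counter */
		if (is_new)
			g->crcount = 0; /* the fix: treat it as IN() -> IN(A) */
	}
}

/* Record type the next report would carry for the newly added source. */
static enum record_type next_record(const struct group_state *g)
{
	if (g->crcount > 0)
		return g->mode == MODE_INCLUDE ? REC_TO_IN : REC_TO_EX;
	return REC_ALLOW;
}

#ifdef SKETCH_STANDALONE
int main(void)
{
	struct group_state g = { MODE_INCLUDE, 0 };

	join_group(&g);
	add_include_source(&g, true);
	printf("%s\n", next_record(&g) == REC_ALLOW ? "ALLOW(A)" : "TO_IN(A)");
	return 0;
}
#endif
```

Building with -DSKETCH_STANDALONE gives a small program that walks the fixed
path. Passing is_new = false instead models the behaviour the commit message
complains about, where the pending filter-mode change wins and TO_IN(A) is
reported.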


* Re: [PATCH 07/17] net: convert sock.sk_wmem_alloc from atomic_t to refcount_t
From: Kevin Darbyshire-Bryant @ 2018-06-16  3:44 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Eric Dumazet, Elena Reshetova, netdev@vger.kernel.org,
	Krzysztof Mazur, 3chas3@gmail.com, Mathias Kresin
In-Reply-To: <1529096226.27158.51.camel@infradead.org>

> On 15 Jun 2018, at 21:57, David Woodhouse <dwmw2@infradead.org> wrote:
> 
> 
> 
> On Fri, 2018-06-15 at 20:49 +0000, Kevin Darbyshire-Bryant wrote:
>> 
>>> That does end up being quite hairy. I don't think it's worth doing.
>>> 
>>> This should probably suffice to fix it...
>>> 
>>> Kevin this is going to conflict with the ifx_atm_alloc_skb() hack in
>>> the tree you're working on, but that needs to be killed with fire
>>> anyway. It's utterly pointless as discussed.
>> 
>> I had already done so as part of the last pastebin debug info round :-)
>> 
>> As regards your patch… MAGIC!  Works an absolute treat.  Will get
>> that submitted along with the ‘nuke ifx_atm_alloc_skb’ patch to
>> OpenWrt tomorrow.  For now, maybe my brain will let me sleep :-)
>> 
>> Thank you soooooo much for your help & patience.
>> 
>> Tested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
> 
> Thanks. In the morning please could I trouble you to test the other
> variants that you can manage — PPPoA with llc-encap, as well as br2684
> and PPPoE over that?

I can confirm that PPPoA works with both vc & llc encapsulations. BR2684
with PPPoE also works with both vc & llc encapsulations. No nasty messages
noted in dmesg. I’m actually gobsmacked at how tolerant TalkTalk/BT are of
what I’ve thrown at them; they clearly just look for PPP frames :-)

Kevin




* Re: [PATCH] net_sched: blackhole: tell upper qdisc about dropped packets
From: Cong Wang @ 2018-06-16  4:00 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Eric Dumazet, Linux Kernel Network Developers, David S. Miller,
	Jiri Pirko, Jamal Hadi Salim
In-Reply-To: <fe86812c-f0bb-157f-d489-ba08a02b43c7@yandex-team.ru>

On Fri, Jun 15, 2018 at 6:21 AM, Konstantin Khlebnikov
<khlebnikov@yandex-team.ru> wrote:
> On 15.06.2018 16:13, Eric Dumazet wrote:
>>
>>
>>
>> On 06/15/2018 03:27 AM, Konstantin Khlebnikov wrote:
>>>
>>> When blackhole is used on top of a classful qdisc like hfsc it breaks
>>> qlen and backlog counters because packets disappear without notice.
>>>
>>> In HFSC, a non-zero qlen while all classes are inactive triggers a warning:
>>> WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc]
>>> and schedules watchdog work endlessly.
>>>
>>> This patch returns __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS;
>>> this flag tells the upper layer that the packet is gone and isn't queued.
>>>
>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>>> ---
>>>   net/sched/sch_blackhole.c |    2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/sched/sch_blackhole.c b/net/sched/sch_blackhole.c
>>> index c98a61e980ba..9c4c2bb547d7 100644
>>> --- a/net/sched/sch_blackhole.c
>>> +++ b/net/sched/sch_blackhole.c
>>> @@ -21,7 +21,7 @@ static int blackhole_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>>>                              struct sk_buff **to_free)
>>>   {
>>>         qdisc_drop(skb, sch, to_free);
>>> -       return NET_XMIT_SUCCESS;
>>> +       return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
>>
>>
>> Why don't we use this instead:
>>
>>         return qdisc_drop(skb, sch, to_free);
>>
>> Although noop_enqueue() seems to use :
>>
>>         return NET_XMIT_CN;
>>
>> Oh well.
>>
>>
>
> I suppose "blackhole" should work like "successful" xmit, but counted as
> drop.

But anything other than NET_XMIT_SUCCESS is basically the same to the
upper layer:

        err = qdisc_enqueue(skb, cl->qdisc, to_free);
        if (unlikely(err != NET_XMIT_SUCCESS)) {
                if (net_xmit_drop_count(err)) {
                        cl->qstats.drops++;
                        qdisc_qstats_drop(sch);
                }
                return err;
        }

So wouldn't using NET_XMIT_DROP be the same in this case?
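
To make the comparison concrete, here is a small user-space sketch of the
parent-side check quoted above. The constants and the drop-count test are
written from memory of include/linux/netdevice.h and
include/net/sch_generic.h (the CONFIG_NET_CLS_ACT variant of
net_xmit_drop_count()), so please treat them as assumptions to verify
against the tree; the SK_/sk_ names, struct parent_stats and
parent_enqueue() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values, written from memory of include/linux/netdevice.h and
 * include/net/sch_generic.h; the SK_ prefix marks them as local stand-ins.
 */
#define SK_NET_XMIT_SUCCESS 0x00
#define SK_NET_XMIT_DROP    0x01
#define SK_NET_XMIT_BYPASS  0x00020000 /* stands in for __NET_XMIT_BYPASS */

/* Assumed CONFIG_NET_CLS_ACT form of net_xmit_drop_count(). */
#define sk_net_xmit_drop_count(e) ((e) & SK_NET_XMIT_BYPASS)

struct parent_stats {
	unsigned int qlen;  /* packets the parent believes are queued below it */
	unsigned int drops; /* drops accounted at the parent level */
};

/* Mirrors the quoted parent snippet; child_ret is what the child returned. */
static int parent_enqueue(struct parent_stats *p, int child_ret)
{
	if (child_ret != SK_NET_XMIT_SUCCESS) {
		if (sk_net_xmit_drop_count(child_ret))
			p->drops++;
		return child_ret;
	}
	p->qlen++;
	return SK_NET_XMIT_SUCCESS;
}

#ifdef SKETCH_STANDALONE
int main(void)
{
	int rets[] = {
		SK_NET_XMIT_SUCCESS,
		SK_NET_XMIT_SUCCESS | SK_NET_XMIT_BYPASS,
		SK_NET_XMIT_DROP,
	};

	for (unsigned int i = 0; i < sizeof(rets) / sizeof(rets[0]); i++) {
		struct parent_stats p = { 0, 0 };

		parent_enqueue(&p, rets[i]);
		printf("ret=0x%x qlen=%u drops=%u\n", rets[i], p.qlen, p.drops);
	}
	return 0;
}
#endif
```

Under those assumptions both non-success values take the early return, so
the parent's qlen stays correct either way; they would differ only in
whether the parent also bumps its own drop counter.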

