Netdev List
 help / color / mirror / Atom feed
* Re: [net] ixgbe: fix possible deadlock in ixgbe_service_task()
From: Jeff Kirsher @ 2019-08-08 16:36 UTC (permalink / raw)
  To: Taehee Yoo, David Miller; +Cc: Netdev, nhorman, sassmann, andrewx.bowers
In-Reply-To: <CAMArcTXWBHUpy+p18UJ6RZm2W=vhnLRezste=kHSSv=dyd0kBA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1206 bytes --]

On Wed, 2019-08-07 at 15:08 +0900, Taehee Yoo wrote:
> On Wed, 7 Aug 2019 at 08:36, David Miller <davem@davemloft.net>
> wrote:
> 
> Hi David
> Thank you for the review!
> 
> > From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> > Date: Mon,  5 Aug 2019 13:04:03 -0700
> > 
> > > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > > b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > > index cbaf712d6529..3386e752e458 100644
> > > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > > @@ -7898,9 +7898,7 @@ static void ixgbe_service_task(struct
> > > work_struct *work)
> > >        }
> > >        if (ixgbe_check_fw_error(adapter)) {
> > >                if (!test_bit(__IXGBE_DOWN, &adapter->state)) {
> > > -                     rtnl_lock();
> > >                        unregister_netdev(adapter->netdev);
> > > -                     rtnl_unlock();
> > >                }
> > 
> > Please remove the (now unnecessary) curly braces for this basic
> > block.
> > 
> 
> I will send a v2 patch.
> Thank you!

I have already created a v2 on your behalf Taechee and will submit to
Dave here shortly.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* [net v2] ixgbe: fix possible deadlock in ixgbe_service_task()
From: Jeff Kirsher @ 2019-08-08 16:37 UTC (permalink / raw)
  To: davem; +Cc: Taehee Yoo, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher

From: Taehee Yoo <ap420073@gmail.com>

ixgbe_service_task() calls unregister_netdev() under rtnl_lock().
But unregister_netdev() internally calls rtnl_lock().
So deadlock would occur.

Fixes: 59dd45d550c5 ("ixgbe: firmware recovery mode")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
v2: removed unnecessary curly brackets

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cbaf712d6529..7882148abb43 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -7897,11 +7897,8 @@ static void ixgbe_service_task(struct work_struct *work)
 		return;
 	}
 	if (ixgbe_check_fw_error(adapter)) {
-		if (!test_bit(__IXGBE_DOWN, &adapter->state)) {
-			rtnl_lock();
+		if (!test_bit(__IXGBE_DOWN, &adapter->state))
 			unregister_netdev(adapter->netdev);
-			rtnl_unlock();
-		}
 		ixgbe_service_event_complete(adapter);
 		return;
 	}
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH] isdn: hysdn: Fix error spaces around '*'
From: Greg KH @ 2019-08-08 16:39 UTC (permalink / raw)
  To: Joe Perches
  Cc: Stephen Hemminger, Jose Carlos Cazarin Filho, isdn, devel, netdev,
	linux-kernel
In-Reply-To: <2ecfbf8dda354fe47912446bf5c3fe30ca905aa0.camel@perches.com>

On Fri, Aug 02, 2019 at 03:05:05PM -0700, Joe Perches wrote:
> On Fri, 2019-08-02 at 14:55 -0700, Stephen Hemminger wrote:
> > On Fri,  2 Aug 2019 19:56:02 +0000
> > Jose Carlos Cazarin Filho <joseespiriki@gmail.com> wrote:
> > 
> > > Fix checkpath error:
> > > CHECK: spaces preferred around that '*' (ctx:WxV)
> > > +extern hysdn_card *card_root;        /* pointer to first card */
> > > 
> > > Signed-off-by: Jose Carlos Cazarin Filho <joseespiriki@gmail.com>
> > 
> > Read the TODO, these drivers are scheduled for removal, so changes
> > are not helpful at this time.
> 
> Maybe better to mark the MAINTAINERS entry obsolete so
> checkpatch bleats a message about unnecessary changes.
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 30bf852e6d6b..b5df91032574 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8628,7 +8628,7 @@ M:	Karsten Keil <isdn@linux-pingi.de>
>  L:	isdn4linux@listserv.isdn4linux.de (subscribers-only)
>  L:	netdev@vger.kernel.org
>  W:	http://www.isdn4linux.de
> -S:	Odd Fixes
> +S:	Obsolete
>  F:	Documentation/isdn/
>  F:	drivers/isdn/capi/
>  F:	drivers/staging/isdn/
> 

Good idea, will take this patch now, thanks.

greg k-h

^ permalink raw reply

* Re: [PATCH] isdn: hysdn: Fix error spaces around '*'
From: Greg KH @ 2019-08-08 16:40 UTC (permalink / raw)
  To: Joe Perches
  Cc: Stephen Hemminger, Jose Carlos Cazarin Filho, isdn, devel, netdev,
	linux-kernel
In-Reply-To: <20190808163905.GA9224@kroah.com>

On Thu, Aug 08, 2019 at 06:39:05PM +0200, Greg KH wrote:
> On Fri, Aug 02, 2019 at 03:05:05PM -0700, Joe Perches wrote:
> > On Fri, 2019-08-02 at 14:55 -0700, Stephen Hemminger wrote:
> > > On Fri,  2 Aug 2019 19:56:02 +0000
> > > Jose Carlos Cazarin Filho <joseespiriki@gmail.com> wrote:
> > > 
> > > > Fix checkpath error:
> > > > CHECK: spaces preferred around that '*' (ctx:WxV)
> > > > +extern hysdn_card *card_root;        /* pointer to first card */
> > > > 
> > > > Signed-off-by: Jose Carlos Cazarin Filho <joseespiriki@gmail.com>
> > > 
> > > Read the TODO, these drivers are scheduled for removal, so changes
> > > are not helpful at this time.
> > 
> > Maybe better to mark the MAINTAINERS entry obsolete so
> > checkpatch bleats a message about unnecessary changes.
> > ---
> >  MAINTAINERS | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 30bf852e6d6b..b5df91032574 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -8628,7 +8628,7 @@ M:	Karsten Keil <isdn@linux-pingi.de>
> >  L:	isdn4linux@listserv.isdn4linux.de (subscribers-only)
> >  L:	netdev@vger.kernel.org
> >  W:	http://www.isdn4linux.de
> > -S:	Odd Fixes
> > +S:	Obsolete
> >  F:	Documentation/isdn/
> >  F:	drivers/isdn/capi/
> >  F:	drivers/staging/isdn/
> > 
> 
> Good idea, will take this patch now, thanks.

Can you resend this with a s-o-b so I can apply it?

thanks,

greg k-h

^ permalink raw reply

* general protection fault in tls_tx_records
From: syzbot @ 2019-08-08 16:44 UTC (permalink / raw)
  To: ast, aviadye, borisp, bpf, daniel, davejwatson, davem,
	jakub.kicinski, john.fastabend, kafai, linux-kernel, netdev,
	songliubraving, syzkaller-bugs, yhs

Hello,

syzbot found the following crash on:

HEAD commit:    ce96e791 Add linux-next specific files for 20190731
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13ce4fd0600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=fca5b9d53db6585c
dashboard link: https://syzkaller.appspot.com/bug?extid=97d0cf528b9c8e9be7f4
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+97d0cf528b9c8e9be7f4@syzkaller.appspotmail.com

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.3.0-rc2-next-20190731 #56
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Workqueue: events tx_work_handler
RIP: 0010:tls_tx_records+0x5e/0x740 net/tls/tls_sw.c:365
Code: 80 3c 02 00 0f 85 31 06 00 00 49 8b 87 b0 06 00 00 48 8d 78 28 48 89  
45 c0 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f  
85 1b 06 00 00 48 8b 45 c0 48 8d 78 60 48 8b 58 28
RSP: 0018:ffff8880a98d7cb0 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000001 RCX: 1ffffffff134c016
RDX: 0000000000000005 RSI: ffffffff862e74fc RDI: 0000000000000028
RBP: ffff8880a98d7d00 R08: ffff8880a98c8300 R09: 0000000000000000
R10: fffffbfff134b9d8 R11: ffff8880a98c8300 R12: ffff88808eb47cc0
R13: ffff8880a9ac4c40 R14: ffff88808eb47de8 R15: ffff8880a9ac4c40
FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001b30e80 CR3: 000000009c1a0000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  tx_work_handler+0x134/0x180 net/tls/tls_sw.c:2176
  process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
  worker_thread+0x98/0xe40 kernel/workqueue.c:2415
  kthread+0x361/0x430 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
Modules linked in:
---[ end trace c75bda97ceb541bf ]---
RIP: 0010:tls_tx_records+0x5e/0x740 net/tls/tls_sw.c:365
Code: 80 3c 02 00 0f 85 31 06 00 00 49 8b 87 b0 06 00 00 48 8d 78 28 48 89  
45 c0 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f  
85 1b 06 00 00 48 8b 45 c0 48 8d 78 60 48 8b 58 28
RSP: 0018:ffff8880a98d7cb0 EFLAGS: 00010206
RAX: dffffc0000000000 RBX: 0000000000000001 RCX: 1ffffffff134c016
RDX: 0000000000000005 RSI: ffffffff862e74fc RDI: 0000000000000028
RBP: ffff8880a98d7d00 R08: ffff8880a98c8300 R09: 0000000000000000
R10: fffffbfff134b9d8 R11: ffff8880a98c8300 R12: ffff88808eb47cc0
R13: ffff8880a9ac4c40 R14: ffff88808eb47de8 R15: ffff8880a9ac4c40
FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd4f4eadac CR3: 00000000987e6000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply

* KASAN: use-after-free Read in tls_wait_data
From: syzbot @ 2019-08-08 16:44 UTC (permalink / raw)
  To: ast, aviadye, borisp, bpf, daniel, davejwatson, davem,
	jakub.kicinski, john.fastabend, kafai, linux-kernel, netdev,
	songliubraving, syzkaller-bugs, yhs

Hello,

syzbot found the following crash on:

HEAD commit:    7b4980e0 Add linux-next specific files for 20190802
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=14a749b4600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=7e1348afd44b5e02
dashboard link: https://syzkaller.appspot.com/bug?extid=30c791a76814a3c6c9f9
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+30c791a76814a3c6c9f9@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in tls_wait_data+0x884/0x980  
net/tls/tls_sw.c:1261
Read of size 8 at addr ffff88808ea9f890 by task syz-executor.2/31898

CPU: 1 PID: 31898 Comm: syz-executor.2 Not tainted 5.3.0-rc2-next-20190802  
#58
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
  __kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482
  kasan_report+0x12/0x17 mm/kasan/common.c:610
  __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
  tls_wait_data+0x884/0x980 net/tls/tls_sw.c:1261
  tls_sw_recvmsg+0x57d/0x17c0 net/tls/tls_sw.c:1730
  inet_recvmsg+0x136/0x620 net/ipv4/af_inet.c:838
  sock_recvmsg_nosec+0x89/0xb0 net/socket.c:871
  ___sys_recvmsg+0x271/0x5a0 net/socket.c:2480
  do_recvmmsg+0x27e/0x7a0 net/socket.c:2601
  __sys_recvmmsg+0x259/0x270 net/socket.c:2680
  __do_sys_recvmmsg net/socket.c:2703 [inline]
  __se_sys_recvmmsg net/socket.c:2696 [inline]
  __x64_sys_recvmmsg+0xe6/0x140 net/socket.c:2696
  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459829
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f6093e3ec78 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000459829
RDX: 0000000000000004 RSI: 00000000200031c0 RDI: 0000000000000003
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6093e3f6d4
R13: 00000000004c6d67 R14: 00000000004dc0a8 R15: 00000000ffffffff

Allocated by task 31898:
  save_stack+0x23/0x90 mm/kasan/common.c:69
  set_track mm/kasan/common.c:77 [inline]
  __kasan_kmalloc mm/kasan/common.c:486 [inline]
  __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:459
  kasan_kmalloc+0x9/0x10 mm/kasan/common.c:500
  kmem_cache_alloc_trace+0x158/0x790 mm/slab.c:3550
  kmalloc include/linux/slab.h:552 [inline]
  kzalloc include/linux/slab.h:686 [inline]
  tls_set_sw_offload+0x110a/0x1567 net/tls/tls_sw.c:2243
  do_tls_setsockopt_conf net/tls/tls_main.c:594 [inline]
  do_tls_setsockopt net/tls/tls_main.c:630 [inline]
  tls_setsockopt+0x68d/0x8d0 net/tls/tls_main.c:649
  sock_common_setsockopt+0x94/0xd0 net/core/sock.c:3130
  __sys_setsockopt+0x261/0x4c0 net/socket.c:2084
  __do_sys_setsockopt net/socket.c:2100 [inline]
  __se_sys_setsockopt net/socket.c:2097 [inline]
  __x64_sys_setsockopt+0xbe/0x150 net/socket.c:2097
  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 17:
  save_stack+0x23/0x90 mm/kasan/common.c:69
  set_track mm/kasan/common.c:77 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/common.c:448
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:456
  __cache_free mm/slab.c:3425 [inline]
  kfree+0x10a/0x2c0 mm/slab.c:3756
  tls_sw_free_ctx_rx+0x31/0x40 net/tls/tls_sw.c:2145
  tls_ctx_free_deferred+0xc4/0x130 net/tls/tls_main.c:279
  process_one_work+0x9af/0x1740 kernel/workqueue.c:2269
  worker_thread+0x98/0xe40 kernel/workqueue.c:2415
  kthread+0x361/0x430 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352

The buggy address belongs to the object at ffff88808ea9f680
  which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 528 bytes inside of
  1024-byte region [ffff88808ea9f680, ffff88808ea9fa80)
The buggy address belongs to the page:
page:ffffea00023aa780 refcount:1 mapcount:0 mapping:ffff8880aa400c40  
index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea000295f888 ffffea00021f7908 ffff8880aa400c40
raw: 0000000000000000 ffff88808ea9e000 0000000100000007 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff88808ea9f780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff88808ea9f800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88808ea9f880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                          ^
  ffff88808ea9f900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff88808ea9f980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply

* INFO: rcu detected stall in tcp_write_timer
From: syzbot @ 2019-08-08 16:45 UTC (permalink / raw)
  To: ast, bpf, daniel, davem, edumazet, kafai, kuznet, linux-kernel,
	netdev, songliubraving, syzkaller-bugs, yhs, yoshfuji

Hello,

syzbot found the following crash on:

HEAD commit:    ce96e791 Add linux-next specific files for 20190731
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=12b2efd0600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=fca5b9d53db6585c
dashboard link: https://syzkaller.appspot.com/bug?extid=1f80b70f1e8f1df46319
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+1f80b70f1e8f1df46319@syzkaller.appspotmail.com

rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 1, t=10502 jiffies, g=28777, q=38)
rcu: All QSes seen, last rcu_preempt kthread activity 10503  
(4294973637-4294963134), jiffies_till_next_fqs=1, root ->qsmask 0x0
syz-executor.5  R  running task    27376 17588  10322 0x00004008
Call Trace:
  <IRQ>
  sched_show_task kernel/sched/core.c:5814 [inline]
  sched_show_task.cold+0x2ed/0x34e kernel/sched/core.c:5789
  print_other_cpu_stall kernel/rcu/tree_stall.h:410 [inline]
  check_cpu_stall kernel/rcu/tree_stall.h:536 [inline]
  rcu_pending kernel/rcu/tree.c:2736 [inline]
  rcu_sched_clock_irq.cold+0xac8/0xc13 kernel/rcu/tree.c:2183
  update_process_times+0x32/0x80 kernel/time/timer.c:1639
  tick_sched_handle+0xa2/0x190 kernel/time/tick-sched.c:167
  tick_sched_timer+0x53/0x140 kernel/time/tick-sched.c:1296
  __run_hrtimer kernel/time/hrtimer.c:1389 [inline]
  __hrtimer_run_queues+0x364/0xe40 kernel/time/hrtimer.c:1451
  hrtimer_interrupt+0x314/0x770 kernel/time/hrtimer.c:1509
  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1068 [inline]
  smp_apic_timer_interrupt+0x160/0x610 arch/x86/kernel/apic/apic.c:1093
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:828
RIP: 0010:__kasan_check_read+0x0/0x20 mm/kasan/common.c:91
Code: e8 e9 c0 ae ff 0f 0b 4c 8b 4d d0 e9 27 ee ff ff 48 8b 73 58 89 c2 48  
c7 c7 e0 c2 89 88 f7 da e8 ca c0 ae ff e9 da ee ff ff 90 <55> 89 f6 31 d2  
48 89 e5 48 8b 4d 08 e8 cf 26 00 00 5d c3 0f 1f 00
RSP: 0018:ffff8880ae909b40 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: ffff8880601cce08 RCX: ffffffff8158f467
RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8880601cce08
RBP: ffff8880ae909c08 R08: 1ffff1100c0399c1 R09: ffffed100c0399c2
R10: ffffed100c0399c1 R11: ffff8880601cce0b R12: 0000000000000001
R13: 0000000000000003 R14: ffffed100c0399c1 R15: 0000000000000001
  pv_queued_spin_lock_slowpath arch/x86/include/asm/paravirt.h:642 [inline]
  queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:50 [inline]
  queued_spin_lock include/asm-generic/qspinlock.h:81 [inline]
  do_raw_spin_lock+0x20e/0x2e0 kernel/locking/spinlock_debug.c:113
  __raw_spin_lock include/linux/spinlock_api_smp.h:143 [inline]
  _raw_spin_lock+0x37/0x40 kernel/locking/spinlock.c:151
  spin_lock include/linux/spinlock.h:338 [inline]
  tcp_write_timer+0x2b/0x1e0 net/ipv4/tcp_timer.c:610
  call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1322
  expire_timers kernel/time/timer.c:1366 [inline]
  __run_timers kernel/time/timer.c:1685 [inline]
  __run_timers kernel/time/timer.c:1653 [inline]
  run_timer_softirq+0x697/0x17a0 kernel/time/timer.c:1698
  __do_softirq+0x262/0x98c kernel/softirq.c:292
  invoke_softirq kernel/softirq.c:373 [inline]
  irq_exit+0x19b/0x1e0 kernel/softirq.c:413
  exiting_irq arch/x86/include/asm/apic.h:536 [inline]
  smp_apic_timer_interrupt+0x1a3/0x610 arch/x86/kernel/apic/apic.c:1095
  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:828
  </IRQ>
RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:756  
[inline]
RIP: 0010:slab_alloc mm/slab.c:3312 [inline]
RIP: 0010:__do_kmalloc mm/slab.c:3653 [inline]
RIP: 0010:__kmalloc+0x2b8/0x770 mm/slab.c:3664
Code: 7e 0f 85 d6 fe ff ff e8 26 c5 53 ff e9 cc fe ff ff e8 1c ff ca ff 48  
83 3d e4 52 26 07 00 0f 84 4f 03 00 00 48 8b 7d c0 57 9d <0f> 1f 44 00 00  
e9 5e fe ff ff 31 d2 be 35 02 00 00 48 c7 c7 ce c0
RSP: 0018:ffff888069e978c0 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000007 RBX: 0000000000000dc0 RCX: 1ffffffff134c016
RDX: 0000000000000000 RSI: ffffffff8177a12e RDI: 0000000000000286
RBP: ffff888069e97938 R08: ffff888064bee240 R09: ffffed10154802c1
R10: ffffed10154802c0 R11: ffff8880aa401603 R12: 0000000000000100
R13: 0000000000000dc0 R14: ffff8880aa4008c0 R15: ffff88805f8e7dc0
  kmalloc_array include/linux/slab.h:614 [inline]
  kcalloc include/linux/slab.h:625 [inline]
  iter_file_splice_write+0x16e/0xbe0 fs/splice.c:690
  do_splice_from fs/splice.c:848 [inline]
  direct_splice_actor+0x123/0x190 fs/splice.c:1020
  splice_direct_to_actor+0x366/0x970 fs/splice.c:975
  do_splice_direct+0x1da/0x2a0 fs/splice.c:1063
  do_sendfile+0x597/0xd00 fs/read_write.c:1464
  __do_sys_sendfile64 fs/read_write.c:1519 [inline]
  __se_sys_sendfile64 fs/read_write.c:1511 [inline]
  __x64_sys_sendfile64+0x15a/0x220 fs/read_write.c:1511
  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459829
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f9691478c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000459829
RDX: 0000000020001000 RSI: 0000000000000003 RDI: 0000000000000003
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000ffff R11: 0000000000000246 R12: 00007f96914796d4
R13: 00000000004c6ff7 R14: 00000000004dc558 R15: 00000000ffffffff
rcu: rcu_preempt kthread starved for 10569 jiffies! g28777 f0x2  
RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: RCU grace-period kthread stack dump:
rcu_preempt     R  running task    29688    10      2 0x80004000
Call Trace:
  context_switch kernel/sched/core.c:3254 [inline]
  __schedule+0x755/0x15b0 kernel/sched/core.c:3921
  schedule+0xa8/0x270 kernel/sched/core.c:3985
  schedule_timeout+0x486/0xc50 kernel/time/timer.c:1807
  rcu_gp_fqs_loop kernel/rcu/tree.c:1611 [inline]
  rcu_gp_kthread+0x9b2/0x18c0 kernel/rcu/tree.c:1768
  kthread+0x361/0x430 kernel/kthread.c:255
  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply

* WARNING in xfrm_policy_inexact_list_reinsert
From: syzbot @ 2019-08-08 16:45 UTC (permalink / raw)
  To: davem, herbert, linux-kernel, netdev, steffen.klassert,
	syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    4010b622 Merge branch 'dax-fix-5.3-rc3' of git://git.kerne..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=13e2829fa00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=e397351d2615e10
dashboard link: https://syzkaller.appspot.com/bug?extid=8cc27ace5f6972910b31
compiler:       clang version 9.0.0 (/home/glider/llvm/clang  
80fee25776c2fb61e74c1ecb1a523375c2500b69)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+8cc27ace5f6972910b31@syzkaller.appspotmail.com

WARNING: CPU: 1 PID: 6756 at net/xfrm/xfrm_policy.c:877  
xfrm_policy_inexact_list_reinsert+0x625/0x6e0 net/xfrm/xfrm_policy.c:877
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 6756 Comm: syz-executor.1 Not tainted 5.3.0-rc2+ #57
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x1d8/0x2f8 lib/dump_stack.c:113
  panic+0x29b/0x7d9 kernel/panic.c:219
  __warn+0x22f/0x230 kernel/panic.c:576
  report_bug+0x190/0x290 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:179 [inline]
  do_error_trap+0xd7/0x440 arch/x86/kernel/traps.c:272
  do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:291
  invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1026
RIP: 0010:xfrm_policy_inexact_list_reinsert+0x625/0x6e0  
net/xfrm/xfrm_policy.c:877
Code: ef e8 6f 32 0f fb 4d 8b 6d 00 4c 39 6d 90 0f 85 81 fa ff ff e9 b0 00  
00 00 e8 c7 87 d4 fa 0f 0b e9 fa fa ff ff e8 bb 87 d4 fa <0f> 0b e9 75 ff  
ff ff e8 af 87 d4 fa 0f 0b eb a9 44 89 f1 80 e1 07
RSP: 0018:ffff888052caf080 EFLAGS: 00010283
RAX: ffffffff86a35975 RBX: 00000000ffffff20 RCX: 0000000000040000
RDX: ffffc9000816a000 RSI: 00000000000005cd RDI: 00000000000005ce
RBP: ffff888052caf110 R08: ffffffff86a358ac R09: ffffffff86a3514c
R10: ffff88805914c380 R11: 0000000000000002 R12: 0000000000000000
R13: ffff888092fa6aa0 R14: 000000000000007e R15: 000000000000000a
  xfrm_policy_inexact_node_reinsert net/xfrm/xfrm_policy.c:922 [inline]
  xfrm_policy_inexact_node_merge net/xfrm/xfrm_policy.c:958 [inline]
  xfrm_policy_inexact_insert_node+0x537/0xb50 net/xfrm/xfrm_policy.c:1023
  xfrm_policy_inexact_alloc_chain+0x62b/0xbd0 net/xfrm/xfrm_policy.c:1139
  xfrm_policy_inexact_insert+0xe8/0x1540 net/xfrm/xfrm_policy.c:1182
  xfrm_policy_insert+0xdf/0xce0 net/xfrm/xfrm_policy.c:1574
  xfrm_add_policy+0x4cf/0x9b0 net/xfrm/xfrm_user.c:1670
  xfrm_user_rcv_msg+0x46b/0x720 net/xfrm/xfrm_user.c:2676
  netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2477
  xfrm_netlink_rcv+0x74/0x90 net/xfrm/xfrm_user.c:2684
  netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
  netlink_unicast+0x809/0x9a0 net/netlink/af_netlink.c:1328
  netlink_sendmsg+0xa70/0xd30 net/netlink/af_netlink.c:1917
  sock_sendmsg_nosec net/socket.c:637 [inline]
  sock_sendmsg net/socket.c:657 [inline]
  ___sys_sendmsg+0x66b/0x9a0 net/socket.c:2311
  __sys_sendmsg net/socket.c:2356 [inline]
  __do_sys_sendmsg net/socket.c:2365 [inline]
  __se_sys_sendmsg net/socket.c:2363 [inline]
  __x64_sys_sendmsg+0x1cf/0x290 net/socket.c:2363
  do_syscall_64+0xfe/0x140 arch/x86/entry/common.c:296
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459829
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f16cc51ec78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000459829
RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f16cc51f6d4
R13: 00000000004c776b R14: 00000000004dceb8 R15: 00000000ffffffff
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply

* KASAN: use-after-free Read in tomoyo_socket_sendmsg_permission
From: syzbot @ 2019-08-08 16:45 UTC (permalink / raw)
  To: jmorris, linux-kernel, linux-security-module, netdev, serge,
	syzkaller-bugs, takedakn

Hello,

syzbot found the following crash on:

HEAD commit:    107e47cc vrf: make sure skb->data contains ip header to ma..
git tree:       net
console output: https://syzkaller.appspot.com/x/log.txt?x=139506d8600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4dba67bf8b8c9ad7
dashboard link: https://syzkaller.appspot.com/bug?extid=b91501546ab4037f685f
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+b91501546ab4037f685f@syzkaller.appspotmail.com

==================================================================
BUG: KASAN: use-after-free in tomoyo_sock_family  
security/tomoyo/network.c:632 [inline]
BUG: KASAN: use-after-free in tomoyo_socket_sendmsg_permission+0x37e/0x3cb  
security/tomoyo/network.c:762
Read of size 2 at addr ffff8880a066f410 by task syz-executor.3/2063

CPU: 1 PID: 2063 Comm: syz-executor.3 Not tainted 5.2.0+ #97
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  print_address_description.cold+0xd4/0x306 mm/kasan/report.c:351
  __kasan_report.cold+0x1b/0x36 mm/kasan/report.c:482
  kasan_report+0x12/0x17 mm/kasan/common.c:612
  __asan_report_load2_noabort+0x14/0x20 mm/kasan/generic_report.c:130
  tomoyo_sock_family security/tomoyo/network.c:632 [inline]
  tomoyo_socket_sendmsg_permission+0x37e/0x3cb security/tomoyo/network.c:762
  tomoyo_socket_sendmsg+0x26/0x30 security/tomoyo/tomoyo.c:486
  security_socket_sendmsg+0x77/0xc0 security/security.c:1973
  sock_sendmsg+0x45/0x130 net/socket.c:654
  __sys_sendto+0x262/0x380 net/socket.c:1952
  __do_sys_sendto net/socket.c:1964 [inline]
  __se_sys_sendto net/socket.c:1960 [inline]
  __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1960
  do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459829
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8413a51c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000459829
RDX: 00000000000000ed RSI: 0000000020000280 RDI: 0000000000000005
RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8413a526d4
R13: 00000000004c77f1 R14: 00000000004dcfc0 R15: 00000000ffffffff

Allocated by task 0:
  save_stack+0x23/0x90 mm/kasan/common.c:69
  set_track mm/kasan/common.c:77 [inline]
  __kasan_kmalloc mm/kasan/common.c:487 [inline]
  __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:460
  kasan_kmalloc+0x9/0x10 mm/kasan/common.c:501
  __do_kmalloc mm/slab.c:3655 [inline]
  __kmalloc+0x163/0x770 mm/slab.c:3664
  kmalloc include/linux/slab.h:557 [inline]
  sk_prot_alloc+0x23a/0x310 net/core/sock.c:1603
  sk_alloc+0x39/0xf70 net/core/sock.c:1657
  nr_make_new net/netrom/af_netrom.c:476 [inline]
  nr_rx_frame+0x733/0x1e73 net/netrom/af_netrom.c:959
  nr_loopback_timer+0x7b/0x170 net/netrom/nr_loopback.c:59
  call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1322
  expire_timers kernel/time/timer.c:1366 [inline]
  __run_timers kernel/time/timer.c:1685 [inline]
  __run_timers kernel/time/timer.c:1653 [inline]
  run_timer_softirq+0x697/0x17a0 kernel/time/timer.c:1698
  __do_softirq+0x262/0x98c kernel/softirq.c:292

Freed by task 2063:
  save_stack+0x23/0x90 mm/kasan/common.c:69
  set_track mm/kasan/common.c:77 [inline]
  __kasan_slab_free+0x102/0x150 mm/kasan/common.c:449
  kasan_slab_free+0xe/0x10 mm/kasan/common.c:457
  __cache_free mm/slab.c:3425 [inline]
  kfree+0x10a/0x2c0 mm/slab.c:3756
  sk_prot_free net/core/sock.c:1640 [inline]
  __sk_destruct+0x4f7/0x6e0 net/core/sock.c:1726
  sk_destruct+0x86/0xa0 net/core/sock.c:1734
  __sk_free+0xfb/0x360 net/core/sock.c:1745
  sk_free+0x42/0x50 net/core/sock.c:1756
  sock_put include/net/sock.h:1725 [inline]
  sock_efree+0x61/0x80 net/core/sock.c:2042
  skb_release_head_state+0xeb/0x250 net/core/skbuff.c:652
  skb_release_all+0x16/0x60 net/core/skbuff.c:663
  __kfree_skb net/core/skbuff.c:679 [inline]
  kfree_skb net/core/skbuff.c:697 [inline]
  kfree_skb+0x101/0x3c0 net/core/skbuff.c:691
  nr_accept+0x56e/0x700 net/netrom/af_netrom.c:819
  __sys_accept4+0x34e/0x6a0 net/socket.c:1754
  __do_sys_accept net/socket.c:1795 [inline]
  __se_sys_accept net/socket.c:1792 [inline]
  __x64_sys_accept+0x75/0xb0 net/socket.c:1792
  do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8880a066f400
  which belongs to the cache kmalloc-2k of size 2048
The buggy address is located 16 bytes inside of
  2048-byte region [ffff8880a066f400, ffff8880a066fc00)
The buggy address belongs to the page:
page:ffffea0002819b80 refcount:1 mapcount:0 mapping:ffff8880aa400e00  
index:0x0 compound_mapcount: 0
flags: 0x1fffc0000010200(slab|head)
raw: 01fffc0000010200 ffffea0001849388 ffffea000235a788 ffff8880aa400e00
raw: 0000000000000000 ffff8880a066e300 0000000100000003 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff8880a066f300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8880a066f380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> ffff8880a066f400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                          ^
  ffff8880a066f480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff8880a066f500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 1 PID: 2063 at lib/refcount.c:105 refcount_add_checked  
lib/refcount.c:105 [inline]
WARNING: CPU: 1 PID: 2063 at lib/refcount.c:105  
refcount_add_checked+0x6b/0x70 lib/refcount.c:103
Modules linked in:
CPU: 1 PID: 2063 Comm: syz-executor.3 Tainted: G    B             5.2.0+ #97
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:refcount_add_checked lib/refcount.c:105 [inline]
RIP: 0010:refcount_add_checked+0x6b/0x70 lib/refcount.c:103
Code: 1d e7 77 64 06 31 ff 89 de e8 71 dc 35 fe 84 db 75 db e8 28 db 35 fe  
48 c7 c7 20 08 c6 87 c6 05 c7 77 64 06 01 e8 1d 43 07 fe <0f> 0b eb bf 90  
48 b8 00 00 00 00 00 fc ff df 55 48 89 e5 41 54 49
RSP: 0018:ffff88805d55f9f8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000040000 RSI: ffffffff815c3a26 RDI: ffffed100baabf31
RBP: ffff88805d55fa10 R08: ffff888098124480 R09: fffffbfff1349ef0
R10: fffffbfff1349eef R11: ffffffff89a4f77f R12: 0000000000000500
R13: ffff8880a066f644 R14: ffff88808f271e60 R15: 00000000000000ed
FS:  00007f8413a52700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000075c000 CR3: 0000000097173000 CR4: 00000000001406e0
Call Trace:
  skb_set_owner_w+0x216/0x320 net/core/sock.c:1991
  sock_alloc_send_pskb+0x7c9/0x920 net/core/sock.c:2226
  sock_alloc_send_skb+0x32/0x40 net/core/sock.c:2240
  nr_sendmsg+0x557/0xb00 net/netrom/af_netrom.c:1094
  sock_sendmsg_nosec net/socket.c:637 [inline]
  sock_sendmsg+0xd7/0x130 net/socket.c:657
  __sys_sendto+0x262/0x380 net/socket.c:1952
  __do_sys_sendto net/socket.c:1964 [inline]
  __se_sys_sendto net/socket.c:1960 [inline]
  __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1960
  do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459829
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8413a51c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 0000000000459829
RDX: 00000000000000ed RSI: 0000000020000280 RDI: 0000000000000005
RBP: 000000000075bfc8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8413a526d4
R13: 00000000004c77f1 R14: 00000000004dcfc0 R15: 00000000ffffffff
irq event stamp: 344
hardirqs last  enabled at (343): [<ffffffff8100a206>]  
do_syscall_64+0x26/0x6a0 arch/x86/entry/common.c:283
hardirqs last disabled at (344): [<ffffffff87391ddf>]  
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
hardirqs last disabled at (344): [<ffffffff87391ddf>]  
_raw_spin_lock_irqsave+0x6f/0xcd kernel/locking/spinlock.c:159
softirqs last  enabled at (314): [<ffffffff858fb1b6>] spin_unlock_bh  
include/linux/spinlock.h:383 [inline]
softirqs last  enabled at (314): [<ffffffff858fb1b6>]  
release_sock+0x156/0x1c0 net/core/sock.c:2945
softirqs last disabled at (312): [<ffffffff858fb080>] spin_lock_bh  
include/linux/spinlock.h:343 [inline]
softirqs last disabled at (312): [<ffffffff858fb080>]  
release_sock+0x20/0x1c0 net/core/sock.c:2932
---[ end trace 2fe47c842e5d598a ]---


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply

* Re: [PATCH] isdn: hysdn: Fix error spaces around '*'
From: Joe Perches @ 2019-08-08 16:48 UTC (permalink / raw)
  To: Greg KH, Karsten Keil
  Cc: Stephen Hemminger, Jose Carlos Cazarin Filho, isdn, devel, netdev,
	linux-kernel
In-Reply-To: <20190808164020.GA9453@kroah.com>

On Thu, 2019-08-08 at 18:40 +0200, Greg KH wrote:
> On Thu, Aug 08, 2019 at 06:39:05PM +0200, Greg KH wrote:
> > On Fri, Aug 02, 2019 at 03:05:05PM -0700, Joe Perches wrote:
> > > On Fri, 2019-08-02 at 14:55 -0700, Stephen Hemminger wrote:
> > > > On Fri,  2 Aug 2019 19:56:02 +0000
> > > > Jose Carlos Cazarin Filho <joseespiriki@gmail.com> wrote:
> > > > 
> > > > > Fix checkpath error:
> > > > > CHECK: spaces preferred around that '*' (ctx:WxV)
> > > > > +extern hysdn_card *card_root;        /* pointer to first card */
> > > > > 
> > > > > Signed-off-by: Jose Carlos Cazarin Filho <joseespiriki@gmail.com>
> > > > 
> > > > Read the TODO, these drivers are scheduled for removal, so changes
> > > > are not helpful at this time.
> > > 
> > > Maybe better to mark the MAINTAINERS entry obsolete so
> > > checkpatch bleats a message about unnecessary changes.
> > > ---
> > >  MAINTAINERS | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index 30bf852e6d6b..b5df91032574 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -8628,7 +8628,7 @@ M:	Karsten Keil <isdn@linux-pingi.de>
> > >  L:	isdn4linux@listserv.isdn4linux.de (subscribers-only)
> > >  L:	netdev@vger.kernel.org
> > >  W:	http://www.isdn4linux.de
> > > -S:	Odd Fixes
> > > +S:	Obsolete
> > >  F:	Documentation/isdn/
> > >  F:	drivers/isdn/capi/
> > >  F:	drivers/staging/isdn/
> > > 
> > 
> > Good idea, will take this patch now, thanks.
> 
> Can you resend this with a s-o-b so I can apply it?
> 
> thanks,

Hey Greg.  It was just an idea and an example.
I'm sure you can figure out if you want it.
No need for my SOB really.

btw: Karsten hasn't acked a patch or been active
in 3+ years.  Maybe he should go into CREDITS.




^ permalink raw reply

* Re: [v3,3/4] tools: bpftool: add bash-completion for net attach/detach
From: Quentin Monnet @ 2019-08-08 16:48 UTC (permalink / raw)
  To: Daniel T. Lee, Daniel Borkmann, Alexei Starovoitov; +Cc: netdev
In-Reply-To: <20190807022509.4214-4-danieltimlee@gmail.com>

2019-08-07 11:25 UTC+0900 ~ Daniel T. Lee <danieltimlee@gmail.com>
> This commit adds bash-completion for new "net attach/detach"
> subcommand for attaching XDP program on interface.
> 
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> ---
>  tools/bpf/bpftool/bash-completion/bpftool | 64 +++++++++++++++++++----
>  1 file changed, 55 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
> index c8f42e1fcbc9..1d81cb09d478 100644
> --- a/tools/bpf/bpftool/bash-completion/bpftool
> +++ b/tools/bpf/bpftool/bash-completion/bpftool
> @@ -201,6 +201,10 @@ _bpftool()
>              _bpftool_get_prog_tags
>              return 0
>              ;;
> +        dev)
> +            _sysfs_get_netdevs
> +            return 0
> +            ;;

Makes sense to have this for "dev", thanks! But it seems you missed one
place where it was used, for "bpftool feature probe" (We have "[[ $prev
== "dev" ]] && _sysfs_get_netdevs && return 0"). Could you also remove
that one please?

Other than this looks good, thanks:

Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>

^ permalink raw reply

* Re: [v3,4/4] tools: bpftool: add documentation for net attach/detach
From: Quentin Monnet @ 2019-08-08 16:48 UTC (permalink / raw)
  To: Daniel T. Lee, Daniel Borkmann, Alexei Starovoitov; +Cc: netdev
In-Reply-To: <20190807022509.4214-5-danieltimlee@gmail.com>

2019-08-07 11:25 UTC+0900 ~ Daniel T. Lee <danieltimlee@gmail.com>
> Since, new sub-command 'net attach/detach' has been added for
> attaching XDP program on interface,
> this commit documents usage and sample output of `net attach/detach`.
> 
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
> ---
>  .../bpf/bpftool/Documentation/bpftool-net.rst | 51 +++++++++++++++++--
>  1 file changed, 48 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/Documentation/bpftool-net.rst b/tools/bpf/bpftool/Documentation/bpftool-net.rst
> index d8e5237a2085..4ad1a380e186 100644
> --- a/tools/bpf/bpftool/Documentation/bpftool-net.rst
> +++ b/tools/bpf/bpftool/Documentation/bpftool-net.rst
> @@ -15,17 +15,22 @@ SYNOPSIS
>  	*OPTIONS* := { [{ **-j** | **--json** }] [{ **-p** | **--pretty** }] }
>  
>  	*COMMANDS* :=
> -	{ **show** | **list** } [ **dev** name ] | **help**
> +	{ **show** | **list** | **attach** | **detach** | **help** }
>  
>  NET COMMANDS
>  ============
>  
> -|	**bpftool** **net { show | list } [ dev name ]**
> +|	**bpftool** **net { show | list }** [ **dev** *name* ]
> +|	**bpftool** **net attach** *ATTACH_TYPE* *PROG* **dev** *name* [ **overwrite** ]
> +|	**bpftool** **net detach** *ATTACH_TYPE* **dev** *name*

Nit: Could we have "name" in capital letters (everywhere in the file),
to make this file consistent with the formatting used for
bpftool-prog.rst and bpftool-map.rst?

>  |	**bpftool** **net help**
> +|
> +|	*PROG* := { **id** *PROG_ID* | **pinned** *FILE* | **tag** *PROG_TAG* }
> +|	*ATTACH_TYPE* := { **xdp** | **xdpgeneric** | **xdpdrv** | **xdpoffload** }
>  
>  DESCRIPTION
>  ===========
> -	**bpftool net { show | list } [ dev name ]**
> +	**bpftool net { show | list }** [ **dev** *name* ]
>                    List bpf program attachments in the kernel networking subsystem.
>  
>                    Currently, only device driver xdp attachments and tc filter
> @@ -47,6 +52,18 @@ DESCRIPTION
>                    all bpf programs attached to non clsact qdiscs, and finally all
>                    bpf programs attached to root and clsact qdisc.
>  
> +	**bpftool** **net attach** *ATTACH_TYPE* *PROG* **dev** *name* [ **overwrite** ]
> +                  Attach bpf program *PROG* to network interface *name* with
> +                  type specified by *ATTACH_TYPE*. Previously attached bpf program
> +                  can be replaced by the command used with **overwrite** option.
> +                  Currently, *ATTACH_TYPE* only contains XDP programs.

Other nit: "ATTACH_TYPE only contains XDP programs" sounds odd to me.
Could we maybe phrase this something like: "Currently, only XDP-related
modes are supported for ATTACH_TYPE"?

Also, could you please provide a brief description of the different
attach types? In particular, explaining what "xdp" alone stands for
might be useful.

Thanks,
Quentin

> +
> +	**bpftool** **net detach** *ATTACH_TYPE* **dev** *name*
> +                  Detach bpf program attached to network interface *name* with
> +                  type specified by *ATTACH_TYPE*. To detach bpf program, same
> +                  *ATTACH_TYPE* previously used for attach must be specified.
> +                  Currently, *ATTACH_TYPE* only contains XDP programs.

^ permalink raw reply

* Re: memory leak in kobject_set_name_vargs (2)
From: Dmitry Vyukov @ 2019-08-08 17:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: syzbot, Catalin Marinas, David Miller, Herbert Xu,
	Alexey Kuznetsov, Kalle Valo, Linux List Kernel Mailing, Linux-MM,
	luciano.coelho, Netdev, Steffen Klassert, syzkaller-bugs,
	Hideaki YOSHIFUJI
In-Reply-To: <CAHk-=why-PdP_HNbskRADMp1bnj+FwUDYpUZSYoNLNHMRPtoVA@mail.gmail.com>

On Sat, Jul 27, 2019 at 4:29 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, Jul 26, 2019 at 4:26 PM syzbot
> <syzbot+ad8ca40ecd77896d51e2@syzkaller.appspotmail.com> wrote:
> >
> > syzbot has bisected this bug to:
> >
> > commit 0e034f5c4bc408c943f9c4a06244415d75d7108c
> > Author: Linus Torvalds <torvalds@linux-foundation.org>
> > Date:   Wed May 18 18:51:25 2016 +0000
> >
> >      iwlwifi: fix mis-merge that breaks the driver
>
> While this bisection looks more likely than the other syzbot entry
> that bisected to a version change, I don't think it is correct eitger.
>
> The bisection ended up doing a lot of "git bisect skip" because of the
>
>     undefined reference to `nf_nat_icmp_reply_translation'
>
> issue. Also, the memory leak doesn't seem to be entirely reliable:
> when the bisect does 10 runs to verify that some test kernel is bad,
> there are a couple of cases where only one or two of the ten run
> failed.
>
> Which makes me wonder if one or two of the "everything OK" runs were
> actually buggy, but just happened to have all ten pass...


I agree this is unrelated.

Bisection of memory leaks is now turned off completely after a
week-long experiment (details:
https://groups.google.com/d/msg/syzkaller/sR8aAXaWEF4/k34t365JBgAJ)

FWIW 'git bisect skip' is not a problem in itself. If the bisection
will end up being inconclusive due to this, then syzbot will not
attribute it to any commit (won't send an email at all), it will just
show the commit range in the web UI for the bug.

Low probability wasn't the root cause as well, first runs ended with
10/10 precision:

bisecting cause commit starting from 3bfe1fc46794631366faa3ef075e1b0ff7ba120a
building syzkaller on 1656845f45f284c574eb4f8bfe85dd7916a47a3a
testing commit 3bfe1fc46794631366faa3ef075e1b0ff7ba120a with gcc (GCC) 8.1.0
all runs: crashed: memory leak in kobject_set_name_vargs
testing release v5.2
testing commit 0ecfebd2b52404ae0c54a878c872bb93363ada36 with gcc (GCC) 8.1.0
all runs: crashed: memory leak in kobject_set_name_vargs
testing release v5.1
testing commit e93c9c99a629c61837d5a7fc2120cd2b6c70dbdd with gcc (GCC) 8.1.0
all runs: crashed: memory leak in kobject_set_name_vargs
testing release v5.0
testing commit 1c163f4c7b3f621efff9b28a47abb36f7378d783 with gcc (GCC) 8.1.0
all runs: crashed: memory leak in kobject_set_name_vargs
testing release v4.20
testing commit 8fe28cb58bcb235034b64cbbb7550a8a43fd88be with gcc (GCC) 8.1.0
all runs: crashed: memory leak in kobject_set_name_vargs
testing release v4.19
testing commit 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d with gcc (GCC) 8.1.0
all runs: crashed: memory leak in kobject_set_name_vargs

But it was distracted by other bugs and other memory leaks (which
reproduce with lower probability) and then the process went random
(which confirms the bisection analysis results).

^ permalink raw reply

* Re: [bpf-next PATCH 0/3] bpf: improvements to xdp_fwd sample
From: Zvi Effron @ 2019-08-08 17:21 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Xdp, Anton Protopopov, dsahern, Toke Høiland-Jørgensen,
	netdev@vger.kernel.org
In-Reply-To: <20190808112955.5a29c9e1@carbon>

On Thu, Aug 8, 2019 at 4:30 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
> Another improvement from Toke, is that the bpf_redirect_map() helper,
> now also check if the redirect index is valid in the map.  If not, then
> it returns another value than XDP_REDIRECT.  You can choose the
> alternative return value yourself, via "flags" e.g. XDP_PASS.  Thus,
> you don't even need to check/validate devmap in your BPF-code, as it is
> part of the bpf_redirect_map() call now.
>
>  action = bpf_redirect_map(&map, &index, flags_as_xdp_value)
>
> The default flags used in most programs today is 0, which maps to
> XDP_ABORTED.  This is sort of a small UAPI change, but for the better.
> As today, the packet is dropped later, only diagnose/seen via
> tracepoint xdp:xdp_redirect_map_err.
>
That's great to hear, as I'd thought it already worked that way (minus
the flags to specify what the alternate action is).

Thanks for explaining!

--Zvi

^ permalink raw reply

* general protection fault in perf_tp_event_match (2)
From: syzbot @ 2019-08-08 17:24 UTC (permalink / raw)
  To: acme, alexander.shishkin, ast, bpf, daniel, jolsa, kafai,
	linux-kernel, mingo, namhyung, netdev, peterz, songliubraving,
	syzkaller-bugs, yhs

Hello,

syzbot found the following crash on:

HEAD commit:    1e78030e Merge tag 'mmc-v5.3-rc1' of git://git.kernel.org/..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1011831a600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4c7b914a2680c9c6
dashboard link: https://syzkaller.appspot.com/bug?extid=076ba900c4a9a0f67aba
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+076ba900c4a9a0f67aba@syzkaller.appspotmail.com

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 22070 Comm: syz-executor.3 Not tainted 5.3.0-rc2+ #86
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
RIP: 0010:perf_tp_event_match+0x31/0x260 kernel/events/core.c:8560
Code: 89 f6 41 55 49 89 d5 41 54 53 48 89 fb e8 b7 0e ea ff 48 8d bb d0 01  
00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84  
c0 74 08 3c 03 0f 8e cc 01 00 00 44 8b a3 d0 01 00
RSP: 0018:ffff88804ffa7790 EFLAGS: 00010007
RAX: dffffc0000000000 RBX: 00000000ffffff9f RCX: ffffffff818bcb73
RDX: 000000002000002d RSI: ffffffff818890b9 RDI: 000000010000016f
RBP: ffff88804ffa77b0 R08: ffff8880531ba640 R09: ffffed100a6374c9
R10: ffffed100a6374c8 R11: ffff8880531ba647 R12: ffff8880ae830860
R13: ffff8880ae830860 R14: ffff88804ffa7880 R15: dffffc0000000000
FS:  00005555556d7940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000738008 CR3: 000000004cad5000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  perf_tp_event+0x1ea/0x730 kernel/events/core.c:8611
  perf_trace_run_bpf_submit+0x131/0x190 kernel/events/core.c:8586
  perf_trace_sched_wakeup_template+0x42d/0x5d0  
include/trace/events/sched.h:57
  trace_sched_wakeup_new include/trace/events/sched.h:103 [inline]
  wake_up_new_task+0x70f/0xbd0 kernel/sched/core.c:2848
  _do_fork+0x26c/0xfa0 kernel/fork.c:2393
  __do_sys_clone kernel/fork.c:2524 [inline]
  __se_sys_clone kernel/fork.c:2505 [inline]
  __x64_sys_clone+0x18d/0x250 kernel/fork.c:2505
  do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457dfa
Code: f7 d8 64 89 04 25 d4 02 00 00 64 4c 8b 0c 25 10 00 00 00 31 d2 4d 8d  
91 d0 02 00 00 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff  
ff 0f 87 f5 00 00 00 85 c0 41 89 c5 0f 85 fc 00 00
RSP: 002b:00007ffcf0b1c640 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
RAX: ffffffffffffffda RBX: 00007ffcf0b1c640 RCX: 0000000000457dfa
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
RBP: 00007ffcf0b1c680 R08: 0000000000000001 R09: 00005555556d7940
R10: 00005555556d7c10 R11: 0000000000000246 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcf0b1c6d0
Modules linked in:
---[ end trace 8f4efeb0ada52ec1 ]---
RIP: 0010:perf_tp_event_match+0x31/0x260 kernel/events/core.c:8560
Code: 89 f6 41 55 49 89 d5 41 54 53 48 89 fb e8 b7 0e ea ff 48 8d bb d0 01  
00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84  
c0 74 08 3c 03 0f 8e cc 01 00 00 44 8b a3 d0 01 00
RSP: 0018:ffff88804ffa7790 EFLAGS: 00010007
RAX: dffffc0000000000 RBX: 00000000ffffff9f RCX: ffffffff818bcb73
RDX: 000000002000002d RSI: ffffffff818890b9 RDI: 000000010000016f
RBP: ffff88804ffa77b0 R08: ffff8880531ba640 R09: ffffed100a6374c9
R10: ffffed100a6374c8 R11: ffff8880531ba647 R12: ffff8880ae830860
R13: ffff8880ae830860 R14: ffff88804ffa7880 R15: dffffc0000000000
FS:  00005555556d7940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000738008 CR3: 000000004cad5000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply

* Re: [PATCH net v3] net/tls: prevent skb_orphan() from leaking TLS plain text with offload
From: Jakub Kicinski @ 2019-08-08 17:31 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: John Fastabend, David Miller, Network Development, davejwatson,
	borisp, aviadye, Daniel Borkmann, Eric Dumazet,
	Alexei Starovoitov, oss-drivers
In-Reply-To: <CA+FuTSc7H6X+rRnxZ5NcFiNy+pw1YCONiUr+K6g800DXzT_0EA@mail.gmail.com>

On Thu, 8 Aug 2019 11:59:18 -0400, Willem de Bruijn wrote:
> > diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> > index 7c0b2b778703..43922d86e510 100644
> > --- a/net/tls/tls_device.c
> > +++ b/net/tls/tls_device.c
> > @@ -373,9 +373,9 @@ static int tls_push_data(struct sock *sk,
> >         struct tls_context *tls_ctx = tls_get_ctx(sk);
> >         struct tls_prot_info *prot = &tls_ctx->prot_info;
> >         struct tls_offload_context_tx *ctx = tls_offload_ctx_tx(tls_ctx);
> > -       int tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST;
> >         int more = flags & (MSG_SENDPAGE_NOTLAST | MSG_MORE);
> >         struct tls_record_info *record = ctx->open_record;
> > +       int tls_push_record_flags;
> >         struct page_frag *pfrag;
> >         size_t orig_size = size;
> >         u32 max_open_record_len;
> > @@ -390,6 +390,9 @@ static int tls_push_data(struct sock *sk,
> >         if (sk->sk_err)
> >                 return -sk->sk_err;
> >
> > +       flags |= MSG_SENDPAGE_DECRYPTED;
> > +       tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST;
> > +  
> 
> Without being too familiar with this code: can this plaintext flag be
> set once, closer to the call to do_tcp_sendpages, in tls_push_sg?
> 
> Instead of two locations with multiple non-trivial codepaths between
> them and do_tcp_sendpages.
> 
> Or are there paths where the flag is not set? Which I imagine would
> imply already passing s/w encrypted ciphertext.

tls_push_sg() is shared with sw path which doesn't have the device
validation. 

Device TLS can read tls_push_sg() via tls_push_partial_record() and
tls_push_data(). tls_push_data() is addressed directly here,
tls_push_partial_record() is again shared with SW path, so we have to
address it by adding the flag in tls_device_write_space().

The alternative is to add a conditional to tls_push_sg() which is 
a little less nice from performance and layering PoV but it is a lot
simpler..

Should I change?

^ permalink raw reply

* Re: [PATCH v2 bpf-next] btf: expose BTF info through sysfs
From: Andrii Nakryiko @ 2019-08-08 17:47 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andrii Nakryiko, bpf@vger.kernel.org, netdev@vger.kernel.org,
	Alexei Starovoitov, daniel@iogearbox.net, Kernel Team,
	Masahiro Yamada, Arnaldo Carvalho de Melo, Jiri Olsa,
	Sam Ravnborg
In-Reply-To: <89a6e282-0250-4264-128d-469be99073e9@fb.com>

On Wed, Aug 7, 2019 at 9:24 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 8/7/19 5:32 PM, Andrii Nakryiko wrote:
> > Make .BTF section allocated and expose its contents through sysfs.
> >
> > /sys/kernel/btf directory is created to contain all the BTFs present
> > inside kernel. Currently there is only kernel's main BTF, represented as
> > /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
> > each module will expose its BTF as /sys/kernel/btf/<module-name> file.
> >
> > Current approach relies on a few pieces coming together:
> > 1. pahole is used to take almost final vmlinux image (modulo .BTF and
> >     kallsyms) and generate .BTF section by converting DWARF info into
> >     BTF. This section is not allocated and not mapped to any segment,
> >     though, so is not yet accessible from inside kernel at runtime.
> > 2. objcopy dumps .BTF contents into binary file and subsequently
> >     convert binary file into linkable object file with automatically
> >     generated symbols _binary__btf_kernel_bin_start and
> >     _binary__btf_kernel_bin_end, pointing to start and end, respectively,
> >     of BTF raw data.
> > 3. final vmlinux image is generated by linking this object file (and
> >     kallsyms, if necessary). sysfs_btf.c then creates
> >     /sys/kernel/btf/kernel file and exposes embedded BTF contents through
> >     it. This allows, e.g., libbpf and bpftool access BTF info at
> >     well-known location, without resorting to searching for vmlinux image
> >     on disk (location of which is not standardized and vmlinux image
> >     might not be even available in some scenarios, e.g., inside qemu
> >     during testing).
> >
> > Alternative approach using .incbin assembler directive to embed BTF
> > contents directly was attempted but didn't work, because sysfs_proc.o is
> > not re-compiled during link-vmlinux.sh stage. This is required, though,
> > to update embedded BTF data (initially empty data is embedded, then
> > pahole generates BTF info and we need to regenerate sysfs_btf.o with
> > updated contents, but it's too late at that point).
> >
> > If BTF couldn't be generated due to missing or too old pahole,
> > sysfs_btf.c handles that gracefully by detecting that
> > _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
> > /sys/kernel/btf at all.
> >
> > v1->v2:
> > - allow kallsyms stage to re-use vmlinux generated by gen_btf();
> >
> > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > Cc: Jiri Olsa <jolsa@kernel.org>
> > Cc: Sam Ravnborg <sam@ravnborg.org>
> > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > ---

[...]

> > +
> > +     # dump .BTF section into raw binary file to link with final vmlinux
> > +     bin_arch=$(${OBJDUMP} -f ${1} | grep architecture | \
> > +             cut -d, -f1 | cut -d' ' -f2)
> > +     ${OBJCOPY} --dump-section .BTF=.btf.kernel.bin ${1} 2>/dev/null
> > +     ${OBJCOPY} -I binary -O ${CONFIG_OUTPUT_FORMAT} -B ${bin_arch} \
> > +             --rename-section .data=.BTF .btf.kernel.bin ${2}
>
> Currently, the binary size on my config is about 2.6MB. Do you think
> we could or need to compress it to make it smaller? I tried gzip
> and the compressed size is 0.9MB.

I'd really prefer to keep it uncompressed for two main reasons:
- by having this in uncompressed form, kernel itself can use this BTF
data from inside with almost no additional memory (except maybe for
index from type ID to actual location of type info), which opens up a
lot of new and interesting opportunities, like kernel returning its
own BTF and BTF type ID for various types (think about driver metdata,
all those special maps, etc).
- if we are doing compression, now we need to decide on best
compression format, teach it libbpf (which will make libbpf also
bigger and depending on extra libraries), etc.

So basically, in exchange of 1-1.5MB extra memory we get a bunch of
new problems we normally don't have to deal with.

>
> >   }
> >
> >   # Create ${2} .o file with all symbols from the ${1} object file
> > @@ -153,6 +164,7 @@ sortextable()
> >   # Delete output files in case of error
> >   cleanup()
> >   {

[...]

^ permalink raw reply

* Re: [PATCH v5 bpf-next] BPF: helpers: New helper to obtain namespace data from current task
From: Carlos Antonio Neira Bustos @ 2019-08-08 17:48 UTC (permalink / raw)
  To: Yonghong Song
  Cc: netdev@vger.kernel.org, ebiederm@xmission.com, brouer@redhat.com,
	quentin.monnet@netronome.com
In-Reply-To: <96c7ea2e-7acf-e81a-61dc-a4d4562c736a@fb.com>

Yonghong,

I have modified the patch following your feedback. 
Let me know if I'm missing something.

Bests

From 70f8d5584700c9cfc82c006901d8ee9595c53f15 Mon Sep 17 00:00:00 2001
From: Carlos <cneirabustos@gmail.com>
Date: Wed, 7 Aug 2019 20:04:30 -0400
Subject: [PATCH] [PATCH v6 bpf-next] BPF: New helper to obtain namespace data 
 from current task

This helper obtains the active namespace from current and returns pid, tgid,
device and namespace id as seen from that namespace, allowing to instrument
a process inside a container.
Device is read from /proc/self/ns/pid, as in the future it's possible that
different pid_ns files may belong to different devices, according
to the discussion between Eric Biederman and Yonghong in 2017 linux plumbers
conference.
Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
scripts but this helper returns the pid as seen by the root namespace which is
fine when a bcc script is not executed inside a container.
When the process of interest is inside a container, pid filtering will not work
if bpf_get_current_pid_tgid() is used. This helper addresses this limitation
returning the pid as it's seen by the current namespace where the script is
executing.

This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
used to do pid filtering even inside a container.

For example a bcc script using bpf_get_current_pid_tgid() (tools/funccount.py):

        u32 pid = bpf_get_current_pid_tgid() >> 32;
        if (pid != <pid_arg_passed_in>)
                return 0;
Could be modified to use bpf_get_current_pidns_info() as follows:

        struct bpf_pidns pidns;
        bpf_get_current_pidns_info(&pidns, sizeof(struct bpf_pidns));
        u32 pid = pidns.tgid;
        u32 nsid = pidns.nsid;
        if ((pid != <pid_arg_passed_in>) && (nsid != <nsid_arg_passed_in>))
                return 0;

To find out the name PID namespace id of a process, you could use this command:

$ ps -h -o pidns -p <pid_of_interest>

Or this other command:

$ ls -Li /proc/<pid_of_interest>/ns/pid

Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
---
 fs/internal.h                                      |   2 -
 fs/namei.c                                         |   1 -
 include/linux/bpf.h                                |   1 +
 include/linux/namei.h                              |   4 +
 include/uapi/linux/bpf.h                           |  27 +++-
 kernel/bpf/core.c                                  |   1 +
 kernel/bpf/helpers.c                               |  64 ++++++++++
 kernel/trace/bpf_trace.c                           |   2 +
 samples/bpf/Makefile                               |   3 +
 samples/bpf/trace_ns_info_user.c                   |  35 ++++++
 samples/bpf/trace_ns_info_user_kern.c              |  44 +++++++
 tools/include/uapi/linux/bpf.h                     |  27 +++-
 tools/testing/selftests/bpf/Makefile               |   2 +-
 tools/testing/selftests/bpf/bpf_helpers.h          |   3 +
 .../testing/selftests/bpf/progs/test_pidns_kern.c  |  51 ++++++++
 tools/testing/selftests/bpf/test_pidns.c           | 138 +++++++++++++++++++++
 16 files changed, 399 insertions(+), 6 deletions(-)
 create mode 100644 samples/bpf/trace_ns_info_user.c
 create mode 100644 samples/bpf/trace_ns_info_user_kern.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
 create mode 100644 tools/testing/selftests/bpf/test_pidns.c

diff --git a/fs/internal.h b/fs/internal.h
index 315fcd8d237c..6647e15dd419 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -59,8 +59,6 @@ extern int finish_clean_context(struct fs_context *fc);
 /*
  * namei.c
  */
-extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
-			   struct path *path, struct path *root);
 extern int user_path_mountpoint_at(int, const char __user *, unsigned int, struct path *);
 extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
 			   const char *, unsigned int, struct path *);
diff --git a/fs/namei.c b/fs/namei.c
index 209c51a5226c..a89fc72a4a10 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -19,7 +19,6 @@
 #include <linux/export.h>
 #include <linux/kernel.h>
 #include <linux/slab.h>
-#include <linux/fs.h>
 #include <linux/namei.h>
 #include <linux/pagemap.h>
 #include <linux/fsnotify.h>
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f9a506147c8a..e4adf5e05afd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
 extern const struct bpf_func_proto bpf_strtol_proto;
 extern const struct bpf_func_proto bpf_strtoul_proto;
 extern const struct bpf_func_proto bpf_tcp_sock_proto;
+extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
 
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
diff --git a/include/linux/namei.h b/include/linux/namei.h
index 9138b4471dbf..b45c8b6f7cb4 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -6,6 +6,7 @@
 #include <linux/path.h>
 #include <linux/fcntl.h>
 #include <linux/errno.h>
+#include <linux/fs.h>
 
 enum { MAX_NESTED_LINKS = 8 };
 
@@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
 
 extern void nd_jump_link(struct path *path);
 
+extern int filename_lookup(int dfd, struct filename *name, unsigned flags,
+			   struct path *path, struct path *root);
+
 static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
 {
 	((char *) name)[min(len, maxlen)] = '\0';
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4393bd4b2419..b0d4869fb860 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2741,6 +2741,24 @@ union bpf_attr {
  *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
+ *	Description
+ *		Copies into *pidns* pid, namespace id and tgid as seen by the
+ *		current namespace and also device from /proc/self/ns/pid.
+ *		*size_of_pidns* must be the size of *pidns*
+ *
+ *		This helper is used when pid filtering is needed inside a
+ *		container as bpf_get_current_tgid() helper returns always the
+ *		pid id as seen by the root namespace.
+ *	Return
+ *		0 on success
+ *
+ *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
+ *		or tgid of the current task.
+ *
+ *		**-ENOMEM**  if allocation fails.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2853,7 +2871,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),		\
+	FN(get_current_pidns_info),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -3604,4 +3623,10 @@ struct bpf_sockopt {
 	__s32	retval;
 };
 
+struct bpf_pidns_info {
+	__u32 dev;
+	__u32 nsid;
+	__u32 tgid;
+	__u32 pid;
+};
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 8191a7db2777..3159f2a0188c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
 const struct bpf_func_proto bpf_get_current_comm_proto __weak;
 const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
 const struct bpf_func_proto bpf_get_local_storage_proto __weak;
+const struct bpf_func_proto bpf_get_current_pidns_info __weak;
 
 const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 5e28718928ca..41fbf1f28a48 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -11,6 +11,12 @@
 #include <linux/uidgid.h>
 #include <linux/filter.h>
 #include <linux/ctype.h>
+#include <linux/pid_namespace.h>
+#include <linux/major.h>
+#include <linux/stat.h>
+#include <linux/namei.h>
+#include <linux/version.h>
+
 
 #include "../../lib/kstrtox.h"
 
@@ -312,6 +318,64 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
 	preempt_enable();
 }
 
+BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
+	 size)
+{
+	const char *pidns_path = "/proc/self/ns/pid";
+	struct pid_namespace *pidns = NULL;
+	struct filename *tmp = NULL;
+	struct inode *inode;
+	struct path kp;
+	pid_t tgid = 0;
+	pid_t pid = 0;
+	int ret;
+	int len;
+
+	if (unlikely(size != sizeof(struct bpf_pidns_info)))
+		return -EINVAL;
+	pidns = task_active_pid_ns(current);
+	if (unlikely(!pidns))
+		goto clear;
+	pidns_info->nsid =  pidns->ns.inum;
+	pid = task_pid_nr_ns(current, pidns);
+	if (unlikely(!pid))
+		goto clear;
+	tgid = task_tgid_nr_ns(current, pidns);
+	if (unlikely(!tgid))
+		goto clear;
+	pidns_info->tgid = (u32) tgid;
+	pidns_info->pid = (u32) pid;
+	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
+	if (unlikely(!tmp)) {
+		memset((void *)pidns_info, 0, (size_t) size);
+		return -ENOMEM;
+	}
+	len = strlen(pidns_path) + 1;
+	memcpy((char *)tmp->name, pidns_path, len);
+	tmp->uptr = NULL;
+	tmp->aname = NULL;
+	tmp->refcnt = 1;
+	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
+	if (ret) {
+		memset((void *)pidns_info, 0, (size_t) size);
+		return ret;
+	}
+	inode = d_backing_inode(kp.dentry);
+	pidns_info->dev = inode->i_sb->s_dev;
+	return 0;
+clear:
+	memset((void *)pidns_info, 0, (size_t) size);
+	return -EINVAL;
+}
+
+const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
+	.func		= bpf_get_current_pidns_info,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
+	.arg2_type	= ARG_CONST_SIZE,
+};
+
 #ifdef CONFIG_CGROUPS
 BPF_CALL_0(bpf_get_current_cgroup_id)
 {
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ca1255d14576..5e1dc22765a5 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 #endif
 	case BPF_FUNC_send_signal:
 		return &bpf_send_signal_proto;
+	case BPF_FUNC_get_current_pidns_info:
+		return &bpf_get_current_pidns_info_proto;
 	default:
 		return NULL;
 	}
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 1d9be26b4edd..238453ff27d2 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -53,6 +53,7 @@ hostprogs-y += task_fd_query
 hostprogs-y += xdp_sample_pkts
 hostprogs-y += ibumad
 hostprogs-y += hbm
+hostprogs-y += trace_ns_info
 
 # Libbpf dependencies
 LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
@@ -109,6 +110,7 @@ task_fd_query-objs := bpf_load.o task_fd_query_user.o $(TRACE_HELPERS)
 xdp_sample_pkts-objs := xdp_sample_pkts_user.o $(TRACE_HELPERS)
 ibumad-objs := bpf_load.o ibumad_user.o $(TRACE_HELPERS)
 hbm-objs := bpf_load.o hbm.o $(CGROUP_HELPERS)
+trace_ns_info-objs := bpf_load.o trace_ns_info_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -170,6 +172,7 @@ always += xdp_sample_pkts_kern.o
 always += ibumad_kern.o
 always += hbm_out_kern.o
 always += hbm_edt_kern.o
+always += trace_ns_info_user_kern.o
 
 KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
 KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/bpf/
diff --git a/samples/bpf/trace_ns_info_user.c b/samples/bpf/trace_ns_info_user.c
new file mode 100644
index 000000000000..e06d08db6f30
--- /dev/null
+++ b/samples/bpf/trace_ns_info_user.c
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include <stdio.h>
+#include <linux/bpf.h>
+#include <unistd.h>
+#include "bpf/libbpf.h"
+#include "bpf_load.h"
+
+/* This code was taken verbatim from tracex1_user.c, it's used
+ * to exercize bpf_get_current_pidns_info() helper call.
+ */
+int main(int ac, char **argv)
+{
+	FILE *f;
+	char filename[256];
+
+	snprintf(filename, sizeof(filename), "%s_user_kern.o", argv[0]);
+	printf("loading %s\n", filename);
+
+	if (load_bpf_file(filename)) {
+		printf("%s", bpf_log_buf);
+		return 1;
+	}
+
+	f = popen("taskset 1 ping  localhost", "r");
+	(void) f;
+	read_trace_pipe();
+	return 0;
+}
diff --git a/samples/bpf/trace_ns_info_user_kern.c b/samples/bpf/trace_ns_info_user_kern.c
new file mode 100644
index 000000000000..96675e02b707
--- /dev/null
+++ b/samples/bpf/trace_ns_info_user_kern.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/version.h>
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+typedef __u64 u64;
+typedef __u32 u32;
+
+
+/* kprobe is NOT a stable ABI
+ * kernel functions can be removed, renamed or completely change semantics.
+ * Number of arguments and their positions can change, etc.
+ * In such case this bpf+kprobe example will no longer be meaningful
+ */
+
+/* This will call bpf_get_current_pidns_info() to display pid and ns values
+ * as seen by the current namespace, on the far left you will see the pid as
+ * seen as by the root namespace.
+ */
+
+SEC("kprobe/__netif_receive_skb_core")
+int bpf_prog1(struct pt_regs *ctx)
+{
+	char fmt[] = "nsid:%u, dev: %u,  pid:%u\n";
+	struct bpf_pidns_info nsinfo;
+	int ok = 0;
+
+	ok = bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo));
+	if (ok == 0)
+		bpf_trace_printk(fmt, sizeof(fmt), (u32)nsinfo.nsid,
+				 (u32) nsinfo.dev, (u32)nsinfo.pid);
+
+	return 0;
+}
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4393bd4b2419..b0d4869fb860 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2741,6 +2741,24 @@ union bpf_attr {
  *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
+ *	Description
+ *		Copies into *pidns* pid, namespace id and tgid as seen by the
+ *		current namespace and also device from /proc/self/ns/pid.
+ *		*size_of_pidns* must be the size of *pidns*
+ *
+ *		This helper is used when pid filtering is needed inside a
+ *		container as bpf_get_current_tgid() helper returns always the
+ *		pid id as seen by the root namespace.
+ *	Return
+ *		0 on success
+ *
+ *		**-EINVAL** if *size_of_pidns* is not valid or unable to get ns, pid
+ *		or tgid of the current task.
+ *
+ *		**-ENOMEM**  if allocation fails.
+ *
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2853,7 +2871,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),		\
+	FN(get_current_pidns_info),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -3604,4 +3623,10 @@ struct bpf_sockopt {
 	__s32	retval;
 };
 
+struct bpf_pidns_info {
+	__u32 dev;
+	__u32 nsid;
+	__u32 tgid;
+	__u32 pid;
+};
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 3bd0f4a0336a..1f97b571b581 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -29,7 +29,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
 	test_cgroup_storage test_select_reuseport test_section_names \
 	test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
 	test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
-	test_sockopt_multi test_tcp_rtt
+	test_sockopt_multi test_tcp_rtt test_pidns
 
 BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
 TEST_GEN_FILES = $(BPF_OBJ_FILES)
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 120aa86c58d3..c96795a9d983 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -231,6 +231,9 @@ static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
 static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
 					  int ip_len, void *tcp, int tcp_len) =
 	(void *) BPF_FUNC_tcp_gen_syncookie;
+static int (*bpf_get_current_pidns_info)(struct bpf_pidns_info *buf,
+					 unsigned int buf_size) =
+	(void *) BPF_FUNC_get_current_pidns_info;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/progs/test_pidns_kern.c b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
new file mode 100644
index 000000000000..e1d2facfa762
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_pidns_kern.c
@@ -0,0 +1,51 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include <linux/bpf.h>
+#include <errno.h>
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") nsidmap = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u32),
+	.max_entries = 1,
+};
+
+struct bpf_map_def SEC("maps") pidmap = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(__u32),
+	.value_size = sizeof(__u32),
+	.max_entries = 1,
+};
+
+SEC("tracepoint/syscalls/sys_enter_nanosleep")
+int trace(void *ctx)
+{
+	struct bpf_pidns_info nsinfo;
+	__u32 key = 0, *expected_pid, *val;
+	char fmt[] = "ERROR nspid:%d\n";
+
+	if (bpf_get_current_pidns_info(&nsinfo, sizeof(nsinfo)))
+		return -EINVAL;
+
+	expected_pid = bpf_map_lookup_elem(&pidmap, &key);
+
+
+	if (!expected_pid || *expected_pid != nsinfo.pid)
+		return 0;
+
+	val = bpf_map_lookup_elem(&nsidmap, &key);
+	if (val)
+		*val = nsinfo.nsid;
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
diff --git a/tools/testing/selftests/bpf/test_pidns.c b/tools/testing/selftests/bpf/test_pidns.c
new file mode 100644
index 000000000000..a7254055f294
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_pidns.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2018 Carlos Neira cneirabustos@gmail.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <syscall.h>
+#include <unistd.h>
+#include <linux/perf_event.h>
+#include <sys/ioctl.h>
+#include <sys/time.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include <linux/bpf.h>
+#include <bpf/bpf.h>
+#include <bpf/libbpf.h>
+
+#include "cgroup_helpers.h"
+#include "bpf_rlimit.h"
+
+#define CHECK(condition, tag, format...) ({		\
+	int __ret = !!(condition);			\
+	if (__ret) {					\
+		printf("%s:FAIL:%s ", __func__, tag);	\
+		printf(format);				\
+	} else {					\
+		printf("%s:PASS:%s\n", __func__, tag);	\
+	}						\
+	__ret;						\
+})
+
+static int bpf_find_map(const char *test, struct bpf_object *obj,
+			const char *name)
+{
+	struct bpf_map *map;
+
+	map = bpf_object__find_map_by_name(obj, name);
+	if (!map)
+		return -1;
+	return bpf_map__fd(map);
+}
+
+
+int main(int argc, char **argv)
+{
+	const char *probe_name = "syscalls/sys_enter_nanosleep";
+	const char *file = "test_pidns_kern.o";
+	int err, bytes, efd, prog_fd, pmu_fd;
+	int pidmap_fd, nsidmap_fd;
+	struct perf_event_attr attr = {};
+	struct bpf_object *obj;
+	__u32 knsid = 0;
+	__u32 key = 0, pid;
+	int exit_code = 1;
+	struct stat st;
+	char buf[256];
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
+	if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
+		goto cleanup_cgroup_env;
+
+	nsidmap_fd = bpf_find_map(__func__, obj, "nsidmap");
+	if (CHECK(nsidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
+		  nsidmap_fd, errno))
+		goto close_prog;
+
+	pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
+	if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
+		  pidmap_fd, errno))
+		goto close_prog;
+
+	pid = getpid();
+	bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
+
+	snprintf(buf, sizeof(buf),
+		 "/sys/kernel/debug/tracing/events/%s/id", probe_name);
+	efd = open(buf, O_RDONLY, 0);
+	if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
+		goto close_prog;
+	bytes = read(efd, buf, sizeof(buf));
+	close(efd);
+	if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
+		  "bytes %d errno %d\n", bytes, errno))
+		goto close_prog;
+
+	attr.config = strtol(buf, NULL, 0);
+	attr.type = PERF_TYPE_TRACEPOINT;
+	attr.sample_type = PERF_SAMPLE_RAW;
+	attr.sample_period = 1;
+	attr.wakeup_events = 1;
+
+	pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
+	if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
+		  errno))
+		goto close_prog;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
+	if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
+		  errno))
+		goto close_pmu;
+
+	err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
+	if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
+		  errno))
+		goto close_pmu;
+
+	/* trigger some syscalls */
+	sleep(1);
+
+	err = bpf_map_lookup_elem(nsidmap_fd, &key, &knsid);
+	if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
+		goto close_pmu;
+
+	if (stat("/proc/self/ns/pid", &st))
+		goto close_pmu;
+
+	if (CHECK(knsid != (__u32) st.st_ino, "compare_namespace_id",
+		  "kern knsid %u user unsid %u\n", knsid, (__u32) st.st_ino))
+		goto close_pmu;
+
+	exit_code = 0;
+	printf("%s:PASS\n", argv[0]);
+
+close_pmu:
+	close(pmu_fd);
+close_prog:
+	bpf_object__close(obj);
+cleanup_cgroup_env:
+	return exit_code;
+}
-- 
2.11.0






On Thu, Aug 08, 2019 at 05:09:51AM +0000, Yonghong Song wrote:
> 
> 
> On 8/7/19 6:22 PM, Carlos Antonio Neira Bustos wrote:
> > The code has been modified to avoid syscalls that could sleep.
> > Please let me know if any other modification is needed.
> > 
> >  From be0384c0fa209a78c1567936e8db4e35b9a7c0f8 Mon Sep 17 00:00:00 2001
> > From: Carlos <cneirabustos@gmail.com>
> > Date: Wed, 7 Aug 2019 20:04:30 -0400
> > Subject: [PATCH] [PATCH v5 bpf-next] BPF: New helper to obtain namespace data
> >   from current task
> > 
> > This helper obtains the active namespace from current and returns pid, tgid,
> > device and namespace id as seen from that namespace, allowing to instrument
> > a process inside a container.
> > Device is read from /proc/self/ns/pid, as in the future it's possible that
> > different pid_ns files may belong to different devices, according
> > to the discussion between Eric Biederman and Yonghong in 2017 linux plumbers
> > conference.
> > Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
> > scripts but this helper returns the pid as seen by the root namespace which is
> > fine when a bcc script is not executed inside a container.
> > When the process of interest is inside a container, pid filtering will not work
> > if bpf_get_current_pid_tgid() is used. This helper addresses this limitation
> > returning the pid as it's seen by the current namespace where the script is
> > executing.
> > 
> > This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
> > used to do pid filtering even inside a container.
> > 
> > For example a bcc script using bpf_get_current_pid_tgid() (tools/funccount.py):
> > 
> >          u32 pid = bpf_get_current_pid_tgid() >> 32;
> >          if (pid != <pid_arg_passed_in>)
> >                  return 0;
> > Could be modified to use bpf_get_current_pidns_info() as follows:
> > 
> >          struct bpf_pidns pidns;
> >          bpf_get_current_pidns_info(&pidns, sizeof(struct bpf_pidns));
> >          u32 pid = pidns.tgid;
> >          u32 nsid = pidns.nsid;
> >          if ((pid != <pid_arg_passed_in>) && (nsid != <nsid_arg_passed_in>))
> >                  return 0;
> > 
> > To find out the name PID namespace id of a process, you could use this command:
> > 
> > $ ps -h -o pidns -p <pid_of_interest>
> > 
> > Or this other command:
> > 
> > $ ls -Li /proc/<pid_of_interest>/ns/pid
> > 
> > Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
> > ---
> >   fs/namei.c                                         |   2 +-
> >   include/linux/bpf.h                                |   1 +
> >   include/linux/namei.h                              |   4 +
> >   include/uapi/linux/bpf.h                           |  29 ++++-
> >   kernel/bpf/core.c                                  |   1 +
> >   kernel/bpf/helpers.c                               |  78 ++++++++++++
> >   kernel/trace/bpf_trace.c                           |   2 +
> >   samples/bpf/Makefile                               |   3 +
> >   samples/bpf/trace_ns_info_user.c                   |  35 ++++++
> >   samples/bpf/trace_ns_info_user_kern.c              |  44 +++++++
> >   tools/include/uapi/linux/bpf.h                     |  29 ++++-
> >   tools/testing/selftests/bpf/Makefile               |   2 +-
> >   tools/testing/selftests/bpf/bpf_helpers.h          |   3 +
> >   .../testing/selftests/bpf/progs/test_pidns_kern.c  |  51 ++++++++
> >   tools/testing/selftests/bpf/test_pidns.c           | 138 +++++++++++++++++++++
> >   15 files changed, 418 insertions(+), 4 deletions(-)
> >   create mode 100644 samples/bpf/trace_ns_info_user.c
> >   create mode 100644 samples/bpf/trace_ns_info_user_kern.c
> >   create mode 100644 tools/testing/selftests/bpf/progs/test_pidns_kern.c
> >   create mode 100644 tools/testing/selftests/bpf/test_pidns.c
> > 
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 209c51a5226c..d1eca36972d2 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -19,7 +19,6 @@
> >   #include <linux/export.h>
> >   #include <linux/kernel.h>
> >   #include <linux/slab.h>
> > -#include <linux/fs.h>
> >   #include <linux/namei.h>
> >   #include <linux/pagemap.h>
> >   #include <linux/fsnotify.h>
> > @@ -2355,6 +2354,7 @@ int filename_lookup(int dfd, struct filename *name, unsigned flags,
> >   	putname(name);
> >   	return retval;
> >   }
> > +EXPORT_SYMBOL(filename_lookup);
> 
> No need to export symbols. bpf uses it and bpf is in the core, not in 
> modules.
> 
> >   
> >   /* Returns 0 and nd will be valid on success; Retuns error, otherwise. */
> >   static int path_parentat(struct nameidata *nd, unsigned flags,
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index f9a506147c8a..e4adf5e05afd 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -1050,6 +1050,7 @@ extern const struct bpf_func_proto bpf_get_local_storage_proto;
> >   extern const struct bpf_func_proto bpf_strtol_proto;
> >   extern const struct bpf_func_proto bpf_strtoul_proto;
> >   extern const struct bpf_func_proto bpf_tcp_sock_proto;
> > +extern const struct bpf_func_proto bpf_get_current_pidns_info_proto;
> >   
> >   /* Shared helpers among cBPF and eBPF. */
> >   void bpf_user_rnd_init_once(void);
> > diff --git a/include/linux/namei.h b/include/linux/namei.h
> > index 9138b4471dbf..2c24e8c71d46 100644
> > --- a/include/linux/namei.h
> > +++ b/include/linux/namei.h
> > @@ -6,6 +6,7 @@
> >   #include <linux/path.h>
> >   #include <linux/fcntl.h>
> >   #include <linux/errno.h>
> > +#include <linux/fs.h>
> >   
> >   enum { MAX_NESTED_LINKS = 8 };
> >   
> > @@ -97,6 +98,9 @@ extern void unlock_rename(struct dentry *, struct dentry *);
> >   
> >   extern void nd_jump_link(struct path *path);
> >   
> > +extern int filename_lookup(int dfd, struct filename *name, unsigned int flags,
> > +		    struct path *path, struct path *root);
> 
> The previous definition in fs/internal.h should be removed.
> 
> > +
> >   static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
> >   {
> >   	((char *) name)[min(len, maxlen)] = '\0';
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index 4393bd4b2419..6f601f7106e2 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -2741,6 +2741,26 @@ union bpf_attr {
> >    *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
> >    *
> >    *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
> > + *
> > + * int bpf_get_current_pidns_info(struct bpf_pidns_info *pidns, u32 size_of_pidns)
> > + *	Description
> > + *		Copies into *pidns* pid, namespace id and tgid as seen by the
> > + *		current namespace and also device from /proc/self/ns/pid.
> > + *		*size_of_pidns* must be the size of *pidns*
> > + *
> > + *		This helper is used when pid filtering is needed inside a
> > + *		container as bpf_get_current_tgid() helper returns always the
> > + *		pid id as seen by the root namespace.
> > + *	Return
> > + *		0 on success
> > + *
> > + *		**-EINVAL**  if unable to get ns, pid or tgid of current task.
> > + *		Or if size_of_pidns is not valid.
> 
> Maybe reword by following the code sequence.
>     if *size_of_pidns* is not valid or unable to get ns, pid or tgid of
>     the current task.
> 
> > + *
> > + *		**-ENOMEM**  if allocation fails.
> 
> Maybe some other error codes in filename_lookup() function?
> 
> > + *
> > + *		If unable to get the inode from /proc/self/ns/pid an error code
> > + *		will be returned.
> 
> You do not need this. The description of error code cases should cover this.
> 
> >    */
> >   #define __BPF_FUNC_MAPPER(FN)		\
> >   	FN(unspec),			\
> > @@ -2853,7 +2873,8 @@ union bpf_attr {
> >   	FN(sk_storage_get),		\
> >   	FN(sk_storage_delete),		\
> >   	FN(send_signal),		\
> > -	FN(tcp_gen_syncookie),
> > +	FN(tcp_gen_syncookie),		\
> > +	FN(get_current_pidns_info),
> >   
> >   /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> >    * function eBPF program intends to call
> > @@ -3604,4 +3625,10 @@ struct bpf_sockopt {
> >   	__s32	retval;
> >   };
> >   
> > +struct bpf_pidns_info {
> > +	__u32 dev;
> > +	__u32 nsid;
> > +	__u32 tgid;
> > +	__u32 pid;
> > +};
> >   #endif /* _UAPI__LINUX_BPF_H__ */
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 8191a7db2777..3159f2a0188c 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -2038,6 +2038,7 @@ const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
> >   const struct bpf_func_proto bpf_get_current_comm_proto __weak;
> >   const struct bpf_func_proto bpf_get_current_cgroup_id_proto __weak;
> >   const struct bpf_func_proto bpf_get_local_storage_proto __weak;
> > +const struct bpf_func_proto bpf_get_current_pidns_info __weak;
> >   
> >   const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
> >   {
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 5e28718928ca..571f24077db2 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -11,6 +11,12 @@
> >   #include <linux/uidgid.h>
> >   #include <linux/filter.h>
> >   #include <linux/ctype.h>
> > +#include <linux/pid_namespace.h>
> > +#include <linux/major.h>
> > +#include <linux/stat.h>
> > +#include <linux/namei.h>
> > +#include <linux/version.h>
> > +
> >   
> >   #include "../../lib/kstrtox.h"
> >   
> > @@ -312,6 +318,78 @@ void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
> >   	preempt_enable();
> >   }
> >   
> > +BPF_CALL_2(bpf_get_current_pidns_info, struct bpf_pidns_info *, pidns_info, u32,
> > +	 size)
> > +{
> > +	const char *name = "/proc/self/ns/pid";
> 
> maybe rename this variable to pidns_path?
> 
> > +	struct pid_namespace *pidns = NULL;
> > +	struct filename *tmp = NULL;
> 
> Maybe rename this variable to name?
> 
> > +	int len = strlen(name) + 1;
> 
> We can delay this assignment later until it is needed.
> 
> > +	struct inode *inode;
> > +	struct path kp;
> > +	pid_t tgid = 0;
> > +	pid_t pid = 0;
> > +	int ret;
> > +
> > +	if (unlikely(size != sizeof(struct bpf_pidns_info)))
> > +		return -EINVAL;
> > +
> > +	pidns = task_active_pid_ns(current);
> > +
> 
> we can save an empty line here.
> 
> > +	if (unlikely(!pidns))
> > +		goto clear;
> > +
> > +	pidns_info->nsid =  pidns->ns.inum;
> > +	pid = task_pid_nr_ns(current, pidns);
> > +
> 
> We can save an empty line here.
> 
> > +	if (unlikely(!pid))
> > +		goto clear;
> > +
> > +	tgid = task_tgid_nr_ns(current, pidns);
> > +
> ditto. save an empty line.
> > +	if (unlikely(!tgid))
> > +		goto clear;
> > +
> > +	pidns_info->tgid = (u32) tgid;
> > +	pidns_info->pid = (u32) pid;
> > +
> > +	tmp = kmem_cache_alloc(names_cachep, GFP_ATOMIC);
> > +	if (unlikely(!tmp)) {
> > +		memset((void *)pidns_info, 0, (size_t) size);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	memcpy((char *)tmp->name, name, len);
> > +	tmp->uptr = NULL;
> > +	tmp->aname = NULL;
> > +	tmp->refcnt = 1;
> > +
> ditto. save an empty line.
> > +	ret = filename_lookup(AT_FDCWD, tmp, 0, &kp, NULL);
> > +
> ditto. save an empty line.
> > +	if (ret) {
> > +		memset((void *)pidns_info, 0, (size_t) size);
> > +		return ret;
> > +	}
> > +
> > +	inode = d_backing_inode(kp.dentry);
> > +	pidns_info->dev = inode->i_sb->s_dev;
> > +
> > +	return 0;
> > +
> > +clear:
> > +	memset((void *)pidns_info, 0, (size_t) size);
> > +
> save an empty line.
> > +	return -EINVAL;
> > +}
> > +
> > +const struct bpf_func_proto bpf_get_current_pidns_info_proto = {
> > +	.func	= bpf_get_current_pidns_info,
> make the "= " aligned with others?
> > +	.gpl_only	= false,
> > +	.ret_type	= RET_INTEGER,
> > +	.arg1_type	= ARG_PTR_TO_UNINIT_MEM,
> > +	.arg2_type	= ARG_CONST_SIZE,
> > +};
> > +
> >   #ifdef CONFIG_CGROUPS
> >   BPF_CALL_0(bpf_get_current_cgroup_id)
> >   {
> > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > index ca1255d14576..5e1dc22765a5 100644
> > --- a/kernel/trace/bpf_trace.c
> > +++ b/kernel/trace/bpf_trace.c
> > @@ -709,6 +709,8 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> >   #endif
> >   	case BPF_FUNC_send_signal:
> >   		return &bpf_send_signal_proto;
> > +	case BPF_FUNC_get_current_pidns_info:
> > +		return &bpf_get_current_pidns_info_proto;
> >   	default:
> >   		return NULL;
> >   	}
> > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
> > index 1d9be26b4edd..238453ff27d2 100644
> > --- a/samples/bpf/Makefile
> > +++ b/samples/bpf/Makefile
> > @@ -53,6 +53,7 @@ hostprogs-y += task_fd_query
> >   hostprogs-y += xdp_sample_pkts
> >   hostprogs-y += ibumad
> >   hostprogs-y += hbm
> > +hostprogs-y += trace_ns_info
> [...]

^ permalink raw reply related

* Re: [v3,1/4] tools: bpftool: add net attach command to attach XDP on interface
From: Jakub Kicinski @ 2019-08-08 17:49 UTC (permalink / raw)
  To: Daniel T. Lee; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <CAEKGpzj1VKWuWioEmRkNXrgfDdT-KkWZWsrbY+p=yyK8sPctwg@mail.gmail.com>

On Thu, 8 Aug 2019 07:15:22 +0900, Daniel T. Lee wrote:
> > > +             return -EINVAL;
> > > +     }
> > > +
> > > +     NEXT_ARG();  
> >
> > nit: the new line should be before NEXT_ARG(), IOV NEXT_ARG() belongs
> > to the code which consumed the argument
> >  
> 
> I'm not sure I'm following.
> Are you saying that, at here the newline shouldn't be necessary?

I mean this is better:

	if (!is_prefix(*argv, "bla-bla"))
		return -EINVAL;
	NEXT_ARG();

	if (!is_prefix(*argv, "bla-bla"))
		return -EINVAL;
	NEXT_ARG();

Than this:

	if (!is_prefix(*argv, "bla-bla"))
		return -EINVAL;

	NEXT_ARG();
	if (!is_prefix(*argv, "bla-bla"))
		return -EINVAL;

	NEXT_ARG();

Because the NEXT_ARG() "belongs" to the code that "consumed" the option.

So instead of this:

     attach_type = parse_attach_type(*argv);
     if (attach_type == max_net_attach_type) {
             p_err("invalid net attach/detach type");  
             return -EINVAL;
     }

     NEXT_ARG();  
     progfd = prog_parse_fd(&argc, &argv);
     if (progfd < 0)
             return -EINVAL;

This seems more logical to me:

     attach_type = parse_attach_type(*argv);
     if (attach_type == max_net_attach_type) {
             p_err("invalid net attach/detach type");  
             return -EINVAL;
     }
     NEXT_ARG();  

     progfd = prog_parse_fd(&argc, &argv);
     if (progfd < 0)
             return -EINVAL;

^ permalink raw reply

* Re: [PATCH v2 bpf-next] btf: expose BTF info through sysfs
From: Andrii Nakryiko @ 2019-08-08 17:53 UTC (permalink / raw)
  To: Greg KH
  Cc: Yonghong Song, Andrii Nakryiko, bpf@vger.kernel.org,
	netdev@vger.kernel.org, Alexei Starovoitov, daniel@iogearbox.net,
	Kernel Team, Masahiro Yamada, Arnaldo Carvalho de Melo, Jiri Olsa,
	Sam Ravnborg
In-Reply-To: <20190808060812.GA25150@kroah.com>

On Wed, Aug 7, 2019 at 11:08 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Aug 08, 2019 at 04:24:25AM +0000, Yonghong Song wrote:
> >
> >
> > On 8/7/19 5:32 PM, Andrii Nakryiko wrote:
> > > Make .BTF section allocated and expose its contents through sysfs.
>
> Was this original patch not on bpf@vger?  I can't find it in my
> archive.  Anyway...
>
> > > /sys/kernel/btf directory is created to contain all the BTFs present
> > > inside kernel. Currently there is only kernel's main BTF, represented as
> > > /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
> > > each module will expose its BTF as /sys/kernel/btf/<module-name> file.
>
> Why are you using sysfs for this?  Who uses "BTF"s?  Are these debugging
> images that only people working on developing bpf programs are going to
> need, or are these things that you are going to need on a production
> system?

We need it in production system. One immediate and direct use case is
BPF CO-RE (Compile Once - Run Everywhere), which aims to allow to
pre-compile BPF applications (even those that read internal kernel
structures) using any local kernel headers, and then distribute and
run them in binary form on all target production machines without
dependencies on kernel headers and having Clang on target machine to
compile C to BPF IR. Libbpf is doing all those adjustments/relocations
based on kernel's actual BTF. See [0] for a summary and slides, if you
curious to learn more.

  [0] http://vger.kernel.org/bpfconf2019.html#session-2

>
> I ask as maybe debugfs is the best place for this if they are not needed
> on production systems.
>
>
> > >
> > > Current approach relies on a few pieces coming together:
> > > 1. pahole is used to take almost final vmlinux image (modulo .BTF and
> > >     kallsyms) and generate .BTF section by converting DWARF info into
> > >     BTF. This section is not allocated and not mapped to any segment,
> > >     though, so is not yet accessible from inside kernel at runtime.
> > > 2. objcopy dumps .BTF contents into binary file and subsequently
> > >     convert binary file into linkable object file with automatically
> > >     generated symbols _binary__btf_kernel_bin_start and
> > >     _binary__btf_kernel_bin_end, pointing to start and end, respectively,
> > >     of BTF raw data.
> > > 3. final vmlinux image is generated by linking this object file (and
> > >     kallsyms, if necessary). sysfs_btf.c then creates
> > >     /sys/kernel/btf/kernel file and exposes embedded BTF contents through
> > >     it. This allows, e.g., libbpf and bpftool access BTF info at
> > >     well-known location, without resorting to searching for vmlinux image
> > >     on disk (location of which is not standardized and vmlinux image
> > >     might not be even available in some scenarios, e.g., inside qemu
> > >     during testing).
> > >
> > > Alternative approach using .incbin assembler directive to embed BTF
> > > contents directly was attempted but didn't work, because sysfs_proc.o is
> > > not re-compiled during link-vmlinux.sh stage. This is required, though,
> > > to update embedded BTF data (initially empty data is embedded, then
> > > pahole generates BTF info and we need to regenerate sysfs_btf.o with
> > > updated contents, but it's too late at that point).
> > >
> > > If BTF couldn't be generated due to missing or too old pahole,
> > > sysfs_btf.c handles that gracefully by detecting that
> > > _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
> > > /sys/kernel/btf at all.
> > >
> > > v1->v2:
> > > - allow kallsyms stage to re-use vmlinux generated by gen_btf();
> > >
> > > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > Cc: Jiri Olsa <jolsa@kernel.org>
> > > Cc: Sam Ravnborg <sam@ravnborg.org>
> > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > ---
> > >   kernel/bpf/Makefile     |  3 +++
> > >   kernel/bpf/sysfs_btf.c  | 52 ++++++++++++++++++++++++++++++++++++++
> > >   scripts/link-vmlinux.sh | 55 +++++++++++++++++++++++++++--------------
> > >   3 files changed, 91 insertions(+), 19 deletions(-)
> > >   create mode 100644 kernel/bpf/sysfs_btf.c
>
> First rule, you can't create new sysfs files without a matching
> Documentation/ABI/ set of entries.  Please do that for the next version
> of this patch so we can properly check to see if what you are
> documenting lines up with the code.  Otherwise we just have to guess as
> to what the entries you are creating actually do.

Yep, sure, I wasn't aware, will add in v3.

>
> > >
> > > diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
> > > index 29d781061cd5..e1d9adb212f9 100644
> > > --- a/kernel/bpf/Makefile
> > > +++ b/kernel/bpf/Makefile
> > > @@ -22,3 +22,6 @@ obj-$(CONFIG_CGROUP_BPF) += cgroup.o
> > >   ifeq ($(CONFIG_INET),y)
> > >   obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o
> > >   endif
> > > +ifeq ($(CONFIG_SYSFS),y)
> > > +obj-$(CONFIG_DEBUG_INFO_BTF) += sysfs_btf.o
> > > +endif
> > > diff --git a/kernel/bpf/sysfs_btf.c b/kernel/bpf/sysfs_btf.c
> > > new file mode 100644
> > > index 000000000000..ac06ce1d62e8
> > > --- /dev/null
> > > +++ b/kernel/bpf/sysfs_btf.c
> > > @@ -0,0 +1,52 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * Provide kernel BTF information for introspection and use by eBPF tools.
> > > + */
> > > +#include <linux/kernel.h>
> > > +#include <linux/module.h>
> > > +#include <linux/kobject.h>
> > > +#include <linux/init.h>
> > > +
> > > +/* See scripts/link-vmlinux.sh, gen_btf() func for details */
> > > +extern char __weak _binary__btf_kernel_bin_start[];
> > > +extern char __weak _binary__btf_kernel_bin_end[];
> > > +
> > > +static ssize_t
> > > +btf_kernel_read(struct file *file, struct kobject *kobj,
> > > +           struct bin_attribute *bin_attr,
> > > +           char *buf, loff_t off, size_t len)
> > > +{
> > > +   memcpy(buf, _binary__btf_kernel_bin_start + off, len);
> > > +   return len;
> > > +}
> > > +
> > > +static struct bin_attribute btf_kernel_attr __ro_after_init = {
> > > +   .attr = {
> > > +           .name = "kernel",
> > > +           .mode = 0444,
> > > +   },
> > > +   .read = btf_kernel_read,
> > > +};
>
> BIN_ATTR_RO()?

Ok, will use that.

>
> > > +
> > > +static struct bin_attribute *btf_attrs[] __ro_after_init = {
> > > +   &btf_kernel_attr,
> > > +   NULL,
> > > +};
> > > +
> > > +static struct attribute_group btf_group_attr __ro_after_init = {
> > > +   .name = "btf",
> > > +   .bin_attrs = btf_attrs,
> > > +};
> > > +
> > > +static int __init btf_kernel_init(void)
> > > +{
> > > +   if (!_binary__btf_kernel_bin_start)
> > > +           return 0;
> > > +
> > > +   btf_kernel_attr.size = _binary__btf_kernel_bin_end -
> > > +                          _binary__btf_kernel_bin_start;
> > > +
> > > +   return sysfs_create_group(kernel_kobj, &btf_group_attr);
>
> You are nesting directories here without a "real" kobject in the middle.
> Are you _sure_ you want to do that?  It's going to get really tricky
> later on based on your comments above about creating multiple files in
> that directory over time once "modules" are allowed.

My thinking was that when we have BTF for modules, I'll need to do
some code adjustments anyway, at which point it will be more clear how
we want to structure that. But I can add explicit kobject as static
variable right now, no problems. Later on we probably will just switch
it to be exported, so that modules can self-register/unregister their
BTFs autonomously.

>
> thanks,
>
> greg k-h

^ permalink raw reply

* Re: [PATCH] pcan_usb_fd: zero out the common command buffer
From: David Miller @ 2019-08-08 18:03 UTC (permalink / raw)
  To: oneukum; +Cc: netdev, wg, mkl, linux-can
In-Reply-To: <20190808092825.23470-1-oneukum@suse.com>

From: Oliver Neukum <oneukum@suse.com>
Date: Thu,  8 Aug 2019 11:28:25 +0200

> Lest we leak kernel memory to a device we better zero out buffers.
> 
> Reported-by: syzbot+513e4d0985298538bf9b@syzkaller.appspotmail.com
> Signed-off-by: Oliver Neukum <oneukum@suse.com>

Please CC: the CAN subsystem maintainers, as this is clearly listed in the
MAINTAINERS file.

Thank you.

> ---
>  drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
> index 34761c3a6286..47cc1ff5b88e 100644
> --- a/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
> +++ b/drivers/net/can/usb/peak_usb/pcan_usb_fd.c
> @@ -841,7 +841,7 @@ static int pcan_usb_fd_init(struct peak_usb_device *dev)
>  			goto err_out;
>  
>  		/* allocate command buffer once for all for the interface */
> -		pdev->cmd_buffer_addr = kmalloc(PCAN_UFD_CMD_BUFFER_SIZE,
> +		pdev->cmd_buffer_addr = kzalloc(PCAN_UFD_CMD_BUFFER_SIZE,
>  						GFP_KERNEL);
>  		if (!pdev->cmd_buffer_addr)
>  			goto err_out_1;
> -- 
> 2.16.4
> 

^ permalink raw reply

* Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations
From: Jonathan Lemon @ 2019-08-08 18:03 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Jakub Kicinski, Jiri Pirko, dsahern, netdev, davem, mlxsw,
	f.fainelli, vivien.didelot, mkubecek, stephen, daniel, brouer,
	eric.dumazet
In-Reply-To: <20190806190637.GE17072@lunn.ch>



On 6 Aug 2019, at 12:06, Andrew Lunn wrote:

> On Tue, Aug 06, 2019 at 11:54:49AM -0700, Jakub Kicinski wrote:
>> On Tue, 6 Aug 2019 20:38:41 +0200, Jiri Pirko wrote:
>>>>> So the proposal is to have some new device, say "kernelnet", that
>>>>> would implicitly create per-namespace devlink instance. This 
>>>>> devlink
>>>>> instance would be used to setup resource limits. Like:
>>>>>
>>>>> devlink resource set kernelnet path /IPv4/fib size 96
>>>>> devlink -N ns1name resource set kernelnet path /IPv6/fib size 100
>>>>> devlink -N ns2name resource set kernelnet path /IPv4/fib-rules 
>>>>> size 8
>>>>>
>>>>> To me it sounds a bit odd for kernel namespace to act as a device, 
>>>>> but
>>>>> thinking about it more, it makes sense. Probably better than to 
>>>>> define
>>>>> a new api. User would use the same tool to work with kernel and 
>>>>> hw.
>>>>>
>>>>> Also we can implement other devlink functionality, like dpipe.
>>>>> User would then have visibility of network pipeline, tables,
>>>>> utilization, etc. It is related to the resources too.
>>>>>
>>>>> What do you think?
>>>>
>>>> I'm no expert here but seems counter intuitive that device tables 
>>>> would
>>>> be aware of namespaces in the first place. Are we not reinventing
>>>> cgroup controllers based on a device API? IMHO from a perspective 
>>>> of
>>>> someone unfamiliar with routing offload this seems backwards :)
>>>
>>> Can we use cgroup for fib and other limitations instead?
>>
>> Not sure the question is to me, I don't feel particularly qualified,
>> I've never worked with VDCs or wrote a switch driver.. But I'd see
>> cgroups as a natural fit, and if I read Andrew's reply right so does
>> he..
>
> Hi Jakub
>
> I think there needs to be a clearly reasoned argument why cgroups is
> the wrong answer to this problem. I myself don't know enough to give
> that answer, but i can pose the question.
>
>      Andrew

For the example above, the first question would be why is the 
restriction
based on the number of entries instead of their memory footprint?  The 
resource
being consumed is memory, so I'd think that should be what is monitored.

Quickly scanning the cgroups documentation, it seems there is a device 
controller,
so this isn't just process based.  ISTR that Larry Brakmo was working on 
a network
bandwidth limiter, which is controlled by cgroups.
-- 
Jonathan




^ permalink raw reply

* Re: [PATCH] zd1211rw: remove false assertion from zd_mac_clear()
From: David Miller @ 2019-08-08 18:05 UTC (permalink / raw)
  To: oneukum; +Cc: netdev, dsd, kune, linux-wireless, kvalo
In-Reply-To: <20190808093203.23752-1-oneukum@suse.com>

From: Oliver Neukum <oneukum@suse.com>
Date: Thu,  8 Aug 2019 11:32:03 +0200

> The function is called before the lock which is asserted was ever used.
> Just remove it.
> 
> Reported-by: syzbot+74c65761783d66a9c97c@syzkaller.appspotmail.com
> Signed-off-by: Oliver Neukum <oneukum@suse.com>

Please CC: the appropriate driver maitainers and mailing list as this
is clearly specified in the MAINTAINERS file.

Thank you.

> ---
>  drivers/net/wireless/zydas/zd1211rw/zd_mac.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/net/wireless/zydas/zd1211rw/zd_mac.c b/drivers/net/wireless/zydas/zd1211rw/zd_mac.c
> index da7e63fca9f5..a9999d10ae81 100644
> --- a/drivers/net/wireless/zydas/zd1211rw/zd_mac.c
> +++ b/drivers/net/wireless/zydas/zd1211rw/zd_mac.c
> @@ -223,7 +223,6 @@ void zd_mac_clear(struct zd_mac *mac)
>  {
>  	flush_workqueue(zd_workqueue);
>  	zd_chip_clear(&mac->chip);
> -	lockdep_assert_held(&mac->lock);
>  	ZD_MEMCLEAR(mac, sizeof(struct zd_mac));
>  }
>  
> -- 
> 2.16.4
> 

^ permalink raw reply

* Re: [PATCH v2 bpf-next] btf: expose BTF info through sysfs
From: Greg KH @ 2019-08-08 18:11 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Yonghong Song, Andrii Nakryiko, bpf@vger.kernel.org,
	netdev@vger.kernel.org, Alexei Starovoitov, daniel@iogearbox.net,
	Kernel Team, Masahiro Yamada, Arnaldo Carvalho de Melo, Jiri Olsa,
	Sam Ravnborg
In-Reply-To: <CAEf4BzaWtumTrc7h1t3w8hA1L8mVo2Cm0B+eLSe4eSghFAu3iw@mail.gmail.com>

On Thu, Aug 08, 2019 at 10:53:44AM -0700, Andrii Nakryiko wrote:
> On Wed, Aug 7, 2019 at 11:08 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Thu, Aug 08, 2019 at 04:24:25AM +0000, Yonghong Song wrote:
> > >
> > >
> > > On 8/7/19 5:32 PM, Andrii Nakryiko wrote:
> > > > Make .BTF section allocated and expose its contents through sysfs.
> >
> > Was this original patch not on bpf@vger?  I can't find it in my
> > archive.  Anyway...
> >
> > > > /sys/kernel/btf directory is created to contain all the BTFs present
> > > > inside kernel. Currently there is only kernel's main BTF, represented as
> > > > /sys/kernel/btf/kernel file. Once kernel modules' BTFs are supported,
> > > > each module will expose its BTF as /sys/kernel/btf/<module-name> file.
> >
> > Why are you using sysfs for this?  Who uses "BTF"s?  Are these debugging
> > images that only people working on developing bpf programs are going to
> > need, or are these things that you are going to need on a production
> > system?
> 
> We need it in production system. One immediate and direct use case is
> BPF CO-RE (Compile Once - Run Everywhere), which aims to allow to
> pre-compile BPF applications (even those that read internal kernel
> structures) using any local kernel headers, and then distribute and
> run them in binary form on all target production machines without
> dependencies on kernel headers and having Clang on target machine to
> compile C to BPF IR. Libbpf is doing all those adjustments/relocations
> based on kernel's actual BTF. See [0] for a summary and slides, if you
> curious to learn more.
> 
>   [0] http://vger.kernel.org/bpfconf2019.html#session-2

Ok, then a binary sysfs file is fine, no objection from me.

> > I ask as maybe debugfs is the best place for this if they are not needed
> > on production systems.
> >
> >
> > > >
> > > > Current approach relies on a few pieces coming together:
> > > > 1. pahole is used to take almost final vmlinux image (modulo .BTF and
> > > >     kallsyms) and generate .BTF section by converting DWARF info into
> > > >     BTF. This section is not allocated and not mapped to any segment,
> > > >     though, so is not yet accessible from inside kernel at runtime.
> > > > 2. objcopy dumps .BTF contents into binary file and subsequently
> > > >     convert binary file into linkable object file with automatically
> > > >     generated symbols _binary__btf_kernel_bin_start and
> > > >     _binary__btf_kernel_bin_end, pointing to start and end, respectively,
> > > >     of BTF raw data.
> > > > 3. final vmlinux image is generated by linking this object file (and
> > > >     kallsyms, if necessary). sysfs_btf.c then creates
> > > >     /sys/kernel/btf/kernel file and exposes embedded BTF contents through
> > > >     it. This allows, e.g., libbpf and bpftool access BTF info at
> > > >     well-known location, without resorting to searching for vmlinux image
> > > >     on disk (location of which is not standardized and vmlinux image
> > > >     might not be even available in some scenarios, e.g., inside qemu
> > > >     during testing).
> > > >
> > > > Alternative approach using .incbin assembler directive to embed BTF
> > > > contents directly was attempted but didn't work, because sysfs_proc.o is
> > > > not re-compiled during link-vmlinux.sh stage. This is required, though,
> > > > to update embedded BTF data (initially empty data is embedded, then
> > > > pahole generates BTF info and we need to regenerate sysfs_btf.o with
> > > > updated contents, but it's too late at that point).
> > > >
> > > > If BTF couldn't be generated due to missing or too old pahole,
> > > > sysfs_btf.c handles that gracefully by detecting that
> > > > _binary__btf_kernel_bin_start (weak symbol) is 0 and not creating
> > > > /sys/kernel/btf at all.
> > > >
> > > > v1->v2:
> > > > - allow kallsyms stage to re-use vmlinux generated by gen_btf();
> > > >
> > > > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > > > Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
> > > > Cc: Jiri Olsa <jolsa@kernel.org>
> > > > Cc: Sam Ravnborg <sam@ravnborg.org>
> > > > Signed-off-by: Andrii Nakryiko <andriin@fb.com>
> > > > ---
> > > >   kernel/bpf/Makefile     |  3 +++
> > > >   kernel/bpf/sysfs_btf.c  | 52 ++++++++++++++++++++++++++++++++++++++
> > > >   scripts/link-vmlinux.sh | 55 +++++++++++++++++++++++++++--------------
> > > >   3 files changed, 91 insertions(+), 19 deletions(-)
> > > >   create mode 100644 kernel/bpf/sysfs_btf.c
> >
> > First rule, you can't create new sysfs files without a matching
> > Documentation/ABI/ set of entries.  Please do that for the next version
> > of this patch so we can properly check to see if what you are
> > documenting lines up with the code.  Otherwise we just have to guess as
> > to what the entries you are creating actually do.
> 
> Yep, sure, I wasn't aware, will add in v3.

thanks.

> > > > +static int __init btf_kernel_init(void)
> > > > +{
> > > > +   if (!_binary__btf_kernel_bin_start)
> > > > +           return 0;
> > > > +
> > > > +   btf_kernel_attr.size = _binary__btf_kernel_bin_end -
> > > > +                          _binary__btf_kernel_bin_start;
> > > > +
> > > > +   return sysfs_create_group(kernel_kobj, &btf_group_attr);
> >
> > You are nesting directories here without a "real" kobject in the middle.
> > Are you _sure_ you want to do that?  It's going to get really tricky
> > later on based on your comments above about creating multiple files in
> > that directory over time once "modules" are allowed.
> 
> My thinking was that when we have BTF for modules, I'll need to do
> some code adjustments anyway, at which point it will be more clear how
> we want to structure that. But I can add explicit kobject as static
> variable right now, no problems. Later on we probably will just switch
> it to be exported, so that modules can self-register/unregister their
> BTFs autonomously.

A "real" kobject to start with here would probably be best.  Keeps
things simpler later as well.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH net v3] net/tls: prevent skb_orphan() from leaking TLS plain text with offload
From: Willem de Bruijn @ 2019-08-08 18:10 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: John Fastabend, David Miller, Network Development, davejwatson,
	borisp, aviadye, Daniel Borkmann, Eric Dumazet,
	Alexei Starovoitov, oss-drivers
In-Reply-To: <20190808103148.164bec9f@cakuba.netronome.com>

On Thu, Aug 8, 2019 at 1:32 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Thu, 8 Aug 2019 11:59:18 -0400, Willem de Bruijn wrote:
> > > diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> > > index 7c0b2b778703..43922d86e510 100644
> > > --- a/net/tls/tls_device.c
> > > +++ b/net/tls/tls_device.c
> > > @@ -373,9 +373,9 @@ static int tls_push_data(struct sock *sk,
> > >         struct tls_context *tls_ctx = tls_get_ctx(sk);
> > >         struct tls_prot_info *prot = &tls_ctx->prot_info;
> > >         struct tls_offload_context_tx *ctx = tls_offload_ctx_tx(tls_ctx);
> > > -       int tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST;
> > >         int more = flags & (MSG_SENDPAGE_NOTLAST | MSG_MORE);
> > >         struct tls_record_info *record = ctx->open_record;
> > > +       int tls_push_record_flags;
> > >         struct page_frag *pfrag;
> > >         size_t orig_size = size;
> > >         u32 max_open_record_len;
> > > @@ -390,6 +390,9 @@ static int tls_push_data(struct sock *sk,
> > >         if (sk->sk_err)
> > >                 return -sk->sk_err;
> > >
> > > +       flags |= MSG_SENDPAGE_DECRYPTED;
> > > +       tls_push_record_flags = flags | MSG_SENDPAGE_NOTLAST;
> > > +
> >
> > Without being too familiar with this code: can this plaintext flag be
> > set once, closer to the call to do_tcp_sendpages, in tls_push_sg?
> >
> > Instead of two locations with multiple non-trivial codepaths between
> > them and do_tcp_sendpages.
> >
> > Or are there paths where the flag is not set? Which I imagine would
> > imply already passing s/w encrypted ciphertext.
>
> tls_push_sg() is shared with sw path which doesn't have the device
> validation.
>
> Device TLS can read tls_push_sg() via tls_push_partial_record() and
> tls_push_data(). tls_push_data() is addressed directly here,
> tls_push_partial_record() is again shared with SW path, so we have to
> address it by adding the flag in tls_device_write_space().
>
> The alternative is to add a conditional to tls_push_sg() which is
> a little less nice from performance and layering PoV but it is a lot
> simpler..
>
> Should I change?

Not at all. Thanks for the detailed explanation. That answered my last question

Acked-by: Willem de Bruijn <willemb@google.com>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox