Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 00/14] ARM: move lpc32xx and dove to multiplatform
From: Arnd Bergmann @ 2019-08-15 13:11 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: SoC Team, Linux ARM, Vladimir Zapolskiy, Sylvain Lemieux,
	Gregory Clement, Linus Walleij, Jason Cooper, Andrew Lunn,
	Sebastian Hesselbarth, David S. Miller, Greg Kroah-Hartman,
	Alan Stern, Guenter Roeck, open list:GPIO SUBSYSTEM, Networking,
	linux-serial, USB list, LINUXWATCHDOG
In-Reply-To: <CAK8P3a1Lgbz9RwVaOgNq=--gwvEG70tUi67XwsswjgnXAX6EhA@mail.gmail.com>

On Thu, Aug 1, 2019 at 9:33 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Thu, Aug 1, 2019 at 12:53 AM Russell King - ARM Linux admin
> <linux@armlinux.org.uk> wrote:
> >
> > On Wed, Jul 31, 2019 at 09:56:42PM +0200, Arnd Bergmann wrote:
> > > For dove, the patches are basically what I had proposed back in
> > > 2015 when all other ARMv6/ARMv7 machines became part of a single
> > > kernel build. I don't know what the state is mach-dove support is,
> > > compared to the DT based support in mach-mvebu for the same
> > > hardware. If they are functionally the same, we could also just
> > > remove mach-dove rather than applying my patches.
> >
> > Well, the good news is that I'm down to a small board support file
> > for the Dove Cubox now - but the bad news is, that there's still a
> > board support file necessary to support everything the Dove SoC has
> > to offer.
> >
> > Even for a DT based Dove Cubox, I'm still using mach-dove, but it
> > may be possible to drop most of mach-dove now.  Without spending a
> > lot of time digging through it, it's impossible to really know.
>
> Ok, so we won't remove it then, but I'd like to merge my patches to
> at least get away from the special case of requiring a separate kernel
> image for it.
>
> Can you try if applying patches 12 and 14 from my series causes
> problems for you? (it may be easier to apply the entire set
> or pull from [1] to avoid rebase conflicts).

I applied patches 12 and 13 into the soc tree now. There are some
other pending multiplatform conversions (iop32x, ep93xx, lpc32xx,
omap1), but it looks like none of those will be complete for 5.4.

I now expect that we can get most of the preparation into 5.4,
and maybe move them all over together in 5.5 after some more
testing. If someone finds a problem with the one of the
preparation steps, that we can revert the individual patches
more easily.

      Arnd

^ permalink raw reply

* Re: [PATCH] tun: fix use-after-free when register netdev failed
From: Yang Yingliang @ 2019-08-15 12:55 UTC (permalink / raw)
  To: Jason Wang, netdev; +Cc: xiyou.wangcong, davem
In-Reply-To: <f9dd102e-e3b4-6522-6094-63aa31b890ea@redhat.com>



On 2019/8/15 17:21, Jason Wang wrote:
>
> On 2019/8/15 下午4:18, Yang Yingliang wrote:
>> I got a UAF repport in tun driver when doing fuzzy test:
>>
>> [  466.269490] 
>> ==================================================================
>> [  466.271792] BUG: KASAN: use-after-free in 
>> tun_chr_read_iter+0x2ca/0x2d0
>> [  466.271806] Read of size 8 at addr ffff888372139250 by task 
>> tun-test/2699
>> [  466.271810]
>> [  466.271824] CPU: 1 PID: 2699 Comm: tun-test Not tainted 
>> 5.3.0-rc1-00001-g5a9433db2614-dirty #427
>> [  466.271833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
>> BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
>> [  466.271838] Call Trace:
>> [  466.271858]  dump_stack+0xca/0x13e
>> [  466.271871]  ? tun_chr_read_iter+0x2ca/0x2d0
>> [  466.271890]  print_address_description+0x79/0x440
>> [  466.271906]  ? vprintk_func+0x5e/0xf0
>> [  466.271920]  ? tun_chr_read_iter+0x2ca/0x2d0
>> [  466.271935]  __kasan_report+0x15c/0x1df
>> [  466.271958]  ? tun_chr_read_iter+0x2ca/0x2d0
>> [  466.271976]  kasan_report+0xe/0x20
>> [  466.271987]  tun_chr_read_iter+0x2ca/0x2d0
>> [  466.272013]  do_iter_readv_writev+0x4b7/0x740
>> [  466.272032]  ? default_llseek+0x2d0/0x2d0
>> [  466.272072]  do_iter_read+0x1c5/0x5e0
>> [  466.272110]  vfs_readv+0x108/0x180
>> [  466.299007]  ? compat_rw_copy_check_uvector+0x440/0x440
>> [  466.299020]  ? fsnotify+0x888/0xd50
>> [  466.299040]  ? __fsnotify_parent+0xd0/0x350
>> [  466.299064]  ? fsnotify_first_mark+0x1e0/0x1e0
>> [  466.304548]  ? vfs_write+0x264/0x510
>> [  466.304569]  ? ksys_write+0x101/0x210
>> [  466.304591]  ? do_preadv+0x116/0x1a0
>> [  466.304609]  do_preadv+0x116/0x1a0
>> [  466.309829]  do_syscall_64+0xc8/0x600
>> [  466.309849]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [  466.309861] RIP: 0033:0x4560f9
>> [  466.309875] Code: 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 
>> 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 
>> 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 
>> 89 01 48
>> [  466.309889] RSP: 002b:00007ffffa5166e8 EFLAGS: 00000206 ORIG_RAX: 
>> 0000000000000127
>> [  466.322992] RAX: ffffffffffffffda RBX: 0000000000400460 RCX: 
>> 00000000004560f9
>> [  466.322999] RDX: 0000000000000003 RSI: 00000000200008c0 RDI: 
>> 0000000000000003
>> [  466.323007] RBP: 00007ffffa516700 R08: 0000000000000004 R09: 
>> 0000000000000000
>> [  466.323014] R10: 0000000000000000 R11: 0000000000000206 R12: 
>> 000000000040cb10
>> [  466.323021] R13: 0000000000000000 R14: 00000000006d7018 R15: 
>> 0000000000000000
>> [  466.323057]
>> [  466.323064] Allocated by task 2605:
>> [  466.335165]  save_stack+0x19/0x80
>> [  466.336240]  __kasan_kmalloc.constprop.8+0xa0/0xd0
>> [  466.337755]  kmem_cache_alloc+0xe8/0x320
>> [  466.339050]  getname_flags+0xca/0x560
>> [  466.340229]  user_path_at_empty+0x2c/0x50
>> [  466.341508]  vfs_statx+0xe6/0x190
>> [  466.342619]  __do_sys_newstat+0x81/0x100
>> [  466.343908]  do_syscall_64+0xc8/0x600
>> [  466.345303]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [  466.347034]
>> [  466.347517] Freed by task 2605:
>> [  466.348471]  save_stack+0x19/0x80
>> [  466.349476]  __kasan_slab_free+0x12e/0x180
>> [  466.350726]  kmem_cache_free+0xc8/0x430
>> [  466.351874]  putname+0xe2/0x120
>> [  466.352921]  filename_lookup+0x257/0x3e0
>> [  466.354319]  vfs_statx+0xe6/0x190
>> [  466.355498]  __do_sys_newstat+0x81/0x100
>> [  466.356889]  do_syscall_64+0xc8/0x600
>> [  466.358037]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [  466.359567]
>> [  466.360050] The buggy address belongs to the object at 
>> ffff888372139100
>> [  466.360050]  which belongs to the cache names_cache of size 4096
>> [  466.363735] The buggy address is located 336 bytes inside of
>> [  466.363735]  4096-byte region [ffff888372139100, ffff88837213a100)
>> [  466.367179] The buggy address belongs to the page:
>> [  466.368604] page:ffffea000dc84e00 refcount:1 mapcount:0 
>> mapping:ffff8883df1b4f00 index:0x0 compound_mapcount: 0
>> [  466.371582] flags: 0x2fffff80010200(slab|head)
>> [  466.372910] raw: 002fffff80010200 dead000000000100 
>> dead000000000122 ffff8883df1b4f00
>> [  466.375209] raw: 0000000000000000 0000000000070007 
>> 00000001ffffffff 0000000000000000
>> [  466.377778] page dumped because: kasan: bad access detected
>> [  466.379730]
>> [  466.380288] Memory state around the buggy address:
>> [  466.381844]  ffff888372139100: fb fb fb fb fb fb fb fb fb fb fb fb 
>> fb fb fb fb
>> [  466.384009]  ffff888372139180: fb fb fb fb fb fb fb fb fb fb fb fb 
>> fb fb fb fb
>> [  466.386131] >ffff888372139200: fb fb fb fb fb fb fb fb fb fb fb fb 
>> fb fb fb fb
>> [  466.388257] ^
>> [  466.390234]  ffff888372139280: fb fb fb fb fb fb fb fb fb fb fb fb 
>> fb fb fb fb
>> [  466.392512]  ffff888372139300: fb fb fb fb fb fb fb fb fb fb fb fb 
>> fb fb fb fb
>> [  466.394667] 
>> ==================================================================
>>
>> tun_chr_read_iter() accessed the memory which freed by free_netdev()
>> called by tun_set_iff():
>>
>>     CPUA                CPUB
>>      tun_set_iff()
>>        alloc_netdev_mqs()
>>        tun_attach()
>>                     tun_chr_read_iter()
>>                       tun_get()
>>        register_netdevice()
>>        tun_detach_all()
>>          synchronize_net()
>>                       tun_do_read()
>>                         tun_ring_recv()
>>                           schedule()
>>        free_netdev()
>>                       tun_put() <-- UAF
>>
>> Set a new bit in tun->flag if register_netdevice() successed,
>> without this bit, tun_get() returns NULL to avoid using a
>> freed tun pointer.
>
>
> Good catch.
>
> Some comments inline.
>
>
>>
>> Fixes: eb0fb363f920 ("tuntap: attach queue 0 before registering 
>> netdevice")
>> Reported-by: Hulk Robot <hulkci@huawei.com>
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
>> ---
>>   drivers/net/tun.c | 10 ++++++++--
>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index db16d7a13e00..cbd60c276c40 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -115,6 +115,7 @@ do {                                \
>>   /* High bits in flags field are unused. */
>>   #define TUN_VNET_LE     0x80000000
>>   #define TUN_VNET_BE     0x40000000
>> +#define TUN_DEV_REGISTERED    0x20000000
>>     #define TUN_FEATURES (IFF_NO_PI | IFF_ONE_QUEUE | IFF_VNET_HDR | \
>>                 IFF_MULTI_QUEUE | IFF_NAPI | IFF_NAPI_FRAGS)
>> @@ -719,8 +720,10 @@ static void __tun_detach(struct tun_file *tfile, 
>> bool clean)
>>               netif_carrier_off(tun->dev);
>>                 if (!(tun->flags & IFF_PERSIST) &&
>> -                tun->dev->reg_state == NETREG_REGISTERED)
>> +                tun->dev->reg_state == NETREG_REGISTERED) {
>>                   unregister_netdevice(tun->dev);
>> +                tun->flags &= ~TUN_DEV_REGISTERED;
>> +            }
>>           }
>>           if (tun)
>>               xdp_rxq_info_unreg(&tfile->xdp_rxq);
>> @@ -884,8 +887,10 @@ static struct tun_struct *tun_get(struct 
>> tun_file *tfile)
>>         rcu_read_lock();
>>       tun = rcu_dereference(tfile->tun);
>> -    if (tun)
>> +    if (tun && (tun->flags & TUN_DEV_REGISTERED))
>>           dev_hold(tun->dev);
>> +    else
>> +        tun = NULL;
>>       rcu_read_unlock();
>>         return tun;
>> @@ -2836,6 +2841,7 @@ static int tun_set_iff(struct net *net, struct 
>> file *file, struct ifreq *ifr)
>>           err = register_netdevice(tun->dev);
>>           if (err < 0)
>>               goto err_detach;
>> +        tun->flags |= TUN_DEV_REGISTERED;
>>       }
>>         netif_carrier_on(tun->dev);
>
>
> This looks just a duplicated of netdev->state? However it lacks 
> sufficient synchronization like barriers or locks. How about:
It's not same, register_netdevice() will return error if 
call_netdevice_notifiers() failed after dev->reg_state is set to 
NETREG_REGISTERED.
>
> - call tun_set_real_num_queues() before register_netdevice() this can 
> have the same result as what  eb0fb363f920 did.
> - move tun_attach() after register_netdevice() this makes sure we 
> won't publish tfile->tun until we are sure at least one refcnt is held 
> by register_netdevice()?
Yes, I think this way is better, I will try this later.
>
> Thanks
>
>
> .
>



^ permalink raw reply

* Re: [PATCH net-next] mcast: ensure L-L IPv6 packets are accepted by bridge
From: Patrick Ruddy @ 2019-08-15 13:36 UTC (permalink / raw)
  To: Nikolay Aleksandrov, Linus Lüssing
  Cc: bridge, Ido Schimmel, netdev, roopa
In-Reply-To: <d0be5038-e76f-d21b-a034-e450cbb3010e@cumulusnetworks.com>

On Wed, 2019-08-14 at 23:34 +0300, Nikolay Aleksandrov wrote:
> On 8/14/19 11:11 PM, Linus Lüssing wrote:
> > On Wed, Aug 14, 2019 at 05:40:58PM +0100, Patrick Ruddy wrote:
> > > The group is being joined by MLD at the L3 level but the packets are
> > > not being passed up to the l3 interface becasue there is a MLD querier
> > > on the network
> > > 
> > > snippet from /proc/net/igmp6
> > > ...
> > > 40   sw1             ff0200000000000000000001ff008700     1 00000004 0
> > > 40   sw1             ff020000000000000000000000000002     1 00000004 0
> > > 40   sw1             ff020000000000000000000000000001     1 0000000C 0
> > > 40   sw1             ff010000000000000000000000000001     1 00000008 0
> > > 41   lo1             ff020000000000000000000000000001     1 0000000C 0
> > > 41   lo1             ff010000000000000000000000000001     1 00000008 0
> > > 42   sw1.1           ff020000000000000000000000000006     1 00000004 0
> > > 42   sw1.1           ff020000000000000000000000000005     1 00000004 0
> > > 42   sw1.1           ff0200000000000000000001ff000000     2 00000004 0
> > > 42   sw1.1           ff0200000000000000000001ff008700     1 00000004 0
> > > 42   sw1.1           ff0200000000000000000001ff000099     1 00000004 0
> > > 42   sw1.1           ff020000000000000000000000000002     1 00000004 0
> > > 42   sw1.1           ff020000000000000000000000000001     1 0000000C 0
> > > 42   sw1.1           ff010000000000000000000000000001     1 00000008 0
> > > ...
> > > 
> > > the bridge is sw1 and the l3 intervace is sw1.1
> > 
> > What kind of interface is sw1.1 exactly? Is it a VLAN or a VRF
> > interface? Something else?
> > 
> +1
> 
> > Could you also post the output of bridge mdb show?
> > 
> > Regards, Linus
> > 
> > 
> > PS: Also please include the bridge mailinglist in the future.
> > 
> 
> Note that if you'd like to debug a host joined group currently bridge mdb show
> will not dump it and if the group is host-joined only it
> can even be empty. You can use my latest set (not applied yet):
> https://urldefense.proofpoint.com/v2/url?u=http-3A__patchwork.ozlabs.org_project_netdev_list_-3Fseries-3D125169&d=DwIDaQ&c=LFYZ-o9_HUMeMTSQicvjIg&r=au3D9TlUU6OvFpWOU9cuIHeNeV2fw-AOF1ZqCRqsILc&m=KsdarH0MAMMoKZ4PuvHrEC57uEluTGK-XSL4uUxu9MY&s=jyoK6VVmFh1KpKZirrtUYwq9nLy8fz-GigFFLjaLsoE&e=
> 
> Alternatively you could monitor the mdb events, it will show up there even
> today without any changes (bridge monitor mdb) and you can check if it's
> getting deleted.
> 
> Cheers,
>  Nik

The sw1.1 interface is a .1q vlan

The output of "bridge monitor mdb" is empty

I can see the incoming query and the outging report on tshark:
29002 72654.887739 fe80::4041:1ff:fe00:101 → ff02::1      ICMPv6 94
Multicast Listener Query
29003 72655.502035 fe80::eac5:7aff:fe00:8700 → ff02::16     ICMPv6 194
Multicast Listener Report Message v2

debugging shows that bridge code sees the incoming query but not the
outgoing report.

Thanks for all the pointers - I will pursue what is happening.

-pr 


^ permalink raw reply

* [patch net-next v3 0/2] selftests: netdevsim: add devlink paramstests
From: Jiri Pirko @ 2019-08-15 13:42 UTC (permalink / raw)
  To: netdev; +Cc: davem, jakub.kicinski, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

The first patch is just a helper addition as a dependency of the actual
test in patch number two.

Jiri Pirko (2):
  selftests: net: push jq workaround into separate helper
  selftests: netdevsim: add devlink params tests

 .../drivers/net/netdevsim/devlink.sh          | 62 ++++++++++++++++++-
 tools/testing/selftests/net/forwarding/lib.sh | 19 ++++++
 .../selftests/net/forwarding/tc_common.sh     | 17 ++---
 3 files changed, 84 insertions(+), 14 deletions(-)

-- 
2.21.0


^ permalink raw reply

* [patch net-next v3 1/2] selftests: net: push jq workaround into separate helper
From: Jiri Pirko @ 2019-08-15 13:42 UTC (permalink / raw)
  To: netdev; +Cc: davem, jakub.kicinski, mlxsw
In-Reply-To: <20190815134229.8884-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Push the jq return value workaround code into a separate helper so it
could be used by the rest of the code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v2->v3:
-remove -e jq option and rather check for empty output and return error
v1->v2:
-new patch
---
 tools/testing/selftests/net/forwarding/lib.sh | 19 +++++++++++++++++++
 .../selftests/net/forwarding/tc_common.sh     | 17 ++++-------------
 2 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 9385dc971269..85c587a03c8a 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -250,6 +250,25 @@ setup_wait()
 	sleep $WAIT_TIME
 }
 
+cmd_jq()
+{
+	local cmd=$1
+	local jq_exp=$2
+	local ret
+	local output
+
+	output="$($cmd)"
+	# it the command fails, return error right away
+	ret=$?
+	if [[ $ret -ne 0 ]]; then
+		return $ret
+	fi
+	output=$(echo $output | jq -r "$jq_exp")
+	echo $output
+	# return success only in case of non-empty output
+	[ ! -z "$output" ]
+}
+
 lldpad_app_wait_set()
 {
 	local dev=$1; shift
diff --git a/tools/testing/selftests/net/forwarding/tc_common.sh b/tools/testing/selftests/net/forwarding/tc_common.sh
index 9d3b64a2a264..315e934358d4 100644
--- a/tools/testing/selftests/net/forwarding/tc_common.sh
+++ b/tools/testing/selftests/net/forwarding/tc_common.sh
@@ -8,18 +8,9 @@ tc_check_packets()
 	local id=$1
 	local handle=$2
 	local count=$3
-	local ret
 
-	output="$(tc -j -s filter show $id)"
-	# workaround the jq bug which causes jq to return 0 in case input is ""
-	ret=$?
-	if [[ $ret -ne 0 ]]; then
-		return $ret
-	fi
-	echo $output | \
-		jq -e ".[] \
-		| select(.options.handle == $handle) \
-		| select(.options.actions[0].stats.packets == $count)" \
-		&> /dev/null
-	return $?
+	cmd_jq "tc -j -s filter show $id" \
+	       ".[] | select(.options.handle == $handle) | \
+	              select(.options.actions[0].stats.packets == $count)" \
+	       &> /dev/null
 }
-- 
2.21.0


^ permalink raw reply related

* [patch net-next v3 2/2] selftests: netdevsim: add devlink params tests
From: Jiri Pirko @ 2019-08-15 13:42 UTC (permalink / raw)
  To: netdev; +Cc: davem, jakub.kicinski, mlxsw
In-Reply-To: <20190815134229.8884-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Test recently added netdevsim devlink param implementation.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v1->v2:
-using cmd_jq helper
---
 .../drivers/net/netdevsim/devlink.sh          | 62 ++++++++++++++++++-
 1 file changed, 61 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
index 9d8baf5d14b3..6828e9404460 100755
--- a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
@@ -3,7 +3,7 @@
 
 lib_dir=$(dirname $0)/../../../net/forwarding
 
-ALL_TESTS="fw_flash_test"
+ALL_TESTS="fw_flash_test params_test"
 NUM_NETIFS=0
 source $lib_dir/lib.sh
 
@@ -30,6 +30,66 @@ fw_flash_test()
 	log_test "fw flash test"
 }
 
+param_get()
+{
+	local name=$1
+
+	cmd_jq "devlink dev param show $DL_HANDLE name $name -j" \
+	       '.[][][].values[] | select(.cmode == "driverinit").value'
+}
+
+param_set()
+{
+	local name=$1
+	local value=$2
+
+	devlink dev param set $DL_HANDLE name $name cmode driverinit value $value
+}
+
+check_value()
+{
+	local name=$1
+	local phase_name=$2
+	local expected_param_value=$3
+	local expected_debugfs_value=$4
+	local value
+
+	value=$(param_get $name)
+	check_err $? "Failed to get $name param value"
+	[ "$value" == "$expected_param_value" ]
+	check_err $? "Unexpected $phase_name $name param value"
+	value=$(<$DEBUGFS_DIR/$name)
+	check_err $? "Failed to get $name debugfs value"
+	[ "$value" == "$expected_debugfs_value" ]
+	check_err $? "Unexpected $phase_name $name debugfs value"
+}
+
+params_test()
+{
+	RET=0
+
+	local max_macs
+	local test1
+
+	check_value max_macs initial 32 32
+	check_value test1 initial true Y
+
+	param_set max_macs 16
+	check_err $? "Failed to set max_macs param value"
+	param_set test1 false
+	check_err $? "Failed to set test1 param value"
+
+	check_value max_macs post-set 16 32
+	check_value test1 post-set false Y
+
+	devlink dev reload $DL_HANDLE
+
+	check_value max_macs post-reload 16 16
+	check_value test1 post-reload false N
+
+	log_test "params test"
+}
+
 setup_prepare()
 {
 	modprobe netdevsim
-- 
2.21.0


^ permalink raw reply related

* [patch net-next v4 0/2] netdevsim: implement support for devlink region and snapshots
From: Jiri Pirko @ 2019-08-15 13:46 UTC (permalink / raw)
  To: netdev; +Cc: davem, jakub.kicinski, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

Implement devlink region support for netdevsim and test it.

---
Note the selftest patch depends on "[patch net-next] selftests:
netdevsim: add devlink params tests" patch sent earlier today.

Jiri Pirko (2):
  netdevsim: implement support for devlink region and snapshots
  selftests: netdevsim: add devlink regions tests

 drivers/net/netdevsim/dev.c                   | 63 ++++++++++++++++++-
 drivers/net/netdevsim/netdevsim.h             |  1 +
 .../drivers/net/netdevsim/devlink.sh          | 54 +++++++++++++++-
 3 files changed, 116 insertions(+), 2 deletions(-)

-- 
2.21.0


^ permalink raw reply

* [patch net-next v4 1/2] netdevsim: implement support for devlink region and snapshots
From: Jiri Pirko @ 2019-08-15 13:46 UTC (permalink / raw)
  To: netdev; +Cc: davem, jakub.kicinski, mlxsw
In-Reply-To: <20190815134634.9858-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Implement dummy region of size 32K and allow user to create snapshots
or random data using debugfs file trigger.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v3->v4:
- sorted nsim_dev_take_snapshot_write() variables.
v2->v3:
- return -ENOMEM in case dummy_data cannot be allocated
  and don't print out error message.
- return err in case snapshot creation fails and kfree dummy_data.
- use PTR_ERR_OR_ZERO in nsim_dev_dummy_region_init().
---
 drivers/net/netdevsim/dev.c       | 63 ++++++++++++++++++++++++++++++-
 drivers/net/netdevsim/netdevsim.h |  1 +
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 08ca59fc189b..a570da406d1d 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -27,6 +27,41 @@
 
 static struct dentry *nsim_dev_ddir;
 
+#define NSIM_DEV_DUMMY_REGION_SIZE (1024 * 32)
+
+static ssize_t nsim_dev_take_snapshot_write(struct file *file,
+					    const char __user *data,
+					    size_t count, loff_t *ppos)
+{
+	struct nsim_dev *nsim_dev = file->private_data;
+	void *dummy_data;
+	int err;
+	u32 id;
+
+	dummy_data = kmalloc(NSIM_DEV_DUMMY_REGION_SIZE, GFP_KERNEL);
+	if (!dummy_data)
+		return -ENOMEM;
+
+	get_random_bytes(dummy_data, NSIM_DEV_DUMMY_REGION_SIZE);
+
+	id = devlink_region_shapshot_id_get(priv_to_devlink(nsim_dev));
+	err = devlink_region_snapshot_create(nsim_dev->dummy_region,
+					     dummy_data, id, kfree);
+	if (err) {
+		pr_err("Failed to create region snapshot\n");
+		kfree(dummy_data);
+		return err;
+	}
+
+	return count;
+}
+
+static const struct file_operations nsim_dev_take_snapshot_fops = {
+	.open = simple_open,
+	.write = nsim_dev_take_snapshot_write,
+	.llseek = generic_file_llseek,
+};
+
 static int nsim_dev_debugfs_init(struct nsim_dev *nsim_dev)
 {
 	char dev_ddir_name[16];
@@ -44,6 +79,8 @@ static int nsim_dev_debugfs_init(struct nsim_dev *nsim_dev)
 			   &nsim_dev->max_macs);
 	debugfs_create_bool("test1", 0600, nsim_dev->ddir,
 			    &nsim_dev->test1);
+	debugfs_create_file("take_snapshot", 0200, nsim_dev->ddir, nsim_dev,
+			    &nsim_dev_take_snapshot_fops);
 	return 0;
 }
 
@@ -248,6 +285,23 @@ static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink)
 		nsim_dev->test1 = saved_value.vbool;
 }
 
+#define NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX 16
+
+static int nsim_dev_dummy_region_init(struct nsim_dev *nsim_dev,
+				      struct devlink *devlink)
+{
+	nsim_dev->dummy_region =
+		devlink_region_create(devlink, "dummy",
+				      NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX,
+				      NSIM_DEV_DUMMY_REGION_SIZE);
+	return PTR_ERR_OR_ZERO(nsim_dev->dummy_region);
+}
+
+static void nsim_dev_dummy_region_exit(struct nsim_dev *nsim_dev)
+{
+	devlink_region_destroy(nsim_dev->dummy_region);
+}
+
 static int nsim_dev_reload(struct devlink *devlink,
 			   struct netlink_ext_ack *extack)
 {
@@ -363,10 +417,14 @@ nsim_dev_create(struct nsim_bus_dev *nsim_bus_dev, unsigned int port_count)
 		goto err_dl_unregister;
 	nsim_devlink_set_params_init_values(nsim_dev, devlink);
 
-	err = nsim_dev_debugfs_init(nsim_dev);
+	err = nsim_dev_dummy_region_init(nsim_dev, devlink);
 	if (err)
 		goto err_params_unregister;
 
+	err = nsim_dev_debugfs_init(nsim_dev);
+	if (err)
+		goto err_dummy_region_exit;
+
 	err = nsim_bpf_dev_init(nsim_dev);
 	if (err)
 		goto err_debugfs_exit;
@@ -376,6 +434,8 @@ nsim_dev_create(struct nsim_bus_dev *nsim_bus_dev, unsigned int port_count)
 
 err_debugfs_exit:
 	nsim_dev_debugfs_exit(nsim_dev);
+err_dummy_region_exit:
+	nsim_dev_dummy_region_exit(nsim_dev);
 err_params_unregister:
 	devlink_params_unregister(devlink, nsim_devlink_params,
 				  ARRAY_SIZE(nsim_devlink_params));
@@ -396,6 +456,7 @@ static void nsim_dev_destroy(struct nsim_dev *nsim_dev)
 
 	nsim_bpf_dev_exit(nsim_dev);
 	nsim_dev_debugfs_exit(nsim_dev);
+	nsim_dev_dummy_region_exit(nsim_dev);
 	devlink_params_unregister(devlink, nsim_devlink_params,
 				  ARRAY_SIZE(nsim_devlink_params));
 	devlink_unregister(devlink);
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 95751a817508..4c758c6919f5 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -160,6 +160,7 @@ struct nsim_dev {
 	bool fw_update_status;
 	u32 max_macs;
 	bool test1;
+	struct devlink_region *dummy_region;
 };
 
 int nsim_dev_init(void);
-- 
2.21.0


^ permalink raw reply related

* [patch net-next v4 2/2] selftests: netdevsim: add devlink regions tests
From: Jiri Pirko @ 2019-08-15 13:46 UTC (permalink / raw)
  To: netdev; +Cc: davem, jakub.kicinski, mlxsw
In-Reply-To: <20190815134634.9858-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Test netdevsim devlink region implementation.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v1->v2:
- new patch
---
 .../drivers/net/netdevsim/devlink.sh          | 54 ++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
index 6828e9404460..115837355eaf 100755
--- a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
@@ -3,7 +3,7 @@
 
 lib_dir=$(dirname $0)/../../../net/forwarding
 
-ALL_TESTS="fw_flash_test params_test"
+ALL_TESTS="fw_flash_test params_test regions_test"
 NUM_NETIFS=0
 source $lib_dir/lib.sh
 
@@ -90,6 +90,58 @@ params_test()
 	log_test "params test"
 }
 
+check_region_size()
+{
+	local name=$1
+	local size
+
+	size=$(devlink region show $DL_HANDLE/$name -j | jq -e -r '.[][].size')
+	check_err $? "Failed to get $name region size"
+	[ $size -eq 32768 ]
+	check_err $? "Invalid $name region size"
+}
+
+check_region_snapshot_count()
+{
+	local name=$1
+	local phase_name=$2
+	local expected_count=$3
+	local count
+
+	count=$(devlink region show $DL_HANDLE/$name -j | jq -e -r '.[][].snapshot | length')
+	[ $count -eq $expected_count ]
+	check_err $? "Unexpected $phase_name snapshot count"
+}
+
+regions_test()
+{
+	RET=0
+
+	local count
+
+	check_region_size dummy
+	check_region_snapshot_count dummy initial 0
+
+	echo ""> $DEBUGFS_DIR/take_snapshot
+	check_err $? "Failed to take first dummy region snapshot"
+	check_region_snapshot_count dummy post-first-snapshot 1
+
+	echo ""> $DEBUGFS_DIR/take_snapshot
+	check_err $? "Failed to take second dummy region snapshot"
+	check_region_snapshot_count dummy post-second-snapshot 2
+
+	echo ""> $DEBUGFS_DIR/take_snapshot
+	check_err $? "Failed to take third dummy region snapshot"
+	check_region_snapshot_count dummy post-third-snapshot 3
+
+	devlink region del $DL_HANDLE/dummy snapshot 1
+	check_err $? "Failed to delete first dummy region snapshot"
+
+	check_region_snapshot_count dummy post-first-delete 2
+
+	log_test "regions test"
+}
+
 setup_prepare()
 {
 	modprobe netdevsim
-- 
2.21.0


^ permalink raw reply related

* RE: [net-next  1/1] tipc: clean up skb list lock handling on send path
From: Jon Maloy @ 2019-08-15 13:54 UTC (permalink / raw)
  To: Xin Long
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Tung Quang Nguyen,
	Hoang Huu Le, shuali@redhat.com, ying xue, edumazet@google.com,
	tipc-discussion@lists.sourceforge.net
In-Reply-To: <2060574598.8620705.1565848654584.JavaMail.zimbra@redhat.com>



> -----Original Message-----
> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org> On
> Behalf Of Xin Long
> Sent: 15-Aug-19 01:58
> To: Jon Maloy <jon.maloy@ericsson.com>
> Cc: davem@davemloft.net; netdev@vger.kernel.org; Tung Quang Nguyen
> <tung.q.nguyen@dektech.com.au>; Hoang Huu Le
> <hoang.h.le@dektech.com.au>; shuali@redhat.com; ying xue
> <ying.xue@windriver.com>; edumazet@google.com; tipc-
> discussion@lists.sourceforge.net
> Subject: Re: [net-next 1/1] tipc: clean up skb list lock handling on send path
> 
> 

[...]

> >  	/* Try again later if socket is busy */
> > --
> > 2.1.4
> >
> >
> Patch looks good, can you also check those tmp tx queues in:
> 
>   tipc_group_cong()
>   tipc_group_join()
>   tipc_link_create_dummy_tnl_msg()
>   tipc_link_tnl_prepare()
> 
> which are using skb_queue_head_init() to init?
> 
> Thanks.

You are right. I missed those. I'll post a v2 of this patch.
///jon

^ permalink raw reply

* Re: [RFC bpf-next 0/3] tools: bpftool: add subcommand to count map entries
From: Quentin Monnet @ 2019-08-15 14:02 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Edward Cree, Alexei Starovoitov,
	Daniel Borkmann, bpf, Network Development, oss-drivers
In-Reply-To: <CAEf4BzYqsT4OmWQ9WK9dmnKT9cMcjbhgHZmboUBgkEvtbaeUeA@mail.gmail.com>

2019-08-14 13:18 UTC-0700 ~ Andrii Nakryiko <andrii.nakryiko@gmail.com>
> On Wed, Aug 14, 2019 at 10:12 AM Quentin Monnet
> <quentin.monnet@netronome.com> wrote:
>>
>> 2019-08-14 09:58 UTC-0700 ~ Alexei Starovoitov
>> <alexei.starovoitov@gmail.com>
>>> On Wed, Aug 14, 2019 at 9:45 AM Edward Cree <ecree@solarflare.com> wrote:
>>>>
>>>> On 14/08/2019 10:42, Quentin Monnet wrote:
>>>>> 2019-08-13 18:51 UTC-0700 ~ Alexei Starovoitov
>>>>> <alexei.starovoitov@gmail.com>
>>>>>> The same can be achieved by 'bpftool map dump|grep key|wc -l', no?
>>>>> To some extent (with subtleties for some other map types); and we use a
>>>>> similar command line as a workaround for now. But because of the rate of
>>>>> inserts/deletes in the map, the process often reports a number higher
>>>>> than the max number of entries (we observed up to ~750k when max_entries
>>>>> is 500k), even is the map is only half-full on average during the count.
>>>>> On the worst case (though not frequent), an entry is deleted just before
>>>>> we get the next key from it, and iteration starts all over again. This
>>>>> is not reliable to determine how much space is left in the map.
>>>>>
>>>>> I cannot see a solution that would provide a more accurate count from
>>>>> user space, when the map is under pressure?
>>>> This might be a really dumb suggestion, but: you're wanting to collect a
>>>>  summary statistic over an in-kernel data structure in a single syscall,
>>>>  because making a series of syscalls to examine every entry is slow and
>>>>  racy.  Isn't that exactly a job for an in-kernel virtual machine, and
>>>>  could you not supply an eBPF program which the kernel runs on each entry
>>>>  in the map, thus supporting people who want to calculate something else
>>>>  (mean, min and max, whatever) instead of count?
>>>
>>> Pretty much my suggestion as well :)
> 
> I also support the suggestion to count it from BPF side. It's flexible
> and powerful approach and doesn't require adding more and more nuanced
> sub-APIs to kernel to support subset of bulk operations on map
> (subset, because we'll expose count, but what about, e.g., p50, etc,
> there will always be something more that someone will want and it just
> doesn't scale).

Hi Andrii,
Yes, that makes sense.

> 
>>>
>>> It seems the better fix for your nat threshold is to keep count of
>>> elements in the map in a separate global variable that
>>> bpf program manually increments and decrements.
>>> bpftool will dump it just as regular map of single element.
>>> (I believe it doesn't recognize global variables properly yet)
>>> and BTF will be there to pick exactly that 'count' variable.
>>>
>>
>> It would be with an offloaded map, but yes, I suppose we could keep
>> track of the numbers in a separate map. We'll have a look into this.
> 
> See if you can use a global variable, that way you completely
> eliminate any overhead from BPF side of things, except for atomic
> increment.

Offloaded maps do not implement the map_direct_value_addr() operation,
so global variables are not supported at the moment. I need to dive
deeper into this and see what is required to add that support.

Thanks for your advice!
Quentin

^ permalink raw reply

* Re: [PATCH v5 10/18] compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
From: Greg Kroah-Hartman @ 2019-08-15 14:09 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-kernel, viro, linux-fsdevel, Jiri Slaby, David S. Miller,
	netdev
In-Reply-To: <20190814205521.122180-1-arnd@arndb.de>

On Wed, Aug 14, 2019 at 10:54:45PM +0200, Arnd Bergmann wrote:
> All users of this call are in socket or tty code, so handling
> it there means we can avoid the table entry in fs/compat_ioctl.c.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  drivers/tty/tty_io.c | 1 +
>  fs/compat_ioctl.c    | 2 --
>  net/socket.c         | 2 ++
>  3 files changed, 3 insertions(+), 2 deletions(-)a

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply

* Re: [RFC bpf-next 0/3] tools: bpftool: add subcommand to count map entries
From: Quentin Monnet @ 2019-08-15 14:15 UTC (permalink / raw)
  To: Edward Cree, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, netdev, oss-drivers
In-Reply-To: <84aa97e3-5fde-e041-12c6-85863e27d2d9@solarflare.com>

2019-08-14 18:14 UTC+0100 ~ Edward Cree <ecree@solarflare.com>
> On 14/08/2019 17:58, Quentin Monnet wrote:
>> 2019-08-14 17:45 UTC+0100 ~ Edward Cree <ecree@solarflare.com>
>>> This might be a really dumb suggestion, but: you're wanting to collect a
>>>  summary statistic over an in-kernel data structure in a single syscall,
>>>  because making a series of syscalls to examine every entry is slow and
>>>  racy.  Isn't that exactly a job for an in-kernel virtual machine, and
>>>  could you not supply an eBPF program which the kernel runs on each entry
>>>  in the map, thus supporting people who want to calculate something else
>>>  (mean, min and max, whatever) instead of count?
>>>
>> Hi Edward, I like the approach, thanks for the suggestion.
>>
>> But I did not mention that we were using offloaded maps: Tracing the
>> kernel would probably work for programs running on the host, but this is
>> not a solution we could extend to hardware offload.
> I don't see where "tracing" comes into it; this is a new program type and
>  a new map op under the bpf() syscall.
> Could the user-supplied BPF program not then be passed down to the device
>  for it to run against its offloaded maps?
> 

Sorry, I misunderstood your suggestion :s (I thought you meant tracing
the insert and delete operations).

So if I understand correctly, we would use the bpf() syscall to trigger
a run of such program on all map entries (for map implementing the new
operation), and the context would include pointers to the key and the
value for the entry being processed so we can count/sum/compute an
average of the values or any other kind of processing?

Thanks,
Quentin

^ permalink raw reply

* Re: [RFC PATCH 1/5] x86: tsc: add tsc to art helpers
From: Thomas Gleixner @ 2019-08-15 14:16 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Richard Cochran, netdev, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, linux-kernel, Christopher S . Hall
In-Reply-To: <87y2zvt1hk.fsf@gmail.com>

Felipe,

On Thu, 15 Aug 2019, Felipe Balbi wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> > On Tue, 16 Jul 2019, Felipe Balbi wrote:
> >
> > So some information what those interfaces are used for and why they are
> > needed would be really helpful.
> 
> Okay, I have some more details about this. The TGPIO device itself uses
> ART since TSC is not directly available to anything other than the
> CPU. The 'problem' here is that reading ART incurs extra latency which
> we would like to avoid. Therefore, we use TSC and scale it to
> nanoseconds which, would be the same as ART to ns.

Fine. But that's not really correct:

      TSC = art_to_tsc_offset + ART * scale;
 
> >> +void get_tsc_ns(struct system_counterval_t *tsc_counterval, u64 *tsc_ns)

Why is this not returning the result instead of having that pointer
indirection?

> >> +{
> >> +	u64 tmp, res, rem;
> >> +	u64 cycles;
> >> +
> >> +	tsc_counterval->cycles = clocksource_tsc.read(NULL);
> >> +	cycles = tsc_counterval->cycles;
> >> +	tsc_counterval->cs = art_related_clocksource;

So this does more than returning the TSC time converted to nanoseconds. The
function name should reflect this. Plus both functions want kernel-doc
explaining what they do.

> >> +	rem = do_div(cycles, tsc_khz);
> >> +
> >> +	res = cycles * USEC_PER_SEC;
> >> +	tmp = rem * USEC_PER_SEC;
> >> +
> >> +	do_div(tmp, tsc_khz);
> >> +	res += tmp;
> >> +
> >> +	*tsc_ns = res;
> >> +}
> >> +EXPORT_SYMBOL(get_tsc_ns);
> >> +
> >> +u64 get_art_ns_now(void)
> >> +{
> >> +	struct system_counterval_t tsc_cycles;
> >> +	u64 tsc_ns;
> >> +
> >> +	get_tsc_ns(&tsc_cycles, &tsc_ns);
> >> +
> >> +	return tsc_ns;
> >> +}
> >> +EXPORT_SYMBOL(get_art_ns_now);
> >
> > While the changes look innocuous I'm missing the big picture why this needs
> > to emulate ART instead of simply using TSC directly.
> 
> i don't think we're emulating ART here (other than the name in the
> function). We're just reading TSC and converting to nanoseconds, right?

Well, the function name says clearly: get_art_ns_now(). But you are not
using ART, you use the TSC and derive the ART value from it (incorrectly).

Thanks,

	tglx


^ permalink raw reply

* [PATCH -next] btf: fix return value check in btf_vmlinux_init()
From: Wei Yongjun @ 2019-08-15 14:24 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko
  Cc: Wei Yongjun, netdev, bpf, kernel-janitors

In case of error, the function kobject_create_and_add() returns NULL
pointer not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with NULL test.

Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
---
 kernel/bpf/sysfs_btf.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/sysfs_btf.c b/kernel/bpf/sysfs_btf.c
index 4659349fc795..be5557deb958 100644
--- a/kernel/bpf/sysfs_btf.c
+++ b/kernel/bpf/sysfs_btf.c
@@ -30,16 +30,13 @@ static struct kobject *btf_kobj;
 
 static int __init btf_vmlinux_init(void)
 {
-	int err;
-
 	if (!_binary__btf_vmlinux_bin_start)
 		return 0;
 
 	btf_kobj = kobject_create_and_add("btf", kernel_kobj);
-	if (IS_ERR(btf_kobj)) {
-		err = PTR_ERR(btf_kobj);
+	if (!btf_kobj) {
 		btf_kobj = NULL;
-		return err;
+		return -ENOMEM;
 	}
 
 	bin_attr_btf_vmlinux.size = _binary__btf_vmlinux_bin_end -




^ permalink raw reply related

* [PATCH bpf] tools: bpftool: close prog FD before exit on showing a single program
From: Quentin Monnet @ 2019-08-15 14:22 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet

When showing metadata about a single program by invoking
"bpftool prog show PROG", the file descriptor referring to the program
is not closed before returning from the function. Let's close it.

Fixes: 71bb428fe2c1 ("tools: bpf: add bpftool")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/bpf/bpftool/prog.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 66f04a4846a5..43fdbbfe41bb 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -363,7 +363,9 @@ static int do_show(int argc, char **argv)
 		if (fd < 0)
 			return -1;
 
-		return show_prog(fd);
+		err = show_prog(fd);
+		close(fd);
+		return err;
 	}
 
 	if (argc)
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf 0/6] tools: bpftool: fix printf()-like functions
From: Quentin Monnet @ 2019-08-15 14:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet

Hi,
Because the "__printf()" attributes were used only where the functions are
implemented, and not in header files, the checks have not been enforced on
all the calls to printf()-like functions, and a number of errors slipped in
bpftool over time.

This set cleans up such errors, and then moves the "__printf()" attributes
to header files, so that the checks are performed at all locations.

Quentin Monnet (6):
  tools: bpftool: fix arguments for p_err() in do_event_pipe()
  tools: bpftool: fix format strings and arguments for jsonw_printf()
  tools: bpftool: fix argument for p_err() in BTF do_dump()
  tools: bpftool: fix format string for p_err() in
    query_flow_dissector()
  tools: bpftool: fix format string for p_err() in
    detect_common_prefix()
  tools: bpftool: move "__printf()" attributes to header file

 tools/bpf/bpftool/btf.c            | 2 +-
 tools/bpf/bpftool/btf_dumper.c     | 8 ++++----
 tools/bpf/bpftool/common.c         | 4 ++--
 tools/bpf/bpftool/json_writer.c    | 6 ++----
 tools/bpf/bpftool/json_writer.h    | 6 ++++--
 tools/bpf/bpftool/main.c           | 2 +-
 tools/bpf/bpftool/main.h           | 4 ++--
 tools/bpf/bpftool/map_perf_ring.c  | 4 ++--
 tools/bpf/bpftool/net.c            | 2 +-
 tools/include/linux/compiler-gcc.h | 2 ++
 10 files changed, 21 insertions(+), 19 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [PATCH bpf 1/6] tools: bpftool: fix arguments for p_err() in do_event_pipe()
From: Quentin Monnet @ 2019-08-15 14:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190815143220.4199-1-quentin.monnet@netronome.com>

The last argument passed to some calls to the p_err() functions is not
correct, it should be "*argv" instead of "**argv". This may lead to a
segmentation fault error if CPU IDs or indices from the command line
cannot be parsed correctly. Let's fix this.

Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/bpf/bpftool/map_perf_ring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c
index 3f108ab17797..4c5531d1a450 100644
--- a/tools/bpf/bpftool/map_perf_ring.c
+++ b/tools/bpf/bpftool/map_perf_ring.c
@@ -157,7 +157,7 @@ int do_event_pipe(int argc, char **argv)
 			NEXT_ARG();
 			ctx.cpu = strtoul(*argv, &endptr, 0);
 			if (*endptr) {
-				p_err("can't parse %s as CPU ID", **argv);
+				p_err("can't parse %s as CPU ID", *argv);
 				goto err_close_map;
 			}
 
@@ -168,7 +168,7 @@ int do_event_pipe(int argc, char **argv)
 			NEXT_ARG();
 			ctx.idx = strtoul(*argv, &endptr, 0);
 			if (*endptr) {
-				p_err("can't parse %s as index", **argv);
+				p_err("can't parse %s as index", *argv);
 				goto err_close_map;
 			}
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf 2/6] tools: bpftool: fix format strings and arguments for jsonw_printf()
From: Quentin Monnet @ 2019-08-15 14:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190815143220.4199-1-quentin.monnet@netronome.com>

There are some mismatches between format strings and arguments passed to
jsonw_printf() in the BTF dumper for bpftool, which seems harmless but
may result in warnings if the "__printf()" attribute is used correctly
for jsonw_printf(). Let's fix relevant format strings and type cast.

Fixes: b12d6ec09730 ("bpf: btf: add btf print functionality")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/bpf/bpftool/btf_dumper.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/bpf/bpftool/btf_dumper.c b/tools/bpf/bpftool/btf_dumper.c
index 8cafb9b31467..d66131f69689 100644
--- a/tools/bpf/bpftool/btf_dumper.c
+++ b/tools/bpf/bpftool/btf_dumper.c
@@ -26,9 +26,9 @@ static void btf_dumper_ptr(const void *data, json_writer_t *jw,
 			   bool is_plain_text)
 {
 	if (is_plain_text)
-		jsonw_printf(jw, "%p", *(unsigned long *)data);
+		jsonw_printf(jw, "%p", data);
 	else
-		jsonw_printf(jw, "%u", *(unsigned long *)data);
+		jsonw_printf(jw, "%lu", *(unsigned long *)data);
 }
 
 static int btf_dumper_modifier(const struct btf_dumper *d, __u32 type_id,
@@ -216,7 +216,7 @@ static int btf_dumper_int(const struct btf_type *t, __u8 bit_offset,
 	switch (BTF_INT_ENCODING(*int_type)) {
 	case 0:
 		if (BTF_INT_BITS(*int_type) == 64)
-			jsonw_printf(jw, "%lu", *(__u64 *)data);
+			jsonw_printf(jw, "%llu", *(__u64 *)data);
 		else if (BTF_INT_BITS(*int_type) == 32)
 			jsonw_printf(jw, "%u", *(__u32 *)data);
 		else if (BTF_INT_BITS(*int_type) == 16)
@@ -229,7 +229,7 @@ static int btf_dumper_int(const struct btf_type *t, __u8 bit_offset,
 		break;
 	case BTF_INT_SIGNED:
 		if (BTF_INT_BITS(*int_type) == 64)
-			jsonw_printf(jw, "%ld", *(long long *)data);
+			jsonw_printf(jw, "%lld", *(long long *)data);
 		else if (BTF_INT_BITS(*int_type) == 32)
 			jsonw_printf(jw, "%d", *(int *)data);
 		else if (BTF_INT_BITS(*int_type) == 16)
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf 4/6] tools: bpftool: fix format string for p_err() in query_flow_dissector()
From: Quentin Monnet @ 2019-08-15 14:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190815143220.4199-1-quentin.monnet@netronome.com>

The format string passed to one call to the p_err() function in
query_flow_dissector() does not match the value that should be printed,
resulting in some garbage integer being printed instead of
strerror(errno) if /proc/self/ns/net cannot be open. Let's fix the
format string.

Fixes: 7f0c57fec80f ("bpftool: show flow_dissector attachment status")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/bpf/bpftool/net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index 67e99c56bc88..e3b770082a39 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -197,7 +197,7 @@ static int query_flow_dissector(struct bpf_attach_info *attach_info)
 
 	fd = open("/proc/self/ns/net", O_RDONLY);
 	if (fd < 0) {
-		p_err("can't open /proc/self/ns/net: %d",
+		p_err("can't open /proc/self/ns/net: %s",
 		      strerror(errno));
 		return -1;
 	}
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf 5/6] tools: bpftool: fix format string for p_err() in detect_common_prefix()
From: Quentin Monnet @ 2019-08-15 14:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190815143220.4199-1-quentin.monnet@netronome.com>

There is one call to the p_err() function in detect_common_prefix()
where the message to print is passed directly as the first argument,
without using a format string. This is harmless, but may trigger
warnings if the "__printf()" attribute is used correctly for the p_err()
function. Let's fix it by using a "%s" format string.

Fixes: ba95c7452439 ("tools: bpftool: add "prog run" subcommand to test-run programs")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/bpf/bpftool/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index e916ff25697f..93d008687020 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -139,7 +139,7 @@ int detect_common_prefix(const char *arg, ...)
 	strncat(msg, "'", sizeof(msg) - strlen(msg) - 1);
 
 	if (count >= 2) {
-		p_err(msg);
+		p_err("%s", msg);
 		return -1;
 	}
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf 3/6] tools: bpftool: fix argument for p_err() in BTF do_dump()
From: Quentin Monnet @ 2019-08-15 14:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190815143220.4199-1-quentin.monnet@netronome.com>

The last argument passed to one call to the p_err() function is not
correct, it should be "*argv" instead of "**argv". This may lead to a
segmentation fault error if BTF id cannot be parsed correctly. Let's fix
this.

Fixes: c93cc69004dt ("bpftool: add ability to dump BTF types")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/bpf/bpftool/btf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/btf.c b/tools/bpf/bpftool/btf.c
index 1b8ec91899e6..8805637f1a7e 100644
--- a/tools/bpf/bpftool/btf.c
+++ b/tools/bpf/bpftool/btf.c
@@ -449,7 +449,7 @@ static int do_dump(int argc, char **argv)
 
 		btf_id = strtoul(*argv, &endptr, 0);
 		if (*endptr) {
-			p_err("can't parse %s as ID", **argv);
+			p_err("can't parse %s as ID", *argv);
 			return -1;
 		}
 		NEXT_ARG();
-- 
2.17.1


^ permalink raw reply related

* [PATCH bpf 6/6] tools: bpftool: move "__printf()" attributes to header file
From: Quentin Monnet @ 2019-08-15 14:32 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet
In-Reply-To: <20190815143220.4199-1-quentin.monnet@netronome.com>

Some functions in bpftool have a "__printf()" format attributes to tell
the compiler they should expect printf()-like arguments. But because
these attributes are not used for the function prototypes in the header
files, the compiler does not run the checks everywhere the functions are
used, and some mistakes on format string and corresponding arguments
slipped in over time.

Let's move the __printf() attributes to the correct places.

Note: We add guards around the definition of GCC_VERSION in
tools/include/linux/compiler-gcc.h to prevent a conflict in jit_disasm.c
on GCC_VERSION from headers pulled via libbfd.

Fixes: c101189bc968 ("tools: bpftool: fix -Wmissing declaration warnings")
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/bpf/bpftool/common.c         | 4 ++--
 tools/bpf/bpftool/json_writer.c    | 6 ++----
 tools/bpf/bpftool/json_writer.h    | 6 ++++--
 tools/bpf/bpftool/main.h           | 4 ++--
 tools/include/linux/compiler-gcc.h | 2 ++
 5 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 6a71324be628..88264abaa738 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -29,7 +29,7 @@
 #define BPF_FS_MAGIC		0xcafe4a11
 #endif
 
-void __printf(1, 2) p_err(const char *fmt, ...)
+void p_err(const char *fmt, ...)
 {
 	va_list ap;
 
@@ -47,7 +47,7 @@ void __printf(1, 2) p_err(const char *fmt, ...)
 	va_end(ap);
 }
 
-void __printf(1, 2) p_info(const char *fmt, ...)
+void p_info(const char *fmt, ...)
 {
 	va_list ap;
 
diff --git a/tools/bpf/bpftool/json_writer.c b/tools/bpf/bpftool/json_writer.c
index 6046dcab51cc..86501cd3c763 100644
--- a/tools/bpf/bpftool/json_writer.c
+++ b/tools/bpf/bpftool/json_writer.c
@@ -15,7 +15,6 @@
 #include <malloc.h>
 #include <inttypes.h>
 #include <stdint.h>
-#include <linux/compiler.h>
 
 #include "json_writer.h"
 
@@ -153,8 +152,7 @@ void jsonw_name(json_writer_t *self, const char *name)
 		putc(' ', self->out);
 }
 
-void __printf(2, 0)
-jsonw_vprintf_enquote(json_writer_t *self, const char *fmt, va_list ap)
+void jsonw_vprintf_enquote(json_writer_t *self, const char *fmt, va_list ap)
 {
 	jsonw_eor(self);
 	putc('"', self->out);
@@ -162,7 +160,7 @@ jsonw_vprintf_enquote(json_writer_t *self, const char *fmt, va_list ap)
 	putc('"', self->out);
 }
 
-void __printf(2, 3) jsonw_printf(json_writer_t *self, const char *fmt, ...)
+void jsonw_printf(json_writer_t *self, const char *fmt, ...)
 {
 	va_list ap;
 
diff --git a/tools/bpf/bpftool/json_writer.h b/tools/bpf/bpftool/json_writer.h
index cb9a1993681c..35cf1f00f96c 100644
--- a/tools/bpf/bpftool/json_writer.h
+++ b/tools/bpf/bpftool/json_writer.h
@@ -14,6 +14,7 @@
 #include <stdbool.h>
 #include <stdint.h>
 #include <stdarg.h>
+#include <linux/compiler.h>
 
 /* Opaque class structure */
 typedef struct json_writer json_writer_t;
@@ -30,8 +31,9 @@ void jsonw_pretty(json_writer_t *self, bool on);
 void jsonw_name(json_writer_t *self, const char *name);
 
 /* Add value  */
-void jsonw_vprintf_enquote(json_writer_t *self, const char *fmt, va_list ap);
-void jsonw_printf(json_writer_t *self, const char *fmt, ...);
+void __printf(2, 0) jsonw_vprintf_enquote(json_writer_t *self, const char *fmt,
+					  va_list ap);
+void __printf(2, 3) jsonw_printf(json_writer_t *self, const char *fmt, ...);
 void jsonw_string(json_writer_t *self, const char *value);
 void jsonw_bool(json_writer_t *self, bool value);
 void jsonw_float(json_writer_t *self, double number);
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 7031a4bf87a0..af9ad56c303a 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -98,8 +98,8 @@ extern int bpf_flags;
 extern struct pinned_obj_table prog_table;
 extern struct pinned_obj_table map_table;
 
-void p_err(const char *fmt, ...);
-void p_info(const char *fmt, ...);
+void __printf(1, 2) p_err(const char *fmt, ...);
+void __printf(1, 2) p_info(const char *fmt, ...);
 
 bool is_prefix(const char *pfx, const char *str);
 int detect_common_prefix(const char *arg, ...);
diff --git a/tools/include/linux/compiler-gcc.h b/tools/include/linux/compiler-gcc.h
index 0d35f18006a1..95c072b70d0e 100644
--- a/tools/include/linux/compiler-gcc.h
+++ b/tools/include/linux/compiler-gcc.h
@@ -6,9 +6,11 @@
 /*
  * Common definitions for all gcc versions go here.
  */
+#ifndef GCC_VERSION
 #define GCC_VERSION (__GNUC__ * 10000		\
 		     + __GNUC_MINOR__ * 100	\
 		     + __GNUC_PATCHLEVEL__)
+#endif
 
 #if GCC_VERSION >= 70000 && !defined(__CHECKER__)
 # define __fallthrough __attribute__ ((fallthrough))
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next v2 0/4] net/rds: Fixes from internal Oracle repo
From: Gerd Rausch @ 2019-08-15 14:40 UTC (permalink / raw)
  To: Santosh Shilimkar, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <20190814.212525.326606319186601317.davem@davemloft.net>

This is the first set of (mostly old) patches from our internal repository
in an effort to synchronize what Oracle had been using internally
with what is shipped with the Linux kernel.

Andy Grover (1):
  rds: check for excessive looping in rds_send_xmit

Chris Mason (2):
  RDS: limit the number of times we loop in rds_send_xmit
  RDS: don't use GFP_ATOMIC for sk_alloc in rds_create

Gerd Rausch (1):
  net/rds: Add a few missing rds_stat_names entries

 net/rds/af_rds.c  |  2 +-
 net/rds/ib_recv.c | 12 +++++++++++-
 net/rds/rds.h     |  2 +-
 net/rds/send.c    | 12 ++++++++++++
 net/rds/stats.c   |  3 +++
 5 files changed, 28 insertions(+), 3 deletions(-)

-- 

Changes in submitted patch v2:
* Dropped the controversial "sysctl" patch:
  https://lore.kernel.org/netdev/20190814.142112.1080694155114782651.davem@davemloft.net/

^ permalink raw reply

* [PATCH net-next v2 1/4] RDS: limit the number of times we loop in rds_send_xmit
From: Gerd Rausch @ 2019-08-15 14:42 UTC (permalink / raw)
  To: Santosh Shilimkar, netdev, linux-rdma, rds-devel; +Cc: David Miller
In-Reply-To: <cover.1565879451.git.gerd.rausch@oracle.com>

From: Chris Mason <chris.mason@oracle.com>
Date: Fri, 3 Feb 2012 11:07:54 -0500

This will kick the RDS worker thread if we have been looping
too long.

Original commit from 2012 updated to include a change by
Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
that triggers "must_wake" if "rds_ib_recv_refill_one" fails.

Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
---
 net/rds/ib_recv.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 3cae88cbdaa0..1a8a4a760b84 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -385,6 +385,7 @@ void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp)
 	unsigned int posted = 0;
 	int ret = 0;
 	bool can_wait = !!(gfp & __GFP_DIRECT_RECLAIM);
+	bool must_wake = false;
 	u32 pos;
 
 	/* the goal here is to just make sure that someone, somewhere
@@ -405,6 +406,7 @@ void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp)
 		recv = &ic->i_recvs[pos];
 		ret = rds_ib_recv_refill_one(conn, recv, gfp);
 		if (ret) {
+			must_wake = true;
 			break;
 		}
 
@@ -423,6 +425,11 @@ void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp)
 		}
 
 		posted++;
+
+		if ((posted > 128 && need_resched()) || posted > 8192) {
+			must_wake = true;
+			break;
+		}
 	}
 
 	/* We're doing flow control - update the window. */
@@ -445,10 +452,13 @@ void rds_ib_recv_refill(struct rds_connection *conn, int prefill, gfp_t gfp)
 	 * if we should requeue.
 	 */
 	if (rds_conn_up(conn) &&
-	    ((can_wait && rds_ib_ring_low(&ic->i_recv_ring)) ||
+	    (must_wake ||
+	    (can_wait && rds_ib_ring_low(&ic->i_recv_ring)) ||
 	    rds_ib_ring_empty(&ic->i_recv_ring))) {
 		queue_delayed_work(rds_wq, &conn->c_recv_w, 1);
 	}
+	if (can_wait)
+		cond_resched();
 }
 
 /*
-- 
2.22.1



^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox