Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] tools: bpftool: add feature check for zlib
From: Daniel Borkmann @ 2019-08-13 14:24 UTC (permalink / raw)
  To: Peter Wu, Jakub Kicinski
  Cc: Stanislav Fomichev, Alexei Starovoitov, netdev, Quentin Monnet
In-Reply-To: <20190813003833.22042-2-peter@lekensteyn.nl>

On 8/13/19 2:38 AM, Peter Wu wrote:
> bpftool requires libelf, and zlib for decompressing /proc/config.gz.
> zlib is a transitive dependency via libelf, and became mandatory since
> elfutils 0.165 (Jan 2016). The feature check of libelf is already done
> in the elfdep target of tools/lib/bpf/Makefile, pulled in by bpftool via
> a dependency on libbpf.a. Add a similar feature check for zlib.
> 
> Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>

Applied, thanks!

^ permalink raw reply

* [PATCH] net: ieee802154: remove redundant assignment to rc
From: Colin King @ 2019-08-13 14:28 UTC (permalink / raw)
  To: Alexander Aring, Stefan Schmidt, David S . Miller, linux-wpan,
	netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Variable rc is initialized to a value that is never read and it is
re-assigned later. The initialization is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 net/ieee802154/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index dacbd58e1799..badc5cfe4dc6 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -1092,7 +1092,7 @@ static struct packet_type ieee802154_packet_type = {
 
 static int __init af_ieee802154_init(void)
 {
-	int rc = -EINVAL;
+	int rc;
 
 	rc = proto_register(&ieee802154_raw_prot, 1);
 	if (rc)
-- 
2.20.1


^ permalink raw reply related

* Re: kernel BUG at net/rxrpc/local_object.c:LINE!
From: Dmitry Vyukov @ 2019-08-13 14:28 UTC (permalink / raw)
  To: David Howells
  Cc: syzbot, Eric Biggers, David Miller, linux-afs, LKML, netdev,
	syzkaller-bugs
In-Reply-To: <3135.1565706180@warthog.procyon.org.uk>

On Tue, Aug 13, 2019 at 4:23 PM David Howells <dhowells@redhat.com> wrote:
>
> Dmitry Vyukov <dvyukov@google.com> wrote:
>
> > > > Please send a patch for testing that enables this tracing
> > > > unconditionally. This should have the same effect. There is no way to
> > > > hook into a middle of the automated process and arbitrary tune things.
> > >
> > > I don't know how to do that off hand.  Do you have an example?
> >
> > Few messages above I asked it to test:
> > https://groups.google.com/d/msg/syzkaller-bugs/gEnZkmEWf1s/r2_X_KVQAQAJ
> >
> > Basically, git repo + branch + patch. Here are the docs:
> > https://github.com/google/syzkaller/blob/master/docs/syzbot.md#testing-patches
>
> I meant that I don't know how to turn a tracepoint on from inside the kernel.

This /sys/kernel/debug/tracing/events/rxrpc/rxrpc_local/enable in:
        echo 1 > /sys/kernel/debug/tracing/events/rxrpc/rxrpc_local/enable
should map to some global variable, right? If so, it should be
possible to initialize that var to 1 statically. Or that won't work
for some reason?

^ permalink raw reply

* Re: [bpf-next] selftests/bpf: fix race in flow dissector tests
From: Daniel Borkmann @ 2019-08-13 14:32 UTC (permalink / raw)
  To: Petar Penkov, netdev, bpf; +Cc: davem, ast, sdf, Petar Penkov
In-Reply-To: <20190812233039.173067-1-ppenkov.kernel@gmail.com>

On 8/13/19 1:30 AM, Petar Penkov wrote:
> From: Petar Penkov <ppenkov@google.com>
> 
> Since the "last_dissection" map holds only the flow keys for the most
> recent packet, there is a small race in the skb-less flow dissector
> tests if a new packet comes between transmitting the test packet, and
> reading its keys from the map. If this happens, the test packet keys
> will be overwritten and the test will fail.
> 
> Changing the "last_dissection" map to a hash map, keyed on the
> source/dest port pair resolves this issue. Additionally, let's clear the
> last test results from the map between tests to prevent previous test
> cases from interfering with the following test cases.
> 
> Fixes: 0905beec9f52 ("selftests/bpf: run flow dissector tests in skb-less mode")
> Signed-off-by: Petar Penkov <ppenkov@google.com>

Applied, thanks!

^ permalink raw reply

* Re: [PATCH v2] net/mlx4_en: fix a memory leak bug
From: Tariq Toukan @ 2019-08-13 14:39 UTC (permalink / raw)
  To: Wenwen Wang
  Cc: Tariq Toukan, David S. Miller,
	open list:MELLANOX ETHERNET DRIVER (mlx4_en),
	open list:MELLANOX MLX4 core VPI driver, open list
In-Reply-To: <1565637095-7972-1-git-send-email-wenwen@cs.uga.edu>



On 8/12/2019 10:11 PM, Wenwen Wang wrote:
> In mlx4_en_config_rss_steer(), 'rss_map->indir_qp' is allocated through
> kzalloc(). After that, mlx4_qp_alloc() is invoked to configure RSS
> indirection. However, if mlx4_qp_alloc() fails, the allocated
> 'rss_map->indir_qp' is not deallocated, leading to a memory leak bug.
> 
> To fix the above issue, add the 'qp_alloc_err' label to free
> 'rss_map->indir_qp'.
> 
> Fixes: 4931c6ef04b4 ("net/mlx4_en: Optimized single ring steering")
> 
> Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 6c01314..db3552f 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -1187,7 +1187,7 @@ int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv)
>   	err = mlx4_qp_alloc(mdev->dev, priv->base_qpn, rss_map->indir_qp);
>   	if (err) {
>   		en_err(priv, "Failed to allocate RSS indirection QP\n");
> -		goto rss_err;
> +		goto qp_alloc_err;
>   	}
>   
>   	rss_map->indir_qp->event = mlx4_en_sqp_event;
> @@ -1241,6 +1241,7 @@ int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv)
>   		       MLX4_QP_STATE_RST, NULL, 0, 0, rss_map->indir_qp);
>   	mlx4_qp_remove(mdev->dev, rss_map->indir_qp);
>   	mlx4_qp_free(mdev->dev, rss_map->indir_qp);
> +qp_alloc_err:
>   	kfree(rss_map->indir_qp);
>   	rss_map->indir_qp = NULL;
>   rss_err:
> 

Thanks for your patch.
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>

^ permalink raw reply

* Re: [PATCH net] netdevsim: Restore per-network namespace accounting for fib entries
From: David Ahern @ 2019-08-13 14:41 UTC (permalink / raw)
  To: Jiri Pirko, David Miller; +Cc: dsahern, netdev
In-Reply-To: <20190813071445.GL2428@nanopsycho>

On 8/13/19 1:14 AM, Jiri Pirko wrote:
> Mon, Aug 12, 2019 at 05:28:02PM CEST, davem@davemloft.net wrote:
>> From: Jiri Pirko <jiri@resnulli.us>
>> Date: Mon, 12 Aug 2019 10:36:35 +0200
>>
>>> I understand it with real devices, but dummy testing device, who's
>>> purpose is just to test API. Why?
>>
>> Because you'll break all of the wonderful testing infrastructure
>> people like David have created.
>  
> Are you referring to selftests? There is no such test there :(

I  have one now and will be submitting it after net merges with net-next.

> But if it would be, could implement the limitations
> properly (like using cgroups), change the tests and remove this
> code from netdevsim?

The intent of this code and test is to have a s/w model similar to how
mlxsw works - responding to notifiers and deciding to reject a change.
You are currently adding (or trying to) more devlink based s/w tests, so
you must see the value of netdevsim as a source of testing.

^ permalink raw reply

* KMSAN: uninit-value in ipv6_find_tlv
From: syzbot @ 2019-08-13 14:48 UTC (permalink / raw)
  To: davem, glider, kuznet, linux-kernel, netdev, syzkaller-bugs,
	yoshfuji

Hello,

syzbot found the following crash on:

HEAD commit:    61ccdad1 Revert "drm/bochs: Use shadow buffer for bochs fr..
git tree:       https://github.com/google/kmsan.git master
console output: https://syzkaller.appspot.com/x/log.txt?x=1229a18c600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=27abc558ecb16a3b
dashboard link: https://syzkaller.appspot.com/bug?extid=8257f4dcef79de670baf
compiler:       clang version 9.0.0 (/home/glider/llvm/clang  
80fee25776c2fb61e74c1ecb1a523375c2500b69)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=12480dce600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1659dc4a600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+8257f4dcef79de670baf@syzkaller.appspotmail.com

==================================================================
BUG: KMSAN: uninit-value in ipv6_find_tlv+0x370/0x3c0  
net/ipv6/exthdrs_core.c:147
CPU: 1 PID: 12844 Comm: syz-executor755 Not tainted 5.3.0-rc3+ #17
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x191/0x1f0 lib/dump_stack.c:113
  kmsan_report+0x162/0x2d0 mm/kmsan/kmsan_report.c:109
  __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:294
  ipv6_find_tlv+0x370/0x3c0 net/ipv6/exthdrs_core.c:147
  ip6_find_1stfragopt+0x2b6/0x500 net/ipv6/output_core.c:102
  ip6_fragment+0x275/0x37d0 net/ipv6/ip6_output.c:775
  __ip6_finish_output+0x753/0x8f0 net/ipv6/ip6_output.c:140
  ip6_finish_output+0x2db/0x420 net/ipv6/ip6_output.c:152
  NF_HOOK_COND include/linux/netfilter.h:294 [inline]
  ip6_output+0x5d3/0x720 net/ipv6/ip6_output.c:175
  dst_output include/net/dst.h:436 [inline]
  ip6_local_out+0x164/0x1d0 net/ipv6/output_core.c:179
  ip6_send_skb net/ipv6/ip6_output.c:1791 [inline]
  ip6_push_pending_frames+0x215/0x4f0 net/ipv6/ip6_output.c:1811
  rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
  rawv6_sendmsg+0x40da/0x5b10 net/ipv6/raw.c:954
  inet_sendmsg+0x2d8/0x2e0 net/ipv4/af_inet.c:807
  sock_sendmsg_nosec net/socket.c:637 [inline]
  sock_sendmsg net/socket.c:657 [inline]
  sock_write_iter+0x599/0x650 net/socket.c:989
  call_write_iter include/linux/fs.h:1870 [inline]
  new_sync_write fs/read_write.c:483 [inline]
  __vfs_write+0xa2c/0xcb0 fs/read_write.c:496
  vfs_write+0x481/0x920 fs/read_write.c:558
  ksys_write+0x265/0x430 fs/read_write.c:611
  __do_sys_write fs/read_write.c:623 [inline]
  __se_sys_write+0x92/0xb0 fs/read_write.c:620
  __x64_sys_write+0x4a/0x70 fs/read_write.c:620
  do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:297
  entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x452be9
Code: e8 7c e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 1b 05 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fbf62544ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00000000006efa08 RCX: 0000000000452be9
RDX: 00000000000041a0 RSI: 00000000200001c0 RDI: 0000000000000003
RBP: 00000000006efa00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006efa0c
R13: 00007ffec68fe78f R14: 00007fbf625459c0 R15: 20c49ba5e353f7cf

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:187 [inline]
  kmsan_internal_poison_shadow+0x53/0xa0 mm/kmsan/kmsan.c:146
  kmsan_slab_alloc+0xaa/0x120 mm/kmsan/kmsan_hooks.c:175
  slab_alloc_node mm/slub.c:2790 [inline]
  __kmalloc_node_track_caller+0xb55/0x1320 mm/slub.c:4388
  __kmalloc_reserve net/core/skbuff.c:141 [inline]
  __alloc_skb+0x306/0xa10 net/core/skbuff.c:209
  alloc_skb include/linux/skbuff.h:1056 [inline]
  __ip6_append_data+0x46ad/0x6060 net/ipv6/ip6_output.c:1514
  ip6_append_data+0x3c2/0x650 net/ipv6/ip6_output.c:1683
  rawv6_sendmsg+0x232e/0x5b10 net/ipv6/raw.c:947
  inet_sendmsg+0x2d8/0x2e0 net/ipv4/af_inet.c:807
  sock_sendmsg_nosec net/socket.c:637 [inline]
  sock_sendmsg net/socket.c:657 [inline]
  sock_write_iter+0x599/0x650 net/socket.c:989
  call_write_iter include/linux/fs.h:1870 [inline]
  new_sync_write fs/read_write.c:483 [inline]
  __vfs_write+0xa2c/0xcb0 fs/read_write.c:496
  vfs_write+0x481/0x920 fs/read_write.c:558
  ksys_write+0x265/0x430 fs/read_write.c:611
  __do_sys_write fs/read_write.c:623 [inline]
  __se_sys_write+0x92/0xb0 fs/read_write.c:620
  __x64_sys_write+0x4a/0x70 fs/read_write.c:620
  do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:297
  entry_SYSCALL_64_after_hwframe+0x63/0xe7
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* [patch net-next v2 0/2] netdevsim: implement support for devlink region and snapshots
From: Jiri Pirko @ 2019-08-13 14:48 UTC (permalink / raw)
  To: netdev; +Cc: davem, jakub.kicinski, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

Implement devlink region support for netdevsim and test it.

---
Note the selftest patch depends on "[patch net-next] selftests:
netdevsim: add devlink params tests" patch sent earlier today.

Jiri Pirko (2):
  netdevsim: implement support for devlink region and snapshots
  selftests: netdevsim: add devlink regions tests

 drivers/net/netdevsim/dev.c                   | 66 ++++++++++++++++++-
 drivers/net/netdevsim/netdevsim.h             |  1 +
 .../drivers/net/netdevsim/devlink.sh          | 54 ++++++++++++++-
 3 files changed, 119 insertions(+), 2 deletions(-)

-- 
2.21.0


^ permalink raw reply

* [patch net-next v2 1/2] netdevsim: implement support for devlink region and snapshots
From: Jiri Pirko @ 2019-08-13 14:48 UTC (permalink / raw)
  To: netdev; +Cc: davem, jakub.kicinski, mlxsw
In-Reply-To: <20190813144843.28466-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Implement dummy region of size 32K and allow user to create snapshots
or random data using debugfs file trigger.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/netdevsim/dev.c       | 66 ++++++++++++++++++++++++++++++-
 drivers/net/netdevsim/netdevsim.h |  1 +
 2 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 127aef85dc99..8485dd805f7c 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -27,6 +27,41 @@
 
 static struct dentry *nsim_dev_ddir;
 
+#define NSIM_DEV_DUMMY_REGION_SIZE (1024 * 32)
+
+static ssize_t nsim_dev_take_snapshot_write(struct file *file,
+					    const char __user *data,
+					    size_t count, loff_t *ppos)
+{
+	struct nsim_dev *nsim_dev = file->private_data;
+	void *dummy_data;
+	u32 id;
+	int err;
+
+	dummy_data = kmalloc(NSIM_DEV_DUMMY_REGION_SIZE, GFP_KERNEL);
+	if (!dummy_data) {
+		pr_err("Failed to allocate memory for region snapshot\n");
+		goto out;
+	}
+
+	get_random_bytes(dummy_data, NSIM_DEV_DUMMY_REGION_SIZE);
+
+	id = devlink_region_shapshot_id_get(priv_to_devlink(nsim_dev));
+	err = devlink_region_snapshot_create(nsim_dev->dummy_region,
+					     dummy_data, id, kfree);
+	if (err)
+		pr_err("Failed to create region snapshot\n");
+
+out:
+	return count;
+}
+
+static const struct file_operations nsim_dev_take_snapshot_fops = {
+	.open = simple_open,
+	.write = nsim_dev_take_snapshot_write,
+	.llseek = generic_file_llseek,
+};
+
 static int nsim_dev_debugfs_init(struct nsim_dev *nsim_dev)
 {
 	char dev_ddir_name[16];
@@ -44,6 +79,8 @@ static int nsim_dev_debugfs_init(struct nsim_dev *nsim_dev)
 			   &nsim_dev->max_macs);
 	debugfs_create_bool("test1", 0600, nsim_dev->ddir,
 			    &nsim_dev->test1);
+	debugfs_create_file("take_snapshot", 0200, nsim_dev->ddir, nsim_dev,
+			    &nsim_dev_take_snapshot_fops);
 	return 0;
 }
 
@@ -248,6 +285,26 @@ static void nsim_devlink_param_load_driverinit_values(struct devlink *devlink)
 		nsim_dev->test1 = saved_value.vbool;
 }
 
+#define NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX 16
+
+static int nsim_dev_dummy_region_init(struct nsim_dev *nsim_dev,
+				      struct devlink *devlink)
+{
+	nsim_dev->dummy_region =
+		devlink_region_create(devlink, "dummy",
+				      NSIM_DEV_DUMMY_REGION_SNAPSHOT_MAX,
+				      NSIM_DEV_DUMMY_REGION_SIZE);
+	if (IS_ERR(nsim_dev->dummy_region))
+		return PTR_ERR(nsim_dev->dummy_region);
+
+	return 0;
+}
+
+static void nsim_dev_dummy_region_exit(struct nsim_dev *nsim_dev)
+{
+	devlink_region_destroy(nsim_dev->dummy_region);
+}
+
 static int nsim_dev_reload(struct devlink *devlink,
 			   struct netlink_ext_ack *extack)
 {
@@ -365,10 +422,14 @@ nsim_dev_create(struct net *net, struct nsim_bus_dev *nsim_bus_dev,
 		goto err_dl_unregister;
 	nsim_devlink_set_params_init_values(nsim_dev, devlink);
 
-	err = nsim_dev_debugfs_init(nsim_dev);
+	err = nsim_dev_dummy_region_init(nsim_dev, devlink);
 	if (err)
 		goto err_params_unregister;
 
+	err = nsim_dev_debugfs_init(nsim_dev);
+	if (err)
+		goto err_dummy_region_exit;
+
 	err = nsim_bpf_dev_init(nsim_dev);
 	if (err)
 		goto err_debugfs_exit;
@@ -378,6 +439,8 @@ nsim_dev_create(struct net *net, struct nsim_bus_dev *nsim_bus_dev,
 
 err_debugfs_exit:
 	nsim_dev_debugfs_exit(nsim_dev);
+err_dummy_region_exit:
+	nsim_dev_dummy_region_exit(nsim_dev);
 err_params_unregister:
 	devlink_params_unregister(devlink, nsim_devlink_params,
 				  ARRAY_SIZE(nsim_devlink_params));
@@ -398,6 +461,7 @@ static void nsim_dev_destroy(struct nsim_dev *nsim_dev)
 
 	nsim_bpf_dev_exit(nsim_dev);
 	nsim_dev_debugfs_exit(nsim_dev);
+	nsim_dev_dummy_region_exit(nsim_dev);
 	devlink_params_unregister(devlink, nsim_devlink_params,
 				  ARRAY_SIZE(nsim_devlink_params));
 	devlink_unregister(devlink);
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index ef879892dd6f..521802d429a0 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -163,6 +163,7 @@ struct nsim_dev {
 	bool fw_update_status;
 	u32 max_macs;
 	bool test1;
+	struct devlink_region *dummy_region;
 };
 
 int nsim_dev_init(void);
-- 
2.21.0


^ permalink raw reply related

* [patch net-next v2 2/2] selftests: netdevsim: add devlink regions tests
From: Jiri Pirko @ 2019-08-13 14:48 UTC (permalink / raw)
  To: netdev; +Cc: davem, jakub.kicinski, mlxsw
In-Reply-To: <20190813144843.28466-1-jiri@resnulli.us>

From: Jiri Pirko <jiri@mellanox.com>

Test netdevsim devlink region implementation.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
v1->v2:
- new patch
---
 .../drivers/net/netdevsim/devlink.sh          | 54 ++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
index 858ebdc8d8a3..b77fdd046bea 100755
--- a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
@@ -3,7 +3,7 @@
 
 lib_dir=$(dirname $0)/../../../net/forwarding
 
-ALL_TESTS="fw_flash_test params_test"
+ALL_TESTS="fw_flash_test params_test regions_test"
 NUM_NETIFS=0
 source $lib_dir/lib.sh
 
@@ -90,6 +90,58 @@ params_test()
 	log_test "params test"
 }
 
+check_region_size()
+{
+	local name=$1
+	local size
+
+	size=$(devlink region show $DL_HANDLE/$name -j | jq -e -r '.[][].size')
+	check_err $? "Failed to get $name region size"
+	[ $size -eq 32768 ]
+	check_err $? "Invalid $name region size"
+}
+
+check_region_snapshot_count()
+{
+	local name=$1
+	local phase_name=$2
+	local expected_count=$3
+	local count
+
+	count=$(devlink region show $DL_HANDLE/$name -j | jq -e -r '.[][].snapshot | length')
+	[ $count -eq $expected_count ]
+	check_err $? "Unexpected $phase_name snapshot count"
+}
+
+regions_test()
+{
+	RET=0
+
+	local count
+
+	check_region_size dummy
+	check_region_snapshot_count dummy initial 0
+
+	echo ""> $DEBUGFS_DIR/take_snapshot
+	check_err $? "Failed to take first dummy region snapshot"
+	check_region_snapshot_count dummy post-first-snapshot 1
+
+	echo ""> $DEBUGFS_DIR/take_snapshot
+	check_err $? "Failed to take second dummy region snapshot"
+	check_region_snapshot_count dummy post-second-snapshot 2
+
+	echo ""> $DEBUGFS_DIR/take_snapshot
+	check_err $? "Failed to take third dummy region snapshot"
+	check_region_snapshot_count dummy post-third-snapshot 3
+
+	devlink region del $DL_HANDLE/dummy snapshot 1
+	check_err $? "Failed to delete first dummy region snapshot"
+
+	check_region_snapshot_count dummy post-first-delete 2
+
+	log_test "regions test"
+}
+
 setup_prepare()
 {
 	modprobe netdevsim
-- 
2.21.0


^ permalink raw reply related

* Re: kernel BUG at net/rxrpc/local_object.c:LINE!
From: David Howells @ 2019-08-13 15:06 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: dhowells, syzbot, Eric Biggers, David Miller, linux-afs, LKML,
	netdev, syzkaller-bugs
In-Reply-To: <CACT4Y+YCB3o5Ps9RNq9KpMcmGCwBM4R9DeX67prQ9Q3UppGowQ@mail.gmail.com>

Dmitry Vyukov <dvyukov@google.com> wrote:

> > I meant that I don't know how to turn a tracepoint on from inside the kernel.
> 
> This /sys/kernel/debug/tracing/events/rxrpc/rxrpc_local/enable in:
>         echo 1 > /sys/kernel/debug/tracing/events/rxrpc/rxrpc_local/enable
> should map to some global variable, right? If so, it should be
> possible to initialize that var to 1 statically. Or that won't work
> for some reason?

As I understand it, it's all hidden inside of tracing macros and ftrace
infrastructure and involves runtime patching the code to enable tracepoints
(they're effectively NOP'ed out when not in use).

So, no, it's not that simple.

I asked Steven and he says:

	trace_set_clr_event("sched", "sched_switch", 1);

is the same as

	echo 1 > events/sched/sched_switch/enable

So it can be done.  Will syzbot actually collect the trace log?

David

^ permalink raw reply

* Re: KMSAN: uninit-value in smsc75xx_bind
From: Andrey Konovalov @ 2019-08-13 15:08 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: syzbot, David S. Miller, Alexander Potapenko, syzkaller-bugs,
	steve.glendinning, LKML, USB list, netdev
In-Reply-To: <1565700220.7043.8.camel@suse.com>

On Tue, Aug 13, 2019 at 2:43 PM Oliver Neukum <oneukum@suse.com> wrote:
>
> Am Freitag, den 09.08.2019, 01:48 -0700 schrieb syzbot:
> > Hello,
> >
> > syzbot found the following crash on:
> >
> > HEAD commit:    beaab8a3 fix KASAN build
> > git tree:       kmsan
>
> [..]
> > Call Trace:
> >   __dump_stack lib/dump_stack.c:77 [inline]
> >   dump_stack+0x191/0x1f0 lib/dump_stack.c:113
> >   kmsan_report+0x162/0x2d0 mm/kmsan/kmsan_report.c:109
> >   __msan_warning+0x75/0xe0 mm/kmsan/kmsan_instr.c:294
> >   smsc75xx_wait_ready drivers/net/usb/smsc75xx.c:976 [inline]
> >   smsc75xx_bind+0x541/0x12d0 drivers/net/usb/smsc75xx.c:1483
>
> >
> > Local variable description: ----buf.i93@smsc75xx_bind
> > Variable was created at:
> >   __smsc75xx_read_reg drivers/net/usb/smsc75xx.c:83 [inline]
> >   smsc75xx_wait_ready drivers/net/usb/smsc75xx.c:969 [inline]
> >   smsc75xx_bind+0x44c/0x12d0 drivers/net/usb/smsc75xx.c:1483
> >   usbnet_probe+0x10d3/0x3950 drivers/net/usb/usbnet.c:1722
>
> Hi,
>
> this looks like a false positive to me.
> The offending code is likely this:
>
>         if (size) {
>                 buf = kmalloc(size, GFP_KERNEL);
>                 if (!buf)
>                         goto out;
>         }
>
>         err = usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0),
>                               cmd, reqtype, value, index, buf, size,
>                               USB_CTRL_GET_TIMEOUT);
>
> which uses 'buf' uninitialized. But it is used for input.
> What is happening here?

AFAICS, the uninitialized use of buf that KMSAN points out is in the
"if (buf & PMT_CTL_DEV_RDY)"  statement in smsc75xx_wait_ready(). Does
__smsc75xx_read_reg/usb_control_msg() always initialize buf? Can it
just initialize the first few bytes for example?

^ permalink raw reply

* Re: kernel BUG at net/rxrpc/local_object.c:LINE!
From: Dmitry Vyukov @ 2019-08-13 15:12 UTC (permalink / raw)
  To: David Howells
  Cc: syzbot, Eric Biggers, David Miller, linux-afs, LKML, netdev,
	syzkaller-bugs
In-Reply-To: <8013.1565708810@warthog.procyon.org.uk>

On Tue, Aug 13, 2019 at 5:06 PM David Howells <dhowells@redhat.com> wrote:
>
> Dmitry Vyukov <dvyukov@google.com> wrote:
>
> > > I meant that I don't know how to turn a tracepoint on from inside the kernel.
> >
> > This /sys/kernel/debug/tracing/events/rxrpc/rxrpc_local/enable in:
> >         echo 1 > /sys/kernel/debug/tracing/events/rxrpc/rxrpc_local/enable
> > should map to some global variable, right? If so, it should be
> > possible to initialize that var to 1 statically. Or that won't work
> > for some reason?
>
> As I understand it, it's all hidden inside of tracing macros and ftrace
> infrastructure and involves runtime patching the code to enable tracepoints
> (they're effectively NOP'ed out when not in use).
>
> So, no, it's not that simple.
>
> I asked Steven and he says:
>
>         trace_set_clr_event("sched", "sched_switch", 1);
>
> is the same as
>
>         echo 1 > events/sched/sched_switch/enable
>
> So it can be done.  Will syzbot actually collect the trace log?

It only collects console output. I don't know what is trace log. If
the trace log is not console output, then it won't.

^ permalink raw reply

* Re: [PATCH 2/2] net: gmii2rgmii: Switch priv field in mdio device structure
From: Harini Katakam @ 2019-08-13 15:13 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Harini Katakam, Florian Fainelli, Heiner Kallweit, David Miller,
	Michal Simek, netdev, linux-arm-kernel, linux-kernel,
	radhey.shyam.pandey
In-Reply-To: <20190813132321.GF15047@lunn.ch>

Hi Andrew,

On Tue, Aug 13, 2019 at 6:54 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Tue, Aug 13, 2019 at 04:46:40PM +0530, Harini Katakam wrote:
> > Hi Andrew,
> >
> > On Thu, Aug 1, 2019 at 9:36 AM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > On Wed, Jul 31, 2019 at 03:06:19PM +0530, Harini Katakam wrote:
> > > > Use the priv field in mdio device structure instead of the one in
> > > > phy device structure. The phy device priv field may be used by the
> > > > external phy driver and should not be overwritten.
> > >
> > > Hi Harini
> > >
> > > I _think_ you could use dev_set_drvdata(&mdiodev->dev) in xgmiitorgmii_probe() and
> > > dev_get_drvdata(&phydev->mdiomdio.dev) in _read_status()
> >
> > Thanks for the review. This works if I do:
> > dev_set_drvdata(&priv->phy_dev->mdio.dev->dev) in probe
> > and then
> > dev_get_drvdata(&phydev->mdio.dev) in _read_status()
> >
> > i.e mdiodev in gmii2rgmii probe and priv->phy_dev->mdio are not the same.
> >
> > If this is acceptable, I can send a v2.
>
> Hi Harini
>
> I think this is better, making use of the central driver
> infrastructure, rather than inventing something new.

Ok sure.

>
> The kernel does have a few helper, spi_get_drvdata, pci_get_drvdata,
> hci_get_drvdata. So maybe had add phydev_get_drvdata(struct phy_device
> *phydev)?

Maybe phydev_mdio_get_drvdata? Because the driver data member available is
phydev->mdio.dev.driver_data.

Regards,
Harini

^ permalink raw reply

* Re: kernel BUG at net/rxrpc/local_object.c:LINE!
From: David Howells @ 2019-08-13 15:29 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: dhowells, syzbot, Eric Biggers, David Miller, linux-afs, LKML,
	netdev, syzkaller-bugs
In-Reply-To: <CACT4Y+YVyaTrwpaZfpfi9LKA=5TOdKSL60pjAH04dMPNCZTMSQ@mail.gmail.com>

Dmitry Vyukov <dvyukov@google.com> wrote:

> It only collects console output. I don't know what is trace log. If
> the trace log is not console output, then it won't.

Assuming the system is still alive:

	cat /sys/kernel/debug/tracing/trace

David

^ permalink raw reply

* Re: [PATCH net] netdevsim: Restore per-network namespace accounting for fib entries
From: Jiri Pirko @ 2019-08-13 15:34 UTC (permalink / raw)
  To: David Ahern; +Cc: David Miller, dsahern, netdev
In-Reply-To: <9306e893-cd43-75a0-9a81-fd2ee0dd44c5@gmail.com>

Tue, Aug 13, 2019 at 04:41:18PM CEST, dsahern@gmail.com wrote:
>On 8/13/19 1:14 AM, Jiri Pirko wrote:
>> Mon, Aug 12, 2019 at 05:28:02PM CEST, davem@davemloft.net wrote:
>>> From: Jiri Pirko <jiri@resnulli.us>
>>> Date: Mon, 12 Aug 2019 10:36:35 +0200
>>>
>>>> I understand it with real devices, but dummy testing device, who's
>>>> purpose is just to test API. Why?
>>>
>>> Because you'll break all of the wonderful testing infrastructure
>>> people like David have created.
>>  
>> Are you referring to selftests? There is no such test there :(
>
>I  have one now and will be submitting it after net merges with net-next.
>
>> But if it would be, could implement the limitations
>> properly (like using cgroups), change the tests and remove this
>> code from netdevsim?
>
>The intent of this code and test is to have a s/w model similar to how
>mlxsw works - responding to notifiers and deciding to reject a change.
>You are currently adding (or trying to) more devlink based s/w tests, so
>you must see the value of netdevsim as a source of testing.

Sure I do. Not sure makes sence to repeat myself again, but why not:
The way you use netdevsim with netnamespace limitation is nothing like
it is done in hardware. Devlink resources should limit the resources of
the device, not network namespace. You abused netdevsim and devlink for
that. Not cool :(

To be in sync with mlxsw, netdevsim should track fibs added to the ports
and apply the resource limitations to that. That is the correct
behaviour. Exacly like mlxsw does.

Frankly I don't really understand why you keep pushing your broken
design. Why the limitation applied only for fibs related to netdevsim
ports is not enough for testing??? Would that work for you? Please?

This is keeping me awake at night. Sigh :(

^ permalink raw reply

* Re: [PATCH 2/2] net: gmii2rgmii: Switch priv field in mdio device structure
From: Andrew Lunn @ 2019-08-13 15:38 UTC (permalink / raw)
  To: Harini Katakam
  Cc: Harini Katakam, Florian Fainelli, Heiner Kallweit, David Miller,
	Michal Simek, netdev, linux-arm-kernel, linux-kernel,
	radhey.shyam.pandey
In-Reply-To: <CAFcVECKipjD9atgEJSf8j78q_1aOAX77nD6vVeytZ-M00qBt6A@mail.gmail.com>

> > The kernel does have a few helper, spi_get_drvdata, pci_get_drvdata,
> > hci_get_drvdata. So maybe had add phydev_get_drvdata(struct phy_device
> > *phydev)?
> 
> Maybe phydev_mdio_get_drvdata? Because the driver data member available is
> phydev->mdio.dev.driver_data.

I still prefer phydev_get_drvdata(). It fits with the X_get_drvdata()
pattern, where X is the type of parameter passed to the call, spi,
pci, hci.

We can also add mdiodev_get_drvdata(mdiodev). A few DSA drivers could
use that.

   Andrew

^ permalink raw reply

* Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: fix RGMII-ID port setup
From: Marek Behún @ 2019-08-13 15:44 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Heiner Kallweit, Sebastian Reichel, Vivien Didelot,
	Florian Fainelli, David S . Miller
In-Reply-To: <20190811165108.GG14290@lunn.ch>

Hi Andrew,

> We should read the switch registers. I think you can set the defaults
> using strapping pins. And in general, the driver always reads state
> from the hardware rather than caching it.

hmm. The cmode is cached for each port, though. For example
mv88e6390x_port_set_cmode compares the new requested value with the
cached one and doesn't do anything if they are equal.

If mv88e6xxx_port_setup_mac can be called once per second by phylink as
you say, do we really want to read the value via MDIO every time? We
already have cmode cached (read from registers at mv88e6xxx_setup, and
then changed when cmode change is requested). From cmode we can already
differentiate mode in the terms of phy_interface_t, unless it is RGMII,
in which case we would have to read RX/TX timing.

Marek

^ permalink raw reply

* Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: fix RGMII-ID port setup
From: Marek Behún @ 2019-08-13 15:51 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, Heiner Kallweit, Sebastian Reichel, Vivien Didelot,
	Florian Fainelli, David S . Miller
In-Reply-To: <20190813174416.5c57b08f@dellmb.labs.office.nic.cz>

On Tue, 13 Aug 2019 17:44:16 +0200
Marek Behún <marek.behun@nic.cz> wrote:

> Hi Andrew,
> 
> > We should read the switch registers. I think you can set the
> > defaults using strapping pins. And in general, the driver always
> > reads state from the hardware rather than caching it.  
> 
> hmm. The cmode is cached for each port, though. For example
> mv88e6390x_port_set_cmode compares the new requested value with the
> cached one and doesn't do anything if they are equal.
> 
> If mv88e6xxx_port_setup_mac can be called once per second by phylink
> as you say, do we really want to read the value via MDIO every time?
> We already have cmode cached (read from registers at mv88e6xxx_setup,
> and then changed when cmode change is requested). From cmode we can
> already differentiate mode in the terms of phy_interface_t, unless it
> is RGMII, in which case we would have to read RX/TX timing.
> 
> Marek

/o\ OK. I see now that mv88e6xxx_port_setup_mac already calls
->port_link_state(), which fills in a struct phylink_link_state, and
already does MDIO communication. Sorry :)
I will try to send a patch which adds the filling of the ->interface
member of the struct phylink_link_state in ->port_link_state() method.

Marek

^ permalink raw reply

* Re: [PATCH net-next 1/2] net: dsa: mv88e6xxx: fix RGMII-ID port setup
From: Andrew Lunn @ 2019-08-13 15:52 UTC (permalink / raw)
  To: Marek Behún
  Cc: netdev, Heiner Kallweit, Sebastian Reichel, Vivien Didelot,
	Florian Fainelli, David S . Miller
In-Reply-To: <20190813174416.5c57b08f@dellmb.labs.office.nic.cz>

On Tue, Aug 13, 2019 at 05:44:16PM +0200, Marek Behún wrote:
> Hi Andrew,
> 
> > We should read the switch registers. I think you can set the defaults
> > using strapping pins. And in general, the driver always reads state
> > from the hardware rather than caching it.
> 
> hmm. The cmode is cached for each port, though. For example
> mv88e6390x_port_set_cmode compares the new requested value with the
> cached one and doesn't do anything if they are equal.
> 
> If mv88e6xxx_port_setup_mac can be called once per second by phylink as
> you say, do we really want to read the value via MDIO every time? We
> already have cmode cached (read from registers at mv88e6xxx_setup, and
> then changed when cmode change is requested). From cmode we can already
> differentiate mode in the terms of phy_interface_t, unless it is RGMII,
> in which case we would have to read RX/TX timing.

Hi Marek

cmode gets used a lot, and in interrupt thread context. So i think it
was worth caching it. RGMII Rx/Tx timing is not used much, so i don't
think it is worth caching it. But as you say, using cmode to determine
if the registers actually need to be read does make sense. Most ports
don't use RGMII, they have internal PHYs.

      Andrew

^ permalink raw reply

* Re: [PATCH net-next v2 6/9] net: macsec: hardware offloading infrastructure
From: Igor Russkikh @ 2019-08-13 16:18 UTC (permalink / raw)
  To: Andrew Lunn, Antoine Tenart
  Cc: davem@davemloft.net, sd@queasysnail.net, f.fainelli@gmail.com,
	hkallweit1@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, thomas.petazzoni@bootlin.com,
	alexandre.belloni@bootlin.com, allan.nielsen@microchip.com,
	camelia.groza@nxp.com, Simon Edelhaus, Pavel Belous
In-Reply-To: <20190813131706.GE15047@lunn.ch>



On 13.08.2019 16:17, Andrew Lunn wrote:
> On Tue, Aug 13, 2019 at 10:58:17AM +0200, Antoine Tenart wrote:
>> I think this question is linked to the use of a MACsec virtual interface
>> when using h/w offloading. The starting point for me was that I wanted
>> to reuse the data structures and the API exposed to the userspace by the
>> s/w implementation of MACsec. I then had two choices: keeping the exact
>> same interface for the user (having a virtual MACsec interface), or
>> registering the MACsec genl ops onto the real net devices (and making
>> the s/w implementation a virtual net dev and a provider of the MACsec
>> "offloading" ops).
>>
>> The advantages of the first option were that nearly all the logic of the
>> s/w implementation could be kept and especially that it would be
>> transparent for the user to use both implementations of MACsec.
> 
> Hi Antoine
> 
> We have always talked about offloading operations to the hardware,
> accelerating what the linux stack can do by making use of hardware
> accelerators. The basic user API should not change because of
> acceleration. Those are the general guidelines.
> 
> It would however be interesting to get comments from those who did the
> software implementation and what they think of this architecture. I've
> no personal experience with MACSec, so it is hard for me to say if the
> current architecture makes sense when using accelerators.

In terms of overall concepts, I'd add the following:

1) With current implementation it's impossible to install SW macsec engine onto
the device which supports HW offload. That could be a strong limitation in
cases when user sees HW macsec offload is broken or work differently, and he/she
wants to replace it with SW one.
MACSec is a complex feature, and it may happen something is missing in HW.
Trivial example is 256bit encryption, which is not always a musthave in HW
implementations.

2) I think, Antoine, its not totally true that otherwise the user macsec API
will be broken/changed. netlink api is the same, the only thing we may want to
add is an optional parameter to force selection of SW macsec engine.

I'm also eager to hear from sw macsec users/devs on whats better here.


Regards,
  Igor


^ permalink raw reply

* [PATCH bpf-next v3 0/4] bpf: support cloning sk storage on accept()
From: Stanislav Fomichev @ 2019-08-13 16:26 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
	Yonghong Song

Currently there is no way to propagate sk storage from the listener
socket to a newly accepted one. Consider the following use case:

        fd = socket();
        setsockopt(fd, SOL_IP, IP_TOS,...);
        /* ^^^ setsockopt BPF program triggers here and saves something
         * into sk storage of the listener.
         */
        listen(fd, ...);
        while (client = accept(fd)) {
                /* At this point all association between listener
                 * socket and newly accepted one is gone. New
                 * socket will not have any sk storage attached.
                 */
        }

Let's add new BPF_F_CLONE flag that can be specified when creating
a socket storage map. This new flag indicates that map contents
should be cloned when the socket is cloned.

v3:
* make sure BPF_F_NO_PREALLOC is always present when creating
  a map (Martin KaFai Lau)
* don't call bpf_sk_storage_free explicitly, rely on
  sk_free_unlock_clone to do the cleanup (Martin KaFai Lau)

v2:
* remove spinlocks around selem_link_map/sk (Martin KaFai Lau)
* BPF_F_CLONE on a map, not selem (Martin KaFai Lau)
* hold a map while cloning (Martin KaFai Lau)
* use BTF maps in selftests (Yonghong Song)
* do proper cleanup selftests; don't call close(-1) (Yonghong Song)
* export bpf_map_inc_not_zero

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>

Stanislav Fomichev (4):
  bpf: export bpf_map_inc_not_zero
  bpf: support cloning sk storage on accept()
  bpf: sync bpf.h to tools/
  selftests/bpf: add sockopt clone/inheritance test

 include/linux/bpf.h                           |   2 +
 include/net/bpf_sk_storage.h                  |  10 +
 include/uapi/linux/bpf.h                      |   3 +
 kernel/bpf/syscall.c                          |  16 +-
 net/core/bpf_sk_storage.c                     | 103 ++++++-
 net/core/sock.c                               |   9 +-
 tools/include/uapi/linux/bpf.h                |   3 +
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   3 +-
 .../selftests/bpf/progs/sockopt_inherit.c     |  97 +++++++
 .../selftests/bpf/test_sockopt_inherit.c      | 253 ++++++++++++++++++
 11 files changed, 490 insertions(+), 10 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/sockopt_inherit.c
 create mode 100644 tools/testing/selftests/bpf/test_sockopt_inherit.c

-- 
2.23.0.rc1.153.gdeed80330f-goog

^ permalink raw reply

* [PATCH bpf-next v3 1/4] bpf: export bpf_map_inc_not_zero
From: Stanislav Fomichev @ 2019-08-13 16:26 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
	Yonghong Song
In-Reply-To: <20190813162630.124544-1-sdf@google.com>

Rename existing bpf_map_inc_not_zero to __bpf_map_inc_not_zero to
indicate that it's caller's responsibility to do proper locking.
Create and export bpf_map_inc_not_zero wrapper that properly
locks map_idr_lock. Will be used in the next commit to
hold a map while cloning a socket.

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h  |  2 ++
 kernel/bpf/syscall.c | 16 +++++++++++++---
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f9a506147c8a..15ae49862b82 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -647,6 +647,8 @@ void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
 struct bpf_map *bpf_map_get_with_uref(u32 ufd);
 struct bpf_map *__bpf_map_get(struct fd f);
 struct bpf_map * __must_check bpf_map_inc(struct bpf_map *map, bool uref);
+struct bpf_map * __must_check bpf_map_inc_not_zero(struct bpf_map *map,
+						   bool uref);
 void bpf_map_put_with_uref(struct bpf_map *map);
 void bpf_map_put(struct bpf_map *map);
 int bpf_map_charge_memlock(struct bpf_map *map, u32 pages);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5d141f16f6fa..cf8052b016e7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -683,8 +683,8 @@ struct bpf_map *bpf_map_get_with_uref(u32 ufd)
 }
 
 /* map_idr_lock should have been held */
-static struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map,
-					    bool uref)
+static struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map,
+					      bool uref)
 {
 	int refold;
 
@@ -704,6 +704,16 @@ static struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map,
 	return map;
 }
 
+struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map, bool uref)
+{
+	spin_lock_bh(&map_idr_lock);
+	map = __bpf_map_inc_not_zero(map, uref);
+	spin_unlock_bh(&map_idr_lock);
+
+	return map;
+}
+EXPORT_SYMBOL_GPL(bpf_map_inc_not_zero);
+
 int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 {
 	return -ENOTSUPP;
@@ -2177,7 +2187,7 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
 	spin_lock_bh(&map_idr_lock);
 	map = idr_find(&map_idr, id);
 	if (map)
-		map = bpf_map_inc_not_zero(map, true);
+		map = __bpf_map_inc_not_zero(map, true);
 	else
 		map = ERR_PTR(-ENOENT);
 	spin_unlock_bh(&map_idr_lock);
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related

* [PATCH bpf-next v3 2/4] bpf: support cloning sk storage on accept()
From: Stanislav Fomichev @ 2019-08-13 16:26 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
	Yonghong Song
In-Reply-To: <20190813162630.124544-1-sdf@google.com>

Add new helper bpf_sk_storage_clone which optionally clones sk storage
and call it from sk_clone_lock.

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/net/bpf_sk_storage.h |  10 ++++
 include/uapi/linux/bpf.h     |   3 +
 net/core/bpf_sk_storage.c    | 103 ++++++++++++++++++++++++++++++++++-
 net/core/sock.c              |   9 ++-
 4 files changed, 119 insertions(+), 6 deletions(-)

diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h
index b9dcb02e756b..8e4f831d2e52 100644
--- a/include/net/bpf_sk_storage.h
+++ b/include/net/bpf_sk_storage.h
@@ -10,4 +10,14 @@ void bpf_sk_storage_free(struct sock *sk);
 extern const struct bpf_func_proto bpf_sk_storage_get_proto;
 extern const struct bpf_func_proto bpf_sk_storage_delete_proto;
 
+#ifdef CONFIG_BPF_SYSCALL
+int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk);
+#else
+static inline int bpf_sk_storage_clone(const struct sock *sk,
+				       struct sock *newsk)
+{
+	return 0;
+}
+#endif
+
 #endif /* _BPF_SK_STORAGE_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4393bd4b2419..0ef594ac3899 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -337,6 +337,9 @@ enum bpf_attach_type {
 #define BPF_F_RDONLY_PROG	(1U << 7)
 #define BPF_F_WRONLY_PROG	(1U << 8)
 
+/* Clone map from listener for newly accepted socket */
+#define BPF_F_CLONE		(1U << 9)
+
 /* flags for BPF_PROG_QUERY */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c
index 94c7f77ecb6b..1bc7de7e18ba 100644
--- a/net/core/bpf_sk_storage.c
+++ b/net/core/bpf_sk_storage.c
@@ -12,6 +12,9 @@
 
 static atomic_t cache_idx;
 
+#define SK_STORAGE_CREATE_FLAG_MASK					\
+	(BPF_F_NO_PREALLOC | BPF_F_CLONE)
+
 struct bucket {
 	struct hlist_head list;
 	raw_spinlock_t lock;
@@ -209,7 +212,6 @@ static void selem_unlink_sk(struct bpf_sk_storage_elem *selem)
 		kfree_rcu(sk_storage, rcu);
 }
 
-/* sk_storage->lock must be held and sk_storage->list cannot be empty */
 static void __selem_link_sk(struct bpf_sk_storage *sk_storage,
 			    struct bpf_sk_storage_elem *selem)
 {
@@ -509,7 +511,7 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map)
 	return 0;
 }
 
-/* Called by __sk_destruct() */
+/* Called by __sk_destruct() & bpf_sk_storage_clone() */
 void bpf_sk_storage_free(struct sock *sk)
 {
 	struct bpf_sk_storage_elem *selem;
@@ -557,6 +559,11 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
 
 	smap = (struct bpf_sk_storage_map *)map;
 
+	/* Note that this map might be concurrently cloned from
+	 * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone
+	 * RCU read section to finish before proceeding. New RCU
+	 * read sections should be prevented via bpf_map_inc_not_zero.
+	 */
 	synchronize_rcu();
 
 	/* bpf prog and the userspace can no longer access this map
@@ -601,7 +608,9 @@ static void bpf_sk_storage_map_free(struct bpf_map *map)
 
 static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr)
 {
-	if (attr->map_flags != BPF_F_NO_PREALLOC || attr->max_entries ||
+	if (attr->map_flags & ~SK_STORAGE_CREATE_FLAG_MASK ||
+	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
+	    attr->max_entries ||
 	    attr->key_size != sizeof(int) || !attr->value_size ||
 	    /* Enforce BTF for userspace sk dumping */
 	    !attr->btf_key_type_id || !attr->btf_value_type_id)
@@ -739,6 +748,94 @@ static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key)
 	return err;
 }
 
+static struct bpf_sk_storage_elem *
+bpf_sk_storage_clone_elem(struct sock *newsk,
+			  struct bpf_sk_storage_map *smap,
+			  struct bpf_sk_storage_elem *selem)
+{
+	struct bpf_sk_storage_elem *copy_selem;
+
+	copy_selem = selem_alloc(smap, newsk, NULL, true);
+	if (!copy_selem)
+		return NULL;
+
+	if (map_value_has_spin_lock(&smap->map))
+		copy_map_value_locked(&smap->map, SDATA(copy_selem)->data,
+				      SDATA(selem)->data, true);
+	else
+		copy_map_value(&smap->map, SDATA(copy_selem)->data,
+			       SDATA(selem)->data);
+
+	return copy_selem;
+}
+
+int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk)
+{
+	struct bpf_sk_storage *new_sk_storage = NULL;
+	struct bpf_sk_storage *sk_storage;
+	struct bpf_sk_storage_elem *selem;
+	int ret;
+
+	RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
+
+	rcu_read_lock();
+	sk_storage = rcu_dereference(sk->sk_bpf_storage);
+
+	if (!sk_storage || hlist_empty(&sk_storage->list))
+		goto out;
+
+	hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) {
+		struct bpf_sk_storage_elem *copy_selem;
+		struct bpf_sk_storage_map *smap;
+		struct bpf_map *map;
+
+		smap = rcu_dereference(SDATA(selem)->smap);
+		if (!(smap->map.map_flags & BPF_F_CLONE))
+			continue;
+
+		map = bpf_map_inc_not_zero(&smap->map, false);
+		if (IS_ERR(map))
+			continue;
+
+		copy_selem = bpf_sk_storage_clone_elem(newsk, smap, selem);
+		if (!copy_selem) {
+			ret = -ENOMEM;
+			bpf_map_put(map);
+			goto err;
+		}
+
+		if (new_sk_storage) {
+			selem_link_map(smap, copy_selem);
+			__selem_link_sk(new_sk_storage, copy_selem);
+		} else {
+			ret = sk_storage_alloc(newsk, smap, copy_selem);
+			if (ret) {
+				kfree(copy_selem);
+				atomic_sub(smap->elem_size,
+					   &newsk->sk_omem_alloc);
+				bpf_map_put(map);
+				goto err;
+			}
+
+			new_sk_storage = rcu_dereference(copy_selem->sk_storage);
+		}
+		bpf_map_put(map);
+	}
+
+out:
+	rcu_read_unlock();
+	return 0;
+
+err:
+	rcu_read_unlock();
+
+	/* Don't free anything explicitly here, caller is responsible to
+	 * call bpf_sk_storage_free in case of an error.
+	 */
+
+	return ret;
+}
+
 BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk,
 	   void *, value, u64, flags)
 {
diff --git a/net/core/sock.c b/net/core/sock.c
index d57b0cc995a0..f5e801a9cea4 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1851,9 +1851,12 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 			goto out;
 		}
 		RCU_INIT_POINTER(newsk->sk_reuseport_cb, NULL);
-#ifdef CONFIG_BPF_SYSCALL
-		RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL);
-#endif
+
+		if (bpf_sk_storage_clone(sk, newsk)) {
+			sk_free_unlock_clone(newsk);
+			newsk = NULL;
+			goto out;
+		}
 
 		newsk->sk_err	   = 0;
 		newsk->sk_err_soft = 0;
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related

* [PATCH bpf-next v3 3/4] bpf: sync bpf.h to tools/
From: Stanislav Fomichev @ 2019-08-13 16:26 UTC (permalink / raw)
  To: netdev, bpf
  Cc: davem, ast, daniel, Stanislav Fomichev, Martin KaFai Lau,
	Yonghong Song
In-Reply-To: <20190813162630.124544-1-sdf@google.com>

Sync new sk storage clone flag.

Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/include/uapi/linux/bpf.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 4393bd4b2419..0ef594ac3899 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -337,6 +337,9 @@ enum bpf_attach_type {
 #define BPF_F_RDONLY_PROG	(1U << 7)
 #define BPF_F_WRONLY_PROG	(1U << 8)
 
+/* Clone map from listener for newly accepted socket */
+#define BPF_F_CLONE		(1U << 9)
+
 /* flags for BPF_PROG_QUERY */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
-- 
2.23.0.rc1.153.gdeed80330f-goog


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox