Netdev List
 help / color / mirror / Atom feed
* lockup in hacked 4.20.17+ kernel, maybe addrconf_verify_work related?
From: Ben Greear @ 2019-07-10 18:51 UTC (permalink / raw)
  To: netdev

Hello,

This is with my hacked-upon kernel, could be my mistake, etc.  But, curious
if anyone else has seen a similar deadlock?  I was running a complicated automated
wifi test, for what that is worth.

66707.212081] ath: regdomain 0x8348 dynamically updated by user
[67044.625948] INFO: task kworker/0:0:27387 blocked for more than 180 seconds.
[67044.631831]       Tainted: G        W  O      4.20.17+ #30
[67044.636180] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[67044.647424] kworker/0:0     D    0 27387      2 0x80000080
[67044.647447] Workqueue: ipv6_addrconf addrconf_verify_work [ipv6]
[67044.647448] Call Trace:
[67044.647455]  ? __schedule+0x29e/0x880
[67044.647457]  schedule+0x2a/0x80
[67044.647459]  schedule_preempt_disabled+0xc/0x20
[67044.647460]  __mutex_lock.isra.10+0x2e7/0x4f0
[67044.647468]  addrconf_verify_work+0x5/0x10 [ipv6]
[67044.647472]  process_one_work+0x1f3/0x420
[67044.647473]  worker_thread+0x28/0x3c0
[67044.647475]  ? process_one_work+0x420/0x420
[67044.647476]  kthread+0x10b/0x130
[67044.647478]  ? kthread_create_on_node+0x60/0x60
[67044.647480]  ret_from_fork+0x1f/0x30
[67044.647491] INFO: task DNS Resolver #6:19810 blocked for more than 180 seconds.
[67044.656865]       Tainted: G        W  O      4.20.17+ #30
[67044.661364] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[67044.670861] DNS Resolver #6 D    0 19810  19213 0x00000080
[67044.670863] Call Trace:
[67044.670870]  ? __schedule+0x29e/0x880
[67044.670872]  schedule+0x2a/0x80
[67044.670874]  schedule_preempt_disabled+0xc/0x20
[67044.670876]  __mutex_lock.isra.10+0x2e7/0x4f0
[67044.670879]  ? netlink_lookup+0x111/0x160
[67044.670881]  __netlink_dump_start+0x4f/0x1d0
[67044.670883]  ? rtnl_xdp_prog_skb+0x60/0x60
[67044.670885]  rtnetlink_rcv_msg+0x25c/0x390
[67044.670886]  ? rtnl_xdp_prog_skb+0x60/0x60
[67044.670888]  ? rtnl_calcit.isra.31+0x110/0x110
[67044.670889]  netlink_rcv_skb+0x44/0x120
[67044.670891]  netlink_unicast+0x18b/0x220
[67044.670893]  netlink_sendmsg+0x1ff/0x3d0
[67044.670896]  sock_sendmsg+0x2b/0x40
[67044.670898]  __sys_sendto+0xe9/0x150
[67044.670901]  ? __audit_syscall_exit+0x216/0x280
[67044.670903]  __x64_sys_sendto+0x1f/0x30
[67044.670906]  do_syscall_64+0x4a/0xf0
[67044.670909]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[67044.670911] RIP: 0033:0x7f94b119b6b0
[67044.670915] Code: Bad RIP value.
[67044.670915] RSP: 002b:00007f94817942e0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[67044.670917] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f94b119b6b0
[67044.670917] RDX: 0000000000000014 RSI: 00007f9481795420 RDI: 000000000000004d
[67044.670918] RBP: 00007f9481795420 R08: 00007f94817953c4 R09: 000000000000000c
[67044.670919] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000014
[67044.670919] R13: 00007f94817953c4 R14: 00007f947ff42208 R15: 000000000000004d
[67044.670933] INFO: task kworker/0:2:17500 blocked for more than 180 seconds.
[67044.678868]       Tainted: G        W  O      4.20.17+ #30
[67044.685860] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[67044.692746] kworker/0:2     D    0 17500      2 0x80000080
[67044.692787] Workqueue: events_power_efficient reg_check_chans_work [cfg80211]
[67044.692788] Call Trace:
[67044.692795]  ? __schedule+0x29e/0x880
[67044.692797]  schedule+0x2a/0x80
[67044.692799]  schedule_preempt_disabled+0xc/0x20
[67044.692800]  __mutex_lock.isra.10+0x2e7/0x4f0
[67044.692810]  reg_check_chans_work+0x28/0x350 [cfg80211]
[67044.692815]  process_one_work+0x1f3/0x420
[67044.692817]  worker_thread+0x28/0x3c0
[67044.692819]  ? process_one_work+0x420/0x420
[67044.692821]  kthread+0x10b/0x130
[67044.692822]  ? kthread_create_on_node+0x60/0x60
[67044.692825]  ret_from_fork+0x1f/0x30
[67044.692833] INFO: task iw:1488 blocked for more than 180 seconds.
[67044.700860]       Tainted: G        W  O      4.20.17+ #30
[67044.705216] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[67044.714862] iw              D    0  1488   1487 0x00000080
[67044.714864] Call Trace:
[67044.714871]  ? __schedule+0x29e/0x880
[67044.714874]  schedule+0x2a/0x80
[67044.714876]  schedule_preempt_disabled+0xc/0x20
[67044.714877]  __mutex_lock.isra.10+0x2e7/0x4f0
[67044.714915]  nl80211_dump_wiphy+0x1d/0x1b0 [cfg80211]
[67044.714919]  genl_lock_dumpit+0x23/0x40
[67044.714921]  netlink_dump+0x16d/0x360
[67044.714923]  __netlink_dump_start+0x168/0x1d0
[67044.714925]  genl_family_rcv_msg+0x25f/0x3a0
[67044.714927]  ? genl_lock_dumpit+0x40/0x40
[67044.714928]  ? genl_lock_done+0x40/0x40
[67044.714929]  ? genl_unlock+0x10/0x10
[67044.714931]  ? netlink_unicast+0x1ff/0x220
[67044.714932]  genl_rcv_msg+0x42/0x87
[67044.714934]  ? genl_family_rcv_msg+0x3a0/0x3a0
[67044.714935]  netlink_rcv_skb+0x44/0x120
[67044.714937]  genl_rcv+0x1f/0x30
[67044.714939]  netlink_unicast+0x18b/0x220
[67044.714940]  netlink_sendmsg+0x1ff/0x3d0
[67044.714944]  sock_sendmsg+0x2b/0x40
[67044.714946]  ___sys_sendmsg+0x28a/0x2f0
[67044.714947]  ? ___sys_recvmsg+0x156/0x1d0
[67044.714950]  ? __alloc_pages_nodemask+0x111/0x280
[67044.714954]  ? alloc_pages_vma+0x6f/0x1c0
[67044.714957]  ? page_add_new_anon_rmap+0x72/0xb0
[67044.714958]  ? __handle_mm_fault+0x7db/0x12c0
[67044.714961]  __sys_sendmsg+0x52/0xa0
[67044.714964]  do_syscall_64+0x4a/0xf0
[67044.714967]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[67044.714969] RIP: 0033:0x7fa9c4af15a7
[67044.714972] Code: Bad RIP value.
[67044.714973] RSP: 002b:00007fffdd7ac818 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[67044.714974] RAX: ffffffffffffffda RBX: 00000000021ae990 RCX: 00007fa9c4af15a7
[67044.714975] RDX: 0000000000000000 RSI: 00007fffdd7ac8b0 RDI: 0000000000000008
[67044.714976] RBP: 00000000021b3d80 R08: 0000000000000004 R09: 00007fa9c4dabf20
[67044.714976] R10: 0000000000000170 R11: 0000000000000246 R12: 00000000021b3ec0
[67044.714977] R13: 00007fffdd7ac8b0 R14: 00000000021b3ec0 R15: 00007fffdd7acb18
[67044.714980] INFO: task sshd:1763 blocked for more than 180 seconds.
[67044.720810]       Tainted: G        W  O      4.20.17+ #30
[67044.725186] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[67044.732027] sshd            D    0  1763   1355 0x00000080
[67044.732029] Call Trace:
[67044.732038]  ? __schedule+0x29e/0x880
[67044.732040]  schedule+0x2a/0x80
[67044.732042]  schedule_preempt_disabled+0xc/0x20
[67044.732043]  __mutex_lock.isra.10+0x2e7/0x4f0
[67044.732046]  ? netlink_lookup+0x111/0x160
[67044.732048]  __netlink_dump_start+0x4f/0x1d0
[67044.732051]  ? rtnl_xdp_prog_skb+0x60/0x60
[67044.732052]  rtnetlink_rcv_msg+0x25c/0x390
[67044.732054]  ? rtnl_xdp_prog_skb+0x60/0x60
[67044.732055]  ? rtnl_calcit.isra.31+0x110/0x110
[67044.732057]  netlink_rcv_skb+0x44/0x120
[67044.732059]  netlink_unicast+0x18b/0x220
[67044.732060]  netlink_sendmsg+0x1ff/0x3d0
[67044.732064]  sock_sendmsg+0x2b/0x40
[67044.732066]  __sys_sendto+0xe9/0x150
[67044.732070]  ? __audit_syscall_exit+0x216/0x280
[67044.732071]  __x64_sys_sendto+0x1f/0x30
[67044.732075]  do_syscall_64+0x4a/0xf0
[67044.732077]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[67044.732079] RIP: 0033:0x7f16e29c765a
[67044.732082] Code: Bad RIP value.
[67044.732083] RSP: 002b:00007ffe57e52e88 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[67044.732084] RAX: ffffffffffffffda RBX: 00007ffe57e53f80 RCX: 00007f16e29c765a
[67044.732085] RDX: 0000000000000014 RSI: 00007ffe57e53f80 RDI: 0000000000000003
[67044.732085] RBP: 00007ffe57e53fd0 R08: 00007ffe57e53f24 R09: 000000000000000c
[67044.732086] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffe57e53f24
[67044.732087] R13: 00007ffe57e54160 R14: 0000000000000000 R15: 0000000000000003

Thanks,
Ben
-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply

* [PATCH] net/mlx5e: Move priv variable into case statement in mlx5e_setup_tc
From: Nathan Chancellor @ 2019-07-10 19:05 UTC (permalink / raw)
  To: Saeed Mahameed, Leon Romanovsky, David S. Miller
  Cc: netdev, linux-rdma, linux-kernel, clang-built-linux,
	Nathan Chancellor

There is an unused variable warning on arm64 defconfig when
CONFIG_MLX5_ESWITCH is unset:

drivers/net/ethernet/mellanox/mlx5/core/en_main.c:3467:21: warning:
unused variable 'priv' [-Wunused-variable]
        struct mlx5e_priv *priv = netdev_priv(dev);
                           ^
1 warning generated.

Move it down into the case statement where it is used.

Fixes: 4e95bc268b91 ("net: flow_offload: add flow_block_cb_setup_simple()")
Link: https://github.com/ClangBuiltLinux/linux/issues/597
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 6d0ae87c8ded..651eb714eb5b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3464,15 +3464,16 @@ static LIST_HEAD(mlx5e_block_cb_list);
 static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
 			  void *type_data)
 {
-	struct mlx5e_priv *priv = netdev_priv(dev);
-
 	switch (type) {
 #ifdef CONFIG_MLX5_ESWITCH
-	case TC_SETUP_BLOCK:
+	case TC_SETUP_BLOCK: {
+		struct mlx5e_priv *priv = netdev_priv(dev);
+
 		return flow_block_cb_setup_simple(type_data,
 						  &mlx5e_block_cb_list,
 						  mlx5e_setup_tc_block_cb,
 						  priv, priv, true);
+	}
 #endif
 	case TC_SETUP_QDISC_MQPRIO:
 		return mlx5e_setup_tc_mqprio(dev, type_data);
-- 
2.22.0


^ permalink raw reply related

* Re: [PATCH bpf-next v9 0/2] bpf: Allow bpf_skb_event_output for more prog types
From: Andrii Nakryiko @ 2019-07-10 19:05 UTC (permalink / raw)
  To: Allan Zhang
  Cc: Networking, bpf, Song Liu, Daniel Borkmann, Alexei Starovoitov
In-Reply-To: <20190710181811.127374-1-allanzhang@google.com>

On Wed, Jul 10, 2019 at 11:18 AM Allan Zhang <allanzhang@google.com> wrote:
>
> Software event output is only enabled by a few prog types right now (TC,
> LWT out, XDP, sockops). Many other skb based prog types need
> bpf_skb_event_output to produce software event.
>
> More prog types are enabled to access bpf_skb_event_output in this
> patch.
>
> v9 changes:
> add "Acked-by" field.

Thanks! This looks good to me. Just FYI. Not sure if you followed, but
bpf-next is closed, so this can't go in until it's open. Maintainers
might ask you to resubmit at that time, if patches don't apply
cleanly.

>
> v8 changes:
> No actual change, just cc to netdev@vger.kernel.org and
> bpf@vger.kernel.org.
> v7 patches are acked by Song Liu.
>
> v7 changes:
> Reformat from hints by scripts/checkpatch.pl, including Song's comment
> on signed-off-by name to captical case in cover letter.
> 3 of hints are ignored:
> 1. new file mode.
> 2. SPDX-License-Identifier for event_output.c since all files under
>    this dir have no such line.
> 3. "Macros ... enclosed in parentheses" for macro in event_output.c
>    due to code's nature.
>
> Change patch 02 subject "bpf:..." to "selftests/bpf:..."
>
> v6 changes:
> Fix Signed-off-by, fix fixup map creation.
>
> v5 changes:
> Fix typos, reformat comments in event_output.c, move revision history to
> cover letter.
>
> v4 changes:
> Reformating log message.
>
> v3 changes:
> Reformating log message.
>
> v2 changes:
> Reformating log message.
>
> Allan Zhang (2):
>   bpf: Allow bpf_skb_event_output for a few prog types
>   selftests/bpf: Add selftests for bpf_perf_event_output
>
>  net/core/filter.c                             |  6 ++
>  tools/testing/selftests/bpf/test_verifier.c   | 12 ++-
>  .../selftests/bpf/verifier/event_output.c     | 94 +++++++++++++++++++
>  3 files changed, 111 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/verifier/event_output.c
>
> --
> 2.22.0.410.gd8fdbe21b5-goog
>

^ permalink raw reply

* Re: Question about linux kernel commit: "net/ipv6: move metrics from dst to rt6_info"
From: Stefano Brivio @ 2019-07-10 19:09 UTC (permalink / raw)
  To: Jan Szewczyk
  Cc: David Ahern, davem@davemloft.net, netdev@vger.kernel.org,
	Wei Wang, Martin KaFai Lau, Eric Dumazet
In-Reply-To: <AM6PR07MB5639E2AEF438DD017246DF13F2F00@AM6PR07MB5639.eurprd07.prod.outlook.com>

Jan,

On Wed, 10 Jul 2019 12:59:41 +0000
Jan Szewczyk <jan.szewczyk@ericsson.com> wrote:

> Hi!
> I digged up a little further and maybe it's not a problem with MTU
> itself. I checked every entry I get from RTM_GETROUTE netlink message
> and after triggering "too big packet" by pinging ipv6address I get
> exactly the same messages on 4.12 and 4.18, except that the one with
> that pinged ipv6address is missing on 4.18 at all. What is weird -
> it's visible when running "ip route get to ipv6address". Do you know
> why there is a mismatch there?

If I understand you correctly, an implementation equivalent to 'ip -6
route list show' (using the NLM_F_DUMP flag) won't show the so-called
route exception, while 'ip -6 route get' shows it.

If that's the case: that was broken by commit 2b760fcf5cfb ("ipv6: hook
up exception table to store dst cache") that landed in 4.15, and fixed
by net-next commit 1e47b4837f3b ("ipv6: Dump route exceptions if
requested"). For more details, see the log of this commit itself.

-- 
Stefano

^ permalink raw reply

* Re: Question about linux kernel commit: "net/ipv6: move metrics from dst to rt6_info"
From: David Ahern @ 2019-07-10 19:13 UTC (permalink / raw)
  To: Stefano Brivio, Jan Szewczyk
  Cc: davem@davemloft.net, netdev@vger.kernel.org, Wei Wang,
	Martin KaFai Lau, Eric Dumazet
In-Reply-To: <20190710210954.530d72a5@elisabeth>

On 7/10/19 1:09 PM, Stefano Brivio wrote:
> Jan,
> 
> On Wed, 10 Jul 2019 12:59:41 +0000
> Jan Szewczyk <jan.szewczyk@ericsson.com> wrote:
> 
>> Hi!
>> I digged up a little further and maybe it's not a problem with MTU
>> itself. I checked every entry I get from RTM_GETROUTE netlink message
>> and after triggering "too big packet" by pinging ipv6address I get
>> exactly the same messages on 4.12 and 4.18, except that the one with
>> that pinged ipv6address is missing on 4.18 at all. What is weird -
>> it's visible when running "ip route get to ipv6address". Do you know
>> why there is a mismatch there?
> 
> If I understand you correctly, an implementation equivalent to 'ip -6
> route list show' (using the NLM_F_DUMP flag) won't show the so-called
> route exception, while 'ip -6 route get' shows it.
> 
> If that's the case: that was broken by commit 2b760fcf5cfb ("ipv6: hook
> up exception table to store dst cache") that landed in 4.15, and fixed
> by net-next commit 1e47b4837f3b ("ipv6: Dump route exceptions if
> requested"). For more details, see the log of this commit itself.
> 

ah, good point. My mind locked on RTM_GETROUTE as a specific route
request not a dump.

^ permalink raw reply

* Re: [PATCH v6 bpf-next 3/3] bpf: add tests for bpf_descendant_of
From: Andrii Nakryiko @ 2019-07-10 19:25 UTC (permalink / raw)
  To: Javier Honduvilla Coto; +Cc: Networking, Yonghong Song, Kernel Team, jonhaslam
In-Reply-To: <20190710180025.94726-4-javierhonduco@fb.com>

On Wed, Jul 10, 2019 at 11:31 AM Javier Honduvilla Coto
<javierhonduco@fb.com> wrote:
>
> Adding the following test cases:

FYI, bpf-next is closed, so this won't be able to go in for about 2 weeks.

>
> - bpf_descendant_of(current->pid) == 1
> - bpf_descendant_of(current->real_parent->pid) == 1
> - bpf_descendant_of(1) == 1
> - bpf_descendant_of(0) == 1
>
> - bpf_descendant_of(-1) == 0
> - bpf_descendant_of(current->children[0]->pid) == 0
>
> Signed-off-by: Javier Honduvilla Coto <javierhonduco@fb.com>
> ---
>  tools/testing/selftests/bpf/.gitignore        |   1 +
>  tools/testing/selftests/bpf/Makefile          |   2 +-
>  tools/testing/selftests/bpf/bpf_helpers.h     |   3 +
>  .../bpf/progs/test_descendant_of_kern.c       |  43 +++
>  .../selftests/bpf/test_descendant_of_user.c   | 266 ++++++++++++++++++
>  5 files changed, 314 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/test_descendant_of_kern.c
>  create mode 100644 tools/testing/selftests/bpf/test_descendant_of_user.c
>
> diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
> index 90f70d2c7c22..4b63d7105ba2 100644
> --- a/tools/testing/selftests/bpf/.gitignore
> +++ b/tools/testing/selftests/bpf/.gitignore
> @@ -43,3 +43,4 @@ test_sockopt
>  test_sockopt_sk
>  test_sockopt_multi
>  test_tcp_rtt
> +test_descendant_of_user
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 2620406a53ec..b3dc1e26c41c 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -27,7 +27,7 @@ TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map test
>         test_cgroup_storage test_select_reuseport test_section_names \
>         test_netcnt test_tcpnotify_user test_sock_fields test_sysctl test_hashmap \
>         test_btf_dump test_cgroup_attach xdping test_sockopt test_sockopt_sk \
> -       test_sockopt_multi test_tcp_rtt
> +       test_sockopt_multi test_tcp_rtt test_descendant_of_user

Could you please instead add this test as part of test_progs? See for
instance prog_tests/attach_probe.c for recently added test.

>
>  BPF_OBJ_FILES = $(patsubst %.c,%.o, $(notdir $(wildcard progs/*.c)))
>  TEST_GEN_FILES = $(BPF_OBJ_FILES)
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index 5a3d92c8bec8..7525783ffbc9 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -1,4 +1,6 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
> +#include <sys/types.h>
> +
>  #ifndef __BPF_HELPERS_H
>  #define __BPF_HELPERS_H
>
> @@ -228,6 +230,7 @@ static void *(*bpf_sk_storage_get)(void *map, struct bpf_sock *sk,
>  static int (*bpf_sk_storage_delete)(void *map, struct bpf_sock *sk) =
>         (void *)BPF_FUNC_sk_storage_delete;
>  static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
> +static int (*bpf_descendant_of)(pid_t pid) = (void *) BPF_FUNC_descendant_of;

Can you split bpf_helpers.h update into a separate commit?

>
>  /* llvm builtin functions that eBPF C program may use to
>   * emit BPF_LD_ABS and BPF_LD_IND instructions
> diff --git a/tools/testing/selftests/bpf/progs/test_descendant_of_kern.c b/tools/testing/selftests/bpf/progs/test_descendant_of_kern.c
> new file mode 100644
> index 000000000000..802e01595527
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_descendant_of_kern.c
> @@ -0,0 +1,43 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <linux/bpf.h>
> +#include "bpf_helpers.h"
> +
> +struct bpf_map_def SEC("maps") pidmap = {
> +       .type = BPF_MAP_TYPE_ARRAY,
> +       .key_size = sizeof(__u32),
> +       .value_size = sizeof(__u32),
> +       .max_entries = 2,
> +};
> +
> +struct bpf_map_def SEC("maps") resultmap = {
> +       .type = BPF_MAP_TYPE_ARRAY,
> +       .key_size = sizeof(__u32),
> +       .value_size = sizeof(__u32),
> +       .max_entries = 1,
> +};

Please update this to use new BTF-defined maps (see lots of recently
converted tests for example).

> +
> +SEC("tracepoint/syscalls/sys_enter_open")
> +int trace(void *ctx)
> +{
> +       __u32 pid = bpf_get_current_pid_tgid();
> +       __u32 current_key = 0, ancestor_key = 1, *expected_pid, *ancestor_pid;
> +       __u32 *val;
> +
> +       expected_pid = bpf_map_lookup_elem(&pidmap, &current_key);
> +       if (!expected_pid || *expected_pid != pid)
> +               return 0;
> +
> +       ancestor_pid = bpf_map_lookup_elem(&pidmap, &ancestor_key);
> +       if (!ancestor_pid)
> +               return 0;
> +
> +       val = bpf_map_lookup_elem(&resultmap, &current_key);
> +       if (val)
> +               *val = bpf_descendant_of(*ancestor_pid);
> +
> +       return 0;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> +__u32 _version SEC("version") = 1;
> diff --git a/tools/testing/selftests/bpf/test_descendant_of_user.c b/tools/testing/selftests/bpf/test_descendant_of_user.c
> new file mode 100644
> index 000000000000..f616c8c976a4
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/test_descendant_of_user.c
> @@ -0,0 +1,266 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <syscall.h>
> +#include <unistd.h>
> +#include <linux/perf_event.h>
> +#include <sys/ioctl.h>
> +#include <sys/types.h>
> +#include <sys/wait.h>
> +
> +#include <bpf/bpf.h>
> +#include <bpf/libbpf.h>
> +
> +#define CHECK(condition, tag, format...)                                       \
> +       ({                                                                     \
> +               int __ret = !!(condition);                                     \
> +               if (__ret) {                                                   \
> +                       printf("%s:FAIL:%s ", __func__, tag);                  \
> +                       printf(format);                                        \
> +               } else {                                                       \
> +                       printf("%s:PASS:%s\n", __func__, tag);                 \
> +               }                                                              \
> +               __ret;                                                         \
> +       })

You won't need this if done as part of test_progs.

> +
> +static int bpf_find_map(const char *test, struct bpf_object *obj,
> +                       const char *name)
> +{
> +       struct bpf_map *map;
> +
> +       map = bpf_object__find_map_by_name(obj, name);
> +       if (!map)
> +               return -1;
> +       return bpf_map__fd(map);
> +}
> +
> +int main(int argc, char **argv)
> +{
> +       const char *probe_name = "syscalls/sys_enter_open";
> +       const char *file = "test_descendant_of_kern.o";
> +       int err, bytes, efd, prog_fd, pmu_fd;
> +       int resultmap_fd, pidmap_fd;
> +       struct perf_event_attr attr = {};
> +       struct bpf_object *obj;
> +       __u32 descendant_of_result = 0;
> +       __u32 key = 0, pid;
> +       int exit_code = EXIT_FAILURE;
> +       char buf[256];
> +
> +       int child_pid, ancestor_pid, root_fd, nonexistant = -42;
> +       __u32 ancestor_key = 1;
> +       int pipefd[2];
> +       char marker[1];
> +
> +       err = bpf_prog_load(file, BPF_PROG_TYPE_TRACEPOINT, &obj, &prog_fd);
> +       if (CHECK(err, "bpf_prog_load", "err %d errno %d\n", err, errno))
> +               goto fail;
> +
> +       resultmap_fd = bpf_find_map(__func__, obj, "resultmap");
> +       if (CHECK(resultmap_fd < 0, "bpf_find_map", "err %d errno %d\n",
> +                 resultmap_fd, errno))
> +               goto close_prog;
> +
> +       pidmap_fd = bpf_find_map(__func__, obj, "pidmap");
> +       if (CHECK(pidmap_fd < 0, "bpf_find_map", "err %d errno %d\n", pidmap_fd,
> +                 errno))
> +               goto close_prog;
> +
> +       pid = getpid();
> +       bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
> +       bpf_map_update_elem(pidmap_fd, &ancestor_key, &pid, 0);
> +
> +       snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/events/%s/id",
> +                probe_name);
> +       efd = open(buf, O_RDONLY, 0);
> +       if (CHECK(efd < 0, "open", "err %d errno %d\n", efd, errno))
> +               goto close_prog;
> +       bytes = read(efd, buf, sizeof(buf));
> +       close(efd);
> +       if (CHECK(bytes <= 0 || bytes >= sizeof(buf), "read",
> +                 "bytes %d errno %d\n", bytes, errno))
> +               goto close_prog;
> +
> +       attr.config = strtol(buf, NULL, 0);
> +       attr.type = PERF_TYPE_TRACEPOINT;
> +       attr.sample_type = PERF_SAMPLE_RAW;
> +       attr.sample_period = 1;
> +       attr.wakeup_events = 1;
> +
> +       pmu_fd = syscall(__NR_perf_event_open, &attr, getpid(), -1, -1, 0);
> +       if (CHECK(pmu_fd < 0, "perf_event_open", "err %d errno %d\n", pmu_fd,
> +                 errno))
> +               goto close_prog;
> +
> +       err = ioctl(pmu_fd, PERF_EVENT_IOC_ENABLE, 0);
> +       if (CHECK(err, "perf_event_ioc_enable", "err %d errno %d\n", err,
> +                 errno))
> +               goto close_pmu;
> +
> +       err = ioctl(pmu_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
> +       if (CHECK(err, "perf_event_ioc_set_bpf", "err %d errno %d\n", err,
> +                 errno))
> +               goto close_pmu;

Can you please switch all this to new libbpf tracing APIs? see
prog_tests/attach_probe.c for examples.

> +
> +       // Test that descendant_of(current->pid) is true
> +       bpf_map_update_elem(pidmap_fd, &key, &pid, 0);
> +       bpf_map_update_elem(pidmap_fd, &ancestor_key, &pid, 0);
> +       bpf_map_update_elem(resultmap_fd, &key, &nonexistant, 0);
> +
> +       root_fd = open("/", O_RDONLY);
> +       if (CHECK(efd < 0, "open", "errno %d\n", errno))
> +               goto close_prog;
> +       close(root_fd);

I'd suggest to just use raw_tracepoint 'sys_enter', which would be
easy to trigger with just `usleep(1)`. Makes for quite simpler code.
See, e.g., tools/testing/selftests/bpf/prog_tests/get_stack_raw_tp.c

> +
> +       err = bpf_map_lookup_elem(resultmap_fd, &key, &descendant_of_result);
> +       if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
> +               goto close_pmu;
> +       if (CHECK(descendant_of_result != 1,
> +                 "descendant_of is true with same pid", "%d == %d\n",
> +                 descendant_of_result, 1))
> +               goto close_pmu;
> +

<snip>

^ permalink raw reply

* Re: [PATCH net 2/4] tcp: tcp_fragment() should apply sane memory limits
From: Eric Dumazet @ 2019-07-10 19:26 UTC (permalink / raw)
  To: Prout, Andrew - LLSC - MITLL, Eric Dumazet, Christoph Paasch
  Cc: David S . Miller, netdev, Greg Kroah-Hartman, Jonathan Looney,
	Neal Cardwell, Tyler Hicks, Yuchung Cheng, Bruce Curtis,
	Jonathan Lemon, Dustin Marquess
In-Reply-To: <e471350b70e244daa10043f06fbb3ebe@ll.mit.edu>



On 7/10/19 8:53 PM, Prout, Andrew - LLSC - MITLL wrote:
> 
> Our initial rollout was v4.14.130, but I reproduced it with v4.14.132 as well, reliably for the samba test and once (not reliably) with synthetic test I was trying. A patched v4.14.132 with this patch partially reverted (just the four lines from tcp_fragment deleted) passed the samba test.
> 
> The synthetic test was a pair of simple send/recv test programs under the following conditions:
> -The send socket was non-blocking
> -SO_SNDBUF set to 128KiB
> -The receiver NIC was being flooded with traffic from multiple hosts (to induce packet loss/retransmits)
> -Load was on both systems: a while(1) program spinning on each CPU core
> -The receiver was on an older unaffected kernel
> 

SO_SNDBUF to 128KB does not permit to recover from heavy losses,
since skbs needs to be allocated for retransmits.

The bug we fixed allowed remote attackers to crash all linux hosts,

I am afraid we have to enforce the real SO_SNDBUF limit, finally.

Even a cushion of 128KB per socket is dangerous, for servers with millions of TCP sockets.

You will either have to set SO_SNDBUF to higher values, or let autotuning in place.
Or revert the patches and allow attackers hit you badly.


^ permalink raw reply

* Re: [PATCH V2 1/1 (was 0/1 by accident)] tools/dtrace: initial implementation of DTrace
From: Daniel Borkmann @ 2019-07-10 19:32 UTC (permalink / raw)
  To: Kris Van Hees
  Cc: netdev, bpf, dtrace-devel, linux-kernel, rostedt, mhiramat, acme,
	ast, Peter Zijlstra, Chris Mason, brendan.d.gregg, davem
In-Reply-To: <20190710181227.GA9925@oracle.com>

Hello Kris,

On 07/10/2019 08:12 PM, Kris Van Hees wrote:
> This patch's subject should of course be [PATCH V2 1/1] rather than 0/1.
> Sorry about that.
> 
> On Wed, Jul 10, 2019 at 08:42:24AM -0700, Kris Van Hees wrote:
>> This initial implementation of a tiny subset of DTrace functionality
>> provides the following options:
>>
>> 	dtrace [-lvV] [-b bufsz] -s script
>> 	    -b  set trace buffer size
>> 	    -l  list probes (only works with '-s script' for now)
>> 	    -s  enable or list probes for the specified BPF program
>> 	    -V  report DTrace API version
>>
>> The patch comprises quite a bit of code due to DTrace requiring a few
>> crucial components, even in its most basic form.
>>
>> The code is structured around the command line interface implemented in
>> dtrace.c.  It provides option parsing and drives the three modes of
>> operation that are currently implemented:
>>
>> 1. Report DTrace API version information.
>> 	Report the version information and terminate.
>>
>> 2. List probes in BPF programs.
>> 	Initialize the list of probes that DTrace recognizes, load BPF
>> 	programs, parse all BPF ELF section names, resolve them into
>> 	known probes, and emit the probe names.  Then terminate.
>>
>> 3. Load BPF programs and collect tracing data.
>> 	Initialize the list of probes that DTrace recognizes, load BPF
>> 	programs and attach them to their corresponding probes, set up
>> 	perf event output buffers, and start processing tracing data.
>>
>> This implementation makes extensive use of BPF (handled by dt_bpf.c) and
>> the perf event output ring buffer (handled by dt_buffer.c).  DTrace-style
>> probe handling (dt_probe.c) offers an interface to probes that hides the
>> implementation details of the individual probe types by provider (dt_fbt.c
>> and dt_syscall.c).  Probe lookup by name uses a hashtable implementation
>> (dt_hash.c).  The dt_utils.c code populates a list of online CPU ids, so
>> we know what CPUs we can obtain tracing data from.
>>
>> Building the tool is trivial because its only dependency (libbpf) is in
>> the kernel tree under tools/lib/bpf.  A simple 'make' in the tools/dtrace
>> directory suffices.
>>
>> The 'dtrace' executable needs to run as root because BPF programs cannot
>> be loaded by non-root users.
>>
>> Signed-off-by: Kris Van Hees <kris.van.hees@oracle.com>
>> Reviewed-by: David Mc Lean <david.mclean@oracle.com>
>> Reviewed-by: Eugene Loh <eugene.loh@oracle.com>
>> ---
>> Changes in v2:
>>         - Use ring_buffer_read_head() and ring_buffer_write_tail() to
>>           avoid use of volatile.
>>         - Handle perf events that wrap around the ring buffer boundary.
>>         - Remove unnecessary PERF_EVENT_IOC_ENABLE.
>>         - Remove -I$(srctree)/tools/perf from KBUILD_HOSTCFLAGS since it
>>           is not actually used.
>>         - Use PT_REGS_PARM1(x), etc instead of my own macros.  Adding 
>>           PT_REGS_PARM6(x) in bpf_sample.c because we need to be able to
>>           support up to 6 arguments passed by registers.

Looks like you missed Brendan Gregg's prior feedback from v1 [0]. I haven't
seen a strong compelling argument for why this needs to reside in the kernel
tree given we also have all the other tracing tools and many of which also
rely on BPF such as bcc, bpftrace, ply, systemtap, sysdig, lttng to just name
a few. Given all the other tracers manage to live outside the kernel tree just
fine, so can dtrace as well; it's _not_ special in this regard in any way. It
will be tons of code in long term which is better off in its separate project,
and if we add tools/dtrace/, other projects will come as well asking for kernel
tree inclusion 'because tools/dtrace' is now there, too. While it totally makes
sense to extend the missing kernel bits where needed, it doesn't make sense to
have another big tracing project similar to perf in the tree. Therefore, I'm
not applying this patch, sorry.

Thanks,
Daniel

  [0] https://lore.kernel.org/bpf/CAE40pdeSfJBpbBHTmwz1xZ+MW02=kJ0krq1mN+EkjSLqf2GX_w@mail.gmail.com/

>> ---
>>  MAINTAINERS                |   6 +
>>  tools/dtrace/Makefile      |  87 ++++++++++
>>  tools/dtrace/bpf_sample.c  | 146 ++++++++++++++++
>>  tools/dtrace/dt_bpf.c      | 185 ++++++++++++++++++++
>>  tools/dtrace/dt_buffer.c   | 338 +++++++++++++++++++++++++++++++++++++
>>  tools/dtrace/dt_fbt.c      | 201 ++++++++++++++++++++++
>>  tools/dtrace/dt_hash.c     | 211 +++++++++++++++++++++++
>>  tools/dtrace/dt_probe.c    | 230 +++++++++++++++++++++++++
>>  tools/dtrace/dt_syscall.c  | 179 ++++++++++++++++++++
>>  tools/dtrace/dt_utils.c    | 132 +++++++++++++++
>>  tools/dtrace/dtrace.c      | 249 +++++++++++++++++++++++++++
>>  tools/dtrace/dtrace.h      |  13 ++
>>  tools/dtrace/dtrace_impl.h | 101 +++++++++++
>>  13 files changed, 2078 insertions(+)
>>  create mode 100644 tools/dtrace/Makefile
>>  create mode 100644 tools/dtrace/bpf_sample.c
>>  create mode 100644 tools/dtrace/dt_bpf.c
>>  create mode 100644 tools/dtrace/dt_buffer.c
>>  create mode 100644 tools/dtrace/dt_fbt.c
>>  create mode 100644 tools/dtrace/dt_hash.c
>>  create mode 100644 tools/dtrace/dt_probe.c
>>  create mode 100644 tools/dtrace/dt_syscall.c
>>  create mode 100644 tools/dtrace/dt_utils.c
>>  create mode 100644 tools/dtrace/dtrace.c
>>  create mode 100644 tools/dtrace/dtrace.h
>>  create mode 100644 tools/dtrace/dtrace_impl.h

^ permalink raw reply

* Re: [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
From: Jakub Kicinski @ 2019-07-10 19:34 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf
In-Reply-To: <5d255dececd33_1b7a2aec940d65b45@john-XPS-13-9370.notmuch>

On Tue, 09 Jul 2019 20:39:24 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > On Mon, 08 Jul 2019 19:14:05 +0000, John Fastabend wrote:  
> > > @@ -287,6 +313,27 @@ static void tls_sk_proto_cleanup(struct sock *sk,
> > >  #endif
> > >  }
> > >  
> > > +static void tls_sk_proto_unhash(struct sock *sk)
> > > +{
> > > +	struct inet_connection_sock *icsk = inet_csk(sk);
> > > +	long timeo = sock_sndtimeo(sk, 0);
> > > +	struct tls_context *ctx;
> > > +
> > > +	if (unlikely(!icsk->icsk_ulp_data)) {  
> > 
> > Is this for when sockmap is stacked on top of TLS and TLS got removed
> > without letting sockmap know?  
> 
> Right its a pattern I used on the sockmap side and put here. But
> I dropped the patch to let sockmap stack on top of TLS because
> it was more than a fix IMO. We could probably drop this check on
> the other hand its harmless.

I feel like this code is pretty complex I struggle to follow all the
paths, so perhaps it'd be better to drop stuff that's not necessary 
to have a clearer picture.

> > > +		if (sk->sk_prot->unhash)
> > > +			sk->sk_prot->unhash(sk);
> > > +	}
> > > +
> > > +	ctx = tls_get_ctx(sk);
> > > +	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
> > > +		tls_sk_proto_cleanup(sk, ctx, timeo);
> > > +	icsk->icsk_ulp_data = NULL;  
> > 
> > I think close only starts checking if ctx is NULL in patch 6.
> > Looks like some chunks of ctx checking/clearing got spread to
> > patch 1 and some to patch 6.  
> 
> Yeah, I thought the patches were easier to read this way but
> maybe not. Could add something in the commit log.

Ack! Let me try to get a full grip of patches 2 and 6 and come back 
to this.

> > > +	tls_ctx_free_wq(ctx);
> > > +
> > > +	if (ctx->unhash)
> > > +		ctx->unhash(sk);
> > > +}
> > > +
> > >  static void tls_sk_proto_close(struct sock *sk, long timeout)
> > >  {
> > >  	struct tls_context *ctx = tls_get_ctx(sk);  

^ permalink raw reply

* Re: [bpf PATCH v2 6/6] bpf: sockmap/tls, close can race with map free
From: Jakub Kicinski @ 2019-07-10 19:35 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf
In-Reply-To: <5d255ca6e5b0d_1b7a2aec940d65b4f6@john-XPS-13-9370.notmuch>

On Tue, 09 Jul 2019 20:33:58 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > On Mon, 08 Jul 2019 19:15:18 +0000, John Fastabend wrote:  
> > > @@ -352,15 +354,18 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
> > >  	if (ctx->tx_conf == TLS_BASE && ctx->rx_conf == TLS_BASE)
> > >  		goto skip_tx_cleanup;
> > >  
> > > -	sk->sk_prot = ctx->sk_proto;
> > >  	tls_sk_proto_cleanup(sk, ctx, timeo);
> > >  
> > >  skip_tx_cleanup:
> > > +	write_lock_bh(&sk->sk_callback_lock);
> > > +	icsk->icsk_ulp_data = NULL;  
> > 
> > Is ulp_data pointer now supposed to be updated under the
> > sk_callback_lock?  
> 
> Yes otherwise it can race with tls_update(). I didn't remove the
> ulp pointer null set from tcp_ulp.c though. Could be done in this
> patch or as a follow up.

Do we need to hold the lock in unhash, too, or is unhash called with
sk_callback_lock held?

> > > +	if (sk->sk_prot->close == tls_sk_proto_close)
> > > +		sk->sk_prot = ctx->sk_proto;
> > > +	write_unlock_bh(&sk->sk_callback_lock);
> > >  	release_sock(sk);
> > >  	if (ctx->rx_conf == TLS_SW)
> > >  		tls_sw_release_strp_rx(ctx);
> > > -	sk_proto_close(sk, timeout);
> > > -
> > > +	ctx->sk_proto_close(sk, timeout);
> > >  	if (ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW &&
> > >  	    ctx->tx_conf != TLS_HW_RECORD && ctx->rx_conf != TLS_HW_RECORD)
> > >  		tls_ctx_free(ctx);  


^ permalink raw reply

* Re: [GIT PULL] Keys: Set 4 - Key ACLs for 5.3
From: Eric Biggers @ 2019-07-10 19:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Howells, James Morris James Morris, keyrings, Netdev,
	linux-nfs, CIFS, linux-afs, linux-fsdevel, linux-integrity,
	LSM List, Linux List Kernel Mailing
In-Reply-To: <CAHk-=wjxoeMJfeBahnWH=9zShKp2bsVy527vo3_y8HfOdhwAAw@mail.gmail.com>

On Wed, Jul 10, 2019 at 11:35:07AM -0700, Linus Torvalds wrote:
> On Fri, Jul 5, 2019 at 2:30 PM David Howells <dhowells@redhat.com> wrote:
> >
> > Here's my fourth block of keyrings changes for the next merge window.  They
> > change the permissions model used by keys and keyrings to be based on an
> > internal ACL by the following means:
> 
> It turns out that this is broken, and I'll probably have to revert the
> merge entirely.
> 
> With this merge in place, I can't boot any of the machines that have
> an encrypted disk setup. The boot just stops at
> 
>   systemd[1]: Started Forward Password Requests to Plymouth Directory Watch.
>   systemd[1]: Reached target Paths.
> 
> and never gets any further. I never get the prompt for a passphrase
> for the disk encryption.
> 
> Apparently not a lot of developers are using encrypted volumes for
> their development machines.
> 
> I'm not sure if the only requirement is an encrypted volume, or if
> this is also particular to a F30 install in case you need to be able
> to reproduce. But considering that you have a redhat email address,
> I'm sure you can find a F30 install somewhere with an encrypted disk.
> 
> David, if you can fix this quickly, I'll hold off on the revert of it
> all, but I can wait only so long. I've stopped merging stuff since I
> noticed my machines don't work (this merge window has not been
> pleasant so far - in addition to this issue I had another entirely
> unrelated boot failure which made bisecting this one even more fun).
> 
> So if I don't see a quick fix, I'll just revert in order to then
> continue to do pull requests later today. Because I do not want to do
> further pulls with something that I can't boot as a base.
> 
>                  Linus

This also broke 'keyctl new_session' and hence all the fscrypt tests
(https://lkml.kernel.org/lkml/20190710011559.GA7973@sol.localdomain/), and it
also broke loading in-kernel X.509 certificates
(https://lore.kernel.org/lkml/27671.1562384658@turing-police/T/#u).

I'm *guessing* these are all some underlying issue where keyrings aren't being
given all the needed permissions anymore.

But just FYI, David had said he's on vacation with no laptop or email access for
2 weeks starting from Sunday (3 days ago).  So I don't think you can expect a
quick fix from him.

I was planning to look into this to fix the fscrypt tests, but it might be a few
days before I get to it.  And while I'm *guessing* it will be a simple fix, it
might not be.  So I can't speak for David, but personally I'm fine with the
commits being reverted for now.

I'm also unhappy that the new keyctl KEYCTL_GRANT_PERMISSION doesn't have any
documentation or tests.  (Which seems to be a common problem with David's
work...  None of the new mount syscalls in v5.2 have any tests, for example, and
the man pages are still work-in-progress and last sent out for review a year
ago, despite API changes that occurred before the syscalls were merged.)

- Eric

^ permalink raw reply

* Re: [PATCH V2 1/1 (was 0/1 by accident)] tools/dtrace: initial implementation of DTrace
From: Steven Rostedt @ 2019-07-10 19:47 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kris Van Hees, netdev, bpf, dtrace-devel, linux-kernel, mhiramat,
	acme, ast, Peter Zijlstra, Chris Mason, brendan.d.gregg, davem
In-Reply-To: <c7f15d1d-1696-4d95-1729-4c4e97bdc43e@iogearbox.net>

On Wed, 10 Jul 2019 21:32:25 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:


> Looks like you missed Brendan Gregg's prior feedback from v1 [0]. I haven't
> seen a strong compelling argument for why this needs to reside in the kernel
> tree given we also have all the other tracing tools and many of which also
> rely on BPF such as bcc, bpftrace, ply, systemtap, sysdig, lttng to just name
> a few. Given all the other tracers manage to live outside the kernel tree just
> fine, so can dtrace as well; it's _not_ special in this regard in any way. It
> will be tons of code in long term which is better off in its separate project,
> and if we add tools/dtrace/, other projects will come as well asking for kernel
> tree inclusion 'because tools/dtrace' is now there, too. While it totally makes
> sense to extend the missing kernel bits where needed, it doesn't make sense to
> have another big tracing project similar to perf in the tree. Therefore, I'm
> not applying this patch, sorry.

I agree with this.

Note, trace-cmd is very tied to ftrace just as much as perf is to the
code in tree. There was a window in time I had a choice to add it to
tools/ as well, but after careful consideration, I decided it's best
against it. The only thing being in tree gives you is marketing.
Otherwise, it makes it too coupled. I keep having to compile perf
separately, because a lot of perf distro packages appear to think that
it requires the same kernel version.

It also makes it easier to have your own release cycles, otherwise it
forces you to be on a 2 1/2 month cycle that the kernel is on. And it
forces you to have a clear separation between kernel and user space.

That said, I'm working to put together libraries that interact with all
the current tracers (perf, trace-cmd, lttng, bpftrace, etc) and call it
the "Unified Tracing Platform". The purpose is to allow any tool to be
able to take advantage of any of the supported tracers within the
running kernel. This will be one of the topics at the Tracing MC at
Linux Plumbers in September. I hope to see all of you there ;-)

-- Steve


^ permalink raw reply

* Re: [bpf PATCH v2 2/6] bpf: tls fix transition through disconnect with close
From: Jakub Kicinski @ 2019-07-10 20:04 UTC (permalink / raw)
  To: John Fastabend; +Cc: ast, daniel, netdev, edumazet, bpf
In-Reply-To: <20190710123417.2157a459@cakuba.netronome.com>

On Wed, 10 Jul 2019 12:34:17 -0700, Jakub Kicinski wrote:
> > > > +		if (sk->sk_prot->unhash)
> > > > +			sk->sk_prot->unhash(sk);
> > > > +	}
> > > > +
> > > > +	ctx = tls_get_ctx(sk);
> > > > +	if (ctx->tx_conf == TLS_SW || ctx->rx_conf == TLS_SW)
> > > > +		tls_sk_proto_cleanup(sk, ctx, timeo);

Do we still need to hook into unhash? With patch 6 in place perhaps we
can just do disconnect 🥺

cleanup is going to kick off TX but also:

	if (unlikely(sk->sk_write_pending) &&
	    !wait_on_pending_writer(sk, &timeo))
		tls_handle_open_record(sk, 0);

Are we guaranteed that sk_write_pending is 0?  Otherwise
wait_on_pending_writer is hiding yet another release_sock() :(

> > > > +	icsk->icsk_ulp_data = NULL;    
> > > 
> > > I think close only starts checking if ctx is NULL in patch 6.
> > > Looks like some chunks of ctx checking/clearing got spread to
> > > patch 1 and some to patch 6.    
> > 
> > Yeah, I thought the patches were easier to read this way but
> > maybe not. Could add something in the commit log.  
> 
> Ack! Let me try to get a full grip of patches 2 and 6 and come back 
> to this.
> 
> > > > +	tls_ctx_free_wq(ctx);
> > > > +
> > > > +	if (ctx->unhash)
> > > > +		ctx->unhash(sk);
> > > > +}
> > > > +
> > > >  static void tls_sk_proto_close(struct sock *sk, long timeout)
> > > >  {
> > > >  	struct tls_context *ctx = tls_get_ctx(sk);    


^ permalink raw reply

* Re: [GIT PULL] Keys: Set 4 - Key ACLs for 5.3
From: Eric Biggers @ 2019-07-10 20:15 UTC (permalink / raw)
  To: Linus Torvalds, David Howells, James Morris, keyrings, Netdev,
	linux-nfs, CIFS, linux-afs, linux-fsdevel, linux-integrity,
	LSM List, Linux List Kernel Mailing
In-Reply-To: <20190710194620.GA83443@gmail.com>

On Wed, Jul 10, 2019 at 12:46:22PM -0700, Eric Biggers wrote:
> On Wed, Jul 10, 2019 at 11:35:07AM -0700, Linus Torvalds wrote:
> > On Fri, Jul 5, 2019 at 2:30 PM David Howells <dhowells@redhat.com> wrote:
> > >
> > > Here's my fourth block of keyrings changes for the next merge window.  They
> > > change the permissions model used by keys and keyrings to be based on an
> > > internal ACL by the following means:
> > 
> > It turns out that this is broken, and I'll probably have to revert the
> > merge entirely.
> > 
> > With this merge in place, I can't boot any of the machines that have
> > an encrypted disk setup. The boot just stops at
> > 
> >   systemd[1]: Started Forward Password Requests to Plymouth Directory Watch.
> >   systemd[1]: Reached target Paths.
> > 
> > and never gets any further. I never get the prompt for a passphrase
> > for the disk encryption.
> > 
> > Apparently not a lot of developers are using encrypted volumes for
> > their development machines.
> > 
> > I'm not sure if the only requirement is an encrypted volume, or if
> > this is also particular to a F30 install in case you need to be able
> > to reproduce. But considering that you have a redhat email address,
> > I'm sure you can find a F30 install somewhere with an encrypted disk.
> > 
> > David, if you can fix this quickly, I'll hold off on the revert of it
> > all, but I can wait only so long. I've stopped merging stuff since I
> > noticed my machines don't work (this merge window has not been
> > pleasant so far - in addition to this issue I had another entirely
> > unrelated boot failure which made bisecting this one even more fun).
> > 
> > So if I don't see a quick fix, I'll just revert in order to then
> > continue to do pull requests later today. Because I do not want to do
> > further pulls with something that I can't boot as a base.
> > 
> >                  Linus
> 
> This also broke 'keyctl new_session' and hence all the fscrypt tests
> (https://lkml.kernel.org/lkml/20190710011559.GA7973@sol.localdomain/), and it
> also broke loading in-kernel X.509 certificates
> (https://lore.kernel.org/lkml/27671.1562384658@turing-police/T/#u).
> 
> I'm *guessing* these are all some underlying issue where keyrings aren't being
> given all the needed permissions anymore.
> 
> But just FYI, David had said he's on vacation with no laptop or email access for
> 2 weeks starting from Sunday (3 days ago).  So I don't think you can expect a
> quick fix from him.
> 
> I was planning to look into this to fix the fscrypt tests, but it might be a few
> days before I get to it.  And while I'm *guessing* it will be a simple fix, it
> might not be.  So I can't speak for David, but personally I'm fine with the
> commits being reverted for now.
> 
> I'm also unhappy that the new keyctl KEYCTL_GRANT_PERMISSION doesn't have any
> documentation or tests.  (Which seems to be a common problem with David's
> work...  None of the new mount syscalls in v5.2 have any tests, for example, and
> the man pages are still work-in-progress and last sent out for review a year
> ago, despite API changes that occurred before the syscalls were merged.)
> 

Also worth noting that the key ACL patches were only in linux-next for 9 days
before the pull request was sent.  The X.509 certificate loading bug (which
might be the same underlying bug) was reported on July 6 by someone testing
linux-next, but the pull request had already been sent on July 5.  I suspect
these bug(s) would have been fixed if they had been in linux-next for longer.

- Eric

^ permalink raw reply

* Re: [PATCH V2 1/1 (was 0/1 by accident)] tools/dtrace: initial implementation of DTrace
From: Jonathan Corbet @ 2019-07-10 20:30 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Kris Van Hees, netdev, bpf, dtrace-devel, linux-kernel, rostedt,
	mhiramat, acme, ast, Peter Zijlstra, Chris Mason, brendan.d.gregg,
	davem
In-Reply-To: <c7f15d1d-1696-4d95-1729-4c4e97bdc43e@iogearbox.net>

On Wed, 10 Jul 2019 21:32:25 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> Looks like you missed Brendan Gregg's prior feedback from v1 [0]. I haven't
> seen a strong compelling argument for why this needs to reside in the kernel
> tree given we also have all the other tracing tools and many of which also
> rely on BPF such as bcc, bpftrace, ply, systemtap, sysdig, lttng to just name
> a few.

So I'm just watching from the sidelines here, but I do feel the need to
point out that Kris appears to be trying to follow the previous feedback
he got from Alexei, where creating tools/dtrace is exactly what he was
told to do:

  https://lwn.net/ml/netdev/20190521175617.ipry6ue7o24a2e6n@ast-mbp.dhcp.thefacebook.com/

Now he's being told the exact opposite.  Not the best experience for
somebody who is trying to make the kernel better.

There are still people interested in DTrace out there.  How would you
recommend that Kris proceed at this point?

Thanks,

jon

^ permalink raw reply

* Re: [PATCH iproute2-next] ip: bond: add peer notification delay support
From: David Ahern @ 2019-07-10 20:54 UTC (permalink / raw)
  To: Vincent Bernat, Stephen Hemminger, netdev
In-Reply-To: <20190707175115.3704-1-vincent@bernat.ch>

On 7/7/19 11:51 AM, Vincent Bernat wrote:
> Ability to tweak the delay between gratuitous ND/ARP packets has been
> added in kernel commit 07a4ddec3ce9 ("bonding: add an option to
> specify a delay between peer notifications"), through
> IFLA_BOND_PEER_NOTIF_DELAY attribute. Add support to set and show this
> value.
> 
> Example:
> 
>     $ ip -d link set bond0 type bond peer_notify_delay 1000
>     $ ip -d link l dev bond0
>     2: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue
>     state UP mode DEFAULT group default qlen 1000
>         link/ether 50:54:33:00:00:01 brd ff:ff:ff:ff:ff:ff
>         bond mode active-backup active_slave eth0 miimon 100 updelay 0
>     downdelay 0 peer_notify_delay 1000 use_carrier 1 arp_interval 0
>     arp_validate none arp_all_targets any primary eth0
>     primary_reselect always fail_over_mac active xmit_hash_policy
>     layer2 resend_igmp 1 num_grat_arp 5 all_slaves_active 0 min_links
>     0 lp_interval 1 packets_per_slave 1 lacp_rate slow ad_select
>     stable tlb_dynamic_lb 1 addrgenmode eu
> 
> Signed-off-by: Vincent Bernat <vincent@bernat.ch>
> ---
>  include/uapi/linux/if_link.h |  1 +
>  ip/iplink_bond.c             | 14 +++++++++++++-
>  2 files changed, 14 insertions(+), 1 deletion(-)
> 

applied to iproute2-next. Thanks

^ permalink raw reply

* Re: [PATCH] tipc: ensure skb->lock is initialised
From: Chris Packham @ 2019-07-10 20:58 UTC (permalink / raw)
  To: Jon Maloy, Eric Dumazet, ying.xue@windriver.com,
	davem@davemloft.net
  Cc: netdev@vger.kernel.org, tipc-discussion@lists.sourceforge.net,
	linux-kernel@vger.kernel.org
In-Reply-To: <MN2PR15MB3581E1D6D56D6AA7DE8E357E9AF00@MN2PR15MB3581.namprd15.prod.outlook.com>


On 11/07/19 1:10 AM, Jon Maloy wrote:
>> -----Original Message-----
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Sent: 10-Jul-19 04:00
>> To: Jon Maloy <jon.maloy@ericsson.com>; Eric Dumazet
>> <eric.dumazet@gmail.com>; Chris Packham
>> <Chris.Packham@alliedtelesis.co.nz>; ying.xue@windriver.com;
>> davem@davemloft.net
>> Cc: netdev@vger.kernel.org; tipc-discussion@lists.sourceforge.net; linux-
>> kernel@vger.kernel.org
>> Subject: Re: [PATCH] tipc: ensure skb->lock is initialised
>>
>>
>>
>> On 7/9/19 10:15 PM, Jon Maloy wrote:
>>>
>>> It is not only for lockdep purposes, -it is essential.  But please provide details
>> about where you see that more fixes are needed.
>>>
>>
>> Simple fact that you detect a problem only when skb_queue_purge() is called
>> should talk by itself.
>>
>> As I stated, there are many places where the list is manipulated _without_ its
>> spinlock being held.
> 
> Yes, and that is the way it should be on the send path.
> 
>>
>> You want consistency, then
>>
>> - grab the spinlock all the time.
>> - Or do not ever use it.
> 
> That is exactly what we are doing.
> - The send path doesn't need the spinlock, and never grabs it.
> - The receive path does need it, and always grabs it.
> 
> However, since we don't know from the beginning which path a created
> message will follow, we initialize the queue spinlock "just in case"
> when it is created, even though it may never be used later.
> You can see this as a violation of the principle you are stating
> above, but it is a prize that is worth paying, given savings in code
> volume, complexity and performance.
> 
>>
>> Do not initialize the spinlock just in case a path will use skb_queue_purge()
>> (instead of using __skb_queue_purge())
> 
> I am ok with that. I think we can agree that Chris goes for that
> solution, so we can get this bug fixed.

So would you like a v2 with an improved commit message? I note that I 
said skb->lock instead of head->lock in the subject line so at least 
that should be corrected.


^ permalink raw reply

* Re: [PATCH net-next iproute2 v2 2/2] devlink: Introduce PCI PF and VF port flavour and attribute
From: David Ahern @ 2019-07-10 21:01 UTC (permalink / raw)
  To: Parav Pandit, netdev; +Cc: stephen, dsahern, jiri
In-Reply-To: <20190710123952.6877-2-parav@mellanox.com>

On 7/10/19 6:39 AM, Parav Pandit wrote:
> Introduce PCI PF and VF port flavour and port attributes such as PF
> number and VF number.
> 
> $ devlink port show
> pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
> pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
> pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1
> 
> Signed-off-by: Parav Pandit <parav@mellanox.com>
> ---
> Changelog:
> v1->v2:
>  - Instead of if-else using switch-case.
>  - Split patch to two patches to have kernel header update in dedicated
>    patch.
> ---
>  devlink/devlink.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 

applied to iproute2-next. Thanks

^ permalink raw reply

* Re: [PATCH iproute2-next v2 0/3] add interface to TC MPLS actions
From: David Ahern @ 2019-07-10 21:08 UTC (permalink / raw)
  To: John Hurley, netdev
  Cc: davem, jiri, xiyou.wangcong, willemdebruijn.kernel, stephen,
	simon.horman, jakub.kicinski, oss-drivers
In-Reply-To: <1562762440-25656-1-git-send-email-john.hurley@netronome.com>

On 7/10/19 6:40 AM, John Hurley wrote:
> Recent kernel additions to TC allows the manipulation of MPLS headers as
> filter actions.
> 
> The following patchset creates an iproute2 interface to the new actions
> and includes documentation on how to use it.
> 
> v1->v2:
> - change error from print_string() to fprintf(strerr,) (Stephen Hemminger)
> - split long line in explain() message (David Ahern)
> - use _SL_ instead of /n in print message (David Ahern)
> 
> John Hurley (3):
>   lib: add mpls_uc and mpls_mc as link layer protocol names
>   tc: add mpls actions
>   man: update man pages for TC MPLS actions
> 
>  lib/ll_proto.c     |   2 +
>  man/man8/tc-mpls.8 | 156 ++++++++++++++++++++++++++++++
>  tc/Makefile        |   1 +
>  tc/m_mpls.c        | 276 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 435 insertions(+)
>  create mode 100644 man/man8/tc-mpls.8
>  create mode 100644 tc/m_mpls.c
> 

applied to iproute2-next. Thanks


^ permalink raw reply

* Re: [PATCH AUTOSEL 4.19 14/60] mwifiex: Abort at too short BSS descriptor element
From: Brian Norris @ 2019-07-10 21:12 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Linux Kernel, stable, Takashi Iwai, Kalle Valo, linux-wireless,
	<netdev@vger.kernel.org>
In-Reply-To: <20190710145112.GX10104@sasha-vm>

On Wed, Jul 10, 2019 at 7:51 AM Sasha Levin <sashal@kernel.org> wrote:
> I see that 63d7ef36103d didn't make it into 5.2, so I'll just drop this
> for now.

Yeah, I think it's stuck at net/master. Presumably it'll get into
5.3-rc somewhere.

Brian

^ permalink raw reply

* Re: [PATCH V2 1/1 (was 0/1 by accident)] tools/dtrace: initial implementation of DTrace
From: Daniel Borkmann @ 2019-07-10 21:19 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Kris Van Hees, netdev, bpf, dtrace-devel, linux-kernel, rostedt,
	mhiramat, acme, ast, Peter Zijlstra, Chris Mason, brendan.d.gregg,
	davem
In-Reply-To: <20190710143048.3923d1d9@lwn.net>

On 07/10/2019 10:30 PM, Jonathan Corbet wrote:
> On Wed, 10 Jul 2019 21:32:25 +0200
> Daniel Borkmann <daniel@iogearbox.net> wrote:
> 
>> Looks like you missed Brendan Gregg's prior feedback from v1 [0]. I haven't
>> seen a strong compelling argument for why this needs to reside in the kernel
>> tree given we also have all the other tracing tools and many of which also
>> rely on BPF such as bcc, bpftrace, ply, systemtap, sysdig, lttng to just name
>> a few.
> 
> So I'm just watching from the sidelines here, but I do feel the need to
> point out that Kris appears to be trying to follow the previous feedback
> he got from Alexei, where creating tools/dtrace is exactly what he was
> told to do:
> 
>   https://lwn.net/ml/netdev/20190521175617.ipry6ue7o24a2e6n@ast-mbp.dhcp.thefacebook.com/
> 
> Now he's being told the exact opposite.  Not the best experience for
> somebody who is trying to make the kernel better.

Ugh, agree, sorry for the misleading direction. Alexei is currently offgrid
this week, he might comment later.

It has nothing to do with making the _kernel_ better, it's a /user space/ front
end for the existing kernel infrastructure like many of the other tracers out
there. Don't get me wrong, adding the missing /kernel parts/ for it is a totally
different subject [and _that_ is what is making the kernel better, not the former].
Hypothetical question: does it make the _kernel_ better if we suddenly add a huge
and complex project like tools/mysql/ to the kernel tree? Nope.

> There are still people interested in DTrace out there.  How would you
> recommend that Kris proceed at this point?

My recommendation to proceed is to maintain the dtrace user space tooling in
its own separate project like the vast majority of all the other tracing projects
(see also the other advantages that Steven pointed out from his experience), and
extend the kernel bits whenever needed.

Thanks,
Daniel

^ permalink raw reply

* Re: [PATCH V2 1/1 (was 0/1 by accident)] tools/dtrace: initial implementation of DTrace
From: Brendan Gregg @ 2019-07-10 21:35 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Daniel Borkmann, Kris Van Hees, netdev, bpf, dtrace-devel, LKML,
	Steven Rostedt, Masami Hiramatsu, Arnaldo Carvalho de Melo,
	Alexei Starovoitov, Peter Zijlstra, Chris Mason, David S. Miller
In-Reply-To: <20190710143048.3923d1d9@lwn.net>

On Wed, Jul 10, 2019 at 1:30 PM Jonathan Corbet <corbet@lwn.net> wrote:
>
> On Wed, 10 Jul 2019 21:32:25 +0200
> Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> > Looks like you missed Brendan Gregg's prior feedback from v1 [0]. I haven't
> > seen a strong compelling argument for why this needs to reside in the kernel
> > tree given we also have all the other tracing tools and many of which also
> > rely on BPF such as bcc, bpftrace, ply, systemtap, sysdig, lttng to just name
> > a few.
>
> So I'm just watching from the sidelines here, but I do feel the need to
> point out that Kris appears to be trying to follow the previous feedback
> he got from Alexei, where creating tools/dtrace is exactly what he was
> told to do:
>
>   https://lwn.net/ml/netdev/20190521175617.ipry6ue7o24a2e6n@ast-mbp.dhcp.thefacebook.com/

From what I saw, the discussion was about kernel and user bits, where
the user bit was a "little tool is currently hardcoded to process a
single test case". I missed it first time around, but was going to
make the case that such tests belong in tools/testing/selftests/bpf
rather than tools/dtrace, since we have other similar test cases to
ensure bits of BPF work.

This patchset pivoted from a single test case to the entire DTrace
front end. If this was Kris's intent all along, it wasn't clear to me
(and maybe Alexei either) until now.

>
> Now he's being told the exact opposite.  Not the best experience for
> somebody who is trying to make the kernel better.
>
> There are still people interested in DTrace out there.

Yes, they can:

apt-get install bpftrace (or snap, yum, whatever)

and start solving production problems today, like we are.

> How would you
> recommend that Kris proceed at this point?

You may not be asking me, but I don't think it's best for Linux to
split our tracing expertise among 13 different tracers (SystemTap,
LTTng, ftrace, perf, dtrace4linux, OEL DTrace, ktap, sysdig, Intel
PIN, bcc, shark, ply, and bpftrace). bpftrace is already far ahead and
in use in production, and Kris is just starting building a new one. I
actually think we need to consolidate our expertise on fewer tracers,
which includes asking developers to jump the fence and work on other
projects (like I did myself). But I also recognize's Kris's need to
support legacy users, so I'll stop short of saying that it shouldn't
exist at all.

Brendan

^ permalink raw reply

* Re: [PATCH V2 1/1 (was 0/1 by accident)] tools/dtrace: initial implementation of DTrace
From: Kris Van Hees @ 2019-07-10 21:36 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jonathan Corbet, Kris Van Hees, netdev, bpf, dtrace-devel,
	linux-kernel, rostedt, mhiramat, acme, ast, Peter Zijlstra,
	Chris Mason, brendan.d.gregg, davem
In-Reply-To: <1de27d29-65bb-89d3-9fca-7c452cd66934@iogearbox.net>

On Wed, Jul 10, 2019 at 11:19:43PM +0200, Daniel Borkmann wrote:
> On 07/10/2019 10:30 PM, Jonathan Corbet wrote:
> > On Wed, 10 Jul 2019 21:32:25 +0200
> > Daniel Borkmann <daniel@iogearbox.net> wrote:
> > 
> >> Looks like you missed Brendan Gregg's prior feedback from v1 [0]. I haven't
> >> seen a strong compelling argument for why this needs to reside in the kernel
> >> tree given we also have all the other tracing tools and many of which also
> >> rely on BPF such as bcc, bpftrace, ply, systemtap, sysdig, lttng to just name
> >> a few.
> > 
> > So I'm just watching from the sidelines here, but I do feel the need to
> > point out that Kris appears to be trying to follow the previous feedback
> > he got from Alexei, where creating tools/dtrace is exactly what he was
> > told to do:
> > 
> >   https://lwn.net/ml/netdev/20190521175617.ipry6ue7o24a2e6n@ast-mbp.dhcp.thefacebook.com/
> > 
> > Now he's being told the exact opposite.  Not the best experience for
> > somebody who is trying to make the kernel better.
> 
> Ugh, agree, sorry for the misleading direction. Alexei is currently offgrid
> this week, he might comment later.
> 
> It has nothing to do with making the _kernel_ better, it's a /user space/ front
> end for the existing kernel infrastructure like many of the other tracers out
> there. Don't get me wrong, adding the missing /kernel parts/ for it is a totally
> different subject [and _that_ is what is making the kernel better, not the former].

I disagree.  Yes, the current patch obviously isn't making the kernel better
because it doesn't touch the kernel.  But DTrace as a whole is not just a
/front end/ to the existing kernel infrastructure, and I did make that point
at LPC 2018 and in my emails.  Some of its more advanced features will lead
to contributions to the kernel that (by virtue of being developed as part of
this DTrace re-implementation) will more often than not be able to benefit
other tracers as well.  I do think that aspect qualifies as working towards
making the kenrel better.

> Hypothetical question: does it make the _kernel_ better if we suddenly add a huge
> and complex project like tools/mysql/ to the kernel tree? Nope.
> 
> > There are still people interested in DTrace out there.  How would you
> > recommend that Kris proceed at this point?
> 
> My recommendation to proceed is to maintain the dtrace user space tooling in
> its own separate project like the vast majority of all the other tracing projects
> (see also the other advantages that Steven pointed out from his experience), and
> extend the kernel bits whenever needed.

I wish that would have been the initial recommendation because it certainly
would have avoided me going down a path that was going to lead to rejection.

Either way, I do hope that as work progresses and contributions to the kernel
code are submitted in support of advancing tracing on Linux, those patches
will receive a fair review and consideration.  I can appreciate that some
people do not like DTrace or feel that it is not necessary, but personal
opinions about tools should not be a deciding factor in whether a contribution
has merit or not.

	Kris

^ permalink raw reply

* Re: [PATCH V2 1/1 (was 0/1 by accident)] tools/dtrace: initial implementation of DTrace
From: Brendan Gregg @ 2019-07-10 21:49 UTC (permalink / raw)
  To: Kris Van Hees
  Cc: Daniel Borkmann, Jonathan Corbet, netdev, bpf, dtrace-devel, LKML,
	Steven Rostedt, Masami Hiramatsu, Arnaldo Carvalho de Melo,
	Alexei Starovoitov, Peter Zijlstra, Chris Mason, David S. Miller
In-Reply-To: <20190710213637.GB13962@oracle.com>

On Wed, Jul 10, 2019 at 2:36 PM Kris Van Hees <kris.van.hees@oracle.com> wrote:
>
> On Wed, Jul 10, 2019 at 11:19:43PM +0200, Daniel Borkmann wrote:
> > On 07/10/2019 10:30 PM, Jonathan Corbet wrote:
> > > On Wed, 10 Jul 2019 21:32:25 +0200
> > > Daniel Borkmann <daniel@iogearbox.net> wrote:
> > >
> > >> Looks like you missed Brendan Gregg's prior feedback from v1 [0]. I haven't
> > >> seen a strong compelling argument for why this needs to reside in the kernel
> > >> tree given we also have all the other tracing tools and many of which also
> > >> rely on BPF such as bcc, bpftrace, ply, systemtap, sysdig, lttng to just name
> > >> a few.
> > >
> > > So I'm just watching from the sidelines here, but I do feel the need to
> > > point out that Kris appears to be trying to follow the previous feedback
> > > he got from Alexei, where creating tools/dtrace is exactly what he was
> > > told to do:
> > >
> > >   https://lwn.net/ml/netdev/20190521175617.ipry6ue7o24a2e6n@ast-mbp.dhcp.thefacebook.com/
> > >
> > > Now he's being told the exact opposite.  Not the best experience for
> > > somebody who is trying to make the kernel better.
> >
> > Ugh, agree, sorry for the misleading direction. Alexei is currently offgrid
> > this week, he might comment later.
> >
> > It has nothing to do with making the _kernel_ better, it's a /user space/ front
> > end for the existing kernel infrastructure like many of the other tracers out
> > there. Don't get me wrong, adding the missing /kernel parts/ for it is a totally
> > different subject [and _that_ is what is making the kernel better, not the former].
>
> I disagree.  Yes, the current patch obviously isn't making the kernel better
> because it doesn't touch the kernel.  But DTrace as a whole is not just a
> /front end/ to the existing kernel infrastructure, and I did make that point
> at LPC 2018 and in my emails.  Some of its more advanced features will lead
> to contributions to the kernel that (by virtue of being developed as part of
> this DTrace re-implementation) will more often than not be able to benefit
> other tracers as well.  I do think that aspect qualifies as working towards
> making the kenrel better.
>
> > Hypothetical question: does it make the _kernel_ better if we suddenly add a huge
> > and complex project like tools/mysql/ to the kernel tree? Nope.
> >
> > > There are still people interested in DTrace out there.  How would you
> > > recommend that Kris proceed at this point?
> >
> > My recommendation to proceed is to maintain the dtrace user space tooling in
> > its own separate project like the vast majority of all the other tracing projects
> > (see also the other advantages that Steven pointed out from his experience), and
> > extend the kernel bits whenever needed.
>
> I wish that would have been the initial recommendation because it certainly
> would have avoided me going down a path that was going to lead to rejection.
>
> Either way, I do hope that as work progresses and contributions to the kernel
> code are submitted in support of advancing tracing on Linux, those patches
> will receive a fair review and consideration.  I can appreciate that some
> people do not like DTrace or feel that it is not necessary, but personal
> opinions about tools should not be a deciding factor in whether a contribution
> has merit or not.

Hey Kris -- so you're referring to me, and I've used DTrace more than
anyone over the past 15 years, and I don't think anyone has used all
the different Linux tracers more than I have. I think my opinion has a
lot of value.


Brendan

^ permalink raw reply

* Re: [PATCH net-next] net/sched: Fix kernel NULL pointer dereference
From: Pablo Neira Ayuso @ 2019-07-10 22:03 UTC (permalink / raw)
  To: wenxu; +Cc: davem, netfilter-devel, netdev
In-Reply-To: <1562766304-20272-1-git-send-email-wenxu@ucloud.cn>

On Wed, Jul 10, 2019 at 09:45:04PM +0800, wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
> 
> [  697.665184] BUG: kernel NULL pointer dereference, address: 0000000000000030
> [  697.665550] #PF: supervisor read access in kernel mode
> [  697.665906] #PF: error_code(0x0000) - not-present page
> [  697.666297] PGD 800000104e636067 P4D 800000104e636067 PUD ff4b02067 PMD 0
> [  697.666710] Oops: 0000 [#1] SMP PTI
> [  697.667115] CPU: 31 PID: 24466 Comm: modprobe Kdump: loaded Tainted: G           O      5.2.0-rc6+ #1
> [  697.667867] Hardware name: Huawei Technologies Co., Ltd. RH1288 V3/BC11HGSC0, BIOS 3.57 02/26/2017
> [  697.668620] RIP: 0010:tc_indr_block_ing_cmd.isra.52+0x4c/0xb0
> [  697.669029] Code: 83 ec 40 65 48 8b 04 25 28 00 00 00 48 89 45 e8 31 c0 f3 48 ab 48 8b 06 49 8b b3 e8 04 00 00 44 89 45 b0 c7 45 b4 01 00 00 00 <8b> 48 30 48 89 75 c0 85 c9 48 8d 4d b0 0f 95 45 b8 48 85 c0 4c 8d
> [  697.670132] RSP: 0018:ffffc90007bf7958 EFLAGS: 00010246
> [  697.670537] RAX: 0000000000000000 RBX: ffff88905e2cbae8 RCX: 0000000000000000
> [  697.670938] RDX: ffff88905e2cbcd8 RSI: ffffffff823a8480 RDI: ffffc90007bf7990
> [  697.671352] RBP: ffffc90007bf79a8 R08: 0000000000000000 R09: ffff88905e2cbcc0
> [  697.671761] R10: ffff888107c07780 R11: ffff88902c249000 R12: ffff88905e2cbcd0
> [  697.672173] R13: ffff88905e2cbac0 R14: ffff88885596bc00 R15: ffff88905e2cbcc0
> [  697.672582] FS:  00007fe0b4095740(0000) GS:ffff88905fbc0000(0000) knlGS:0000000000000000
> [  697.673335] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  697.673746] CR2: 0000000000000030 CR3: 0000000ff46b4005 CR4: 00000000001606e0
> [  697.674156] Call Trace:
> [  697.674563]  __tc_indr_block_cb_register+0x11e/0x3c0
> [  697.674998]  mlx5e_nic_rep_netdevice_event+0x9e/0x110 [mlx5_core]
> [  697.675411]  notifier_call_chain+0x53/0xa0
> [  697.675812]  raw_notifier_call_chain+0x16/0x20
> [  697.676223]  call_netdevice_notifiers_info+0x2d/0x60
> [  697.676633]  register_netdevice+0x3fa/0x500
> 
> get indr_dev->block after check it.
> 
> Fixes: 955bcb6ea0df ("drivers: net: use flow block API")
> Signed-off-by: wenxu <wenxu@ucloud.cn>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox