Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH bpf-next 1/2] bpf: Add flag BPF_F_NO_TUNNEL_KEY to bpf_skb_set_tunnel_key()
From: Christian Ehrig @ 2022-12-18  5:17 UTC (permalink / raw)
  To: bpf
  Cc: cehrig, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko,
	Shuah Khan, Joanne Koong, Kui-Feng Lee, Maxim Mikityanskiy,
	Kaixi Fan, Shmulik Ladkani, Paul Chaignon, linux-kernel, netdev,
	linux-kselftest

This patch allows to remove TUNNEL_KEY from the tunnel flags bitmap
when using bpf_skb_set_tunnel_key by providing a BPF_F_NO_TUNNEL_KEY
flag. On egress, the resulting tunnel header will not contain a tunnel
key if the protocol and implementation supports it.

At the moment bpf_tunnel_key wants a user to specify a numeric tunnel
key. This will wrap the inner packet into a tunnel header with the key
bit and value set accordingly. This is problematic when using a tunnel
protocol that supports optional tunnel keys and a receiving tunnel
device that is not expecting packets with the key bit set. The receiver
won't decapsulate and drop the packet.

RFC 2890 and RFC 2784 GRE tunnels are examples where this flag is
useful. It allows for generating packets, that can be decapsulated by
a GRE tunnel device not operating in collect metadata mode or not
expecting the key bit set.

Signed-off-by: Christian Ehrig <cehrig@cloudflare.com>
---
 include/uapi/linux/bpf.h       | 4 ++++
 net/core/filter.c              | 5 ++++-
 tools/include/uapi/linux/bpf.h | 4 ++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 464ca3f01fe7..bc1a3d232ae4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2001,6 +2001,9 @@ union bpf_attr {
  * 			sending the packet. This flag was added for GRE
  * 			encapsulation, but might be used with other protocols
  * 			as well in the future.
+ * 		**BPF_F_NO_TUNNEL_KEY**
+ * 			Add a flag to tunnel metadata indicating that no tunnel
+ * 			key should be set in the resulting tunnel header.
  *
  * 		Here is a typical usage on the transmit path:
  *
@@ -5764,6 +5767,7 @@ enum {
 	BPF_F_ZERO_CSUM_TX		= (1ULL << 1),
 	BPF_F_DONT_FRAGMENT		= (1ULL << 2),
 	BPF_F_SEQ_NUMBER		= (1ULL << 3),
+	BPF_F_NO_TUNNEL_KEY		= (1ULL << 4),
 };
 
 /* BPF_FUNC_skb_get_tunnel_key flags. */
diff --git a/net/core/filter.c b/net/core/filter.c
index 929358677183..c746e4d77214 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4615,7 +4615,8 @@ BPF_CALL_4(bpf_skb_set_tunnel_key, struct sk_buff *, skb,
 	struct ip_tunnel_info *info;
 
 	if (unlikely(flags & ~(BPF_F_TUNINFO_IPV6 | BPF_F_ZERO_CSUM_TX |
-			       BPF_F_DONT_FRAGMENT | BPF_F_SEQ_NUMBER)))
+			       BPF_F_DONT_FRAGMENT | BPF_F_SEQ_NUMBER |
+			       BPF_F_NO_TUNNEL_KEY)))
 		return -EINVAL;
 	if (unlikely(size != sizeof(struct bpf_tunnel_key))) {
 		switch (size) {
@@ -4653,6 +4654,8 @@ BPF_CALL_4(bpf_skb_set_tunnel_key, struct sk_buff *, skb,
 		info->key.tun_flags &= ~TUNNEL_CSUM;
 	if (flags & BPF_F_SEQ_NUMBER)
 		info->key.tun_flags |= TUNNEL_SEQ;
+	if (flags & BPF_F_NO_TUNNEL_KEY)
+		info->key.tun_flags &= ~TUNNEL_KEY;
 
 	info->key.tun_id = cpu_to_be64(from->tunnel_id);
 	info->key.tos = from->tunnel_tos;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 464ca3f01fe7..bc1a3d232ae4 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2001,6 +2001,9 @@ union bpf_attr {
  * 			sending the packet. This flag was added for GRE
  * 			encapsulation, but might be used with other protocols
  * 			as well in the future.
+ * 		**BPF_F_NO_TUNNEL_KEY**
+ * 			Add a flag to tunnel metadata indicating that no tunnel
+ * 			key should be set in the resulting tunnel header.
  *
  * 		Here is a typical usage on the transmit path:
  *
@@ -5764,6 +5767,7 @@ enum {
 	BPF_F_ZERO_CSUM_TX		= (1ULL << 1),
 	BPF_F_DONT_FRAGMENT		= (1ULL << 2),
 	BPF_F_SEQ_NUMBER		= (1ULL << 3),
+	BPF_F_NO_TUNNEL_KEY		= (1ULL << 4),
 };
 
 /* BPF_FUNC_skb_get_tunnel_key flags. */
-- 
2.37.4


^ permalink raw reply related

* Re: [PATCH net] mctp: serial: Fix starting value for frame check sequence
From: Jeremy Kerr @ 2022-12-18  1:32 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: netdev, Matt Johnston, David S. Miller, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, Harsh Tyagi
In-Reply-To: <CAKgT0Ufb-PeuqH4+dLca8ANJ=Y7uhYCK-QLNz-Pjo991pJZ8Xw@mail.gmail.com>

Hi Alexander,

> 
> > Yep, that would make sense if they're commonly used values, but I'm
> > not sure that would be suitable to include that in this fix, as it
> > would just add disruption to any backport work.
> 
> Sorry, I didn't intend that for this patch. I was suggesting it as
> possible follow-on work.

Ah, cool! Yep, definitely a contender for a future change.

(this might be a good opportunity as a starter item for someone wanting
to get familiar with the development/review process, if you know of
anyone?)

Cheers,


Jeremy


^ permalink raw reply

* [bpf-next v3 3/3] samples/bpf: fix uninitialized warning with test_current_task_under_cgroup
From: Daniel T. Lee @ 2022-12-18  0:43 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Yonghong Song
  Cc: bpf, netdev
In-Reply-To: <20221218004307.4872-1-danieltimlee@gmail.com>

Currently, compiling samples/bpf with LLVM warns about the uninitialized
use of variable with test_current_task_under_cgroup.

    ./samples/bpf/test_current_task_under_cgroup_user.c:57:6:
    warning: variable 'cg2' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
	    if (setup_cgroup_environment())
		^~~~~~~~~~~~~~~~~~~~~~~~~~
    ./samples/bpf/test_current_task_under_cgroup_user.c:106:8:
    note: uninitialized use occurs here
	    close(cg2);
		  ^~~
    ./samples/bpf/test_current_task_under_cgroup_user.c:57:2:
    note: remove the 'if' if its condition is always false
	    if (setup_cgroup_environment())
	    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ./samples/bpf/test_current_task_under_cgroup_user.c:19:9:
    note: initialize the variable 'cg2' to silence this warning
	    int cg2, idx = 0, rc = 1;
		   ^
		    = 0
    1 warning generated.

This commit resolve this compiler warning by pre-initialize the variable
with error for safeguard.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/test_current_task_under_cgroup_user.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/test_current_task_under_cgroup_user.c b/samples/bpf/test_current_task_under_cgroup_user.c
index ac251a417f45..6fb25906835e 100644
--- a/samples/bpf/test_current_task_under_cgroup_user.c
+++ b/samples/bpf/test_current_task_under_cgroup_user.c
@@ -14,9 +14,9 @@
 int main(int argc, char **argv)
 {
 	pid_t remote_pid, local_pid = getpid();
+	int cg2 = -1, idx = 0, rc = 1;
 	struct bpf_link *link = NULL;
 	struct bpf_program *prog;
-	int cg2, idx = 0, rc = 1;
 	struct bpf_object *obj;
 	char filename[256];
 	int map_fd[2];
@@ -103,7 +103,9 @@ int main(int argc, char **argv)
 	rc = 0;
 
 err:
-	close(cg2);
+	if (cg2 != -1)
+		close(cg2);
+
 	cleanup_cgroup_environment();
 
 cleanup:
-- 
2.34.1


^ permalink raw reply related

* [bpf-next v3 2/3] samples/bpf: replace meaningless counter with tracex4
From: Daniel T. Lee @ 2022-12-18  0:43 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Yonghong Song
  Cc: bpf, netdev
In-Reply-To: <20221218004307.4872-1-danieltimlee@gmail.com>

Currently, compiling samples/bpf with LLVM warns about the unused but
set variable with tracex4_user.

    ./samples/bpf/tracex4_user.c:54:14:
    warning: variable 'i' set but not used [-Wunused-but-set-variable]
        int map_fd, i, j = 0;
                    ^
                    1 warning generated.

This commit resolve this compiler warning by replacing the meaningless
counter.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/tracex4_user.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/tracex4_user.c b/samples/bpf/tracex4_user.c
index 227b05a0bc88..dee8f0a091ba 100644
--- a/samples/bpf/tracex4_user.c
+++ b/samples/bpf/tracex4_user.c
@@ -51,7 +51,7 @@ int main(int ac, char **argv)
 	struct bpf_program *prog;
 	struct bpf_object *obj;
 	char filename[256];
-	int map_fd, i, j = 0;
+	int map_fd, j = 0;
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
 	obj = bpf_object__open_file(filename, NULL);
@@ -82,7 +82,7 @@ int main(int ac, char **argv)
 		j++;
 	}
 
-	for (i = 0; ; i++) {
+	while (1) {
 		print_old_objects(map_fd);
 		sleep(1);
 	}
-- 
2.34.1


^ permalink raw reply related

* [bpf-next v3 1/3] samples/bpf: remove unused function with test_lru_dist
From: Daniel T. Lee @ 2022-12-18  0:43 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Yonghong Song
  Cc: bpf, netdev
In-Reply-To: <20221218004307.4872-1-danieltimlee@gmail.com>

Currently, compiling samples/bpf with LLVM warns about the unused
function with test_lru_dist.

    ./samples/bpf/test_lru_dist.c:45:19:
    warning: unused function 'list_empty' [-Wunused-function]
    static inline int list_empty(const struct list_head *head)
                      ^
                      1 warning generated.

This commit resolve this compiler warning by removing the abandoned
function.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/test_lru_dist.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/samples/bpf/test_lru_dist.c b/samples/bpf/test_lru_dist.c
index 5efb91763d65..1c161276d57b 100644
--- a/samples/bpf/test_lru_dist.c
+++ b/samples/bpf/test_lru_dist.c
@@ -42,11 +42,6 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
 	list->prev = list;
 }
 
-static inline int list_empty(const struct list_head *head)
-{
-	return head->next == head;
-}
-
 static inline void __list_add(struct list_head *new,
 			      struct list_head *prev,
 			      struct list_head *next)
-- 
2.34.1


^ permalink raw reply related

* [bpf-next v3 0/3] samples/bpf: fix LLVM compilation warning with samples
From: Daniel T. Lee @ 2022-12-18  0:43 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Yonghong Song
  Cc: bpf, netdev

Currently, compiling samples/bpf with LLVM emits several warning. They
are only small details, but they do not appear when compiled with GCC.
Detailed compilation command and warning logs can be found from bpf CI.

Daniel T. Lee (3):
  samples/bpf: remove unused function with test_lru_dist
  samples/bpf: replace meaningless counter with tracex4
  samples/bpf: fix uninitialized warning with
    test_current_task_under_cgroup

 samples/bpf/test_current_task_under_cgroup_user.c | 6 ++++--
 samples/bpf/test_lru_dist.c                       | 5 -----
 samples/bpf/tracex4_user.c                        | 4 ++--
 3 files changed, 6 insertions(+), 9 deletions(-)

---
Changes in V2:
- Change the cover letter subject
Changes in V3:
- Fix style problem with the patch

-- 
2.34.1


^ permalink raw reply

* Re: igc: 5.10.146 Kernel BUG at 0xffffffff813ce19f
From: Kyle Sanderson @ 2022-12-18  0:20 UTC (permalink / raw)
  To: Linux-Kernal, jesse.brandeburg, anthony.l.nguyen
  Cc: Linus Torvalds, intel-wired-lan, netdev, openwrt-bugs
In-Reply-To: <CACsaVZL6ykbsVvEaV2Cv3r6m_jKt04MEUOw5=mSnR5AYTyE7qg@mail.gmail.com>

hi Intel igc maintainers,

just confirmed on the same box that 13-CURRENT on FBSD does not result
in a kernel panic when a single remote host loses power on the
network. Please let me know if there's any additional information you
need.

Kyle.

On Thu, Dec 15, 2022 at 2:28 PM Kyle Sanderson <kyle.leet@gmail.com> wrote:
>
> (Un)fortunately I can reproduce this bug by simply removing the
> ethernet cable from the box while there is traffic flowing. kprint
> below from a console line. Please CC / to me for any additional
> information I can provide for this panic.
>
> [  156.707054] igc 0000:01:00.0 eth0: NIC Link is Down
> [  156.712981] br-lan: port 1(eth0) entered disabled state
> [  156.719246] igc 0000:01:00.0 eth0: Register Dump
> [  156.724784] igc 0000:01:00.0 eth0: Register Name   Value
> [  156.731067] igc 0000:01:00.0 eth0: CTRL            181c0641
> [  156.737607] igc 0000:01:00.0 eth0: STATUS          00380681
> [  156.744133] igc 0000:01:00.0 eth0: CTRL_EXT        100000c0
> [  156.750759] igc 0000:01:00.0 eth0: MDIC            18017949
> [  156.757258] igc 0000:01:00.0 eth0: ICR             00000001
> [  156.763785] igc 0000:01:00.0 eth0: RCTL            0440803a
> [  156.770324] igc 0000:01:00.0 eth0: RDLEN[0-3]      00001000
> 00001000 00001000 00001000
> [  156.779457] igc 0000:01:00.0 eth0: RDH[0-3]        000000ef
> 000000a1 00000092 000000ba
> [  156.788500] igc 0000:01:00.0 eth0: RDT[0-3]        000000ee
> 000000a0 00000091 000000b9
> [  156.797650] igc 0000:01:00.0 eth0: RXDCTL[0-3]     02040808
> 02040808 02040808 02040808
> [  156.806688] igc 0000:01:00.0 eth0: RDBAL[0-3]      02f43000
> 02180000 02e7f000 02278000
> [  156.815781] igc 0000:01:00.0 eth0: RDBAH[0-3]      00000001
> 00000001 00000001 00000001
> [  156.824928] igc 0000:01:00.0 eth0: TCTL            a503f0fa
> [  156.831587] igc 0000:01:00.0 eth0: TDBAL[0-3]      02f43000
> 02180000 02e7f000 02278000
> [  156.840637] igc 0000:01:00.0 eth0: TDBAH[0-3]      00000001
> 00000001 00000001 00000001
> [  156.849753] igc 0000:01:00.0 eth0: TDLEN[0-3]      00001000
> 00001000 00001000 00001000
> [  156.858760] igc 0000:01:00.0 eth0: TDH[0-3]        000000d4
> 0000003d 000000af 0000002a
> [  156.867771] igc 0000:01:00.0 eth0: TDT[0-3]        000000e4
> 0000005a 000000c8 0000002a
> [  156.876864] igc 0000:01:00.0 eth0: TXDCTL[0-3]     02100108
> 02100108 02100108 02100108
> [  156.885905] igc 0000:01:00.0 eth0: Reset adapter
> [  160.307195] igc 0000:01:00.0 eth0: NIC Link is Up 1000 Mbps Full
> Duplex, Flow Control: RX/TX
> [  160.317974] br-lan: port 1(eth0) entered blocking state
> [  160.324532] br-lan: port 1(eth0) entered forwarding state
> [  161.197263] ------------[ cut here ]------------
> [  161.202669] Kernel BUG at 0xffffffff813ce19f [verbose debug info unavailable]
> [  161.210769] invalid opcode: 0000 [#1] SMP NOPTI
> [  161.216022] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.10.146 #0
> [  161.222980] Hardware name: Default string Default string/Default
> string, BIOS 5.19 09/23/2022
> [  161.232546] RIP: 0010:0xffffffff813ce19f
> [  161.237167] Code: 03 01 4c 89 48 58 e9 2f ff ff ff 85 db 41 0f 95
> c2 45 39 d9 41 0f 95 c1 45 84 ca 74 05 45 85 e4 78 0a 44 89 c2 e9 10
> ff ff ff <0f> 0b 01 d2 45 89 c1 41 29 d1 ba 00 00 00 00 44 0f 48 ca eb
> 80 cc
> [  161.258651] RSP: 0018:ffffc90000118e88 EFLAGS: 00010283
> [  161.264736] RAX: ffff888101f8f200 RBX: ffffc900006f9bd0 RCX: 000000000000050e
> [  161.272837] RDX: ffff888101fec000 RSI: 0000000000000a1c RDI: 0000000000061a10
> [  161.280942] RBP: ffffc90000118ef8 R08: 0000000000000000 R09: 0000000000061502
> [  161.289089] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffff3f
> [  161.297229] R13: ffff888101f8f140 R14: 0000000000000000 R15: ffff888100ad9b00
> [  161.305345] FS:  0000000000000000(0000) GS:ffff88903fe80000(0000)
> knlGS:00000 00000000000
> [  161.314492] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  161.321139] CR2: 00007f941ad43a9b CR3: 000000000340a000 CR4: 0000000000350ee0
> [  161.329284] Call Trace:
> [  161.332373]  <IRQ>
> [  161.334981]  ? 0xffffffffa0185f78 [igc@00000000f400031b+0x13000]
> [  161.341949]  0xffffffff8185b047
> [  161.345797]  0xffffffff8185b2ca
> [  161.349637]  0xffffffff81e000bb
> [  161.353465]  0xffffffff81c0109f
> [  161.357304]  </IRQ>
> [  161.359988]  0xffffffff8102cdac
> [  161.363783]  0xffffffff810bfdaf
> [  161.367584]  0xffffffff81a2e616
> [  161.371374]  0xffffffff81c00c9e
> [  161.375192] RIP: 0010:0xffffffff817e331b
> [  161.379840] Code: 21 90 ff 65 8b 3d 45 23 83 7e e8 80 20 90 ff 31
> ff 49 89 c6 e8 26 2d 90 ff 80 7d d7 00 0f 85 9e 01 00 00 fb 66 0f 1f
> 44 00 00 <45> 85 ff 0f 88 cf 00 00 00 49 63 cf 48 8d 04 49 48 8d 14 81
> 48 c1
> [  161.401397] RSP: 0018:ffffc900000d3e80 EFLAGS: 00000246
> [  161.407493] RAX: ffff88903fea5180 RBX: ffff88903feadf00 RCX: 000000000000001f
> [  161.415648] RDX: 0000000000000000 RSI: 0000000046ec0743 RDI: 0000000000000000
> [  161.423811] RBP: ffffc900000d3eb8 R08: 00000025881a3b81 R09: ffff888100317340
> [  161.432003] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000003
> [  161.440154] R13: ffffffff824c7bc0 R14: 00000025881a3b81 R15: 0000000000000003
> [  161.448285]  0xffffffff817e357f
> [  161.452123]  0xffffffff810e6258
> [  161.455938]  0xffffffff810e63fb
> [  161.459746]  0xffffffff8104bec0
> [  161.463526]  0xffffffff810000f5
> [  161.467290] Modules linked in: pppoe ppp_async nft_fib_inet
> nf_flow_table_ipv 6 nf_flow_table_ipv4 nf_flow_table_inet wireguard
> pppox ppp_generic nft_reject_i pv6 nft_reject_ipv4 nft_reject_inet
> nft_reject nft_redir nft_quota nft_objref nf t_numgen nft_nat nft_masq
> nft_log nft_limit nft_hash nft_flow_offload nft_fib_ip v6 nft_fib_ipv4
> nft_fib nft_ct nft_counter nft_chain_nat nf_tables nf_nat nf_flo
> w_table nf_conntrack libchacha20poly1305 curve25519_x86_64
> chacha_x86_64 slhc r8 169 poly1305_x86_64 nfnetlink nf_reject_ipv6
> nf_reject_ipv4 nf_log_ipv6 nf_log_i pv4 nf_log_common nf_defrag_ipv6
> nf_defrag_ipv4 libcurve25519_generic libcrc32c libchacha igc forcedeth
> e1000e crc_ccitt bnx2 i2c_dev ixgbe e1000 amd_xgbe ip6_u dp_tunnel
> udp_tunnel mdio nls_utf8 ena kpp nls_iso8859_1 nls_cp437 vfat fat igb
> button_hotplug tg3 ptp realtek pps_core mii
> [  161.550507] ---[ end trace b1cb18ab2d1741bd ]---
> [  161.555938] RIP: 0010:0xffffffff813ce19f
> [  161.560634] Code: 03 01 4c 89 48 58 e9 2f ff ff ff 85 db 41 0f 95
> c2 45 39 d9 41 0f 95 c1 45 84 ca 74 05 45 85 e4 78 0a 44 89 c2 e9 10
> ff ff ff <0f> 0b 01 d2 45 89 c1 41 29 d1 ba 00 00 00 00 44 0f 48 ca eb
> 80 cc
> [  161.582281] RSP: 0018:ffffc90000118e88 EFLAGS: 00010283
> [  161.588426] RAX: ffff888101f8f200 RBX: ffffc900006f9bd0 RCX: 000000000000050e
> [  161.596668] RDX: ffff888101fec000 RSI: 0000000000000a1c RDI: 0000000000061a10
> [  161.604860] RBP: ffffc90000118ef8 R08: 0000000000000000 R09: 0000000000061502
> [  161.613052] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffff3f
> [  161.621291] R13: ffff888101f8f140 R14: 0000000000000000 R15: ffff888100ad9b00
> [  161.629505] FS:  0000000000000000(0000) GS:ffff88903fe80000(0000)
> knlGS:00000 00000000000
> [  161.638781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  161.645549] CR2: 00007f941ad43a9b CR3: 000000000340a000 CR4: 0000000000350ee0
> [  161.653841] Kernel panic - not syncing: Fatal exception in interrupt
> [  161.661287] Kernel Offset: disabled
> [  161.665644] Rebooting in 3 seconds..
> [  164.670313] ACPI MEMORY or I/O RESET_REG.
>
> Kyle.

^ permalink raw reply

* Re: [bpf-next 0/3] samples/bpf: fix LLVM compilation warning with
From: Daniel T. Lee @ 2022-12-18  0:08 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Yonghong Song, bpf, netdev
In-Reply-To: <fe2bf0e5-9b5e-51fe-d6c8-55390f75313d@meta.com>

On Sun, Dec 18, 2022 at 2:48 AM Yonghong Song <yhs@meta.com> wrote:
>
>
>
> On 12/17/22 7:38 AM, Daniel T. Lee wrote:
> > Currently, compiling samples/bpf with LLVM emits several warning. They
> > are only small details, but they do not appear when compiled with GCC.
> > Detailed compilation command and warning logs can be found from bpf CI.
>
> Could you change the subject line to
>    samples/bpf: fix LLVM compilation warning
>
> >
> > Daniel T. Lee (3):
> >    samples/bpf: remove unused function with test_lru_dist
> >    samples/bpf: replace meaningless counter with tracex4
> >    samples/bpf: fix uninitialized warning with
> >      test_current_task_under_cgroup
> >
> >   samples/bpf/test_current_task_under_cgroup_user.c | 6 ++++--
> >   samples/bpf/test_lru_dist.c                       | 5 -----
> >   samples/bpf/tracex4_user.c                        | 4 ++--
> >   3 files changed, 6 insertions(+), 9 deletions(-)
> >


Thanks for pointing this out.
I will send a v2 patch.

^ permalink raw reply

* [bpf-next v1 3/3] samples/bpf: fix uninitialized warning with test_current_task_under_cgroup
From: Daniel T. Lee @ 2022-12-18  0:07 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Yonghong Song
  Cc: bpf, netdev
In-Reply-To: <20221218000753.4519-1-danieltimlee@gmail.com>

Currently, compiling samples/bpf with LLVM warns about the uninitialized
use of variable with test_current_task_under_cgroup.

    samples/bpf $ export LLVM=-16;
    samples/bpf $ make
    /tmp/bpf/samples/bpf/test_current_task_under_cgroup_user.c:57:6:
    warning: variable 'cg2' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
	    if (setup_cgroup_environment())
		^~~~~~~~~~~~~~~~~~~~~~~~~~
    /tmp/bpf/samples/bpf/test_current_task_under_cgroup_user.c:106:8:
    note: uninitialized use occurs here
	    close(cg2);
		  ^~~
    /tmp/bpf/samples/bpf/test_current_task_under_cgroup_user.c:57:2:
    note: remove the 'if' if its condition is always false
	    if (setup_cgroup_environment())
	    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /tmp/bpf/samples/bpf/test_current_task_under_cgroup_user.c:19:9:
    note: initialize the variable 'cg2' to silence this warning
	    int cg2, idx = 0, rc = 1;
		   ^
		    = 0
    1 warning generated.

This commit resolve this compiler warning by pre-initialize the variable
with error for safeguard.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/test_current_task_under_cgroup_user.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/test_current_task_under_cgroup_user.c b/samples/bpf/test_current_task_under_cgroup_user.c
index ac251a417f45..6fb25906835e 100644
--- a/samples/bpf/test_current_task_under_cgroup_user.c
+++ b/samples/bpf/test_current_task_under_cgroup_user.c
@@ -14,9 +14,9 @@
 int main(int argc, char **argv)
 {
 	pid_t remote_pid, local_pid = getpid();
+	int cg2 = -1, idx = 0, rc = 1;
 	struct bpf_link *link = NULL;
 	struct bpf_program *prog;
-	int cg2, idx = 0, rc = 1;
 	struct bpf_object *obj;
 	char filename[256];
 	int map_fd[2];
@@ -103,7 +103,9 @@ int main(int argc, char **argv)
 	rc = 0;
 
 err:
-	close(cg2);
+	if (cg2 != -1)
+		close(cg2);
+
 	cleanup_cgroup_environment();
 
 cleanup:
-- 
2.34.1


^ permalink raw reply related

* [bpf-next v2 2/3] samples/bpf: replace meaningless counter with tracex4
From: Daniel T. Lee @ 2022-12-18  0:07 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Yonghong Song
  Cc: bpf, netdev
In-Reply-To: <20221218000753.4519-1-danieltimlee@gmail.com>

Currently, compiling samples/bpf with LLVM warns about the unused but
set variable with tracex4_user.

    ./samples/bpf/tracex4_user.c:54:14:
    warning: variable 'i' set but not used [-Wunused-but-set-variable]
        int map_fd, i, j = 0;
                    ^
                    1 warning generated.

This commit resolve this compiler warning by replacing the meaningless
counter.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/tracex4_user.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/tracex4_user.c b/samples/bpf/tracex4_user.c
index 227b05a0bc88..dee8f0a091ba 100644
--- a/samples/bpf/tracex4_user.c
+++ b/samples/bpf/tracex4_user.c
@@ -51,7 +51,7 @@ int main(int ac, char **argv)
 	struct bpf_program *prog;
 	struct bpf_object *obj;
 	char filename[256];
-	int map_fd, i, j = 0;
+	int map_fd, j = 0;
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
 	obj = bpf_object__open_file(filename, NULL);
@@ -82,7 +82,7 @@ int main(int ac, char **argv)
 		j++;
 	}
 
-	for (i = 0; ; i++) {
+	while (1) {
 		print_old_objects(map_fd);
 		sleep(1);
 	}
-- 
2.34.1


^ permalink raw reply related

* [bpf-next v2 1/3] samples/bpf: remove unused function with test_lru_dist
From: Daniel T. Lee @ 2022-12-18  0:07 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Yonghong Song
  Cc: bpf, netdev
In-Reply-To: <20221218000753.4519-1-danieltimlee@gmail.com>

Currently, compiling samples/bpf with LLVM warns about the unused
function with test_lru_dist.

    ./samples/bpf/test_lru_dist.c:45:19:
    warning: unused function 'list_empty' [-Wunused-function]
    static inline int list_empty(const struct list_head *head)
                      ^
                      1 warning generated.

This commit resolve this compiler warning by removing the abandoned
function.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
---
 samples/bpf/test_lru_dist.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/samples/bpf/test_lru_dist.c b/samples/bpf/test_lru_dist.c
index 5efb91763d65..1c161276d57b 100644
--- a/samples/bpf/test_lru_dist.c
+++ b/samples/bpf/test_lru_dist.c
@@ -42,11 +42,6 @@ static inline void INIT_LIST_HEAD(struct list_head *list)
 	list->prev = list;
 }
 
-static inline int list_empty(const struct list_head *head)
-{
-	return head->next == head;
-}
-
 static inline void __list_add(struct list_head *new,
 			      struct list_head *prev,
 			      struct list_head *next)
-- 
2.34.1


^ permalink raw reply related

* [bpf-next v2 0/3] samples/bpf: fix LLVM compilation warning with samples
From: Daniel T. Lee @ 2022-12-18  0:07 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov, Andrii Nakryiko,
	Yonghong Song
  Cc: bpf, netdev

Currently, compiling samples/bpf with LLVM emits several warning. They
are only small details, but they do not appear when compiled with GCC.
Detailed compilation command and warning logs can be found from bpf CI.

Daniel T. Lee (3):
  samples/bpf: remove unused function with test_lru_dist
  samples/bpf: replace meaningless counter with tracex4
  samples/bpf: fix uninitialized warning with
    test_current_task_under_cgroup

 samples/bpf/test_current_task_under_cgroup_user.c | 6 ++++--
 samples/bpf/test_lru_dist.c                       | 5 -----
 samples/bpf/tracex4_user.c                        | 4 ++--
 3 files changed, 6 insertions(+), 9 deletions(-)

-- 
2.34.1

Changes in V2: 
- Change the cover letter subject


^ permalink raw reply

* [Patch net] net_sched: reject TCF_EM_SIMPLE case for complex ematch module
From: Cong Wang @ 2022-12-17 22:17 UTC (permalink / raw)
  To: netdev
  Cc: Cong Wang, syzbot+4caeae4c7103813598ae, Jun Nie, Jamal Hadi Salim,
	Paolo Abeni

From: Cong Wang <cong.wang@bytedance.com>

When TCF_EM_SIMPLE was introduced, it is supposed to be convenient
for ematch implementation:

https://lore.kernel.org/all/20050105110048.GO26856@postel.suug.ch/

"You don't have to, providing a 32bit data chunk without TCF_EM_SIMPLE
set will simply result in allocating & copy. It's an optimization,
nothing more."

So if an ematch module provides ops->datalen that means it wants a
complex data structure (saved in its em->data) instead of a simple u32
value. We should simply reject such a combination, otherwise this u32
could be misinterpreted as a pointer.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-and-tested-by: syzbot+4caeae4c7103813598ae@syzkaller.appspotmail.com
Reported-by: Jun Nie <jun.nie@linaro.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 net/sched/ematch.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/sched/ematch.c b/net/sched/ematch.c
index 4ce681361851..5c1235e6076a 100644
--- a/net/sched/ematch.c
+++ b/net/sched/ematch.c
@@ -255,6 +255,8 @@ static int tcf_em_validate(struct tcf_proto *tp,
 			 * the value carried.
 			 */
 			if (em_hdr->flags & TCF_EM_SIMPLE) {
+				if (em->ops->datalen > 0)
+					goto errout;
 				if (data_len < sizeof(u32))
 					goto errout;
 				em->data = *(u32 *) data;
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH net v2] net: sched: ematch: reject invalid data
From: Cong Wang @ 2022-12-17 21:37 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Jun Nie, jhs, jiri, davem, edumazet, kuba, netdev, linux-kernel
In-Reply-To: <f8af2b70e3c2074de04b2117100b2cdc5ec4ec6d.camel@redhat.com>

On Thu, Dec 15, 2022 at 01:50:43PM +0100, Paolo Abeni wrote:
> On Wed, 2022-12-14 at 10:20 +0800, Jun Nie wrote:
> > ---
> >  net/sched/em_cmp.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/sched/em_cmp.c b/net/sched/em_cmp.c
> > index f17b049ea530..0284394be53f 100644
> > --- a/net/sched/em_cmp.c
> > +++ b/net/sched/em_cmp.c
> > @@ -22,9 +22,14 @@ static int em_cmp_match(struct sk_buff *skb, struct tcf_ematch *em,
> >  			struct tcf_pkt_info *info)
> >  {
> >  	struct tcf_em_cmp *cmp = (struct tcf_em_cmp *) em->data;
> > -	unsigned char *ptr = tcf_get_base_ptr(skb, cmp->layer) + cmp->off;
> > +	unsigned char *ptr;
> >  	u32 val = 0;
> >  
> > +	if (!cmp)
> > +		return 0;
> 
> It feels like this is papering over the real issue. Why em->data is
> NULL here? why other ematches are not afflicted by this issue? 
> 
> is em->data really NULL or some small value instead? KASAN seams to
> tell it's a small value, not 0, so this patch should not avoid the
> oops. Have you tested it vs the reproducer?

Right. I think I have found the root cause, let me test my patch to see
if it makes syzbot happy.

Thanks.

^ permalink raw reply

* Re: [PATCH net] mctp: serial: Fix starting value for frame check sequence
From: Alexander Duyck @ 2022-12-17 20:59 UTC (permalink / raw)
  To: Jeremy Kerr
  Cc: netdev, Matt Johnston, David S. Miller, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, Harsh Tyagi
In-Reply-To: <6d07e45e6237f24ec32a723e747dd070fb53bea7.camel@codeconstruct.com.au>

On Fri, Dec 16, 2022 at 10:44 PM Jeremy Kerr <jk@codeconstruct.com.au> wrote:
>
>
> Hi Alexander.
>
> > Since the starting value isn't unique would it possibly be worthwhile
> > to look at adding a define to include/linux/crc-ccitt.h to be used to
> > handle the cases where the initial value is 0xffff? I notice there
> > seems to only be two starting values 0 and 0xffff for all callers so
> > it might make sense to centralize it in one place.
>
> Yep, that would make sense if they're commonly used values, but I'm not
> sure that would be suitable to include that in this fix, as it would
> just add disruption to any backport work.

Sorry, I didn't intend that for this patch. I was suggesting it as
possible follow-on work.

Thanks,

- Alex

^ permalink raw reply

* Re: [PATCH] net/ncsi: Always use unicast source MAC address
From: Alexander Duyck @ 2022-12-17 20:57 UTC (permalink / raw)
  To: Peter Delevoryas; +Cc: sam, davem, edumazet, kuba, pabeni, netdev, linux-kernel
In-Reply-To: <09CDE7FD-2C7D-4A0B-B085-E877472FA997@pjd.dev>

On Fri, Dec 16, 2022 at 8:20 PM Peter Delevoryas <peter@pjd.dev> wrote:
>
>
>
> > On Dec 16, 2022, at 10:29 AM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
> >
> > On Thu, Dec 15, 2022 at 5:08 PM Peter Delevoryas <peter@pjd.dev> wrote:
> >>
> >>
> >>
> >>> On Dec 13, 2022, at 8:41 AM, Alexander H Duyck <alexander.duyck@gmail.com> wrote:
> >>>
> >>> On Mon, 2022-12-12 at 16:47 -0800, Peter Delevoryas wrote:

<...>

> >
> >>> My main
> >>> concern would be that the dev_addr is not initialized for those first
> >>> few messages so you may be leaking information.
> >>>
> >>>> This might have the effect of causing the NIC to learn 2 MAC addresses from
> >>>> an NC-SI link if the BMC uses OEM Get MAC Address commands to change its
> >>>> initial MAC address, but it shouldn't really matter. Who knows if NIC's
> >>>> even have MAC learning enabled from the out-of-band BMC link, lol.
> >>>>
> >>>> [1]: https://tinyurl.com/4933mhaj
> >>>> [2]: https://tinyurl.com/mr3tyadb
> >>>
> >>> The thing is the OpenBMC approach initializes the value themselves to
> >>> broadcast[3]. As a result the two code bases are essentially doing the
> >>> same thing since mac_addr is defaulted to the broadcast address when
> >>> the ncsi interface is registered.
> >>
> >> That’s a very good point, thanks for pointing that out, I hadn’t
> >> even noticed that!
> >>
> >> Anyways, let me know what you think of the traces I added above.
> >> Sorry for the delay, I’ve just been busy with some other stuff,
> >> but I do really actually care about upstreaming this (and several
> >> other NC-SI changes I’ll submit after this one, which are unrelated
> >> but more useful).
> >>
> >> Thanks,
> >> Peter
> >
> > So the NC-SI spec says any value can be used for the source MAC and
> > that broadcast "may" be used. I would say there are some debugging
> > advantages to using broadcast that will be obvious in a packet trace.
>
> Ehhhhh yeah I guess, but the ethertype is what I filter for. But sure,
> a broadcast source MAC is pretty unique too.
>
> > I wonder if we couldn't look at doing something like requiring
> > broadcast or LAA if the gma_flag isn't set.
>
> What is LAA? I’m out of the loop

Locally administered MAC address[4]. Basically it is a MAC address
that is generated locally such as your random MAC address. Assuming
the other end of the NC-SI link is using a MAC address with a vendor
OUI there should be no risk of collisions on a point-to-point link.
Essentially if you wanted to you could probably just generate a random
MAC address for the NCSI protocol and then use that in place of the
broadcast address.

> But also: aren’t we already using broadcast if the gma_flag isn’t set?
>
> -       if (nca->ndp->gma_flag == 1)
> -               memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);
> -       else
> -               eth_broadcast_addr(eh->h_source);
> +       memcpy(eh->h_source, nca->ndp->ndev.dev->dev_addr, ETH_ALEN);

That I am not sure about. You were using this kernel without your
patch right? With your patch it would make sense to see that behavior,
but without I am not sure why you would see that address for any NC-SI
commands before the gma_flag is set.

>
> > With that we could at
> > least advertise that we don't expect this packet to be going out in a
> > real network as we cannot guarantee the MAC is unique.
>
> Yeah, but it probably wouldn’t help my simulation scenario.
>
> I guess it sounds like this patch is not a good idea, which to be fair,
> is totally reasonable.
>
> I can just add some iptables rules to tunnel these packets with a different
> source MAC, or fix the multicast socket issue I was having. It’s really
> not a big deal, and like you’re saying, we probably don’t want to make
> it harder to maintain _forever_.

Like I said before I would be good with either a Broadcast address OR
a LAA address. The one thing we need to watch out for though is any
sort of leak. One possible concern would be if for example you had 4
ports using 4 different MAC addresses but one BMC. You don't want to
accidently leak the MAC address from one port onto the other one. With
a LAA address if it were to leak and screw up ARP tables somewhere it
wouldn't be a big deal since it isn't expected to be switched in the
first place.

> I would just suggest praying for the next guy that tries to test NC-SI
> stuff with QEMU and finds out NC-SI traffic gets dropped by bridges.
> I had to resort to reading the source code and printing stuff with
> BPF to identify this. Maybe it’s more obvious to other people this wouldn’t
> work though.

Well it seems like NC-SI isn't meant to be bridged based on the fact
that it is using a broadcast MAC address as a source. If nothing else
I suppose you could try to work with the standards committee on that
to see what can be done to make the protocol more portable.. :-)

[4]: https://macaddress.io/faq/what-are-a-universal-address-and-a-local-administered-address

^ permalink raw reply

* [syzbot] WARNING in drv_link_info_changed
From: syzbot @ 2022-12-17 20:23 UTC (permalink / raw)
  To: davem, edumazet, johannes, kuba, linux-kernel, linux-wireless,
	netdev, pabeni, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    e2ca6ba6ba01 Merge tag 'mm-stable-2022-12-13' of git://git..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16d4f027880000
kernel config:  https://syzkaller.appspot.com/x/.config?x=a6133b41a9a0f500
dashboard link: https://syzkaller.appspot.com/bug?extid=224ad65c927c83902f06
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/be256841c209/disk-e2ca6ba6.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/76c90a4cdade/vmlinux-e2ca6ba6.xz
kernel image: https://storage.googleapis.com/syzbot-assets/a44766da5515/bzImage-e2ca6ba6.xz

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+224ad65c927c83902f06@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 0 PID: 5553 at net/mac80211/driver-ops.c:416 drv_link_info_changed+0xd2/0x780 net/mac80211/driver-ops.c:416
Modules linked in:
CPU: 1 PID: 5553 Comm: kworker/u4:23 Not tainted 6.1.0-syzkaller-09941-ge2ca6ba6ba01 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: phy12 ieee80211_roc_work
RIP: 0010:drv_link_info_changed+0xd2/0x780 net/mac80211/driver-ops.c:416
Code: 83 f8 01 0f 84 f6 00 00 00 e8 ea a6 4f f8 83 eb 07 31 ff 83 e3 fb 89 de e8 8b a3 4f f8 85 db 0f 84 da 00 00 00 e8 ce a6 4f f8 <0f> 0b e9 c5 02 00 00 e8 c2 a6 4f f8 4d 8d bc 24 50 1a 00 00 48 b8
RSP: 0018:ffffc90004befb90 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 00000000fffffffb RCX: 0000000000000000
RDX: ffff888027601d40 RSI: ffffffff893103a2 RDI: 0000000000000005
RBP: ffff88807e7e8de0 R08: 0000000000000005 R09: 0000000000000000
R10: 00000000fffffffb R11: 0000000000000000 R12: ffff88803f4c0c80
R13: 0000000000000200 R14: 0000000000000000 R15: ffff88803f4c26d0
FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f696ff85058 CR3: 000000002991d000 CR4: 0000000000350ee0
Call Trace:
 <TASK>
 ieee80211_link_info_change_notify+0x17a/0x270 net/mac80211/main.c:290
 ieee80211_offchannel_stop_vifs+0x308/0x4e0 net/mac80211/offchannel.c:121
 _ieee80211_start_next_roc+0x6f7/0x9a0 net/mac80211/offchannel.c:365
 __ieee80211_roc_work+0x190/0x3d0 net/mac80211/offchannel.c:432
 ieee80211_roc_work+0x2f/0x40 net/mac80211/offchannel.c:460
 process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
 worker_thread+0x669/0x1090 kernel/workqueue.c:2436
 kthread+0x2e8/0x3a0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply

* [RFC PATCH v1 2/2] vsock_test: mutual hungup reproducer
From: Arseniy Krasnov @ 2022-12-17 19:47 UTC (permalink / raw)
  To: Stefano Garzarella, Stefan Hajnoczi, edumazet@google.com,
	David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	kernel, Krasnov Arseniy, Arseniy Krasnov
In-Reply-To: <39b2e9fd-601b-189d-39a9-914e5574524c@sberdevices.ru>

This is not for merge, just demo.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 tools/testing/vsock/vsock_test.c | 78 ++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index bb6d691cb30d..320ecf4db74b 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -699,7 +699,85 @@ static void test_stream_poll_rcvlowat_client(const struct test_opts *opts)
 	close(fd);
 }
 
+static void test_stall_client(const struct test_opts *opts)
+{
+	int fd;
+	unsigned long lowat_val = 128*1024;
+	size_t data_size = 64 * 1024;
+	void *data;
+
+
+	fd = vsock_stream_connect(opts->peer_cid, 1234);
+	assert(fd != -1);
+
+	assert(!setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
+		       &lowat_val, sizeof(lowat_val)));
+
+	data = malloc(data_size);
+	assert(data);
+
+	/* Wait for tx to send data. */
+	sleep(3);
+
+	while (1) {
+		struct pollfd fds = {0};
+
+		fds.fd = fd;
+		fds.events = POLLIN | POLLRDNORM | POLLERR | POLLRDHUP | POLLHUP;
+
+		/* Try to wait for 1 sec. */
+		printf("[RX] ENTER POLL\n");
+		assert(poll(&fds, 1, -1) >= 0);
+		printf("[RX] LEAVE POLL\n");
+
+		if (fds.revents & (POLLIN | POLLRDNORM)) {
+			read(fd, data, data_size);
+		}
+
+		if (fds.revents & POLLERR) {
+			printf("[RX] POLL ERR\n");
+			break;
+		}
+
+		if (fds.revents & (POLLRDHUP | POLLHUP)) {
+			printf("[RX] POLL DONE\n");
+			break;
+		}
+	}
+
+	close(fd);
+	exit(0);
+}
+
+static void test_stall_server(const struct test_opts *opts)
+{
+	size_t data_size = (256 * 1024) + 1;
+	void *data;
+	int fd;
+
+	fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+	assert(fd != -1);
+
+	data = malloc(data_size);
+	assert(data);
+
+	printf("[TX] ENTER WRITE\n");
+	assert(write(fd, data, data_size) == data_size);
+
+	/* Never get here without kernel patch. */
+	printf("[TX] LEAVE WRITE\n");
+
+	close(fd);
+	exit(0);
+}
+
+
 static struct test_case test_cases[] = {
+	{
+		.name = "Test stall",
+		.run_client = test_stall_client,
+		.run_server = test_stall_server,
+	},
 	{
 		.name = "SOCK_STREAM connection reset",
 		.run_client = test_stream_connection_reset,
-- 
2.25.1

^ permalink raw reply related

* [RFC PATCH v1 1/2] virtio/vsock: send credit update depending on SO_RCVLOWAT
From: Arseniy Krasnov @ 2022-12-17 19:45 UTC (permalink / raw)
  To: Stefano Garzarella, Stefan Hajnoczi, edumazet@google.com,
	David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	kernel, Krasnov Arseniy, Arseniy Krasnov
In-Reply-To: <39b2e9fd-601b-189d-39a9-914e5574524c@sberdevices.ru>

This adds extra condition to send credit update message during data
read to userspace. Problem arises, when sender waits for the free space
on the receiver while receiver waits in 'poll()' until 'rx_bytes' reaches
SO_RCVLOWAT value of the socket. With this patch, receiver sends credit
update message when number of bytes in it's rx queue is too small to
avoid sleeping in 'poll()'.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
---
 net/vmw_vsock/virtio_transport_common.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index a1581c77cf84..4cf26cf8a754 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -362,6 +362,7 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	struct virtio_vsock_sock *vvs = vsk->trans;
 	size_t bytes, total = 0;
 	struct sk_buff *skb;
+	bool low_rx_bytes;
 	int err = -EFAULT;
 	u32 free_space;
 
@@ -396,6 +397,8 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	}
 
 	free_space = vvs->buf_alloc - (vvs->fwd_cnt - vvs->last_fwd_cnt);
+	low_rx_bytes = (vvs->rx_bytes <
+			sock_rcvlowat(sk_vsock(vsk), 0, INT_MAX));
 
 	spin_unlock_bh(&vvs->rx_lock);
 
@@ -405,9 +408,11 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk,
 	 * too high causes extra messages. Too low causes transmitter
 	 * stalls. As stalls are in theory more expensive than extra
 	 * messages, we set the limit to a high value. TODO: experiment
-	 * with different values.
+	 * with different values. Also send credit update message when
+	 * number of bytes in rx queue is not enough to wake up reader.
 	 */
-	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+	if (free_space < VIRTIO_VSOCK_MAX_PKT_BUF_SIZE ||
+	    low_rx_bytes)
 		virtio_transport_send_credit_update(vsk);
 
 	return total;
-- 
2.25.1

^ permalink raw reply related

* [RFC PATCH v1 0/2] virtio/vsock: fix mutual rx/tx hungup
From: Arseniy Krasnov @ 2022-12-17 19:42 UTC (permalink / raw)
  To: Stefano Garzarella, Stefan Hajnoczi, edumazet@google.com,
	David S. Miller, Jakub Kicinski, Paolo Abeni
  Cc: linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	virtualization@lists.linux-foundation.org, kvm@vger.kernel.org,
	kernel, Krasnov Arseniy, Arseniy Krasnov

Hello,

seems I found strange thing(may be a bug) where sender('tx' later) and
receiver('rx' later) could stuck forever. Potential fix is in the first
patch, second patch contains reproducer, based on vsock test suite.
Reproducer is simple: tx just sends data to rx by 'write() syscall, rx
dequeues it using 'read()' syscall and uses 'poll()' for waiting. I run
server in host and client in guest.

rx side params:
1) SO_VM_SOCKETS_BUFFER_SIZE is 256Kb(e.g. default).
2) SO_RCVLOWAT is 128Kb.

What happens in the reproducer step by step:

1) tx tries to send 256Kb + 1 byte (in a single 'write()')
2) tx sends 256Kb, data reaches rx (rx_bytes == 256Kb)
3) tx waits for space in 'write()' to send last 1 byte
4) rx does poll(), (rx_bytes >= rcvlowat) 256Kb >= 128Kb, POLLIN is set
5) rx reads 64Kb, credit update is not sent due to *
6) rx does poll(), (rx_bytes >= rcvlowat) 192Kb >= 128Kb, POLLIN is set
7) rx reads 64Kb, credit update is not sent due to *
8) rx does poll(), (rx_bytes >= rcvlowat) 128Kb >= 128Kb, POLLIN is set
9) rx reads 64Kb, credit update is not sent due to *
10) rx does poll(), (rx_bytes < rcvlowat) 64Kb < 128Kb, rx waits in poll()

* is optimization in 'virtio_transport_stream_do_dequeue()' which
  sends OP_CREDIT_UPDATE only when we have not too much space -
  less than VIRTIO_VSOCK_MAX_PKT_BUF_SIZE.

Now tx side waits for space inside write() and rx waits in poll() for
'rx_bytes' to reach SO_RCVLOWAT value. Both sides will wait forever. I
think, possible fix is to send credit update not only when we have too
small space, but also when number of bytes in receive queue is smaller
than SO_RCVLOWAT thus not enough to wake up sleeping reader. I'm not
sure about correctness of this idea, but anyway - I think that problem
above exists. What do You think?

Patchset was rebased and tested on skbuff v7 patch from Bobby Eshleman:
https://lore.kernel.org/netdev/20221213192843.421032-1-bobby.eshleman@bytedance.com/

Arseniy Krasnov(2):
 virtio/vsock: send credit update depending on SO_RCVLOWAT
 vsock_test: mutual hungup reproducer

 net/vmw_vsock/virtio_transport_common.c |  9 +++-
 tools/testing/vsock/vsock_test.c        | 78 +++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 2 deletions(-)
-- 
2.25.1

^ permalink raw reply

* [PATCH] qed: allow sleep in qed_mcp_trace_dump()
From: Caleb Sander @ 2022-12-17 17:56 UTC (permalink / raw)
  To: Ariel Elior, Manish Chopra, netdev; +Cc: Joern Engel, Caleb Sander

By default, qed_mcp_cmd_and_union() waits for 10us at a time
in a loop that can run 500K times, so calls to qed_mcp_nvm_rd_cmd()
may block the current thread for over 5s.
We observed thread scheduling delays of over 700ms in production,
with stacktraces pointing to this code as the culprit.

qed_mcp_trace_dump() is called from ethtool, so sleeping is permitted.
It already can sleep in qed_mcp_halt(), which calls qed_mcp_cmd().
Add a "can sleep" parameter to qed_find_nvram_image() and
qed_nvram_read() so they can sleep during qed_mcp_trace_dump().
qed_mcp_trace_get_meta_info() and qed_mcp_trace_read_meta(),
called only by qed_mcp_trace_dump(), allow these functions to sleep.
It's not clear to me that the other caller (qed_grc_dump_mcp_hw_dump())
can sleep, so it keeps b_can_sleep set to false.

Signed-off-by: Caleb Sander <csander@purestorage.com>
---
 drivers/net/ethernet/qlogic/qed/qed_debug.c | 28 +++++++++++++++------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_debug.c b/drivers/net/ethernet/qlogic/qed/qed_debug.c
index 86ecb080b153..cdcead614e9f 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_debug.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_debug.c
@@ -1830,11 +1830,12 @@ static void qed_grc_clear_all_prty(struct qed_hwfn *p_hwfn,
 /* Finds the meta data image in NVRAM */
 static enum dbg_status qed_find_nvram_image(struct qed_hwfn *p_hwfn,
 					    struct qed_ptt *p_ptt,
 					    u32 image_type,
 					    u32 *nvram_offset_bytes,
-					    u32 *nvram_size_bytes)
+					    u32 *nvram_size_bytes,
+					    bool b_can_sleep)
 {
 	u32 ret_mcp_resp, ret_mcp_param, ret_txn_size;
 	struct mcp_file_att file_att;
 	int nvm_result;
 
@@ -1844,11 +1845,12 @@ static enum dbg_status qed_find_nvram_image(struct qed_hwfn *p_hwfn,
 					DRV_MSG_CODE_NVM_GET_FILE_ATT,
 					image_type,
 					&ret_mcp_resp,
 					&ret_mcp_param,
 					&ret_txn_size,
-					(u32 *)&file_att, false);
+					(u32 *)&file_att,
+					b_can_sleep);
 
 	/* Check response */
 	if (nvm_result || (ret_mcp_resp & FW_MSG_CODE_MASK) !=
 	    FW_MSG_CODE_NVM_OK)
 		return DBG_STATUS_NVRAM_GET_IMAGE_FAILED;
@@ -1871,11 +1873,13 @@ static enum dbg_status qed_find_nvram_image(struct qed_hwfn *p_hwfn,
 
 /* Reads data from NVRAM */
 static enum dbg_status qed_nvram_read(struct qed_hwfn *p_hwfn,
 				      struct qed_ptt *p_ptt,
 				      u32 nvram_offset_bytes,
-				      u32 nvram_size_bytes, u32 *ret_buf)
+				      u32 nvram_size_bytes,
+				      u32 *ret_buf,
+				      bool b_can_sleep)
 {
 	u32 ret_mcp_resp, ret_mcp_param, ret_read_size, bytes_to_copy;
 	s32 bytes_left = nvram_size_bytes;
 	u32 read_offset = 0, param = 0;
 
@@ -1897,11 +1901,11 @@ static enum dbg_status qed_nvram_read(struct qed_hwfn *p_hwfn,
 		if (qed_mcp_nvm_rd_cmd(p_hwfn, p_ptt,
 				       DRV_MSG_CODE_NVM_READ_NVRAM, param,
 				       &ret_mcp_resp,
 				       &ret_mcp_param, &ret_read_size,
 				       (u32 *)((u8 *)ret_buf + read_offset),
-				       false))
+				       b_can_sleep))
 			return DBG_STATUS_NVRAM_READ_FAILED;
 
 		/* Check response */
 		if ((ret_mcp_resp & FW_MSG_CODE_MASK) != FW_MSG_CODE_NVM_OK)
 			return DBG_STATUS_NVRAM_READ_FAILED;
@@ -3378,11 +3382,12 @@ static u32 qed_grc_dump_mcp_hw_dump(struct qed_hwfn *p_hwfn,
 	/* Read HW dump image from NVRAM */
 	status = qed_find_nvram_image(p_hwfn,
 				      p_ptt,
 				      NVM_TYPE_HW_DUMP_OUT,
 				      &hw_dump_offset_bytes,
-				      &hw_dump_size_bytes);
+				      &hw_dump_size_bytes,
+				      false);
 	if (status != DBG_STATUS_OK)
 		return 0;
 
 	hw_dump_size_dwords = BYTES_TO_DWORDS(hw_dump_size_bytes);
 
@@ -3395,11 +3400,13 @@ static u32 qed_grc_dump_mcp_hw_dump(struct qed_hwfn *p_hwfn,
 	/* Read MCP HW dump image into dump buffer */
 	if (dump && hw_dump_size_dwords) {
 		status = qed_nvram_read(p_hwfn,
 					p_ptt,
 					hw_dump_offset_bytes,
-					hw_dump_size_bytes, dump_buf + offset);
+					hw_dump_size_bytes,
+					dump_buf + offset,
+					false);
 		if (status != DBG_STATUS_OK) {
 			DP_NOTICE(p_hwfn,
 				  "Failed to read MCP HW Dump image from NVRAM\n");
 			return 0;
 		}
@@ -4121,11 +4128,13 @@ static enum dbg_status qed_mcp_trace_get_meta_info(struct qed_hwfn *p_hwfn,
 	    (*running_bundle_id ==
 	     DIR_ID_1) ? NVM_TYPE_MFW_TRACE1 : NVM_TYPE_MFW_TRACE2;
 	return qed_find_nvram_image(p_hwfn,
 				    p_ptt,
 				    nvram_image_type,
-				    trace_meta_offset, trace_meta_size);
+				    trace_meta_offset,
+				    trace_meta_size,
+				    true);
 }
 
 /* Reads the MCP Trace meta data from NVRAM into the specified buffer */
 static enum dbg_status qed_mcp_trace_read_meta(struct qed_hwfn *p_hwfn,
 					       struct qed_ptt *p_ptt,
@@ -4137,11 +4146,14 @@ static enum dbg_status qed_mcp_trace_read_meta(struct qed_hwfn *p_hwfn,
 	u32 signature;
 
 	/* Read meta data from NVRAM */
 	status = qed_nvram_read(p_hwfn,
 				p_ptt,
-				nvram_offset_in_bytes, size_in_bytes, buf);
+				nvram_offset_in_bytes,
+				size_in_bytes,
+				buf,
+				true);
 	if (status != DBG_STATUS_OK)
 		return status;
 
 	/* Extract and check first signature */
 	signature = qed_read_unaligned_dword(byte_buf);
-- 
2.25.1


^ permalink raw reply related

* Re: [bpf-next 3/3] samples/bpf: fix uninitialized warning with test_current_task_under_cgroup
From: Yonghong Song @ 2022-12-17 17:49 UTC (permalink / raw)
  To: Daniel T. Lee, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko, Yonghong Song
  Cc: bpf, netdev
In-Reply-To: <20221217153821.2285-4-danieltimlee@gmail.com>



On 12/17/22 7:38 AM, Daniel T. Lee wrote:
> Currently, compiling samples/bpf with LLVM warns about the uninitialized
> use of variable with test_current_task_under_cgroup.
> 
>      ./samples/bpf/test_current_task_under_cgroup_user.c:57:6:
>      warning: variable 'cg2' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
> 	    if (setup_cgroup_environment())
> 		^~~~~~~~~~~~~~~~~~~~~~~~~~
>      ./samples/bpf/test_current_task_under_cgroup_user.c:106:8:
>      note: uninitialized use occurs here
> 	    close(cg2);
> 		  ^~~
>      ./samples/bpf/test_current_task_under_cgroup_user.c:57:2:
>      note: remove the 'if' if its condition is always false
> 	    if (setup_cgroup_environment())
> 	    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>      ./samples/bpf/test_current_task_under_cgroup_user.c:19:9:
>      note: initialize the variable 'cg2' to silence this warning
> 	    int cg2, idx = 0, rc = 1;
> 		   ^
> 		    = 0
>      1 warning generated.
> 
> This commit resolve this compiler warning by pre-initialize the variable
> with error for safeguard.
> 
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>

Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply

* Re: [bpf-next 2/3] samples/bpf: replace meaningless counter with tracex4
From: Yonghong Song @ 2022-12-17 17:49 UTC (permalink / raw)
  To: Daniel T. Lee, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko, Yonghong Song
  Cc: bpf, netdev
In-Reply-To: <20221217153821.2285-3-danieltimlee@gmail.com>



On 12/17/22 7:38 AM, Daniel T. Lee wrote:
> Currently, compiling samples/bpf with LLVM warns about the unused but
> set variable with tracex4_user.
> 
>      ./samples/bpf/tracex4_user.c:54:14:
>      warning: variable 'i' set but not used [-Wunused-but-set-variable]
>          int map_fd, i, j = 0;
>                      ^
>                      1 warning generated.
> 
> This commit resolve this compiler warning by replacing the meaningless
> counter.
> 
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>

Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply

* Re: [bpf-next 1/3] samples/bpf: remove unused function with test_lru_dist
From: Yonghong Song @ 2022-12-17 17:48 UTC (permalink / raw)
  To: Daniel T. Lee, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko, Yonghong Song
  Cc: bpf, netdev
In-Reply-To: <20221217153821.2285-2-danieltimlee@gmail.com>



On 12/17/22 7:38 AM, Daniel T. Lee wrote:
> Currently, compiling samples/bpf with LLVM warns about the unused
> function with test_lru_dist.
> 
>      ./samples/bpf/test_lru_dist.c:45:19:
>      warning: unused function 'list_empty' [-Wunused-function]
>      static inline int list_empty(const struct list_head *head)
>                        ^
>                        1 warning generated.
> 
> This commit resolve this compiler warning by removing the abandoned
> function.
> 
> Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>

Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply

* Re: [bpf-next 0/3] samples/bpf: fix LLVM compilation warning with
From: Yonghong Song @ 2022-12-17 17:48 UTC (permalink / raw)
  To: Daniel T. Lee, Daniel Borkmann, Alexei Starovoitov,
	Andrii Nakryiko, Yonghong Song
  Cc: bpf, netdev
In-Reply-To: <20221217153821.2285-1-danieltimlee@gmail.com>



On 12/17/22 7:38 AM, Daniel T. Lee wrote:
> Currently, compiling samples/bpf with LLVM emits several warning. They
> are only small details, but they do not appear when compiled with GCC.
> Detailed compilation command and warning logs can be found from bpf CI.

Could you change the subject line to
   samples/bpf: fix LLVM compilation warning

> 
> Daniel T. Lee (3):
>    samples/bpf: remove unused function with test_lru_dist
>    samples/bpf: replace meaningless counter with tracex4
>    samples/bpf: fix uninitialized warning with
>      test_current_task_under_cgroup
> 
>   samples/bpf/test_current_task_under_cgroup_user.c | 6 ++++--
>   samples/bpf/test_lru_dist.c                       | 5 -----
>   samples/bpf/tracex4_user.c                        | 4 ++--
>   3 files changed, 6 insertions(+), 9 deletions(-)
> 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox