[PATCH net v3 0/2] bugfixes for skmsg

linux-kselftest.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net v3 0/2] bugfixes for skmsg
@ 2024-06-28  5:47 Geliang Tang
  2024-06-28  5:47 ` [PATCH net v3 1/2] skmsg: prevent empty ingress skb from enqueuing Geliang Tang
  2024-06-28  5:47 ` [PATCH net v3 2/2] skmsg: bugfix for sk_msg sge iteration Geliang Tang
  0 siblings, 2 replies; 11+ messages in thread
From: Geliang Tang @ 2024-06-28  5:47 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann
  Cc: Geliang Tang, David Ahern, Eduard Zingerman, Mykola Lysenko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Mykyta Yatsenko, Miao Xu, Yuran Pereira, Huacai Chen, Tiezhu Yang,
	netdev, bpf, linux-kselftest

From: Geliang Tang <tanggeliang@kylinos.cn>

v3:
 - modifications that better address the root causes.
 - only contains the first two patches for -net.

v2:
 - add patch 2, a new fix for sk_msg_memcopy_from_iter.
 - update patch 3, only test "sk->sk_prot->close" as Eric suggested.
 - update patch 4, use "goto err" instead of "return" as Eduard
   suggested.
 - add "fixes" tag for patch 1-3.
 - change subject prefixes as "bpf-next" to trigger BPF CI.
 - cc Loongarch maintainers too.

BPF selftests seem to have not been fully tested on Loongarch. When I
ran these tests on Loongarch recently, some errors occur. This patch set
contains two bugfixes for skmsg.

Geliang Tang (2):
  skmsg: prevent empty ingress skb from enqueuing
  skmsg: bugfix for sk_msg sge iteration

 net/core/skmsg.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH net v3 1/2] skmsg: prevent empty ingress skb from enqueuing
  2024-06-28  5:47 [PATCH net v3 0/2] bugfixes for skmsg Geliang Tang
@ 2024-06-28  5:47 ` Geliang Tang
  2024-07-01  9:08   ` D. Wythe
                     ` (2 more replies)
  2024-06-28  5:47 ` [PATCH net v3 2/2] skmsg: bugfix for sk_msg sge iteration Geliang Tang
  1 sibling, 3 replies; 11+ messages in thread
From: Geliang Tang @ 2024-06-28  5:47 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann
  Cc: Geliang Tang, David Ahern, Eduard Zingerman, Mykola Lysenko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Mykyta Yatsenko, Miao Xu, Yuran Pereira, Huacai Chen, Tiezhu Yang,
	netdev, bpf, linux-kselftest

From: Geliang Tang <tanggeliang@kylinos.cn>

Run this BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
platform, a Kernel panic occurs:

'''
Oops[#1]:
CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-rc2+ #18
Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
   ... ...
   ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
  ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
 CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
 PRMD: 0000000c (PPLV0 +PIE +PWE)
 EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
 ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
 BADV: 0000000000000040
 PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
Stack : ...
        ...
Call Trace:
[<9000000004162774>] copy_page_to_iter+0x74/0x1c0
[<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
[<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
[<90000000049aae34>] inet_recvmsg+0x54/0x100
[<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
[<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
[<900000000481e27c>] sys_recvfrom+0x1c/0x40
[<9000000004c076ec>] do_syscall+0x8c/0xc0
[<9000000003731da4>] handle_syscall+0xc4/0x160

Code: ...

---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Fatal exception
Kernel relocated by 0x3510000
 .text @ 0x9000000003710000
 .data @ 0x9000000004d70000
 .bss  @ 0x9000000006469400
---[ end Kernel panic - not syncing: Fatal exception ]---
'''

This crash happens every time when running sockmap_skb_verdict_shutdown
subtest in sockmap_basic.

This crash is because a NULL pointer is passed to page_address() in
sk_msg_recvmsg(). Due to the difference in architecture, page_address(0)
will not trigger a panic on the X86 platform but will panic on the
Loogarch platform. So this bug was hidden on the x86 platform, but now
it is exposed on the Loogarch platform.

The root cause is an empty skb (skb->len == 0) is put on the queue.

In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
page is put to this sge (see sg_set_page in sg_set_page), but this empty
sge is queued into ingress_msg list.

And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is
got by sg_page(sge). Pass this NULL-page to copy_page_to_iter(), it
passed to kmap_local_page() and page_address(), then kernel panics.

To solve this, we should prevent empty skb from putting on the queue. So
in sk_psock_verdict_recv(), if the skb->len is zero, drop this skb.

Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/core/skmsg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..44952cdd1425 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -1184,7 +1184,7 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
 
 	rcu_read_lock();
 	psock = sk_psock(sk);
-	if (unlikely(!psock)) {
+	if (unlikely(!psock || !len)) {
 		len = 0;
 		tcp_eat_skb(sk, skb);
 		sock_drop(sk, skb);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH net v3 2/2] skmsg: bugfix for sk_msg sge iteration
  2024-06-28  5:47 [PATCH net v3 0/2] bugfixes for skmsg Geliang Tang
  2024-06-28  5:47 ` [PATCH net v3 1/2] skmsg: prevent empty ingress skb from enqueuing Geliang Tang
@ 2024-06-28  5:47 ` Geliang Tang
  2024-07-01  9:00   ` D. Wythe
  1 sibling, 1 reply; 11+ messages in thread
From: Geliang Tang @ 2024-06-28  5:47 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann
  Cc: Geliang Tang, David Ahern, Eduard Zingerman, Mykola Lysenko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Mykyta Yatsenko, Miao Xu, Yuran Pereira, Huacai Chen, Tiezhu Yang,
	netdev, bpf, linux-kselftest

From: Geliang Tang <tanggeliang@kylinos.cn>

Every time run this BPF selftests (./test_sockmap) on a Loongarch platform,
a Kernel panic occurs:

'''
 Oops[#1]:
 CPU: 20 PID: 23245 Comm: test_sockmap Tainted: G     OE 6.10.0-rc2+ #32
 Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
 ... ...
    ra: 90000000043a315c tcp_bpf_sendmsg+0x23c/0x420
   ERA: 900000000426cd1c sk_msg_memcopy_from_iter+0xbc/0x220
  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
  PRMD: 0000000c (PPLV0 +PIE +PWE)
  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
 ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
  BADV: 0000000000000040
  PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
 Modules linked in: tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
 Process test_sockmap (pid: 23245, threadinfo=00000000aeb68043, task=...)
 Stack : ... ...
         ...
 Call Trace:
 [<900000000426cd1c>] sk_msg_memcopy_from_iter+0xbc/0x220
 [<90000000043a315c>] tcp_bpf_sendmsg+0x23c/0x420
 [<90000000041cafc8>] __sock_sendmsg+0x68/0xe0
 [<90000000041cc4bc>] ____sys_sendmsg+0x2bc/0x360
 [<90000000041cea18>] ___sys_sendmsg+0xb8/0x120
 [<90000000041cf1f8>] __sys_sendmsg+0x98/0x100
 [<90000000045b76ec>] do_syscall+0x8c/0xc0
 [<90000000030e1da4>] handle_syscall+0xc4/0x160

 Code: ...

 ---[ end trace 0000000000000000 ]---
'''

This crash is because a NULL pointer is passed to page_address() in
sk_msg_memcopy_from_iter(). Due to the difference in architecture,
page_address(0) will not trigger a panic on the X86 platform but will panic
on the Loogarch platform. So this bug was hidden on the x86 platform, but
now it is exposed on the Loogarch platform.

This bug is a logic error indeed. In sk_msg_memcopy_from_iter(), an invalid
"sge" is always used:

	if (msg->sg.copybreak >= sge->length) {
		msg->sg.copybreak = 0;
		sk_msg_iter_var_next(i);
		if (i == msg->sg.end)
			break;
		sge = sk_msg_elem(msg, i);
	}

If the value of i is 2, msg->sg.end is also 2 when entering this if block.
sk_msg_iter_var_next() increases i by 1, and now i is 3, which is no longer
equal to msg->sg.end. The break will not be triggered, and the next sge
obtained by sk_msg_elem(3) will be an invalid one.

The correct approach is to check (i == msg->sg.end) first, and then invoke
sk_msg_iter_var_next() if they are not equal.

Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/core/skmsg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 44952cdd1425..1906d0d0eeac 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -378,9 +378,9 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
 		/* This is possible if a trim operation shrunk the buffer */
 		if (msg->sg.copybreak >= sge->length) {
 			msg->sg.copybreak = 0;
-			sk_msg_iter_var_next(i);
 			if (i == msg->sg.end)
 				break;
+			sk_msg_iter_var_next(i);
 			sge = sk_msg_elem(msg, i);
 		}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net v3 2/2] skmsg: bugfix for sk_msg sge iteration
  2024-06-28  5:47 ` [PATCH net v3 2/2] skmsg: bugfix for sk_msg sge iteration Geliang Tang
@ 2024-07-01  9:00   ` D. Wythe
  2024-07-01 10:29     ` Geliang Tang
  0 siblings, 1 reply; 11+ messages in thread
From: D. Wythe @ 2024-07-01  9:00 UTC (permalink / raw)
  To: Geliang Tang, John Fastabend, Jakub Sitnicki, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann
  Cc: Geliang Tang, David Ahern, Eduard Zingerman, Mykola Lysenko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Mykyta Yatsenko, Miao Xu, Yuran Pereira, Huacai Chen, Tiezhu Yang,
	netdev, bpf, linux-kselftest



On 6/28/24 1:47 PM, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> Every time run this BPF selftests (./test_sockmap) on a Loongarch platform,
> a Kernel panic occurs:
>
> '''
>   Oops[#1]:
>   CPU: 20 PID: 23245 Comm: test_sockmap Tainted: G     OE 6.10.0-rc2+ #32
>   Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
>   ... ...
>      ra: 90000000043a315c tcp_bpf_sendmsg+0x23c/0x420
>     ERA: 900000000426cd1c sk_msg_memcopy_from_iter+0xbc/0x220
>    CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
>    PRMD: 0000000c (PPLV0 +PIE +PWE)
>    EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
>    ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
>   ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
>    BADV: 0000000000000040
>    PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
>   Modules linked in: tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
>   Process test_sockmap (pid: 23245, threadinfo=00000000aeb68043, task=...)
>   Stack : ... ...
>           ...
>   Call Trace:
>   [<900000000426cd1c>] sk_msg_memcopy_from_iter+0xbc/0x220
>   [<90000000043a315c>] tcp_bpf_sendmsg+0x23c/0x420
>   [<90000000041cafc8>] __sock_sendmsg+0x68/0xe0
>   [<90000000041cc4bc>] ____sys_sendmsg+0x2bc/0x360
>   [<90000000041cea18>] ___sys_sendmsg+0xb8/0x120
>   [<90000000041cf1f8>] __sys_sendmsg+0x98/0x100
>   [<90000000045b76ec>] do_syscall+0x8c/0xc0
>   [<90000000030e1da4>] handle_syscall+0xc4/0x160
>
>   Code: ...
>
>   ---[ end trace 0000000000000000 ]---
> '''
>
> This crash is because a NULL pointer is passed to page_address() in
> sk_msg_memcopy_from_iter(). Due to the difference in architecture,
> page_address(0) will not trigger a panic on the X86 platform but will panic
> on the Loogarch platform. So this bug was hidden on the x86 platform, but
> now it is exposed on the Loogarch platform.
>
> This bug is a logic error indeed. In sk_msg_memcopy_from_iter(), an invalid
> "sge" is always used:
>
> 	if (msg->sg.copybreak >= sge->length) {
> 		msg->sg.copybreak = 0;
> 		sk_msg_iter_var_next(i);
> 		if (i == msg->sg.end)
> 			break;
> 		sge = sk_msg_elem(msg, i);
> 	}
>
> If the value of i is 2, msg->sg.end is also 2 when entering this if block.
> sk_msg_iter_var_next() increases i by 1, and now i is 3, which is no longer
> equal to msg->sg.end. The break will not be triggered, and the next sge
> obtained by sk_msg_elem(3) will be an invalid one.
>
> The correct approach is to check (i == msg->sg.end) first, and then invoke
> sk_msg_iter_var_next() if they are not equal.
>
> Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
>   net/core/skmsg.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index 44952cdd1425..1906d0d0eeac 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -378,9 +378,9 @@ int sk_msg_memcopy_from_iter(struct sock *sk, struct iov_iter *from,
>   		/* This is possible if a trim operation shrunk the buffer */
>   		if (msg->sg.copybreak >= sge->length) {
>   			msg->sg.copybreak = 0;
> -			sk_msg_iter_var_next(i);
>   			if (i == msg->sg.end)
>   				break;
> +			sk_msg_iter_var_next(i);
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>
>   			sge = sk_msg_elem(msg, i);
>   		}
>   



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v3 1/2] skmsg: prevent empty ingress skb from enqueuing
  2024-06-28  5:47 ` [PATCH net v3 1/2] skmsg: prevent empty ingress skb from enqueuing Geliang Tang
@ 2024-07-01  9:08   ` D. Wythe
  2024-07-02  7:02   ` Geliang Tang
  2024-07-03  1:03   ` John Fastabend
  2 siblings, 0 replies; 11+ messages in thread
From: D. Wythe @ 2024-07-01  9:08 UTC (permalink / raw)
  To: Geliang Tang, John Fastabend, Jakub Sitnicki, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann
  Cc: Geliang Tang, David Ahern, Eduard Zingerman, Mykola Lysenko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Mykyta Yatsenko, Miao Xu, Yuran Pereira, Huacai Chen, Tiezhu Yang,
	netdev, bpf, linux-kselftest



On 6/28/24 1:47 PM, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
>
> Run this BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
> platform, a Kernel panic occurs:
>
> '''
> Oops[#1]:
> CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-rc2+ #18
> Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
>     ... ...
>     ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
>    ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
>   CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
>   PRMD: 0000000c (PPLV0 +PIE +PWE)
>   EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
>   ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
>   BADV: 0000000000000040
>   PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
> Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
> Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
> Stack : ...
>          ...
> Call Trace:
> [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
> [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
> [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
> [<90000000049aae34>] inet_recvmsg+0x54/0x100
> [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
> [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
> [<900000000481e27c>] sys_recvfrom+0x1c/0x40
> [<9000000004c076ec>] do_syscall+0x8c/0xc0
> [<9000000003731da4>] handle_syscall+0xc4/0x160
>
> Code: ...
>
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Fatal exception
> Kernel relocated by 0x3510000
>   .text @ 0x9000000003710000
>   .data @ 0x9000000004d70000
>   .bss  @ 0x9000000006469400
> ---[ end Kernel panic - not syncing: Fatal exception ]---
> '''
>
> This crash happens every time when running sockmap_skb_verdict_shutdown
> subtest in sockmap_basic.
>
> This crash is because a NULL pointer is passed to page_address() in
> sk_msg_recvmsg(). Due to the difference in architecture, page_address(0)
> will not trigger a panic on the X86 platform but will panic on the
> Loogarch platform. So this bug was hidden on the x86 platform, but now
> it is exposed on the Loogarch platform.

Maybe Loongarch ?  Besides that, LGTM.
Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>

>
> The root cause is an empty skb (skb->len == 0) is put on the queue.
>
> In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
> page is put to this sge (see sg_set_page in sg_set_page), but this empty
> sge is queued into ingress_msg list.
>
> And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is
> got by sg_page(sge). Pass this NULL-page to copy_page_to_iter(), it
> passed to kmap_local_page() and page_address(), then kernel panics.
>
> To solve this, we should prevent empty skb from putting on the queue. So
> in sk_psock_verdict_recv(), if the skb->len is zero, drop this skb.
>
> Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program")
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
>   net/core/skmsg.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..44952cdd1425 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1184,7 +1184,7 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
>   
>   	rcu_read_lock();
>   	psock = sk_psock(sk);
> -	if (unlikely(!psock)) {
> +	if (unlikely(!psock || !len)) {
>   		len = 0;
>   		tcp_eat_skb(sk, skb);
>   		sock_drop(sk, skb);


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v3 2/2] skmsg: bugfix for sk_msg sge iteration
  2024-07-01  9:00   ` D. Wythe
@ 2024-07-01 10:29     ` Geliang Tang
  2024-07-16  6:53       ` Geliang Tang
  0 siblings, 1 reply; 11+ messages in thread
From: Geliang Tang @ 2024-07-01 10:29 UTC (permalink / raw)
  To: D. Wythe, John Fastabend, Jakub Sitnicki, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann
  Cc: Geliang Tang, David Ahern, Eduard Zingerman, Mykola Lysenko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Mykyta Yatsenko, Miao Xu, Yuran Pereira, Huacai Chen, Tiezhu Yang,
	netdev, bpf, linux-kselftest

Hello,

On Mon, 2024-07-01 at 17:00 +0800, D. Wythe wrote:
> 
> 
> On 6/28/24 1:47 PM, Geliang Tang wrote:
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> > 
> > Every time run this BPF selftests (./test_sockmap) on a Loongarch
> > platform,
> > a Kernel panic occurs:
> > 
> > '''
> >   Oops[#1]:
> >   CPU: 20 PID: 23245 Comm: test_sockmap Tainted: G     OE 6.10.0-
> > rc2+ #32
> >   Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS
> > Loongson-UDK2018
> >   ... ...
> >      ra: 90000000043a315c tcp_bpf_sendmsg+0x23c/0x420
> >     ERA: 900000000426cd1c sk_msg_memcopy_from_iter+0xbc/0x220
> >    CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
> >    PRMD: 0000000c (PPLV0 +PIE +PWE)
> >    EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
> >    ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> >   ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
> >    BADV: 0000000000000040
> >    PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
> >   Modules linked in: tls xt_CHECKSUM xt_MASQUERADE xt_conntrack
> > ipt_REJECT
> >   Process test_sockmap (pid: 23245, threadinfo=00000000aeb68043,
> > task=...)
> >   Stack : ... ...
> >           ...
> >   Call Trace:
> >   [<900000000426cd1c>] sk_msg_memcopy_from_iter+0xbc/0x220
> >   [<90000000043a315c>] tcp_bpf_sendmsg+0x23c/0x420
> >   [<90000000041cafc8>] __sock_sendmsg+0x68/0xe0
> >   [<90000000041cc4bc>] ____sys_sendmsg+0x2bc/0x360
> >   [<90000000041cea18>] ___sys_sendmsg+0xb8/0x120
> >   [<90000000041cf1f8>] __sys_sendmsg+0x98/0x100
> >   [<90000000045b76ec>] do_syscall+0x8c/0xc0
> >   [<90000000030e1da4>] handle_syscall+0xc4/0x160
> > 
> >   Code: ...
> > 
> >   ---[ end trace 0000000000000000 ]---
> > '''
> > 
> > This crash is because a NULL pointer is passed to page_address() in
> > sk_msg_memcopy_from_iter(). Due to the difference in architecture,
> > page_address(0) will not trigger a panic on the X86 platform but
> > will panic
> > on the Loogarch platform. So this bug was hidden on the x86
> > platform, but
> > now it is exposed on the Loogarch platform.
> > 
> > This bug is a logic error indeed. In sk_msg_memcopy_from_iter(), an
> > invalid
> > "sge" is always used:
> > 
> > 	if (msg->sg.copybreak >= sge->length) {
> > 		msg->sg.copybreak = 0;
> > 		sk_msg_iter_var_next(i);
> > 		if (i == msg->sg.end)
> > 			break;
> > 		sge = sk_msg_elem(msg, i);
> > 	}
> > 
> > If the value of i is 2, msg->sg.end is also 2 when entering this if
> > block.
> > sk_msg_iter_var_next() increases i by 1, and now i is 3, which is
> > no longer
> > equal to msg->sg.end. The break will not be triggered, and the next
> > sge
> > obtained by sk_msg_elem(3) will be an invalid one.
> > 
> > The correct approach is to check (i == msg->sg.end) first, and then
> > invoke
> > sk_msg_iter_var_next() if they are not equal.
> > 
> > Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg
> > interface")
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> >   net/core/skmsg.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index 44952cdd1425..1906d0d0eeac 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -378,9 +378,9 @@ int sk_msg_memcopy_from_iter(struct sock *sk,
> > struct iov_iter *from,
> >   		/* This is possible if a trim operation shrunk the
> > buffer */
> >   		if (msg->sg.copybreak >= sge->length) {
> >   			msg->sg.copybreak = 0;
> > -			sk_msg_iter_var_next(i);
> >   			if (i == msg->sg.end)
> >   				break;
> > +			sk_msg_iter_var_next(i);
> Reviewed-by: D. Wythe <alibuda@linux.alibaba.com>

Thanks for your review.

But this change breaks test_sockmap. Will send a v4 to fix this.

Changes Requested.

-Geliang

> >   			sge = sk_msg_elem(msg, i);
> >   		}
> >   
> 
> 
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v3 1/2] skmsg: prevent empty ingress skb from enqueuing
  2024-06-28  5:47 ` [PATCH net v3 1/2] skmsg: prevent empty ingress skb from enqueuing Geliang Tang
  2024-07-01  9:08   ` D. Wythe
@ 2024-07-02  7:02   ` Geliang Tang
  2024-07-03  1:03   ` John Fastabend
  2 siblings, 0 replies; 11+ messages in thread
From: Geliang Tang @ 2024-07-02  7:02 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann
  Cc: D. Wythe, Geliang Tang, David Ahern, Eduard Zingerman,
	Mykola Lysenko, Martin KaFai Lau, Song Liu, Yonghong Song,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Mykyta Yatsenko, Miao Xu, Yuran Pereira, Huacai Chen, Tiezhu Yang,
	netdev, bpf, linux-kselftest

Superseded.

I just sent a v4 ("skmsg: skip empty sge in sk_msg_recvmsg").

Thanks,
-Geliang

On Fri, 2024-06-28 at 13:47 +0800, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> Run this BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
> platform, a Kernel panic occurs:
> 
> '''
> Oops[#1]:
> CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-
> rc2+ #18
> Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-
> UDK2018
>    ... ...
>    ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
>   ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
>  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
>  PRMD: 0000000c (PPLV0 +PIE +PWE)
>  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
>  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
>  BADV: 0000000000000040
>  PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
> Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE
> xt_conntrack
> Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
> Stack : ...
>         ...
> Call Trace:
> [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
> [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
> [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
> [<90000000049aae34>] inet_recvmsg+0x54/0x100
> [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
> [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
> [<900000000481e27c>] sys_recvfrom+0x1c/0x40
> [<9000000004c076ec>] do_syscall+0x8c/0xc0
> [<9000000003731da4>] handle_syscall+0xc4/0x160
> 
> Code: ...
> 
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Fatal exception
> Kernel relocated by 0x3510000
>  .text @ 0x9000000003710000
>  .data @ 0x9000000004d70000
>  .bss  @ 0x9000000006469400
> ---[ end Kernel panic - not syncing: Fatal exception ]---
> '''
> 
> This crash happens every time when running
> sockmap_skb_verdict_shutdown
> subtest in sockmap_basic.
> 
> This crash is because a NULL pointer is passed to page_address() in
> sk_msg_recvmsg(). Due to the difference in architecture,
> page_address(0)
> will not trigger a panic on the X86 platform but will panic on the
> Loogarch platform. So this bug was hidden on the x86 platform, but
> now
> it is exposed on the Loogarch platform.
> 
> The root cause is an empty skb (skb->len == 0) is put on the queue.
> 
> In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and
> no
> page is put to this sge (see sg_set_page in sg_set_page), but this
> empty
> sge is queued into ingress_msg list.
> 
> And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is
> got by sg_page(sge). Pass this NULL-page to copy_page_to_iter(), it
> passed to kmap_local_page() and page_address(), then kernel panics.
> 
> To solve this, we should prevent empty skb from putting on the queue.
> So
> in sk_psock_verdict_recv(), if the skb->len is zero, drop this skb.
> 
> Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser
> program")
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
>  net/core/skmsg.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..44952cdd1425 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1184,7 +1184,7 @@ static int sk_psock_verdict_recv(struct sock
> *sk, struct sk_buff *skb)
>  
>  	rcu_read_lock();
>  	psock = sk_psock(sk);
> -	if (unlikely(!psock)) {
> +	if (unlikely(!psock || !len)) {
>  		len = 0;
>  		tcp_eat_skb(sk, skb);
>  		sock_drop(sk, skb);


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH net v3 1/2] skmsg: prevent empty ingress skb from enqueuing
  2024-06-28  5:47 ` [PATCH net v3 1/2] skmsg: prevent empty ingress skb from enqueuing Geliang Tang
  2024-07-01  9:08   ` D. Wythe
  2024-07-02  7:02   ` Geliang Tang
@ 2024-07-03  1:03   ` John Fastabend
  2024-07-03  1:54     ` Geliang Tang
  2 siblings, 1 reply; 11+ messages in thread
From: John Fastabend @ 2024-07-03  1:03 UTC (permalink / raw)
  To: Geliang Tang, John Fastabend, Jakub Sitnicki, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann
  Cc: Geliang Tang, David Ahern, Eduard Zingerman, Mykola Lysenko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Mykyta Yatsenko, Miao Xu, Yuran Pereira, Huacai Chen, Tiezhu Yang,
	netdev, bpf, linux-kselftest

Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> Run this BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
> platform, a Kernel panic occurs:
> 
> '''
> Oops[#1]:
> CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-rc2+ #18
> Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
>    ... ...
>    ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
>   ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
>  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
>  PRMD: 0000000c (PPLV0 +PIE +PWE)
>  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
>  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
>  BADV: 0000000000000040
>  PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
> Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
> Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
> Stack : ...
>         ...
> Call Trace:
> [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
> [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
> [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
> [<90000000049aae34>] inet_recvmsg+0x54/0x100
> [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
> [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
> [<900000000481e27c>] sys_recvfrom+0x1c/0x40
> [<9000000004c076ec>] do_syscall+0x8c/0xc0
> [<9000000003731da4>] handle_syscall+0xc4/0x160
> 
> Code: ...
> 
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Fatal exception
> Kernel relocated by 0x3510000
>  .text @ 0x9000000003710000
>  .data @ 0x9000000004d70000
>  .bss  @ 0x9000000006469400
> ---[ end Kernel panic - not syncing: Fatal exception ]---
> '''
> 
> This crash happens every time when running sockmap_skb_verdict_shutdown
> subtest in sockmap_basic.
> 
> This crash is because a NULL pointer is passed to page_address() in
> sk_msg_recvmsg(). Due to the difference in architecture, page_address(0)
> will not trigger a panic on the X86 platform but will panic on the
> Loogarch platform. So this bug was hidden on the x86 platform, but now
> it is exposed on the Loogarch platform.
> 
> The root cause is an empty skb (skb->len == 0) is put on the queue.
> 
> In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
> page is put to this sge (see sg_set_page in sg_set_page), but this empty
> sge is queued into ingress_msg list.
> 
> And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is
> got by sg_page(sge). Pass this NULL-page to copy_page_to_iter(), it
> passed to kmap_local_page() and page_address(), then kernel panics.
> 
> To solve this, we should prevent empty skb from putting on the queue. So
> in sk_psock_verdict_recv(), if the skb->len is zero, drop this skb.
> 
> Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser program")
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
>  net/core/skmsg.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..44952cdd1425 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -1184,7 +1184,7 @@ static int sk_psock_verdict_recv(struct sock *sk, struct sk_buff *skb)
>  
>  	rcu_read_lock();
>  	psock = sk_psock(sk);
> -	if (unlikely(!psock)) {
> +	if (unlikely(!psock || !len)) {
>  		len = 0;
>  		tcp_eat_skb(sk, skb);
>  		sock_drop(sk, skb);

The skb->len == 0 here is the FIN pkt right? We are using the EFAULT return
triggered by copy_page_to_iter to check for is_fin in tcp_bpf.c.

The concern I have here is if we don't have the skb fin pkt on the recv
queue we might go into wait_data and block instead of return to user when
rcvmsg() is called from user. I wonder if we can write a test for this if
we don't already have one we probably should create one.

Maybe a better fix assuming my assumption about fin being skb->len=0 is
correct?

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..bbf40b999713 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -434,7 +434,8 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
                        page = sg_page(sge);
                        if (copied + copy > len)
                                copy = len - copied;
-                       copy = copy_page_to_iter(page, sge->offset, copy, iter);
+                       if (copy)
+                               copy = copy_page_to_iter(page, sge->offset, copy, iter);
                        if (!copy) {
                                copied = copied ? copied : -EFAULT;
                                goto out;

Thanks,
John

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH net v3 1/2] skmsg: prevent empty ingress skb from enqueuing
  2024-07-03  1:03   ` John Fastabend
@ 2024-07-03  1:54     ` Geliang Tang
  2024-07-05  4:25       ` Geliang Tang
  0 siblings, 1 reply; 11+ messages in thread
From: Geliang Tang @ 2024-07-03  1:54 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann
  Cc: Geliang Tang, David Ahern, Eduard Zingerman, Mykola Lysenko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Mykyta Yatsenko, Miao Xu, Yuran Pereira, Huacai Chen, Tiezhu Yang,
	netdev, bpf, linux-kselftest

On Tue, 2024-07-02 at 18:03 -0700, John Fastabend wrote:
> Geliang Tang wrote:
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> > 
> > Run this BPF selftests (./test_progs -t sockmap_basic) on a
> > Loongarch
> > platform, a Kernel panic occurs:
> > 
> > '''
> > Oops[#1]:
> > CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE  6.10.0-
> > rc2+ #18
> > Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-
> > UDK2018
> >    ... ...
> >    ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
> >   ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
> >  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
> >  PRMD: 0000000c (PPLV0 +PIE +PWE)
> >  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
> >  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> > ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
> >  BADV: 0000000000000040
> >  PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
> > Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE
> > xt_conntrack
> > Process test_progs (pid: 2824, threadinfo=0000000000863a31,
> > task=...)
> > Stack : ...
> >         ...
> > Call Trace:
> > [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
> > [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
> > [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
> > [<90000000049aae34>] inet_recvmsg+0x54/0x100
> > [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
> > [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
> > [<900000000481e27c>] sys_recvfrom+0x1c/0x40
> > [<9000000004c076ec>] do_syscall+0x8c/0xc0
> > [<9000000003731da4>] handle_syscall+0xc4/0x160
> > 
> > Code: ...
> > 
> > ---[ end trace 0000000000000000 ]---
> > Kernel panic - not syncing: Fatal exception
> > Kernel relocated by 0x3510000
> >  .text @ 0x9000000003710000
> >  .data @ 0x9000000004d70000
> >  .bss  @ 0x9000000006469400
> > ---[ end Kernel panic - not syncing: Fatal exception ]---
> > '''
> > 
> > This crash happens every time when running
> > sockmap_skb_verdict_shutdown
> > subtest in sockmap_basic.
> > 
> > This crash is because a NULL pointer is passed to page_address() in
> > sk_msg_recvmsg(). Due to the difference in architecture,
> > page_address(0)
> > will not trigger a panic on the X86 platform but will panic on the
> > Loogarch platform. So this bug was hidden on the x86 platform, but
> > now
> > it is exposed on the Loogarch platform.
> > 
> > The root cause is an empty skb (skb->len == 0) is put on the queue.
> > 
> > In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero,
> > and no
> > page is put to this sge (see sg_set_page in sg_set_page), but this
> > empty
> > sge is queued into ingress_msg list.
> > 
> > And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is
> > got by sg_page(sge). Pass this NULL-page to copy_page_to_iter(), it
> > passed to kmap_local_page() and page_address(), then kernel panics.
> > 
> > To solve this, we should prevent empty skb from putting on the
> > queue. So
> > in sk_psock_verdict_recv(), if the skb->len is zero, drop this skb.
> > 
> > Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser
> > program")
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> >  net/core/skmsg.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index fd20aae30be2..44952cdd1425 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -1184,7 +1184,7 @@ static int sk_psock_verdict_recv(struct sock
> > *sk, struct sk_buff *skb)
> >  
> >  	rcu_read_lock();
> >  	psock = sk_psock(sk);
> > -	if (unlikely(!psock)) {
> > +	if (unlikely(!psock || !len)) {
> >  		len = 0;
> >  		tcp_eat_skb(sk, skb);
> >  		sock_drop(sk, skb);
> 
> The skb->len == 0 here is the FIN pkt right? We are using the EFAULT
> return
> triggered by copy_page_to_iter to check for is_fin in tcp_bpf.c.
> 
> The concern I have here is if we don't have the skb fin pkt on the
> recv
> queue we might go into wait_data and block instead of return to user
> when
> rcvmsg() is called from user. I wonder if we can write a test for
> this if
> we don't already have one we probably should create one.
> 
> Maybe a better fix assuming my assumption about fin being skb->len=0
> is
> correct?

Thanks John. Your fix is much better than mine. I'll use this as v5 and
update the commit log. I'll add your "Suggested-by" tag in it.

-Geliang

> 
> diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> index fd20aae30be2..bbf40b999713 100644
> --- a/net/core/skmsg.c
> +++ b/net/core/skmsg.c
> @@ -434,7 +434,8 @@ int sk_msg_recvmsg(struct sock *sk, struct
> sk_psock *psock, struct msghdr *msg,
>                         page = sg_page(sge);
>                         if (copied + copy > len)
>                                 copy = len - copied;
> -                       copy = copy_page_to_iter(page, sge->offset,
> copy, iter);
> +                       if (copy)
> +                               copy = copy_page_to_iter(page, sge-
> >offset, copy, iter);
>                         if (!copy) {
>                                 copied = copied ? copied : -EFAULT;
>                                 goto out;
> 
> Thanks,
> John


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v3 1/2] skmsg: prevent empty ingress skb from enqueuing
  2024-07-03  1:54     ` Geliang Tang
@ 2024-07-05  4:25       ` Geliang Tang
  0 siblings, 0 replies; 11+ messages in thread
From: Geliang Tang @ 2024-07-05  4:25 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann
  Cc: Geliang Tang, David Ahern, Eduard Zingerman, Mykola Lysenko,
	Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
	Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan,
	Mykyta Yatsenko, Miao Xu, Yuran Pereira, Huacai Chen, Tiezhu Yang,
	netdev, bpf, linux-kselftest

Hi John,

On Wed, 2024-07-03 at 09:54 +0800, Geliang Tang wrote:
> On Tue, 2024-07-02 at 18:03 -0700, John Fastabend wrote:
> > Geliang Tang wrote:
> > > From: Geliang Tang <tanggeliang@kylinos.cn>
> > > 
> > > Run this BPF selftests (./test_progs -t sockmap_basic) on a
> > > Loongarch
> > > platform, a Kernel panic occurs:
> > > 
> > > '''
> > > Oops[#1]:
> > > CPU: 22 PID: 2824 Comm: test_progs Tainted: G           OE 
> > > 6.10.0-
> > > rc2+ #18
> > > Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS
> > > Loongson-
> > > UDK2018
> > >    ... ...
> > >    ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
> > >   ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
> > >  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
> > >  PRMD: 0000000c (PPLV0 +PIE +PWE)
> > >  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
> > >  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
> > > ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
> > >  BADV: 0000000000000040
> > >  PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
> > > Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE
> > > xt_conntrack
> > > Process test_progs (pid: 2824, threadinfo=0000000000863a31,
> > > task=...)
> > > Stack : ...
> > >         ...
> > > Call Trace:
> > > [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
> > > [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
> > > [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
> > > [<90000000049aae34>] inet_recvmsg+0x54/0x100
> > > [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
> > > [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
> > > [<900000000481e27c>] sys_recvfrom+0x1c/0x40
> > > [<9000000004c076ec>] do_syscall+0x8c/0xc0
> > > [<9000000003731da4>] handle_syscall+0xc4/0x160
> > > 
> > > Code: ...
> > > 
> > > ---[ end trace 0000000000000000 ]---
> > > Kernel panic - not syncing: Fatal exception
> > > Kernel relocated by 0x3510000
> > >  .text @ 0x9000000003710000
> > >  .data @ 0x9000000004d70000
> > >  .bss  @ 0x9000000006469400
> > > ---[ end Kernel panic - not syncing: Fatal exception ]---
> > > '''
> > > 
> > > This crash happens every time when running
> > > sockmap_skb_verdict_shutdown
> > > subtest in sockmap_basic.
> > > 
> > > This crash is because a NULL pointer is passed to page_address()
> > > in
> > > sk_msg_recvmsg(). Due to the difference in architecture,
> > > page_address(0)
> > > will not trigger a panic on the X86 platform but will panic on
> > > the
> > > Loogarch platform. So this bug was hidden on the x86 platform,
> > > but
> > > now
> > > it is exposed on the Loogarch platform.
> > > 
> > > The root cause is an empty skb (skb->len == 0) is put on the
> > > queue.
> > > 
> > > In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero,
> > > and no
> > > page is put to this sge (see sg_set_page in sg_set_page), but
> > > this
> > > empty
> > > sge is queued into ingress_msg list.
> > > 
> > > And in sk_msg_recvmsg(), this empty sge is used, and a NULL page
> > > is
> > > got by sg_page(sge). Pass this NULL-page to copy_page_to_iter(),
> > > it
> > > passed to kmap_local_page() and page_address(), then kernel
> > > panics.
> > > 
> > > To solve this, we should prevent empty skb from putting on the
> > > queue. So
> > > in sk_psock_verdict_recv(), if the skb->len is zero, drop this
> > > skb.
> > > 
> > > Fixes: ef5659280eb1 ("bpf, sockmap: Allow skipping sk_skb parser
> > > program")
> > > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > > ---
> > >  net/core/skmsg.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > > index fd20aae30be2..44952cdd1425 100644
> > > --- a/net/core/skmsg.c
> > > +++ b/net/core/skmsg.c
> > > @@ -1184,7 +1184,7 @@ static int sk_psock_verdict_recv(struct
> > > sock
> > > *sk, struct sk_buff *skb)
> > >  
> > >  	rcu_read_lock();
> > >  	psock = sk_psock(sk);
> > > -	if (unlikely(!psock)) {
> > > +	if (unlikely(!psock || !len)) {
> > >  		len = 0;
> > >  		tcp_eat_skb(sk, skb);
> > >  		sock_drop(sk, skb);
> > 
> > The skb->len == 0 here is the FIN pkt right? We are using the
> > EFAULT
> > return
> > triggered by copy_page_to_iter to check for is_fin in tcp_bpf.c.

I added some logs for debugging and found that this FIN packet do hit
is_fin check in tcp_bpf.c.

> > 
> > The concern I have here is if we don't have the skb fin pkt on the
> > recv
> > queue we might go into wait_data and block instead of return to
> > user
> > when
> > rcvmsg() is called from user. I wonder if we can write a test for
> > this if
> > we don't already have one we probably should create one.

In test_sockmap_skb_verdict_shutdown(), the FIN packet is sent by

	shutdown(p1, SHUT_WR);

and received by

	n = recv(c1, &b, 1, SOCK_NONBLOCK);
	ASSERT_EQ(n, 0, "recv_timeout(fin)");

I think this test has covered the FIN packet scenario already. No need
to add a new one. WDYT?

> > 
> > Maybe a better fix assuming my assumption about fin being skb-
> > >len=0
> > is
> > correct?
> 
> Thanks John. Your fix is much better than mine. I'll use this as v5
> and
> update the commit log. I'll add your "Suggested-by" tag in it.

Anyway, this v5 (skmsg: skip zero length skb in sk_msg_recvmsg) seems
ready to be merged.

Thanks,
-Geliang

> 
> -Geliang
> 
> > 
> > diff --git a/net/core/skmsg.c b/net/core/skmsg.c
> > index fd20aae30be2..bbf40b999713 100644
> > --- a/net/core/skmsg.c
> > +++ b/net/core/skmsg.c
> > @@ -434,7 +434,8 @@ int sk_msg_recvmsg(struct sock *sk, struct
> > sk_psock *psock, struct msghdr *msg,
> >                         page = sg_page(sge);
> >                         if (copied + copy > len)
> >                                 copy = len - copied;
> > -                       copy = copy_page_to_iter(page, sge->offset,
> > copy, iter);
> > +                       if (copy)
> > +                               copy = copy_page_to_iter(page, sge-
> > > offset, copy, iter);
> >                         if (!copy) {
> >                                 copied = copied ? copied : -EFAULT;
> >                                 goto out;
> > 
> > Thanks,
> > John
> 
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH net v3 2/2] skmsg: bugfix for sk_msg sge iteration
  2024-07-01 10:29     ` Geliang Tang
@ 2024-07-16  6:53       ` Geliang Tang
  0 siblings, 0 replies; 11+ messages in thread
From: Geliang Tang @ 2024-07-16  6:53 UTC (permalink / raw)
  To: John Fastabend, Jakub Sitnicki
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, D. Wythe, Geliang Tang,
	David Ahern, Eduard Zingerman, Mykola Lysenko, Martin KaFai Lau,
	Song Liu, Yonghong Song, KP Singh, Stanislav Fomichev, Hao Luo,
	Jiri Olsa, Shuah Khan, Mykyta Yatsenko, Miao Xu, Yuran Pereira,
	Huacai Chen, Tiezhu Yang, netdev, bpf, linux-kselftest

Hi John,

On Mon, 2024-07-01 at 18:29 +0800, Geliang Tang wrote:
> > > > > > > > Hello,
> > > > > > > > 
> > > > > > > > On Mon, 2024-07-01 at 17:00 +0800, D. Wythe wrote:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > On 6/28/24 1:47 PM, Geliang Tang wrote:
> > > > > > > > > > > > > > > > > > > > > > > > From: Geliang Tang
> > > > > > > > > > > > > > > > > > > > > > > > <tanggeliang@kylinos.cn
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > Every time run this BPF
> > > > > > > > > > > > > > > > > > > > > > > > selftests
> > > > > > > > > > > > > > > > > > > > > > > > (./test_sockmap) on a
> > > > > > > > > > > > > > > > > > > > > > > > Loongarch
> > > > > > > > > > > > > > > > > > > > > > > > platform,

This issue seems related to "txmsg_cork".

Every subtest of test_sockmap with a txmsg_cork value greater than 1
will trigger this issue. For example, this test_txmsg_cork_hangs test:

static void test_txmsg_cork_hangs(int cgrp, struct sockmap_options
*opt)
{
        txmsg_pass = 1; 
        txmsg_redir = 0; 
        txmsg_cork = 4097;
        txmsg_apply = 4097;
        test_send_large(opt, cgrp);

        txmsg_pass = 0; 
        txmsg_redir = 1; 
        txmsg_apply = 0; 
        txmsg_cork = 4097;
        test_send_large(opt, cgrp);

        txmsg_pass = 0; 
        txmsg_redir = 1; 
        txmsg_apply = 4097;
        txmsg_cork = 4097;
        test_send_large(opt, cgrp);
}

These tests will break the sk_msg sge iteration in
sk_msg_memcopy_from_iter().

I added the following test code:

'''
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index bbf40b999713..e1fd40aa8586 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -374,6 +374,7 @@ int sk_msg_memcopy_from_iter(struct sock *sk,
struct iov_iter *from,
 	void *to;
 
 	do {
+		pr_info("%s i=%d end=%d\n", __func__, i, msg->sg.end);
 		sge = sk_msg_elem(msg, i);
 		/* This is possible if a trim operation shrunk the
buffer */
 		if (msg->sg.copybreak >= sge->length) {
--
'''

And got this logs when running "test hanging corks" test:

'''
# 1/ 1  sockmap::txmsg test hanging corks:OK
# 2/ 1 sockhash::txmsg test hanging corks:OK
# 3/ 1 sockhash:ktls:txmsg test hanging corks:OK

[   55.751687] sk_msg_memcopy_from_iter i=0 end=8
[   55.751712] sk_msg_memcopy_from_iter i=1 end=8
[   55.751726] sk_msg_memcopy_from_iter i=2 end=8
[   55.751769] sk_msg_memcopy_from_iter i=3 end=8
[   55.751778] sk_msg_memcopy_from_iter i=4 end=8
[   55.751787] sk_msg_memcopy_from_iter i=5 end=8
[   55.751796] sk_msg_memcopy_from_iter i=6 end=8
[   55.751805] sk_msg_memcopy_from_iter i=7 end=8
[   55.752979] sk_msg_memcopy_from_iter i=8 end=16
[   55.752988] sk_msg_memcopy_from_iter i=9 end=16
[   55.752995] sk_msg_memcopy_from_iter i=10 end=16
[   55.753002] sk_msg_memcopy_from_iter i=11 end=16
[   55.753008] sk_msg_memcopy_from_iter i=12 end=16
[   55.753015] sk_msg_memcopy_from_iter i=13 end=16
[   55.753022] sk_msg_memcopy_from_iter i=14 end=16
[   55.753028] sk_msg_memcopy_from_iter i=15 end=16
[   56.087047] sk_msg_memcopy_from_iter i=0 end=1
[   56.087075] sk_msg_memcopy_from_iter i=1 end=1
[   56.087081] sk_msg_memcopy_from_iter i=3 end=1
[   56.087086] sk_msg_memcopy_from_iter i=5 end=1
[   56.087091] sk_msg_memcopy_from_iter i=7 end=1
[   56.087095] sk_msg_memcopy_from_iter i=9 end=1
[   56.087100] sk_msg_memcopy_from_iter i=11 end=1
[   56.087105] sk_msg_memcopy_from_iter i=13 end=1
[   56.087110] sk_msg_memcopy_from_iter i=15 end=1
[   56.087115] sk_msg_memcopy_from_iter i=17 end=1
'''

When "i" is greater than "end", the sge we get is empty, sge->length is
0. If we access this sge, this issue will occur. Kernel panics on some
machines.

To fix this, we can't "break" the loop like I did in this patch itself.
It will break test_sockmap with the following error logs:

# 1/ 1  sockmap::txmsg test hanging corks:OK
sendpage loop error: No space left on device
msg_loop_tx: iov_count 1024 iov_buf 256 cnt 2 err -1
tx thread exited with err 1.
# 2/ 1 sockhash::txmsg test hanging corks:FAIL
# 3/ 1 sockhash:ktls:txmsg test hanging corks:OK
Pass: 2 Fail: 1

We must "continue" the loop to get the next valid sge.

A better fix is here:

'''
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index bbf40b999713..bbaf909d0f9c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -376,13 +376,8 @@ int sk_msg_memcopy_from_iter(struct sock *sk,
struct iov_iter *from,
 	do {
 		sge = sk_msg_elem(msg, i);
 		/* This is possible if a trim operation shrunk the
buffer */
-		if (msg->sg.copybreak >= sge->length) {
-			msg->sg.copybreak = 0;
-			sk_msg_iter_var_next(i);
-			if (i == msg->sg.end)
-				break;
-			sge = sk_msg_elem(msg, i);
-		}
+		if (msg->sg.copybreak >= sge->length)
+			goto next;
 
 		buf_size = sge->length - msg->sg.copybreak;
 		copy = (buf_size > bytes) ? bytes : buf_size;
@@ -399,6 +394,7 @@ int sk_msg_memcopy_from_iter(struct sock *sk,
struct iov_iter *from,
 		bytes -= copy;
 		if (!bytes)
 			break;
+next:
 		msg->sg.copybreak = 0;
 		sk_msg_iter_var_next(i);
 	} while (i != msg->sg.end);
-- 
'''

WDYT? I want to hear your opinion.

Thanks,
-Geliang

> > > > > > > > > > > > > > > > > > > > > > > > a Kernel panic occurs:
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > '''
> > > > > > > > > > > > > > > > > > > > > > > >   Oops[#1]:
> > > > > > > > > > > > > > > > > > > > > > > >   CPU: 20 PID: 23245
> > > > > > > > > > > > > > > > > > > > > > > > Comm: test_sockmap
> > > > > > > > > > > > > > > > > > > > > > > > Tainted: G     OE
> > > > > > > > > > > > > > > > > > > > > > > > 6.10.0-
> > > > > > > > > > > > > > > > > > > > > > > > rc2+ #32
> > > > > > > > > > > > > > > > > > > > > > > >   Hardware name:
> > > > > > > > > > > > > > > > > > > > > > > > LOONGSON
> > > > > > > > > > > > > > > > > > > > > > > > Dabieshan/Loongson-
> > > > > > > > > > > > > > > > > > > > > > > > TC542F0, BIOS
> > > > > > > > > > > > > > > > > > > > > > > > Loongson-UDK2018
> > > > > > > > > > > > > > > > > > > > > > > >   ... ...
> > > > > > > > > > > > > > > > > > > > > > > >      ra:
> > > > > > > > > > > > > > > > > > > > > > > > 90000000043a315c
> > > > > > > > > > > > > > > > > > > > > > > > tcp_bpf_sendmsg+0x23c/0
> > > > > > > > > > > > > > > > > > > > > > > > x420
> > > > > > > > > > > > > > > > > > > > > > > >     ERA:
> > > > > > > > > > > > > > > > > > > > > > > > 900000000426cd1c
> > > > > > > > > > > > > > > > > > > > > > > > sk_msg_memcopy_from_ite
> > > > > > > > > > > > > > > > > > > > > > > > r+0xbc/0x220
> > > > > > > > > > > > > > > > > > > > > > > >    CRMD: 000000b0 (PLV0
> > > > > > > > > > > > > > > > > > > > > > > > -IE -DA +PG DACF=CC
> > > > > > > > > > > > > > > > > > > > > > > > DACM=CC -WE)
> > > > > > > > > > > > > > > > > > > > > > > >    PRMD: 0000000c
> > > > > > > > > > > > > > > > > > > > > > > > (PPLV0 +PIE +PWE)
> > > > > > > > > > > > > > > > > > > > > > > >    EUEN: 00000007 (+FPE
> > > > > > > > > > > > > > > > > > > > > > > > +SXE +ASXE -BTE)
> > > > > > > > > > > > > > > > > > > > > > > >    ECFG: 00071c1d
> > > > > > > > > > > > > > > > > > > > > > > > (LIE=0,2-4,10-12 VS=7)
> > > > > > > > > > > > > > > > > > > > > > > >   ESTAT: 00010000 [PIL]
> > > > > > > > > > > > > > > > > > > > > > > > (IS= ECode=1
> > > > > > > > > > > > > > > > > > > > > > > > EsubCode=0)
> > > > > > > > > > > > > > > > > > > > > > > >    BADV:
> > > > > > > > > > > > > > > > > > > > > > > > 0000000000000040
> > > > > > > > > > > > > > > > > > > > > > > >    PRID: 0014c011
> > > > > > > > > > > > > > > > > > > > > > > > (Loongson-64bit,
> > > > > > > > > > > > > > > > > > > > > > > > Loongson-3C5000)
> > > > > > > > > > > > > > > > > > > > > > > >   Modules linked in:
> > > > > > > > > > > > > > > > > > > > > > > > tls xt_CHECKSUM
> > > > > > > > > > > > > > > > > > > > > > > > xt_MASQUERADE
> > > > > > > > > > > > > > > > > > > > > > > > xt_conntrack
> > > > > > > > > > > > > > > > > > > > > > > > ipt_REJECT
> > > > > > > > > > > > > > > > > > > > > > > >   Process test_sockmap
> > > > > > > > > > > > > > > > > > > > > > > > (pid: 23245,
> > > > > > > > > > > > > > > > > > > > > > > > threadinfo=00000000aeb6
> > > > > > > > > > > > > > > > > > > > > > > > 8043,
> > > > > > > > > > > > > > > > > > > > > > > > task=...)
> > > > > > > > > > > > > > > > > > > > > > > >   Stack : ... ...
> > > > > > > > > > > > > > > > > > > > > > > >           ...
> > > > > > > > > > > > > > > > > > > > > > > >   Call Trace:
> > > > > > > > > > > > > > > > > > > > > > > >   [<900000000426cd1c>]
> > > > > > > > > > > > > > > > > > > > > > > > sk_msg_memcopy_from_ite
> > > > > > > > > > > > > > > > > > > > > > > > r+0xbc/0x220
> > > > > > > > > > > > > > > > > > > > > > > >   [<90000000043a315c>]
> > > > > > > > > > > > > > > > > > > > > > > > tcp_bpf_sendmsg+0x23c/0
> > > > > > > > > > > > > > > > > > > > > > > > x420
> > > > > > > > > > > > > > > > > > > > > > > >   [<90000000041cafc8>]
> > > > > > > > > > > > > > > > > > > > > > > > __sock_sendmsg+0x68/0xe
> > > > > > > > > > > > > > > > > > > > > > > > 0
> > > > > > > > > > > > > > > > > > > > > > > >   [<90000000041cc4bc>]
> > > > > > > > > > > > > > > > > > > > > > > > ____sys_sendmsg+0x2bc/0
> > > > > > > > > > > > > > > > > > > > > > > > x360
> > > > > > > > > > > > > > > > > > > > > > > >   [<90000000041cea18>]
> > > > > > > > > > > > > > > > > > > > > > > > ___sys_sendmsg+0xb8/0x1
> > > > > > > > > > > > > > > > > > > > > > > > 20
> > > > > > > > > > > > > > > > > > > > > > > >   [<90000000041cf1f8>]
> > > > > > > > > > > > > > > > > > > > > > > > __sys_sendmsg+0x98/0x10
> > > > > > > > > > > > > > > > > > > > > > > > 0
> > > > > > > > > > > > > > > > > > > > > > > >   [<90000000045b76ec>]
> > > > > > > > > > > > > > > > > > > > > > > > do_syscall+0x8c/0xc0
> > > > > > > > > > > > > > > > > > > > > > > >   [<90000000030e1da4>]
> > > > > > > > > > > > > > > > > > > > > > > > handle_syscall+0xc4/0x1
> > > > > > > > > > > > > > > > > > > > > > > > 60
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > >   Code: ...
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > >   ---[ end trace
> > > > > > > > > > > > > > > > > > > > > > > > 0000000000000000 ]---
> > > > > > > > > > > > > > > > > > > > > > > > '''
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > This crash is because a
> > > > > > > > > > > > > > > > > > > > > > > > NULL pointer is passed
> > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > page_address()
> > > > > > > > > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > > > > > > > > > > > > > sk_msg_memcopy_from_ite
> > > > > > > > > > > > > > > > > > > > > > > > r(). Due to the
> > > > > > > > > > > > > > > > > > > > > > > > difference in
> > > > > > > > > > > > > > > > > > > > > > > > architecture,
> > > > > > > > > > > > > > > > > > > > > > > > page_address(0) will
> > > > > > > > > > > > > > > > > > > > > > > > not trigger a panic on
> > > > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > X86
> > > > > > > > > > > > > > > > > > > > > > > > platform but
> > > > > > > > > > > > > > > > > > > > > > > > will panic
> > > > > > > > > > > > > > > > > > > > > > > > on the Loogarch
> > > > > > > > > > > > > > > > > > > > > > > > platform. So this bug
> > > > > > > > > > > > > > > > > > > > > > > > was
> > > > > > > > > > > > > > > > > > > > > > > > hidden on the x86
> > > > > > > > > > > > > > > > > > > > > > > > platform, but
> > > > > > > > > > > > > > > > > > > > > > > > now it is exposed on
> > > > > > > > > > > > > > > > > > > > > > > > the Loogarch platform.
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > This bug is a logic
> > > > > > > > > > > > > > > > > > > > > > > > error indeed. In
> > > > > > > > > > > > > > > > > > > > > > > > sk_msg_memcopy_from_ite
> > > > > > > > > > > > > > > > > > > > > > > > r(),
> > > > > > > > > > > > > > > > > > > > > > > > an
> > > > > > > > > > > > > > > > > > > > > > > > invalid
> > > > > > > > > > > > > > > > > > > > > > > > "sge" is always used:
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > >  if (msg->sg.copybreak
> > > > > > > > > > > > > > > > > > > > > > > > >= sge->length) {
> > > > > > > > > > > > > > > > > > > > > > > >  msg->sg.copybreak = 0;
> > > > > > > > > > > > > > > > > > > > > > > >  sk_msg_iter_var_next(i
> > > > > > > > > > > > > > > > > > > > > > > > );
> > > > > > > > > > > > > > > > > > > > > > > >  if (i == msg->sg.end)
> > > > > > > > > > > > > > > > > > > > > > > >  break;
> > > > > > > > > > > > > > > > > > > > > > > >  sge = sk_msg_elem(msg,
> > > > > > > > > > > > > > > > > > > > > > > > i);
> > > > > > > > > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > If the value of i is 2,
> > > > > > > > > > > > > > > > > > > > > > > > msg->sg.end is also 2
> > > > > > > > > > > > > > > > > > > > > > > > when entering
> > > > > > > > > > > > > > > > > > > > > > > > this
> > > > > > > > > > > > > > > > > > > > > > > > if
> > > > > > > > > > > > > > > > > > > > > > > > block.
> > > > > > > > > > > > > > > > > > > > > > > > sk_msg_iter_var_next()
> > > > > > > > > > > > > > > > > > > > > > > > increases i by 1, and
> > > > > > > > > > > > > > > > > > > > > > > > now i is 3,
> > > > > > > > > > > > > > > > > > > > > > > > which is
> > > > > > > > > > > > > > > > > > > > > > > > no longer
> > > > > > > > > > > > > > > > > > > > > > > > equal to msg->sg.end.
> > > > > > > > > > > > > > > > > > > > > > > > The break will not be
> > > > > > > > > > > > > > > > > > > > > > > > triggered, and
> > > > > > > > > > > > > > > > > > > > > > > > the
> > > > > > > > > > > > > > > > > > > > > > > > next
> > > > > > > > > > > > > > > > > > > > > > > > sge
> > > > > > > > > > > > > > > > > > > > > > > > obtained by
> > > > > > > > > > > > > > > > > > > > > > > > sk_msg_elem(3) will be
> > > > > > > > > > > > > > > > > > > > > > > > an invalid
> > > > > > > > > > > > > > > > > > > > > > > > one.
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > The correct approach is
> > > > > > > > > > > > > > > > > > > > > > > > to check (i ==
> > > > > > > > > > > > > > > > > > > > > > > > msg->sg.end) first,
> > > > > > > > > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > > > > > > > > then
> > > > > > > > > > > > > > > > > > > > > > > > invoke
> > > > > > > > > > > > > > > > > > > > > > > > sk_msg_iter_var_next()
> > > > > > > > > > > > > > > > > > > > > > > > if they are not equal.
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > Fixes: 604326b41a6f
> > > > > > > > > > > > > > > > > > > > > > > > ("bpf, sockmap: convert
> > > > > > > > > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > > > > > > > > > generic
> > > > > > > > > > > > > > > > > > > > > > > > sk_msg
> > > > > > > > > > > > > > > > > > > > > > > > interface")
> > > > > > > > > > > > > > > > > > > > > > > > Signed-off-by: Geliang
> > > > > > > > > > > > > > > > > > > > > > > > Tang
> > > > > > > > > > > > > > > > > > > > > > > > <
> > > > > > > > > > > > > > > > > > > > > > > > tanggeliang@kylinos.cn>
> > > > > > > > > > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > > > > > > > > > >   net/core/skmsg.c | 2
> > > > > > > > > > > > > > > > > > > > > > > > +-
> > > > > > > > > > > > > > > > > > > > > > > >   1 file changed, 1
> > > > > > > > > > > > > > > > > > > > > > > > insertion(+), 1
> > > > > > > > > > > > > > > > > > > > > > > > deletion(-)
> > > > > > > > > > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > > diff --git
> > > > > > > > > > > > > > > > > > > > > > > > a/net/core/skmsg.c
> > > > > > > > > > > > > > > > > > > > > > > > b/net/core/skmsg.c
> > > > > > > > > > > > > > > > > > > > > > > > index
> > > > > > > > > > > > > > > > > > > > > > > > 44952cdd1425..1906d0d0e
> > > > > > > > > > > > > > > > > > > > > > > > eac 100644
> > > > > > > > > > > > > > > > > > > > > > > > --- a/net/core/skmsg.c
> > > > > > > > > > > > > > > > > > > > > > > > +++ b/net/core/skmsg.c
> > > > > > > > > > > > > > > > > > > > > > > > @@ -378,9 +378,9 @@ int
> > > > > > > > > > > > > > > > > > > > > > > > sk_msg_memcopy_from_ite
> > > > > > > > > > > > > > > > > > > > > > > > r(struct
> > > > > > > > > > > > > > > > > > > > > > > > sock *sk,
> > > > > > > > > > > > > > > > > > > > > > > > struct iov_iter *from,
> > > > > > > > > > > > > > > > > > > > > > > >    /* This is possible
> > > > > > > > > > > > > > > > > > > > > > > > if a trim operation
> > > > > > > > > > > > > > > > > > > > > > > > shrunk the
> > > > > > > > > > > > > > > > > > > > > > > > buffer */
> > > > > > > > > > > > > > > > > > > > > > > >    if (msg-
> > > > > > > > > > > > > > > > > > > > > > > > >sg.copybreak >= sge-
> > > > > > > > > > > > > > > > > > > > > > > > >length) {
> > > > > > > > > > > > > > > > > > > > > > > >    msg->sg.copybreak =
> > > > > > > > > > > > > > > > > > > > > > > > 0;
> > > > > > > > > > > > > > > > > > > > > > > > -
> > > > > > > > > > > > > > > > > > > > > > > > sk_msg_iter_var_next(i)
> > > > > > > > > > > > > > > > > > > > > > > > ;
> > > > > > > > > > > > > > > > > > > > > > > >    if (i == msg-
> > > > > > > > > > > > > > > > > > > > > > > > >sg.end)
> > > > > > > > > > > > > > > > > > > > > > > >    break;
> > > > > > > > > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > > > > > > > > sk_msg_iter_var_next(i)
> > > > > > > > > > > > > > > > > > > > > > > > ;
> > > > > > > > > > > > > > > > Reviewed-by: D. Wythe
> > > > > > > > > > > > > > > > <alibuda@linux.alibaba.com>
> > > > > > > > 
> > > > > > > > Thanks for your review.
> > > > > > > > 
> > > > > > > > But this change breaks test_sockmap. Will send a v4 to
> > > > > > > > fix
> > > > > > > > this.
> > > > > > > > 
> > > > > > > > Changes Requested.
> > > > > > > > 
> > > > > > > > -Geliang
> > > > > > > > 
> > > > > > > > > > > > > > > > > > > > > > > >    sge =
> > > > > > > > > > > > > > > > > > > > > > > > sk_msg_elem(msg, i);
> > > > > > > > > > > > > > > > > > > > > > > >    }
> > > > > > > > > > > > > > > > > > > > > > > >   
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > 
> > > > > > > > 
> > > > > > > > 


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-07-16  6:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-28  5:47 [PATCH net v3 0/2] bugfixes for skmsg Geliang Tang
2024-06-28  5:47 ` [PATCH net v3 1/2] skmsg: prevent empty ingress skb from enqueuing Geliang Tang
2024-07-01  9:08   ` D. Wythe
2024-07-02  7:02   ` Geliang Tang
2024-07-03  1:03   ` John Fastabend
2024-07-03  1:54     ` Geliang Tang
2024-07-05  4:25       ` Geliang Tang
2024-06-28  5:47 ` [PATCH net v3 2/2] skmsg: bugfix for sk_msg sge iteration Geliang Tang
2024-07-01  9:00   ` D. Wythe
2024-07-01 10:29     ` Geliang Tang
2024-07-16  6:53       ` Geliang Tang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).