* [RFC,net-next] tcp: add support for read with offset when using MSG_PEEK
@ 2024-01-15 21:51 Martin Zaharinov
2024-01-15 22:41 ` Jon Maloy
From: Martin Zaharinov @ 2024-01-15 21:51 UTC
To: jmaloy; +Cc: netdev
Hi Jon
After applying the patch to kernel 6.7.0, the system crashes with this bug report:
Jan 15 22:27:39 6.7.0,1,863,194879739,-,caller=T3523;BUG: unable to handle page fault for address: 00007fff333174e0
Jan 15 22:27:39 6.7.0,1,864,194879876,-,caller=T3523;#PF: supervisor read access in kernel mode
Jan 15 22:27:39 6.7.0,1,865,194879976,-,caller=T3523;#PF: error_code(0x0001) - permissions violation
Jan 15 22:27:39 6.7.0,6,866,194880075,-,caller=T3523;PGD 107cbd067 P4D 107cbd067 PUD 22055d067 PMD 10a384067 PTE 8000000228b00067
Jan 15 22:27:39 6.7.0,4,867,194880202,-,caller=T3523;Oops: 0001 [#1] SMP
Jan 15 22:27:39 6.7.0,4,868,194880297,-,caller=T3523;CPU: 12 PID: 3523 Comm: server-nft Tainted: G O 6.7.0 #1
Jan 15 22:27:39 6.7.0,4,869,194880420,-,caller=T3523;Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EP2C612D8, BIOS P2.30 04/30/2018
Jan 15 22:27:39 6.7.0,4,870,194880547,-,caller=T3523;RIP: 0010:tcp_recvmsg_locked+0x498/0xea0
Jan 15 22:27:39 6.7.0,4,871,194880709,-,caller=T3523;Code: a3 07 00 00 80 fa 02 0f 84 88 07 00 00 84 d2 0f 84 f1 04 00 00 41 8b 8c 24 d8 05 00 00 49 8b 53 20 4c 8d 7c 24 44 89 4c 24 44 <48> 83 3a 00 0f 85 e5 fb ff ff 49 8b 73 30 48 83 fe 01 0f 86 c4 04
Jan 15 22:27:39 6.7.0,4,872,194880876,-,caller=T3523;RSP: 0018:ffffa47b01307d00 EFLAGS: 00010202
Jan 15 22:27:39 6.7.0,4,873,194880975,-,caller=T3523;RAX: 0000000000000002 RBX: ffff8cf8c3209800 RCX: 00000000a87ac03c
Jan 15 22:27:39 6.7.0,4,874,194881096,-,caller=T3523;RDX: 00007fff333174e0 RSI: ffffa47b01307e18 RDI: ffff8cf8c3209800
Jan 15 22:27:39 6.7.0,4,875,194881217,-,caller=T3523;RBP: ffffa47b01307d78 R08: ffffa47b01307d90 R09: ffffa47b01307d8c
Jan 15 22:27:39 6.7.0,4,876,194881338,-,caller=T3523;R10: 0000000000000002 R11: ffffa47b01307e18 R12: ffff8cf8c3209800
Jan 15 22:27:39 6.7.0,4,877,194881458,-,caller=T3523;R13: 0000000000000000 R14: 0000000000000000 R15: ffffa47b01307d44
Jan 15 22:27:39 6.7.0,4,878,194881579,-,caller=T3523;FS: 00007f4941b0ad80(0000) GS:ffff8d001f900000(0000) knlGS:0000000000000000
Jan 15 22:27:39 6.7.0,4,879,194881703,-,caller=T3523;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 15 22:27:39 6.7.0,4,880,194881802,-,caller=T3523;CR2: 00007fff333174e0 CR3: 000000010df04002 CR4: 00000000003706f0
Jan 15 22:27:39 6.7.0,4,881,194881922,-,caller=T3523;DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 15 22:27:39 6.7.0,4,882,194882043,-,caller=T3523;DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 15 22:27:39 6.7.0,4,883,194882164,-,caller=T3523;Call Trace:
Jan 15 22:27:39 6.7.0,4,884,194882257,-,caller=T3523; <TASK>
Jan 15 22:27:39 6.7.0,4,885,194882347,-,caller=T3523; ? __die+0xe4/0xf0
Jan 15 22:27:39 6.7.0,4,886,194882442,-,caller=T3523; ? page_fault_oops+0x144/0x3e0
Jan 15 22:27:39 6.7.0,4,887,194882539,-,caller=T3523; ? zap_pte_range+0x6a4/0xdc0
Jan 15 22:27:39 6.7.0,4,888,194882638,-,caller=T3523; ? exc_page_fault+0x5d/0xa0
Jan 15 22:27:39 6.7.0,4,889,194882736,-,caller=T3523; ? asm_exc_page_fault+0x22/0x30
Jan 15 22:27:39 6.7.0,4,890,194882834,-,caller=T3523; ? tcp_recvmsg_locked+0x498/0xea0
Jan 15 22:27:39 6.7.0,4,891,194882931,-,caller=T3523; ? __call_rcu_common.constprop.0+0xbc/0x770
Jan 15 22:27:39 6.7.0,4,892,194883031,-,caller=T3523; ? rcu_nocb_flush_bypass.part.0+0xec/0x120
Jan 15 22:27:39 6.7.0,4,893,194883133,-,caller=T3523; tcp_recvmsg+0x5c/0x1e0
Jan 15 22:27:39 6.7.0,4,894,194883228,-,caller=T3523; inet_recvmsg+0x2a/0x90
Jan 15 22:27:39 6.7.0,4,895,194883325,-,caller=T3523; __sys_recvfrom+0x15e/0x200
Jan 15 22:27:39 6.7.0,4,896,194883423,-,caller=T3523; ? wait_task_zombie+0xee/0x410
Jan 15 22:27:39 6.7.0,4,897,194883539,-,caller=T3523; ? remove_wait_queue+0x1b/0x60
Jan 15 22:27:39 6.7.0,4,898,194883635,-,caller=T3523; ? do_wait+0x93/0xa0
Jan 15 22:27:39 6.7.0,4,899,194883729,-,caller=T3523; ? __x64_sys_poll+0xa7/0x170
Jan 15 22:27:39 6.7.0,4,900,194883825,-,caller=T3523; __x64_sys_recvfrom+0x1b/0x20
Jan 15 22:27:39 6.7.0,4,901,194883921,-,caller=T3523; do_syscall_64+0x2c/0xa0
Jan 15 22:27:39 6.7.0,4,902,194884018,-,caller=T3523; entry_SYSCALL_64_after_hwframe+0x46/0x4e
Jan 15 22:27:39 6.7.0,4,903,194884116,-,caller=T3523;RIP: 0033:0x7f4941fe92a9
Jan 15 22:27:39 6.7.0,4,904,194884210,-,caller=T3523;Code: 0c 00 64 c7 02 02 00 00 00 eb bf 66 0f 1f 44 00 00 80 3d a9 e0 0c 00 00 41 89 ca 74 1c 45 31 c9 45 31 c0 b8 2d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 67 c3 66 0f 1f 44 00 00 55 48 83 ec 20 48 89
Jan 15 22:27:39 6.7.0,4,905,194884377,-,caller=T3523;RSP: 002b:00007fff33317468 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
Jan 15 22:27:39 6.7.0,4,906,194884499,-,caller=T3523;RAX: ffffffffffffffda RBX: 00007fff333174e0 RCX: 00007f4941fe92a9
Jan 15 22:27:39 6.7.0,4,907,194884620,-,caller=T3523;RDX: 0000000000000001 RSI: 00007fff333174e0 RDI: 0000000000000005
Jan 15 22:27:39 6.7.0,4,908,194884740,-,caller=T3523;RBP: 00007fff33317550 R08: 0000000000000000 R09: 0000000000000000
Jan 15 22:27:39 6.7.0,4,909,194884860,-,caller=T3523;R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000000
Jan 15 22:27:39 6.7.0,4,910,194884980,-,caller=T3523;R13: 0000000000000000 R14: 0000000000000000 R15: 00007f49418850a0
Jan 15 22:27:39 6.7.0,4,911,194885101,-,caller=T3523; </TASK>
Jan 15 22:27:39 6.7.0,4,912,194885191,-,caller=T3523;Modules linked in: nft_limit pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding igb i2c_algo_bit i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos aesni_intel crypto_simd cryptd
Jan 15 22:27:39 6.7.0,4,913,194885507,-,caller=T3523;CR2: 00007fff333174e0
Jan 15 22:27:39 6.7.0,4,914,194885602,-,caller=T3523;---[ end trace 0000000000000000 ]---
Jan 15 22:27:39 6.7.0,4,915,194885698,-,caller=T3523;RIP: 0010:tcp_recvmsg_locked+0x498/0xea0
Jan 15 22:27:39 6.7.0,4,916,194885797,-,caller=T3523;Code: a3 07 00 00 80 fa 02 0f 84 88 07 00 00 84 d2 0f 84 f1 04 00 00 41 8b 8c 24 d8 05 00 00 49 8b 53 20 4c 8d 7c 24 44 89 4c 24 44 <48> 83 3a 00 0f 85 e5 fb ff ff 49 8b 73 30 48 83 fe 01 0f 86 c4 04
Jan 15 22:27:39 6.7.0,4,917,194887079,-,caller=T3523;RSP: 0018:ffffa47b01307d00 EFLAGS: 00010202
Jan 15 22:27:39 6.7.0,4,918,194887177,-,caller=T3523;RAX: 0000000000000002 RBX: ffff8cf8c3209800 RCX: 00000000a87ac03c
Jan 15 22:27:39 6.7.0,4,919,194887298,-,caller=T3523;RDX: 00007fff333174e0 RSI: ffffa47b01307e18 RDI: ffff8cf8c3209800
Jan 15 22:27:39 6.7.0,4,920,194887418,-,caller=T3523;RBP: ffffa47b01307d78 R08: ffffa47b01307d90 R09: ffffa47b01307d8c
Jan 15 22:27:39 6.7.0,4,921,194887538,-,caller=T3523;R10: 0000000000000002 R11: ffffa47b01307e18 R12: ffff8cf8c3209800
Jan 15 22:27:39 6.7.0,4,922,194887658,-,caller=T3523;R13: 0000000000000000 R14: 0000000000000000 R15: ffffa47b01307d44
Jan 15 22:27:39 6.7.0,4,923,194887779,-,caller=T3523;FS: 00007f4941b0ad80(0000) GS:ffff8d001f900000(0000) knlGS:0000000000000000
Jan 15 22:27:39 6.7.0,4,924,194887901,-,caller=T3523;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 15 22:27:39 6.7.0,4,925,194888000,-,caller=T3523;CR2: 00007fff333174e0 CR3: 000000010df04002 CR4: 00000000003706f0
Jan 15 22:27:39 6.7.0,4,926,194888120,-,caller=T3523;DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 15 22:27:39 6.7.0,4,927,194888240,-,caller=T3523;DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 15 22:27:39 6.7.0,0,928,194888360,-,caller=T3523;Kernel panic - not syncing: Fatal exception
Jan 15 22:27:40 6.7.0,0,929,195391096,-,caller=T3523;Kernel Offset: 0x1f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Jan 15 22:27:40 6.7.0,0,930,195391224,-,caller=T3523;Rebooting in 10 seconds..
m.
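For reference when reading the oops above: the `#PF: error_code(0x0001)` line, `CR4` and the `PTE` value together describe the fault. Bit 0 of the error code set means the page was present (hence "permissions violation"), and bits 1 and 2 clear mean a read from supervisor mode. `CR4` bit 21 (SMAP) is set, the PTE has its user bit (bit 2) set, and `CR2`, the kernel-side `RDX` and the user-space `RSI` all hold `0x7fff333174e0`. This suggests, but does not prove, that `tcp_recvmsg_locked()` read the caller's receive buffer directly (the faulting bytes `48 83 3a 00` decode to `cmpq $0x0,(%rdx)`) rather than through the user-copy helpers. The sketch below only encodes those bit checks; the helper names are made up for illustration, and the user/kernel address split assumes 4-level paging.

```python
# x86 page-fault error code bits (names as in arch/x86/include/asm/trap_pf.h).
X86_PF_PROT = 1 << 0   # 1 = protection violation on a present page, 0 = not present
X86_PF_WRITE = 1 << 1  # 1 = write access, 0 = read access
X86_PF_USER = 1 << 2   # 1 = access from user mode, 0 = supervisor (kernel) access
X86_PF_RSVD = 1 << 3   # reserved bit set in a paging-structure entry
X86_PF_INSTR = 1 << 4  # fault was an instruction fetch

X86_CR4_SMEP = 1 << 20  # kernel may not execute user pages
X86_CR4_SMAP = 1 << 21  # kernel may not access user pages outside stac/clac

PTE_PRESENT = 1 << 0
PTE_USER = 1 << 2

# Assumption: 4-level paging, where user addresses lie below this boundary.
USER_ADDR_LIMIT_4LVL = 0x0000_8000_0000_0000


def decode_pf_error(code: int) -> dict:
    """Split a '#PF: error_code(0x....)' value into its flag bits."""
    return {
        "present": bool(code & X86_PF_PROT),
        "write": bool(code & X86_PF_WRITE),
        "user_mode": bool(code & X86_PF_USER),
        "reserved_bit": bool(code & X86_PF_RSVD),
        "instr_fetch": bool(code & X86_PF_INSTR),
    }


def looks_like_smap_fault(error_code: int, cr4: int, pte: int, addr: int) -> bool:
    """True if a kernel-mode data access hit a present user page with SMAP enabled."""
    pf = decode_pf_error(error_code)
    return (
        pf["present"]
        and not pf["user_mode"]
        and not pf["instr_fetch"]
        and bool(cr4 & X86_CR4_SMAP)
        and bool(pte & PTE_PRESENT)
        and bool(pte & PTE_USER)
        and addr < USER_ADDR_LIMIT_4LVL
    )


# Values copied from the oops (error_code, CR4, PTE, CR2).
OOPS = dict(error_code=0x0001, cr4=0x00000000003706F0,
            pte=0x8000000228B00067, addr=0x00007FFF333174E0)

if __name__ == "__main__":
    print(decode_pf_error(OOPS["error_code"]))
    print(looks_like_smap_fault(**OOPS))
```

Clearing the SMAP bit in `cr4`, or setting `X86_PF_USER` in the error code, makes the check return `False`, which is a quick way to see which field drives the verdict.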
* Re: [RFC,net-next] tcp: add support for read with offset when using MSG_PEEK
2024-01-15 21:51 [RFC,net-next] tcp: add support for read with offset when using MSG_PEEK Martin Zaharinov
@ 2024-01-15 22:41 ` Jon Maloy
2024-01-16 4:59 ` Martin Zaharinov
From: Jon Maloy @ 2024-01-15 22:41 UTC
To: Martin Zaharinov; +Cc: netdev
On 2024-01-15 16:51, Martin Zaharinov wrote:
> Hi Jon
>
> After applying the patch to kernel 6.7.0, the system crashes with this bug report:
Hmm,
I have been running this for weeks without any problems, on x86_64 with
current net and net-next.
There must be some difference between our kernels.
Which configuration are you using?
It would also be interesting to see your test program.
Regards
///jon
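On the test program: the user-space half of the oops already shows which call was in flight. On x86_64, `ORIG_RAX` holds the syscall number (0x2d = 45 = `recvfrom`) and the arguments travel in `RDI`, `RSI`, `RDX`, `R10`, `R8`, `R9`. The user `Code:` bytes `45 31 c9 45 31 c0 b8 2d 00 00 00 0f 05` clear `r9`/`r8` and issue syscall 45, which looks like a libc `recv()` wrapper. A small decoder (a hypothetical helper; the Linux flag values come from Python's `socket` module) can render the register dump as a call:

```python
import socket

# Syscall numbers from the x86_64 table (arch/x86/entry/syscalls/syscall_64.tbl).
X86_64_SYSCALLS = {44: "sendto", 45: "recvfrom", 46: "sendmsg", 47: "recvmsg"}

# recv() flags of interest; Python's socket module exposes the Linux values.
RECV_FLAGS = {
    "MSG_OOB": socket.MSG_OOB,
    "MSG_PEEK": socket.MSG_PEEK,
    "MSG_TRUNC": socket.MSG_TRUNC,
    "MSG_DONTWAIT": socket.MSG_DONTWAIT,
    "MSG_WAITALL": socket.MSG_WAITALL,
}


def fmt_ptr(value: int) -> str:
    """Print a pointer argument the way strace would."""
    return "NULL" if value == 0 else f"{value:#x}"


def decode_recv_syscall(orig_rax: int, rdi: int, rsi: int, rdx: int,
                        r10: int, r8: int, r9: int) -> str:
    """Render a recvfrom-style syscall from an oops' user-space register dump.

    x86_64 syscall ABI: number in rax (ORIG_RAX in the oops), arguments in
    rdi, rsi, rdx, r10, r8, r9 (r10 stands in for rcx, which `syscall` clobbers).
    """
    name = X86_64_SYSCALLS.get(orig_rax, f"syscall_{orig_rax}")
    flags = "|".join(n for n, v in RECV_FLAGS.items() if r10 & v) or "0"
    return (f"{name}(fd={rdi}, buf={fmt_ptr(rsi)}, len={rdx}, flags={flags}, "
            f"src_addr={fmt_ptr(r8)}, addrlen={fmt_ptr(r9)})")


if __name__ == "__main__":
    # User-space registers from the oops: ORIG_RAX, RDI, RSI, RDX, R10, R8, R9.
    print(decode_recv_syscall(0x2D, 0x5, 0x00007FFF333174E0, 0x1, 0x2, 0x0, 0x0))
```

For this oops it prints `recvfrom(fd=5, buf=0x7fff333174e0, len=1, flags=MSG_PEEK, src_addr=NULL, addrlen=NULL)`, i.e. `server-nft` was peeking a single byte.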
>
> [... oops log snipped ...]
>
>
>
> m.
>
* Re: [RFC,net-next] tcp: add support for read with offset when using MSG_PEEK
2024-01-15 22:41 ` Jon Maloy
@ 2024-01-16 4:59 ` Martin Zaharinov
2024-01-17 16:33 ` Jon Maloy
From: Martin Zaharinov @ 2024-01-16 4:59 UTC
To: Jon Maloy; +Cc: netdev
Hi Jon,
Yes, same here: in our test lab, with a single test user, everything is fine.
But when we install the kernel on a production server with 500 users (PPP) and 400-500 Mbit/s of traffic, the machine crashes with this bug log.
It runs as an ISP router: firewall, traffic shapers, …
m.
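Since the single-user lab setup runs fine, a minimal check of the same call pattern is only a baseline. This sketch (hypothetical `peek_then_read`, Python `socket` over loopback TCP) peeks one byte with `recv(..., MSG_PEEK)`, the call shape in the oops, peeks again through the `recvmsg()` syscall, then drains the data. It exercises plain `MSG_PEEK` only, not the new offset support, since the RFC's user-space interface is not shown in this excerpt.

```python
import socket


def peek_then_read(payload: bytes = b"hello", timeout: float = 5.0) -> tuple:
    """Hypothetical baseline: 1-byte MSG_PEEK via recv() and recvmsg() on loopback TCP.

    Returns (peeked_with_recv, peeked_with_recvmsg, data_read_afterwards).
    """
    with socket.create_server(("127.0.0.1", 0)) as srv:
        port = srv.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as cli:
            conn, _ = srv.accept()
            with conn:
                conn.settimeout(timeout)  # fail instead of hanging if no data shows up
                cli.sendall(payload)
                # recv(fd, buf, 1, MSG_PEEK): the call shape seen in the oops.
                peek_recv = conn.recv(1, socket.MSG_PEEK)
                # The same peek through the recvmsg() syscall (msghdr + iovec).
                peek_msg, _ancdata, _flags, _addr = conn.recvmsg(1, 0, socket.MSG_PEEK)
                # Neither peek consumed data, so a normal read returns all of it.
                data = b""
                while len(data) < len(payload):
                    chunk = conn.recv(len(payload) - len(data))
                    if not chunk:
                        break
                    data += chunk
    return peek_recv, peek_msg, data


if __name__ == "__main__":
    print(peek_then_read())
```

On a healthy kernel this returns `(b'h', b'h', b'hello')`: both peeks see the first byte and neither consumes it.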
> On 16 Jan 2024, at 0:41, Jon Maloy <jmaloy@redhat.com> wrote:
>
>
>
> On 2024-01-15 16:51, Martin Zaharinov wrote:
>> Hi Jon
>>
>> After applying the patch to kernel 6.7.0, the system crashes with this bug report:
> Hmm,
> I have been running this for weeks without any problems, on x86_64 with current net and net-next.
> There must be some difference between our kernels.
> Which configuration are you using?
> It would also be interesting to see your test program.
>
> Regards
> ///jon
>
>
>>
>> [... oops log snipped ...]
>>
>>
>>
>> m.
>>
>
* Re: [RFC,net-next] tcp: add support for read with offset when using MSG_PEEK
2024-01-16 4:59 ` Martin Zaharinov
@ 2024-01-17 16:33 ` Jon Maloy
2024-01-17 17:11 ` Martin Zaharinov
2024-01-26 15:01 ` Martin Zaharinov
From: Jon Maloy @ 2024-01-17 16:33 UTC
To: Martin Zaharinov; +Cc: netdev
On 2024-01-15 23:59, Martin Zaharinov wrote:
> Hi Jon,
>
> Yes, same here: in our test lab, with a single test user, everything is fine.
>
> But when we install the kernel on a production server with 500 users (PPP) and 400-500 Mbit/s of traffic, the machine crashes with this bug log.
> It runs as an ISP router: firewall, traffic shapers, …
Just to get it straight, does it crash when you are running your test
program on top of that heavily loaded machine, or does it just happen
randomly when the patch is present?
////jon
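One way to probe that question without the production load is to keep many connections peeking at once. The sketch below (hypothetical `peek_stress`; the connection and round counts are arbitrary) opens `n_conns` loopback TCP connections, runs one sender thread per connection that writes single bytes and one receiver thread that peeks each byte with `MSG_PEEK` before consuming it, and counts the rounds where peek, read and sent byte all agree. It cannot reproduce 500 PPP users plus shaping, so a clean run would not rule out a load-dependent race.

```python
import socket
import threading


def peek_stress(n_conns: int = 8, rounds: int = 200, timeout: float = 5.0) -> int:
    """Hypothetical stress sketch: concurrent 1-byte MSG_PEEK reads on loopback TCP.

    Returns the number of rounds in which the peeked byte, the byte actually
    read, and the byte that was sent all match.
    """
    pairs = []
    with socket.create_server(("127.0.0.1", 0), backlog=n_conns) as srv:
        port = srv.getsockname()[1]
        for _ in range(n_conns):
            cli = socket.create_connection(("127.0.0.1", port))
            # Disable Nagle so each byte tends to travel in its own segment.
            cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            conn, _ = srv.accept()
            conn.settimeout(timeout)  # a stuck peek fails the round instead of hanging
            pairs.append((cli, conn))

    ok = [0] * n_conns  # one slot per receiver thread, so no locking is needed

    def sender(cli: socket.socket) -> None:
        for i in range(rounds):
            cli.sendall(bytes([i % 256]))

    def receiver(idx: int, conn: socket.socket) -> None:
        for i in range(rounds):
            expected = bytes([i % 256])
            peeked = conn.recv(1, socket.MSG_PEEK)  # must not consume the byte
            got = conn.recv(1)                      # consumes it
            if peeked == got == expected:
                ok[idx] += 1

    threads = []
    for idx, (cli, conn) in enumerate(pairs):
        threads.append(threading.Thread(target=receiver, args=(idx, conn)))
        threads.append(threading.Thread(target=sender, args=(cli,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    for cli, conn in pairs:
        cli.close()
        conn.close()
    return sum(ok)


if __name__ == "__main__":
    print(peek_stress())
```

A clean run returns `n_conns * rounds`; a lower count, or a receiver thread dying on the 5 s timeout, would be worth reporting alongside the kernel log.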
>
> m.
>
>> On 16 Jan 2024, at 0:41, Jon Maloy <jmaloy@redhat.com> wrote:
>>
>>
>>
>> On 2024-01-15 16:51, Martin Zaharinov wrote:
>>> Hi Jon
>>>
>>> After applying the patch to kernel 6.7.0, the system crashes with this bug report:
>> Hmm,
>> I have been running this for weeks without any problems, on x86_64 with current net and net-next.
>> There must be some difference between our kernels.
>> Which configuration are you using?
>> It would also be interesting to see your test program.
>>
>> Regards
>> ///jon
>>
>>
>>> [... oops log snipped ...]
>>> Jan 15 22:27:39 6.7.0,4,921,194887538,-,caller=T3523;R10: 0000000000000002 R11: ffffa47b01307e18 R12: ffff8cf8c3209800
>>> Jan 15 22:27:39 6.7.0,4,922,194887658,-,caller=T3523;R13: 0000000000000000 R14: 0000000000000000 R15: ffffa47b01307d44
>>> Jan 15 22:27:39 6.7.0,4,923,194887779,-,caller=T3523;FS: 00007f4941b0ad80(0000) GS:ffff8d001f900000(0000) knlGS:0000000000000000
>>> Jan 15 22:27:39 6.7.0,4,924,194887901,-,caller=T3523;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> Jan 15 22:27:39 6.7.0,4,925,194888000,-,caller=T3523;CR2: 00007fff333174e0 CR3: 000000010df04002 CR4: 00000000003706f0
>>> Jan 15 22:27:39 6.7.0,4,926,194888120,-,caller=T3523;DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> Jan 15 22:27:39 6.7.0,4,927,194888240,-,caller=T3523;DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> Jan 15 22:27:39 6.7.0,0,928,194888360,-,caller=T3523;Kernel panic - not syncing: Fatal exception
>>> Jan 15 22:27:40 6.7.0,0,929,195391096,-,caller=T3523;Kernel Offset: 0x1f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>> Jan 15 22:27:40 6.7.0,0,930,195391224,-,caller=T3523;Rebooting in 10 seconds..
>>>
>>>
>>>
>>> m.
>>>
* Re: [RFC,net-next] tcp: add support for read with offset when using MSG_PEEK
2024-01-17 16:33 ` Jon Maloy
@ 2024-01-17 17:11 ` Martin Zaharinov
2024-01-26 15:01 ` Martin Zaharinov
1 sibling, 0 replies; 13+ messages in thread
From: Martin Zaharinov @ 2024-01-17 17:11 UTC (permalink / raw)
To: Jon Maloy; +Cc: netdev
Hi Jon,
Two scenarios:
1. On a machine with one test client, I try to open the nginx web interface and the machine restarts :)
2. When the machine boots, starts all services, and begins connecting PPP users, it reboots with this bug.
m.
P.S.
This is the bug when trying to open the web interface:
Jan 17 09:32:49 BUG: unable to handle page fault for address: 00007ffd7e893a70
Jan 17 09:32:49 #PF: supervisor read access in kernel mode
Jan 17 09:32:49 #PF: error_code(0x0001) - permissions violation
Jan 17 09:32:49 PGD 14347a067 P4D 14347a067 PUD 135bc2067 PMD 104e33067 PTE 800000025667d067
Jan 17 09:32:49 Oops: 0001 [#1] SMP
Jan 17 09:32:49 CPU: 2 PID: 1805 Comm: nginx Tainted: G O 6.7.0 #1
Jan 17 09:32:49 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Jan 17 09:32:49 RIP: 0010:tcp_recvmsg_locked+0x498/0xea0
Jan 17 09:32:49 Code: a3 07 00 00 80 fa 02 0f 84 88 07 00 00 84 d2 0f 84 f1 04 00 00 41 8b 8c 24 d8 05 00 00 49 8b 53 20 4c 8d 7c 24 44 89 4c 24 44 <48> 83 3a 00 0f 85 e5 fb ff ff 49 8b 73 30 48 83 fe 01 0f 86 c4 04
Jan 17 09:32:49 RSP: 0018:ffffb7a9039d7d00 EFLAGS: 00010202
Jan 17 09:32:49 RAX: 0000000000000042 RBX: ffff94ae43581c80 RCX: 00000000662a8e9e
Jan 17 09:32:49 RDX: 00007ffd7e893a70 RSI: ffffb7a9039d7e18 RDI: ffff94ae43581c80
Jan 17 09:32:49 RBP: ffffb7a9039d7d78 R08: ffffb7a9039d7d90 R09: ffffb7a9039d7d8c
Jan 17 09:32:49 R10: 0000000000000002 R11: ffffb7a9039d7e18 R12: ffff94ae43581c80
Jan 17 09:32:49 R13: 0000000000000000 R14: 0000000000000000 R15: ffffb7a9039d7d44
Jan 17 09:32:49 FS: 00007f957a7b4740(0000) GS:ffff94af77c80000(0000) knlGS:0000000000000000
Jan 17 09:32:49 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 17 09:32:49 CR2: 00007ffd7e893a70 CR3: 000000014346f000 CR4: 00000000003506f0
Jan 17 09:32:49 Call Trace:
Jan 17 09:32:49 <TASK>
Jan 17 09:32:49 ? __die+0xe4/0xf0
Jan 17 09:32:49 ? page_fault_oops+0x144/0x3e0
Jan 17 09:32:49 ? unix_stream_read_generic+0x24f/0xb20
Jan 17 09:32:49 ? exc_page_fault+0x5d/0xa0
Jan 17 09:32:49 ? asm_exc_page_fault+0x22/0x30
Jan 17 09:32:49 ? tcp_recvmsg_locked+0x498/0xea0
Jan 17 09:32:49 ? __schedule+0x36c/0x960
Jan 17 09:32:49 tcp_recvmsg+0x5c/0x1e0
Jan 17 09:32:49 ? schedule_hrtimeout_range_clock+0x28b/0x310
Jan 17 09:32:49 ? vmxnet3_tq_tx_complete.isra.0+0x2b0/0x2b0 [vmxnet3]
Jan 17 09:32:49 inet_recvmsg+0x2a/0x90
Jan 17 09:32:49 __sys_recvfrom+0x15e/0x200
Jan 17 09:32:49 ? ep_busy_loop_end+0x60/0x60
Jan 17 09:32:49 ? ktime_get_ts64+0x44/0xe0
Jan 17 09:32:49 __x64_sys_recvfrom+0x1b/0x20
Jan 17 09:32:49 do_syscall_64+0x2c/0xa0
Jan 17 09:32:49 entry_SYSCALL_64_after_hwframe+0x46/0x4e
Jan 17 09:32:49 RIP: 0033:0x7f957a8f62a9
Jan 17 09:32:49 Code: 0c 00 64 c7 02 02 00 00 00 eb bf 66 0f 1f 44 00 00 80 3d a9 e0 0c 00 00 41 89 ca 74 1c 45 31 c9 45 31 c0 b8 2d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 67 c3 66 0f 1f 44 00 00 55 48 83 ec 20 48 89
Jan 17 09:32:49 RSP: 002b:00007ffd7e893a48 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
Jan 17 09:32:49 RAX: ffffffffffffffda RBX: 00007f9579cb73d0 RCX: 00007f957a8f62a9
Jan 17 09:32:49 RDX: 0000000000000001 RSI: 00007ffd7e893a70 RDI: 000000000000000d
Jan 17 09:32:49 RBP: 0000000001b29250 R08: 0000000000000000 R09: 0000000000000000
Jan 17 09:32:49 R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000000
Jan 17 09:32:49 R13: 0000000001a871a0 R14: 00007ffd7e893a70 R15: 0000000000000000
Jan 17 09:32:49 </TASK>
Jan 17 09:32:49 Modules linked in: pppoe pppox ppp_generic slhc nf_conntrack_sip nf_conntrack_ftp nf_conntrack_pptp nft_ct nft_nat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables netconsole virtio_net net_failover failover virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring vmxnet3 aesni_intel crypto_simd cryptd
Jan 17 09:32:49 CR2: 00007ffd7e893a70
Jan 17 09:32:49 BUG: unable to handle page fault for address: 00007ffd7e893a70
Jan 17 09:32:50 ---[ end trace 0000000000000000 ]---
Jan 17 09:32:50 #PF: supervisor read access in kernel mode
Jan 17 09:32:50 RIP: 0010:tcp_recvmsg_locked+0x498/0xea0
Jan 17 09:32:50 #PF: error_code(0x0001) - permissions violation
Jan 17 09:32:50 Code: a3 07 00 00 80 fa 02 0f 84 88 07 00 00 84 d2 0f 84 f1 04 00 00 41 8b 8c 24 d8 05 00 00 49 8b 53 20 4c 8d 7c 24 44 89 4c 24 44 <48> 83 3a 00 0f 85 e5 fb ff ff 49 8b 73 30 48 83 fe 01 0f 86 c4 04
Jan 17 09:32:50 PGD 10d24b067
Jan 17 09:32:50 RSP: 0018:ffffb7a9039d7d00 EFLAGS: 00010202
Jan 17 09:32:50 P4D 10d24b067
Jan 17 09:32:50 RAX: 0000000000000042 RBX: ffff94ae43581c80 RCX: 00000000662a8e9e
Jan 17 09:32:50 PUD 10334e067
Jan 17 09:32:50 RDX: 00007ffd7e893a70 RSI: ffffb7a9039d7e18 RDI: ffff94ae43581c80
Jan 17 09:32:50 PMD 11299d067 PTE 8000000148d73067
Jan 17 09:32:50 RBP: ffffb7a9039d7d78 R08: ffffb7a9039d7d90 R09: ffffb7a9039d7d8c
Jan 17 09:32:50
Jan 17 09:32:50 R10: 0000000000000002 R11: ffffb7a9039d7e18 R12: ffff94ae43581c80
Jan 17 09:32:50 Oops: 0001 [#2] SMP
Jan 17 09:32:50 R13: 0000000000000000 R14: 0000000000000000 R15: ffffb7a9039d7d44
Jan 17 09:32:50 CPU: 4 PID: 1807 Comm: nginx Tainted: G D O 6.7.0 #1
Jan 17 09:32:50 FS: 00007f957a7b4740(0000) GS:ffff94af77c80000(0000) knlGS:0000000000000000
Jan 17 09:32:50 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
Jan 17 09:32:50 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 17 09:32:50 RIP: 0010:tcp_recvmsg_locked+0x498/0xea0
Jan 17 09:32:50 CR2: 00007ffd7e893a70 CR3: 000000014346f000 CR4: 00000000003506f0
Jan 17 09:32:50 Code: a3 07 00 00 80 fa 02 0f 84 88 07 00 00 84 d2 0f 84 f1 04 00 00 41 8b 8c 24 d8 05 00 00 49 8b 53 20 4c 8d 7c 24 44 89 4c 24 44 <48> 83 3a 00 0f 85 e5 fb ff ff 49 8b 73 30 48 83 fe 01 0f 86 c4 04
Jan 17 09:32:50 Kernel panic - not syncing: Fatal exception
Jan 17 09:32:50 RSP: 0018:ffffb7a9039e7d00 EFLAGS: 00010202
Jan 17 09:32:50 RAX: 0000000000000042 RBX: ffff94ae137cc780 RCX: 000000000a5a62eb
Jan 17 09:32:50 RDX: 00007ffd7e893a70 RSI: ffffb7a9039e7e18 RDI: ffff94ae137cc780
Jan 17 09:32:50 RBP: ffffb7a9039e7d78 R08: ffffb7a9039e7d90 R09: ffffb7a9039e7d8c
Jan 17 09:32:50 R10: 0000000000000002 R11: ffffb7a9039e7e18 R12: ffff94ae137cc780
Jan 17 09:32:50 R13: 0000000000000000 R14: 0000000000000000 R15: ffffb7a9039e7d44
Jan 17 09:32:50 FS: 00007f957a7b4740(0000) GS:ffff94af77d00000(0000) knlGS:0000000000000000
Jan 17 09:32:50 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 17 09:32:50 CR2: 00007ffd7e893a70 CR3: 0000000103369000 CR4: 00000000003506f0
Jan 17 09:32:50 Call Trace:
Jan 17 09:32:50 <TASK>
Jan 17 09:32:50 ? __die+0xe4/0xf0
Jan 17 09:32:50 ? page_fault_oops+0x144/0x3e0
Jan 17 09:32:50 ? unix_stream_read_generic+0x24f/0xb20
Jan 17 09:32:50 ? exc_page_fault+0x5d/0xa0
Jan 17 09:32:50 ? asm_exc_page_fault+0x22/0x30
Jan 17 09:32:50 ? tcp_recvmsg_locked+0x498/0xea0
Jan 17 09:32:50 ? __schedule+0x36c/0x960
Jan 17 09:32:50 tcp_recvmsg+0x5c/0x1e0
Jan 17 09:32:50 ? schedule_hrtimeout_range_clock+0x28b/0x310
Jan 17 09:32:50 ? vmxnet3_tq_tx_complete.isra.0+0x2b0/0x2b0 [vmxnet3]
Jan 17 09:32:50 inet_recvmsg+0x2a/0x90
Jan 17 09:32:50 __sys_recvfrom+0x15e/0x200
Jan 17 09:32:50 ? ep_busy_loop_end+0x60/0x60
Jan 17 09:32:50 ? ktime_get_ts64+0x44/0xe0
Jan 17 09:32:50 __x64_sys_recvfrom+0x1b/0x20
Jan 17 09:32:50 do_syscall_64+0x2c/0xa0
Jan 17 09:32:50 entry_SYSCALL_64_after_hwframe+0x46/0x4e
Jan 17 09:32:50 RIP: 0033:0x7f957a8f62a9
Jan 17 09:32:50 Code: 0c 00 64 c7 02 02 00 00 00 eb bf 66 0f 1f 44 00 00 80 3d a9 e0 0c 00 00 41 89 ca 74 1c 45 31 c9 45 31 c0 b8 2d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 67 c3 66 0f 1f 44 00 00 55 48 83 ec 20 48 89
Jan 17 09:32:50 RSP: 002b:00007ffd7e893a48 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
Jan 17 09:32:50 RAX: ffffffffffffffda RBX: 00007f9579cb73d0 RCX: 00007f957a8f62a9
Jan 17 09:32:50 RDX: 0000000000000001 RSI: 00007ffd7e893a70 RDI: 000000000000000a
Jan 17 09:32:50 RBP: 0000000001b29250 R08: 0000000000000000 R09: 0000000000000000
Jan 17 09:32:50 R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000000
Jan 17 09:32:50 R13: 0000000001a871a0 R14: 00007ffd7e893a70 R15: 0000000000000000
Jan 17 09:32:50 </TASK>
Jan 17 09:32:50 Modules linked in: pppoe pppox ppp_generic slhc nf_conntrack_sip nf_conntrack_ftp nf_conntrack_pptp nft_ct nft_nat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables netconsole virtio_net net_failover failover virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev virtio virtio_ring vmxnet3 aesni_intel crypto_simd cryptd
Jan 17 09:32:50 CR2: 00007ffd7e893a70
Jan 17 09:32:50 ---[ end trace 0000000000000000 ]---
Jan 17 09:32:50 RIP: 0010:tcp_recvmsg_locked+0x498/0xea0
Jan 17 09:32:50 Code: a3 07 00 00 80 fa 02 0f 84 88 07 00 00 84 d2 0f 84 f1 04 00 00 41 8b 8c 24 d8 05 00 00 49 8b 53 20 4c 8d 7c 24 44 89 4c 24 44 <48> 83 3a 00 0f 85 e5 fb ff ff 49 8b 73 30 48 83 fe 01 0f 86 c4 04
Jan 17 09:32:50 RSP: 0018:ffffb7a9039d7d00 EFLAGS: 00010202
Jan 17 09:32:50 RAX: 0000000000000042 RBX: ffff94ae43581c80 RCX: 00000000662a8e9e
Jan 17 09:32:50 RDX: 00007ffd7e893a70 RSI: ffffb7a9039d7e18 RDI: ffff94ae43581c80
Jan 17 09:32:50 RBP: ffffb7a9039d7d78 R08: ffffb7a9039d7d90 R09: ffffb7a9039d7d8c
Jan 17 09:32:50 R10: 0000000000000002 R11: ffffb7a9039d7e18 R12: ffff94ae43581c80
Jan 17 09:32:50 R13: 0000000000000000 R14: 0000000000000000 R15: ffffb7a9039d7d44
Jan 17 09:32:50 FS: 00007f957a7b4740(0000) GS:ffff94af77d00000(0000) knlGS:0000000000000000
Jan 17 09:32:50 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 17 09:32:50 CR2: 00007ffd7e893a70 CR3: 0000000103369000 CR4: 00000000003506f0
Jan 17 09:32:50 Shutting down cpus with NMI
Jan 17 09:32:50 Kernel Offset: 0x25000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Jan 17 09:32:50 Rebooting in 10 seconds..
> On 17 Jan 2024, at 18:33, Jon Maloy <jmaloy@redhat.com> wrote:
>
>
>
> On 2024-01-15 23:59, Martin Zaharinov wrote:
>> Hi Jon,
>>
>> Yes, same here: in our test lab, where we have one test user, all is fine.
>>
>> But when we install the kernel on a production server with 500 PPP users and 400-500 Mbit/s of traffic, the machine crashes with this bug log.
>> It runs as an ISP router, firewall + shapers …
> Just to get it straight, does it crash when you are running your test program on top of that heavily loaded machine, or does it just happen randomly when the patch is present?
>
> ////jon
>
>
>
>
>>
>> m.
>>
>>> On 16 Jan 2024, at 0:41, Jon Maloy <jmaloy@redhat.com> wrote:
>>>
>>>
>>>
>>> On 2024-01-15 16:51, Martin Zaharinov wrote:
>>>> Hi Jon
>>>>
>>>> After applying the patch on kernel 6.7.0, the system hangs with this bug report:
>>> Hmm,
>>> I have been running this for weeks without any problems, on x86_64 with current net and net-next.
>>> There must be some difference between our kernels.
>>> Which configuration are you using?
>>> It would also be interesting to see your test program.
>>>
>>> Regards
>>> ///jon
>>>
>>>
>
* Re: [RFC,net-next] tcp: add support for read with offset when using MSG_PEEK
2024-01-17 16:33 ` Jon Maloy
2024-01-17 17:11 ` Martin Zaharinov
@ 2024-01-26 15:01 ` Martin Zaharinov
2024-01-28 18:52 ` Jon Maloy
1 sibling, 1 reply; 13+ messages in thread
From: Martin Zaharinov @ 2024-01-26 15:01 UTC (permalink / raw)
To: Jon Maloy; +Cc: netdev
Hi Jon
For now I am running release v2: https://patchwork.kernel.org/project/netdevbpf/patch/20240120165218.2283302-1-jmaloy@redhat.com/
and it works without any problems.
If I see any, I will update you.
Thanks
Martin
> On 17 Jan 2024, at 18:33, Jon Maloy <jmaloy@redhat.com> wrote:
>
>
>
> On 2024-01-15 23:59, Martin Zaharinov wrote:
>> Hi Jon,
>>
>> Yes, same here: in our test lab, where we have one test user, all is fine.
>>
>> But when we install the kernel on a production server with 500 PPP users and 400-500 Mbit/s of traffic, the machine crashes with this bug log.
>> It runs as an ISP router, firewall + shapers …
> Just to get it straight, does it crash when you are running your test program on top of that heavily loaded machine, or does it just happen randomly when the patch is present?
>
> ////jon
>
>
>
>
>>
>> m.
>>
>>> On 16 Jan 2024, at 0:41, Jon Maloy <jmaloy@redhat.com> wrote:
>>>
>>>
>>>
>>> On 2024-01-15 16:51, Martin Zaharinov wrote:
>>>> Hi Jon
>>>>
>>>> After applying the patch on kernel 6.7.0, the system hangs with this bug report:
>>> Hmm,
>>> I have been running this for weeks without any problems, on x86_64 with current net and net-next.
>>> There must be some difference between our kernels.
>>> Which configuration are you using?
>>> It would also be interesting to see your test program.
>>>
>>> Regards
>>> ///jon
>>>
>>>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC,net-next] tcp: add support for read with offset when using MSG_PEEK
2024-01-26 15:01 ` Martin Zaharinov
@ 2024-01-28 18:52 ` Jon Maloy
0 siblings, 0 replies; 13+ messages in thread
From: Jon Maloy @ 2024-01-28 18:52 UTC (permalink / raw)
To: Martin Zaharinov; +Cc: netdev
On 2024-01-26 10:01, Martin Zaharinov wrote:
> Hi Jon
>
> For now I am running v2: https://patchwork.kernel.org/project/netdevbpf/patch/20240120165218.2283302-1-jmaloy@redhat.com/
>
> and it works without any problem.
> If I see anything, I will update you.
Thanks.
I am now working on a v3 where I am trying the SO_PEEK_OFF socket option
instead, as suggested by Paolo Abeni.
///jon
>
> Thanks
> Martin
>
>> On 17 Jan 2024, at 18:33, Jon Maloy <jmaloy@redhat.com> wrote:
>>
>>
>>
>> On 2024-01-15 23:59, Martin Zaharinov wrote:
>>> Hi Jon,
>>>
>>> Yes, same here: in our test lab, where we have one test user, all is fine.
>>>
>>> But when we install the kernel on a production server with 500 users (PPP) and 400-500 Mbit/s of traffic, the machine crashes with this bug log.
>>> It runs as an ISP router/firewall with shapers.
>> Just to get it straight, does it crash when you are running your test program on top of that heavily loaded machine, or does it just happen randomly when the patch is present?
>>
>> ////jon
>>
>>
>>
>>
>>> m.
>>>
>>>> On 16 Jan 2024, at 0:41, Jon Maloy <jmaloy@redhat.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2024-01-15 16:51, Martin Zaharinov wrote:
>>>>> Hi Jon
>>>>>
>>>>> After applying the patch on kernel 6.7.0, the system hangs with this bug report:
>>>> Hmm,
>>>> I have been running this for weeks without any problems, on x86_64 with current net and net-next.
>>>> There must be some difference between our kernels.
>>>> Which configuration are you using?
>>>> It would also be interesting to see your test program.
>>>>
>>>> Regards
>>>> ///jon
>>>>
>>>>
>>>>> Jan 15 22:27:39 6.7.0,1,863,194879739,-,caller=T3523;BUG: unable to handle page fault for address: 00007fff333174e0
>>>>> Jan 15 22:27:39 6.7.0,1,864,194879876,-,caller=T3523;#PF: supervisor read access in kernel mode
>>>>> Jan 15 22:27:39 6.7.0,1,865,194879976,-,caller=T3523;#PF: error_code(0x0001) - permissions violation
>>>>> Jan 15 22:27:39 6.7.0,6,866,194880075,-,caller=T3523;PGD 107cbd067 P4D 107cbd067 PUD 22055d067 PMD 10a384067 PTE 8000000228b00067
>>>>> Jan 15 22:27:39 6.7.0,4,867,194880202,-,caller=T3523;Oops: 0001 [#1] SMP
>>>>> Jan 15 22:27:39 6.7.0,4,868,194880297,-,caller=T3523;CPU: 12 PID: 3523 Comm: server-nft Tainted: G O 6.7.0 #1
>>>>> Jan 15 22:27:39 6.7.0,4,869,194880420,-,caller=T3523;Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./EP2C612D8, BIOS P2.30 04/30/2018
>>>>> Jan 15 22:27:39 6.7.0,4,870,194880547,-,caller=T3523;RIP: 0010:tcp_recvmsg_locked+0x498/0xea0
>>>>> Jan 15 22:27:39 6.7.0,4,871,194880709,-,caller=T3523;Code: a3 07 00 00 80 fa 02 0f 84 88 07 00 00 84 d2 0f 84 f1 04 00 00 41 8b 8c 24 d8 05 00 00 49 8b 53 20 4c 8d 7c 24 44 89 4c 24 44 <48> 83 3a 00 0f 85 e5 fb ff ff 49 8b 73 30 48 83 fe 01 0f 86 c4 04
>>>>> Jan 15 22:27:39 6.7.0,4,872,194880876,-,caller=T3523;RSP: 0018:ffffa47b01307d00 EFLAGS: 00010202
>>>>> Jan 15 22:27:39 6.7.0,4,873,194880975,-,caller=T3523;RAX: 0000000000000002 RBX: ffff8cf8c3209800 RCX: 00000000a87ac03c
>>>>> Jan 15 22:27:39 6.7.0,4,874,194881096,-,caller=T3523;RDX: 00007fff333174e0 RSI: ffffa47b01307e18 RDI: ffff8cf8c3209800
>>>>> Jan 15 22:27:39 6.7.0,4,875,194881217,-,caller=T3523;RBP: ffffa47b01307d78 R08: ffffa47b01307d90 R09: ffffa47b01307d8c
>>>>> Jan 15 22:27:39 6.7.0,4,876,194881338,-,caller=T3523;R10: 0000000000000002 R11: ffffa47b01307e18 R12: ffff8cf8c3209800
>>>>> Jan 15 22:27:39 6.7.0,4,877,194881458,-,caller=T3523;R13: 0000000000000000 R14: 0000000000000000 R15: ffffa47b01307d44
>>>>> Jan 15 22:27:39 6.7.0,4,878,194881579,-,caller=T3523;FS: 00007f4941b0ad80(0000) GS:ffff8d001f900000(0000) knlGS:0000000000000000
>>>>> Jan 15 22:27:39 6.7.0,4,879,194881703,-,caller=T3523;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> Jan 15 22:27:39 6.7.0,4,880,194881802,-,caller=T3523;CR2: 00007fff333174e0 CR3: 000000010df04002 CR4: 00000000003706f0
>>>>> Jan 15 22:27:39 6.7.0,4,881,194881922,-,caller=T3523;DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> Jan 15 22:27:39 6.7.0,4,882,194882043,-,caller=T3523;DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Jan 15 22:27:39 6.7.0,4,883,194882164,-,caller=T3523;Call Trace:
>>>>> Jan 15 22:27:39 6.7.0,4,884,194882257,-,caller=T3523; <TASK>
>>>>> Jan 15 22:27:39 6.7.0,4,885,194882347,-,caller=T3523; ? __die+0xe4/0xf0
>>>>> Jan 15 22:27:39 6.7.0,4,886,194882442,-,caller=T3523; ? page_fault_oops+0x144/0x3e0
>>>>> Jan 15 22:27:39 6.7.0,4,887,194882539,-,caller=T3523; ? zap_pte_range+0x6a4/0xdc0
>>>>> Jan 15 22:27:39 6.7.0,4,888,194882638,-,caller=T3523; ? exc_page_fault+0x5d/0xa0
>>>>> Jan 15 22:27:39 6.7.0,4,889,194882736,-,caller=T3523; ? asm_exc_page_fault+0x22/0x30
>>>>> Jan 15 22:27:39 6.7.0,4,890,194882834,-,caller=T3523; ? tcp_recvmsg_locked+0x498/0xea0
>>>>> Jan 15 22:27:39 6.7.0,4,891,194882931,-,caller=T3523; ? __call_rcu_common.constprop.0+0xbc/0x770
>>>>> Jan 15 22:27:39 6.7.0,4,892,194883031,-,caller=T3523; ? rcu_nocb_flush_bypass.part.0+0xec/0x120
>>>>> Jan 15 22:27:39 6.7.0,4,893,194883133,-,caller=T3523; tcp_recvmsg+0x5c/0x1e0
>>>>> Jan 15 22:27:39 6.7.0,4,894,194883228,-,caller=T3523; inet_recvmsg+0x2a/0x90
>>>>> Jan 15 22:27:39 6.7.0,4,895,194883325,-,caller=T3523; __sys_recvfrom+0x15e/0x200
>>>>> Jan 15 22:27:39 6.7.0,4,896,194883423,-,caller=T3523; ? wait_task_zombie+0xee/0x410
>>>>> Jan 15 22:27:39 6.7.0,4,897,194883539,-,caller=T3523; ? remove_wait_queue+0x1b/0x60
>>>>> Jan 15 22:27:39 6.7.0,4,898,194883635,-,caller=T3523; ? do_wait+0x93/0xa0
>>>>> Jan 15 22:27:39 6.7.0,4,899,194883729,-,caller=T3523; ? __x64_sys_poll+0xa7/0x170
>>>>> Jan 15 22:27:39 6.7.0,4,900,194883825,-,caller=T3523; __x64_sys_recvfrom+0x1b/0x20
>>>>> Jan 15 22:27:39 6.7.0,4,901,194883921,-,caller=T3523; do_syscall_64+0x2c/0xa0
>>>>> Jan 15 22:27:39 6.7.0,4,902,194884018,-,caller=T3523; entry_SYSCALL_64_after_hwframe+0x46/0x4e
>>>>> Jan 15 22:27:39 6.7.0,4,903,194884116,-,caller=T3523;RIP: 0033:0x7f4941fe92a9
>>>>> Jan 15 22:27:39 6.7.0,4,904,194884210,-,caller=T3523;Code: 0c 00 64 c7 02 02 00 00 00 eb bf 66 0f 1f 44 00 00 80 3d a9 e0 0c 00 00 41 89 ca 74 1c 45 31 c9 45 31 c0 b8 2d 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 67 c3 66 0f 1f 44 00 00 55 48 83 ec 20 48 89
>>>>> Jan 15 22:27:39 6.7.0,4,905,194884377,-,caller=T3523;RSP: 002b:00007fff33317468 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
>>>>> Jan 15 22:27:39 6.7.0,4,906,194884499,-,caller=T3523;RAX: ffffffffffffffda RBX: 00007fff333174e0 RCX: 00007f4941fe92a9
>>>>> Jan 15 22:27:39 6.7.0,4,907,194884620,-,caller=T3523;RDX: 0000000000000001 RSI: 00007fff333174e0 RDI: 0000000000000005
>>>>> Jan 15 22:27:39 6.7.0,4,908,194884740,-,caller=T3523;RBP: 00007fff33317550 R08: 0000000000000000 R09: 0000000000000000
>>>>> Jan 15 22:27:39 6.7.0,4,909,194884860,-,caller=T3523;R10: 0000000000000002 R11: 0000000000000246 R12: 0000000000000000
>>>>> Jan 15 22:27:39 6.7.0,4,910,194884980,-,caller=T3523;R13: 0000000000000000 R14: 0000000000000000 R15: 00007f49418850a0
>>>>> Jan 15 22:27:39 6.7.0,4,911,194885101,-,caller=T3523; </TASK>
>>>>> Jan 15 22:27:39 6.7.0,4,912,194885191,-,caller=T3523;Modules linked in: nft_limit pppoe pppox ppp_generic slhc nft_ct nft_nat nft_chain_nat nf_tables netconsole coretemp bonding igb i2c_algo_bit i40e ixgbe mdio nf_nat_sip nf_conntrack_sip nf_nat_pptp nf_conntrack_pptp nf_nat_tftp nf_conntrack_tftp nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_si ipmi_devintf ipmi_msghandler rtc_cmos aesni_intel crypto_simd cryptd
>>>>> Jan 15 22:27:39 6.7.0,4,913,194885507,-,caller=T3523;CR2: 00007fff333174e0
>>>>> Jan 15 22:27:39 6.7.0,4,914,194885602,-,caller=T3523;---[ end trace 0000000000000000 ]---
>>>>> Jan 15 22:27:39 6.7.0,4,915,194885698,-,caller=T3523;RIP: 0010:tcp_recvmsg_locked+0x498/0xea0
>>>>> Jan 15 22:27:39 6.7.0,4,916,194885797,-,caller=T3523;Code: a3 07 00 00 80 fa 02 0f 84 88 07 00 00 84 d2 0f 84 f1 04 00 00 41 8b 8c 24 d8 05 00 00 49 8b 53 20 4c 8d 7c 24 44 89 4c 24 44 <48> 83 3a 00 0f 85 e5 fb ff ff 49 8b 73 30 48 83 fe 01 0f 86 c4 04
>>>>> Jan 15 22:27:39 6.7.0,4,917,194887079,-,caller=T3523;RSP: 0018:ffffa47b01307d00 EFLAGS: 00010202
>>>>> Jan 15 22:27:39 6.7.0,4,918,194887177,-,caller=T3523;RAX: 0000000000000002 RBX: ffff8cf8c3209800 RCX: 00000000a87ac03c
>>>>> Jan 15 22:27:39 6.7.0,4,919,194887298,-,caller=T3523;RDX: 00007fff333174e0 RSI: ffffa47b01307e18 RDI: ffff8cf8c3209800
>>>>> Jan 15 22:27:39 6.7.0,4,920,194887418,-,caller=T3523;RBP: ffffa47b01307d78 R08: ffffa47b01307d90 R09: ffffa47b01307d8c
>>>>> Jan 15 22:27:39 6.7.0,4,921,194887538,-,caller=T3523;R10: 0000000000000002 R11: ffffa47b01307e18 R12: ffff8cf8c3209800
>>>>> Jan 15 22:27:39 6.7.0,4,922,194887658,-,caller=T3523;R13: 0000000000000000 R14: 0000000000000000 R15: ffffa47b01307d44
>>>>> Jan 15 22:27:39 6.7.0,4,923,194887779,-,caller=T3523;FS: 00007f4941b0ad80(0000) GS:ffff8d001f900000(0000) knlGS:0000000000000000
>>>>> Jan 15 22:27:39 6.7.0,4,924,194887901,-,caller=T3523;CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> Jan 15 22:27:39 6.7.0,4,925,194888000,-,caller=T3523;CR2: 00007fff333174e0 CR3: 000000010df04002 CR4: 00000000003706f0
>>>>> Jan 15 22:27:39 6.7.0,4,926,194888120,-,caller=T3523;DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> Jan 15 22:27:39 6.7.0,4,927,194888240,-,caller=T3523;DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Jan 15 22:27:39 6.7.0,0,928,194888360,-,caller=T3523;Kernel panic - not syncing: Fatal exception
>>>>> Jan 15 22:27:40 6.7.0,0,929,195391096,-,caller=T3523;Kernel Offset: 0x1f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>>>> Jan 15 22:27:40 6.7.0,0,930,195391224,-,caller=T3523;Rebooting in 10 seconds..
>>>>>
>>>>>
>>>>>
>>>>> m.
>>>>>
^ permalink raw reply [flat|nested] 13+ messages in thread
* [RFC net-next] tcp: add support for read with offset when using MSG_PEEK
@ 2024-01-11 23:00 jmaloy
2024-01-16 10:49 ` Paolo Abeni
0 siblings, 1 reply; 13+ messages in thread
From: jmaloy @ 2024-01-11 23:00 UTC (permalink / raw)
To: netdev, davem; +Cc: kuba, passt-dev, jmaloy, sbrivio, lvivier, dgibson
From: Jon Maloy <jmaloy@redhat.com>
When reading received messages from a socket with MSG_PEEK, we may want
to read the contents with an offset, like we can do with pread/preadv()
when reading files. Currently, it is not possible to do that.
In this commit, we allow the user to set iovec.iov_base in the first
vector entry to NULL. This tells the socket to skip the first entry,
hence letting the iov_len field of that entry indicate the offset value.
This way, there is no need to add any new arguments or flags.
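A minimal sketch of how a caller would describe the offset under the convention proposed here, assuming a Linux user-space C program. The helper name build_peek_msg() is hypothetical; the layout (a NULL iov_base in iov[0] whose iov_len is the number of bytes to skip, followed by at least one real buffer) follows the patch in this message. On a kernel without this patch, such a msghdr does not skip anything: recvmsg() fails with EFAULT once data is available.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: fill a msghdr following the convention proposed in
 * this RFC (not a merged kernel API). iov[0] carries no buffer, only the
 * number of already-forwarded bytes to skip; iov[1] is the real destination.
 */
static void build_peek_msg(struct msghdr *msg, struct iovec iov[2],
			   size_t skip, void *buf, size_t buflen)
{
	iov[0].iov_base = NULL;   /* NULL base marks "this entry is an offset" */
	iov[0].iov_len  = skip;   /* number of queued bytes to skip */
	iov[1].iov_base = buf;    /* where the new data should land */
	iov[1].iov_len  = buflen;

	memset(msg, 0, sizeof(*msg));
	msg->msg_iov    = iov;
	msg->msg_iovlen = 2;      /* the patch rejects fewer than 2 with -EINVAL */
}
```

A caller would then issue recvmsg(fd, &msg, MSG_PEEK); with the patch applied, the return value counts only the bytes placed in buf, because the offset entry is removed from the iterator before any data is copied.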
In the iperf3 log examples shown below, we can observe a throughput
improvement of ~15 % in the direction host->namespace when using the
protocol splicer 'pasta' (https://passt.top).
This is a consistent result.
pasta(1) and passt(1) implement user-mode networking for network
namespaces (containers) and virtual machines by means of a translation
layer between Layer-2 network interface and native Layer-4 sockets
(TCP, UDP, ICMP/ICMPv6 echo).
Received, pending TCP data to the container/guest is kept in kernel
buffers until acknowledged, so the tool routinely needs to fetch new
data from the socket, skipping data that was already sent.
At the moment this is implemented using a dummy buffer passed to
recvmsg(). With this change, we don't need a dummy buffer and the
related buffer copy (copy_to_user()) anymore.
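For comparison, the current dummy-buffer workaround can be sketched as follows, assuming a Linux user-space C program. The helper name peek_skip_with_dummy() is hypothetical, and it works on any stream socket; an AF_UNIX socketpair is a convenient stand-in for TCP when trying it out. The skipped bytes are still copied into a throwaway buffer, which is the copy_to_user() cost described above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Peek at up to buflen bytes that follow the first `skip` queued bytes on
 * fd, without consuming anything. Returns the number of bytes placed in
 * buf (0 if no more than `skip` bytes are queued), or -1 on error.
 */
static ssize_t peek_skip_with_dummy(int fd, size_t skip, char *buf, size_t buflen)
{
	char *dummy = malloc(skip ? skip : 1);  /* throwaway copy target */
	struct iovec iov[2] = {
		{ .iov_base = dummy, .iov_len = skip },
		{ .iov_base = buf,   .iov_len = buflen },
	};
	struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 2 };
	ssize_t n;

	if (!dummy)
		return -1;
	n = recvmsg(fd, &msg, MSG_PEEK);  /* copies skip + new bytes */
	free(dummy);
	if (n < 0)
		return n;
	/* Report only what landed in the caller's buffer. */
	return n > (ssize_t)skip ? n - (ssize_t)skip : 0;
}
```

For example, with "hello world" queued on the socket, peek_skip_with_dummy(fd, 6, buf, 16) returns 5 with "world" in buf, and all 11 bytes stay queued.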
passt and pasta are supported in KubeVirt and libvirt/qemu.
jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
MSG_PEEK with offset not supported by kernel.
jmaloy@freyr:~/passt# iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 192.168.122.1, port 44822
[ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 44832
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 1.02 GBytes 8.78 Gbits/sec
[ 5] 1.00-2.00 sec 1.06 GBytes 9.08 Gbits/sec
[ 5] 2.00-3.00 sec 1.07 GBytes 9.15 Gbits/sec
[ 5] 3.00-4.00 sec 1.10 GBytes 9.46 Gbits/sec
[ 5] 4.00-5.00 sec 1.03 GBytes 8.85 Gbits/sec
[ 5] 5.00-6.00 sec 1.10 GBytes 9.44 Gbits/sec
[ 5] 6.00-7.00 sec 1.11 GBytes 9.56 Gbits/sec
[ 5] 7.00-8.00 sec 1.07 GBytes 9.20 Gbits/sec
[ 5] 8.00-9.00 sec 667 MBytes 5.59 Gbits/sec
[ 5] 9.00-10.00 sec 1.03 GBytes 8.83 Gbits/sec
[ 5] 10.00-10.04 sec 30.1 MBytes 6.36 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.04 sec 10.3 GBytes 8.78 Gbits/sec receiver
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
^Ciperf3: interrupt - the server has terminated
jmaloy@freyr:~/passt#
logout
[ perf record: Woken up 23 times to write data ]
[ perf record: Captured and wrote 5.696 MB perf.data (35580 samples) ]
jmaloy@freyr:~/passt$
jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
MSG_PEEK with offset supported by kernel.
jmaloy@freyr:~/passt# iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 192.168.122.1, port 40854
[ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 40862
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 1.22 GBytes 10.5 Gbits/sec
[ 5] 1.00-2.00 sec 1.19 GBytes 10.2 Gbits/sec
[ 5] 2.00-3.00 sec 1.22 GBytes 10.5 Gbits/sec
[ 5] 3.00-4.00 sec 1.11 GBytes 9.56 Gbits/sec
[ 5] 4.00-5.00 sec 1.20 GBytes 10.3 Gbits/sec
[ 5] 5.00-6.00 sec 1.14 GBytes 9.80 Gbits/sec
[ 5] 6.00-7.00 sec 1.17 GBytes 10.0 Gbits/sec
[ 5] 7.00-8.00 sec 1.12 GBytes 9.61 Gbits/sec
[ 5] 8.00-9.00 sec 1.13 GBytes 9.74 Gbits/sec
[ 5] 9.00-10.00 sec 1.26 GBytes 10.8 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.04 sec 11.8 GBytes 10.1 Gbits/sec receiver
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
^Ciperf3: interrupt - the server has terminated
logout
[ perf record: Woken up 20 times to write data ]
[ perf record: Captured and wrote 5.040 MB perf.data (33411 samples) ]
jmaloy@freyr:~/passt$
The perf record confirms this result. Below, we can observe that the
CPU spends significantly less time in the function ____sys_recvmsg()
when we have offset support.
Without offset support:
----------------------
jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 -p ____sys_recvmsg -x --stdio -i perf.data | head -1
46.32% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg
With offset support:
----------------------
jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 -p ____sys_recvmsg -x --stdio -i perf.data | head -1
27.24% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
---
net/ipv4/tcp.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1baa484d2190..82e1da3f0f98 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2351,6 +2351,20 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
if (flags & MSG_PEEK) {
peek_seq = tp->copied_seq;
seq = &peek_seq;
+ if (!msg->msg_iter.__iov[0].iov_base) {
+ size_t peek_offset;
+
+ if (msg->msg_iter.nr_segs < 2) {
+ err = -EINVAL;
+ goto out;
+ }
+ peek_offset = msg->msg_iter.__iov[0].iov_len;
+ msg->msg_iter.__iov = &msg->msg_iter.__iov[1];
+ msg->msg_iter.nr_segs -= 1;
+ msg->msg_iter.count -= peek_offset;
+ len -= peek_offset;
+ *seq += peek_offset;
+ }
}
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
--
2.42.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [RFC net-next] tcp: add support for read with offset when using MSG_PEEK
2024-01-11 23:00 [RFC net-next] " jmaloy
@ 2024-01-16 10:49 ` Paolo Abeni
2024-01-18 22:22 ` Jon Maloy
0 siblings, 1 reply; 13+ messages in thread
From: Paolo Abeni @ 2024-01-16 10:49 UTC (permalink / raw)
To: jmaloy, netdev, davem; +Cc: kuba, passt-dev, sbrivio, lvivier, dgibson
On Thu, 2024-01-11 at 18:00 -0500, jmaloy@redhat.com wrote:
> From: Jon Maloy <jmaloy@redhat.com>
>
> When reading received messages from a socket with MSG_PEEK, we may want
> to read the contents with an offset, like we can do with pread/preadv()
> when reading files. Currently, it is not possible to do that.
>
> In this commit, we allow the user to set iovec.iov_base in the first
> vector entry to NULL. This tells the socket to skip the first entry,
> hence letting the iov_len field of that entry indicate the offset value.
> This way, there is no need to add any new arguments or flags.
>
> In the iperf3 log examples shown below, we can observe a throughput
> improvement of ~15 % in the direction host->namespace when using the
> protocol splicer 'pasta' (https://passt.top).
> This is a consistent result.
>
> pasta(1) and passt(1) implement user-mode networking for network
> namespaces (containers) and virtual machines by means of a translation
> layer between Layer-2 network interface and native Layer-4 sockets
> (TCP, UDP, ICMP/ICMPv6 echo).
>
> Received, pending TCP data to the container/guest is kept in kernel
> buffers until acknowledged, so the tool routinely needs to fetch new
> data from socket, skipping data that was already sent.
>
> At the moment this is implemented using a dummy buffer passed to
> recvmsg(). With this change, we don't need a dummy buffer and the
> related buffer copy (copy_to_user()) anymore.
>
> passt and pasta are supported in KubeVirt and libvirt/qemu.
>
> jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
> MSG_PEEK with offset not supported by kernel.
>
> jmaloy@freyr:~/passt# iperf3 -s
> -----------------------------------------------------------
> Server listening on 5201 (test #1)
> -----------------------------------------------------------
> Accepted connection from 192.168.122.1, port 44822
> [ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 44832
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-1.00 sec 1.02 GBytes 8.78 Gbits/sec
> [ 5] 1.00-2.00 sec 1.06 GBytes 9.08 Gbits/sec
> [ 5] 2.00-3.00 sec 1.07 GBytes 9.15 Gbits/sec
> [ 5] 3.00-4.00 sec 1.10 GBytes 9.46 Gbits/sec
> [ 5] 4.00-5.00 sec 1.03 GBytes 8.85 Gbits/sec
> [ 5] 5.00-6.00 sec 1.10 GBytes 9.44 Gbits/sec
> [ 5] 6.00-7.00 sec 1.11 GBytes 9.56 Gbits/sec
> [ 5] 7.00-8.00 sec 1.07 GBytes 9.20 Gbits/sec
> [ 5] 8.00-9.00 sec 667 MBytes 5.59 Gbits/sec
> [ 5] 9.00-10.00 sec 1.03 GBytes 8.83 Gbits/sec
> [ 5] 10.00-10.04 sec 30.1 MBytes 6.36 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-10.04 sec 10.3 GBytes 8.78 Gbits/sec receiver
> -----------------------------------------------------------
> Server listening on 5201 (test #2)
> -----------------------------------------------------------
> ^Ciperf3: interrupt - the server has terminated
> jmaloy@freyr:~/passt#
> logout
> [ perf record: Woken up 23 times to write data ]
> [ perf record: Captured and wrote 5.696 MB perf.data (35580 samples) ]
> jmaloy@freyr:~/passt$
>
> jmaloy@freyr:~/passt$ perf record -g ./pasta --config-net -f
> MSG_PEEK with offset supported by kernel.
>
> jmaloy@freyr:~/passt# iperf3 -s
> -----------------------------------------------------------
> Server listening on 5201 (test #1)
> -----------------------------------------------------------
> Accepted connection from 192.168.122.1, port 40854
> [ 5] local 192.168.122.180 port 5201 connected to 192.168.122.1 port 40862
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-1.00 sec 1.22 GBytes 10.5 Gbits/sec
> [ 5] 1.00-2.00 sec 1.19 GBytes 10.2 Gbits/sec
> [ 5] 2.00-3.00 sec 1.22 GBytes 10.5 Gbits/sec
> [ 5] 3.00-4.00 sec 1.11 GBytes 9.56 Gbits/sec
> [ 5] 4.00-5.00 sec 1.20 GBytes 10.3 Gbits/sec
> [ 5] 5.00-6.00 sec 1.14 GBytes 9.80 Gbits/sec
> [ 5] 6.00-7.00 sec 1.17 GBytes 10.0 Gbits/sec
> [ 5] 7.00-8.00 sec 1.12 GBytes 9.61 Gbits/sec
> [ 5] 8.00-9.00 sec 1.13 GBytes 9.74 Gbits/sec
> [ 5] 9.00-10.00 sec 1.26 GBytes 10.8 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-10.04 sec 11.8 GBytes 10.1 Gbits/sec receiver
> -----------------------------------------------------------
> Server listening on 5201 (test #2)
> -----------------------------------------------------------
> ^Ciperf3: interrupt - the server has terminated
> logout
> [ perf record: Woken up 20 times to write data ]
> [ perf record: Captured and wrote 5.040 MB perf.data (33411 samples) ]
> jmaloy@freyr:~/passt$
>
> The perf record confirms this result. Below, we can observe that the
> CPU spends significantly less time in the function ____sys_recvmsg()
> when we have offset support.
>
> Without offset support:
> ----------------------
> jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 -p ____sys_recvmsg -x --stdio -i perf.data | head -1
> 46.32% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg
>
> With offset support:
> ----------------------
> jmaloy@freyr:~/passt$ perf report -q --symbol-filter=do_syscall_64 -p ____sys_recvmsg -x --stdio -i perf.data | head -1
> 27.24% 0.00% passt.avx2 [kernel.vmlinux] [k] do_syscall_64 ____sys_recvmsg
>
> Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
> Signed-off-by: Jon Maloy <jmaloy@redhat.com>
> ---
> net/ipv4/tcp.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 1baa484d2190..82e1da3f0f98 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2351,6 +2351,20 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
> if (flags & MSG_PEEK) {
> peek_seq = tp->copied_seq;
> seq = &peek_seq;
> + if (!msg->msg_iter.__iov[0].iov_base) {
> + size_t peek_offset;
> +
> + if (msg->msg_iter.nr_segs < 2) {
> + err = -EINVAL;
> + goto out;
> + }
> + peek_offset = msg->msg_iter.__iov[0].iov_len;
> + msg->msg_iter.__iov = &msg->msg_iter.__iov[1];
> + msg->msg_iter.nr_segs -= 1;
> + msg->msg_iter.count -= peek_offset;
> + len -= peek_offset;
> + *seq += peek_offset;
> + }
IMHO this does not look like the correct interface to expose such
functionality. Doing the same with a different protocol should cause a
SIGSEGV or the like, right?
What about using/implementing SO_PEEK_OFF support instead?
Cheers,
Paolo
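A minimal sketch of what the SO_PEEK_OFF alternative looks like from user space, assuming Linux. Per socket(7), setting SO_PEEK_OFF makes subsequent MSG_PEEK reads start at that offset; the kernel advances the offset by the number of bytes peeked and moves it back as data is consumed. At the time of this thread the option was documented for unix(7) sockets only, not TCP, so it is best tried on an AF_UNIX stream socketpair. The helper names are hypothetical, and the fallback value 42 is the asm-generic constant, which differs on a few architectures.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SO_PEEK_OFF
#define SO_PEEK_OFF 42 /* asm-generic value; a few architectures differ */
#endif

/* Hypothetical helper: set the peek offset. 0 on success, -1 on error. */
static int set_peek_off(int fd, int off)
{
	return setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &off, sizeof(off));
}

/* Hypothetical helper: current peek offset (-1 means disabled), -2 on error. */
static int get_peek_off(int fd)
{
	int off;
	socklen_t len = sizeof(off);

	if (getsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &off, &len) < 0)
		return -2;
	return off;
}
```

For example, on an AF_UNIX stream socket holding "hello world", set_peek_off(fd, 6) followed by recv(fd, buf, sizeof(buf), MSG_PEEK) yields "world", after which get_peek_off() reports 11; a non-peeking recv() of 6 bytes then brings the offset back to 5. Because the offset follows the stream this way, a reader that only peeks forward may not need a setsockopt() before every recvmsg(), provided the protocol implements the option.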
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC net-next] tcp: add support for read with offset when using MSG_PEEK
2024-01-16 10:49 ` Paolo Abeni
@ 2024-01-18 22:22 ` Jon Maloy
2024-01-21 22:16 ` Stefano Brivio
0 siblings, 1 reply; 13+ messages in thread
From: Jon Maloy @ 2024-01-18 22:22 UTC (permalink / raw)
To: Paolo Abeni, netdev, davem; +Cc: kuba, passt-dev, sbrivio, lvivier, dgibson
On 2024-01-16 05:49, Paolo Abeni wrote:
> On Thu, 2024-01-11 at 18:00 -0500, jmaloy@redhat.com wrote:
>> From: Jon Maloy <jmaloy@redhat.com>
>>
>> When reading received messages from a socket with MSG_PEEK, we may want
>> to read the contents with an offset, like we can do with pread/preadv()
>> when reading files. Currently, it is not possible to do that.
[...]
>> + err = -EINVAL;
>> + goto out;
>> + }
>> + peek_offset = msg->msg_iter.__iov[0].iov_len;
>> + msg->msg_iter.__iov = &msg->msg_iter.__iov[1];
>> + msg->msg_iter.nr_segs -= 1;
>> + msg->msg_iter.count -= peek_offset;
>> + len -= peek_offset;
>> + *seq += peek_offset;
>> + }
> IMHO this does not look like the correct interface to expose such
> functionality. Doing the same with a different protocol should cause a
> SIGSEGV or the like, right?
I would expect doing the same thing with a different protocol to cause
an EFAULT, as it should. But I haven't tried it.
This is a change to TCP only, at least until somebody decides to
implement it elsewhere (why not?)
>
> What about using/implementing SO_PEEK_OFF support instead?
I looked at SO_PEEK_OFF, and it honestly looks both awkward and limited.
We would have to make frequent calls to setsockopt(), something that
would defeat much of the purpose of this feature.
I stand by my opinion here.
This feature is simple, non-intrusive, totally backwards compatible and
implies no changes to the API or ABI.
I would love to hear other opinions on this, though.
Regards
/jon
>
> Cheers,
>
> Paolo
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC net-next] tcp: add support for read with offset when using MSG_PEEK
2024-01-18 22:22 ` Jon Maloy
@ 2024-01-21 22:16 ` Stefano Brivio
2024-01-22 16:22 ` Jon Maloy
0 siblings, 1 reply; 13+ messages in thread
From: Stefano Brivio @ 2024-01-21 22:16 UTC (permalink / raw)
To: Jon Maloy, Paolo Abeni; +Cc: netdev, davem, kuba, passt-dev, lvivier, dgibson
On Thu, 18 Jan 2024 17:22:52 -0500
Jon Maloy <jmaloy@redhat.com> wrote:
> On 2024-01-16 05:49, Paolo Abeni wrote:
> > On Thu, 2024-01-11 at 18:00 -0500, jmaloy@redhat.com wrote:
> >> From: Jon Maloy <jmaloy@redhat.com>
> >>
> >> When reading received messages from a socket with MSG_PEEK, we may want
> >> to read the contents with an offset, like we can do with pread/preadv()
> >> when reading files. Currently, it is not possible to do that.
> [...]
> >> + err = -EINVAL;
> >> + goto out;
> >> + }
> >> + peek_offset = msg->msg_iter.__iov[0].iov_len;
> >> + msg->msg_iter.__iov = &msg->msg_iter.__iov[1];
> >> + msg->msg_iter.nr_segs -= 1;
> >> + msg->msg_iter.count -= peek_offset;
> >> + len -= peek_offset;
> >> + *seq += peek_offset;
> >> + }
> > IMHO this does not look like the correct interface to expose such
> > functionality. Doing the same with a different protocol should cause a
> > SIGSEGV or the like, right?
>
> I would expect doing the same thing with a different protocol to cause
> an EFAULT, as it should. But I haven't tried it.
So, out of curiosity, I actually tried: the current behaviour is
recvmsg() failing with EFAULT, only as data is received (!), for TCP
and UDP with AF_INET, and for AF_UNIX (both datagram and stream).
EFAULT, however, is not in the list of "shall fail", nor "may fail"
conditions described by POSIX.1-2008, so there isn't really anything
that mandates it API-wise.
Likewise, POSIX doesn't require any signal to be delivered (and no
signals are delivered on Linux in any case: note that iov_base is not
dereferenced).
For TCP sockets only, passing a NULL buffer is already supported by
recv() with MSG_TRUNC (same here, Linux extension). This change would
finally make recvmsg() consistent with that TCP-specific bit.
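The MSG_TRUNC precedent can be sketched as below, assuming Linux and a loopback TCP connection. tcp_discard() and tcp_loopback_pair() are hypothetical helper names. Note that this drops (consumes) the bytes rather than peeking at them, so it only illustrates that TCP already accepts a NULL buffer in this mode; it is not a replacement for the offset peek discussed in this thread.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Drop the next n bytes of a connected TCP stream without a buffer.
 * Linux-specific: with MSG_TRUNC, TCP skips the copy to user space, so
 * the NULL buffer is never dereferenced. This consumes the data.
 */
static ssize_t tcp_discard(int fd, size_t n)
{
	return recv(fd, NULL, n, MSG_TRUNC | MSG_WAITALL);
}

/* Hypothetical test helper: connected TCP pair over 127.0.0.1. */
static int tcp_loopback_pair(int *cli, int *srv)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof(addr);
	int lfd = socket(AF_INET, SOCK_STREAM, 0);

	*cli = *srv = -1;
	if (lfd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0; /* let the kernel choose a free port */

	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0)
		goto fail;
	*cli = socket(AF_INET, SOCK_STREAM, 0);
	if (*cli < 0 || connect(*cli, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto fail;
	*srv = accept(lfd, NULL, NULL);
	if (*srv < 0)
		goto fail;
	close(lfd);
	return 0;
fail:
	if (*cli >= 0)
		close(*cli);
	close(lfd);
	return -1;
}
```

For example, after the peer sends "hello world", tcp_discard(fd, 6) returns 6 and a following recv() of 5 bytes yields "world".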
> This is a change to TCP only, at least until somebody decides to
> implement it elsewhere (why not?)
Side note, I can't really think of a reasonable use case for UDP -- it
doesn't quite fit with the notion of message boundaries.
Even leaving aside the fact that passt(1) and pasta(1) don't need this
for UDP (no acknowledgement means no need to keep unacknowledged data
anywhere), if another application wants to do something conceptually
similar, we should probably target recvmmsg().
> > What about using/implementing SO_PEEK_OFF support instead?
>
> I looked at SO_PEEK_OFF, and it honestly looks both awkward and limited.
I think it's rather intended to skip headers with fixed size or
suchlike.
> We would have to make frequent calls to setsockopt(), something that
> would defeat much of the purpose of this feature.
...right, we would need to reset the SO_PEEK_OFF value at every
recvmsg(), which is probably even worse than the current overhead.
> I stand by my opinion here.
> This feature is simple, non-intrusive, totally backwards compatible and
> implies no changes to the API or ABI.
My thoughts as well, plus the advantage for our user-mode networking
case is quite remarkable given how simple the change is.
> I would love to hear other opinions on this, though.
>
> Regards
> /jon
>
> >
> > Cheers,
> >
> > Paolo
--
Stefano
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [RFC net-next] tcp: add support for read with offset when using MSG_PEEK
2024-01-21 22:16 ` Stefano Brivio
@ 2024-01-22 16:22 ` Jon Maloy
0 siblings, 0 replies; 13+ messages in thread
From: Jon Maloy @ 2024-01-22 16:22 UTC (permalink / raw)
To: Stefano Brivio, Paolo Abeni
Cc: netdev, davem, kuba, passt-dev, lvivier, dgibson
On 2024-01-21 17:16, Stefano Brivio wrote:
> On Thu, 18 Jan 2024 17:22:52 -0500
> Jon Maloy <jmaloy@redhat.com> wrote:
>
>> On 2024-01-16 05:49, Paolo Abeni wrote:
>>> On Thu, 2024-01-11 at 18:00 -0500, jmaloy@redhat.com wrote:
>>>> From: Jon Maloy <jmaloy@redhat.com>
>>>>
>>>> When reading received messages from a socket with MSG_PEEK, we may want
>>>> to read the contents with an offset, like we can do with pread/preadv()
>>>> when reading files. Currently, it is not possible to do that.
>> [...]
>>>> + err = -EINVAL;
>>>> + goto out;
>>>> + }
>>>> + peek_offset = msg->msg_iter.__iov[0].iov_len;
>>>> + msg->msg_iter.__iov = &msg->msg_iter.__iov[1];
>>>> + msg->msg_iter.nr_segs -= 1;
>>>> + msg->msg_iter.count -= peek_offset;
>>>> + len -= peek_offset;
>>>> + *seq += peek_offset;
>>>> + }
>>> IMHO this does not look like the correct interface to expose such
>>> functionality. Doing the same with a different protocol should cause a
>>> SIGSEGV or the like, right?
>> I would expect doing the same thing with a different protocol to cause
>> an EFAULT, as it should. But I haven't tried it.
> So, out of curiosity, I actually tried: the current behaviour is
> recvmsg() failing with EFAULT, only as data is received (!), for TCP
> and UDP with AF_INET, and for AF_UNIX (both datagram and stream).
>
> EFAULT, however, is not in the list of "shall fail", nor "may fail"
> conditions described by POSIX.1-2008, so there isn't really anything
> that mandates it API-wise.
>
> Likewise, POSIX doesn't require any signal to be delivered (and no
> signals are delivered on Linux in any case: note that iov_base is not
> dereferenced).
>
> For TCP sockets only, passing a NULL buffer is already supported by
> recv() with MSG_TRUNC (same here, Linux extension). This change would
> finally make recvmsg() consistent with that TCP-specific bit.
>
>> This is a change to TCP only, at least until somebody decides to
>> implement it elsewhere (why not?)
> Side note, I can't really think of a reasonable use case for UDP -- it
> doesn't quite fit with the notion of message boundaries.
>
> Even leaving aside the fact that passt(1) and pasta(1) don't need this
> for UDP (no acknowledgement means no need to keep unacknowledged data
> anywhere), if another application wants to do something conceptually
> similar, we should probably target recvmmsg().
>
>>> What about using/implementing SO_PEEK_OFF support instead?
>> I looked at SO_PEEK_OFF, and it honestly looks both awkward and limited.
> I think it's rather intended to skip headers with fixed size or
> suchlike.
>
>> We would have to make frequent calls to setsockopt(), something that
>> would defeat much of the purpose of this feature.
> ...right, we would need to reset the SO_PEEK_OFF value at every
> recvmsg(), which is probably even worse than the current overhead.
>
>> I stand by my opinion here.
>> This feature is simple, non-intrusive, totally backwards compatible and
>> implies no changes to the API or ABI.
> My thoughts as well, plus the advantage for our user-mode networking
> case is quite remarkable given how simple the change is.
After pondering this more, and after some team-internal discussions,
I have decided to give SO_PEEK_OFF a try, just to see the
outcome, both at kernel level and in user space.
So please hold off on applying this, if that ever
happens with RFCs.
///jon
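For reference, a minimal sketch of how SO_PEEK_OFF behaves. It uses an AF_UNIX stream socketpair, where Linux implements the option; nothing here assumes that a given kernel accepts it on TCP sockets. Once the offset is set, each MSG_PEEK read starts at the stored offset and advances it by the number of bytes peeked, while an ordinary (non-peek) read moves it back by the number of bytes consumed. The helper names set_peek_off() and get_peek_off() are hypothetical.

```c
#define _GNU_SOURCE
#include <sys/socket.h>

/* Hypothetical helper: set the peek offset of fd.
 * Returns 0 on success, -1 on error (errno set, e.g. EOPNOTSUPP if the
 * socket family does not implement SO_PEEK_OFF). */
static int set_peek_off(int fd, int off)
{
	return setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &off, sizeof(off));
}

/* Hypothetical helper: read back the current peek offset.
 * Returns -1 both on error and when no offset is set (the default). */
static int get_peek_off(int fd)
{
	int off = -1;
	socklen_t len = sizeof(off);

	if (getsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &off, &len) < 0)
		return -1;
	return off;
}
```

Usage: after writing "abcdefgh" into one end of the pair and calling set_peek_off(fd, 3), two recv(fd, buf, 2, MSG_PEEK) calls return "de" and then "fg"; a subsequent plain recv(fd, buf, 2, 0) returns "ab" and leaves the offset at 5, so it keeps pointing at the same byte of the stream.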
>
>> I would love to hear other opinions on this, though.
>>
>> Regards
>> /jon
>>
>>> Cheers,
>>>
>>> Paolo
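The EFAULT behaviour described above for current kernels can be reproduced with a short sketch. It uses an AF_UNIX stream socketpair, which this TCP-only patch would not change: a MSG_PEEK recvmsg() whose first iovec entry has a NULL iov_base does not fail while the queue is empty (with MSG_DONTWAIT it reports EAGAIN), and fails with EFAULT only once data is queued and the kernel tries to copy into that entry. The helper peek_with_null_first() is hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: MSG_PEEK recvmsg() whose first iovec entry has a
 * NULL base and length `skip`, followed by a real buffer of `len` bytes.
 * Returns the recvmsg() result; errno is set on failure. */
static ssize_t peek_with_null_first(int fd, size_t skip, char *buf,
				    size_t len, int extra_flags)
{
	struct iovec iov[2] = {
		{ .iov_base = NULL, .iov_len = skip },
		{ .iov_base = buf,  .iov_len = len },
	};
	struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 2 };

	return recvmsg(fd, &msg, MSG_PEEK | extra_flags);
}
```

Note that the NULL entry is only dereferenced when there is something to copy, which is why the failure shows up "only as data is received".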
* [RFC net-next] tcp: add support for read with offset when using MSG_PEEK
@ 2024-01-11 22:22 jmaloy
0 siblings, 0 replies; 13+ messages in thread
From: jmaloy @ 2024-01-11 22:22 UTC (permalink / raw)
To: netdev, davem; +Cc: kuba, passt-dev, jmaloy, sbrivio, lvivier, dgibson
From: Jon Maloy <jmaloy@redhat.com>
When reading received messages with MSG_PEEK, we sometimes have to read
the leading bytes of the stream several times, only to reach the bytes
we really want. This is clearly non-optimal.
What we want is something similar to pread()/preadv(), but working
for TCP sockets as well. At the same time, we don't want to add any new
arguments to the recv()/recvmsg() calls.
In this commit, when MSG_PEEK is set, we allow the user to set
iovec.iov_base in the first vector entry to NULL. This tells the socket
to skip the first entry, and lets the iov_len field of that entry
indicate the offset value. This way, there is no need to add any new
arguments or flags.
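A minimal user-space sketch of the calling convention described above, assuming a kernel carrying this patch (an unpatched kernel does not interpret the NULL entry as an offset). The helper names build_peek_msg() and peek_at() are hypothetical; only the layout of the iovec array follows the description.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: fill iov[0..1] and msg so that a MSG_PEEK
 * recvmsg() on a patched kernel skips `off` bytes and copies up to
 * `len` bytes into `buf`. */
static void build_peek_msg(struct msghdr *msg, struct iovec iov[2],
			   size_t off, void *buf, size_t len)
{
	iov[0].iov_base = NULL;	/* marker: this entry carries the offset */
	iov[0].iov_len = off;	/* bytes to skip past the unread head */
	iov[1].iov_base = buf;	/* destination for the peeked bytes */
	iov[1].iov_len = len;

	*msg = (struct msghdr){ .msg_iov = iov, .msg_iovlen = 2 };
}

/* Hypothetical helper: peek `len` bytes starting `off` bytes past the
 * unread head of the TCP socket `fd` (patched kernel assumed). */
static ssize_t peek_at(int fd, size_t off, void *buf, size_t len)
{
	struct iovec iov[2];
	struct msghdr msg;

	build_peek_msg(&msg, iov, off, buf, len);
	return recvmsg(fd, &msg, MSG_PEEK);
}
```

Compared to a plain MSG_PEEK into a buffer of off + len bytes, the leading `off` bytes are never copied to user space, which is where the saving is expected to come from.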
In the iperf3 log examples shown below, we can observe a throughput
improvement of ~20 % in the direction host->namespace when using the
protocol splicer 'passt'. This is a consistent result.
$ ./passt/passt/pasta --config-net -f
MSG_PEEK with offset not supported.
[root@fedora37 ~]# perf record iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 192.168.122.1, port 60344
[ 6] local 192.168.122.163 port 5201 connected to 192.168.122.1 port 60360
[ ID] Interval Transfer Bitrate
[...]
[ 6] 13.00-14.00 sec 2.54 GBytes 21.8 Gbits/sec
[ 6] 14.00-15.00 sec 2.52 GBytes 21.7 Gbits/sec
[ 6] 15.00-16.00 sec 2.50 GBytes 21.5 Gbits/sec
[ 6] 16.00-17.00 sec 2.49 GBytes 21.4 Gbits/sec
[ 6] 17.00-18.00 sec 2.51 GBytes 21.6 Gbits/sec
[ 6] 18.00-19.00 sec 2.48 GBytes 21.3 Gbits/sec
[ 6] 19.00-20.00 sec 2.49 GBytes 21.4 Gbits/sec
[ 6] 20.00-20.04 sec 87.4 MBytes 19.2 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 6] 0.00-20.04 sec 48.9 GBytes 21.0 Gbits/sec receiver
-----------------------------------------------------------
[jmaloy@fedora37 ~]$ ./passt/passt/pasta --config-net -f
MSG_PEEK with offset supported.
[root@fedora37 ~]# perf record iperf3 -s
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 192.168.122.1, port 46362
[ 6] local 192.168.122.163 port 5201 connected to 192.168.122.1 port 46374
[ ID] Interval Transfer Bitrate
[...]
[ 6] 12.00-13.00 sec 3.18 GBytes 27.3 Gbits/sec
[ 6] 13.00-14.00 sec 3.17 GBytes 27.3 Gbits/sec
[ 6] 14.00-15.00 sec 3.13 GBytes 26.9 Gbits/sec
[ 6] 15.00-16.00 sec 3.17 GBytes 27.3 Gbits/sec
[ 6] 16.00-17.00 sec 3.17 GBytes 27.2 Gbits/sec
[ 6] 17.00-18.00 sec 3.14 GBytes 27.0 Gbits/sec
[ 6] 18.00-19.00 sec 3.17 GBytes 27.2 Gbits/sec
[ 6] 19.00-20.00 sec 3.12 GBytes 26.8 Gbits/sec
[ 6] 20.00-20.04 sec 119 MBytes 25.5 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 6] 0.00-20.04 sec 59.4 GBytes 25.4 Gbits/sec receiver
-----------------------------------------------------------
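Taking the two receiver summary lines above (21.0 and 25.4 Gbits/sec, each over 20.04 s), the relative gain is plain arithmetic on the logged figures, not an additional measurement:

```latex
\frac{25.4 - 21.0}{21.0} = \frac{4.4}{21.0} \approx 0.21
\qquad
\frac{59.4\ \text{GBytes}}{48.9\ \text{GBytes}} \approx 1.21
```

Both the bitrate and the transferred volume point to roughly 21 %, consistent with the ~20 % stated above.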
Passt is used to support VMs in containers, such as KubeVirt, and
is also generally supported in libvirt/QEMU since release 9.2 / 7.2.
Signed-off-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Jon Paul Maloy <jmaloy@redhat.com>
---
net/ipv4/tcp.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 53bcc17c91e4..e9d3b5bf2f66 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2310,6 +2310,7 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
int *cmsg_flags)
{
struct tcp_sock *tp = tcp_sk(sk);
+ size_t peek_offset;
int copied = 0;
u32 peek_seq;
u32 *seq;
@@ -2353,6 +2354,20 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
if (flags & MSG_PEEK) {
peek_seq = tp->copied_seq;
seq = &peek_seq;
+ if (!msg->msg_iter.__iov[0].iov_base) {
+ peek_offset = msg->msg_iter.__iov[0].iov_len;
+ msg->msg_iter.__iov = &msg->msg_iter.__iov[1];
+ if (msg->msg_iter.nr_segs <= 1)
+ goto out;
+ msg->msg_iter.nr_segs -= 1;
+ if (msg->msg_iter.count <= peek_offset)
+ goto out;
+ msg->msg_iter.count -= peek_offset;
+ if (len <= peek_offset)
+ goto out;
+ len -= peek_offset;
+ *seq += peek_offset;
+ }
}
target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
--
2.39.0
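To make the bookkeeping in the hunk above easier to follow, here is a user-space model of it. It is only a sketch: struct iter_model and skip_peek_offset() are hypothetical stand-ins for the fields of struct iov_iter the hunk touches and for the added block, and the model returns -1 where the patch jumps to its out label. Like the hunk, the model assumes an iovec-backed iterator. Because count and len initially include the first entry's iov_len, subtracting the offset leaves exactly the sum of the remaining entries.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical stand-in for the iov_iter fields used by the hunk. */
struct iter_model {
	const struct iovec *iov;	/* models msg->msg_iter.__iov */
	unsigned long nr_segs;		/* models msg->msg_iter.nr_segs */
	size_t count;			/* models msg->msg_iter.count */
};

/* Mirrors the added block: if iov[0].iov_base is NULL, consume that
 * entry as a peek offset, adjusting the iterator, *len and *seq in place.
 * Returns the offset (0 if there is no offset entry, or it is empty),
 * or -1 where the patch would 'goto out'. */
static ssize_t skip_peek_offset(struct iter_model *it, size_t *len,
				uint32_t *seq)
{
	size_t peek_offset;

	if (it->iov[0].iov_base)
		return 0;			/* ordinary MSG_PEEK */

	peek_offset = it->iov[0].iov_len;
	it->iov = &it->iov[1];			/* drop the offset entry */
	if (it->nr_segs <= 1)
		return -1;			/* no data buffer left */
	it->nr_segs -= 1;
	if (it->count <= peek_offset)
		return -1;			/* nothing to copy into */
	it->count -= peek_offset;
	if (*len <= peek_offset)
		return -1;
	*len -= peek_offset;
	*seq += (uint32_t)peek_offset;		/* u32 add, wraps like TCP seqs */
	return (ssize_t)peek_offset;
}
```

For example, an iovec array of { NULL, 100 }, { a, 50 }, { b, 30 } with count and len both 180 ends up with two segments, count and len of 80, and the peek sequence advanced by 100.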
end of thread
Thread overview: 13+ messages
2024-01-15 21:51 [RFC,net-next] tcp: add support for read with offset when using MSG_PEEK Martin Zaharinov
2024-01-15 22:41 ` Jon Maloy
2024-01-16 4:59 ` Martin Zaharinov
2024-01-17 16:33 ` Jon Maloy
2024-01-17 17:11 ` Martin Zaharinov
2024-01-26 15:01 ` Martin Zaharinov
2024-01-28 18:52 ` Jon Maloy
-- strict thread matches above, loose matches on Subject: below --
2024-01-11 23:00 [RFC net-next] " jmaloy
2024-01-16 10:49 ` Paolo Abeni
2024-01-18 22:22 ` Jon Maloy
2024-01-21 22:16 ` Stefano Brivio
2024-01-22 16:22 ` Jon Maloy
2024-01-11 22:22 jmaloy