* [PATCH] net: micro-optimize skb_datagram_iter @ 2024-06-13 11:35 Sagi Grimberg 2024-06-15 2:40 ` patchwork-bot+netdevbpf 2024-06-16 8:06 ` kernel test robot 0 siblings, 2 replies; 17+ messages in thread From: Sagi Grimberg @ 2024-06-13 11:35 UTC (permalink / raw) To: netdev; +Cc: Eric Dumazet, Jakub Kicinski We only use the mapping in a single context in a short and contained scope, so kmap_local_page is sufficient and cheaper. This will also allow skb_datagram_iter to be called from softirq context. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> --- net/core/datagram.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/core/datagram.c b/net/core/datagram.c index a8b625abe242..ac74df248205 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -436,14 +436,14 @@ static int __skb_datagram_iter(const struct sk_buff *skb, int offset, end = start + skb_frag_size(frag); if ((copy = end - offset) > 0) { struct page *page = skb_frag_page(frag); - u8 *vaddr = kmap(page); + u8 *vaddr = kmap_local_page(page); if (copy > len) copy = len; n = INDIRECT_CALL_1(cb, simple_copy_to_iter, vaddr + skb_frag_off(frag) + offset - start, copy, data, to); - kunmap(page); + kunmap_local(vaddr); offset += n; if (n != copy) goto short_copy; -- 2.43.0 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-13 11:35 [PATCH] net: micro-optimize skb_datagram_iter Sagi Grimberg @ 2024-06-15 2:40 ` patchwork-bot+netdevbpf 2024-06-16 8:06 ` kernel test robot 1 sibling, 0 replies; 17+ messages in thread From: patchwork-bot+netdevbpf @ 2024-06-15 2:40 UTC (permalink / raw) To: Sagi Grimberg; +Cc: netdev, edumazet, kuba Hello: This patch was applied to netdev/net-next.git (main) by Jakub Kicinski <kuba@kernel.org>: On Thu, 13 Jun 2024 14:35:04 +0300 you wrote: > We only use the mapping in a single context in a short and contained scope, > so kmap_local_page is sufficient and cheaper. This will also allow > skb_datagram_iter to be called from softirq context. > > Signed-off-by: Sagi Grimberg <sagi@grimberg.me> > --- > net/core/datagram.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) Here is the summary with links: - net: micro-optimize skb_datagram_iter https://git.kernel.org/netdev/net-next/c/934c29999b57 You are awesome, thank you! -- Deet-doot-dot, I am a bot. https://korg.docs.kernel.org/patchwork/pwbot.html ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-13 11:35 [PATCH] net: micro-optimize skb_datagram_iter Sagi Grimberg 2024-06-15 2:40 ` patchwork-bot+netdevbpf @ 2024-06-16 8:06 ` kernel test robot 2024-06-16 9:24 ` Sagi Grimberg 1 sibling, 1 reply; 17+ messages in thread From: kernel test robot @ 2024-06-16 8:06 UTC (permalink / raw) To: Sagi Grimberg Cc: oe-lkp, lkp, netdev, Eric Dumazet, Jakub Kicinski, oliver.sang Hello, kernel test robot noticed "kernel_BUG_at_mm/usercopy.c" on: commit: 18f0423b9eccb781310af6709ceebf654175af14 ("[PATCH] net: micro-optimize skb_datagram_iter") url: https://github.com/intel-lab-lkp/linux/commits/Sagi-Grimberg/net-micro-optimize-skb_datagram_iter/20240613-193620 base: https://git.kernel.org/cgit/linux/kernel/git/davem/net.git b60b1bdc1888f51da7a2a22c48c5f1eb2bd12e97 patch link: https://lore.kernel.org/all/20240613113504.1079860-1-sagi@grimberg.me/ patch subject: [PATCH] net: micro-optimize skb_datagram_iter in testcase: boot compiler: gcc-8 test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G (please refer to attached dmesg/kmsg for entire log/backtrace) +------------------------------------------+------------+------------+ | | b60b1bdc18 | 18f0423b9e | +------------------------------------------+------------+------------+ | boot_successes | 6 | 0 | | boot_failures | 0 | 6 | | kernel_BUG_at_mm/usercopy.c | 0 | 6 | | Oops:invalid_opcode:#[##]PREEMPT_SMP | 0 | 6 | | EIP:usercopy_abort | 0 | 6 | | Kernel_panic-not_syncing:Fatal_exception | 0 | 6 | +------------------------------------------+------------+------------+ If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202406161539.b5ff7b20-oliver.sang@intel.com [ 13.495377][ T189] ------------[ cut here ]------------ [ 13.495862][ T189] kernel BUG at mm/usercopy.c:102! [ 13.496372][ T189] Oops: invalid opcode: 0000 [#1] PREEMPT SMP [ 13.496927][ T189] CPU: 0 PID: 189 Comm: systemctl Not tainted 6.10.0-rc2-00258-g18f0423b9ecc #1 [ 13.497741][ T189] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12)) [ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size 8192)! [ 13.499647][ T189] Code: d6 89 44 24 0c 0f 45 cf 8b 7d 0c 89 74 24 10 89 4c 24 04 c7 04 24 a4 55 8a d6 89 7c 24 20 8b 7d 08 89 7c 24 1c e8 20 3c df ff <0f> 0b b8 80 91 d7 d6 e8 a8 de 68 00 ba 3d 17 86 d6 89 55 f0 89 d6 All code ======== 0: d6 (bad) 1: 89 44 24 0c mov %eax,0xc(%rsp) 5: 0f 45 cf cmovne %edi,%ecx 8: 8b 7d 0c mov 0xc(%rbp),%edi b: 89 74 24 10 mov %esi,0x10(%rsp) f: 89 4c 24 04 mov %ecx,0x4(%rsp) 13: c7 04 24 a4 55 8a d6 movl $0xd68a55a4,(%rsp) 1a: 89 7c 24 20 mov %edi,0x20(%rsp) 1e: 8b 7d 08 mov 0x8(%rbp),%edi 21: 89 7c 24 1c mov %edi,0x1c(%rsp) 25: e8 20 3c df ff call 0xffffffffffdf3c4a 2a:* 0f 0b ud2 <-- trapping instruction 2c: b8 80 91 d7 d6 mov $0xd6d79180,%eax 31: e8 a8 de 68 00 call 0x68dede 36: ba 3d 17 86 d6 mov $0xd686173d,%edx 3b: 89 55 f0 mov %edx,-0x10(%rbp) 3e: 89 d6 mov %edx,%esi Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: b8 80 91 d7 d6 mov $0xd6d79180,%eax 7: e8 a8 de 68 00 call 0x68deb4 c: ba 3d 17 86 d6 mov $0xd686173d,%edx 11: 89 55 f0 mov %edx,-0x10(%rbp) 14: 89 d6 mov %edx,%esi [ 13.500502][ T194] ------------[ cut here ]------------ [ 13.502187][ T189] EAX: 00000052 EBX: d680da68 ECX: e0435480 EDX: 01000232 [ 13.502666][ T194] kernel BUG at mm/usercopy.c:102! [ 13.503236][ T189] ESI: d686173d EDI: 00000000 EBP: ece37c44 ESP: ece37c10 [ 13.504266][ T189] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286 [ 13.504856][ T189] CR0: 80050033 CR2: 0135eb6c CR3: 2beff000 CR4: 000406d0 [ 13.505464][ T189] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 13.506083][ T189] DR6: fffe0ff0 DR7: 00000400 [ 13.506495][ T189] Call Trace: [ 13.506795][ T189] ? show_regs (arch/x86/kernel/dumpstack.c:479) [ 13.507187][ T189] ? __die_body (arch/x86/kernel/dumpstack.c:421) [ 13.507576][ T189] ? die (arch/x86/kernel/dumpstack.c:449) [ 13.507894][ T189] ? do_trap (arch/x86/kernel/traps.c:114 arch/x86/kernel/traps.c:155) [ 13.508270][ T189] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359 kernel/locking/lockdep.c:4311) [ 13.508783][ T189] ? do_error_trap (arch/x86/include/asm/traps.h:58 arch/x86/kernel/traps.c:176) [ 13.509182][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12)) [ 13.509588][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252) [ 13.509983][ T189] ? exc_invalid_op (arch/x86/kernel/traps.c:267) [ 13.510396][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12)) [ 13.510797][ T189] ? handle_exception (arch/x86/entry/entry_32.S:1054) [ 13.511242][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252) [ 13.511646][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12)) [ 13.512070][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252) [ 13.512434][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12)) [ 13.512832][ T189] __check_object_size (mm/usercopy.c:180 mm/usercopy.c:251 mm/usercopy.c:213) [ 13.513275][ T189] simple_copy_to_iter (include/linux/uio.h:196 net/core/datagram.c:513) [ 13.513693][ T189] __skb_datagram_iter (net/core/datagram.c:424 (discriminator 1)) [ 13.514138][ T189] skb_copy_datagram_iter (net/core/datagram.c:529) [ 13.514606][ T189] ? skb_free_datagram (net/core/datagram.c:512) [ 13.515028][ T189] ? scm_stat_del (net/unix/af_unix.c:2883) [ 13.515429][ T189] unix_stream_read_actor (net/unix/af_unix.c:2889) [ 13.515884][ T189] unix_stream_read_generic (net/unix/af_unix.c:2805) [ 13.516377][ T189] ? cma_for_each_area (mm/page_ext.c:518) [ 13.516826][ T189] ? unix_stream_splice_read (net/unix/af_unix.c:2907) [ 13.517301][ T189] unix_stream_recvmsg (net/unix/af_unix.c:2923) [ 13.517720][ T189] ? scm_stat_del (net/unix/af_unix.c:2883) [ 13.518108][ T189] ____sys_recvmsg (net/socket.c:1046 net/socket.c:1068 net/socket.c:2804) [ 13.518527][ T189] ? import_iovec (lib/iov_iter.c:1348) [ 13.518930][ T189] ? copy_msghdr_from_user (net/socket.c:2525) [ 13.519396][ T189] ___sys_recvmsg (net/socket.c:2846) [ 13.519811][ T189] ? __fdget (fs/file.c:1160) [ 13.520186][ T189] ? sockfd_lookup_light (net/socket.c:558) [ 13.520643][ T189] __sys_recvmsg (include/linux/file.h:34 net/socket.c:2878) [ 13.521046][ T189] __ia32_sys_socketcall (net/socket.c:3173 net/socket.c:3077 net/socket.c:3077) [ 13.521513][ T189] ia32_sys_call (arch/x86/entry/syscall_32.c:42) [ 13.521923][ T189] __do_fast_syscall_32 (arch/x86/entry/common.c:165 arch/x86/entry/common.c:386) [ 13.522362][ T189] do_fast_syscall_32 (arch/x86/entry/common.c:411) [ 13.522787][ T189] do_SYSENTER_32 (arch/x86/entry/common.c:450) [ 13.523188][ T189] entry_SYSENTER_32 (arch/x86/entry/entry_32.S:836) [ 13.523613][ T189] EIP: 0xb7ee6579 [ 13.523933][ T189] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76 All code ======== 0: b8 01 10 06 03 mov $0x3061001,%eax 5: 74 b4 je 0xffffffffffffffbb 7: 01 10 add %edx,(%rax) 9: 07 (bad) a: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi e: 10 08 adc %cl,(%rax) 10: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi ... 20: 00 51 52 add %dl,0x52(%rcx) 23: 55 push %rbp 24:* 89 e5 mov %esp,%ebp <-- trapping instruction 26: 0f 34 sysenter 28: cd 80 int $0x80 2a: 5d pop %rbp 2b: 5a pop %rdx 2c: 59 pop %rcx 2d: c3 ret 2e: 90 nop 2f: 90 nop 30: 90 nop 31: 90 nop 32: 8d 76 00 lea 0x0(%rsi),%esi 35: 58 pop %rax 36: b8 77 00 00 00 mov $0x77,%eax 3b: cd 80 int $0x80 3d: 90 nop 3e: 8d .byte 0x8d 3f: 76 .byte 0x76 Code starting with the faulting instruction =========================================== 0: 5d pop %rbp 1: 5a pop %rdx 2: 59 pop %rcx 3: c3 ret 4: 90 nop 5: 90 nop 6: 90 nop 7: 90 nop 8: 8d 76 00 lea 0x0(%rsi),%esi b: 58 pop %rax c: b8 77 00 00 00 mov $0x77,%eax 11: cd 80 int $0x80 13: 90 nop 14: 8d .byte 0x8d 15: 76 .byte 0x76 [ 13.525624][ T189] EAX: ffffffda EBX: 00000011 ECX: bfdf5450 EDX: 00000000 [ 13.526233][ T189] ESI: b7c46000 EDI: bfdf54ac EBP: bfdf54a8 ESP: bfdf5440 [ 13.526853][ T189] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000286 [ 13.527519][ T189] Modules linked in: i2c_piix4(+) agpgart(+) qemu_fw_cfg button fuse drm drm_panel_orientation_quirks ip_tables [ 13.528566][ T194] Oops: invalid opcode: 0000 [#2] PREEMPT SMP [ 13.528804][ T189] ---[ end trace 0000000000000000 ]--- [ 13.529217][ T194] CPU: 1 PID: 194 Comm: systemctl Tainted: G D 6.10.0-rc2-00258-g18f0423b9ecc #1 [ 13.529725][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12)) [ 13.530536][ T194] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 13.531060][ T189] Code: d6 89 44 24 0c 0f 45 cf 8b 7d 0c 89 74 24 10 89 4c 24 04 c7 04 24 a4 55 8a d6 89 7c 24 20 8b 7d 08 89 7c 24 1c e8 20 3c df ff <0f> 0b b8 80 91 d7 d6 e8 a8 de 68 00 ba 3d 17 86 d6 89 55 f0 89 d6 All code ======== 0: d6 (bad) 1: 89 44 24 0c mov %eax,0xc(%rsp) 5: 0f 45 cf cmovne %edi,%ecx 8: 8b 7d 0c mov 0xc(%rbp),%edi b: 89 74 24 10 mov %esi,0x10(%rsp) f: 89 4c 24 04 mov %ecx,0x4(%rsp) 13: c7 04 24 a4 55 8a d6 movl $0xd68a55a4,(%rsp) 1a: 89 7c 24 20 mov %edi,0x20(%rsp) 1e: 8b 7d 08 mov 0x8(%rbp),%edi 21: 89 7c 24 1c mov %edi,0x1c(%rsp) 25: e8 20 3c df ff call 0xffffffffffdf3c4a 2a:* 0f 0b ud2 <-- trapping instruction 2c: b8 80 91 d7 d6 mov $0xd6d79180,%eax 31: e8 a8 de 68 00 call 0x68dede 36: ba 3d 17 86 d6 mov $0xd686173d,%edx 3b: 89 55 f0 mov %edx,-0x10(%rbp) 3e: 89 d6 mov %edx,%esi Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: b8 80 91 d7 d6 mov $0xd6d79180,%eax 7: e8 a8 de 68 00 call 0x68deb4 c: ba 3d 17 86 d6 mov $0xd686173d,%edx 11: 89 55 f0 mov %edx,-0x10(%rbp) 14: 89 d6 mov %edx,%esi The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20240616/202406161539.b5ff7b20-oliver.sang@intel.com -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-16 8:06 ` kernel test robot @ 2024-06-16 9:24 ` Sagi Grimberg 2024-06-16 21:51 ` David Laight 2024-06-16 21:56 ` Matthew Wilcox 0 siblings, 2 replies; 17+ messages in thread From: Sagi Grimberg @ 2024-06-16 9:24 UTC (permalink / raw) To: kernel test robot, Matthew Wilcox Cc: oe-lkp, lkp, netdev, Eric Dumazet, Jakub Kicinski On 16/06/2024 11:06, kernel test robot wrote: > > Hello, > > kernel test robot noticed "kernel_BUG_at_mm/usercopy.c" on: > > commit: 18f0423b9eccb781310af6709ceebf654175af14 ("[PATCH] net: micro-optimize skb_datagram_iter") > url: https://github.com/intel-lab-lkp/linux/commits/Sagi-Grimberg/net-micro-optimize-skb_datagram_iter/20240613-193620 > base: https://git.kernel.org/cgit/linux/kernel/git/davem/net.git b60b1bdc1888f51da7a2a22c48c5f1eb2bd12e97 > patch link: https://lore.kernel.org/all/20240613113504.1079860-1-sagi@grimberg.me/ > patch subject: [PATCH] net: micro-optimize skb_datagram_iter > > in testcase: boot > > compiler: gcc-8 > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G > > (please refer to attached dmesg/kmsg for entire log/backtrace) > > > +------------------------------------------+------------+------------+ > | | b60b1bdc18 | 18f0423b9e | > +------------------------------------------+------------+------------+ > | boot_successes | 6 | 0 | > | boot_failures | 0 | 6 | > | kernel_BUG_at_mm/usercopy.c | 0 | 6 | > | Oops:invalid_opcode:#[##]PREEMPT_SMP | 0 | 6 | > | EIP:usercopy_abort | 0 | 6 | > | Kernel_panic-not_syncing:Fatal_exception | 0 | 6 | > +------------------------------------------+------------+------------+ > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > the same patch/commit), kindly add following tags > | Reported-by: kernel test robot <oliver.sang@intel.com> > | Closes: https://lore.kernel.org/oe-lkp/202406161539.b5ff7b20-oliver.sang@intel.com > > > [ 13.495377][ T189] ------------[ cut here ]------------ > [ 13.495862][ T189] kernel BUG at mm/usercopy.c:102! > [ 13.496372][ T189] Oops: invalid opcode: 0000 [#1] PREEMPT SMP > [ 13.496927][ T189] CPU: 0 PID: 189 Comm: systemctl Not tainted 6.10.0-rc2-00258-g18f0423b9ecc #1 > [ 13.497741][ T189] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12)) > [ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size 8192)! Hmm, not sure I understand exactly why changing kmap() to kmap_local_page() expose this, but it looks like mm/usercopy does not like size=8192 when copying for the skb frag. quick git browse directs to: -- commit 4e140f59d285c1ca1e5c81b4c13e27366865bd09 Author: Matthew Wilcox (Oracle) <willy@infradead.org> Date: Mon Jan 10 23:15:27 2022 +0000 mm/usercopy: Check kmap addresses properly If you are copying to an address in the kmap region, you may not copy across a page boundary, no matter what the size of the underlying allocation. You can't kmap() a slab page because slab pages always come from low memory. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220110231530.665970-2-willy@infradead.org -- CCing willy. The documentation suggest that under single-context usage kmap() can be freely converted to kmap_local_page()? But seems that when using kmap() the size is not an issue, still trying to understand why. > [ 13.499647][ T189] Code: d6 89 44 24 0c 0f 45 cf 8b 7d 0c 89 74 24 10 89 4c 24 04 c7 04 24 a4 55 8a d6 89 7c 24 20 8b 7d 08 89 7c 24 1c e8 20 3c df ff <0f> 0b b8 80 91 d7 d6 e8 a8 de 68 00 ba 3d 17 86 d6 89 55 f0 89 d6 > All code > ======== > 0: d6 (bad) > 1: 89 44 24 0c mov %eax,0xc(%rsp) > 5: 0f 45 cf cmovne %edi,%ecx > 8: 8b 7d 0c mov 0xc(%rbp),%edi > b: 89 74 24 10 mov %esi,0x10(%rsp) > f: 89 4c 24 04 mov %ecx,0x4(%rsp) > 13: c7 04 24 a4 55 8a d6 movl $0xd68a55a4,(%rsp) > 1a: 89 7c 24 20 mov %edi,0x20(%rsp) > 1e: 8b 7d 08 mov 0x8(%rbp),%edi > 21: 89 7c 24 1c mov %edi,0x1c(%rsp) > 25: e8 20 3c df ff call 0xffffffffffdf3c4a > 2a:* 0f 0b ud2 <-- trapping instruction > 2c: b8 80 91 d7 d6 mov $0xd6d79180,%eax > 31: e8 a8 de 68 00 call 0x68dede > 36: ba 3d 17 86 d6 mov $0xd686173d,%edx > 3b: 89 55 f0 mov %edx,-0x10(%rbp) > 3e: 89 d6 mov %edx,%esi > > Code starting with the faulting instruction > =========================================== > 0: 0f 0b ud2 > 2: b8 80 91 d7 d6 mov $0xd6d79180,%eax > 7: e8 a8 de 68 00 call 0x68deb4 > c: ba 3d 17 86 d6 mov $0xd686173d,%edx > 11: 89 55 f0 mov %edx,-0x10(%rbp) > 14: 89 d6 mov %edx,%esi > [ 13.500502][ T194] ------------[ cut here ]------------ > [ 13.502187][ T189] EAX: 00000052 EBX: d680da68 ECX: e0435480 EDX: 01000232 > [ 13.502666][ T194] kernel BUG at mm/usercopy.c:102! > [ 13.503236][ T189] ESI: d686173d EDI: 00000000 EBP: ece37c44 ESP: ece37c10 > [ 13.504266][ T189] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286 > [ 13.504856][ T189] CR0: 80050033 CR2: 0135eb6c CR3: 2beff000 CR4: 000406d0 > [ 13.505464][ T189] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 > [ 13.506083][ T189] DR6: fffe0ff0 DR7: 00000400 > [ 13.506495][ T189] Call Trace: > [ 13.506795][ T189] ? show_regs (arch/x86/kernel/dumpstack.c:479) > [ 13.507187][ T189] ? __die_body (arch/x86/kernel/dumpstack.c:421) > [ 13.507576][ T189] ? die (arch/x86/kernel/dumpstack.c:449) > [ 13.507894][ T189] ? do_trap (arch/x86/kernel/traps.c:114 arch/x86/kernel/traps.c:155) > [ 13.508270][ T189] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359 kernel/locking/lockdep.c:4311) > [ 13.508783][ T189] ? do_error_trap (arch/x86/include/asm/traps.h:58 arch/x86/kernel/traps.c:176) > [ 13.509182][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12)) > [ 13.509588][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252) > [ 13.509983][ T189] ? exc_invalid_op (arch/x86/kernel/traps.c:267) > [ 13.510396][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12)) > [ 13.510797][ T189] ? handle_exception (arch/x86/entry/entry_32.S:1054) > [ 13.511242][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252) > [ 13.511646][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12)) > [ 13.512070][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252) > [ 13.512434][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12)) > [ 13.512832][ T189] __check_object_size (mm/usercopy.c:180 mm/usercopy.c:251 mm/usercopy.c:213) > [ 13.513275][ T189] simple_copy_to_iter (include/linux/uio.h:196 net/core/datagram.c:513) > [ 13.513693][ T189] __skb_datagram_iter (net/core/datagram.c:424 (discriminator 1)) > [ 13.514138][ T189] skb_copy_datagram_iter (net/core/datagram.c:529) > [ 13.514606][ T189] ? skb_free_datagram (net/core/datagram.c:512) > [ 13.515028][ T189] ? scm_stat_del (net/unix/af_unix.c:2883) > [ 13.515429][ T189] unix_stream_read_actor (net/unix/af_unix.c:2889) > [ 13.515884][ T189] unix_stream_read_generic (net/unix/af_unix.c:2805) > [ 13.516377][ T189] ? cma_for_each_area (mm/page_ext.c:518) > [ 13.516826][ T189] ? unix_stream_splice_read (net/unix/af_unix.c:2907) > [ 13.517301][ T189] unix_stream_recvmsg (net/unix/af_unix.c:2923) > [ 13.517720][ T189] ? scm_stat_del (net/unix/af_unix.c:2883) > [ 13.518108][ T189] ____sys_recvmsg (net/socket.c:1046 net/socket.c:1068 net/socket.c:2804) > [ 13.518527][ T189] ? import_iovec (lib/iov_iter.c:1348) > [ 13.518930][ T189] ? copy_msghdr_from_user (net/socket.c:2525) > [ 13.519396][ T189] ___sys_recvmsg (net/socket.c:2846) > [ 13.519811][ T189] ? __fdget (fs/file.c:1160) > [ 13.520186][ T189] ? sockfd_lookup_light (net/socket.c:558) > [ 13.520643][ T189] __sys_recvmsg (include/linux/file.h:34 net/socket.c:2878) > [ 13.521046][ T189] __ia32_sys_socketcall (net/socket.c:3173 net/socket.c:3077 net/socket.c:3077) > [ 13.521513][ T189] ia32_sys_call (arch/x86/entry/syscall_32.c:42) > [ 13.521923][ T189] __do_fast_syscall_32 (arch/x86/entry/common.c:165 arch/x86/entry/common.c:386) > [ 13.522362][ T189] do_fast_syscall_32 (arch/x86/entry/common.c:411) > [ 13.522787][ T189] do_SYSENTER_32 (arch/x86/entry/common.c:450) > [ 13.523188][ T189] entry_SYSENTER_32 (arch/x86/entry/entry_32.S:836) > [ 13.523613][ T189] EIP: 0xb7ee6579 > [ 13.523933][ T189] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76 > All code > ======== > 0: b8 01 10 06 03 mov $0x3061001,%eax > 5: 74 b4 je 0xffffffffffffffbb > 7: 01 10 add %edx,(%rax) > 9: 07 (bad) > a: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi > e: 10 08 adc %cl,(%rax) > 10: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi > ... > 20: 00 51 52 add %dl,0x52(%rcx) > 23: 55 push %rbp > 24:* 89 e5 mov %esp,%ebp <-- trapping instruction > 26: 0f 34 sysenter > 28: cd 80 int $0x80 > 2a: 5d pop %rbp > 2b: 5a pop %rdx > 2c: 59 pop %rcx > 2d: c3 ret > 2e: 90 nop > 2f: 90 nop > 30: 90 nop > 31: 90 nop > 32: 8d 76 00 lea 0x0(%rsi),%esi > 35: 58 pop %rax > 36: b8 77 00 00 00 mov $0x77,%eax > 3b: cd 80 int $0x80 > 3d: 90 nop > 3e: 8d .byte 0x8d > 3f: 76 .byte 0x76 > > Code starting with the faulting instruction > =========================================== > 0: 5d pop %rbp > 1: 5a pop %rdx > 2: 59 pop %rcx > 3: c3 ret > 4: 90 nop > 5: 90 nop > 6: 90 nop > 7: 90 nop > 8: 8d 76 00 lea 0x0(%rsi),%esi > b: 58 pop %rax > c: b8 77 00 00 00 mov $0x77,%eax > 11: cd 80 int $0x80 > 13: 90 nop > 14: 8d .byte 0x8d > 15: 76 .byte 0x76 > [ 13.525624][ T189] EAX: ffffffda EBX: 00000011 ECX: bfdf5450 EDX: 00000000 > [ 13.526233][ T189] ESI: b7c46000 EDI: bfdf54ac EBP: bfdf54a8 ESP: bfdf5440 > [ 13.526853][ T189] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000286 > [ 13.527519][ T189] Modules linked in: i2c_piix4(+) agpgart(+) qemu_fw_cfg button fuse drm drm_panel_orientation_quirks ip_tables > [ 13.528566][ T194] Oops: invalid opcode: 0000 [#2] PREEMPT SMP > [ 13.528804][ T189] ---[ end trace 0000000000000000 ]--- > [ 13.529217][ T194] CPU: 1 PID: 194 Comm: systemctl Tainted: G D 6.10.0-rc2-00258-g18f0423b9ecc #1 > [ 13.529725][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12)) > [ 13.530536][ T194] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > [ 13.531060][ T189] Code: d6 89 44 24 0c 0f 45 cf 8b 7d 0c 89 74 24 10 89 4c 24 04 c7 04 24 a4 55 8a d6 89 7c 24 20 8b 7d 08 89 7c 24 1c e8 20 3c df ff <0f> 0b b8 80 91 d7 d6 e8 a8 de 68 00 ba 3d 17 86 d6 89 55 f0 89 d6 > All code > ======== > 0: d6 (bad) > 1: 89 44 24 0c mov %eax,0xc(%rsp) > 5: 0f 45 cf cmovne %edi,%ecx > 8: 8b 7d 0c mov 0xc(%rbp),%edi > b: 89 74 24 10 mov %esi,0x10(%rsp) > f: 89 4c 24 04 mov %ecx,0x4(%rsp) > 13: c7 04 24 a4 55 8a d6 movl $0xd68a55a4,(%rsp) > 1a: 89 7c 24 20 mov %edi,0x20(%rsp) > 1e: 8b 7d 08 mov 0x8(%rbp),%edi > 21: 89 7c 24 1c mov %edi,0x1c(%rsp) > 25: e8 20 3c df ff call 0xffffffffffdf3c4a > 2a:* 0f 0b ud2 <-- trapping instruction > 2c: b8 80 91 d7 d6 mov $0xd6d79180,%eax > 31: e8 a8 de 68 00 call 0x68dede > 36: ba 3d 17 86 d6 mov $0xd686173d,%edx > 3b: 89 55 f0 mov %edx,-0x10(%rbp) > 3e: 89 d6 mov %edx,%esi > > Code starting with the faulting instruction > =========================================== > 0: 0f 0b ud2 > 2: b8 80 91 d7 d6 mov $0xd6d79180,%eax > 7: e8 a8 de 68 00 call 0x68deb4 > c: ba 3d 17 86 d6 mov $0xd686173d,%edx > 11: 89 55 f0 mov %edx,-0x10(%rbp) > 14: 89 d6 mov %edx,%esi > > > The kernel config and materials to reproduce are available at: > https://download.01.org/0day-ci/archive/20240616/202406161539.b5ff7b20-oliver.sang@intel.com > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-16 9:24 ` Sagi Grimberg @ 2024-06-16 21:51 ` David Laight 2024-06-16 21:53 ` Matthew Wilcox 2024-06-16 21:56 ` Matthew Wilcox 1 sibling, 1 reply; 17+ messages in thread From: David Laight @ 2024-06-16 21:51 UTC (permalink / raw) To: 'Sagi Grimberg', kernel test robot, Matthew Wilcox Cc: oe-lkp@lists.linux.dev, lkp@intel.com, netdev@vger.kernel.org, Eric Dumazet, Jakub Kicinski From: Sagi Grimberg > Sent: 16 June 2024 10:24 ... > > [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12)) > > [ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size > 8192)! > > Hmm, not sure I understand exactly why changing kmap() to > kmap_local_page() expose this, > but it looks like mm/usercopy does not like size=8192 when copying for > the skb frag. Can't a usercopy fault and have to read the page from swap? So the process can sleep and then be rescheduled on a different cpu? So you can't use kmap_local_page() here at all. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-16 21:51 ` David Laight @ 2024-06-16 21:53 ` Matthew Wilcox 2024-06-17 10:18 ` David Laight 0 siblings, 1 reply; 17+ messages in thread From: Matthew Wilcox @ 2024-06-16 21:53 UTC (permalink / raw) To: David Laight Cc: 'Sagi Grimberg', kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com, netdev@vger.kernel.org, Eric Dumazet, Jakub Kicinski On Sun, Jun 16, 2024 at 09:51:05PM +0000, David Laight wrote: > From: Sagi Grimberg > > Sent: 16 June 2024 10:24 > ... > > > [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12)) > > > [ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size > > 8192)! > > > > Hmm, not sure I understand exactly why changing kmap() to > > kmap_local_page() expose this, > > but it looks like mm/usercopy does not like size=8192 when copying for > > the skb frag. > > Can't a usercopy fault and have to read the page from swap? > So the process can sleep and then be rescheduled on a different cpu? > So you can't use kmap_local_page() here at all. I don't think you understand how kmap_local_page() works. ^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-16 21:53 ` Matthew Wilcox @ 2024-06-17 10:18 ` David Laight 0 siblings, 0 replies; 17+ messages in thread From: David Laight @ 2024-06-17 10:18 UTC (permalink / raw) To: 'Matthew Wilcox' Cc: 'Sagi Grimberg', kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com, netdev@vger.kernel.org, Eric Dumazet, Jakub Kicinski From: Matthew Wilcox > Sent: 16 June 2024 22:53 > > On Sun, Jun 16, 2024 at 09:51:05PM +0000, David Laight wrote: > > From: Sagi Grimberg > > > Sent: 16 June 2024 10:24 > > ... > > > > [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12)) > > > > [ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, > size > > > 8192)! > > > > > > Hmm, not sure I understand exactly why changing kmap() to > > > kmap_local_page() expose this, > > > but it looks like mm/usercopy does not like size=8192 when copying for > > > the skb frag. > > > > Can't a usercopy fault and have to read the page from swap? > > So the process can sleep and then be rescheduled on a different cpu? > > So you can't use kmap_local_page() here at all. > > I don't think you understand how kmap_local_page() works. Quite likely :-) But I thought it was a cheap way of temporarily mapping a physical memory page into the current cpu's page tables without having to do any IPI to tell other cpu about the insert or removal? Which would require that the process not be migrated, which pretty much implies that pre-emption be disabled. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales) ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-16 9:24 ` Sagi Grimberg 2024-06-16 21:51 ` David Laight @ 2024-06-16 21:56 ` Matthew Wilcox 2024-06-17 6:29 ` Sagi Grimberg 1 sibling, 1 reply; 17+ messages in thread From: Matthew Wilcox @ 2024-06-16 21:56 UTC (permalink / raw) To: Sagi Grimberg Cc: kernel test robot, oe-lkp, lkp, netdev, Eric Dumazet, Jakub Kicinski On Sun, Jun 16, 2024 at 12:24:28PM +0300, Sagi Grimberg wrote: > > [ 13.495377][ T189] ------------[ cut here ]------------ > > [ 13.495862][ T189] kernel BUG at mm/usercopy.c:102! > > [ 13.496372][ T189] Oops: invalid opcode: 0000 [#1] PREEMPT SMP > > [ 13.496927][ T189] CPU: 0 PID: 189 Comm: systemctl Not tainted 6.10.0-rc2-00258-g18f0423b9ecc #1 > > [ 13.497741][ T189] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 > > [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12)) > > [ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size 8192)! > > Hmm, not sure I understand exactly why changing kmap() to kmap_local_page() > expose this, > but it looks like mm/usercopy does not like size=8192 when copying for the > skb frag. > > quick git browse directs to: > -- > commit 4e140f59d285c1ca1e5c81b4c13e27366865bd09 > Author: Matthew Wilcox (Oracle) <willy@infradead.org> > Date: Mon Jan 10 23:15:27 2022 +0000 > > mm/usercopy: Check kmap addresses properly > > If you are copying to an address in the kmap region, you may not copy > across a page boundary, no matter what the size of the underlying > allocation. You can't kmap() a slab page because slab pages always > come from low memory. > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > Acked-by: Kees Cook <keescook@chromium.org> > Signed-off-by: Kees Cook <keescook@chromium.org> > Link: > https://lore.kernel.org/r/20220110231530.665970-2-willy@infradead.org > -- > > CCing willy. > > The documentation suggest that under single-context usage kmap() can be > freely converted > to kmap_local_page()? But seems that when using kmap() the size is not an > issue, still trying to > understand why. Probably because kmap() returns page_address() for non-highmem pages while kmap_local_page() actually returns a kmap address: if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page)) return page_address(page); return __kmap_local_pfn_prot(page_to_pfn(page), prot); so if skb frags are always lowmem (are they?) this is a false positive. if they can be highmem, then you've uncovered a bug that nobody's noticed because nobody's testing on 32-bit any more. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-16 21:56 ` Matthew Wilcox @ 2024-06-17 6:29 ` Sagi Grimberg 2024-06-17 16:58 ` Jakub Kicinski 2024-06-19 12:46 ` David Howells 0 siblings, 2 replies; 17+ messages in thread From: Sagi Grimberg @ 2024-06-17 6:29 UTC (permalink / raw) To: Matthew Wilcox Cc: kernel test robot, oe-lkp, lkp, netdev, Eric Dumazet, Jakub Kicinski On 17/06/2024 0:56, Matthew Wilcox wrote: > On Sun, Jun 16, 2024 at 12:24:28PM +0300, Sagi Grimberg wrote: >>> [ 13.495377][ T189] ------------[ cut here ]------------ >>> [ 13.495862][ T189] kernel BUG at mm/usercopy.c:102! >>> [ 13.496372][ T189] Oops: invalid opcode: 0000 [#1] PREEMPT SMP >>> [ 13.496927][ T189] CPU: 0 PID: 189 Comm: systemctl Not tainted 6.10.0-rc2-00258-g18f0423b9ecc #1 >>> [ 13.497741][ T189] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 >>> [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12)) >>> [ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size 8192)! >> Hmm, not sure I understand exactly why changing kmap() to kmap_local_page() >> expose this, >> but it looks like mm/usercopy does not like size=8192 when copying for the >> skb frag. >> >> quick git browse directs to: >> -- >> commit 4e140f59d285c1ca1e5c81b4c13e27366865bd09 >> Author: Matthew Wilcox (Oracle) <willy@infradead.org> >> Date: Mon Jan 10 23:15:27 2022 +0000 >> >> mm/usercopy: Check kmap addresses properly >> >> If you are copying to an address in the kmap region, you may not copy >> across a page boundary, no matter what the size of the underlying >> allocation. You can't kmap() a slab page because slab pages always >> come from low memory. >> >> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> >> Acked-by: Kees Cook <keescook@chromium.org> >> Signed-off-by: Kees Cook <keescook@chromium.org> >> Link: >> https://lore.kernel.org/r/20220110231530.665970-2-willy@infradead.org >> -- >> >> CCing willy. >> >> The documentation suggest that under single-context usage kmap() can be >> freely converted >> to kmap_local_page()? But seems that when using kmap() the size is not an >> issue, still trying to >> understand why. > Probably because kmap() returns page_address() for non-highmem pages > while kmap_local_page() actually returns a kmap address: > > if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page)) > return page_address(page); > return __kmap_local_pfn_prot(page_to_pfn(page), prot); > > so if skb frags are always lowmem (are they?) this is a false positive. AFAIR these buffers are coming from the RX ring, so they should be coming from a page_frag_cache, so I want to say always low memory? > if they can be highmem, then you've uncovered a bug that nobody's > noticed because nobody's testing on 32-bit any more. Not sure, Jakub? Eric? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-17 6:29 ` Sagi Grimberg @ 2024-06-17 16:58 ` Jakub Kicinski 2024-06-18 6:37 ` Sagi Grimberg 2024-06-19 12:46 ` David Howells 1 sibling, 1 reply; 17+ messages in thread From: Jakub Kicinski @ 2024-06-17 16:58 UTC (permalink / raw) To: Sagi Grimberg Cc: Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev, Eric Dumazet, David Howells On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote: > > Probably because kmap() returns page_address() for non-highmem pages > > while kmap_local_page() actually returns a kmap address: > > > > if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page)) > > return page_address(page); > > return __kmap_local_pfn_prot(page_to_pfn(page), prot); > > > > so if skb frags are always lowmem (are they?) this is a false positive. > > AFAIR these buffers are coming from the RX ring, so they should be > coming from a page_frag_cache, > so I want to say always low memory? > > > if they can be highmem, then you've uncovered a bug that nobody's > > noticed because nobody's testing on 32-bit any more. > > Not sure, Jakub? Eric? My uneducated guess would be that until recent(ish) sendpage rework from David Howells all high mem pages would have been single pages. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-17 16:58 ` Jakub Kicinski @ 2024-06-18 6:37 ` Sagi Grimberg 0 siblings, 0 replies; 17+ messages in thread From: Sagi Grimberg @ 2024-06-18 6:37 UTC (permalink / raw) To: Jakub Kicinski, David Howells Cc: Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev, Eric Dumazet On 17/06/2024 19:58, Jakub Kicinski wrote: > On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote: >>> Probably because kmap() returns page_address() for non-highmem pages >>> while kmap_local_page() actually returns a kmap address: >>> >>> if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page)) >>> return page_address(page); >>> return __kmap_local_pfn_prot(page_to_pfn(page), prot); >>> >>> so if skb frags are always lowmem (are they?) this is a false positive. >> AFAIR these buffers are coming from the RX ring, so they should be >> coming from a page_frag_cache, >> so I want to say always low memory? >> >>> if they can be highmem, then you've uncovered a bug that nobody's >>> noticed because nobody's testing on 32-bit any more. >> Not sure, Jakub? Eric? > My uneducated guess would be that until recent(ish) sendpage rework > from David Howells all high mem pages would have been single pages. dHowells? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-17 6:29 ` Sagi Grimberg 2024-06-17 16:58 ` Jakub Kicinski @ 2024-06-19 12:46 ` David Howells 2024-06-19 13:54 ` Sagi Grimberg 1 sibling, 1 reply; 17+ messages in thread From: David Howells @ 2024-06-19 12:46 UTC (permalink / raw) To: Jakub Kicinski Cc: dhowells, Sagi Grimberg, Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev, Eric Dumazet Jakub Kicinski <kuba@kernel.org> wrote: > On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote: > > > Probably because kmap() returns page_address() for non-highmem pages > > > while kmap_local_page() actually returns a kmap address: > > > > > > if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page)) > > > return page_address(page); > > > return __kmap_local_pfn_prot(page_to_pfn(page), prot); > > > > > > so if skb frags are always lowmem (are they?) this is a false positive. > > > > AFAIR these buffers are coming from the RX ring, so they should be > > coming from a page_frag_cache, > > so I want to say always low memory? > > > > > if they can be highmem, then you've uncovered a bug that nobody's > > > noticed because nobody's testing on 32-bit any more. > > > > Not sure, Jakub? Eric? > > My uneducated guess would be that until recent(ish) sendpage rework > from David Howells all high mem pages would have been single pages. Um. I touched the Tx side, not the Rx side. I also don't know whether all high mem pages would be single pages. I'll have to defer that one to the MM folks. David ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-19 12:46 ` David Howells @ 2024-06-19 13:54 ` Sagi Grimberg 2024-06-19 14:51 ` Eric Dumazet 2024-06-21 7:54 ` David Howells 0 siblings, 2 replies; 17+ messages in thread From: Sagi Grimberg @ 2024-06-19 13:54 UTC (permalink / raw) To: David Howells, Jakub Kicinski Cc: Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev, Eric Dumazet On 19/06/2024 15:46, David Howells wrote: > Jakub Kicinski <kuba@kernel.org> wrote: > >> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote: >>>> Probably because kmap() returns page_address() for non-highmem pages >>>> while kmap_local_page() actually returns a kmap address: >>>> >>>> if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page)) >>>> return page_address(page); >>>> return __kmap_local_pfn_prot(page_to_pfn(page), prot); >>>> >>>> so if skb frags are always lowmem (are they?) this is a false positive. >>> AFAIR these buffers are coming from the RX ring, so they should be >>> coming from a page_frag_cache, >>> so I want to say always low memory? >>> >>>> if they can be highmem, then you've uncovered a bug that nobody's >>>> noticed because nobody's testing on 32-bit any more. >>> Not sure, Jakub? Eric? >> My uneducated guess would be that until recent(ish) sendpage rework >> from David Howells all high mem pages would have been single pages. > Um. I touched the Tx side, not the Rx side. > > I also don't know whether all high mem pages would be single pages. I'll have > to defer that one to the MM folks. What prevents from gro to expand frags from crossing PAGE_SIZE? btw, at least from the code in skb_gro_receive() it appears that page_address() is called directly, which suggest that these netmem pages are lowmem? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-19 13:54 ` Sagi Grimberg @ 2024-06-19 14:51 ` Eric Dumazet 2024-06-19 14:56 ` Eric Dumazet 2024-06-19 15:25 ` Sagi Grimberg 2024-06-21 7:54 ` David Howells 1 sibling, 2 replies; 17+ messages in thread From: Eric Dumazet @ 2024-06-19 14:51 UTC (permalink / raw) To: Sagi Grimberg Cc: David Howells, Jakub Kicinski, Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev On Wed, Jun 19, 2024 at 3:54 PM Sagi Grimberg <sagi@grimberg.me> wrote: > > > > On 19/06/2024 15:46, David Howells wrote: > > Jakub Kicinski <kuba@kernel.org> wrote: > > > >> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote: > >>>> Probably because kmap() returns page_address() for non-highmem pages > >>>> while kmap_local_page() actually returns a kmap address: > >>>> > >>>> if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page)) > >>>> return page_address(page); > >>>> return __kmap_local_pfn_prot(page_to_pfn(page), prot); > >>>> > >>>> so if skb frags are always lowmem (are they?) this is a false positive. > >>> AFAIR these buffers are coming from the RX ring, so they should be > >>> coming from a page_frag_cache, > >>> so I want to say always low memory? > >>> > >>>> if they can be highmem, then you've uncovered a bug that nobody's > >>>> noticed because nobody's testing on 32-bit any more. > >>> Not sure, Jakub? Eric? > >> My uneducated guess would be that until recent(ish) sendpage rework > >> from David Howells all high mem pages would have been single pages. > > Um. I touched the Tx side, not the Rx side. > > > > I also don't know whether all high mem pages would be single pages. I'll have > > to defer that one to the MM folks. > > What prevents from gro to expand frags from crossing PAGE_SIZE? > > btw, at least from the code in skb_gro_receive() it appears that > page_address() is called directly, > which suggest that these netmem pages are lowmem? GRO should only be fed with lowmem pages. But the trace involves af_unix, not GRO ? I guess that with splice games, it is possible to add high order pages to skbs. I think skb_frag_foreach_page() could be used to fix this issue. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-19 14:51 ` Eric Dumazet @ 2024-06-19 14:56 ` Eric Dumazet 2024-06-19 15:25 ` Sagi Grimberg 1 sibling, 0 replies; 17+ messages in thread From: Eric Dumazet @ 2024-06-19 14:56 UTC (permalink / raw) To: Sagi Grimberg Cc: David Howells, Jakub Kicinski, Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev, Willem de Bruijn On Wed, Jun 19, 2024 at 4:51 PM Eric Dumazet <edumazet@google.com> wrote: > > On Wed, Jun 19, 2024 at 3:54 PM Sagi Grimberg <sagi@grimberg.me> wrote: > > > > > > > > On 19/06/2024 15:46, David Howells wrote: > > > Jakub Kicinski <kuba@kernel.org> wrote: > > > > > >> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote: > > >>>> Probably because kmap() returns page_address() for non-highmem pages > > >>>> while kmap_local_page() actually returns a kmap address: > > >>>> > > >>>> if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page)) > > >>>> return page_address(page); > > >>>> return __kmap_local_pfn_prot(page_to_pfn(page), prot); > > >>>> > > >>>> so if skb frags are always lowmem (are they?) this is a false positive. > > >>> AFAIR these buffers are coming from the RX ring, so they should be > > >>> coming from a page_frag_cache, > > >>> so I want to say always low memory? > > >>> > > >>>> if they can be highmem, then you've uncovered a bug that nobody's > > >>>> noticed because nobody's testing on 32-bit any more. > > >>> Not sure, Jakub? Eric? > > >> My uneducated guess would be that until recent(ish) sendpage rework > > >> from David Howells all high mem pages would have been single pages. > > > Um. I touched the Tx side, not the Rx side. > > > > > > I also don't know whether all high mem pages would be single pages. I'll have > > > to defer that one to the MM folks. > > > > What prevents from gro to expand frags from crossing PAGE_SIZE? > > > > btw, at least from the code in skb_gro_receive() it appears that > > page_address() is called directly, > > which suggest that these netmem pages are lowmem? > > GRO should only be fed with lowmem pages. > > But the trace involves af_unix, not GRO ? > > I guess that with splice games, it is possible to add high order pages to skbs. > > I think skb_frag_foreach_page() could be used to fix this issue. For reference, please look at commit c613c209c3f351d47158f728271d0c73b6dd24c6 Author: Willem de Bruijn <willemb@google.com> Date: Mon Jul 31 08:15:47 2017 -0400 net: add skb_frag_foreach_page and use with kmap_atomic ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-19 14:51 ` Eric Dumazet 2024-06-19 14:56 ` Eric Dumazet @ 2024-06-19 15:25 ` Sagi Grimberg 1 sibling, 0 replies; 17+ messages in thread From: Sagi Grimberg @ 2024-06-19 15:25 UTC (permalink / raw) To: Eric Dumazet Cc: David Howells, Jakub Kicinski, Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev On 19/06/2024 17:51, Eric Dumazet wrote: > On Wed, Jun 19, 2024 at 3:54 PM Sagi Grimberg <sagi@grimberg.me> wrote: >> >> >> On 19/06/2024 15:46, David Howells wrote: >>> Jakub Kicinski <kuba@kernel.org> wrote: >>> >>>> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote: >>>>>> Probably because kmap() returns page_address() for non-highmem pages >>>>>> while kmap_local_page() actually returns a kmap address: >>>>>> >>>>>> if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page)) >>>>>> return page_address(page); >>>>>> return __kmap_local_pfn_prot(page_to_pfn(page), prot); >>>>>> >>>>>> so if skb frags are always lowmem (are they?) this is a false positive. >>>>> AFAIR these buffers are coming from the RX ring, so they should be >>>>> coming from a page_frag_cache, >>>>> so I want to say always low memory? >>>>> >>>>>> if they can be highmem, then you've uncovered a bug that nobody's >>>>>> noticed because nobody's testing on 32-bit any more. >>>>> Not sure, Jakub? Eric? >>>> My uneducated guess would be that until recent(ish) sendpage rework >>>> from David Howells all high mem pages would have been single pages. >>> Um. I touched the Tx side, not the Rx side. >>> >>> I also don't know whether all high mem pages would be single pages. I'll have >>> to defer that one to the MM folks. >> What prevents from gro to expand frags from crossing PAGE_SIZE? >> >> btw, at least from the code in skb_gro_receive() it appears that >> page_address() is called directly, >> which suggest that these netmem pages are lowmem? > GRO should only be fed with lowmem pages. > > But the trace involves af_unix, not GRO ? > > I guess that with splice games, it is possible to add high order pages to skbs. > > I think skb_frag_foreach_page() could be used to fix this issue. Thanks Eric, that is indeed the missing piece. I'll give it go. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter 2024-06-19 13:54 ` Sagi Grimberg 2024-06-19 14:51 ` Eric Dumazet @ 2024-06-21 7:54 ` David Howells 1 sibling, 0 replies; 17+ messages in thread From: David Howells @ 2024-06-21 7:54 UTC (permalink / raw) To: Eric Dumazet Cc: dhowells, Sagi Grimberg, Jakub Kicinski, Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev Eric Dumazet <edumazet@google.com> wrote: > But the trace involves af_unix, not GRO ? Ah. splice()/vmsplice() can feed multipage folios into the socket. Possibly those can be in highmem. David ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-06-21 7:54 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-06-13 11:35 [PATCH] net: micro-optimize skb_datagram_iter Sagi Grimberg 2024-06-15 2:40 ` patchwork-bot+netdevbpf 2024-06-16 8:06 ` kernel test robot 2024-06-16 9:24 ` Sagi Grimberg 2024-06-16 21:51 ` David Laight 2024-06-16 21:53 ` Matthew Wilcox 2024-06-17 10:18 ` David Laight 2024-06-16 21:56 ` Matthew Wilcox 2024-06-17 6:29 ` Sagi Grimberg 2024-06-17 16:58 ` Jakub Kicinski 2024-06-18 6:37 ` Sagi Grimberg 2024-06-19 12:46 ` David Howells 2024-06-19 13:54 ` Sagi Grimberg 2024-06-19 14:51 ` Eric Dumazet 2024-06-19 14:56 ` Eric Dumazet 2024-06-19 15:25 ` Sagi Grimberg 2024-06-21 7:54 ` David Howells
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).