* [PATCH] net: micro-optimize skb_datagram_iter
@ 2024-06-13 11:35 Sagi Grimberg
2024-06-15 2:40 ` patchwork-bot+netdevbpf
2024-06-16 8:06 ` kernel test robot
0 siblings, 2 replies; 17+ messages in thread
From: Sagi Grimberg @ 2024-06-13 11:35 UTC (permalink / raw)
To: netdev; +Cc: Eric Dumazet, Jakub Kicinski
We only use the mapping in a single context in a short and contained scope,
so kmap_local_page is sufficient and cheaper. This will also allow
skb_datagram_iter to be called from softirq context.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
net/core/datagram.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/datagram.c b/net/core/datagram.c
index a8b625abe242..ac74df248205 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -436,14 +436,14 @@ static int __skb_datagram_iter(const struct sk_buff *skb, int offset,
end = start + skb_frag_size(frag);
if ((copy = end - offset) > 0) {
struct page *page = skb_frag_page(frag);
- u8 *vaddr = kmap(page);
+ u8 *vaddr = kmap_local_page(page);
if (copy > len)
copy = len;
n = INDIRECT_CALL_1(cb, simple_copy_to_iter,
vaddr + skb_frag_off(frag) + offset - start,
copy, data, to);
- kunmap(page);
+ kunmap_local(vaddr);
offset += n;
if (n != copy)
goto short_copy;
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-13 11:35 [PATCH] net: micro-optimize skb_datagram_iter Sagi Grimberg
@ 2024-06-15 2:40 ` patchwork-bot+netdevbpf
2024-06-16 8:06 ` kernel test robot
1 sibling, 0 replies; 17+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-06-15 2:40 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: netdev, edumazet, kuba
Hello:
This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 13 Jun 2024 14:35:04 +0300 you wrote:
> We only use the mapping in a single context in a short and contained scope,
> so kmap_local_page is sufficient and cheaper. This will also allow
> skb_datagram_iter to be called from softirq context.
>
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
> net/core/datagram.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Here is the summary with links:
- net: micro-optimize skb_datagram_iter
https://git.kernel.org/netdev/net-next/c/934c29999b57
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-13 11:35 [PATCH] net: micro-optimize skb_datagram_iter Sagi Grimberg
2024-06-15 2:40 ` patchwork-bot+netdevbpf
@ 2024-06-16 8:06 ` kernel test robot
2024-06-16 9:24 ` Sagi Grimberg
1 sibling, 1 reply; 17+ messages in thread
From: kernel test robot @ 2024-06-16 8:06 UTC (permalink / raw)
To: Sagi Grimberg
Cc: oe-lkp, lkp, netdev, Eric Dumazet, Jakub Kicinski, oliver.sang
Hello,
kernel test robot noticed "kernel_BUG_at_mm/usercopy.c" on:
commit: 18f0423b9eccb781310af6709ceebf654175af14 ("[PATCH] net: micro-optimize skb_datagram_iter")
url: https://github.com/intel-lab-lkp/linux/commits/Sagi-Grimberg/net-micro-optimize-skb_datagram_iter/20240613-193620
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net.git b60b1bdc1888f51da7a2a22c48c5f1eb2bd12e97
patch link: https://lore.kernel.org/all/20240613113504.1079860-1-sagi@grimberg.me/
patch subject: [PATCH] net: micro-optimize skb_datagram_iter
in testcase: boot
compiler: gcc-8
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
(please refer to attached dmesg/kmsg for entire log/backtrace)
+------------------------------------------+------------+------------+
| | b60b1bdc18 | 18f0423b9e |
+------------------------------------------+------------+------------+
| boot_successes | 6 | 0 |
| boot_failures | 0 | 6 |
| kernel_BUG_at_mm/usercopy.c | 0 | 6 |
| Oops:invalid_opcode:#[##]PREEMPT_SMP | 0 | 6 |
| EIP:usercopy_abort | 0 | 6 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 6 |
+------------------------------------------+------------+------------+
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202406161539.b5ff7b20-oliver.sang@intel.com
[ 13.495377][ T189] ------------[ cut here ]------------
[ 13.495862][ T189] kernel BUG at mm/usercopy.c:102!
[ 13.496372][ T189] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
[ 13.496927][ T189] CPU: 0 PID: 189 Comm: systemctl Not tainted 6.10.0-rc2-00258-g18f0423b9ecc #1
[ 13.497741][ T189] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
[ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size 8192)!
[ 13.499647][ T189] Code: d6 89 44 24 0c 0f 45 cf 8b 7d 0c 89 74 24 10 89 4c 24 04 c7 04 24 a4 55 8a d6 89 7c 24 20 8b 7d 08 89 7c 24 1c e8 20 3c df ff <0f> 0b b8 80 91 d7 d6 e8 a8 de 68 00 ba 3d 17 86 d6 89 55 f0 89 d6
All code
========
0: d6 (bad)
1: 89 44 24 0c mov %eax,0xc(%rsp)
5: 0f 45 cf cmovne %edi,%ecx
8: 8b 7d 0c mov 0xc(%rbp),%edi
b: 89 74 24 10 mov %esi,0x10(%rsp)
f: 89 4c 24 04 mov %ecx,0x4(%rsp)
13: c7 04 24 a4 55 8a d6 movl $0xd68a55a4,(%rsp)
1a: 89 7c 24 20 mov %edi,0x20(%rsp)
1e: 8b 7d 08 mov 0x8(%rbp),%edi
21: 89 7c 24 1c mov %edi,0x1c(%rsp)
25: e8 20 3c df ff call 0xffffffffffdf3c4a
2a:* 0f 0b ud2 <-- trapping instruction
2c: b8 80 91 d7 d6 mov $0xd6d79180,%eax
31: e8 a8 de 68 00 call 0x68dede
36: ba 3d 17 86 d6 mov $0xd686173d,%edx
3b: 89 55 f0 mov %edx,-0x10(%rbp)
3e: 89 d6 mov %edx,%esi
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: b8 80 91 d7 d6 mov $0xd6d79180,%eax
7: e8 a8 de 68 00 call 0x68deb4
c: ba 3d 17 86 d6 mov $0xd686173d,%edx
11: 89 55 f0 mov %edx,-0x10(%rbp)
14: 89 d6 mov %edx,%esi
[ 13.500502][ T194] ------------[ cut here ]------------
[ 13.502187][ T189] EAX: 00000052 EBX: d680da68 ECX: e0435480 EDX: 01000232
[ 13.502666][ T194] kernel BUG at mm/usercopy.c:102!
[ 13.503236][ T189] ESI: d686173d EDI: 00000000 EBP: ece37c44 ESP: ece37c10
[ 13.504266][ T189] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
[ 13.504856][ T189] CR0: 80050033 CR2: 0135eb6c CR3: 2beff000 CR4: 000406d0
[ 13.505464][ T189] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 13.506083][ T189] DR6: fffe0ff0 DR7: 00000400
[ 13.506495][ T189] Call Trace:
[ 13.506795][ T189] ? show_regs (arch/x86/kernel/dumpstack.c:479)
[ 13.507187][ T189] ? __die_body (arch/x86/kernel/dumpstack.c:421)
[ 13.507576][ T189] ? die (arch/x86/kernel/dumpstack.c:449)
[ 13.507894][ T189] ? do_trap (arch/x86/kernel/traps.c:114 arch/x86/kernel/traps.c:155)
[ 13.508270][ T189] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359 kernel/locking/lockdep.c:4311)
[ 13.508783][ T189] ? do_error_trap (arch/x86/include/asm/traps.h:58 arch/x86/kernel/traps.c:176)
[ 13.509182][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12))
[ 13.509588][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252)
[ 13.509983][ T189] ? exc_invalid_op (arch/x86/kernel/traps.c:267)
[ 13.510396][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12))
[ 13.510797][ T189] ? handle_exception (arch/x86/entry/entry_32.S:1054)
[ 13.511242][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252)
[ 13.511646][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12))
[ 13.512070][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252)
[ 13.512434][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12))
[ 13.512832][ T189] __check_object_size (mm/usercopy.c:180 mm/usercopy.c:251 mm/usercopy.c:213)
[ 13.513275][ T189] simple_copy_to_iter (include/linux/uio.h:196 net/core/datagram.c:513)
[ 13.513693][ T189] __skb_datagram_iter (net/core/datagram.c:424 (discriminator 1))
[ 13.514138][ T189] skb_copy_datagram_iter (net/core/datagram.c:529)
[ 13.514606][ T189] ? skb_free_datagram (net/core/datagram.c:512)
[ 13.515028][ T189] ? scm_stat_del (net/unix/af_unix.c:2883)
[ 13.515429][ T189] unix_stream_read_actor (net/unix/af_unix.c:2889)
[ 13.515884][ T189] unix_stream_read_generic (net/unix/af_unix.c:2805)
[ 13.516377][ T189] ? cma_for_each_area (mm/page_ext.c:518)
[ 13.516826][ T189] ? unix_stream_splice_read (net/unix/af_unix.c:2907)
[ 13.517301][ T189] unix_stream_recvmsg (net/unix/af_unix.c:2923)
[ 13.517720][ T189] ? scm_stat_del (net/unix/af_unix.c:2883)
[ 13.518108][ T189] ____sys_recvmsg (net/socket.c:1046 net/socket.c:1068 net/socket.c:2804)
[ 13.518527][ T189] ? import_iovec (lib/iov_iter.c:1348)
[ 13.518930][ T189] ? copy_msghdr_from_user (net/socket.c:2525)
[ 13.519396][ T189] ___sys_recvmsg (net/socket.c:2846)
[ 13.519811][ T189] ? __fdget (fs/file.c:1160)
[ 13.520186][ T189] ? sockfd_lookup_light (net/socket.c:558)
[ 13.520643][ T189] __sys_recvmsg (include/linux/file.h:34 net/socket.c:2878)
[ 13.521046][ T189] __ia32_sys_socketcall (net/socket.c:3173 net/socket.c:3077 net/socket.c:3077)
[ 13.521513][ T189] ia32_sys_call (arch/x86/entry/syscall_32.c:42)
[ 13.521923][ T189] __do_fast_syscall_32 (arch/x86/entry/common.c:165 arch/x86/entry/common.c:386)
[ 13.522362][ T189] do_fast_syscall_32 (arch/x86/entry/common.c:411)
[ 13.522787][ T189] do_SYSENTER_32 (arch/x86/entry/common.c:450)
[ 13.523188][ T189] entry_SYSENTER_32 (arch/x86/entry/entry_32.S:836)
[ 13.523613][ T189] EIP: 0xb7ee6579
[ 13.523933][ T189] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
All code
========
0: b8 01 10 06 03 mov $0x3061001,%eax
5: 74 b4 je 0xffffffffffffffbb
7: 01 10 add %edx,(%rax)
9: 07 (bad)
a: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi
e: 10 08 adc %cl,(%rax)
10: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi
...
20: 00 51 52 add %dl,0x52(%rcx)
23: 55 push %rbp
24:* 89 e5 mov %esp,%ebp <-- trapping instruction
26: 0f 34 sysenter
28: cd 80 int $0x80
2a: 5d pop %rbp
2b: 5a pop %rdx
2c: 59 pop %rcx
2d: c3 ret
2e: 90 nop
2f: 90 nop
30: 90 nop
31: 90 nop
32: 8d 76 00 lea 0x0(%rsi),%esi
35: 58 pop %rax
36: b8 77 00 00 00 mov $0x77,%eax
3b: cd 80 int $0x80
3d: 90 nop
3e: 8d .byte 0x8d
3f: 76 .byte 0x76
Code starting with the faulting instruction
===========================================
0: 5d pop %rbp
1: 5a pop %rdx
2: 59 pop %rcx
3: c3 ret
4: 90 nop
5: 90 nop
6: 90 nop
7: 90 nop
8: 8d 76 00 lea 0x0(%rsi),%esi
b: 58 pop %rax
c: b8 77 00 00 00 mov $0x77,%eax
11: cd 80 int $0x80
13: 90 nop
14: 8d .byte 0x8d
15: 76 .byte 0x76
[ 13.525624][ T189] EAX: ffffffda EBX: 00000011 ECX: bfdf5450 EDX: 00000000
[ 13.526233][ T189] ESI: b7c46000 EDI: bfdf54ac EBP: bfdf54a8 ESP: bfdf5440
[ 13.526853][ T189] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000286
[ 13.527519][ T189] Modules linked in: i2c_piix4(+) agpgart(+) qemu_fw_cfg button fuse drm drm_panel_orientation_quirks ip_tables
[ 13.528566][ T194] Oops: invalid opcode: 0000 [#2] PREEMPT SMP
[ 13.528804][ T189] ---[ end trace 0000000000000000 ]---
[ 13.529217][ T194] CPU: 1 PID: 194 Comm: systemctl Tainted: G D 6.10.0-rc2-00258-g18f0423b9ecc #1
[ 13.529725][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
[ 13.530536][ T194] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 13.531060][ T189] Code: d6 89 44 24 0c 0f 45 cf 8b 7d 0c 89 74 24 10 89 4c 24 04 c7 04 24 a4 55 8a d6 89 7c 24 20 8b 7d 08 89 7c 24 1c e8 20 3c df ff <0f> 0b b8 80 91 d7 d6 e8 a8 de 68 00 ba 3d 17 86 d6 89 55 f0 89 d6
All code
========
0: d6 (bad)
1: 89 44 24 0c mov %eax,0xc(%rsp)
5: 0f 45 cf cmovne %edi,%ecx
8: 8b 7d 0c mov 0xc(%rbp),%edi
b: 89 74 24 10 mov %esi,0x10(%rsp)
f: 89 4c 24 04 mov %ecx,0x4(%rsp)
13: c7 04 24 a4 55 8a d6 movl $0xd68a55a4,(%rsp)
1a: 89 7c 24 20 mov %edi,0x20(%rsp)
1e: 8b 7d 08 mov 0x8(%rbp),%edi
21: 89 7c 24 1c mov %edi,0x1c(%rsp)
25: e8 20 3c df ff call 0xffffffffffdf3c4a
2a:* 0f 0b ud2 <-- trapping instruction
2c: b8 80 91 d7 d6 mov $0xd6d79180,%eax
31: e8 a8 de 68 00 call 0x68dede
36: ba 3d 17 86 d6 mov $0xd686173d,%edx
3b: 89 55 f0 mov %edx,-0x10(%rbp)
3e: 89 d6 mov %edx,%esi
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: b8 80 91 d7 d6 mov $0xd6d79180,%eax
7: e8 a8 de 68 00 call 0x68deb4
c: ba 3d 17 86 d6 mov $0xd686173d,%edx
11: 89 55 f0 mov %edx,-0x10(%rbp)
14: 89 d6 mov %edx,%esi
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240616/202406161539.b5ff7b20-oliver.sang@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-16 8:06 ` kernel test robot
@ 2024-06-16 9:24 ` Sagi Grimberg
2024-06-16 21:51 ` David Laight
2024-06-16 21:56 ` Matthew Wilcox
0 siblings, 2 replies; 17+ messages in thread
From: Sagi Grimberg @ 2024-06-16 9:24 UTC (permalink / raw)
To: kernel test robot, Matthew Wilcox
Cc: oe-lkp, lkp, netdev, Eric Dumazet, Jakub Kicinski
On 16/06/2024 11:06, kernel test robot wrote:
>
> Hello,
>
> kernel test robot noticed "kernel_BUG_at_mm/usercopy.c" on:
>
> commit: 18f0423b9eccb781310af6709ceebf654175af14 ("[PATCH] net: micro-optimize skb_datagram_iter")
> url: https://github.com/intel-lab-lkp/linux/commits/Sagi-Grimberg/net-micro-optimize-skb_datagram_iter/20240613-193620
> base: https://git.kernel.org/cgit/linux/kernel/git/davem/net.git b60b1bdc1888f51da7a2a22c48c5f1eb2bd12e97
> patch link: https://lore.kernel.org/all/20240613113504.1079860-1-sagi@grimberg.me/
> patch subject: [PATCH] net: micro-optimize skb_datagram_iter
>
> in testcase: boot
>
> compiler: gcc-8
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> +------------------------------------------+------------+------------+
> | | b60b1bdc18 | 18f0423b9e |
> +------------------------------------------+------------+------------+
> | boot_successes | 6 | 0 |
> | boot_failures | 0 | 6 |
> | kernel_BUG_at_mm/usercopy.c | 0 | 6 |
> | Oops:invalid_opcode:#[##]PREEMPT_SMP | 0 | 6 |
> | EIP:usercopy_abort | 0 | 6 |
> | Kernel_panic-not_syncing:Fatal_exception | 0 | 6 |
> +------------------------------------------+------------+------------+
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202406161539.b5ff7b20-oliver.sang@intel.com
>
>
> [ 13.495377][ T189] ------------[ cut here ]------------
> [ 13.495862][ T189] kernel BUG at mm/usercopy.c:102!
> [ 13.496372][ T189] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
> [ 13.496927][ T189] CPU: 0 PID: 189 Comm: systemctl Not tainted 6.10.0-rc2-00258-g18f0423b9ecc #1
> [ 13.497741][ T189] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> [ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size 8192)!
Hmm, not sure I understand exactly why changing kmap() to
kmap_local_page() expose this,
but it looks like mm/usercopy does not like size=8192 when copying for
the skb frag.
quick git browse directs to:
--
commit 4e140f59d285c1ca1e5c81b4c13e27366865bd09
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Mon Jan 10 23:15:27 2022 +0000
mm/usercopy: Check kmap addresses properly
If you are copying to an address in the kmap region, you may not copy
across a page boundary, no matter what the size of the underlying
allocation. You can't kmap() a slab page because slab pages always
come from low memory.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link:
https://lore.kernel.org/r/20220110231530.665970-2-willy@infradead.org
--
CCing willy.
The documentation suggest that under single-context usage kmap() can be
freely converted
to kmap_local_page()? But seems that when using kmap() the size is not
an issue, still trying to
understand why.
> [ 13.499647][ T189] Code: d6 89 44 24 0c 0f 45 cf 8b 7d 0c 89 74 24 10 89 4c 24 04 c7 04 24 a4 55 8a d6 89 7c 24 20 8b 7d 08 89 7c 24 1c e8 20 3c df ff <0f> 0b b8 80 91 d7 d6 e8 a8 de 68 00 ba 3d 17 86 d6 89 55 f0 89 d6
> All code
> ========
> 0: d6 (bad)
> 1: 89 44 24 0c mov %eax,0xc(%rsp)
> 5: 0f 45 cf cmovne %edi,%ecx
> 8: 8b 7d 0c mov 0xc(%rbp),%edi
> b: 89 74 24 10 mov %esi,0x10(%rsp)
> f: 89 4c 24 04 mov %ecx,0x4(%rsp)
> 13: c7 04 24 a4 55 8a d6 movl $0xd68a55a4,(%rsp)
> 1a: 89 7c 24 20 mov %edi,0x20(%rsp)
> 1e: 8b 7d 08 mov 0x8(%rbp),%edi
> 21: 89 7c 24 1c mov %edi,0x1c(%rsp)
> 25: e8 20 3c df ff call 0xffffffffffdf3c4a
> 2a:* 0f 0b ud2 <-- trapping instruction
> 2c: b8 80 91 d7 d6 mov $0xd6d79180,%eax
> 31: e8 a8 de 68 00 call 0x68dede
> 36: ba 3d 17 86 d6 mov $0xd686173d,%edx
> 3b: 89 55 f0 mov %edx,-0x10(%rbp)
> 3e: 89 d6 mov %edx,%esi
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 0b ud2
> 2: b8 80 91 d7 d6 mov $0xd6d79180,%eax
> 7: e8 a8 de 68 00 call 0x68deb4
> c: ba 3d 17 86 d6 mov $0xd686173d,%edx
> 11: 89 55 f0 mov %edx,-0x10(%rbp)
> 14: 89 d6 mov %edx,%esi
> [ 13.500502][ T194] ------------[ cut here ]------------
> [ 13.502187][ T189] EAX: 00000052 EBX: d680da68 ECX: e0435480 EDX: 01000232
> [ 13.502666][ T194] kernel BUG at mm/usercopy.c:102!
> [ 13.503236][ T189] ESI: d686173d EDI: 00000000 EBP: ece37c44 ESP: ece37c10
> [ 13.504266][ T189] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
> [ 13.504856][ T189] CR0: 80050033 CR2: 0135eb6c CR3: 2beff000 CR4: 000406d0
> [ 13.505464][ T189] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 13.506083][ T189] DR6: fffe0ff0 DR7: 00000400
> [ 13.506495][ T189] Call Trace:
> [ 13.506795][ T189] ? show_regs (arch/x86/kernel/dumpstack.c:479)
> [ 13.507187][ T189] ? __die_body (arch/x86/kernel/dumpstack.c:421)
> [ 13.507576][ T189] ? die (arch/x86/kernel/dumpstack.c:449)
> [ 13.507894][ T189] ? do_trap (arch/x86/kernel/traps.c:114 arch/x86/kernel/traps.c:155)
> [ 13.508270][ T189] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359 kernel/locking/lockdep.c:4311)
> [ 13.508783][ T189] ? do_error_trap (arch/x86/include/asm/traps.h:58 arch/x86/kernel/traps.c:176)
> [ 13.509182][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> [ 13.509588][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252)
> [ 13.509983][ T189] ? exc_invalid_op (arch/x86/kernel/traps.c:267)
> [ 13.510396][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> [ 13.510797][ T189] ? handle_exception (arch/x86/entry/entry_32.S:1054)
> [ 13.511242][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252)
> [ 13.511646][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> [ 13.512070][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252)
> [ 13.512434][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> [ 13.512832][ T189] __check_object_size (mm/usercopy.c:180 mm/usercopy.c:251 mm/usercopy.c:213)
> [ 13.513275][ T189] simple_copy_to_iter (include/linux/uio.h:196 net/core/datagram.c:513)
> [ 13.513693][ T189] __skb_datagram_iter (net/core/datagram.c:424 (discriminator 1))
> [ 13.514138][ T189] skb_copy_datagram_iter (net/core/datagram.c:529)
> [ 13.514606][ T189] ? skb_free_datagram (net/core/datagram.c:512)
> [ 13.515028][ T189] ? scm_stat_del (net/unix/af_unix.c:2883)
> [ 13.515429][ T189] unix_stream_read_actor (net/unix/af_unix.c:2889)
> [ 13.515884][ T189] unix_stream_read_generic (net/unix/af_unix.c:2805)
> [ 13.516377][ T189] ? cma_for_each_area (mm/page_ext.c:518)
> [ 13.516826][ T189] ? unix_stream_splice_read (net/unix/af_unix.c:2907)
> [ 13.517301][ T189] unix_stream_recvmsg (net/unix/af_unix.c:2923)
> [ 13.517720][ T189] ? scm_stat_del (net/unix/af_unix.c:2883)
> [ 13.518108][ T189] ____sys_recvmsg (net/socket.c:1046 net/socket.c:1068 net/socket.c:2804)
> [ 13.518527][ T189] ? import_iovec (lib/iov_iter.c:1348)
> [ 13.518930][ T189] ? copy_msghdr_from_user (net/socket.c:2525)
> [ 13.519396][ T189] ___sys_recvmsg (net/socket.c:2846)
> [ 13.519811][ T189] ? __fdget (fs/file.c:1160)
> [ 13.520186][ T189] ? sockfd_lookup_light (net/socket.c:558)
> [ 13.520643][ T189] __sys_recvmsg (include/linux/file.h:34 net/socket.c:2878)
> [ 13.521046][ T189] __ia32_sys_socketcall (net/socket.c:3173 net/socket.c:3077 net/socket.c:3077)
> [ 13.521513][ T189] ia32_sys_call (arch/x86/entry/syscall_32.c:42)
> [ 13.521923][ T189] __do_fast_syscall_32 (arch/x86/entry/common.c:165 arch/x86/entry/common.c:386)
> [ 13.522362][ T189] do_fast_syscall_32 (arch/x86/entry/common.c:411)
> [ 13.522787][ T189] do_SYSENTER_32 (arch/x86/entry/common.c:450)
> [ 13.523188][ T189] entry_SYSENTER_32 (arch/x86/entry/entry_32.S:836)
> [ 13.523613][ T189] EIP: 0xb7ee6579
> [ 13.523933][ T189] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
> All code
> ========
> 0: b8 01 10 06 03 mov $0x3061001,%eax
> 5: 74 b4 je 0xffffffffffffffbb
> 7: 01 10 add %edx,(%rax)
> 9: 07 (bad)
> a: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi
> e: 10 08 adc %cl,(%rax)
> 10: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi
> ...
> 20: 00 51 52 add %dl,0x52(%rcx)
> 23: 55 push %rbp
> 24:* 89 e5 mov %esp,%ebp <-- trapping instruction
> 26: 0f 34 sysenter
> 28: cd 80 int $0x80
> 2a: 5d pop %rbp
> 2b: 5a pop %rdx
> 2c: 59 pop %rcx
> 2d: c3 ret
> 2e: 90 nop
> 2f: 90 nop
> 30: 90 nop
> 31: 90 nop
> 32: 8d 76 00 lea 0x0(%rsi),%esi
> 35: 58 pop %rax
> 36: b8 77 00 00 00 mov $0x77,%eax
> 3b: cd 80 int $0x80
> 3d: 90 nop
> 3e: 8d .byte 0x8d
> 3f: 76 .byte 0x76
>
> Code starting with the faulting instruction
> ===========================================
> 0: 5d pop %rbp
> 1: 5a pop %rdx
> 2: 59 pop %rcx
> 3: c3 ret
> 4: 90 nop
> 5: 90 nop
> 6: 90 nop
> 7: 90 nop
> 8: 8d 76 00 lea 0x0(%rsi),%esi
> b: 58 pop %rax
> c: b8 77 00 00 00 mov $0x77,%eax
> 11: cd 80 int $0x80
> 13: 90 nop
> 14: 8d .byte 0x8d
> 15: 76 .byte 0x76
> [ 13.525624][ T189] EAX: ffffffda EBX: 00000011 ECX: bfdf5450 EDX: 00000000
> [ 13.526233][ T189] ESI: b7c46000 EDI: bfdf54ac EBP: bfdf54a8 ESP: bfdf5440
> [ 13.526853][ T189] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000286
> [ 13.527519][ T189] Modules linked in: i2c_piix4(+) agpgart(+) qemu_fw_cfg button fuse drm drm_panel_orientation_quirks ip_tables
> [ 13.528566][ T194] Oops: invalid opcode: 0000 [#2] PREEMPT SMP
> [ 13.528804][ T189] ---[ end trace 0000000000000000 ]---
> [ 13.529217][ T194] CPU: 1 PID: 194 Comm: systemctl Tainted: G D 6.10.0-rc2-00258-g18f0423b9ecc #1
> [ 13.529725][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> [ 13.530536][ T194] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 13.531060][ T189] Code: d6 89 44 24 0c 0f 45 cf 8b 7d 0c 89 74 24 10 89 4c 24 04 c7 04 24 a4 55 8a d6 89 7c 24 20 8b 7d 08 89 7c 24 1c e8 20 3c df ff <0f> 0b b8 80 91 d7 d6 e8 a8 de 68 00 ba 3d 17 86 d6 89 55 f0 89 d6
> All code
> ========
> 0: d6 (bad)
> 1: 89 44 24 0c mov %eax,0xc(%rsp)
> 5: 0f 45 cf cmovne %edi,%ecx
> 8: 8b 7d 0c mov 0xc(%rbp),%edi
> b: 89 74 24 10 mov %esi,0x10(%rsp)
> f: 89 4c 24 04 mov %ecx,0x4(%rsp)
> 13: c7 04 24 a4 55 8a d6 movl $0xd68a55a4,(%rsp)
> 1a: 89 7c 24 20 mov %edi,0x20(%rsp)
> 1e: 8b 7d 08 mov 0x8(%rbp),%edi
> 21: 89 7c 24 1c mov %edi,0x1c(%rsp)
> 25: e8 20 3c df ff call 0xffffffffffdf3c4a
> 2a:* 0f 0b ud2 <-- trapping instruction
> 2c: b8 80 91 d7 d6 mov $0xd6d79180,%eax
> 31: e8 a8 de 68 00 call 0x68dede
> 36: ba 3d 17 86 d6 mov $0xd686173d,%edx
> 3b: 89 55 f0 mov %edx,-0x10(%rbp)
> 3e: 89 d6 mov %edx,%esi
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 0b ud2
> 2: b8 80 91 d7 d6 mov $0xd6d79180,%eax
> 7: e8 a8 de 68 00 call 0x68deb4
> c: ba 3d 17 86 d6 mov $0xd686173d,%edx
> 11: 89 55 f0 mov %edx,-0x10(%rbp)
> 14: 89 d6 mov %edx,%esi
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240616/202406161539.b5ff7b20-oliver.sang@intel.com
>
>
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-16 9:24 ` Sagi Grimberg
@ 2024-06-16 21:51 ` David Laight
2024-06-16 21:53 ` Matthew Wilcox
2024-06-16 21:56 ` Matthew Wilcox
1 sibling, 1 reply; 17+ messages in thread
From: David Laight @ 2024-06-16 21:51 UTC (permalink / raw)
To: 'Sagi Grimberg', kernel test robot, Matthew Wilcox
Cc: oe-lkp@lists.linux.dev, lkp@intel.com, netdev@vger.kernel.org,
Eric Dumazet, Jakub Kicinski
From: Sagi Grimberg
> Sent: 16 June 2024 10:24
...
> > [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> > [ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size
> 8192)!
>
> Hmm, not sure I understand exactly why changing kmap() to
> kmap_local_page() expose this,
> but it looks like mm/usercopy does not like size=8192 when copying for
> the skb frag.
Can't a usercopy fault and have to read the page from swap?
So the process can sleep and then be rescheduled on a different cpu?
So you can't use kmap_local_page() here at all.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-16 21:51 ` David Laight
@ 2024-06-16 21:53 ` Matthew Wilcox
2024-06-17 10:18 ` David Laight
0 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2024-06-16 21:53 UTC (permalink / raw)
To: David Laight
Cc: 'Sagi Grimberg', kernel test robot,
oe-lkp@lists.linux.dev, lkp@intel.com, netdev@vger.kernel.org,
Eric Dumazet, Jakub Kicinski
On Sun, Jun 16, 2024 at 09:51:05PM +0000, David Laight wrote:
> From: Sagi Grimberg
> > Sent: 16 June 2024 10:24
> ...
> > > [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> > > [ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size
> > 8192)!
> >
> > Hmm, not sure I understand exactly why changing kmap() to
> > kmap_local_page() expose this,
> > but it looks like mm/usercopy does not like size=8192 when copying for
> > the skb frag.
>
> Can't a usercopy fault and have to read the page from swap?
> So the process can sleep and then be rescheduled on a different cpu?
> So you can't use kmap_local_page() here at all.
I don't think you understand how kmap_local_page() works.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-16 9:24 ` Sagi Grimberg
2024-06-16 21:51 ` David Laight
@ 2024-06-16 21:56 ` Matthew Wilcox
2024-06-17 6:29 ` Sagi Grimberg
1 sibling, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2024-06-16 21:56 UTC (permalink / raw)
To: Sagi Grimberg
Cc: kernel test robot, oe-lkp, lkp, netdev, Eric Dumazet,
Jakub Kicinski
On Sun, Jun 16, 2024 at 12:24:28PM +0300, Sagi Grimberg wrote:
> > [ 13.495377][ T189] ------------[ cut here ]------------
> > [ 13.495862][ T189] kernel BUG at mm/usercopy.c:102!
> > [ 13.496372][ T189] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
> > [ 13.496927][ T189] CPU: 0 PID: 189 Comm: systemctl Not tainted 6.10.0-rc2-00258-g18f0423b9ecc #1
> > [ 13.497741][ T189] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> > [ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size 8192)!
>
> Hmm, not sure I understand exactly why changing kmap() to kmap_local_page()
> expose this,
> but it looks like mm/usercopy does not like size=8192 when copying for the
> skb frag.
>
> quick git browse directs to:
> --
> commit 4e140f59d285c1ca1e5c81b4c13e27366865bd09
> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
> Date: Mon Jan 10 23:15:27 2022 +0000
>
> mm/usercopy: Check kmap addresses properly
>
> If you are copying to an address in the kmap region, you may not copy
> across a page boundary, no matter what the size of the underlying
> allocation. You can't kmap() a slab page because slab pages always
> come from low memory.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Acked-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Link:
> https://lore.kernel.org/r/20220110231530.665970-2-willy@infradead.org
> --
>
> CCing willy.
>
> The documentation suggest that under single-context usage kmap() can be
> freely converted
> to kmap_local_page()? But seems that when using kmap() the size is not an
> issue, still trying to
> understand why.
Probably because kmap() returns page_address() for non-highmem pages
while kmap_local_page() actually returns a kmap address:
if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
return page_address(page);
return __kmap_local_pfn_prot(page_to_pfn(page), prot);
so if skb frags are always lowmem (are they?) this is a false positive.
if they can be highmem, then you've uncovered a bug that nobody's
noticed because nobody's testing on 32-bit any more.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-16 21:56 ` Matthew Wilcox
@ 2024-06-17 6:29 ` Sagi Grimberg
2024-06-17 16:58 ` Jakub Kicinski
2024-06-19 12:46 ` David Howells
0 siblings, 2 replies; 17+ messages in thread
From: Sagi Grimberg @ 2024-06-17 6:29 UTC (permalink / raw)
To: Matthew Wilcox
Cc: kernel test robot, oe-lkp, lkp, netdev, Eric Dumazet,
Jakub Kicinski
On 17/06/2024 0:56, Matthew Wilcox wrote:
> On Sun, Jun 16, 2024 at 12:24:28PM +0300, Sagi Grimberg wrote:
>>> [ 13.495377][ T189] ------------[ cut here ]------------
>>> [ 13.495862][ T189] kernel BUG at mm/usercopy.c:102!
>>> [ 13.496372][ T189] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
>>> [ 13.496927][ T189] CPU: 0 PID: 189 Comm: systemctl Not tainted 6.10.0-rc2-00258-g18f0423b9ecc #1
>>> [ 13.497741][ T189] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>>> [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
>>> [ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size 8192)!
>> Hmm, not sure I understand exactly why changing kmap() to kmap_local_page()
>> expose this,
>> but it looks like mm/usercopy does not like size=8192 when copying for the
>> skb frag.
>>
>> quick git browse directs to:
>> --
>> commit 4e140f59d285c1ca1e5c81b4c13e27366865bd09
>> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
>> Date: Mon Jan 10 23:15:27 2022 +0000
>>
>> mm/usercopy: Check kmap addresses properly
>>
>> If you are copying to an address in the kmap region, you may not copy
>> across a page boundary, no matter what the size of the underlying
>> allocation. You can't kmap() a slab page because slab pages always
>> come from low memory.
>>
>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>> Acked-by: Kees Cook <keescook@chromium.org>
>> Signed-off-by: Kees Cook <keescook@chromium.org>
>> Link:
>> https://lore.kernel.org/r/20220110231530.665970-2-willy@infradead.org
>> --
>>
>> CCing willy.
>>
>> The documentation suggest that under single-context usage kmap() can be
>> freely converted
>> to kmap_local_page()? But seems that when using kmap() the size is not an
>> issue, still trying to
>> understand why.
> Probably because kmap() returns page_address() for non-highmem pages
> while kmap_local_page() actually returns a kmap address:
>
> if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
> return page_address(page);
> return __kmap_local_pfn_prot(page_to_pfn(page), prot);
>
> so if skb frags are always lowmem (are they?) this is a false positive.
AFAIR these buffers are coming from the RX ring, so they should be
coming from a page_frag_cache,
so I want to say always low memory?
> if they can be highmem, then you've uncovered a bug that nobody's
> noticed because nobody's testing on 32-bit any more.
Not sure, Jakub? Eric?
^ permalink raw reply [flat|nested] 17+ messages in thread
* RE: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-16 21:53 ` Matthew Wilcox
@ 2024-06-17 10:18 ` David Laight
0 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2024-06-17 10:18 UTC (permalink / raw)
To: 'Matthew Wilcox'
Cc: 'Sagi Grimberg', kernel test robot,
oe-lkp@lists.linux.dev, lkp@intel.com, netdev@vger.kernel.org,
Eric Dumazet, Jakub Kicinski
From: Matthew Wilcox
> Sent: 16 June 2024 22:53
>
> On Sun, Jun 16, 2024 at 09:51:05PM +0000, David Laight wrote:
> > From: Sagi Grimberg
> > > Sent: 16 June 2024 10:24
> > ...
> > > > [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> > > > [ 13.499424][ T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0,
> size
> > > 8192)!
> > >
> > > Hmm, not sure I understand exactly why changing kmap() to
> > > kmap_local_page() expose this,
> > > but it looks like mm/usercopy does not like size=8192 when copying for
> > > the skb frag.
> >
> > Can't a usercopy fault and have to read the page from swap?
> > So the process can sleep and then be rescheduled on a different cpu?
> > So you can't use kmap_local_page() here at all.
>
> I don't think you understand how kmap_local_page() works.
Quite likely :-)
But I thought it was a cheap way of temporarily mapping a physical memory
page into the current cpu's page tables without having to do any IPI to
tell other cpu about the insert or removal?
Which would require that the process not be migrated, which pretty much
implies that pre-emption be disabled.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-17 6:29 ` Sagi Grimberg
@ 2024-06-17 16:58 ` Jakub Kicinski
2024-06-18 6:37 ` Sagi Grimberg
2024-06-19 12:46 ` David Howells
1 sibling, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-06-17 16:58 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev,
Eric Dumazet, David Howells
On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
> > Probably because kmap() returns page_address() for non-highmem pages
> > while kmap_local_page() actually returns a kmap address:
> >
> > if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
> > return page_address(page);
> > return __kmap_local_pfn_prot(page_to_pfn(page), prot);
> >
> > so if skb frags are always lowmem (are they?) this is a false positive.
>
> AFAIR these buffers are coming from the RX ring, so they should be
> coming from a page_frag_cache,
> so I want to say always low memory?
>
> > if they can be highmem, then you've uncovered a bug that nobody's
> > noticed because nobody's testing on 32-bit any more.
>
> Not sure, Jakub? Eric?
My uneducated guess would be that until recent(ish) sendpage rework
from David Howells all high mem pages would have been single pages.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-17 16:58 ` Jakub Kicinski
@ 2024-06-18 6:37 ` Sagi Grimberg
0 siblings, 0 replies; 17+ messages in thread
From: Sagi Grimberg @ 2024-06-18 6:37 UTC (permalink / raw)
To: Jakub Kicinski, David Howells
Cc: Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev,
Eric Dumazet
On 17/06/2024 19:58, Jakub Kicinski wrote:
> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
>>> Probably because kmap() returns page_address() for non-highmem pages
>>> while kmap_local_page() actually returns a kmap address:
>>>
>>> if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
>>> return page_address(page);
>>> return __kmap_local_pfn_prot(page_to_pfn(page), prot);
>>>
>>> so if skb frags are always lowmem (are they?) this is a false positive.
>> AFAIR these buffers are coming from the RX ring, so they should be
>> coming from a page_frag_cache,
>> so I want to say always low memory?
>>
>>> if they can be highmem, then you've uncovered a bug that nobody's
>>> noticed because nobody's testing on 32-bit any more.
>> Not sure, Jakub? Eric?
> My uneducated guess would be that until recent(ish) sendpage rework
> from David Howells all high mem pages would have been single pages.
dHowells?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-17 6:29 ` Sagi Grimberg
2024-06-17 16:58 ` Jakub Kicinski
@ 2024-06-19 12:46 ` David Howells
2024-06-19 13:54 ` Sagi Grimberg
1 sibling, 1 reply; 17+ messages in thread
From: David Howells @ 2024-06-19 12:46 UTC (permalink / raw)
To: Jakub Kicinski
Cc: dhowells, Sagi Grimberg, Matthew Wilcox, kernel test robot,
oe-lkp, lkp, netdev, Eric Dumazet
Jakub Kicinski <kuba@kernel.org> wrote:
> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
> > > Probably because kmap() returns page_address() for non-highmem pages
> > > while kmap_local_page() actually returns a kmap address:
> > >
> > > if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
> > > return page_address(page);
> > > return __kmap_local_pfn_prot(page_to_pfn(page), prot);
> > >
> > > so if skb frags are always lowmem (are they?) this is a false positive.
> >
> > AFAIR these buffers are coming from the RX ring, so they should be
> > coming from a page_frag_cache,
> > so I want to say always low memory?
> >
> > > if they can be highmem, then you've uncovered a bug that nobody's
> > > noticed because nobody's testing on 32-bit any more.
> >
> > Not sure, Jakub? Eric?
>
> My uneducated guess would be that until recent(ish) sendpage rework
> from David Howells all high mem pages would have been single pages.
Um. I touched the Tx side, not the Rx side.
I also don't know whether all high mem pages would be single pages. I'll have
to defer that one to the MM folks.
David
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-19 12:46 ` David Howells
@ 2024-06-19 13:54 ` Sagi Grimberg
2024-06-19 14:51 ` Eric Dumazet
2024-06-21 7:54 ` David Howells
0 siblings, 2 replies; 17+ messages in thread
From: Sagi Grimberg @ 2024-06-19 13:54 UTC (permalink / raw)
To: David Howells, Jakub Kicinski
Cc: Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev,
Eric Dumazet
On 19/06/2024 15:46, David Howells wrote:
> Jakub Kicinski <kuba@kernel.org> wrote:
>
>> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
>>>> Probably because kmap() returns page_address() for non-highmem pages
>>>> while kmap_local_page() actually returns a kmap address:
>>>>
>>>> if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
>>>> return page_address(page);
>>>> return __kmap_local_pfn_prot(page_to_pfn(page), prot);
>>>>
>>>> so if skb frags are always lowmem (are they?) this is a false positive.
>>> AFAIR these buffers are coming from the RX ring, so they should be
>>> coming from a page_frag_cache,
>>> so I want to say always low memory?
>>>
>>>> if they can be highmem, then you've uncovered a bug that nobody's
>>>> noticed because nobody's testing on 32-bit any more.
>>> Not sure, Jakub? Eric?
>> My uneducated guess would be that until recent(ish) sendpage rework
>> from David Howells all high mem pages would have been single pages.
> Um. I touched the Tx side, not the Rx side.
>
> I also don't know whether all high mem pages would be single pages. I'll have
> to defer that one to the MM folks.
What prevents from gro to expand frags from crossing PAGE_SIZE?
btw, at least from the code in skb_gro_receive() it appears that
page_address() is called directly,
which suggest that these netmem pages are lowmem?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-19 13:54 ` Sagi Grimberg
@ 2024-06-19 14:51 ` Eric Dumazet
2024-06-19 14:56 ` Eric Dumazet
2024-06-19 15:25 ` Sagi Grimberg
2024-06-21 7:54 ` David Howells
1 sibling, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2024-06-19 14:51 UTC (permalink / raw)
To: Sagi Grimberg
Cc: David Howells, Jakub Kicinski, Matthew Wilcox, kernel test robot,
oe-lkp, lkp, netdev
On Wed, Jun 19, 2024 at 3:54 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
>
> On 19/06/2024 15:46, David Howells wrote:
> > Jakub Kicinski <kuba@kernel.org> wrote:
> >
> >> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
> >>>> Probably because kmap() returns page_address() for non-highmem pages
> >>>> while kmap_local_page() actually returns a kmap address:
> >>>>
> >>>> if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
> >>>> return page_address(page);
> >>>> return __kmap_local_pfn_prot(page_to_pfn(page), prot);
> >>>>
> >>>> so if skb frags are always lowmem (are they?) this is a false positive.
> >>> AFAIR these buffers are coming from the RX ring, so they should be
> >>> coming from a page_frag_cache,
> >>> so I want to say always low memory?
> >>>
> >>>> if they can be highmem, then you've uncovered a bug that nobody's
> >>>> noticed because nobody's testing on 32-bit any more.
> >>> Not sure, Jakub? Eric?
> >> My uneducated guess would be that until recent(ish) sendpage rework
> >> from David Howells all high mem pages would have been single pages.
> > Um. I touched the Tx side, not the Rx side.
> >
> > I also don't know whether all high mem pages would be single pages. I'll have
> > to defer that one to the MM folks.
>
> What prevents from gro to expand frags from crossing PAGE_SIZE?
>
> btw, at least from the code in skb_gro_receive() it appears that
> page_address() is called directly,
> which suggest that these netmem pages are lowmem?
GRO should only be fed with lowmem pages.
But the trace involves af_unix, not GRO ?
I guess that with splice games, it is possible to add high order pages to skbs.
I think skb_frag_foreach_page() could be used to fix this issue.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-19 14:51 ` Eric Dumazet
@ 2024-06-19 14:56 ` Eric Dumazet
2024-06-19 15:25 ` Sagi Grimberg
1 sibling, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2024-06-19 14:56 UTC (permalink / raw)
To: Sagi Grimberg
Cc: David Howells, Jakub Kicinski, Matthew Wilcox, kernel test robot,
oe-lkp, lkp, netdev, Willem de Bruijn
On Wed, Jun 19, 2024 at 4:51 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Jun 19, 2024 at 3:54 PM Sagi Grimberg <sagi@grimberg.me> wrote:
> >
> >
> >
> > On 19/06/2024 15:46, David Howells wrote:
> > > Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > >> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
> > >>>> Probably because kmap() returns page_address() for non-highmem pages
> > >>>> while kmap_local_page() actually returns a kmap address:
> > >>>>
> > >>>> if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
> > >>>> return page_address(page);
> > >>>> return __kmap_local_pfn_prot(page_to_pfn(page), prot);
> > >>>>
> > >>>> so if skb frags are always lowmem (are they?) this is a false positive.
> > >>> AFAIR these buffers are coming from the RX ring, so they should be
> > >>> coming from a page_frag_cache,
> > >>> so I want to say always low memory?
> > >>>
> > >>>> if they can be highmem, then you've uncovered a bug that nobody's
> > >>>> noticed because nobody's testing on 32-bit any more.
> > >>> Not sure, Jakub? Eric?
> > >> My uneducated guess would be that until recent(ish) sendpage rework
> > >> from David Howells all high mem pages would have been single pages.
> > > Um. I touched the Tx side, not the Rx side.
> > >
> > > I also don't know whether all high mem pages would be single pages. I'll have
> > > to defer that one to the MM folks.
> >
> > What prevents from gro to expand frags from crossing PAGE_SIZE?
> >
> > btw, at least from the code in skb_gro_receive() it appears that
> > page_address() is called directly,
> > which suggest that these netmem pages are lowmem?
>
> GRO should only be fed with lowmem pages.
>
> But the trace involves af_unix, not GRO ?
>
> I guess that with splice games, it is possible to add high order pages to skbs.
>
> I think skb_frag_foreach_page() could be used to fix this issue.
For reference, please look at
commit c613c209c3f351d47158f728271d0c73b6dd24c6
Author: Willem de Bruijn <willemb@google.com>
Date: Mon Jul 31 08:15:47 2017 -0400
net: add skb_frag_foreach_page and use with kmap_atomic
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-19 14:51 ` Eric Dumazet
2024-06-19 14:56 ` Eric Dumazet
@ 2024-06-19 15:25 ` Sagi Grimberg
1 sibling, 0 replies; 17+ messages in thread
From: Sagi Grimberg @ 2024-06-19 15:25 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Howells, Jakub Kicinski, Matthew Wilcox, kernel test robot,
oe-lkp, lkp, netdev
On 19/06/2024 17:51, Eric Dumazet wrote:
> On Wed, Jun 19, 2024 at 3:54 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>>
>>
>> On 19/06/2024 15:46, David Howells wrote:
>>> Jakub Kicinski <kuba@kernel.org> wrote:
>>>
>>>> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
>>>>>> Probably because kmap() returns page_address() for non-highmem pages
>>>>>> while kmap_local_page() actually returns a kmap address:
>>>>>>
>>>>>> if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
>>>>>> return page_address(page);
>>>>>> return __kmap_local_pfn_prot(page_to_pfn(page), prot);
>>>>>>
>>>>>> so if skb frags are always lowmem (are they?) this is a false positive.
>>>>> AFAIR these buffers are coming from the RX ring, so they should be
>>>>> coming from a page_frag_cache,
>>>>> so I want to say always low memory?
>>>>>
>>>>>> if they can be highmem, then you've uncovered a bug that nobody's
>>>>>> noticed because nobody's testing on 32-bit any more.
>>>>> Not sure, Jakub? Eric?
>>>> My uneducated guess would be that until recent(ish) sendpage rework
>>>> from David Howells all high mem pages would have been single pages.
>>> Um. I touched the Tx side, not the Rx side.
>>>
>>> I also don't know whether all high mem pages would be single pages. I'll have
>>> to defer that one to the MM folks.
>> What prevents from gro to expand frags from crossing PAGE_SIZE?
>>
>> btw, at least from the code in skb_gro_receive() it appears that
>> page_address() is called directly,
>> which suggest that these netmem pages are lowmem?
> GRO should only be fed with lowmem pages.
>
> But the trace involves af_unix, not GRO ?
>
> I guess that with splice games, it is possible to add high order pages to skbs.
>
> I think skb_frag_foreach_page() could be used to fix this issue.
Thanks Eric, that is indeed the missing piece.
I'll give it go.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] net: micro-optimize skb_datagram_iter
2024-06-19 13:54 ` Sagi Grimberg
2024-06-19 14:51 ` Eric Dumazet
@ 2024-06-21 7:54 ` David Howells
1 sibling, 0 replies; 17+ messages in thread
From: David Howells @ 2024-06-21 7:54 UTC (permalink / raw)
To: Eric Dumazet
Cc: dhowells, Sagi Grimberg, Jakub Kicinski, Matthew Wilcox,
kernel test robot, oe-lkp, lkp, netdev
Eric Dumazet <edumazet@google.com> wrote:
> But the trace involves af_unix, not GRO ?
Ah. splice()/vmsplice() can feed multipage folios into the socket. Possibly
those can be in highmem.
David
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2024-06-21 7:54 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-13 11:35 [PATCH] net: micro-optimize skb_datagram_iter Sagi Grimberg
2024-06-15 2:40 ` patchwork-bot+netdevbpf
2024-06-16 8:06 ` kernel test robot
2024-06-16 9:24 ` Sagi Grimberg
2024-06-16 21:51 ` David Laight
2024-06-16 21:53 ` Matthew Wilcox
2024-06-17 10:18 ` David Laight
2024-06-16 21:56 ` Matthew Wilcox
2024-06-17 6:29 ` Sagi Grimberg
2024-06-17 16:58 ` Jakub Kicinski
2024-06-18 6:37 ` Sagi Grimberg
2024-06-19 12:46 ` David Howells
2024-06-19 13:54 ` Sagi Grimberg
2024-06-19 14:51 ` Eric Dumazet
2024-06-19 14:56 ` Eric Dumazet
2024-06-19 15:25 ` Sagi Grimberg
2024-06-21 7:54 ` David Howells
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).