netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net: micro-optimize skb_datagram_iter
@ 2024-06-13 11:35 Sagi Grimberg
  2024-06-15  2:40 ` patchwork-bot+netdevbpf
  2024-06-16  8:06 ` kernel test robot
  0 siblings, 2 replies; 17+ messages in thread
From: Sagi Grimberg @ 2024-06-13 11:35 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, Jakub Kicinski

We only use the mapping in a single context in a short and contained scope,
so kmap_local_page is sufficient and cheaper. This will also allow
skb_datagram_iter to be called from softirq context.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 net/core/datagram.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/datagram.c b/net/core/datagram.c
index a8b625abe242..ac74df248205 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -436,14 +436,14 @@ static int __skb_datagram_iter(const struct sk_buff *skb, int offset,
 		end = start + skb_frag_size(frag);
 		if ((copy = end - offset) > 0) {
 			struct page *page = skb_frag_page(frag);
-			u8 *vaddr = kmap(page);
+			u8 *vaddr = kmap_local_page(page);
 
 			if (copy > len)
 				copy = len;
 			n = INDIRECT_CALL_1(cb, simple_copy_to_iter,
 					vaddr + skb_frag_off(frag) + offset - start,
 					copy, data, to);
-			kunmap(page);
+			kunmap_local(vaddr);
 			offset += n;
 			if (n != copy)
 				goto short_copy;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-13 11:35 [PATCH] net: micro-optimize skb_datagram_iter Sagi Grimberg
@ 2024-06-15  2:40 ` patchwork-bot+netdevbpf
  2024-06-16  8:06 ` kernel test robot
  1 sibling, 0 replies; 17+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-06-15  2:40 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: netdev, edumazet, kuba

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 13 Jun 2024 14:35:04 +0300 you wrote:
> We only use the mapping in a single context in a short and contained scope,
> so kmap_local_page is sufficient and cheaper. This will also allow
> skb_datagram_iter to be called from softirq context.
> 
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  net/core/datagram.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Here is the summary with links:
  - net: micro-optimize skb_datagram_iter
    https://git.kernel.org/netdev/net-next/c/934c29999b57

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-13 11:35 [PATCH] net: micro-optimize skb_datagram_iter Sagi Grimberg
  2024-06-15  2:40 ` patchwork-bot+netdevbpf
@ 2024-06-16  8:06 ` kernel test robot
  2024-06-16  9:24   ` Sagi Grimberg
  1 sibling, 1 reply; 17+ messages in thread
From: kernel test robot @ 2024-06-16  8:06 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: oe-lkp, lkp, netdev, Eric Dumazet, Jakub Kicinski, oliver.sang



Hello,

kernel test robot noticed "kernel_BUG_at_mm/usercopy.c" on:

commit: 18f0423b9eccb781310af6709ceebf654175af14 ("[PATCH] net: micro-optimize skb_datagram_iter")
url: https://github.com/intel-lab-lkp/linux/commits/Sagi-Grimberg/net-micro-optimize-skb_datagram_iter/20240613-193620
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net.git b60b1bdc1888f51da7a2a22c48c5f1eb2bd12e97
patch link: https://lore.kernel.org/all/20240613113504.1079860-1-sagi@grimberg.me/
patch subject: [PATCH] net: micro-optimize skb_datagram_iter

in testcase: boot

compiler: gcc-8
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+------------------------------------------+------------+------------+
|                                          | b60b1bdc18 | 18f0423b9e |
+------------------------------------------+------------+------------+
| boot_successes                           | 6          | 0          |
| boot_failures                            | 0          | 6          |
| kernel_BUG_at_mm/usercopy.c              | 0          | 6          |
| Oops:invalid_opcode:#[##]PREEMPT_SMP     | 0          | 6          |
| EIP:usercopy_abort                       | 0          | 6          |
| Kernel_panic-not_syncing:Fatal_exception | 0          | 6          |
+------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202406161539.b5ff7b20-oliver.sang@intel.com


[   13.495377][  T189] ------------[ cut here ]------------
[   13.495862][  T189] kernel BUG at mm/usercopy.c:102!
[   13.496372][  T189] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
[   13.496927][  T189] CPU: 0 PID: 189 Comm: systemctl Not tainted 6.10.0-rc2-00258-g18f0423b9ecc #1
[   13.497741][  T189] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12)) 
[   13.499424][  T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size 8192)!
[ 13.499647][ T189] Code: d6 89 44 24 0c 0f 45 cf 8b 7d 0c 89 74 24 10 89 4c 24 04 c7 04 24 a4 55 8a d6 89 7c 24 20 8b 7d 08 89 7c 24 1c e8 20 3c df ff <0f> 0b b8 80 91 d7 d6 e8 a8 de 68 00 ba 3d 17 86 d6 89 55 f0 89 d6
All code
========
   0:	d6                   	(bad)
   1:	89 44 24 0c          	mov    %eax,0xc(%rsp)
   5:	0f 45 cf             	cmovne %edi,%ecx
   8:	8b 7d 0c             	mov    0xc(%rbp),%edi
   b:	89 74 24 10          	mov    %esi,0x10(%rsp)
   f:	89 4c 24 04          	mov    %ecx,0x4(%rsp)
  13:	c7 04 24 a4 55 8a d6 	movl   $0xd68a55a4,(%rsp)
  1a:	89 7c 24 20          	mov    %edi,0x20(%rsp)
  1e:	8b 7d 08             	mov    0x8(%rbp),%edi
  21:	89 7c 24 1c          	mov    %edi,0x1c(%rsp)
  25:	e8 20 3c df ff       	call   0xffffffffffdf3c4a
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	b8 80 91 d7 d6       	mov    $0xd6d79180,%eax
  31:	e8 a8 de 68 00       	call   0x68dede
  36:	ba 3d 17 86 d6       	mov    $0xd686173d,%edx
  3b:	89 55 f0             	mov    %edx,-0x10(%rbp)
  3e:	89 d6                	mov    %edx,%esi

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	b8 80 91 d7 d6       	mov    $0xd6d79180,%eax
   7:	e8 a8 de 68 00       	call   0x68deb4
   c:	ba 3d 17 86 d6       	mov    $0xd686173d,%edx
  11:	89 55 f0             	mov    %edx,-0x10(%rbp)
  14:	89 d6                	mov    %edx,%esi
[   13.500502][  T194] ------------[ cut here ]------------
[   13.502187][  T189] EAX: 00000052 EBX: d680da68 ECX: e0435480 EDX: 01000232
[   13.502666][  T194] kernel BUG at mm/usercopy.c:102!
[   13.503236][  T189] ESI: d686173d EDI: 00000000 EBP: ece37c44 ESP: ece37c10
[   13.504266][  T189] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
[   13.504856][  T189] CR0: 80050033 CR2: 0135eb6c CR3: 2beff000 CR4: 000406d0
[   13.505464][  T189] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   13.506083][  T189] DR6: fffe0ff0 DR7: 00000400
[   13.506495][  T189] Call Trace:
[ 13.506795][ T189] ? show_regs (arch/x86/kernel/dumpstack.c:479) 
[ 13.507187][ T189] ? __die_body (arch/x86/kernel/dumpstack.c:421) 
[ 13.507576][ T189] ? die (arch/x86/kernel/dumpstack.c:449) 
[ 13.507894][ T189] ? do_trap (arch/x86/kernel/traps.c:114 arch/x86/kernel/traps.c:155) 
[ 13.508270][ T189] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359 kernel/locking/lockdep.c:4311) 
[ 13.508783][ T189] ? do_error_trap (arch/x86/include/asm/traps.h:58 arch/x86/kernel/traps.c:176) 
[ 13.509182][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12)) 
[ 13.509588][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252) 
[ 13.509983][ T189] ? exc_invalid_op (arch/x86/kernel/traps.c:267) 
[ 13.510396][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12)) 
[ 13.510797][ T189] ? handle_exception (arch/x86/entry/entry_32.S:1054) 
[ 13.511242][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252) 
[ 13.511646][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12)) 
[ 13.512070][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252) 
[ 13.512434][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12)) 
[ 13.512832][ T189] __check_object_size (mm/usercopy.c:180 mm/usercopy.c:251 mm/usercopy.c:213) 
[ 13.513275][ T189] simple_copy_to_iter (include/linux/uio.h:196 net/core/datagram.c:513) 
[ 13.513693][ T189] __skb_datagram_iter (net/core/datagram.c:424 (discriminator 1)) 
[ 13.514138][ T189] skb_copy_datagram_iter (net/core/datagram.c:529) 
[ 13.514606][ T189] ? skb_free_datagram (net/core/datagram.c:512) 
[ 13.515028][ T189] ? scm_stat_del (net/unix/af_unix.c:2883) 
[ 13.515429][ T189] unix_stream_read_actor (net/unix/af_unix.c:2889) 
[ 13.515884][ T189] unix_stream_read_generic (net/unix/af_unix.c:2805) 
[ 13.516377][ T189] ? cma_for_each_area (mm/page_ext.c:518) 
[ 13.516826][ T189] ? unix_stream_splice_read (net/unix/af_unix.c:2907) 
[ 13.517301][ T189] unix_stream_recvmsg (net/unix/af_unix.c:2923) 
[ 13.517720][ T189] ? scm_stat_del (net/unix/af_unix.c:2883) 
[ 13.518108][ T189] ____sys_recvmsg (net/socket.c:1046 net/socket.c:1068 net/socket.c:2804) 
[ 13.518527][ T189] ? import_iovec (lib/iov_iter.c:1348) 
[ 13.518930][ T189] ? copy_msghdr_from_user (net/socket.c:2525) 
[ 13.519396][ T189] ___sys_recvmsg (net/socket.c:2846) 
[ 13.519811][ T189] ? __fdget (fs/file.c:1160) 
[ 13.520186][ T189] ? sockfd_lookup_light (net/socket.c:558) 
[ 13.520643][ T189] __sys_recvmsg (include/linux/file.h:34 net/socket.c:2878) 
[ 13.521046][ T189] __ia32_sys_socketcall (net/socket.c:3173 net/socket.c:3077 net/socket.c:3077) 
[ 13.521513][ T189] ia32_sys_call (arch/x86/entry/syscall_32.c:42) 
[ 13.521923][ T189] __do_fast_syscall_32 (arch/x86/entry/common.c:165 arch/x86/entry/common.c:386) 
[ 13.522362][ T189] do_fast_syscall_32 (arch/x86/entry/common.c:411) 
[ 13.522787][ T189] do_SYSENTER_32 (arch/x86/entry/common.c:450) 
[ 13.523188][ T189] entry_SYSENTER_32 (arch/x86/entry/entry_32.S:836) 
[   13.523613][  T189] EIP: 0xb7ee6579
[ 13.523933][ T189] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
All code
========
   0:	b8 01 10 06 03       	mov    $0x3061001,%eax
   5:	74 b4                	je     0xffffffffffffffbb
   7:	01 10                	add    %edx,(%rax)
   9:	07                   	(bad)
   a:	03 74 b0 01          	add    0x1(%rax,%rsi,4),%esi
   e:	10 08                	adc    %cl,(%rax)
  10:	03 74 d8 01          	add    0x1(%rax,%rbx,8),%esi
	...
  20:	00 51 52             	add    %dl,0x52(%rcx)
  23:	55                   	push   %rbp
  24:*	89 e5                	mov    %esp,%ebp		<-- trapping instruction
  26:	0f 34                	sysenter
  28:	cd 80                	int    $0x80
  2a:	5d                   	pop    %rbp
  2b:	5a                   	pop    %rdx
  2c:	59                   	pop    %rcx
  2d:	c3                   	ret
  2e:	90                   	nop
  2f:	90                   	nop
  30:	90                   	nop
  31:	90                   	nop
  32:	8d 76 00             	lea    0x0(%rsi),%esi
  35:	58                   	pop    %rax
  36:	b8 77 00 00 00       	mov    $0x77,%eax
  3b:	cd 80                	int    $0x80
  3d:	90                   	nop
  3e:	8d                   	.byte 0x8d
  3f:	76                   	.byte 0x76

Code starting with the faulting instruction
===========================================
   0:	5d                   	pop    %rbp
   1:	5a                   	pop    %rdx
   2:	59                   	pop    %rcx
   3:	c3                   	ret
   4:	90                   	nop
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop
   8:	8d 76 00             	lea    0x0(%rsi),%esi
   b:	58                   	pop    %rax
   c:	b8 77 00 00 00       	mov    $0x77,%eax
  11:	cd 80                	int    $0x80
  13:	90                   	nop
  14:	8d                   	.byte 0x8d
  15:	76                   	.byte 0x76
[   13.525624][  T189] EAX: ffffffda EBX: 00000011 ECX: bfdf5450 EDX: 00000000
[   13.526233][  T189] ESI: b7c46000 EDI: bfdf54ac EBP: bfdf54a8 ESP: bfdf5440
[   13.526853][  T189] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000286
[   13.527519][  T189] Modules linked in: i2c_piix4(+) agpgart(+) qemu_fw_cfg button fuse drm drm_panel_orientation_quirks ip_tables
[   13.528566][  T194] Oops: invalid opcode: 0000 [#2] PREEMPT SMP
[   13.528804][  T189] ---[ end trace 0000000000000000 ]---
[   13.529217][  T194] CPU: 1 PID: 194 Comm: systemctl Tainted: G      D            6.10.0-rc2-00258-g18f0423b9ecc #1
[ 13.529725][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12)) 
[   13.530536][  T194] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 13.531060][ T189] Code: d6 89 44 24 0c 0f 45 cf 8b 7d 0c 89 74 24 10 89 4c 24 04 c7 04 24 a4 55 8a d6 89 7c 24 20 8b 7d 08 89 7c 24 1c e8 20 3c df ff <0f> 0b b8 80 91 d7 d6 e8 a8 de 68 00 ba 3d 17 86 d6 89 55 f0 89 d6
All code
========
   0:	d6                   	(bad)
   1:	89 44 24 0c          	mov    %eax,0xc(%rsp)
   5:	0f 45 cf             	cmovne %edi,%ecx
   8:	8b 7d 0c             	mov    0xc(%rbp),%edi
   b:	89 74 24 10          	mov    %esi,0x10(%rsp)
   f:	89 4c 24 04          	mov    %ecx,0x4(%rsp)
  13:	c7 04 24 a4 55 8a d6 	movl   $0xd68a55a4,(%rsp)
  1a:	89 7c 24 20          	mov    %edi,0x20(%rsp)
  1e:	8b 7d 08             	mov    0x8(%rbp),%edi
  21:	89 7c 24 1c          	mov    %edi,0x1c(%rsp)
  25:	e8 20 3c df ff       	call   0xffffffffffdf3c4a
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	b8 80 91 d7 d6       	mov    $0xd6d79180,%eax
  31:	e8 a8 de 68 00       	call   0x68dede
  36:	ba 3d 17 86 d6       	mov    $0xd686173d,%edx
  3b:	89 55 f0             	mov    %edx,-0x10(%rbp)
  3e:	89 d6                	mov    %edx,%esi

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	b8 80 91 d7 d6       	mov    $0xd6d79180,%eax
   7:	e8 a8 de 68 00       	call   0x68deb4
   c:	ba 3d 17 86 d6       	mov    $0xd686173d,%edx
  11:	89 55 f0             	mov    %edx,-0x10(%rbp)
  14:	89 d6                	mov    %edx,%esi


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240616/202406161539.b5ff7b20-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-16  8:06 ` kernel test robot
@ 2024-06-16  9:24   ` Sagi Grimberg
  2024-06-16 21:51     ` David Laight
  2024-06-16 21:56     ` Matthew Wilcox
  0 siblings, 2 replies; 17+ messages in thread
From: Sagi Grimberg @ 2024-06-16  9:24 UTC (permalink / raw)
  To: kernel test robot, Matthew Wilcox
  Cc: oe-lkp, lkp, netdev, Eric Dumazet, Jakub Kicinski



On 16/06/2024 11:06, kernel test robot wrote:
>
> Hello,
>
> kernel test robot noticed "kernel_BUG_at_mm/usercopy.c" on:
>
> commit: 18f0423b9eccb781310af6709ceebf654175af14 ("[PATCH] net: micro-optimize skb_datagram_iter")
> url: https://github.com/intel-lab-lkp/linux/commits/Sagi-Grimberg/net-micro-optimize-skb_datagram_iter/20240613-193620
> base: https://git.kernel.org/cgit/linux/kernel/git/davem/net.git b60b1bdc1888f51da7a2a22c48c5f1eb2bd12e97
> patch link: https://lore.kernel.org/all/20240613113504.1079860-1-sagi@grimberg.me/
> patch subject: [PATCH] net: micro-optimize skb_datagram_iter
>
> in testcase: boot
>
> compiler: gcc-8
> test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
> +------------------------------------------+------------+------------+
> |                                          | b60b1bdc18 | 18f0423b9e |
> +------------------------------------------+------------+------------+
> | boot_successes                           | 6          | 0          |
> | boot_failures                            | 0          | 6          |
> | kernel_BUG_at_mm/usercopy.c              | 0          | 6          |
> | Oops:invalid_opcode:#[##]PREEMPT_SMP     | 0          | 6          |
> | EIP:usercopy_abort                       | 0          | 6          |
> | Kernel_panic-not_syncing:Fatal_exception | 0          | 6          |
> +------------------------------------------+------------+------------+
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202406161539.b5ff7b20-oliver.sang@intel.com
>
>
> [   13.495377][  T189] ------------[ cut here ]------------
> [   13.495862][  T189] kernel BUG at mm/usercopy.c:102!
> [   13.496372][  T189] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
> [   13.496927][  T189] CPU: 0 PID: 189 Comm: systemctl Not tainted 6.10.0-rc2-00258-g18f0423b9ecc #1
> [   13.497741][  T189] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> [   13.499424][  T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size 8192)!

Hmm, not sure I understand exactly why changing kmap() to 
kmap_local_page() expose this,
but it looks like mm/usercopy does not like size=8192 when copying for 
the skb frag.

quick git browse directs to:
--
commit 4e140f59d285c1ca1e5c81b4c13e27366865bd09
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Mon Jan 10 23:15:27 2022 +0000

     mm/usercopy: Check kmap addresses properly

     If you are copying to an address in the kmap region, you may not copy
     across a page boundary, no matter what the size of the underlying
     allocation.  You can't kmap() a slab page because slab pages always
     come from low memory.

     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
     Acked-by: Kees Cook <keescook@chromium.org>
     Signed-off-by: Kees Cook <keescook@chromium.org>
     Link: 
https://lore.kernel.org/r/20220110231530.665970-2-willy@infradead.org
--

CCing willy.

The documentation suggest that under single-context usage kmap() can be 
freely converted
to kmap_local_page()? But seems that when using kmap() the size is not 
an issue, still trying to
understand why.



> [ 13.499647][ T189] Code: d6 89 44 24 0c 0f 45 cf 8b 7d 0c 89 74 24 10 89 4c 24 04 c7 04 24 a4 55 8a d6 89 7c 24 20 8b 7d 08 89 7c 24 1c e8 20 3c df ff <0f> 0b b8 80 91 d7 d6 e8 a8 de 68 00 ba 3d 17 86 d6 89 55 f0 89 d6
> All code
> ========
>     0:	d6                   	(bad)
>     1:	89 44 24 0c          	mov    %eax,0xc(%rsp)
>     5:	0f 45 cf             	cmovne %edi,%ecx
>     8:	8b 7d 0c             	mov    0xc(%rbp),%edi
>     b:	89 74 24 10          	mov    %esi,0x10(%rsp)
>     f:	89 4c 24 04          	mov    %ecx,0x4(%rsp)
>    13:	c7 04 24 a4 55 8a d6 	movl   $0xd68a55a4,(%rsp)
>    1a:	89 7c 24 20          	mov    %edi,0x20(%rsp)
>    1e:	8b 7d 08             	mov    0x8(%rbp),%edi
>    21:	89 7c 24 1c          	mov    %edi,0x1c(%rsp)
>    25:	e8 20 3c df ff       	call   0xffffffffffdf3c4a
>    2a:*	0f 0b                	ud2		<-- trapping instruction
>    2c:	b8 80 91 d7 d6       	mov    $0xd6d79180,%eax
>    31:	e8 a8 de 68 00       	call   0x68dede
>    36:	ba 3d 17 86 d6       	mov    $0xd686173d,%edx
>    3b:	89 55 f0             	mov    %edx,-0x10(%rbp)
>    3e:	89 d6                	mov    %edx,%esi
>
> Code starting with the faulting instruction
> ===========================================
>     0:	0f 0b                	ud2
>     2:	b8 80 91 d7 d6       	mov    $0xd6d79180,%eax
>     7:	e8 a8 de 68 00       	call   0x68deb4
>     c:	ba 3d 17 86 d6       	mov    $0xd686173d,%edx
>    11:	89 55 f0             	mov    %edx,-0x10(%rbp)
>    14:	89 d6                	mov    %edx,%esi
> [   13.500502][  T194] ------------[ cut here ]------------
> [   13.502187][  T189] EAX: 00000052 EBX: d680da68 ECX: e0435480 EDX: 01000232
> [   13.502666][  T194] kernel BUG at mm/usercopy.c:102!
> [   13.503236][  T189] ESI: d686173d EDI: 00000000 EBP: ece37c44 ESP: ece37c10
> [   13.504266][  T189] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010286
> [   13.504856][  T189] CR0: 80050033 CR2: 0135eb6c CR3: 2beff000 CR4: 000406d0
> [   13.505464][  T189] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [   13.506083][  T189] DR6: fffe0ff0 DR7: 00000400
> [   13.506495][  T189] Call Trace:
> [ 13.506795][ T189] ? show_regs (arch/x86/kernel/dumpstack.c:479)
> [ 13.507187][ T189] ? __die_body (arch/x86/kernel/dumpstack.c:421)
> [ 13.507576][ T189] ? die (arch/x86/kernel/dumpstack.c:449)
> [ 13.507894][ T189] ? do_trap (arch/x86/kernel/traps.c:114 arch/x86/kernel/traps.c:155)
> [ 13.508270][ T189] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359 kernel/locking/lockdep.c:4311)
> [ 13.508783][ T189] ? do_error_trap (arch/x86/include/asm/traps.h:58 arch/x86/kernel/traps.c:176)
> [ 13.509182][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> [ 13.509588][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252)
> [ 13.509983][ T189] ? exc_invalid_op (arch/x86/kernel/traps.c:267)
> [ 13.510396][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> [ 13.510797][ T189] ? handle_exception (arch/x86/entry/entry_32.S:1054)
> [ 13.511242][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252)
> [ 13.511646][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> [ 13.512070][ T189] ? exc_overflow (arch/x86/kernel/traps.c:252)
> [ 13.512434][ T189] ? usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> [ 13.512832][ T189] __check_object_size (mm/usercopy.c:180 mm/usercopy.c:251 mm/usercopy.c:213)
> [ 13.513275][ T189] simple_copy_to_iter (include/linux/uio.h:196 net/core/datagram.c:513)
> [ 13.513693][ T189] __skb_datagram_iter (net/core/datagram.c:424 (discriminator 1))
> [ 13.514138][ T189] skb_copy_datagram_iter (net/core/datagram.c:529)
> [ 13.514606][ T189] ? skb_free_datagram (net/core/datagram.c:512)
> [ 13.515028][ T189] ? scm_stat_del (net/unix/af_unix.c:2883)
> [ 13.515429][ T189] unix_stream_read_actor (net/unix/af_unix.c:2889)
> [ 13.515884][ T189] unix_stream_read_generic (net/unix/af_unix.c:2805)
> [ 13.516377][ T189] ? cma_for_each_area (mm/page_ext.c:518)
> [ 13.516826][ T189] ? unix_stream_splice_read (net/unix/af_unix.c:2907)
> [ 13.517301][ T189] unix_stream_recvmsg (net/unix/af_unix.c:2923)
> [ 13.517720][ T189] ? scm_stat_del (net/unix/af_unix.c:2883)
> [ 13.518108][ T189] ____sys_recvmsg (net/socket.c:1046 net/socket.c:1068 net/socket.c:2804)
> [ 13.518527][ T189] ? import_iovec (lib/iov_iter.c:1348)
> [ 13.518930][ T189] ? copy_msghdr_from_user (net/socket.c:2525)
> [ 13.519396][ T189] ___sys_recvmsg (net/socket.c:2846)
> [ 13.519811][ T189] ? __fdget (fs/file.c:1160)
> [ 13.520186][ T189] ? sockfd_lookup_light (net/socket.c:558)
> [ 13.520643][ T189] __sys_recvmsg (include/linux/file.h:34 net/socket.c:2878)
> [ 13.521046][ T189] __ia32_sys_socketcall (net/socket.c:3173 net/socket.c:3077 net/socket.c:3077)
> [ 13.521513][ T189] ia32_sys_call (arch/x86/entry/syscall_32.c:42)
> [ 13.521923][ T189] __do_fast_syscall_32 (arch/x86/entry/common.c:165 arch/x86/entry/common.c:386)
> [ 13.522362][ T189] do_fast_syscall_32 (arch/x86/entry/common.c:411)
> [ 13.522787][ T189] do_SYSENTER_32 (arch/x86/entry/common.c:450)
> [ 13.523188][ T189] entry_SYSENTER_32 (arch/x86/entry/entry_32.S:836)
> [   13.523613][  T189] EIP: 0xb7ee6579
> [ 13.523933][ T189] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
> All code
> ========
>     0:	b8 01 10 06 03       	mov    $0x3061001,%eax
>     5:	74 b4                	je     0xffffffffffffffbb
>     7:	01 10                	add    %edx,(%rax)
>     9:	07                   	(bad)
>     a:	03 74 b0 01          	add    0x1(%rax,%rsi,4),%esi
>     e:	10 08                	adc    %cl,(%rax)
>    10:	03 74 d8 01          	add    0x1(%rax,%rbx,8),%esi
> 	...
>    20:	00 51 52             	add    %dl,0x52(%rcx)
>    23:	55                   	push   %rbp
>    24:*	89 e5                	mov    %esp,%ebp		<-- trapping instruction
>    26:	0f 34                	sysenter
>    28:	cd 80                	int    $0x80
>    2a:	5d                   	pop    %rbp
>    2b:	5a                   	pop    %rdx
>    2c:	59                   	pop    %rcx
>    2d:	c3                   	ret
>    2e:	90                   	nop
>    2f:	90                   	nop
>    30:	90                   	nop
>    31:	90                   	nop
>    32:	8d 76 00             	lea    0x0(%rsi),%esi
>    35:	58                   	pop    %rax
>    36:	b8 77 00 00 00       	mov    $0x77,%eax
>    3b:	cd 80                	int    $0x80
>    3d:	90                   	nop
>    3e:	8d                   	.byte 0x8d
>    3f:	76                   	.byte 0x76
>
> Code starting with the faulting instruction
> ===========================================
>     0:	5d                   	pop    %rbp
>     1:	5a                   	pop    %rdx
>     2:	59                   	pop    %rcx
>     3:	c3                   	ret
>     4:	90                   	nop
>     5:	90                   	nop
>     6:	90                   	nop
>     7:	90                   	nop
>     8:	8d 76 00             	lea    0x0(%rsi),%esi
>     b:	58                   	pop    %rax
>     c:	b8 77 00 00 00       	mov    $0x77,%eax
>    11:	cd 80                	int    $0x80
>    13:	90                   	nop
>    14:	8d                   	.byte 0x8d
>    15:	76                   	.byte 0x76
> [   13.525624][  T189] EAX: ffffffda EBX: 00000011 ECX: bfdf5450 EDX: 00000000
> [   13.526233][  T189] ESI: b7c46000 EDI: bfdf54ac EBP: bfdf54a8 ESP: bfdf5440
> [   13.526853][  T189] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000286
> [   13.527519][  T189] Modules linked in: i2c_piix4(+) agpgart(+) qemu_fw_cfg button fuse drm drm_panel_orientation_quirks ip_tables
> [   13.528566][  T194] Oops: invalid opcode: 0000 [#2] PREEMPT SMP
> [   13.528804][  T189] ---[ end trace 0000000000000000 ]---
> [   13.529217][  T194] CPU: 1 PID: 194 Comm: systemctl Tainted: G      D            6.10.0-rc2-00258-g18f0423b9ecc #1
> [ 13.529725][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> [   13.530536][  T194] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 13.531060][ T189] Code: d6 89 44 24 0c 0f 45 cf 8b 7d 0c 89 74 24 10 89 4c 24 04 c7 04 24 a4 55 8a d6 89 7c 24 20 8b 7d 08 89 7c 24 1c e8 20 3c df ff <0f> 0b b8 80 91 d7 d6 e8 a8 de 68 00 ba 3d 17 86 d6 89 55 f0 89 d6
> All code
> ========
>     0:	d6                   	(bad)
>     1:	89 44 24 0c          	mov    %eax,0xc(%rsp)
>     5:	0f 45 cf             	cmovne %edi,%ecx
>     8:	8b 7d 0c             	mov    0xc(%rbp),%edi
>     b:	89 74 24 10          	mov    %esi,0x10(%rsp)
>     f:	89 4c 24 04          	mov    %ecx,0x4(%rsp)
>    13:	c7 04 24 a4 55 8a d6 	movl   $0xd68a55a4,(%rsp)
>    1a:	89 7c 24 20          	mov    %edi,0x20(%rsp)
>    1e:	8b 7d 08             	mov    0x8(%rbp),%edi
>    21:	89 7c 24 1c          	mov    %edi,0x1c(%rsp)
>    25:	e8 20 3c df ff       	call   0xffffffffffdf3c4a
>    2a:*	0f 0b                	ud2		<-- trapping instruction
>    2c:	b8 80 91 d7 d6       	mov    $0xd6d79180,%eax
>    31:	e8 a8 de 68 00       	call   0x68dede
>    36:	ba 3d 17 86 d6       	mov    $0xd686173d,%edx
>    3b:	89 55 f0             	mov    %edx,-0x10(%rbp)
>    3e:	89 d6                	mov    %edx,%esi
>
> Code starting with the faulting instruction
> ===========================================
>     0:	0f 0b                	ud2
>     2:	b8 80 91 d7 d6       	mov    $0xd6d79180,%eax
>     7:	e8 a8 de 68 00       	call   0x68deb4
>     c:	ba 3d 17 86 d6       	mov    $0xd686173d,%edx
>    11:	89 55 f0             	mov    %edx,-0x10(%rbp)
>    14:	89 d6                	mov    %edx,%esi
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240616/202406161539.b5ff7b20-oliver.sang@intel.com
>
>
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-16  9:24   ` Sagi Grimberg
@ 2024-06-16 21:51     ` David Laight
  2024-06-16 21:53       ` Matthew Wilcox
  2024-06-16 21:56     ` Matthew Wilcox
  1 sibling, 1 reply; 17+ messages in thread
From: David Laight @ 2024-06-16 21:51 UTC (permalink / raw)
  To: 'Sagi Grimberg', kernel test robot, Matthew Wilcox
  Cc: oe-lkp@lists.linux.dev, lkp@intel.com, netdev@vger.kernel.org,
	Eric Dumazet, Jakub Kicinski

From: Sagi Grimberg
> Sent: 16 June 2024 10:24
...
> > [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> > [   13.499424][  T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size
> 8192)!
> 
> Hmm, not sure I understand exactly why changing kmap() to
> kmap_local_page() expose this,
> but it looks like mm/usercopy does not like size=8192 when copying for
> the skb frag.

Can't a usercopy fault and have to read the page from swap?
So the process can sleep and then be rescheduled on a different cpu?
So you can't use kmap_local_page() here at all.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-16 21:51     ` David Laight
@ 2024-06-16 21:53       ` Matthew Wilcox
  2024-06-17 10:18         ` David Laight
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2024-06-16 21:53 UTC (permalink / raw)
  To: David Laight
  Cc: 'Sagi Grimberg', kernel test robot,
	oe-lkp@lists.linux.dev, lkp@intel.com, netdev@vger.kernel.org,
	Eric Dumazet, Jakub Kicinski

On Sun, Jun 16, 2024 at 09:51:05PM +0000, David Laight wrote:
> From: Sagi Grimberg
> > Sent: 16 June 2024 10:24
> ...
> > > [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> > > [   13.499424][  T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size
> > 8192)!
> > 
> > Hmm, not sure I understand exactly why changing kmap() to
> > kmap_local_page() expose this,
> > but it looks like mm/usercopy does not like size=8192 when copying for
> > the skb frag.
> 
> Can't a usercopy fault and have to read the page from swap?
> So the process can sleep and then be rescheduled on a different cpu?
> So you can't use kmap_local_page() here at all.

I don't think you understand how kmap_local_page() works.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-16  9:24   ` Sagi Grimberg
  2024-06-16 21:51     ` David Laight
@ 2024-06-16 21:56     ` Matthew Wilcox
  2024-06-17  6:29       ` Sagi Grimberg
  1 sibling, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2024-06-16 21:56 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: kernel test robot, oe-lkp, lkp, netdev, Eric Dumazet,
	Jakub Kicinski

On Sun, Jun 16, 2024 at 12:24:28PM +0300, Sagi Grimberg wrote:
> > [   13.495377][  T189] ------------[ cut here ]------------
> > [   13.495862][  T189] kernel BUG at mm/usercopy.c:102!
> > [   13.496372][  T189] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
> > [   13.496927][  T189] CPU: 0 PID: 189 Comm: systemctl Not tainted 6.10.0-rc2-00258-g18f0423b9ecc #1
> > [   13.497741][  T189] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> > [   13.499424][  T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size 8192)!
> 
> Hmm, not sure I understand exactly why changing kmap() to kmap_local_page()
> expose this,
> but it looks like mm/usercopy does not like size=8192 when copying for the
> skb frag.
> 
> quick git browse directs to:
> --
> commit 4e140f59d285c1ca1e5c81b4c13e27366865bd09
> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
> Date:   Mon Jan 10 23:15:27 2022 +0000
> 
>     mm/usercopy: Check kmap addresses properly
> 
>     If you are copying to an address in the kmap region, you may not copy
>     across a page boundary, no matter what the size of the underlying
>     allocation.  You can't kmap() a slab page because slab pages always
>     come from low memory.
> 
>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>     Acked-by: Kees Cook <keescook@chromium.org>
>     Signed-off-by: Kees Cook <keescook@chromium.org>
>     Link:
> https://lore.kernel.org/r/20220110231530.665970-2-willy@infradead.org
> --
> 
> CCing willy.
> 
> The documentation suggest that under single-context usage kmap() can be
> freely converted
> to kmap_local_page()? But seems that when using kmap() the size is not an
> issue, still trying to
> understand why.

Probably because kmap() returns page_address() for non-highmem pages
while kmap_local_page() actually returns a kmap address:

        if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
                return page_address(page);
        return __kmap_local_pfn_prot(page_to_pfn(page), prot);

so if skb frags are always lowmem (are they?) this is a false positive.
if they can be highmem, then you've uncovered a bug that nobody's
noticed because nobody's testing on 32-bit any more.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-16 21:56     ` Matthew Wilcox
@ 2024-06-17  6:29       ` Sagi Grimberg
  2024-06-17 16:58         ` Jakub Kicinski
  2024-06-19 12:46         ` David Howells
  0 siblings, 2 replies; 17+ messages in thread
From: Sagi Grimberg @ 2024-06-17  6:29 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: kernel test robot, oe-lkp, lkp, netdev, Eric Dumazet,
	Jakub Kicinski



On 17/06/2024 0:56, Matthew Wilcox wrote:
> On Sun, Jun 16, 2024 at 12:24:28PM +0300, Sagi Grimberg wrote:
>>> [   13.495377][  T189] ------------[ cut here ]------------
>>> [   13.495862][  T189] kernel BUG at mm/usercopy.c:102!
>>> [   13.496372][  T189] Oops: invalid opcode: 0000 [#1] PREEMPT SMP
>>> [   13.496927][  T189] CPU: 0 PID: 189 Comm: systemctl Not tainted 6.10.0-rc2-00258-g18f0423b9ecc #1
>>> [   13.497741][  T189] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>>> [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
>>> [   13.499424][  T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0, size 8192)!
>> Hmm, not sure I understand exactly why changing kmap() to kmap_local_page()
>> expose this,
>> but it looks like mm/usercopy does not like size=8192 when copying for the
>> skb frag.
>>
>> quick git browse directs to:
>> --
>> commit 4e140f59d285c1ca1e5c81b4c13e27366865bd09
>> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
>> Date:   Mon Jan 10 23:15:27 2022 +0000
>>
>>      mm/usercopy: Check kmap addresses properly
>>
>>      If you are copying to an address in the kmap region, you may not copy
>>      across a page boundary, no matter what the size of the underlying
>>      allocation.  You can't kmap() a slab page because slab pages always
>>      come from low memory.
>>
>>      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>      Acked-by: Kees Cook <keescook@chromium.org>
>>      Signed-off-by: Kees Cook <keescook@chromium.org>
>>      Link:
>> https://lore.kernel.org/r/20220110231530.665970-2-willy@infradead.org
>> --
>>
>> CCing willy.
>>
>> The documentation suggest that under single-context usage kmap() can be
>> freely converted
>> to kmap_local_page()? But seems that when using kmap() the size is not an
>> issue, still trying to
>> understand why.
> Probably because kmap() returns page_address() for non-highmem pages
> while kmap_local_page() actually returns a kmap address:
>
>          if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
>                  return page_address(page);
>          return __kmap_local_pfn_prot(page_to_pfn(page), prot);
>
> so if skb frags are always lowmem (are they?) this is a false positive.

AFAIR these buffers are coming from the RX ring, so they should be 
coming from a page_frag_cache,
so I want to say always low memory?

> if they can be highmem, then you've uncovered a bug that nobody's
> noticed because nobody's testing on 32-bit any more.

Not sure, Jakub? Eric?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-16 21:53       ` Matthew Wilcox
@ 2024-06-17 10:18         ` David Laight
  0 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2024-06-17 10:18 UTC (permalink / raw)
  To: 'Matthew Wilcox'
  Cc: 'Sagi Grimberg', kernel test robot,
	oe-lkp@lists.linux.dev, lkp@intel.com, netdev@vger.kernel.org,
	Eric Dumazet, Jakub Kicinski

From: Matthew Wilcox
> Sent: 16 June 2024 22:53
> 
> On Sun, Jun 16, 2024 at 09:51:05PM +0000, David Laight wrote:
> > From: Sagi Grimberg
> > > Sent: 16 June 2024 10:24
> > ...
> > > > [ 13.498663][ T189] EIP: usercopy_abort (mm/usercopy.c:102 (discriminator 12))
> > > > [   13.499424][  T194] usercopy: Kernel memory exposure attempt detected from kmap (offset 0,
> size
> > > 8192)!
> > >
> > > Hmm, not sure I understand exactly why changing kmap() to
> > > kmap_local_page() expose this,
> > > but it looks like mm/usercopy does not like size=8192 when copying for
> > > the skb frag.
> >
> > Can't a usercopy fault and have to read the page from swap?
> > So the process can sleep and then be rescheduled on a different cpu?
> > So you can't use kmap_local_page() here at all.
> 
> I don't think you understand how kmap_local_page() works.

Quite likely :-)

But I thought it was a cheap way of temporarily mapping a physical memory
page into the current cpu's page tables without having to do any IPI to
tell other cpu about the insert or removal?
Which would require that the process not be migrated, which pretty much
implies that pre-emption be disabled.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-17  6:29       ` Sagi Grimberg
@ 2024-06-17 16:58         ` Jakub Kicinski
  2024-06-18  6:37           ` Sagi Grimberg
  2024-06-19 12:46         ` David Howells
  1 sibling, 1 reply; 17+ messages in thread
From: Jakub Kicinski @ 2024-06-17 16:58 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev,
	Eric Dumazet, David Howells

On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
> > Probably because kmap() returns page_address() for non-highmem pages
> > while kmap_local_page() actually returns a kmap address:
> >
> >          if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
> >                  return page_address(page);
> >          return __kmap_local_pfn_prot(page_to_pfn(page), prot);
> >
> > so if skb frags are always lowmem (are they?) this is a false positive.  
> 
> AFAIR these buffers are coming from the RX ring, so they should be 
> coming from a page_frag_cache,
> so I want to say always low memory?
> 
> > if they can be highmem, then you've uncovered a bug that nobody's
> > noticed because nobody's testing on 32-bit any more.  
> 
> Not sure, Jakub? Eric?

My uneducated guess would be that until recent(ish) sendpage rework
from David Howells all high mem pages would have been single pages.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-17 16:58         ` Jakub Kicinski
@ 2024-06-18  6:37           ` Sagi Grimberg
  0 siblings, 0 replies; 17+ messages in thread
From: Sagi Grimberg @ 2024-06-18  6:37 UTC (permalink / raw)
  To: Jakub Kicinski, David Howells
  Cc: Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev,
	Eric Dumazet



On 17/06/2024 19:58, Jakub Kicinski wrote:
> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
>>> Probably because kmap() returns page_address() for non-highmem pages
>>> while kmap_local_page() actually returns a kmap address:
>>>
>>>           if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
>>>                   return page_address(page);
>>>           return __kmap_local_pfn_prot(page_to_pfn(page), prot);
>>>
>>> so if skb frags are always lowmem (are they?) this is a false positive.
>> AFAIR these buffers are coming from the RX ring, so they should be
>> coming from a page_frag_cache,
>> so I want to say always low memory?
>>
>>> if they can be highmem, then you've uncovered a bug that nobody's
>>> noticed because nobody's testing on 32-bit any more.
>> Not sure, Jakub? Eric?
> My uneducated guess would be that until recent(ish) sendpage rework
> from David Howells all high mem pages would have been single pages.

dHowells?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-17  6:29       ` Sagi Grimberg
  2024-06-17 16:58         ` Jakub Kicinski
@ 2024-06-19 12:46         ` David Howells
  2024-06-19 13:54           ` Sagi Grimberg
  1 sibling, 1 reply; 17+ messages in thread
From: David Howells @ 2024-06-19 12:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: dhowells, Sagi Grimberg, Matthew Wilcox, kernel test robot,
	oe-lkp, lkp, netdev, Eric Dumazet

Jakub Kicinski <kuba@kernel.org> wrote:

> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
> > > Probably because kmap() returns page_address() for non-highmem pages
> > > while kmap_local_page() actually returns a kmap address:
> > >
> > >          if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
> > >                  return page_address(page);
> > >          return __kmap_local_pfn_prot(page_to_pfn(page), prot);
> > >
> > > so if skb frags are always lowmem (are they?) this is a false positive.  
> > 
> > AFAIR these buffers are coming from the RX ring, so they should be 
> > coming from a page_frag_cache,
> > so I want to say always low memory?
> > 
> > > if they can be highmem, then you've uncovered a bug that nobody's
> > > noticed because nobody's testing on 32-bit any more.  
> > 
> > Not sure, Jakub? Eric?
> 
> My uneducated guess would be that until recent(ish) sendpage rework
> from David Howells all high mem pages would have been single pages.

Um.  I touched the Tx side, not the Rx side.

I also don't know whether all high mem pages would be single pages.  I'll have
to defer that one to the MM folks.

David


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-19 12:46         ` David Howells
@ 2024-06-19 13:54           ` Sagi Grimberg
  2024-06-19 14:51             ` Eric Dumazet
  2024-06-21  7:54             ` David Howells
  0 siblings, 2 replies; 17+ messages in thread
From: Sagi Grimberg @ 2024-06-19 13:54 UTC (permalink / raw)
  To: David Howells, Jakub Kicinski
  Cc: Matthew Wilcox, kernel test robot, oe-lkp, lkp, netdev,
	Eric Dumazet



On 19/06/2024 15:46, David Howells wrote:
> Jakub Kicinski <kuba@kernel.org> wrote:
>
>> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
>>>> Probably because kmap() returns page_address() for non-highmem pages
>>>> while kmap_local_page() actually returns a kmap address:
>>>>
>>>>           if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
>>>>                   return page_address(page);
>>>>           return __kmap_local_pfn_prot(page_to_pfn(page), prot);
>>>>
>>>> so if skb frags are always lowmem (are they?) this is a false positive.
>>> AFAIR these buffers are coming from the RX ring, so they should be
>>> coming from a page_frag_cache,
>>> so I want to say always low memory?
>>>
>>>> if they can be highmem, then you've uncovered a bug that nobody's
>>>> noticed because nobody's testing on 32-bit any more.
>>> Not sure, Jakub? Eric?
>> My uneducated guess would be that until recent(ish) sendpage rework
>> from David Howells all high mem pages would have been single pages.
> Um.  I touched the Tx side, not the Rx side.
>
> I also don't know whether all high mem pages would be single pages.  I'll have
> to defer that one to the MM folks.

What prevents from gro to expand frags from crossing PAGE_SIZE?

btw, at least from the code in skb_gro_receive() it appears that 
page_address() is called directly,
which suggest that these netmem pages are lowmem?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-19 13:54           ` Sagi Grimberg
@ 2024-06-19 14:51             ` Eric Dumazet
  2024-06-19 14:56               ` Eric Dumazet
  2024-06-19 15:25               ` Sagi Grimberg
  2024-06-21  7:54             ` David Howells
  1 sibling, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2024-06-19 14:51 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: David Howells, Jakub Kicinski, Matthew Wilcox, kernel test robot,
	oe-lkp, lkp, netdev

On Wed, Jun 19, 2024 at 3:54 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
>
> On 19/06/2024 15:46, David Howells wrote:
> > Jakub Kicinski <kuba@kernel.org> wrote:
> >
> >> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
> >>>> Probably because kmap() returns page_address() for non-highmem pages
> >>>> while kmap_local_page() actually returns a kmap address:
> >>>>
> >>>>           if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
> >>>>                   return page_address(page);
> >>>>           return __kmap_local_pfn_prot(page_to_pfn(page), prot);
> >>>>
> >>>> so if skb frags are always lowmem (are they?) this is a false positive.
> >>> AFAIR these buffers are coming from the RX ring, so they should be
> >>> coming from a page_frag_cache,
> >>> so I want to say always low memory?
> >>>
> >>>> if they can be highmem, then you've uncovered a bug that nobody's
> >>>> noticed because nobody's testing on 32-bit any more.
> >>> Not sure, Jakub? Eric?
> >> My uneducated guess would be that until recent(ish) sendpage rework
> >> from David Howells all high mem pages would have been single pages.
> > Um.  I touched the Tx side, not the Rx side.
> >
> > I also don't know whether all high mem pages would be single pages.  I'll have
> > to defer that one to the MM folks.
>
> What prevents from gro to expand frags from crossing PAGE_SIZE?
>
> btw, at least from the code in skb_gro_receive() it appears that
> page_address() is called directly,
> which suggest that these netmem pages are lowmem?

GRO should only be fed with lowmem pages.

But the trace involves af_unix, not GRO ?

I guess that with splice games, it is possible to add high order pages to skbs.

I think skb_frag_foreach_page() could be used to fix this issue.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-19 14:51             ` Eric Dumazet
@ 2024-06-19 14:56               ` Eric Dumazet
  2024-06-19 15:25               ` Sagi Grimberg
  1 sibling, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2024-06-19 14:56 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: David Howells, Jakub Kicinski, Matthew Wilcox, kernel test robot,
	oe-lkp, lkp, netdev, Willem de Bruijn

On Wed, Jun 19, 2024 at 4:51 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Jun 19, 2024 at 3:54 PM Sagi Grimberg <sagi@grimberg.me> wrote:
> >
> >
> >
> > On 19/06/2024 15:46, David Howells wrote:
> > > Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > >> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
> > >>>> Probably because kmap() returns page_address() for non-highmem pages
> > >>>> while kmap_local_page() actually returns a kmap address:
> > >>>>
> > >>>>           if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
> > >>>>                   return page_address(page);
> > >>>>           return __kmap_local_pfn_prot(page_to_pfn(page), prot);
> > >>>>
> > >>>> so if skb frags are always lowmem (are they?) this is a false positive.
> > >>> AFAIR these buffers are coming from the RX ring, so they should be
> > >>> coming from a page_frag_cache,
> > >>> so I want to say always low memory?
> > >>>
> > >>>> if they can be highmem, then you've uncovered a bug that nobody's
> > >>>> noticed because nobody's testing on 32-bit any more.
> > >>> Not sure, Jakub? Eric?
> > >> My uneducated guess would be that until recent(ish) sendpage rework
> > >> from David Howells all high mem pages would have been single pages.
> > > Um.  I touched the Tx side, not the Rx side.
> > >
> > > I also don't know whether all high mem pages would be single pages.  I'll have
> > > to defer that one to the MM folks.
> >
> > What prevents from gro to expand frags from crossing PAGE_SIZE?
> >
> > btw, at least from the code in skb_gro_receive() it appears that
> > page_address() is called directly,
> > which suggest that these netmem pages are lowmem?
>
> GRO should only be fed with lowmem pages.
>
> But the trace involves af_unix, not GRO ?
>
> I guess that with splice games, it is possible to add high order pages to skbs.
>
> I think skb_frag_foreach_page() could be used to fix this issue.

For reference, please look at

commit c613c209c3f351d47158f728271d0c73b6dd24c6
Author: Willem de Bruijn <willemb@google.com>
Date:   Mon Jul 31 08:15:47 2017 -0400

    net: add skb_frag_foreach_page and use with kmap_atomic

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-19 14:51             ` Eric Dumazet
  2024-06-19 14:56               ` Eric Dumazet
@ 2024-06-19 15:25               ` Sagi Grimberg
  1 sibling, 0 replies; 17+ messages in thread
From: Sagi Grimberg @ 2024-06-19 15:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Howells, Jakub Kicinski, Matthew Wilcox, kernel test robot,
	oe-lkp, lkp, netdev



On 19/06/2024 17:51, Eric Dumazet wrote:
> On Wed, Jun 19, 2024 at 3:54 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>>
>>
>> On 19/06/2024 15:46, David Howells wrote:
>>> Jakub Kicinski <kuba@kernel.org> wrote:
>>>
>>>> On Mon, 17 Jun 2024 09:29:53 +0300 Sagi Grimberg wrote:
>>>>>> Probably because kmap() returns page_address() for non-highmem pages
>>>>>> while kmap_local_page() actually returns a kmap address:
>>>>>>
>>>>>>            if (!IS_ENABLED(CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP) && !PageHighMem(page))
>>>>>>                    return page_address(page);
>>>>>>            return __kmap_local_pfn_prot(page_to_pfn(page), prot);
>>>>>>
>>>>>> so if skb frags are always lowmem (are they?) this is a false positive.
>>>>> AFAIR these buffers are coming from the RX ring, so they should be
>>>>> coming from a page_frag_cache,
>>>>> so I want to say always low memory?
>>>>>
>>>>>> if they can be highmem, then you've uncovered a bug that nobody's
>>>>>> noticed because nobody's testing on 32-bit any more.
>>>>> Not sure, Jakub? Eric?
>>>> My uneducated guess would be that until recent(ish) sendpage rework
>>>> from David Howells all high mem pages would have been single pages.
>>> Um.  I touched the Tx side, not the Rx side.
>>>
>>> I also don't know whether all high mem pages would be single pages.  I'll have
>>> to defer that one to the MM folks.
>> What prevents from gro to expand frags from crossing PAGE_SIZE?
>>
>> btw, at least from the code in skb_gro_receive() it appears that
>> page_address() is called directly,
>> which suggest that these netmem pages are lowmem?
> GRO should only be fed with lowmem pages.
>
> But the trace involves af_unix, not GRO ?
>
> I guess that with splice games, it is possible to add high order pages to skbs.
>
> I think skb_frag_foreach_page() could be used to fix this issue.

Thanks Eric, that is indeed the missing piece.

I'll give it go.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] net: micro-optimize skb_datagram_iter
  2024-06-19 13:54           ` Sagi Grimberg
  2024-06-19 14:51             ` Eric Dumazet
@ 2024-06-21  7:54             ` David Howells
  1 sibling, 0 replies; 17+ messages in thread
From: David Howells @ 2024-06-21  7:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: dhowells, Sagi Grimberg, Jakub Kicinski, Matthew Wilcox,
	kernel test robot, oe-lkp, lkp, netdev

Eric Dumazet <edumazet@google.com> wrote:

> But the trace involves af_unix, not GRO ?

Ah.  splice()/vmsplice() can feed multipage folios into the socket.  Possibly
those can be in highmem.

David


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-06-21  7:54 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-13 11:35 [PATCH] net: micro-optimize skb_datagram_iter Sagi Grimberg
2024-06-15  2:40 ` patchwork-bot+netdevbpf
2024-06-16  8:06 ` kernel test robot
2024-06-16  9:24   ` Sagi Grimberg
2024-06-16 21:51     ` David Laight
2024-06-16 21:53       ` Matthew Wilcox
2024-06-17 10:18         ` David Laight
2024-06-16 21:56     ` Matthew Wilcox
2024-06-17  6:29       ` Sagi Grimberg
2024-06-17 16:58         ` Jakub Kicinski
2024-06-18  6:37           ` Sagi Grimberg
2024-06-19 12:46         ` David Howells
2024-06-19 13:54           ` Sagi Grimberg
2024-06-19 14:51             ` Eric Dumazet
2024-06-19 14:56               ` Eric Dumazet
2024-06-19 15:25               ` Sagi Grimberg
2024-06-21  7:54             ` David Howells

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).