[PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default

* [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default
@ 2024-05-11  3:14 Xuan Zhuo
  2024-05-11  3:14 ` [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Xuan Zhuo @ 2024-05-11  3:14 UTC
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization

For virtio drivers, we can enable premapped mode regardless of the value
of use_dma_api, because we provide the virtio DMA APIs.
So the driver can enable premapped mode unconditionally.

This patch set makes the big mode of virtio-net support premapped mode,
and enables premapped mode for rx by default.

We do not use page pool to manage these pages, for the following
    reasons:

    1. virtio-net uses the DMA APIs wrapped by virtio core. Therefore,
       we can only prevent the page pool from performing DMA operations, and
       let the driver perform DMA operations on the allocated pages.
    2. But when the page pool releases the page, we have no chance to
       execute dma unmap.
    3. A solution to #2 is to execute dma unmap every time before putting
       the page back to the page pool. (This is wasteful; we do not
       otherwise execute unmap so frequently.)
    4. But there is another problem: we still need to use page.dma_addr to
       save the dma address. Using page.dma_addr while using page pool is
       unsafe behavior.
    5. And we need space to chain the pages submitted together to virtio core.

    More:
        https://lore.kernel.org/all/CACGkMEu=Aok9z2imB_c5qVuujSh=vjj1kx12fy9N7hqyi+M5Ow@mail.gmail.com/

Why do we not use the page space to store the dma address?
    http://lore.kernel.org/all/CACGkMEuyeJ9mMgYnnB42=hw6umNuo=agn7VBqBqYPd7GN=+39Q@mail.gmail.com

Please review.

v5:
    1. Address the comments from Larysa Zaremba
        http://lore.kernel.org/all/20240508063718.69806-1-xuanzhuo@linux.alibaba.com

v4:
    1. For the conflict, switch to the net-next branch

v3:
    1. big mode still uses the mode in which virtio core does the dma map/unmap

v2:
    1. make gcc happy in page_chain_get_dma()
        http://lore.kernel.org/all/202404221325.SX5ChRGP-lkp@intel.com

v1:
    1. discussed using page pool
    2. use dma sync to replace the unmap for the first page

Thanks.

Xuan Zhuo (4):
  virtio_ring: enable premapped mode whatever use_dma_api
  virtio_net: big mode skip the unmap check
  virtio_net: rx remove premapped failover code
  virtio_net: remove the misleading comment

 drivers/net/virtio_net.c     | 90 +++++++++++++++---------------------
 drivers/virtio/virtio_ring.c |  7 +--
 2 files changed, 38 insertions(+), 59 deletions(-)

--
2.32.0.3.g01195cf9f


* [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api
  2024-05-11  3:14 [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default Xuan Zhuo
@ 2024-05-11  3:14 ` Xuan Zhuo
  2024-08-13 19:28   ` Si-Wei Liu
  2024-05-11  3:14 ` [PATCH net-next v5 2/4] virtio_net: big mode skip the unmap check Xuan Zhuo
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 20+ messages in thread
From: Xuan Zhuo @ 2024-05-11  3:14 UTC
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization

Now that we have the virtio DMA APIs, the driver can use premapped
mode whether or not the virtio core uses the DMA API.

So remove the use_dma_api check from
virtqueue_set_dma_premapped().

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6f7e5010a673..2a972752ff1b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2782,7 +2782,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
  *
  * Returns zero or a negative error.
  * 0: success.
- * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
+ * -EINVAL: too late to enable premapped mode, the vq already contains buffers.
  */
 int virtqueue_set_dma_premapped(struct virtqueue *_vq)
 {
@@ -2798,11 +2798,6 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
 		return -EINVAL;
 	}
 
-	if (!vq->use_dma_api) {
-		END_USE(vq);
-		return -EINVAL;
-	}
-
 	vq->premapped = true;
 	vq->do_unmap = false;
 
-- 
2.32.0.3.g01195cf9f


* Re: [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api
  2024-05-11  3:14 ` [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
@ 2024-08-13 19:28   ` Si-Wei Liu
  2024-08-13 19:46     ` Michael S. Tsirkin
  2024-08-17 13:20     ` Xuan Zhuo
  0 siblings, 2 replies; 20+ messages in thread
From: Si-Wei Liu @ 2024-08-13 19:28 UTC
  To: Xuan Zhuo, netdev, Michael S. Tsirkin, Jason Wang, Jakub Kicinski
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, virtualization,
	Darren Kenny, Boris Ostrovsky


It turns out that the commit below, which unconditionally enables
premapped mode for virtio-net:

commit f9dac92ba9081062a6477ee015bd3b8c5914efc4
Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date:   Sat May 11 11:14:01 2024 +0800

leads to a regression on VMs without ACCESS_PLATFORM and with the sysctl
value:

- net.core.high_order_alloc_disable=1

which reliably produces crashes or scp failures (scp a 100M file to the
VM):

[  332.079333] __vm_enough_memory: pid: 18440, comm: sshd, bytes: 
5285790347661783040 not enough memory for the allocation
[  332.079651] ------------[ cut here ]------------
[  332.079655] kernel BUG at mm/mmap.c:3514!
[  332.080095] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  332.080826] CPU: 18 PID: 18440 Comm: sshd Kdump: loaded Not tainted 
6.10.0-2.x86_64 #2
[  332.081514] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
[  332.082451] RIP: 0010:exit_mmap+0x3a1/0x3b0
[  332.082871] Code: be 01 00 00 00 48 89 df e8 0c 94 fe ff eb d7 be 01 
00 00 00 48 89 df e8 5d 98 fe ff eb be 31 f6 48 89 df e8 31 99 fe ff eb 
a8 <0f> 0b e8 68 bc ae 00 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90
[  332.084230] RSP: 0018:ffff9988b1c8f948 EFLAGS: 00010293
[  332.084635] RAX: 0000000000000406 RBX: ffff8d47583e7380 RCX: 
0000000000000000
[  332.085171] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
0000000000000000
[  332.085699] RBP: 000000000000008f R08: 0000000000000000 R09: 
0000000000000000
[  332.086233] R10: 0000000000000000 R11: 0000000000000000 R12: 
ffff8d47583e7430
[  332.086761] R13: ffff8d47583e73c0 R14: 0000000000000406 R15: 
000495ae650dda58
[  332.087300] FS:  00007ff443899980(0000) GS:ffff8df1c5700000(0000) 
knlGS:0000000000000000
[  332.087888] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  332.088334] CR2: 000055a42d30b730 CR3: 00000102e956a004 CR4: 
0000000000770ef0
[  332.088867] PKRU: 55555554
[  332.089114] Call Trace:
[  332.089349] <TASK>
[  332.089556]  ? die+0x36/0x90
[  332.089818]  ? do_trap+0xed/0x110
[  332.090110]  ? exit_mmap+0x3a1/0x3b0
[  332.090411]  ? do_error_trap+0x6a/0xa0
[  332.090722]  ? exit_mmap+0x3a1/0x3b0
[  332.091029]  ? exc_invalid_op+0x50/0x80
[  332.091348]  ? exit_mmap+0x3a1/0x3b0
[  332.091648]  ? asm_exc_invalid_op+0x1a/0x20
[  332.091998]  ? exit_mmap+0x3a1/0x3b0
[  332.092299]  ? exit_mmap+0x1d6/0x3b0
[  332.092604] __mmput+0x3e/0x130
[  332.092882] dup_mm.constprop.0+0x10c/0x110
[  332.093226] copy_process+0xbd0/0x1570
[  332.093539] kernel_clone+0xbf/0x430
[  332.093838]  ? syscall_exit_work+0x103/0x130
[  332.094197] __do_sys_clone+0x66/0xa0
[  332.094506]  do_syscall_64+0x8c/0x1d0
[  332.094814]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.095198]  ? audit_reset_context+0x232/0x310
[  332.095558]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.095936]  ? syscall_exit_work+0x103/0x130
[  332.096288]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.096668]  ? syscall_exit_to_user_mode+0x7d/0x220
[  332.097059]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.097436]  ? do_syscall_64+0xba/0x1d0
[  332.097752]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.098137]  ? syscall_exit_to_user_mode+0x7d/0x220
[  332.098525]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.098903]  ? do_syscall_64+0xba/0x1d0
[  332.099227]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.099606]  ? __audit_filter_op+0xbe/0x140
[  332.099943]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.100328]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.100706]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.101089]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.101468]  ? wp_page_reuse+0x8e/0xb0
[  332.101779]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.102163]  ? do_wp_page+0xe6/0x470
[  332.102465]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.102843]  ? __handle_mm_fault+0x5ff/0x720
[  332.103197]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.103574]  ? __count_memcg_events+0x4d/0xd0
[  332.103938]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.104323]  ? count_memcg_events.constprop.0+0x26/0x50
[  332.104729]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.105114]  ? handle_mm_fault+0xae/0x320
[  332.105442]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.105820]  ? do_user_addr_fault+0x31f/0x6c0
[  332.106181]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  332.106576] RIP: 0033:0x7ff43f8f9a73
[  332.106876] Code: db 0f 85 28 01 00 00 64 4c 8b 0c 25 10 00 00 00 45 
31 c0 4d 8d 91 d0 02 00 00 31 d2 31 f6 bf 11
00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 b9 00 00 00 41 
89 c5 85 c0 0f 85 c6 00 00
[  332.108163] RSP: 002b:00007ffc690909b0 EFLAGS: 00000246 ORIG_RAX: 
0000000000000038
[  332.108719] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 
00007ff43f8f9a73
[  332.109253] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
0000000001200011
[  332.109782] RBP: 0000000000000000 R08: 0000000000000000 R09: 
00007ff443899980
[  332.110313] R10: 00007ff443899c50 R11: 0000000000000246 R12: 
0000000000000002
[  332.110842] R13: 0000562e56cd4780 R14: 0000000000000006 R15: 
0000562e800346b0
[  332.111381]  </TASK>
[  332.111590] Modules linked in: rdmaip_notify scsi_transport_iscsi 
target_core_mod rfkill mstflint_access cuse rds$
rdma rds rdma_ucm rdma_cm iw_cm dm_multipath ib_umad ib_ipoib ib_cm 
mlx5_ib iTCO_wdt iTCO_vendor_support intel_rapl_$
sr ib_uverbs intel_rapl_common ib_core crc32_pclmul i2c_i801 joydev 
virtio_balloon i2c_smbus lpc_ich binfmt_misc xfs
sd_mod t10_pi crc64_rocksoft sg crct10dif_pclmul mlx5_core virtio_net 
ahci net_failover mlxfw ghash_clmulni_intel vi$
tio_scsi failover libahci sha512_ssse3 tls sha256_ssse3 pci_hyperv_intf 
virtio_pci libata psample sha1_ssse3 virtio_$
ci_legacy_dev serio_raw dimlib virtio_pci_modern_dev qemu_fw_cfg 
dm_mirror dm_region_hash dm_log dm_mod fuse aesni_i$
tel crypto_simd cryptd
[  332.115851] ---[ end trace 0000000000000000 ]---

and another instance of the splat:

BUG: Bad page map in process PsWatcher.sh  pte:9402e1e2b18c8ae9 
pmd:10fe4f067
[  193.046098] addr:00007ff912a00000 vm_flags:08000070 
anon_vma:0000000000000000 mapping:ffff8ec28047eeb0 index:200
[  193.046863] file:libtinfo.so.6.1 fault:xfs_filemap_fault [xfs] 
mmap:xfs_file_mmap [xfs] read_folio:xfs_vm_read_folio [xfs]
[  193.049564] get_swap_device: Bad swap file entry 3803ad7a32eab547
[  193.050902] BUG: Bad rss-counter state mm:00000000ff28307a 
type:MM_SWAPENTS val:-1
[  193.758147] Kernel panic - not syncing: corrupted stack end detected 
inside scheduler
[  193.759151] CPU: 5 PID: 22932 Comm: LogFlusher Tainted: G 
B              6.10.0-rc2+ #1
[  193.759764] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
[  193.760435] Call Trace:
[  193.760624]  <TASK>
[  193.760799]  panic+0x31d/0x340
[  193.761033]  __schedule+0xb30/0xb30
[  193.761283]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.761605]  ? enqueue_hrtimer+0x35/0x90
[  193.761883]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.762207]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.762532]  ? hrtimer_start_range_ns+0x121/0x300
[  193.762856]  schedule+0x27/0xb0
[  193.763083]  futex_wait_queue+0x63/0x90
[  193.763354]  __futex_wait+0x13d/0x1b0
[  193.763610]  ? __pfx_futex_wake_mark+0x10/0x10
[  193.763918]  futex_wait+0x69/0xd0
[  193.764153]  ? pick_next_task+0x9fb/0xa30
[  193.764430]  ? __pfx_hrtimer_wakeup+0x10/0x10
[  193.764734]  do_futex+0x11a/0x1d0
[  193.764976]  __x64_sys_futex+0x68/0x1c0
[  193.765243]  do_syscall_64+0x80/0x160
[  193.765504]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.765834]  ? __audit_filter_op+0xaa/0xf0
[  193.766117]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.766437]  ? audit_reset_context.part.16+0x270/0x2d0
[  193.766895]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.767237]  ? syscall_exit_to_user_mode_prepare+0x17b/0x1a0
[  193.767624]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.767972]  ? syscall_exit_to_user_mode+0x80/0x1e0
[  193.768309]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.768628]  ? do_syscall_64+0x8c/0x160
[  193.768901]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.769225]  ? audit_reset_context.part.16+0x270/0x2d0
[  193.769573]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.769901]  ? restore_fpregs_from_fpstate+0x3c/0xa0
[  193.770241]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.770561]  ? switch_fpu_return+0x4f/0xd0
[  193.770848]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.771171]  ? syscall_exit_to_user_mode+0x80/0x1e0
[  193.771505]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.771830]  ? do_syscall_64+0x8c/0x160
[  193.772098]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.772426]  ? syscall_exit_to_user_mode_prepare+0x17b/0x1a0
[  193.772805]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.773124]  ? syscall_exit_to_user_mode+0x80/0x1e0
[  193.773458]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.773781]  ? do_syscall_64+0x8c/0x160
[  193.774047]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.774376]  ? task_mm_cid_work+0x1c1/0x210
[  193.774669]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  193.775010] RIP: 0033:0x7f4da640e898
[  193.775270] Code: 24 58 48 85 c0 0f 88 8f 00 00 00 e8 f2 2e 00 00 89 
ee 4c 8b 54 24 38 31 d2 41 89 c0 40 80 f6 80 4c 89 ef b8 ca 00 00 00 0f 
05 <48> 3d 00 f0 ff ff 0f 87 ff 00 00 00 44 89 c7 e8 24 2f 00 00 48 8b
[  193.776404] RSP: 002b:00007f4d797f2750 EFLAGS: 00000282 ORIG_RAX: 
00000000000000ca
[  193.776893] RAX: ffffffffffffffda RBX: 00007f4d402c1b50 RCX: 
00007f4da640e898
[  193.777355] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 
00007f4d402c1b7c
[  193.777813] RBP: 0000000000000000 R08: 0000000000000000 R09: 
00007f4da6ece000
[  193.778276] R10: 00007f4d797f27a0 R11: 0000000000000282 R12: 
00007f4d402c1b28
[  193.778732] R13: 00007f4d402c1b7c R14: 00007f4d797f2840 R15: 
0000000000000002
[  193.779189]  </TASK>
[  193.780419] Kernel Offset: 0x13c00000 from 0xffffffff81000000 
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  193.781097] Rebooting in 60 seconds..

Even in premapped mode with use_dma_api, in virtnet_rq_alloc(),
skb_page_frag_refill() could return an order-0 page, honoring the disabled
high-order page allocation. Yet I still see

        alloc_frag->offset += size;

being accounted irrespective of the actual page size returned (dma->len).
And virtnet_rq_unmap() seems to care only about high-order pages.
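
A minimal userspace sketch of the accounting concern above, not the driver
code. It assumes a small metadata header is reserved at the start of each
fresh page (MODEL_HDR_SIZE is a hypothetical 32 bytes) and that only an
order-0 page of 4096 bytes was returned. model_frag, model_alloc() and
model_overrun() are illustrative names, not kernel symbols, and the report
does not establish that this is the exact failure path.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed order-0 page size */
#define MODEL_HDR_SIZE    32u  /* hypothetical per-page metadata size */

/* Simplified stand-in for a page fragment: backing size and running offset. */
struct model_frag {
	unsigned int size;    /* bytes actually backing the fragment */
	unsigned int offset;  /* next free byte within the fragment */
};

/*
 * Hand out `size` bytes: reserve the metadata header on a fresh page, then
 * advance the offset by `size` without checking it against frag->size.
 * Returns the end offset of the buffer that was handed out.
 */
unsigned int model_alloc(struct model_frag *frag, unsigned int size)
{
	assert(frag != NULL);

	if (frag->offset == 0)
		frag->offset = MODEL_HDR_SIZE;  /* fresh page: reserve header */

	frag->offset += size;  /* unconditional accounting */
	return frag->offset;
}

/* Number of bytes by which the handed-out buffers exceed the page, or 0. */
unsigned int model_overrun(const struct model_frag *frag)
{
	assert(frag != NULL);
	return frag->offset > frag->size ? frag->offset - frag->size : 0;
}
```

In this model, asking for a full MODEL_PAGE_SIZE buffer from a fresh
4096-byte page ends at offset 4128, which is 32 bytes past the page, while a
request of MODEL_PAGE_SIZE - MODEL_HDR_SIZE bytes fits exactly.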

I suggest reverting this whole series, or at least
virtqueue_set_dma_premapped() should block !use_dma_api users from using
the virtio DMA APIs.

Regards,
-Siwei


* Re: [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api
  2024-08-13 19:28   ` Si-Wei Liu
@ 2024-08-13 19:46     ` Michael S. Tsirkin
  2024-08-14  3:39       ` Si-Wei Liu
  2024-08-17 13:20     ` Xuan Zhuo
  1 sibling, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2024-08-13 19:46 UTC
  To: Si-Wei Liu
  Cc: Xuan Zhuo, netdev, Jason Wang, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, virtualization, Darren Kenny,
	Boris Ostrovsky

On Tue, Aug 13, 2024 at 12:28:41PM -0700, Si-Wei Liu wrote:
> [...]
> 
> I suggest reverting this whole series, or at least
> virtqueue_set_dma_premapped() should block !use_dma_api users from using the
> virtio DMA APIs.
> 
> Regards,
> -Siwei

Want to post a patchset to revert?


* Re: [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api
  2024-08-13 19:46     ` Michael S. Tsirkin
@ 2024-08-14  3:39       ` Si-Wei Liu
  2024-08-14  7:00         ` Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Si-Wei Liu @ 2024-08-14  3:39 UTC
  To: Michael S. Tsirkin
  Cc: Xuan Zhuo, netdev, Jason Wang, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, virtualization, Darren Kenny,
	Boris Ostrovsky

Hi Michael,

I'll look for someone else from Oracle to help you on this, as the 
relevant team already did verify internally that reverting all 4 patches 
from this series could help address the regression. Just reverting one 
single commit won't help.

   9719f039d328 virtio_net: remove the misleading comment
   defd28aa5acb virtio_net: rx remove premapped failover code
   a377ae542d8d virtio_net: big mode skip the unmap check
   f9dac92ba908 virtio_ring: enable premapped mode whatever use_dma_api

In case I fail to get someone to help, could you work with Darren 
(cc'ed) directly? He could reach out to the corresponding team in Oracle 
to help with testing.

Thanks,
-Siwei

On 8/13/2024 12:46 PM, Michael S. Tsirkin wrote:
> Want to post a patchset to revert?
>

* Re: [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api
  2024-08-14  3:39       ` Si-Wei Liu
@ 2024-08-14  7:00         ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2024-08-14  7:00 UTC
  To: Si-Wei Liu
  Cc: Xuan Zhuo, netdev, Jason Wang, Jakub Kicinski, David S. Miller,
	Eric Dumazet, Paolo Abeni, virtualization, Darren Kenny,
	Boris Ostrovsky

On Tue, Aug 13, 2024 at 08:39:53PM -0700, Si-Wei Liu wrote:
> Hi Michael,
> 
> I'll look for someone else from Oracle to help you on this, as the relevant
> team already did verify internally that reverting all 4 patches from this
> series could help address the regression. Just reverting one single commit
> won't help.
> 
>   9719f039d328 virtio_net: remove the misleading comment
>   defd28aa5acb virtio_net: rx remove premapped failover code
>   a377ae542d8d virtio_net: big mode skip the unmap check
>   f9dac92ba908 virtio_ring: enable premapped mode whatever use_dma_api
> 
> In case I fail to get someone to help, could you work with Darren (cc'ed)
> directly? He could reach out to the corresponding team in Oracle to help
> with testing.
> 
> Thanks,
> -Siwei
> 

OK, I posted an untested revert for your testing:

Message-ID: <20240511031404.30903-1-xuanzhuo@linux.alibaba.com>



> On 8/13/2024 12:46 PM, Michael S. Tsirkin wrote:
> > Want to post a patchset to revert?
> > 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api
  2024-08-13 19:28   ` Si-Wei Liu
  2024-08-13 19:46     ` Michael S. Tsirkin
@ 2024-08-17 13:20     ` Xuan Zhuo
  2024-08-20  1:06       ` Si-Wei Liu
  1 sibling, 1 reply; 20+ messages in thread
From: Xuan Zhuo @ 2024-08-17 13:20 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, virtualization,
	Darren Kenny, Boris Ostrovsky, netdev, Michael S. Tsirkin,
	Jason Wang, Jakub Kicinski

Hi, guys, I have a fix patch for this.
Could anybody test it?

Thanks.

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index af474cc191d0..426d68c2d01d 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2492,13 +2492,15 @@ static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
 {
        struct virtnet_info *vi = rq->vq->vdev->priv;
        const size_t hdr_len = vi->hdr_len;
-       unsigned int len;
+       unsigned int len, max_len;
+
+       max_len = PAGE_SIZE - ALIGN(sizeof(struct virtnet_rq_dma), L1_CACHE_BYTES);

        if (room)
-               return PAGE_SIZE - room;
+               return max_len - room;

        len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
-                               rq->min_buf_len, PAGE_SIZE - hdr_len);
+                               rq->min_buf_len, max_len - hdr_len);

        return ALIGN(len, L1_CACHE_BYTES);
 }
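
For anyone who wants to check the arithmetic without booting a guest, here
is a minimal user-space sketch of the patched get_mergeable_buf_len(). It
is only a model: the 4 KiB page size, the 64-byte cache line, the layout of
the stand-in struct and every model_* name are assumptions made for
illustration, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration: 4 KiB pages, 64-byte cache lines. */
#define MODEL_PAGE_SIZE      4096u
#define MODEL_L1_CACHE_BYTES 64u

/* Round x up to a multiple of a (a must be a power of two). */
#define MODEL_ALIGN(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Stand-in for struct virtnet_rq_dma; the field layout is an assumption. */
struct model_rq_dma {
	uint64_t addr;
	uint32_t ref;
	uint16_t len;
	uint16_t need_sync;
};

/* Plain-integer replacement for clamp_t(unsigned int, ...). */
static unsigned int model_clamp(unsigned int v, unsigned int lo, unsigned int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Follows the patched function, with plain integers as inputs. */
static unsigned int model_mergeable_buf_len(unsigned int avg_pkt_len,
					    unsigned int min_buf_len,
					    unsigned int hdr_len,
					    unsigned int room)
{
	unsigned int max_len, len;

	/* Reserve a cache-line-aligned slot for the per-page dma header. */
	max_len = MODEL_PAGE_SIZE -
		  MODEL_ALIGN((unsigned int)sizeof(struct model_rq_dma),
			      MODEL_L1_CACHE_BYTES);

	if (room)
		return max_len - room;

	len = hdr_len + model_clamp(avg_pkt_len, min_buf_len, max_len - hdr_len);

	return MODEL_ALIGN(len, MODEL_L1_CACHE_BYTES);
}
```

Under these assumptions the largest buffer shrinks from a full page to
PAGE_SIZE minus one cache line, so a buffer plus the header always fits in
an order-0 page.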

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api
  2024-08-17 13:20     ` Xuan Zhuo
@ 2024-08-20  1:06       ` Si-Wei Liu
  2024-08-20  6:19         ` Xuan Zhuo
  0 siblings, 1 reply; 20+ messages in thread
From: Si-Wei Liu @ 2024-08-20  1:06 UTC (permalink / raw)
  To: Xuan Zhuo, Michael S. Tsirkin
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, virtualization,
	Darren Kenny, Boris Ostrovsky, netdev, Jason Wang, Jakub Kicinski

Hi,

May I know if this is really an intended fix to post officially, or just 
a workaround/probe to make the offset in page_frag happy when 
net_high_order_alloc_disable is true? In case it's the former, even 
though this could fix the issue, I would assume clamping to a smaller 
page_frag than a regular page size for every buffer may have certain 
performance regression for the merge-able buffer case? Can you justify 
the performance impact with some benchmark runs with larger MTU and 
merge-able rx buffers to prove the regression is negligible? You would 
need to compare against where you don't have the inadvertent 
virtnet_rq_dma cost on any page i.e. getting all 4 patches of this 
series reverted. Both tests with net_high_order_alloc_disable set to on 
and off are needed.

Thanks,
-Siwei

On 8/17/2024 6:20 AM, Xuan Zhuo wrote:
> Hi, guys, I have a fix patch for this.
> Could anybody test it?
>
> Thanks.
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index af474cc191d0..426d68c2d01d 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2492,13 +2492,15 @@ static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
>   {
>          struct virtnet_info *vi = rq->vq->vdev->priv;
>          const size_t hdr_len = vi->hdr_len;
> -       unsigned int len;
> +       unsigned int len, max_len;
> +
> +       max_len = PAGE_SIZE - ALIGN(sizeof(struct virtnet_rq_dma), L1_CACHE_BYTES);
>
>          if (room)
> -               return PAGE_SIZE - room;
> +               return max_len - room;
>
>          len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> -                               rq->min_buf_len, PAGE_SIZE - hdr_len);
> +                               rq->min_buf_len, max_len - hdr_len);
>
>          return ALIGN(len, L1_CACHE_BYTES);
>   }

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api
  2024-08-20  1:06       ` Si-Wei Liu
@ 2024-08-20  6:19         ` Xuan Zhuo
  0 siblings, 0 replies; 20+ messages in thread
From: Xuan Zhuo @ 2024-08-20  6:19 UTC (permalink / raw)
  To: Si-Wei Liu
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, virtualization,
	Darren Kenny, Boris Ostrovsky, netdev, Jason Wang, Jakub Kicinski,
	Michael S. Tsirkin

On Mon, 19 Aug 2024 18:06:07 -0700, "Si-Wei Liu" <si-wei.liu@oracle.com> wrote:
> Hi,
>
> May I know if this is really an intended fix to post officially, or just
> a workaround/probe to make the offset in page_frag happy when
> net_high_order_alloc_disable is true? In case it's the former, even
> though this could fix the issue, I would assume clamping to a smaller
> page_frag than a regular page size for every buffer may have certain
> performance regression for the merge-able buffer case? Can you justify
> the performance impact with some benchmark runs with larger MTU and
> merge-able rx buffers to prove the regression is negligible? You would
> need to compare against where you don't have the inadvertent
> virtnet_rq_dma cost on any page i.e. getting all 4 patches of this
> series reverted. Both tests with net_high_order_alloc_disable set to on
> and off are needed.


I will post a patch; let's discuss it there.

Thanks.

>
> Thanks,
> -Siwei
>
> On 8/17/2024 6:20 AM, Xuan Zhuo wrote:
> > Hi, guys, I have a fix patch for this.
> > Could anybody test it?
> >
> > Thanks.
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index af474cc191d0..426d68c2d01d 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -2492,13 +2492,15 @@ static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
> >   {
> >          struct virtnet_info *vi = rq->vq->vdev->priv;
> >          const size_t hdr_len = vi->hdr_len;
> > -       unsigned int len;
> > +       unsigned int len, max_len;
> > +
> > +       max_len = PAGE_SIZE - ALIGN(sizeof(struct virtnet_rq_dma), L1_CACHE_BYTES);
> >
> >          if (room)
> > -               return PAGE_SIZE - room;
> > +               return max_len - room;
> >
> >          len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> > -                               rq->min_buf_len, PAGE_SIZE - hdr_len);
> > +                               rq->min_buf_len, max_len - hdr_len);
> >
> >          return ALIGN(len, L1_CACHE_BYTES);
> >   }
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next v5 2/4] virtio_net: big mode skip the unmap check
  2024-05-11  3:14 [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default Xuan Zhuo
  2024-05-11  3:14 ` [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
@ 2024-05-11  3:14 ` Xuan Zhuo
  2024-05-11  3:14 ` [PATCH net-next v5 3/4] virtio_net: rx remove premapped failover code Xuan Zhuo
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Xuan Zhuo @ 2024-05-11  3:14 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization

virtio-net big mode does not enable premapped mode, so it does not
need the unmap check. The subsequent commit removes the failover
code that handles failure to enable premapped mode for mergeable
and small mode, and with it do_dma. So replace the do_dma check
in the big mode path with a check on the buffer mode.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ad0fb832b538..724f9310e732 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -959,7 +959,7 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
 
 	rq = &vi->rq[i];
 
-	if (rq->do_dma)
+	if (!vi->big_packets || vi->mergeable_rx_bufs)
 		virtnet_rq_unmap(rq, buf, 0);
 
 	virtnet_rq_free_buf(vi, rq, buf);
@@ -2267,7 +2267,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 		}
 	} else {
 		while (packets < budget &&
-		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
+		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
 			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
 			packets++;
 		}
-- 
2.32.0.3.g01195cf9f
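
A quick way to see that the new condition in virtnet_rq_unmap_free_buf()
selects exactly the queues that are not in big mode is to compare it with
the big-mode test used in virtnet_rq_set_premapped(). The sketch below does
that over all four flag combinations; the model_* helpers are hypothetical
names for illustration only.

```c
#include <stdbool.h>

/* Big mode as tested in virtnet_rq_set_premapped():
 * big packets without mergeable rx buffers. */
static bool model_is_big_mode(bool big_packets, bool mergeable_rx_bufs)
{
	return !mergeable_rx_bufs && big_packets;
}

/* The condition this patch uses in virtnet_rq_unmap_free_buf(). */
static bool model_needs_unmap(bool big_packets, bool mergeable_rx_bufs)
{
	return !big_packets || mergeable_rx_bufs;
}

/* Returns true if the two predicates are exact complements
 * for every combination of the two flags. */
static bool model_predicates_are_complementary(void)
{
	for (int bp = 0; bp <= 1; bp++)
		for (int mrg = 0; mrg <= 1; mrg++)
			if (model_needs_unmap(bp, mrg) == model_is_big_mode(bp, mrg))
				return false;
	return true;
}
```

So small and mergeable mode take the unmap path, and big mode skips it.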


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v5 3/4] virtio_net: rx remove premapped failover code
  2024-05-11  3:14 [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default Xuan Zhuo
  2024-05-11  3:14 ` [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
  2024-05-11  3:14 ` [PATCH net-next v5 2/4] virtio_net: big mode skip the unmap check Xuan Zhuo
@ 2024-05-11  3:14 ` Xuan Zhuo
  2024-05-11  3:14 ` [PATCH net-next v5 4/4] virtio_net: remove the misleading comment Xuan Zhuo
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Xuan Zhuo @ 2024-05-11  3:14 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization,
	Larysa Zaremba

Now, the premapped mode can be enabled unconditionally.

So we can remove the failover code for merge and small mode.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
---
 drivers/net/virtio_net.c | 85 +++++++++++++++++-----------------------
 1 file changed, 35 insertions(+), 50 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 724f9310e732..3ffcb2e2185f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -344,9 +344,6 @@ struct receive_queue {
 
 	/* Record the last dma info to free after new pages is allocated. */
 	struct virtnet_rq_dma *last_dma;
-
-	/* Do dma by self */
-	bool do_dma;
 };
 
 /* This structure can contain rss message with maximum settings for indirection table and keysize
@@ -846,7 +843,7 @@ static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
 	void *buf;
 
 	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
-	if (buf && rq->do_dma)
+	if (buf)
 		virtnet_rq_unmap(rq, buf, *len);
 
 	return buf;
@@ -859,11 +856,6 @@ static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 len)
 	u32 offset;
 	void *head;
 
-	if (!rq->do_dma) {
-		sg_init_one(rq->sg, buf, len);
-		return;
-	}
-
 	head = page_address(rq->alloc_frag.page);
 
 	offset = buf - head;
@@ -889,44 +881,42 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
 
 	head = page_address(alloc_frag->page);
 
-	if (rq->do_dma) {
-		dma = head;
-
-		/* new pages */
-		if (!alloc_frag->offset) {
-			if (rq->last_dma) {
-				/* Now, the new page is allocated, the last dma
-				 * will not be used. So the dma can be unmapped
-				 * if the ref is 0.
-				 */
-				virtnet_rq_unmap(rq, rq->last_dma, 0);
-				rq->last_dma = NULL;
-			}
+	dma = head;
 
-			dma->len = alloc_frag->size - sizeof(*dma);
+	/* new pages */
+	if (!alloc_frag->offset) {
+		if (rq->last_dma) {
+			/* Now, the new page is allocated, the last dma
+			 * will not be used. So the dma can be unmapped
+			 * if the ref is 0.
+			 */
+			virtnet_rq_unmap(rq, rq->last_dma, 0);
+			rq->last_dma = NULL;
+		}
 
-			addr = virtqueue_dma_map_single_attrs(rq->vq, dma + 1,
-							      dma->len, DMA_FROM_DEVICE, 0);
-			if (virtqueue_dma_mapping_error(rq->vq, addr))
-				return NULL;
+		dma->len = alloc_frag->size - sizeof(*dma);
 
-			dma->addr = addr;
-			dma->need_sync = virtqueue_dma_need_sync(rq->vq, addr);
+		addr = virtqueue_dma_map_single_attrs(rq->vq, dma + 1,
+						      dma->len, DMA_FROM_DEVICE, 0);
+		if (virtqueue_dma_mapping_error(rq->vq, addr))
+			return NULL;
 
-			/* Add a reference to dma to prevent the entire dma from
-			 * being released during error handling. This reference
-			 * will be freed after the pages are no longer used.
-			 */
-			get_page(alloc_frag->page);
-			dma->ref = 1;
-			alloc_frag->offset = sizeof(*dma);
+		dma->addr = addr;
+		dma->need_sync = virtqueue_dma_need_sync(rq->vq, addr);
 
-			rq->last_dma = dma;
-		}
+		/* Add a reference to dma to prevent the entire dma from
+		 * being released during error handling. This reference
+		 * will be freed after the pages are no longer used.
+		 */
+		get_page(alloc_frag->page);
+		dma->ref = 1;
+		alloc_frag->offset = sizeof(*dma);
 
-		++dma->ref;
+		rq->last_dma = dma;
 	}
 
+	++dma->ref;
+
 	buf = head + alloc_frag->offset;
 
 	get_page(alloc_frag->page);
@@ -943,12 +933,9 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
 	if (!vi->mergeable_rx_bufs && vi->big_packets)
 		return;
 
-	for (i = 0; i < vi->max_queue_pairs; i++) {
-		if (virtqueue_set_dma_premapped(vi->rq[i].vq))
-			continue;
-
-		vi->rq[i].do_dma = true;
-	}
+	for (i = 0; i < vi->max_queue_pairs; i++)
+		/* error should never happen */
+		BUG_ON(virtqueue_set_dma_premapped(vi->rq[i].vq));
 }
 
 static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
@@ -2020,8 +2007,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
-		if (rq->do_dma)
-			virtnet_rq_unmap(rq, buf, 0);
+		virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));
 	}
 
@@ -2135,8 +2121,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	ctx = mergeable_len_to_ctx(len + room, headroom);
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
-		if (rq->do_dma)
-			virtnet_rq_unmap(rq, buf, 0);
+		virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));
 	}
 
@@ -5205,7 +5190,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 	int i;
 	for (i = 0; i < vi->max_queue_pairs; i++)
 		if (vi->rq[i].alloc_frag.page) {
-			if (vi->rq[i].do_dma && vi->rq[i].last_dma)
+			if (vi->rq[i].last_dma)
 				virtnet_rq_unmap(&vi->rq[i], vi->rq[i].last_dma, 0);
 			put_page(vi->rq[i].alloc_frag.page);
 		}
-- 
2.32.0.3.g01195cf9f
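
The reference counting that virtnet_rq_alloc() now always performs can be
followed with a small user-space model. This is a sketch under assumptions:
the struct and the model_* functions are hypothetical, and model_unmap()
only reflects what the comments in the diff say about virtnet_rq_unmap()
(the mapping goes away once the ref reaches 0), not its actual body.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the per-page dma header; only what the model needs. */
struct model_dma {
	uint32_t ref;
	bool mapped;
};

/* First use of a fresh page: map it and take the page-level reference
 * (dma->ref = 1 in the diff). */
static void model_page_init(struct model_dma *dma)
{
	dma->mapped = true;
	dma->ref = 1;
}

/* Each buffer carved from the page takes one more reference
 * (++dma->ref in the diff). */
static void model_buf_alloc(struct model_dma *dma)
{
	++dma->ref;
}

/* Assumed behaviour of virtnet_rq_unmap(): drop one reference and
 * unmap once the count reaches zero. */
static void model_unmap(struct model_dma *dma)
{
	if (--dma->ref == 0)
		dma->mapped = false;
}
```

In this model a page that served two buffers stays mapped until both
buffers and the last_dma reference have been dropped, whatever the order.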


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next v5 4/4] virtio_net: remove the misleading comment
  2024-05-11  3:14 [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default Xuan Zhuo
                   ` (2 preceding siblings ...)
  2024-05-11  3:14 ` [PATCH net-next v5 3/4] virtio_net: rx remove premapped failover code Xuan Zhuo
@ 2024-05-11  3:14 ` Xuan Zhuo
  2024-05-14  0:20 ` [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default patchwork-bot+netdevbpf
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 20+ messages in thread
From: Xuan Zhuo @ 2024-05-11  3:14 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization

This path actually calls build_skb() without copying the data,
so the comment is misleading. Remove it.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3ffcb2e2185f..f553c09e7ae4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -739,7 +739,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 
 	shinfo_size = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
-	/* copy small packet so we can reuse these pages */
 	if (!NET_IP_ALIGN && len > GOOD_COPY_LEN && tailroom >= shinfo_size) {
 		skb = virtnet_build_skb(buf, truesize, p - buf, len);
 		if (unlikely(!skb))
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default
  2024-05-11  3:14 [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default Xuan Zhuo
                   ` (3 preceding siblings ...)
  2024-05-11  3:14 ` [PATCH net-next v5 4/4] virtio_net: remove the misleading comment Xuan Zhuo
@ 2024-05-14  0:20 ` patchwork-bot+netdevbpf
  2024-08-14  6:59 ` [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default" Michael S. Tsirkin
  2024-08-15  7:14 ` Linux regression tracking (Thorsten Leemhuis)
  6 siblings, 0 replies; 20+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-05-14  0:20 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, mst, jasowang, davem, edumazet, kuba, pabeni,
	virtualization

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 11 May 2024 11:14:00 +0800 you wrote:
> Actually, for the virtio drivers, we can enable premapped mode whatever
> the value of use_dma_api. Because we provide the virtio dma apis.
> So the driver can enable premapped mode unconditionally.
> 
> This patch set makes the big mode of virtio-net to support premapped mode.
> And enable premapped mode for rx by default.
> 
> [...]

Here is the summary with links:
  - [net-next,v5,1/4] virtio_ring: enable premapped mode whatever use_dma_api
    https://git.kernel.org/netdev/net-next/c/f9dac92ba908
  - [net-next,v5,2/4] virtio_net: big mode skip the unmap check
    https://git.kernel.org/netdev/net-next/c/a377ae542d8d
  - [net-next,v5,3/4] virtio_net: rx remove premapped failover code
    https://git.kernel.org/netdev/net-next/c/defd28aa5acb
  - [net-next,v5,4/4] virtio_net: remove the misleading comment
    https://git.kernel.org/netdev/net-next/c/9719f039d328

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default"
  2024-05-11  3:14 [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default Xuan Zhuo
                   ` (4 preceding siblings ...)
  2024-05-14  0:20 ` [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default patchwork-bot+netdevbpf
@ 2024-08-14  6:59 ` Michael S. Tsirkin
  2024-08-15  7:14 ` Linux regression tracking (Thorsten Leemhuis)
  6 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2024-08-14  6:59 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Xuan Zhuo, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, virtualization, Darren Kenny,
	Boris Ostrovsky

Note: Xuan Zhuo, if you have a better idea, pls post an alternative
patch.

Note2: untested, posting for Darren to help with testing.

It turns out that unconditionally enabling premapped mode in virtio-net
leads to a regression on VMs without ACCESS_PLATFORM and with
sysctl net.core.high_order_alloc_disable=1,

where crashes and scp failures were reported (scp of a 100M file to the VM):

[  332.079333] __vm_enough_memory: pid: 18440, comm: sshd, bytes: 5285790347661783040 not enough memory for the allocation
[  332.079651] ------------[ cut here ]------------
[  332.079655] kernel BUG at mm/mmap.c:3514!
[  332.080095] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  332.080826] CPU: 18 PID: 18440 Comm: sshd Kdump: loaded Not tainted 6.10.0-2.x86_64 #2
[  332.081514] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
[  332.082451] RIP: 0010:exit_mmap+0x3a1/0x3b0
[  332.082871] Code: be 01 00 00 00 48 89 df e8 0c 94 fe ff eb d7 be 01 00 00 00 48 89 df e8 5d 98 fe ff eb be 31 f6 48 89 df e8 31 99 fe ff eb a8 <0f> 0b e8 68 bc ae 00 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90
[  332.084230] RSP: 0018:ffff9988b1c8f948 EFLAGS: 00010293
[  332.084635] RAX: 0000000000000406 RBX: ffff8d47583e7380 RCX: 0000000000000000
[  332.085171] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  332.085699] RBP: 000000000000008f R08: 0000000000000000 R09: 0000000000000000
[  332.086233] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d47583e7430
[  332.086761] R13: ffff8d47583e73c0 R14: 0000000000000406 R15: 000495ae650dda58
[  332.087300] FS:  00007ff443899980(0000) GS:ffff8df1c5700000(0000) knlGS:0000000000000000
[  332.087888] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  332.088334] CR2: 000055a42d30b730 CR3: 00000102e956a004 CR4: 0000000000770ef0
[  332.088867] PKRU: 55555554
[  332.089114] Call Trace:
[  332.089349] <TASK>
[  332.089556]  ? die+0x36/0x90
[  332.089818]  ? do_trap+0xed/0x110
[  332.090110]  ? exit_mmap+0x3a1/0x3b0
[  332.090411]  ? do_error_trap+0x6a/0xa0
[  332.090722]  ? exit_mmap+0x3a1/0x3b0
[  332.091029]  ? exc_invalid_op+0x50/0x80
[  332.091348]  ? exit_mmap+0x3a1/0x3b0
[  332.091648]  ? asm_exc_invalid_op+0x1a/0x20
[  332.091998]  ? exit_mmap+0x3a1/0x3b0
[  332.092299]  ? exit_mmap+0x1d6/0x3b0
[  332.092604] __mmput+0x3e/0x130
[  332.092882] dup_mm.constprop.0+0x10c/0x110
[  332.093226] copy_process+0xbd0/0x1570
[  332.093539] kernel_clone+0xbf/0x430
[  332.093838]  ? syscall_exit_work+0x103/0x130
[  332.094197] __do_sys_clone+0x66/0xa0
[  332.094506]  do_syscall_64+0x8c/0x1d0
[  332.094814]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.095198]  ? audit_reset_context+0x232/0x310
[  332.095558]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.095936]  ? syscall_exit_work+0x103/0x130
[  332.096288]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.096668]  ? syscall_exit_to_user_mode+0x7d/0x220
[  332.097059]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.097436]  ? do_syscall_64+0xba/0x1d0
[  332.097752]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.098137]  ? syscall_exit_to_user_mode+0x7d/0x220
[  332.098525]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.098903]  ? do_syscall_64+0xba/0x1d0
[  332.099227]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.099606]  ? __audit_filter_op+0xbe/0x140
[  332.099943]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.100328]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.100706]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.101089]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.101468]  ? wp_page_reuse+0x8e/0xb0
[  332.101779]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.102163]  ? do_wp_page+0xe6/0x470
[  332.102465]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.102843]  ? __handle_mm_fault+0x5ff/0x720
[  332.103197]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.103574]  ? __count_memcg_events+0x4d/0xd0
[  332.103938]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.104323]  ? count_memcg_events.constprop.0+0x26/0x50
[  332.104729]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.105114]  ? handle_mm_fault+0xae/0x320
[  332.105442]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.105820]  ? do_user_addr_fault+0x31f/0x6c0
[  332.106181]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  332.106576] RIP: 0033:0x7ff43f8f9a73
[  332.106876] Code: db 0f 85 28 01 00 00 64 4c 8b 0c 25 10 00 00 00 45 31 c0 4d 8d 91 d0 02 00 00 31 d2 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 b9 00 00 00 41 89 c5 85 c0 0f 85 c6 00 00
[  332.108163] RSP: 002b:00007ffc690909b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  332.108719] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff43f8f9a73
[  332.109253] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[  332.109782] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ff443899980
[  332.110313] R10: 00007ff443899c50 R11: 0000000000000246 R12: 0000000000000002
[  332.110842] R13: 0000562e56cd4780 R14: 0000000000000006 R15: 0000562e800346b0
[  332.111381]  </TASK>
[  332.111590] Modules linked in: rdmaip_notify scsi_transport_iscsi target_core_mod rfkill mstflint_access cuse rds_rdma rds rdma_ucm rdma_cm iw_cm dm_multipath ib_umad ib_ipoib ib_cm mlx5_ib iTCO_wdt iTCO_vendor_support intel_rapl_msr ib_uverbs intel_rapl_common ib_core crc32_pclmul i2c_i801 joydev virtio_balloon i2c_smbus lpc_ich binfmt_misc xfs sd_mod t10_pi crc64_rocksoft sg crct10dif_pclmul mlx5_core virtio_net ahci net_failover mlxfw ghash_clmulni_intel virtio_scsi failover libahci sha512_ssse3 tls sha256_ssse3 pci_hyperv_intf virtio_pci libata psample sha1_ssse3 virtio_pci_legacy_dev serio_raw dimlib virtio_pci_modern_dev qemu_fw_cfg dm_mirror dm_region_hash dm_log dm_mod fuse aesni_intel crypto_simd cryptd
[  332.115851] ---[ end trace 0000000000000000 ]---

and another instance splats:

BUG: Bad page map in process PsWatcher.sh  pte:9402e1e2b18c8ae9 pmd:10fe4f067
[  193.046098] addr:00007ff912a00000 vm_flags:08000070 anon_vma:0000000000000000 mapping:ffff8ec28047eeb0 index:200
[  193.046863] file:libtinfo.so.6.1 fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] read_folio:xfs_vm_read_folio [xfs]
[  193.049564] get_swap_device: Bad swap file entry 3803ad7a32eab547
[  193.050902] BUG: Bad rss-counter state mm:00000000ff28307a type:MM_SWAPENTS val:-1
[  193.758147] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[  193.759151] CPU: 5 PID: 22932 Comm: LogFlusher Tainted: G B              6.10.0-rc2+ #1
[  193.759764] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
[  193.760435] Call Trace:
[  193.760624]  <TASK>
[  193.760799]  panic+0x31d/0x340
[  193.761033]  __schedule+0xb30/0xb30
[  193.761283]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.761605]  ? enqueue_hrtimer+0x35/0x90
[  193.761883]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.762207]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.762532]  ? hrtimer_start_range_ns+0x121/0x300
[  193.762856]  schedule+0x27/0xb0
[  193.763083]  futex_wait_queue+0x63/0x90
[  193.763354]  __futex_wait+0x13d/0x1b0
[  193.763610]  ? __pfx_futex_wake_mark+0x10/0x10
[  193.763918]  futex_wait+0x69/0xd0
[  193.764153]  ? pick_next_task+0x9fb/0xa30
[  193.764430]  ? __pfx_hrtimer_wakeup+0x10/0x10
[  193.764734]  do_futex+0x11a/0x1d0
[  193.764976]  __x64_sys_futex+0x68/0x1c0
[  193.765243]  do_syscall_64+0x80/0x160
[  193.765504]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.765834]  ? __audit_filter_op+0xaa/0xf0
[  193.766117]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.766437]  ? audit_reset_context.part.16+0x270/0x2d0
[  193.766895]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.767237]  ? syscall_exit_to_user_mode_prepare+0x17b/0x1a0
[  193.767624]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.767972]  ? syscall_exit_to_user_mode+0x80/0x1e0
[  193.768309]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.768628]  ? do_syscall_64+0x8c/0x160
[  193.768901]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.769225]  ? audit_reset_context.part.16+0x270/0x2d0
[  193.769573]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.769901]  ? restore_fpregs_from_fpstate+0x3c/0xa0
[  193.770241]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.770561]  ? switch_fpu_return+0x4f/0xd0
[  193.770848]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.771171]  ? syscall_exit_to_user_mode+0x80/0x1e0
[  193.771505]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.771830]  ? do_syscall_64+0x8c/0x160
[  193.772098]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.772426]  ? syscall_exit_to_user_mode_prepare+0x17b/0x1a0
[  193.772805]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.773124]  ? syscall_exit_to_user_mode+0x80/0x1e0
[  193.773458]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.773781]  ? do_syscall_64+0x8c/0x160
[  193.774047]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.774376]  ? task_mm_cid_work+0x1c1/0x210
[  193.774669]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  193.775010] RIP: 0033:0x7f4da640e898
[  193.775270] Code: 24 58 48 85 c0 0f 88 8f 00 00 00 e8 f2 2e 00 00 89 ee 4c 8b 54 24 38 31 d2 41 89 c0 40 80 f6 80 4c 89 ef b8 ca 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 ff 00 00 00 44 89 c7 e8 24 2f 00 00 48 8b
[  193.776404] RSP: 002b:00007f4d797f2750 EFLAGS: 00000282 ORIG_RAX: 00000000000000ca
[  193.776893] RAX: ffffffffffffffda RBX: 00007f4d402c1b50 RCX: 00007f4da640e898
[  193.777355] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f4d402c1b7c
[  193.777813] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f4da6ece000
[  193.778276] R10: 00007f4d797f27a0 R11: 0000000000000282 R12: 00007f4d402c1b28
[  193.778732] R13: 00007f4d402c1b7c R14: 00007f4d797f2840 R15: 0000000000000002
[  193.779189]  </TASK>
[  193.780419] Kernel Offset: 0x13c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  193.781097] Rebooting in 60 seconds..

Even in premapped mode with use_dma_api, in virtnet_rq_alloc(),
skb_page_frag_refill() can return an order-0 page if
high order page allocation is disabled. But in the current code

       alloc_frag->offset += size;

is accounted irrespective of the actual size of the page returned (dma->len).
And virtnet_rq_unmap() seems to work only with high order pages.
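
One possible reading of the above, as a user-space sketch: the dma header
sits at the start of each fresh fragment, so a buffer of up to PAGE_SIZE
placed behind it fits in a high order fragment but not in an order-0 page.
The struct layout, the 4 KiB and 32 KiB sizes, and the model_overrun() name
are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct virtnet_rq_dma; the field layout is an assumption. */
struct frag_model_dma {
	uint64_t addr;
	uint32_t ref;
	uint16_t len;
	uint16_t need_sync;
};

/* Bytes by which a buffer of buf_size, placed right after the header at
 * the start of a fresh fragment of frag_size bytes, runs past its end.
 * Returns 0 when the buffer fits. */
static size_t model_overrun(size_t frag_size, size_t buf_size)
{
	size_t end = sizeof(struct frag_model_dma) + buf_size;

	return end > frag_size ? end - frag_size : 0;
}
```

With these numbers, model_overrun(32768, 4096) reports no overrun, while
model_overrun(4096, 4096) reports that the buffer ends sizeof(struct
frag_model_dma) bytes past the page, which would be consistent with the
memory corruption seen in the splats above.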

Suggest reverting for now.

Michael S. Tsirkin (3):
  Revert "virtio_net: rx remove premapped failover code"
  Revert "virtio_net: big mode skip the unmap check"
  Revert "virtio_ring: enable premapped mode whatever use_dma_api"

 drivers/net/virtio_net.c     | 93 +++++++++++++++++++++---------------
 drivers/virtio/virtio_ring.c |  7 ++-
 2 files changed, 60 insertions(+), 40 deletions(-)

-- 
MST


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default"
  2024-05-11  3:14 [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default Xuan Zhuo
                   ` (5 preceding siblings ...)
  2024-08-14  6:59 ` [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default" Michael S. Tsirkin
@ 2024-08-15  7:14 ` Linux regression tracking (Thorsten Leemhuis)
  2024-08-15 10:22   ` Darren Kenny
  2024-08-15 15:23   ` Michael S. Tsirkin
  6 siblings, 2 replies; 20+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-08-15  7:14 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel, netdev
  Cc: Xuan Zhuo, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, virtualization, Darren Kenny,
	Boris Ostrovsky, Linux kernel regressions list

[side note: the message I have been replying to, at least when downloaded
from lore, has two message-ids, one of them identical to an older
message, which is why this looks odd in the lore archives:
https://lore.kernel.org/all/20240511031404.30903-1-xuanzhuo@linux.alibaba.com/]

On 14.08.24 08:59, Michael S. Tsirkin wrote:
> Note: Xuan Zhuo, if you have a better idea, pls post an alternative
> patch.
> 
> Note2: untested, posting for Darren to help with testing.
> 
> It turns out that unconditionally enabling premapped mode in virtio-net
> leads to a regression on VMs without ACCESS_PLATFORM and with
> sysctl net.core.high_order_alloc_disable=1,
> 
> where crashes and scp failures were reported (scp of a 100M file to the VM):
> [...]

TWIMC, there is a regression report on lore and I wonder if this might
be related or the same problem, as it also mentioned a "get_swap_device:
Bad swap file entry" error:
https://bugzilla.kernel.org/show_bug.cgi?id=219154

To quote:

"""
Hello,

I've encountered repeated crashes or freezes when a KVM VM receives
large amounts of data over the network while the system is under memory
load and performing I/O operations. The crashes sometimes occur in the
filesystem code (ext4 and btrfs, at least), but they also happen in
other locations.

This issue occurs on my custom builds using kernel versions v6.10 to
v6.11-rc2, with virtio network and disk drivers, and either Ubuntu 22.04
or Debian 12 user space.

The same kernel build did not crash on an Azure VM, which does not use
the virtio network driver. Since this issue only appears when receiving
data, I suspect there could be an issue related to the virtio interface
or receive buffer handling.

This issue did not occur on the Debian backport kernel 6.9.7-1~bpo12+1
amd64.

Steps to Reproduce:
1. Set up a small VM on a KVM host.
   I tested this on an x86_64 KVM VM with 1 CPU, 512 MB RAM, 2 GB SWAP
(the smallest configuration from Vultr), using a Debian 12 user space,
virtio disk, and virtio net.
2. Induce high memory and I/O load. Run the following command:
   stress --vm 2 --hdd 1
   (Adjust --vm to occupy all the RAM)
   This slows down the system but does not cause a crash.
3. Send large data to the VM.
   I used `iperf3 -s` on the VM and sent data using `iperf3 -c` from
another host. The system crashes within a few seconds to a few minutes.
(The reverse direction `iperf3 -c -R` did not cause a crash.)

The OOPS messages are mostly general protection faults, but sometimes I
see "Bad pagetable" or other errors, such as:
Oops: general protection fault, probably for non-canonical address
0x2f9b7fa5e2bde696: 0000 [#1] PREEMPT SMP PTI
Oops: Oops: 0000 [#1] PREEMPT SMP PTI
Oops: Bad pagetable: 000d [#1] PREEMPT SMP PTI

In some cases, dmesg contains something like:
UBSAN: shift-out-of-bounds in lib/xarray.c:158:34

When the system freezes without crashing, I sometimes find BUG messages,
such as:
get_swap_device: Bad swap file entry 3403b0f5b2584992
BUG: Bad page map in process stress  pte:c42f93fac0299e1d pmd:0d9b2047
BUG: Bad rss-counter-state mm:000000004df3dd9a type:MM_ANONPAGES val:2
BUG: Bad rss-counter-state mm:000000004df3dd9a type:MM_SWAPENTS val:-1

Thanks.
"""

Ciao, Thorsten


* Re: [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default"
  2024-08-15  7:14 ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-08-15 10:22   ` Darren Kenny
  2024-08-16  5:03     ` Linux regression tracking (Thorsten Leemhuis)
  2024-08-15 15:23   ` Michael S. Tsirkin
  1 sibling, 1 reply; 20+ messages in thread
From: Darren Kenny @ 2024-08-15 10:22 UTC (permalink / raw)
  To: Linux regression tracking (Thorsten Leemhuis), Michael S. Tsirkin,
	linux-kernel, netdev
  Cc: Xuan Zhuo, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, virtualization, Boris Ostrovsky,
	Linux kernel regressions list


On Thursday, 2024-08-15 at 09:14:27 +02, Linux regression tracking (Thorsten Leemhuis) wrote:
> [side note: the message I have been replying to at least when downloaded
> from lore has two message-ids, one of them identical to an older
> message, which is why this looks odd in the lore archives:
> https://lore.kernel.org/all/20240511031404.30903-1-xuanzhuo@linux.alibaba.com/]
>

Yes, I saw that too, hence I responded to patch 1 in the series, rather
than the cover letter.

> On 14.08.24 08:59, Michael S. Tsirkin wrote:
>> Note: Xuan Zhuo, if you have a better idea, pls post an alternative
>> patch.
>> 
>> Note2: untested, posting for Darren to help with testing.
>> 
>> Turns out unconditionally enabling premapped 
>> virtio-net leads to a regression on VM with no ACCESS_PLATFORM, and with
>> sysctl net.core.high_order_alloc_disable=1
>> 
>> where crashes and scp failures were reported (scp a file 100M in size to VM):
>> [...]
>
> TWIMC, there is a regression report on lore and I wonder if this might
> be related or the same problem, as it also mentioned a "get_swap_device:
> Bad swap file entry" error:
> https://bugzilla.kernel.org/show_bug.cgi?id=219154
>

I took a look at the stack traces, they don't look similar to what I was
seeing, but I wasn't running with an ASAN enabled in the kernel.

Most of the traces that I was seeing looked like those in the e-mail
from Si-Wei:

  https://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com/

We could trigger it only when the sysctl value was set like:

- net.core.high_order_alloc_disable=1

And it would immediately panic on any relatively large download, e.g.
a wget of a few RPMs, or similar.

Best I can suggest would be to try reverting them in a custom kernel
and see if it fixes this problem too.

Thanks,

Darren.

> To quote:
>
> """
> Hello,
>
> I've encountered repeated crashes or freezes when a KVM VM receives
> large amounts of data over the network while the system is under memory
> load and performing I/O operations. The crashes sometimes occur in the
> filesystem code (ext4 and btrfs, at least), but they also happen in
> other locations.
>
> This issue occurs on my custom builds using kernel versions v6.10 to
> v6.11-rc2, with virtio network and disk drivers, and either Ubuntu 22.04
> or Debian 12 user space.
>
> The same kernel build did not crash on an Azure VM, which does not use
> the virtio network driver. Since this issue only appears when receiving
> data, I suspect there could be an issue related to the virtio interface
> or receive buffer handling.
>
> This issue did not occur on the Debian backport kernel 6.9.7-1~bpo12+1
> amd64.
>
> Steps to Reproduce:
> 1. Setup a small VM on a KVM host.
>    I tested this on an x86_64 KVM VM with 1 CPU, 512 MB RAM, 2 GB SWAP
> (the smallest configuration from Vultr), using a Debian 12 user space,
> virtio disk, and virtio net.
> 2. Induce high memory and I/O load. Run the following command:
>    stress --vm 2 --hdd 1
>    (Adjust --vm to occupy all the RAM)
>    This slows down the system but does not cause a crash.
> 3. Send large data to the VM.
>    I used `iperf3 -s` on the VM and sent data using `iperf3 -c` from
> another host. The system crashes within a few seconds to a few minutes.
> (The reverse direction `iperf3 -c -R` did not cause a crash.)
>
>
> The OOPS messages are mostly general protection faults, but sometimes I
> see "Bad pagetable" or other errors, such as:
> Oops: general protection fault, probably for non-canonical address
> 0x2f9b7fa5e2bde696: 0000 [#1] PREEMPT SMP PTI
> Oops: Oops: 0000 [#1] PREEMPT SMP PTI
> Oops: Bad pagetable: 000d [#1] PREEMPT SMP PTI
>
> In some cases, dmesg contains something like:
> UBSAN: shift-out-of-bounds in lib/xarray.c:158:34
>
> When the system freezes without crash, I sometimes found BUGON messages
> in some cases, such as:
> get_swap_device: Bad swap file entry 3403b0f5b2584992
> BUG: Bad page map in process stress  pte:c42f93fac0299e1d pmd:0d9b2047
> BUG: Bad rss-counter-state mm:000000004df3dd9a type:MM_ANONPAGES val:2
> BUG: Bad rss-counter-state mm:000000004df3dd9a type:MM_SWAPENTS val:-1
>
> Thanks.
> """
>
> Ciao, Thorsten


* Re: [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default"
  2024-08-15 10:22   ` Darren Kenny
@ 2024-08-16  5:03     ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 0 replies; 20+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-08-16  5:03 UTC (permalink / raw)
  To: Darren Kenny, Michael S. Tsirkin, linux-kernel, netdev
  Cc: Xuan Zhuo, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, virtualization, Boris Ostrovsky,
	Linux kernel regressions list, Takero Funaki

On 15.08.24 12:22, Darren Kenny wrote:
> On Thursday, 2024-08-15 at 09:14:27 +02, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 14.08.24 08:59, Michael S. Tsirkin wrote:
>>> Note: Xuan Zhuo, if you have a better idea, pls post an alternative
>>> patch.
>>>
>>> Note2: untested, posting for Darren to help with testing.
>>>
>>> Turns out unconditionally enabling premapped 
>>> virtio-net leads to a regression on VM with no ACCESS_PLATFORM, and with
>>> sysctl net.core.high_order_alloc_disable=1
>>>
>>> where crashes and scp failures were reported (scp a file 100M in size to VM):
>>> [...]
>>
>> TWIMC, there is a regression report on lore

Obviously I meant bugzilla here, sorry.

>> and I wonder if this might
>> be related or the same problem, as it also mentioned a "get_swap_device:
>> Bad swap file entry" error:
>> https://bugzilla.kernel.org/show_bug.cgi?id=219154
> 
> I took a look at the stack traces, they don't look similar to what I was
> seeing, but I wasn't running with an ASAN enabled in the kernel.
> [...]

Yeah, but in the end it seems it is the same problem: The reporter,
Takero Funaki (now CCed) meanwhile performed a bisection that ended up
on f9dac92ba908 (virtio_ring: enable premapped mode regardless of
use_dma_api) -- and later confirmed in bugzilla that reverting the three
patches resolved the problem. Feel free to CC Takero on further mails
about this.

Ciao, Thorsten

#regzbot report:
https://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com/
#regzbot dup: https://bugzilla.kernel.org/show_bug.cgi?id=219154


* Re: [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default"
  2024-08-15  7:14 ` Linux regression tracking (Thorsten Leemhuis)
  2024-08-15 10:22   ` Darren Kenny
@ 2024-08-15 15:23   ` Michael S. Tsirkin
  2024-08-15 15:28     ` Michael S. Tsirkin
  1 sibling, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2024-08-15 15:23 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: linux-kernel, netdev, Xuan Zhuo, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization,
	Darren Kenny, Boris Ostrovsky

On Thu, Aug 15, 2024 at 09:14:27AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> [side note: the message I have been replying to at least when downloaded
> from lore has two message-ids, one of them identical to an older
> message, which is why this looks odd in the lore archives:
> https://lore.kernel.org/all/20240511031404.30903-1-xuanzhuo@linux.alibaba.com/]

Sorry, could you clarify - which message has two message IDs?



* Re: [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default"
  2024-08-15 15:23   ` Michael S. Tsirkin
@ 2024-08-15 15:28     ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2024-08-15 15:28 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: linux-kernel, netdev, Xuan Zhuo, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, virtualization,
	Darren Kenny, Boris Ostrovsky

On Thu, Aug 15, 2024 at 11:23:19AM -0400, Michael S. Tsirkin wrote:
> On Thu, Aug 15, 2024 at 09:14:27AM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> > [side note: the message I have been replying to at least when downloaded
> > from lore has two message-ids, one of them identical to an older
> > message, which is why this looks odd in the lore archives:
> > https://lore.kernel.org/all/20240511031404.30903-1-xuanzhuo@linux.alibaba.com/]
> 
> Sorry, could you clarify - which message has two message IDs?

Ouch. The one I sent had a bad Message-ID :(
Dunno how it happened, I guess I was mucking with it
manually and corrupted it. Really sorry.



* [PATCH RFC 1/3] Revert "virtio_net: rx remove premapped failover code"
@ 2024-08-14  6:59 Michael S. Tsirkin
  2024-08-15 15:27 ` [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default" Michael S. Tsirkin
  0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2024-08-14  6:59 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Xuan Zhuo, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, virtualization, Darren Kenny,
	Boris Ostrovsky, Si-Wei Liu, Eugenio Pérez

This reverts commit defd28aa5acb0fd7c15adc6bc40a8ac277d04dea.

The reverted commit leads to crashes on VMs with no ACCESS_PLATFORM when
sysctl net.core.high_order_alloc_disable=1 is set.

Reported-by: Si-Wei Liu <si-wei.liu@oracle.com>
Message-ID: <8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 89 +++++++++++++++++++++++-----------------
 1 file changed, 52 insertions(+), 37 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index fd3d7e926022..4f7e686b8bf9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -348,6 +348,9 @@ struct receive_queue {
 
 	/* Record the last dma info to free after new pages is allocated. */
 	struct virtnet_rq_dma *last_dma;
+
+	/* Do dma by self */
+	bool do_dma;
 };
 
 /* This structure can contain rss message with maximum settings for indirection table and keysize
@@ -848,7 +851,7 @@ static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
 	void *buf;
 
 	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
-	if (buf)
+	if (buf && rq->do_dma)
 		virtnet_rq_unmap(rq, buf, *len);
 
 	return buf;
@@ -861,6 +864,11 @@ static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 len)
 	u32 offset;
 	void *head;
 
+	if (!rq->do_dma) {
+		sg_init_one(rq->sg, buf, len);
+		return;
+	}
+
 	head = page_address(rq->alloc_frag.page);
 
 	offset = buf - head;
@@ -886,42 +894,44 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
 
 	head = page_address(alloc_frag->page);
 
-	dma = head;
+	if (rq->do_dma) {
+		dma = head;
 
-	/* new pages */
-	if (!alloc_frag->offset) {
-		if (rq->last_dma) {
-			/* Now, the new page is allocated, the last dma
-			 * will not be used. So the dma can be unmapped
-			 * if the ref is 0.
+		/* new pages */
+		if (!alloc_frag->offset) {
+			if (rq->last_dma) {
+				/* Now, the new page is allocated, the last dma
+				 * will not be used. So the dma can be unmapped
+				 * if the ref is 0.
+				 */
+				virtnet_rq_unmap(rq, rq->last_dma, 0);
+				rq->last_dma = NULL;
+			}
+
+			dma->len = alloc_frag->size - sizeof(*dma);
+
+			addr = virtqueue_dma_map_single_attrs(rq->vq, dma + 1,
+							      dma->len, DMA_FROM_DEVICE, 0);
+			if (virtqueue_dma_mapping_error(rq->vq, addr))
+				return NULL;
+
+			dma->addr = addr;
+			dma->need_sync = virtqueue_dma_need_sync(rq->vq, addr);
+
+			/* Add a reference to dma to prevent the entire dma from
+			 * being released during error handling. This reference
+			 * will be freed after the pages are no longer used.
 			 */
-			virtnet_rq_unmap(rq, rq->last_dma, 0);
-			rq->last_dma = NULL;
+			get_page(alloc_frag->page);
+			dma->ref = 1;
+			alloc_frag->offset = sizeof(*dma);
+
+			rq->last_dma = dma;
 		}
 
-		dma->len = alloc_frag->size - sizeof(*dma);
-
-		addr = virtqueue_dma_map_single_attrs(rq->vq, dma + 1,
-						      dma->len, DMA_FROM_DEVICE, 0);
-		if (virtqueue_dma_mapping_error(rq->vq, addr))
-			return NULL;
-
-		dma->addr = addr;
-		dma->need_sync = virtqueue_dma_need_sync(rq->vq, addr);
-
-		/* Add a reference to dma to prevent the entire dma from
-		 * being released during error handling. This reference
-		 * will be freed after the pages are no longer used.
-		 */
-		get_page(alloc_frag->page);
-		dma->ref = 1;
-		alloc_frag->offset = sizeof(*dma);
-
-		rq->last_dma = dma;
+		++dma->ref;
 	}
 
-	++dma->ref;
-
 	buf = head + alloc_frag->offset;
 
 	get_page(alloc_frag->page);
@@ -938,9 +948,12 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
 	if (!vi->mergeable_rx_bufs && vi->big_packets)
 		return;
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
-		/* error should never happen */
-		BUG_ON(virtqueue_set_dma_premapped(vi->rq[i].vq));
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		if (virtqueue_set_dma_premapped(vi->rq[i].vq))
+			continue;
+
+		vi->rq[i].do_dma = true;
+	}
 }
 
 static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
@@ -2036,7 +2049,8 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
-		virtnet_rq_unmap(rq, buf, 0);
+		if (rq->do_dma)
+			virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));
 	}
 
@@ -2150,7 +2164,8 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	ctx = mergeable_len_to_ctx(len + room, headroom);
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
-		virtnet_rq_unmap(rq, buf, 0);
+		if (rq->do_dma)
+			virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));
 	}
 
@@ -5231,7 +5246,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 	int i;
 	for (i = 0; i < vi->max_queue_pairs; i++)
 		if (vi->rq[i].alloc_frag.page) {
-			if (vi->rq[i].last_dma)
+			if (vi->rq[i].do_dma && vi->rq[i].last_dma)
 				virtnet_rq_unmap(&vi->rq[i], vi->rq[i].last_dma, 0);
 			put_page(vi->rq[i].alloc_frag.page);
 		}
-- 
MST



* [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default"
  2024-08-14  6:59 [PATCH RFC 1/3] Revert "virtio_net: rx remove premapped failover code" Michael S. Tsirkin
@ 2024-08-15 15:27 ` Michael S. Tsirkin
  0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2024-08-15 15:27 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: Xuan Zhuo, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, virtualization, Darren Kenny,
	Boris Ostrovsky

Reposting just 0/3 to fix up archives.  Hope it works :(

Note: Xuan Zhuo, if you have a better idea, pls post an alternative
patch.

Note2: untested, posting for Darren to help with testing.

It turns out that unconditionally enabling premapped mode in virtio-net
leads to a regression on VMs with no ACCESS_PLATFORM and with
sysctl net.core.high_order_alloc_disable=1,
where crashes and scp failures were reported (scp of a 100M file to the VM):

[  332.079333] __vm_enough_memory: pid: 18440, comm: sshd, bytes: 5285790347661783040 not enough memory for the allocation
[  332.079651] ------------[ cut here ]------------
[  332.079655] kernel BUG at mm/mmap.c:3514!
[  332.080095] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  332.080826] CPU: 18 PID: 18440 Comm: sshd Kdump: loaded Not tainted 6.10.0-2.x86_64 #2
[  332.081514] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
[  332.082451] RIP: 0010:exit_mmap+0x3a1/0x3b0
[  332.082871] Code: be 01 00 00 00 48 89 df e8 0c 94 fe ff eb d7 be 01 00 00 00 48 89 df e8 5d 98 fe ff eb be 31 f6 48 89 df e8 31 99 fe ff eb a8 <0f> 0b e8 68 bc ae 00 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90
[  332.084230] RSP: 0018:ffff9988b1c8f948 EFLAGS: 00010293
[  332.084635] RAX: 0000000000000406 RBX: ffff8d47583e7380 RCX: 0000000000000000
[  332.085171] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  332.085699] RBP: 000000000000008f R08: 0000000000000000 R09: 0000000000000000
[  332.086233] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d47583e7430
[  332.086761] R13: ffff8d47583e73c0 R14: 0000000000000406 R15: 000495ae650dda58
[  332.087300] FS:  00007ff443899980(0000) GS:ffff8df1c5700000(0000) knlGS:0000000000000000
[  332.087888] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  332.088334] CR2: 000055a42d30b730 CR3: 00000102e956a004 CR4: 0000000000770ef0
[  332.088867] PKRU: 55555554
[  332.089114] Call Trace:
[  332.089349] <TASK>
[  332.089556]  ? die+0x36/0x90
[  332.089818]  ? do_trap+0xed/0x110
[  332.090110]  ? exit_mmap+0x3a1/0x3b0
[  332.090411]  ? do_error_trap+0x6a/0xa0
[  332.090722]  ? exit_mmap+0x3a1/0x3b0
[  332.091029]  ? exc_invalid_op+0x50/0x80
[  332.091348]  ? exit_mmap+0x3a1/0x3b0
[  332.091648]  ? asm_exc_invalid_op+0x1a/0x20
[  332.091998]  ? exit_mmap+0x3a1/0x3b0
[  332.092299]  ? exit_mmap+0x1d6/0x3b0
[  332.092604] __mmput+0x3e/0x130
[  332.092882] dup_mm.constprop.0+0x10c/0x110
[  332.093226] copy_process+0xbd0/0x1570
[  332.093539] kernel_clone+0xbf/0x430
[  332.093838]  ? syscall_exit_work+0x103/0x130
[  332.094197] __do_sys_clone+0x66/0xa0
[  332.094506]  do_syscall_64+0x8c/0x1d0
[  332.094814]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.095198]  ? audit_reset_context+0x232/0x310
[  332.095558]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.095936]  ? syscall_exit_work+0x103/0x130
[  332.096288]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.096668]  ? syscall_exit_to_user_mode+0x7d/0x220
[  332.097059]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.097436]  ? do_syscall_64+0xba/0x1d0
[  332.097752]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.098137]  ? syscall_exit_to_user_mode+0x7d/0x220
[  332.098525]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.098903]  ? do_syscall_64+0xba/0x1d0
[  332.099227]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.099606]  ? __audit_filter_op+0xbe/0x140
[  332.099943]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.100328]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.100706]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.101089]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.101468]  ? wp_page_reuse+0x8e/0xb0
[  332.101779]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.102163]  ? do_wp_page+0xe6/0x470
[  332.102465]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.102843]  ? __handle_mm_fault+0x5ff/0x720
[  332.103197]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.103574]  ? __count_memcg_events+0x4d/0xd0
[  332.103938]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.104323]  ? count_memcg_events.constprop.0+0x26/0x50
[  332.104729]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.105114]  ? handle_mm_fault+0xae/0x320
[  332.105442]  ? srso_alias_return_thunk+0x5/0xfbef5
[  332.105820]  ? do_user_addr_fault+0x31f/0x6c0
[  332.106181]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  332.106576] RIP: 0033:0x7ff43f8f9a73
[  332.106876] Code: db 0f 85 28 01 00 00 64 4c 8b 0c 25 10 00 00 00 45 31 c0 4d 8d 91 d0 02 00 00 31 d2 31 f6 bf 11 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 b9 00 00 00 41 89 c5 85 c0 0f 85 c6 00 00
[  332.108163] RSP: 002b:00007ffc690909b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  332.108719] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff43f8f9a73
[  332.109253] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[  332.109782] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ff443899980
[  332.110313] R10: 00007ff443899c50 R11: 0000000000000246 R12: 0000000000000002
[  332.110842] R13: 0000562e56cd4780 R14: 0000000000000006 R15: 0000562e800346b0
[  332.111381]  </TASK>
[  332.111590] Modules linked in: rdmaip_notify scsi_transport_iscsi target_core_mod rfkill mstflint_access cuse rds_rdma rds rdma_ucm rdma_cm iw_cm dm_multipath ib_umad ib_ipoib ib_cm mlx5_ib iTCO_wdt iTCO_vendor_support intel_rapl_msr ib_uverbs intel_rapl_common ib_core crc32_pclmul i2c_i801 joydev virtio_balloon i2c_smbus lpc_ich binfmt_misc xfs sd_mod t10_pi crc64_rocksoft sg crct10dif_pclmul mlx5_core virtio_net ahci net_failover mlxfw ghash_clmulni_intel virtio_scsi failover libahci sha512_ssse3 tls sha256_ssse3 pci_hyperv_intf virtio_pci libata psample sha1_ssse3 virtio_pci_legacy_dev serio_raw dimlib virtio_pci_modern_dev qemu_fw_cfg dm_mirror dm_region_hash dm_log dm_mod fuse aesni_intel crypto_simd cryptd
[  332.115851] ---[ end trace 0000000000000000 ]---

and another instance splats:

BUG: Bad page map in process PsWatcher.sh  pte:9402e1e2b18c8ae9 pmd:10fe4f067
[  193.046098] addr:00007ff912a00000 vm_flags:08000070 anon_vma:0000000000000000 mapping:ffff8ec28047eeb0 index:200
[  193.046863] file:libtinfo.so.6.1 fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] read_folio:xfs_vm_read_folio [xfs]
[  193.049564] get_swap_device: Bad swap file entry 3803ad7a32eab547
[  193.050902] BUG: Bad rss-counter state mm:00000000ff28307a type:MM_SWAPENTS val:-1
[  193.758147] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[  193.759151] CPU: 5 PID: 22932 Comm: LogFlusher Tainted: G B              6.10.0-rc2+ #1
[  193.759764] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
[  193.760435] Call Trace:
[  193.760624]  <TASK>
[  193.760799]  panic+0x31d/0x340
[  193.761033]  __schedule+0xb30/0xb30
[  193.761283]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.761605]  ? enqueue_hrtimer+0x35/0x90
[  193.761883]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.762207]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.762532]  ? hrtimer_start_range_ns+0x121/0x300
[  193.762856]  schedule+0x27/0xb0
[  193.763083]  futex_wait_queue+0x63/0x90
[  193.763354]  __futex_wait+0x13d/0x1b0
[  193.763610]  ? __pfx_futex_wake_mark+0x10/0x10
[  193.763918]  futex_wait+0x69/0xd0
[  193.764153]  ? pick_next_task+0x9fb/0xa30
[  193.764430]  ? __pfx_hrtimer_wakeup+0x10/0x10
[  193.764734]  do_futex+0x11a/0x1d0
[  193.764976]  __x64_sys_futex+0x68/0x1c0
[  193.765243]  do_syscall_64+0x80/0x160
[  193.765504]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.765834]  ? __audit_filter_op+0xaa/0xf0
[  193.766117]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.766437]  ? audit_reset_context.part.16+0x270/0x2d0
[  193.766895]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.767237]  ? syscall_exit_to_user_mode_prepare+0x17b/0x1a0
[  193.767624]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.767972]  ? syscall_exit_to_user_mode+0x80/0x1e0
[  193.768309]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.768628]  ? do_syscall_64+0x8c/0x160
[  193.768901]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.769225]  ? audit_reset_context.part.16+0x270/0x2d0
[  193.769573]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.769901]  ? restore_fpregs_from_fpstate+0x3c/0xa0
[  193.770241]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.770561]  ? switch_fpu_return+0x4f/0xd0
[  193.770848]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.771171]  ? syscall_exit_to_user_mode+0x80/0x1e0
[  193.771505]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.771830]  ? do_syscall_64+0x8c/0x160
[  193.772098]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.772426]  ? syscall_exit_to_user_mode_prepare+0x17b/0x1a0
[  193.772805]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.773124]  ? syscall_exit_to_user_mode+0x80/0x1e0
[  193.773458]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.773781]  ? do_syscall_64+0x8c/0x160
[  193.774047]  ? srso_alias_return_thunk+0x5/0xfbef5
[  193.774376]  ? task_mm_cid_work+0x1c1/0x210
[  193.774669]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  193.775010] RIP: 0033:0x7f4da640e898
[  193.775270] Code: 24 58 48 85 c0 0f 88 8f 00 00 00 e8 f2 2e 00 00 89 ee 4c 8b 54 24 38 31 d2 41 89 c0 40 80 f6 80 4c 89 ef b8 ca 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 ff 00 00 00 44 89 c7 e8 24 2f 00 00 48 8b
[  193.776404] RSP: 002b:00007f4d797f2750 EFLAGS: 00000282 ORIG_RAX: 00000000000000ca
[  193.776893] RAX: ffffffffffffffda RBX: 00007f4d402c1b50 RCX: 00007f4da640e898
[  193.777355] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f4d402c1b7c
[  193.777813] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007f4da6ece000
[  193.778276] R10: 00007f4d797f27a0 R11: 0000000000000282 R12: 00007f4d402c1b28
[  193.778732] R13: 00007f4d402c1b7c R14: 00007f4d797f2840 R15: 0000000000000002
[  193.779189]  </TASK>
[  193.780419] Kernel Offset: 0x13c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  193.781097] Rebooting in 60 seconds..

Even in premapped mode with use_dma_api, skb_page_frag_refill() in
virtnet_rq_alloc() can return an order-0 page if high-order page
allocation is disabled. But in the current code,

       alloc_frag->offset += size;

is accounted irrespective of the actual page size returned (dma->len),
and virtnet_rq_unmap() seems to work only with high-order pages.
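
[Illustrative note, not part of the original posting: the accounting
problem described above can be sketched in userspace C. This is a
simplified model, not the kernel code. The names frag_model,
frag_has_room, frag_alloc_end and frag_overruns are hypothetical, and
the model assumes that the room check runs while the offset of a fresh
page is still 0, before the per-page DMA header is reserved. That
ordering is one reading of the quoted diff, not something the thread
states.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a page fragment. */
struct frag_model {
	size_t size;   /* bytes actually backing the fragment */
	size_t offset; /* next free byte inside the fragment */
};

/* Models the refill check: is there room for `need` bytes at the
 * current offset? On a fresh page the offset is still 0 here.
 */
static bool frag_has_room(const struct frag_model *f, size_t need)
{
	return f->offset + need <= f->size;
}

/* Models the allocation path: check room, reserve the header on a
 * fresh page, then advance the offset by `size` unconditionally.
 * Returns the end offset of the buffer handed out, or 0 if the room
 * check fails.
 */
static size_t frag_alloc_end(struct frag_model *f, size_t hdr, size_t size)
{
	assert(hdr <= f->size);

	if (!frag_has_room(f, size))
		return 0;

	if (f->offset == 0)
		f->offset = hdr; /* header reserved after the room check */

	f->offset += size;       /* accounted regardless of f->size */
	return f->offset;
}

/* True if the first buffer carved from a fresh fragment would extend
 * past the memory that actually backs the fragment.
 */
static bool frag_overruns(size_t page_size, size_t hdr, size_t size)
{
	struct frag_model f = { .size = page_size, .offset = 0 };

	return frag_alloc_end(&f, hdr, size) > f.size;
}
```

In this model, with an assumed 4096-byte order-0 page and an assumed
16-byte header, frag_overruns(4096, 16, 4096) is true, while the same
request against a 32768-byte fragment, frag_overruns(32768, 16, 4096),
is false. That is consistent with the problem appearing only when
high-order allocation is disabled.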

Suggest reverting for now.

Michael S. Tsirkin (3):
  Revert "virtio_net: rx remove premapped failover code"
  Revert "virtio_net: big mode skip the unmap check"
  Revert "virtio_ring: enable premapped mode whatever use_dma_api"

 drivers/net/virtio_net.c     | 93 +++++++++++++++++++++---------------
 drivers/virtio/virtio_ring.c |  7 ++-
 2 files changed, 60 insertions(+), 40 deletions(-)

-- 
MST



end of thread, other threads:[~2024-08-20  6:21 UTC | newest]

Thread overview: 20+ messages
2024-05-11  3:14 [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default Xuan Zhuo
2024-05-11  3:14 ` [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
2024-08-13 19:28   ` Si-Wei Liu
2024-08-13 19:46     ` Michael S. Tsirkin
2024-08-14  3:39       ` Si-Wei Liu
2024-08-14  7:00         ` Michael S. Tsirkin
2024-08-17 13:20     ` Xuan Zhuo
2024-08-20  1:06       ` Si-Wei Liu
2024-08-20  6:19         ` Xuan Zhuo
2024-05-11  3:14 ` [PATCH net-next v5 2/4] virtio_net: big mode skip the unmap check Xuan Zhuo
2024-05-11  3:14 ` [PATCH net-next v5 3/4] virtio_net: rx remove premapped failover code Xuan Zhuo
2024-05-11  3:14 ` [PATCH net-next v5 4/4] virtio_net: remove the misleading comment Xuan Zhuo
2024-05-14  0:20 ` [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default patchwork-bot+netdevbpf
2024-08-14  6:59 ` [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default" Michael S. Tsirkin
2024-08-15  7:14 ` Linux regression tracking (Thorsten Leemhuis)
2024-08-15 10:22   ` Darren Kenny
2024-08-16  5:03     ` Linux regression tracking (Thorsten Leemhuis)
2024-08-15 15:23   ` Michael S. Tsirkin
2024-08-15 15:28     ` Michael S. Tsirkin
  -- strict thread matches above, loose matches on Subject: below --
2024-08-14  6:59 [PATCH RFC 1/3] Revert "virtio_net: rx remove premapped failover code" Michael S. Tsirkin
2024-08-15 15:27 ` [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default" Michael S. Tsirkin
