From: "Michael S. Tsirkin" <mst@redhat.com>
To: Si-Wei Liu <si-wei.liu@oracle.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>,
	netdev@vger.kernel.org, Jason Wang <jasowang@redhat.com>,
	Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	virtualization@lists.linux.dev,
	Darren Kenny <darren.kenny@oracle.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api
Date: Tue, 13 Aug 2024 15:46:27 -0400
Message-ID: <20240813154458-mutt-send-email-mst@kernel.org>
In-Reply-To: <8b20cc28-45a9-4643-8e87-ba164a540c0a@oracle.com>

On Tue, Aug 13, 2024 at 12:28:41PM -0700, Si-Wei Liu wrote:
> 
> It turns out that the commit below, which unconditionally enables premapped
> mode in virtio-net:
> 
> commit f9dac92ba9081062a6477ee015bd3b8c5914efc4
> Author: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Date:   Sat May 11 11:14:01 2024 +0800
> 
> leads to a regression on VMs without ACCESS_PLATFORM and with the sysctl
> value of:
> 
> - net.core.high_order_alloc_disable=1
> 
> where we see reliable crashes or scp failures (copying a 100M file to the
> VM with scp):
> 
> [  332.079333] __vm_enough_memory: pid: 18440, comm: sshd, bytes:
> 5285790347661783040 not enough memory for the allocation
> [  332.079651] ------------[ cut here ]------------
> [  332.079655] kernel BUG at mm/mmap.c:3514!
> [  332.080095] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [  332.080826] CPU: 18 PID: 18440 Comm: sshd Kdump: loaded Not tainted
> 6.10.0-2.x86_64 #2
> [  332.081514] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
> [  332.082451] RIP: 0010:exit_mmap+0x3a1/0x3b0
> [  332.082871] Code: be 01 00 00 00 48 89 df e8 0c 94 fe ff eb d7 be 01 00
> 00 00 48 89 df e8 5d 98 fe ff eb be 31 f6 48 89 df e8 31 99 fe ff eb a8 <0f>
> 0b e8 68 bc ae 00 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90
> [  332.084230] RSP: 0018:ffff9988b1c8f948 EFLAGS: 00010293
> [  332.084635] RAX: 0000000000000406 RBX: ffff8d47583e7380 RCX:
> 0000000000000000
> [  332.085171] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 0000000000000000
> [  332.085699] RBP: 000000000000008f R08: 0000000000000000 R09:
> 0000000000000000
> [  332.086233] R10: 0000000000000000 R11: 0000000000000000 R12:
> ffff8d47583e7430
> [  332.086761] R13: ffff8d47583e73c0 R14: 0000000000000406 R15:
> 000495ae650dda58
> [  332.087300] FS:  00007ff443899980(0000) GS:ffff8df1c5700000(0000)
> knlGS:0000000000000000
> [  332.087888] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  332.088334] CR2: 000055a42d30b730 CR3: 00000102e956a004 CR4:
> 0000000000770ef0
> [  332.088867] PKRU: 55555554
> [  332.089114] Call Trace:
> [  332.089349] <TASK>
> [  332.089556]  ? die+0x36/0x90
> [  332.089818]  ? do_trap+0xed/0x110
> [  332.090110]  ? exit_mmap+0x3a1/0x3b0
> [  332.090411]  ? do_error_trap+0x6a/0xa0
> [  332.090722]  ? exit_mmap+0x3a1/0x3b0
> [  332.091029]  ? exc_invalid_op+0x50/0x80
> [  332.091348]  ? exit_mmap+0x3a1/0x3b0
> [  332.091648]  ? asm_exc_invalid_op+0x1a/0x20
> [  332.091998]  ? exit_mmap+0x3a1/0x3b0
> [  332.092299]  ? exit_mmap+0x1d6/0x3b0
> [  332.092604] __mmput+0x3e/0x130
> [  332.092882] dup_mm.constprop.0+0x10c/0x110
> [  332.093226] copy_process+0xbd0/0x1570
> [  332.093539] kernel_clone+0xbf/0x430
> [  332.093838]  ? syscall_exit_work+0x103/0x130
> [  332.094197] __do_sys_clone+0x66/0xa0
> [  332.094506]  do_syscall_64+0x8c/0x1d0
> [  332.094814]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.095198]  ? audit_reset_context+0x232/0x310
> [  332.095558]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.095936]  ? syscall_exit_work+0x103/0x130
> [  332.096288]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.096668]  ? syscall_exit_to_user_mode+0x7d/0x220
> [  332.097059]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.097436]  ? do_syscall_64+0xba/0x1d0
> [  332.097752]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.098137]  ? syscall_exit_to_user_mode+0x7d/0x220
> [  332.098525]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.098903]  ? do_syscall_64+0xba/0x1d0
> [  332.099227]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.099606]  ? __audit_filter_op+0xbe/0x140
> [  332.099943]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.100328]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.100706]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.101089]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.101468]  ? wp_page_reuse+0x8e/0xb0
> [  332.101779]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.102163]  ? do_wp_page+0xe6/0x470
> [  332.102465]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.102843]  ? __handle_mm_fault+0x5ff/0x720
> [  332.103197]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.103574]  ? __count_memcg_events+0x4d/0xd0
> [  332.103938]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.104323]  ? count_memcg_events.constprop.0+0x26/0x50
> [  332.104729]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.105114]  ? handle_mm_fault+0xae/0x320
> [  332.105442]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  332.105820]  ? do_user_addr_fault+0x31f/0x6c0
> [  332.106181]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  332.106576] RIP: 0033:0x7ff43f8f9a73
> [  332.106876] Code: db 0f 85 28 01 00 00 64 4c 8b 0c 25 10 00 00 00 45 31
> c0 4d 8d 91 d0 02 00 00 31 d2 31 f6 bf 11
> 00 20 01 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 b9 00 00 00 41 89 c5
> 85 c0 0f 85 c6 00 00
> [  332.108163] RSP: 002b:00007ffc690909b0 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000038
> [  332.108719] RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> 00007ff43f8f9a73
> [  332.109253] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 0000000001200011
> [  332.109782] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 00007ff443899980
> [  332.110313] R10: 00007ff443899c50 R11: 0000000000000246 R12:
> 0000000000000002
> [  332.110842] R13: 0000562e56cd4780 R14: 0000000000000006 R15:
> 0000562e800346b0
> [  332.111381]  </TASK>
> [  332.111590] Modules linked in: rdmaip_notify scsi_transport_iscsi
> target_core_mod rfkill mstflint_access cuse rds$
> rdma rds rdma_ucm rdma_cm iw_cm dm_multipath ib_umad ib_ipoib ib_cm mlx5_ib
> iTCO_wdt iTCO_vendor_support intel_rapl_$
> sr ib_uverbs intel_rapl_common ib_core crc32_pclmul i2c_i801 joydev
> virtio_balloon i2c_smbus lpc_ich binfmt_misc xfs
> sd_mod t10_pi crc64_rocksoft sg crct10dif_pclmul mlx5_core virtio_net ahci
> net_failover mlxfw ghash_clmulni_intel vi$
> tio_scsi failover libahci sha512_ssse3 tls sha256_ssse3 pci_hyperv_intf
> virtio_pci libata psample sha1_ssse3 virtio_$
> ci_legacy_dev serio_raw dimlib virtio_pci_modern_dev qemu_fw_cfg dm_mirror
> dm_region_hash dm_log dm_mod fuse aesni_i$
> tel crypto_simd cryptd
> [  332.115851] ---[ end trace 0000000000000000 ]---
> 
> and another instance splats:
> 
> BUG: Bad page map in process PsWatcher.sh  pte:9402e1e2b18c8ae9
> pmd:10fe4f067
> [  193.046098] addr:00007ff912a00000 vm_flags:08000070
> anon_vma:0000000000000000 mapping:ffff8ec28047eeb0 index:200
> [  193.046863] file:libtinfo.so.6.1 fault:xfs_filemap_fault [xfs]
> mmap:xfs_file_mmap [xfs] read_folio:xfs_vm_read_folio [xfs]
> [  193.049564] get_swap_device: Bad swap file entry 3803ad7a32eab547
> [  193.050902] BUG: Bad rss-counter state mm:00000000ff28307a
> type:MM_SWAPENTS val:-1
> [  193.758147] Kernel panic - not syncing: corrupted stack end detected
> inside scheduler
> [  193.759151] CPU: 5 PID: 22932 Comm: LogFlusher Tainted: G B             
> 6.10.0-rc2+ #1
> [  193.759764] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 1.16.0-4.module+el8.9.0+90173+a3f3e83a 04/01/2014
> [  193.760435] Call Trace:
> [  193.760624]  <TASK>
> [  193.760799]  panic+0x31d/0x340
> [  193.761033]  __schedule+0xb30/0xb30
> [  193.761283]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.761605]  ? enqueue_hrtimer+0x35/0x90
> [  193.761883]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.762207]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.762532]  ? hrtimer_start_range_ns+0x121/0x300
> [  193.762856]  schedule+0x27/0xb0
> [  193.763083]  futex_wait_queue+0x63/0x90
> [  193.763354]  __futex_wait+0x13d/0x1b0
> [  193.763610]  ? __pfx_futex_wake_mark+0x10/0x10
> [  193.763918]  futex_wait+0x69/0xd0
> [  193.764153]  ? pick_next_task+0x9fb/0xa30
> [  193.764430]  ? __pfx_hrtimer_wakeup+0x10/0x10
> [  193.764734]  do_futex+0x11a/0x1d0
> [  193.764976]  __x64_sys_futex+0x68/0x1c0
> [  193.765243]  do_syscall_64+0x80/0x160
> [  193.765504]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.765834]  ? __audit_filter_op+0xaa/0xf0
> [  193.766117]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.766437]  ? audit_reset_context.part.16+0x270/0x2d0
> [  193.766895]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.767237]  ? syscall_exit_to_user_mode_prepare+0x17b/0x1a0
> [  193.767624]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.767972]  ? syscall_exit_to_user_mode+0x80/0x1e0
> [  193.768309]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.768628]  ? do_syscall_64+0x8c/0x160
> [  193.768901]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.769225]  ? audit_reset_context.part.16+0x270/0x2d0
> [  193.769573]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.769901]  ? restore_fpregs_from_fpstate+0x3c/0xa0
> [  193.770241]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.770561]  ? switch_fpu_return+0x4f/0xd0
> [  193.770848]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.771171]  ? syscall_exit_to_user_mode+0x80/0x1e0
> [  193.771505]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.771830]  ? do_syscall_64+0x8c/0x160
> [  193.772098]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.772426]  ? syscall_exit_to_user_mode_prepare+0x17b/0x1a0
> [  193.772805]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.773124]  ? syscall_exit_to_user_mode+0x80/0x1e0
> [  193.773458]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.773781]  ? do_syscall_64+0x8c/0x160
> [  193.774047]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  193.774376]  ? task_mm_cid_work+0x1c1/0x210
> [  193.774669]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  193.775010] RIP: 0033:0x7f4da640e898
> [  193.775270] Code: 24 58 48 85 c0 0f 88 8f 00 00 00 e8 f2 2e 00 00 89 ee
> 4c 8b 54 24 38 31 d2 41 89 c0 40 80 f6 80 4c 89 ef b8 ca 00 00 00 0f 05 <48>
> 3d 00 f0 ff ff 0f 87 ff 00 00 00 44 89 c7 e8 24 2f 00 00 48 8b
> [  193.776404] RSP: 002b:00007f4d797f2750 EFLAGS: 00000282 ORIG_RAX:
> 00000000000000ca
> [  193.776893] RAX: ffffffffffffffda RBX: 00007f4d402c1b50 RCX:
> 00007f4da640e898
> [  193.777355] RDX: 0000000000000000 RSI: 0000000000000080 RDI:
> 00007f4d402c1b7c
> [  193.777813] RBP: 0000000000000000 R08: 0000000000000000 R09:
> 00007f4da6ece000
> [  193.778276] R10: 00007f4d797f27a0 R11: 0000000000000282 R12:
> 00007f4d402c1b28
> [  193.778732] R13: 00007f4d402c1b7c R14: 00007f4d797f2840 R15:
> 0000000000000002
> [  193.779189]  </TASK>
> [  193.780419] Kernel Offset: 0x13c00000 from 0xffffffff81000000 (relocation
> range: 0xffffffff80000000-0xffffffffbfffffff)
> [  193.781097] Rebooting in 60 seconds..
> 
> Even in premapped mode with use_dma_api, in virtnet_rq_alloc(),
> skb_page_frag_refill() could return an order-0 page when high-order page
> allocation is disabled. Yet I still see
> 
>        alloc_frag->offset += size;
> 
> being accounted irrespective of the actual page size returned (dma->len). And
> virtnet_rq_unmap() seems to care only about high-order pages.
> 
> I suggest reverting this whole series, or at least having
> virtqueue_set_dma_premapped() block !use_dma_api users from using the
> virtio DMA APIs.
> 
> Regards,
> -Siwei

Want to post a patchset to revert?
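
To make the accounting concern in the quoted report concrete, here is a
minimal user-space sketch, not kernel code. It assumes a fragment allocator
that may hand back a single 4096-byte page instead of a larger chunk, and
shows the bounds check that has to hold before "offset += size" is applied.
struct model_frag, model_refill() and model_alloc() are hypothetical names
made up for this illustration; they do not mirror skb_page_frag_refill() or
virtnet_rq_alloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u

/* Hypothetical stand-in for a page fragment; not the kernel's struct page_frag. */
struct model_frag {
	size_t size;   /* bytes actually backing the fragment */
	size_t offset; /* next free byte inside the fragment */
};

/* Model of a refill: 8 pages normally, a single page when high order is disabled. */
static void model_refill(struct model_frag *f, bool high_order_disabled)
{
	f->size = high_order_disabled ? MODEL_PAGE_SIZE : 8 * MODEL_PAGE_SIZE;
	f->offset = 0;
}

/*
 * Carve len bytes out of the fragment, refilling when the request does not
 * fit. Returns the start offset, or -1 if len exceeds what a refill provides.
 */
static long model_alloc(struct model_frag *f, size_t len, bool high_order_disabled)
{
	long start;

	if (f->offset + len > f->size) {
		model_refill(f, high_order_disabled);
		if (len > f->size)
			return -1;
	}

	/* The advance below is only valid against the size actually returned. */
	assert(f->offset + len <= f->size);
	start = (long)f->offset;
	f->offset += len;
	return start;
}
```

In this model, with high-order allocation disabled, a third 1536-byte
request no longer fits in the 4096-byte fragment and triggers a refill
rather than advancing the offset past the end of the page.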

> 
> On 5/10/2024 8:14 PM, Xuan Zhuo wrote:
> > Now that we have the virtio DMA APIs, the driver can use premapped
> > mode whether or not the virtio core uses the DMA API.
> > 
> > So remove the use_dma_api check from
> > virtqueue_set_dma_premapped().
> > 
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > Acked-by: Jason Wang <jasowang@redhat.com>
> > ---
> >   drivers/virtio/virtio_ring.c | 7 +------
> >   1 file changed, 1 insertion(+), 6 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 6f7e5010a673..2a972752ff1b 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -2782,7 +2782,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> >    *
> >    * Returns zero or a negative error.
> >    * 0: success.
> > - * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> > + * -EINVAL: too late to enable premapped mode, the vq already contains buffers.
> >    */
> >   int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> >   {
> > @@ -2798,11 +2798,6 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> >   		return -EINVAL;
> >   	}
> > -	if (!vq->use_dma_api) {
> > -		END_USE(vq);
> > -		return -EINVAL;
> > -	}
> > -
> >   	vq->premapped = true;
> >   	vq->do_unmap = false;


Thread overview: 19+ messages
2024-05-11  3:14 [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default Xuan Zhuo
2024-08-14  6:59 ` [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default" Michael S. Tsirkin
2024-05-11  3:14 ` [PATCH net-next v5 1/4] virtio_ring: enable premapped mode whatever use_dma_api Xuan Zhuo
2024-08-13 19:28   ` Si-Wei Liu
2024-08-13 19:46     ` Michael S. Tsirkin [this message]
2024-08-14  3:39       ` Si-Wei Liu
2024-08-14  7:00         ` Michael S. Tsirkin
2024-08-17 13:20     ` Xuan Zhuo
2024-08-20  1:06       ` Si-Wei Liu
2024-08-20  6:19         ` Xuan Zhuo
2024-05-11  3:14 ` [PATCH net-next v5 2/4] virtio_net: big mode skip the unmap check Xuan Zhuo
2024-05-11  3:14 ` [PATCH net-next v5 3/4] virtio_net: rx remove premapped failover code Xuan Zhuo
2024-05-11  3:14 ` [PATCH net-next v5 4/4] virtio_net: remove the misleading comment Xuan Zhuo
2024-05-14  0:20 ` [PATCH net-next v5 0/4] virtio_net: rx enable premapped mode by default patchwork-bot+netdevbpf
2024-08-15  7:14 ` [PATCH RFC 0/3] Revert "virtio_net: rx enable premapped mode by default" Linux regression tracking (Thorsten Leemhuis)
2024-08-15 10:22   ` Darren Kenny
2024-08-16  5:03     ` Linux regression tracking (Thorsten Leemhuis)
2024-08-15 15:23   ` Michael S. Tsirkin
2024-08-15 15:28     ` Michael S. Tsirkin
