virtualization.lists.linux-foundation.org archive mirror
* mmap_assert_write_locked warnings during for vhost_vdpa_fault
@ 2024-06-17 15:50 Dragos Tatulea
  2024-06-18  1:17 ` Jason Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Dragos Tatulea @ 2024-06-17 15:50 UTC
  To: jasowang@redhat.com, mst@redhat.com, eperezma@redhat.com
  Cc: virtualization@lists.linux-foundation.org

Hi,

After commit ba168b52bf8e ("mm: use rwsem assertion macros for
mmap_lock") was submitted, we started getting a lot of the
following warnings about a missing mmap write lock during VM boot:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
track_pfn_remap+0x12b/0x130
Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle ip6table_nat
iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack xt_MASQUERADE
nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser
libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib
ib_uverbs ib_core fuse mlx5_core
CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G        W
6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:track_pfn_remap+0x12b/0x130
Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
FS:  00007f678d800700(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
Call Trace:
 <TASK>
 ? __warn+0x78/0x110
 ? track_pfn_remap+0x12b/0x130
 ? report_bug+0x16d/0x180
 ? handle_bug+0x3c/0x60
 ? exc_invalid_op+0x14/0x70
 ? asm_exc_invalid_op+0x16/0x20
 ? track_pfn_remap+0x12b/0x130
 remap_pfn_range+0x41/0xa0
 vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
 __do_fault+0x2f/0xb0
 __handle_mm_fault+0x13d3/0x2210
 handle_mm_fault+0xb0/0x260
 fixup_user_fault+0x77/0x170
 hva_to_pfn+0x2c5/0x4b0
 kvm_faultin_pfn+0xd7/0x510
 kvm_tdp_page_fault+0x111/0x190
 kvm_mmu_do_page_fault+0x105/0x230
 kvm_mmu_page_fault+0x7d/0x620
 ? vmx_deliver_interrupt+0x110/0x190
 ? __apic_accept_irq+0x16c/0x270
 ? vmx_vmexit+0x8d/0xc0
 vmx_handle_exit+0x110/0x640
 kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
 kvm_vcpu_ioctl+0x263/0x6a0
 ? futex_wake+0x81/0x180
 __x64_sys_ioctl+0x4a7/0x9d0
 ? __x64_sys_futex+0x73/0x1c0
 ? kvm_on_user_return+0x86/0x90
 do_syscall_64+0x4c/0x100
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f679186a17b
Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
 </TASK>
---[ end trace 0000000000000000 ]---

The warnings show up only when the vdpa page-per-vq option is used (doorbell
mapping to guest).

The issue seems to have existed before, but was visible only with CONFIG_LOCKDEP
enabled. I tried to find out whether it was introduced in a recent kernel, but
stopped after going as far back as 6.5, where the issue was still visible.

The warning is triggered for the following call chain:
vhost_vdpa_fault()
 -> remap_pfn_range()
  -> remap_pfn_range_notrack()
   -> vm_flags_set()
    -> vma_start_write()
     -> __is_vma_write_locked()
      -> mmap_assert_write_locked()
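
The mismatch in this chain can be modelled in user space. The page-fault path
runs with mmap_lock held for read at most, while vm_flags_set() asserts that it
is held for write. The sketch below is not kernel code: every model_* name is
hypothetical, the lock is a plain enum, and the warning is just a counter. It
also models one possible way around the problem, stated here as an assumption
rather than as the actual fix: set the VMA flags while the write lock is held
(as in an mmap() handler) and let the fault path only insert the PFN, in the
style of vmf_insert_pfn().

```c
/* User-space model only: none of these types or helpers are kernel APIs. */
#include <stdbool.h>
#include <stdio.h>

enum model_lock_state { MODEL_UNLOCKED, MODEL_READ_LOCKED, MODEL_WRITE_LOCKED };

struct model_mm {
	enum model_lock_state mmap_lock;
	unsigned int warnings;	/* counts would-be WARN splats */
};

struct model_vma {
	struct model_mm *mm;
	unsigned long flags;
	unsigned long pfn;	/* 0 means "nothing mapped yet" */
};

#define MODEL_VM_PFNMAP 0x1UL

/* Stand-in for mmap_assert_write_locked(): count a warning instead of WARN(). */
static void model_assert_write_locked(struct model_mm *mm)
{
	if (mm->mmap_lock != MODEL_WRITE_LOCKED) {
		mm->warnings++;
		fprintf(stderr, "model: WARNING: mmap_lock not held for write\n");
	}
}

/* Stand-in for vm_flags_set(): changing VMA flags requires the write lock. */
static void model_vm_flags_set(struct model_vma *vma, unsigned long flags)
{
	model_assert_write_locked(vma->mm);
	vma->flags |= flags;
}

/* Stand-in for remap_pfn_range(): sets flags, then installs the mapping. */
static void model_remap_pfn_range(struct model_vma *vma, unsigned long pfn)
{
	model_vm_flags_set(vma, MODEL_VM_PFNMAP);
	vma->pfn = pfn;
}

/* Stand-in for a vmf_insert_pfn()-style helper: flags must already be set. */
static bool model_insert_pfn(struct model_vma *vma, unsigned long pfn)
{
	if (!(vma->flags & MODEL_VM_PFNMAP))
		return false;
	vma->pfn = pfn;
	return true;
}

/* Pattern that warns: remap from the fault path, which holds only a read lock. */
static unsigned int model_fault_with_remap(unsigned long pfn)
{
	struct model_mm mm = { MODEL_READ_LOCKED, 0 };
	struct model_vma vma = { &mm, 0, 0 };

	model_remap_pfn_range(&vma, pfn);
	return mm.warnings;
}

/* Alternative: set flags under the write lock, insert the PFN at fault time. */
static unsigned int model_fault_with_insert(unsigned long pfn)
{
	struct model_mm mm = { MODEL_WRITE_LOCKED, 0 };
	struct model_vma vma = { &mm, 0, 0 };

	model_vm_flags_set(&vma, MODEL_VM_PFNMAP);	/* mmap(): write lock held */
	mm.mmap_lock = MODEL_READ_LOCKED;		/* later, in the fault path */
	if (!model_insert_pfn(&vma, pfn))
		mm.warnings++;
	return mm.warnings;
}
```

In this model, model_fault_with_remap() reports one warning because the flag
update happens under the read lock, while model_fault_with_insert() reports
none because the only flag update happens while the write lock is held.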


I've been trying to work out whether the mm write lock is dropped somewhere in
the above call chain or never taken at all, but I couldn't make much sense of it.
Any ideas of what could have gone wrong here?

Thanks,
Dragos

* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
  2024-06-17 15:50 mmap_assert_write_locked warnings during for vhost_vdpa_fault Dragos Tatulea
@ 2024-06-18  1:17 ` Jason Wang
  2024-06-18  2:03   ` Tian, Kevin
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2024-06-18  1:17 UTC
  To: Dragos Tatulea
  Cc: mst@redhat.com, eperezma@redhat.com,
	virtualization@lists.linux-foundation.org, Peter Xu

On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
> The warning is triggered for the following call chain:
> vhost_vdpa_fault()
>  -> remap_pfn_range()
>   -> remap_pfn_range_notrack()
>    -> vm_flags_set()
>     -> vma_start_write()
>      -> __is_vma_write_locked()
>       -> mmap_assert_write_locked()
>
>
> I've been trying to follow how the mm write lock is dropped in the above call
> chain or not taken at all. But I couldn't make much sense of it...

I also had a glance at vfio_pci_mmap_fault(); it seems to do something similar.

> Any ideas of what could have gone wrong here?

Adding Peter for more thought here.

Thanks

>
> Thanks,
> Dragos


* RE: mmap_assert_write_locked warnings during for vhost_vdpa_fault
  2024-06-18  1:17 ` Jason Wang
@ 2024-06-18  2:03   ` Tian, Kevin
  2024-06-18  2:39     ` Jason Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Tian, Kevin @ 2024-06-18  2:03 UTC
  To: Jason Wang, Dragos Tatulea
  Cc: mst@redhat.com, eperezma@redhat.com,
	virtualization@lists.linux-foundation.org, Peter Xu

> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, June 18, 2024 9:18 AM
> 
> On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> wrote:
> >
> 
> I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> similar.
> 
> > Any ideas of what could have gone wrong here?
> 
> Adding Peter for more thought here.
> 

vfio-side fix was just queued for rc4:

https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/

* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
  2024-06-18  2:03   ` Tian, Kevin
@ 2024-06-18  2:39     ` Jason Wang
  2024-06-19  9:14       ` Dragos Tatulea
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2024-06-18  2:39 UTC
  To: Tian, Kevin, Dragos Tatulea
  Cc: mst@redhat.com, eperezma@redhat.com,
	virtualization@lists.linux-foundation.org, Peter Xu

On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, June 18, 2024 9:18 AM
> >
> > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > wrote:
> > >
> >
> > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > similar.
> >
> > > Any ideas of what could have gone wrong here?
> >
> > Adding Peter for more thought here.
> >
>
> vfio-side fix was just queued for rc4:
>
> https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/

Great, thanks for the pointer.

Dragos, do you want to propose a similar fix for vDPA?

Thanks


* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
  2024-06-18  2:39     ` Jason Wang
@ 2024-06-19  9:14       ` Dragos Tatulea
  2024-06-19  9:51         ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Dragos Tatulea @ 2024-06-19  9:14 UTC
  To: kevin.tian@intel.com, jasowang@redhat.com
  Cc: virtualization@lists.linux-foundation.org, mst@redhat.com,
	eperezma@redhat.com, peterx@redhat.com

On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > 
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > 
> > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > wrote:
> > > > 
> > > 
> > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > similar.
> > > 
> > > > Any ideas of what could have gone wrong here?
> > > 
> > > Adding Peter for more thought here.
> > > 
> > 
> > vfio-side fix was just queued for rc4:
> > 
> > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> 
> Great, thanks for the pointer.
> 
Yes, thanks!

> Dragos, do you want to propose a similar fix for vDPA?
> 
I had a first look: the fixes look a bit daunting. I will try to "port" them,
though I'm not promising anything.

Thanks,
Dragos

* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
  2024-06-19  9:14       ` Dragos Tatulea
@ 2024-06-19  9:51         ` Michael S. Tsirkin
  2024-06-20  4:07           ` Jason Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2024-06-19  9:51 UTC
  To: Dragos Tatulea
  Cc: kevin.tian@intel.com, jasowang@redhat.com,
	virtualization@lists.linux-foundation.org, eperezma@redhat.com,
	peterx@redhat.com

On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > 
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > 
> > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > wrote:
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > After commit ba168b52bf8e "mm: use rwsem assertion macros for
> > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > 
> > > > > ------------[ cut here ]------------
> > > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > > track_pfn_remap+0x12b/0x130
> > > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle
> > > > ip6table_nat
> > > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack
> > > > xt_MASQUERADE
> > > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm
> > > > ib_iser
> > > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm
> > > > mlx5_ib
> > > > > ib_uverbs ib_core fuse mlx5_core
> > > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G        W
> > > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > > FS:  00007f678d800700(0000) GS:ffff88852c880000(0000)
> > > > knlGS:0000000000000000
> > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > > Call Trace:
> > > > >  <TASK>
> > > > >  ? __warn+0x78/0x110
> > > > >  ? track_pfn_remap+0x12b/0x130
> > > > >  ? report_bug+0x16d/0x180
> > > > >  ? handle_bug+0x3c/0x60
> > > > >  ? exc_invalid_op+0x14/0x70
> > > > >  ? asm_exc_invalid_op+0x16/0x20
> > > > >  ? track_pfn_remap+0x12b/0x130
> > > > >  remap_pfn_range+0x41/0xa0
> > > > >  vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > >  __do_fault+0x2f/0xb0
> > > > >  __handle_mm_fault+0x13d3/0x2210
> > > > >  handle_mm_fault+0xb0/0x260
> > > > >  fixup_user_fault+0x77/0x170
> > > > >  hva_to_pfn+0x2c5/0x4b0
> > > > >  kvm_faultin_pfn+0xd7/0x510
> > > > >  kvm_tdp_page_fault+0x111/0x190
> > > > >  kvm_mmu_do_page_fault+0x105/0x230
> > > > >  kvm_mmu_page_fault+0x7d/0x620
> > > > >  ? vmx_deliver_interrupt+0x110/0x190
> > > > >  ? __apic_accept_irq+0x16c/0x270
> > > > >  ? vmx_vmexit+0x8d/0xc0
> > > > >  vmx_handle_exit+0x110/0x640
> > > > >  kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > >  kvm_vcpu_ioctl+0x263/0x6a0
> > > > >  ? futex_wake+0x81/0x180
> > > > >  __x64_sys_ioctl+0x4a7/0x9d0
> > > > >  ? __x64_sys_futex+0x73/0x1c0
> > > > >  ? kvm_on_user_return+0x86/0x90
> > > > >  do_syscall_64+0x4c/0x100
> > > > >  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > > RIP: 0033:0x7f679186a17b
> > > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX:
> > > > 0000000000000010
> > > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > >  </TASK>
> > > > > ---[ end trace 0000000000000000 ]---
> > > > > 
> > > > > The warnings show up only when the vdpa page-per-vq option is used
> > > > (doorbell
> > > > > mapping to guest).
> > > > > 
> > > > > The issue seems to have existed before, but was visible only with
> > > > CONFIG_LOCKDEP
> > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > 
> > > > > The warning is triggered for the following call chain:
> > > > > vhost_vdpa_fault()
> > > > >  -> remap_pfn_range()
> > > > >   -> remap_pfn_range_notrack()
> > > > >    -> vm_flags_set()
> > > > >     -> vma_start_write()
> > > > >      -> __is_vma_write_locked()
> > > > >       -> mmap_assert_write_locked()
> > > > > 
> > > > > 
> > > > > I've been trying to follow how the mm write lock is dropped in the above
> > > > call
> > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > 
> > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > similar.
> > > > 
> > > > > Any ideas of what could have gone wrong here?
> > > > 
> > > > Adding Peter for more thought here.
> > > > 
> > > 
> > > vfio-side fix was just queued for rc4:
> > > 
> > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > 
> > Great, thanks for the pointer.
> > 
> Yes, thanks!
> 
> > Dragos, do you want to propose a similar fix for vDPA?
> > 
> Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> promising anything though.
> 
> Thanks,
> Dragos

Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
seems a bit much to ask from a random reporter, this race
likely can bite anyone.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
  2024-06-19  9:51         ` Michael S. Tsirkin
@ 2024-06-20  4:07           ` Jason Wang
  2024-06-20  5:44             ` Michael S. Tsirkin
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2024-06-20  4:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Dragos Tatulea, kevin.tian@intel.com,
	virtualization@lists.linux-foundation.org, eperezma@redhat.com,
	peterx@redhat.com

[-- Attachment #1: Type: text/plain, Size: 7149 bytes --]

On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > >
> > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > >
> > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > >
> > > > > > ------------[ cut here ]------------
> > > > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > > > track_pfn_remap+0x12b/0x130
> > > > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle
> > > > > ip6table_nat
> > > > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack
> > > > > xt_MASQUERADE
> > > > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm
> > > > > ib_iser
> > > > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm
> > > > > mlx5_ib
> > > > > > ib_uverbs ib_core fuse mlx5_core
> > > > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G        W
> > > > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > > > FS:  00007f678d800700(0000) GS:ffff88852c880000(0000)
> > > > > knlGS:0000000000000000
> > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > > > Call Trace:
> > > > > >  <TASK>
> > > > > >  ? __warn+0x78/0x110
> > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > >  ? report_bug+0x16d/0x180
> > > > > >  ? handle_bug+0x3c/0x60
> > > > > >  ? exc_invalid_op+0x14/0x70
> > > > > >  ? asm_exc_invalid_op+0x16/0x20
> > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > >  remap_pfn_range+0x41/0xa0
> > > > > >  vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > > >  __do_fault+0x2f/0xb0
> > > > > >  __handle_mm_fault+0x13d3/0x2210
> > > > > >  handle_mm_fault+0xb0/0x260
> > > > > >  fixup_user_fault+0x77/0x170
> > > > > >  hva_to_pfn+0x2c5/0x4b0
> > > > > >  kvm_faultin_pfn+0xd7/0x510
> > > > > >  kvm_tdp_page_fault+0x111/0x190
> > > > > >  kvm_mmu_do_page_fault+0x105/0x230
> > > > > >  kvm_mmu_page_fault+0x7d/0x620
> > > > > >  ? vmx_deliver_interrupt+0x110/0x190
> > > > > >  ? __apic_accept_irq+0x16c/0x270
> > > > > >  ? vmx_vmexit+0x8d/0xc0
> > > > > >  vmx_handle_exit+0x110/0x640
> > > > > >  kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > > >  kvm_vcpu_ioctl+0x263/0x6a0
> > > > > >  ? futex_wake+0x81/0x180
> > > > > >  __x64_sys_ioctl+0x4a7/0x9d0
> > > > > >  ? __x64_sys_futex+0x73/0x1c0
> > > > > >  ? kvm_on_user_return+0x86/0x90
> > > > > >  do_syscall_64+0x4c/0x100
> > > > > >  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > > > RIP: 0033:0x7f679186a17b
> > > > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX:
> > > > > 0000000000000010
> > > > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > > >  </TASK>
> > > > > > ---[ end trace 0000000000000000 ]---
> > > > > >
> > > > > > The warnings show up only when the vdpa page-per-vq option is used
> > > > > (doorbell
> > > > > > mapping to guest).
> > > > > >
> > > > > > The issue seems to have existed before, but was visible only with
> > > > > CONFIG_LOCKDEP
> > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > >
> > > > > > The warning is triggered for the following call chain:
> > > > > > vhost_vdpa_fault()
> > > > > >  -> remap_pfn_range()
> > > > > >   -> remap_pfn_range_notrack()
> > > > > >    -> vm_flags_set()
> > > > > >     -> vma_start_write()
> > > > > >      -> __is_vma_write_locked()
> > > > > >       -> mmap_assert_write_locked()
> > > > > >
> > > > > >
> > > > > > I've been trying to follow how the mm write lock is dropped in the above
> > > > > call
> > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > >
> > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > similar.
> > > > >
> > > > > > Any ideas of what could have gone wrong here?
> > > > >
> > > > > Adding Peter for more thought here.
> > > > >
> > > >
> > > > vfio-side fix was just queued for rc4:
> > > >
> > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > >
> > > Great, thanks for the pointer.
> > >
> > Yes, thanks!
> >
> > > Dragos, do you want to propose a similar fix for vDPA?
> > >
> > > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > promising anything though.
> >
> > Thanks,
> > Dragos
>
> Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> seems a bit much to ask from a random reporter,

Probably, just asking since Dragos has done some investigation.

> this race
> likely can bite anyone.
>

Dragos, I've drafted a patch; please try it and see if it works (I have
tested it with LOCKDEP via vp_vdpa in L2).

Thanks

[-- Attachment #2: 0001-vhost-vdpa-switch-to-use-vmf_insert_pfn-in-the-fault.patch --]
[-- Type: application/octet-stream, Size: 1500 bytes --]

From a94a70372b702246436cb33ecbaa07d5c6127ce7 Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Wed, 19 Jun 2024 21:25:32 -0400
Subject: [PATCH] vhost-vdpa: switch to use vmf_insert_pfn() in the fault
 handler

remap_pfn_range() should not be called in the fault handler, as it may
change vma->vm_flags, which triggers a lockdep warning since the
mmap_lock is not held for write there. There is no need to modify
vma->vm_flags in the fault path, as they have already been set in
mmap(). So this patch switches to vmf_insert_pfn() instead.

Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
Fixes: ddd89d0a059d ("vhost_vdpa: support doorbell mapping via mmap")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vdpa.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 63a53680a85c..6b9c12acf438 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -1483,13 +1483,7 @@ static vm_fault_t vhost_vdpa_fault(struct vm_fault *vmf)
 
 	notify = ops->get_vq_notification(vdpa, index);
 
-	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-	if (remap_pfn_range(vma, vmf->address & PAGE_MASK,
-			    PFN_DOWN(notify.addr), PAGE_SIZE,
-			    vma->vm_page_prot))
-		return VM_FAULT_SIGBUS;
-
-	return VM_FAULT_NOPAGE;
+	return vmf_insert_pfn(vma, vmf->address & PAGE_MASK, PFN_DOWN(notify.addr));
 }
 
 static const struct vm_operations_struct vhost_vdpa_vm_ops = {
-- 
2.31.1
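
For illustration only (not part of the patch): a minimal userspace
sketch of the locking rule the commit message relies on. It assumes,
as the thread describes, that fault handlers run without the mmap write
lock, that remap_pfn_range() sets VMA flags, and that vmf_insert_pfn()
only installs the mapping and expects the flags to be preset by mmap().
All toy_* names are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for kernel concepts; none of these names are kernel APIs. */
enum toy_lock { TOY_UNLOCKED, TOY_READ_LOCKED, TOY_WRITE_LOCKED };

#define TOY_VM_PFNMAP 0x1UL

struct toy_vma {
	enum toy_lock mmap_lock;	/* lock state of the owning address space */
	unsigned long flags;		/* models vma->vm_flags */
	unsigned long pte;		/* models the one mapped page, 0 = none */
	int warnings;			/* failed "write locked" assertions */
};

/* Models vm_flags_set(): changing flags is only legal under the write lock. */
static void toy_flags_set(struct toy_vma *vma, unsigned long flags)
{
	if (vma->mmap_lock != TOY_WRITE_LOCKED)
		vma->warnings++;
	vma->flags |= flags;
}

/* Models mmap(): runs with the write lock held, so setting flags is fine. */
static void toy_mmap(struct toy_vma *vma)
{
	vma->mmap_lock = TOY_WRITE_LOCKED;
	toy_flags_set(vma, TOY_VM_PFNMAP);
	vma->mmap_lock = TOY_UNLOCKED;
}

/* Models remap_pfn_range(): sets the flags again, then installs the mapping. */
static bool toy_remap(struct toy_vma *vma, unsigned long pfn)
{
	toy_flags_set(vma, TOY_VM_PFNMAP);
	vma->pte = pfn;
	return true;
}

/* Models vmf_insert_pfn(): only installs the mapping; flags must be preset. */
static bool toy_insert(struct toy_vma *vma, unsigned long pfn)
{
	if (!(vma->flags & TOY_VM_PFNMAP))
		return false;
	vma->pte = pfn;
	return true;
}

/* Models a fault handler: entered with the lock held for read only. */
static bool toy_fault(struct toy_vma *vma, unsigned long pfn, bool use_remap)
{
	bool ok;

	vma->mmap_lock = TOY_READ_LOCKED;
	ok = use_remap ? toy_remap(vma, pfn) : toy_insert(vma, pfn);
	vma->mmap_lock = TOY_UNLOCKED;
	return ok;
}
```

In this model, calling toy_fault() with use_remap set bumps the warning
counter, while the toy_insert() path maps the page without touching the
flags, which mirrors the reasoning behind the one-line fix above.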


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
  2024-06-20  4:07           ` Jason Wang
@ 2024-06-20  5:44             ` Michael S. Tsirkin
  2024-06-20  8:23               ` Jason Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2024-06-20  5:44 UTC (permalink / raw)
  To: Jason Wang
  Cc: Dragos Tatulea, kevin.tian@intel.com,
	virtualization@lists.linux-foundation.org, eperezma@redhat.com,
	peterx@redhat.com

On Thu, Jun 20, 2024 at 12:07:14PM +0800, Jason Wang wrote:
> On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > > >
> > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > > >
> > > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > > >
> > > > > > > ------------[ cut here ]------------
> > > > > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > > > > track_pfn_remap+0x12b/0x130
> > > > > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle
> > > > > > ip6table_nat
> > > > > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack
> > > > > > xt_MASQUERADE
> > > > > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm
> > > > > > ib_iser
> > > > > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm
> > > > > > mlx5_ib
> > > > > > > ib_uverbs ib_core fuse mlx5_core
> > > > > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G        W
> > > > > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > > > > FS:  00007f678d800700(0000) GS:ffff88852c880000(0000)
> > > > > > knlGS:0000000000000000
> > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > > > > Call Trace:
> > > > > > >  <TASK>
> > > > > > >  ? __warn+0x78/0x110
> > > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > > >  ? report_bug+0x16d/0x180
> > > > > > >  ? handle_bug+0x3c/0x60
> > > > > > >  ? exc_invalid_op+0x14/0x70
> > > > > > >  ? asm_exc_invalid_op+0x16/0x20
> > > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > > >  remap_pfn_range+0x41/0xa0
> > > > > > >  vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > > > >  __do_fault+0x2f/0xb0
> > > > > > >  __handle_mm_fault+0x13d3/0x2210
> > > > > > >  handle_mm_fault+0xb0/0x260
> > > > > > >  fixup_user_fault+0x77/0x170
> > > > > > >  hva_to_pfn+0x2c5/0x4b0
> > > > > > >  kvm_faultin_pfn+0xd7/0x510
> > > > > > >  kvm_tdp_page_fault+0x111/0x190
> > > > > > >  kvm_mmu_do_page_fault+0x105/0x230
> > > > > > >  kvm_mmu_page_fault+0x7d/0x620
> > > > > > >  ? vmx_deliver_interrupt+0x110/0x190
> > > > > > >  ? __apic_accept_irq+0x16c/0x270
> > > > > > >  ? vmx_vmexit+0x8d/0xc0
> > > > > > >  vmx_handle_exit+0x110/0x640
> > > > > > >  kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > > > >  kvm_vcpu_ioctl+0x263/0x6a0
> > > > > > >  ? futex_wake+0x81/0x180
> > > > > > >  __x64_sys_ioctl+0x4a7/0x9d0
> > > > > > >  ? __x64_sys_futex+0x73/0x1c0
> > > > > > >  ? kvm_on_user_return+0x86/0x90
> > > > > > >  do_syscall_64+0x4c/0x100
> > > > > > >  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > > > > RIP: 0033:0x7f679186a17b
> > > > > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX:
> > > > > > 0000000000000010
> > > > > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > > > >  </TASK>
> > > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > >
> > > > > > > The warnings show up only when the vdpa page-per-vq option is used
> > > > > > (doorbell
> > > > > > > mapping to guest).
> > > > > > >
> > > > > > > The issue seems to have existed before, but was visible only with
> > > > > > CONFIG_LOCKDEP
> > > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > > >
> > > > > > > The warning is triggered for the following call chain:
> > > > > > > vhost_vdpa_fault()
> > > > > > >  -> remap_pfn_range()
> > > > > > >   -> remap_pfn_range_notrack()
> > > > > > >    -> vm_flags_set()
> > > > > > >     -> vma_start_write()
> > > > > > >      -> __is_vma_write_locked()
> > > > > > >       -> mmap_assert_write_locked()
> > > > > > >
> > > > > > >
> > > > > > > I've been trying to follow how the mm write lock is dropped in the above
> > > > > > call
> > > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > > >
> > > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > > similar.
> > > > > >
> > > > > > > Any ideas of what could have gone wrong here?
> > > > > >
> > > > > > Adding Peter for more thought here.
> > > > > >
> > > > >
> > > > > vfio-side fix was just queued for rc4:
> > > > >
> > > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > > >
> > > > Great, thanks for the pointer.
> > > >
> > > Yes, thanks!
> > >
> > > > Dragos, do you want to propose a similar fix for vDPA?
> > > >
> > > > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > > promising anything though.
> > >
> > > Thanks,
> > > Dragos
> >
> > Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> > seems a bit much to ask from a random reporter,
> 
> Probably, just asking since Dragos has done some investigation.
> 
> > this race
> > likely can bite anyone.
> >
> 
> Dragos, I've drafted a patch, please try to see if it works (I had
> tested it with LOCKDEP via vp_vdpa in L2).
> 
> Thanks

What is going on here that you decided to do an attachment as
opposed to inlining normally?

-- 
MST


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
  2024-06-20  5:44             ` Michael S. Tsirkin
@ 2024-06-20  8:23               ` Jason Wang
  2024-06-20  9:05                 ` Michael S. Tsirkin
  2024-07-03 16:23                 ` Michael S. Tsirkin
  0 siblings, 2 replies; 13+ messages in thread
From: Jason Wang @ 2024-06-20  8:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Dragos Tatulea, kevin.tian@intel.com,
	virtualization@lists.linux-foundation.org, eperezma@redhat.com,
	peterx@redhat.com

On Thu, Jun 20, 2024 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jun 20, 2024 at 12:07:14PM +0800, Jason Wang wrote:
> > On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > > > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > > > >
> > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > > > >
> > > > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > > > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > > > >
> > > > > > > > ------------[ cut here ]------------
> > > > > > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > > > > > track_pfn_remap+0x12b/0x130
> > > > > > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > > > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > > > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle
> > > > > > > ip6table_nat
> > > > > > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack
> > > > > > > xt_MASQUERADE
> > > > > > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > > > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm
> > > > > > > ib_iser
> > > > > > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm
> > > > > > > mlx5_ib
> > > > > > > > ib_uverbs ib_core fuse mlx5_core
> > > > > > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G        W
> > > > > > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > > > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > > > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > > > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > > > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > > > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > > > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > > > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > > > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > > > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > > > > > FS:  00007f678d800700(0000) GS:ffff88852c880000(0000)
> > > > > > > knlGS:0000000000000000
> > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > > > > > Call Trace:
> > > > > > > >  <TASK>
> > > > > > > >  ? __warn+0x78/0x110
> > > > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > > > >  ? report_bug+0x16d/0x180
> > > > > > > >  ? handle_bug+0x3c/0x60
> > > > > > > >  ? exc_invalid_op+0x14/0x70
> > > > > > > >  ? asm_exc_invalid_op+0x16/0x20
> > > > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > > > >  remap_pfn_range+0x41/0xa0
> > > > > > > >  vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > > > > >  __do_fault+0x2f/0xb0
> > > > > > > >  __handle_mm_fault+0x13d3/0x2210
> > > > > > > >  handle_mm_fault+0xb0/0x260
> > > > > > > >  fixup_user_fault+0x77/0x170
> > > > > > > >  hva_to_pfn+0x2c5/0x4b0
> > > > > > > >  kvm_faultin_pfn+0xd7/0x510
> > > > > > > >  kvm_tdp_page_fault+0x111/0x190
> > > > > > > >  kvm_mmu_do_page_fault+0x105/0x230
> > > > > > > >  kvm_mmu_page_fault+0x7d/0x620
> > > > > > > >  ? vmx_deliver_interrupt+0x110/0x190
> > > > > > > >  ? __apic_accept_irq+0x16c/0x270
> > > > > > > >  ? vmx_vmexit+0x8d/0xc0
> > > > > > > >  vmx_handle_exit+0x110/0x640
> > > > > > > >  kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > > > > >  kvm_vcpu_ioctl+0x263/0x6a0
> > > > > > > >  ? futex_wake+0x81/0x180
> > > > > > > >  __x64_sys_ioctl+0x4a7/0x9d0
> > > > > > > >  ? __x64_sys_futex+0x73/0x1c0
> > > > > > > >  ? kvm_on_user_return+0x86/0x90
> > > > > > > >  do_syscall_64+0x4c/0x100
> > > > > > > >  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > > > > > RIP: 0033:0x7f679186a17b
> > > > > > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > > > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > > > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > > > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX:
> > > > > > > 0000000000000010
> > > > > > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > > > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > > > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > > > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > > > > >  </TASK>
> > > > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > > >
> > > > > > > > The warnings show up only when the vdpa page-per-vq option is used
> > > > > > > (doorbell
> > > > > > > > mapping to guest).
> > > > > > > >
> > > > > > > > The issue seems to have existed before, but was visible only with
> > > > > > > CONFIG_LOCKDEP
> > > > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > > > >
> > > > > > > > The warning is triggered for the following call chain:
> > > > > > > > vhost_vdpa_fault()
> > > > > > > >  -> remap_pfn_range()
> > > > > > > >   -> remap_pfn_range_notrack()
> > > > > > > >    -> vm_flags_set()
> > > > > > > >     -> vma_start_write()
> > > > > > > >      -> __is_vma_write_locked()
> > > > > > > >       -> mmap_assert_write_locked()
> > > > > > > >
> > > > > > > >
> > > > > > > > I've been trying to follow how the mm write lock is dropped in the above
> > > > > > > call
> > > > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > > > >
> > > > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > > > similar.
> > > > > > >
> > > > > > > > Any ideas of what could have gone wrong here?
> > > > > > >
> > > > > > > Adding Peter for more thought here.
> > > > > > >
> > > > > >
> > > > > > vfio-side fix was just queued for rc4:
> > > > > >
> > > > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > > > >
> > > > > Great, thanks for the pointer.
> > > > >
> > > > Yes, thanks!
> > > >
> > > > > Dragos, do you want to propose a similar fix for vDPA?
> > > > >
> > > > > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > > > promising anything though.
> > > >
> > > > Thanks,
> > > > Dragos
> > >
> > > Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> > > seems a bit much to ask from a random reporter,
> >
> > Probably, just asking since Dragos has done some investigation.
> >
> > > this race
> > > likely can bite anyone.
> > >
> >
> > Dragos, I've drafted a patch, please try to see if it works (I had
> > tested it with LOCKDEP via vp_vdpa in L2).
> >
> > Thanks
>
> What is going on here that you decided to do an attachment as
> opposed to inlining normally?

Actually, I planned to send a formal patch separately but stopped at
the last second, since it has only been tested with L2 + vp_vdpa in L1.

If inlining really matters, I will do that next time.

Thanks

>
> --
> MST
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
  2024-06-20  8:23               ` Jason Wang
@ 2024-06-20  9:05                 ` Michael S. Tsirkin
  2024-06-26 10:54                   ` Dragos Tatulea
  2024-07-03 16:23                 ` Michael S. Tsirkin
  1 sibling, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2024-06-20  9:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: Dragos Tatulea, kevin.tian@intel.com,
	virtualization@lists.linux-foundation.org, eperezma@redhat.com,
	peterx@redhat.com

On Thu, Jun 20, 2024 at 04:23:30PM +0800, Jason Wang wrote:
> On Thu, Jun 20, 2024 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jun 20, 2024 at 12:07:14PM +0800, Jason Wang wrote:
> > > On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > > > > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > > > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > > > > >
> > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > > > > >
> > > > > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > > > > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > > > > >
> > > > > > > > > ------------[ cut here ]------------
> > > > > > > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > > > > > > track_pfn_remap+0x12b/0x130
> > > > > > > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > > > > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > > > > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle
> > > > > > > > ip6table_nat
> > > > > > > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack
> > > > > > > > xt_MASQUERADE
> > > > > > > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > > > > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm
> > > > > > > > ib_iser
> > > > > > > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm
> > > > > > > > mlx5_ib
> > > > > > > > > ib_uverbs ib_core fuse mlx5_core
> > > > > > > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G        W
> > > > > > > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > > > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > > > > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > > > > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > > > > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > > > > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > > > > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > > > > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > > > > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > > > > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > > > > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > > > > > > FS:  00007f678d800700(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
> > > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > > > > > > Call Trace:
> > > > > > > > >  <TASK>
> > > > > > > > >  ? __warn+0x78/0x110
> > > > > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > > > > >  ? report_bug+0x16d/0x180
> > > > > > > > >  ? handle_bug+0x3c/0x60
> > > > > > > > >  ? exc_invalid_op+0x14/0x70
> > > > > > > > >  ? asm_exc_invalid_op+0x16/0x20
> > > > > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > > > > >  remap_pfn_range+0x41/0xa0
> > > > > > > > >  vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > > > > > >  __do_fault+0x2f/0xb0
> > > > > > > > >  __handle_mm_fault+0x13d3/0x2210
> > > > > > > > >  handle_mm_fault+0xb0/0x260
> > > > > > > > >  fixup_user_fault+0x77/0x170
> > > > > > > > >  hva_to_pfn+0x2c5/0x4b0
> > > > > > > > >  kvm_faultin_pfn+0xd7/0x510
> > > > > > > > >  kvm_tdp_page_fault+0x111/0x190
> > > > > > > > >  kvm_mmu_do_page_fault+0x105/0x230
> > > > > > > > >  kvm_mmu_page_fault+0x7d/0x620
> > > > > > > > >  ? vmx_deliver_interrupt+0x110/0x190
> > > > > > > > >  ? __apic_accept_irq+0x16c/0x270
> > > > > > > > >  ? vmx_vmexit+0x8d/0xc0
> > > > > > > > >  vmx_handle_exit+0x110/0x640
> > > > > > > > >  kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > > > > > >  kvm_vcpu_ioctl+0x263/0x6a0
> > > > > > > > >  ? futex_wake+0x81/0x180
> > > > > > > > >  __x64_sys_ioctl+0x4a7/0x9d0
> > > > > > > > >  ? __x64_sys_futex+0x73/0x1c0
> > > > > > > > >  ? kvm_on_user_return+0x86/0x90
> > > > > > > > >  do_syscall_64+0x4c/0x100
> > > > > > > > >  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > > > > > > RIP: 0033:0x7f679186a17b
> > > > > > > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > > > > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > > > > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > > > > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > > > > > > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > > > > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > > > > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > > > > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > > > > > >  </TASK>
> > > > > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > > > >
> > > > > > > > > The warnings show up only when the vdpa page-per-vq option is used (doorbell
> > > > > > > > > mapping to guest).
> > > > > > > > >
> > > > > > > > > The issue seems to have existed before, but was visible only with CONFIG_LOCKDEP
> > > > > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > > > > >
> > > > > > > > > The warning is triggered for the following call chain:
> > > > > > > > > vhost_vdpa_fault()
> > > > > > > > >  -> remap_pfn_range()
> > > > > > > > >   -> remap_pfn_range_notrack()
> > > > > > > > >    -> vm_flags_set()
> > > > > > > > >     -> vma_start_write()
> > > > > > > > >      -> __is_vma_write_locked()
> > > > > > > > >       -> mmap_assert_write_locked()
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I've been trying to follow how the mm write lock is dropped in the above call
> > > > > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > > > > >
> > > > > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > > > > similar.
> > > > > > > >
> > > > > > > > > Any ideas of what could have gone wrong here?
> > > > > > > >
> > > > > > > > Adding Peter for more thought here.
> > > > > > > >
> > > > > > >
> > > > > > > vfio-side fix was just queued for rc4:
> > > > > > >
> > > > > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > > > > >
> > > > > > Great, thanks for the pointer.
> > > > > >
> > > > > Yes, thanks!
> > > > >
> > > > > > Dragos, do you want to propose a similar fix for vDPA?
> > > > > >
> > > > > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > > > > promising anything though.
> > > > >
> > > > > Thanks,
> > > > > Dragos
> > > >
> > > > Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> > > > seems a bit much to ask from a random reporter,
> > >
> > > Probably, just asking since Dragos has done some investigation.
> > >
> > > > this race
> > > > likely can bite anyone.
> > > >
> > >
> > > Dragos, I've drafted a patch, please try to see if it works (I had
> > > tested it with LOCKDEP via vp_vdpa in L2).
> > >
> > > Thanks
> >
> > What is going on here that you decided to do an attachment as
> > opposed to inlining normally?
> 
> Actually, I planned to send a formal patch separately but stopped at the
> last second since it has only been tested with L2 + vp_vdpa in L1.

Tag it as RFC and explain the testing status in the mail.

> If inline really matters, I will do that next time.


Yes, this way people can comment.

> Thanks
> 
> >
> > --
> > MST
> >



* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
  2024-06-20  9:05                 ` Michael S. Tsirkin
@ 2024-06-26 10:54                   ` Dragos Tatulea
  0 siblings, 0 replies; 13+ messages in thread
From: Dragos Tatulea @ 2024-06-26 10:54 UTC (permalink / raw)
  To: mst@redhat.com, jasowang@redhat.com
  Cc: kevin.tian@intel.com, virtualization@lists.linux-foundation.org,
	eperezma@redhat.com, peterx@redhat.com

On Thu, 2024-06-20 at 05:05 -0400, Michael S. Tsirkin wrote:
> On Thu, Jun 20, 2024 at 04:23:30PM +0800, Jason Wang wrote:
> > On Thu, Jun 20, 2024 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > 
> > > On Thu, Jun 20, 2024 at 12:07:14PM +0800, Jason Wang wrote:
> > > > On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > 
> > > > > On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > > > > > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > > > > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > > > > > > 
> > > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > > > > > > 
> > > > > > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > Hi,
> > > > > > > > > > 
> > > > > > > > > > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > > > > > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > > > > > > 
> > > > > > > > > > ------------[ cut here ]------------
> > > > > > > > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > > > > > > > track_pfn_remap+0x12b/0x130
> > > > > > > > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > > > > > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > > > > > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle ip6table_nat
> > > > > > > > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack xt_MASQUERADE
> > > > > > > > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > > > > > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser
> > > > > > > > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib
> > > > > > > > > > ib_uverbs ib_core fuse mlx5_core
> > > > > > > > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G        W
> > > > > > > > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > > > > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > > > > > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > > > > > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > > > > > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > > > > > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > > > > > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > > > > > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > > > > > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > > > > > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > > > > > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > > > > > > > FS:  00007f678d800700(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
> > > > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > > > > > > > Call Trace:
> > > > > > > > > >  <TASK>
> > > > > > > > > >  ? __warn+0x78/0x110
> > > > > > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > > > > > >  ? report_bug+0x16d/0x180
> > > > > > > > > >  ? handle_bug+0x3c/0x60
> > > > > > > > > >  ? exc_invalid_op+0x14/0x70
> > > > > > > > > >  ? asm_exc_invalid_op+0x16/0x20
> > > > > > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > > > > > >  remap_pfn_range+0x41/0xa0
> > > > > > > > > >  vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > > > > > > >  __do_fault+0x2f/0xb0
> > > > > > > > > >  __handle_mm_fault+0x13d3/0x2210
> > > > > > > > > >  handle_mm_fault+0xb0/0x260
> > > > > > > > > >  fixup_user_fault+0x77/0x170
> > > > > > > > > >  hva_to_pfn+0x2c5/0x4b0
> > > > > > > > > >  kvm_faultin_pfn+0xd7/0x510
> > > > > > > > > >  kvm_tdp_page_fault+0x111/0x190
> > > > > > > > > >  kvm_mmu_do_page_fault+0x105/0x230
> > > > > > > > > >  kvm_mmu_page_fault+0x7d/0x620
> > > > > > > > > >  ? vmx_deliver_interrupt+0x110/0x190
> > > > > > > > > >  ? __apic_accept_irq+0x16c/0x270
> > > > > > > > > >  ? vmx_vmexit+0x8d/0xc0
> > > > > > > > > >  vmx_handle_exit+0x110/0x640
> > > > > > > > > >  kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > > > > > > >  kvm_vcpu_ioctl+0x263/0x6a0
> > > > > > > > > >  ? futex_wake+0x81/0x180
> > > > > > > > > >  __x64_sys_ioctl+0x4a7/0x9d0
> > > > > > > > > >  ? __x64_sys_futex+0x73/0x1c0
> > > > > > > > > >  ? kvm_on_user_return+0x86/0x90
> > > > > > > > > >  do_syscall_64+0x4c/0x100
> > > > > > > > > >  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > > > > > > > RIP: 0033:0x7f679186a17b
> > > > > > > > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > > > > > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > > > > > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > > > > > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > > > > > > > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > > > > > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > > > > > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > > > > > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > > > > > > >  </TASK>
> > > > > > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > > > > > 
> > > > > > > > > > The warnings show up only when the vdpa page-per-vq option is used (doorbell
> > > > > > > > > > mapping to guest).
> > > > > > > > > >
> > > > > > > > > > The issue seems to have existed before, but was visible only with CONFIG_LOCKDEP
> > > > > > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > > > > > > 
> > > > > > > > > > The warning is triggered for the following call chain:
> > > > > > > > > > vhost_vdpa_fault()
> > > > > > > > > >  -> remap_pfn_range()
> > > > > > > > > >   -> remap_pfn_range_notrack()
> > > > > > > > > >    -> vm_flags_set()
> > > > > > > > > >     -> vma_start_write()
> > > > > > > > > >      -> __is_vma_write_locked()
> > > > > > > > > >       -> mmap_assert_write_locked()
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > I've been trying to follow how the mm write lock is dropped in the above call
> > > > > > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > > > > > > 
> > > > > > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > > > > > similar.
> > > > > > > > > 
> > > > > > > > > > Any ideas of what could have gone wrong here?
> > > > > > > > > 
> > > > > > > > > Adding Peter for more thought here.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > vfio-side fix was just queued for rc4:
> > > > > > > > 
> > > > > > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > > > > > > 
> > > > > > > Great, thanks for the pointer.
> > > > > > > 
> > > > > > Yes, thanks!
> > > > > > 
> > > > > > > Dragos, do you want to propose a similar fix for vDPA?
> > > > > > > 
> > > > > > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > > > > > promising anything though.
> > > > > > 
> > > > > > Thanks,
> > > > > > Dragos
> > > > > 
> > > > > Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> > > > > seems a bit much to ask from a random reporter,
> > > > 
> > > > Probably, just asking since Dragos has done some investigation.
> > > > 
> > > > > this race
> > > > > likely can bite anyone.
> > > > > 
> > > > 
> > > > Dragos, I've drafted a patch, please try to see if it works (I had
> > > > tested it with LOCKDEP via vp_vdpa in L2).
> > > > 
> > > > Thanks
> > > 
> > > What is going on here that you decided to do an attachment as
> > > opposed to inlining normally?
> > 
> > Actually, I planned to send a formal patch separately but stopped at the
> > last second since it has only been tested with L2 + vp_vdpa in L1.
> 
> Tag it as RFC and explain the testing status in the mail.
> 
> > If inline really matters, I will do that next time.
> 
> 
> Yes, this way people can comment.
> 
The fix works. Thanks Jason! FWIW:

Tested-by: Dragos Tatulea <dtatulea@nvidia.com>

Thanks,
Dragos
> > Thanks
> > 
> > > 
> > > --
> > > MST
> > > 
> 



* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
  2024-06-20  8:23               ` Jason Wang
  2024-06-20  9:05                 ` Michael S. Tsirkin
@ 2024-07-03 16:23                 ` Michael S. Tsirkin
  2024-07-04  0:10                   ` Jason Wang
  1 sibling, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2024-07-03 16:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: Dragos Tatulea, kevin.tian@intel.com,
	virtualization@lists.linux-foundation.org, eperezma@redhat.com,
	peterx@redhat.com

On Thu, Jun 20, 2024 at 04:23:30PM +0800, Jason Wang wrote:
> On Thu, Jun 20, 2024 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jun 20, 2024 at 12:07:14PM +0800, Jason Wang wrote:
> > > On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > > > > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > > > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > > > > >
> > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > > > > >
> > > > > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > > > > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > > > > >
> > > > > > > > > ------------[ cut here ]------------
> > > > > > > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > > > > > > track_pfn_remap+0x12b/0x130
> > > > > > > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > > > > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > > > > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle ip6table_nat
> > > > > > > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack xt_MASQUERADE
> > > > > > > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > > > > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser
> > > > > > > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib
> > > > > > > > > ib_uverbs ib_core fuse mlx5_core
> > > > > > > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G        W
> > > > > > > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > > > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > > > > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > > > > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > > > > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > > > > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > > > > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > > > > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > > > > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > > > > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > > > > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > > > > > > FS:  00007f678d800700(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
> > > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > > > > > > Call Trace:
> > > > > > > > >  <TASK>
> > > > > > > > >  ? __warn+0x78/0x110
> > > > > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > > > > >  ? report_bug+0x16d/0x180
> > > > > > > > >  ? handle_bug+0x3c/0x60
> > > > > > > > >  ? exc_invalid_op+0x14/0x70
> > > > > > > > >  ? asm_exc_invalid_op+0x16/0x20
> > > > > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > > > > >  remap_pfn_range+0x41/0xa0
> > > > > > > > >  vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > > > > > >  __do_fault+0x2f/0xb0
> > > > > > > > >  __handle_mm_fault+0x13d3/0x2210
> > > > > > > > >  handle_mm_fault+0xb0/0x260
> > > > > > > > >  fixup_user_fault+0x77/0x170
> > > > > > > > >  hva_to_pfn+0x2c5/0x4b0
> > > > > > > > >  kvm_faultin_pfn+0xd7/0x510
> > > > > > > > >  kvm_tdp_page_fault+0x111/0x190
> > > > > > > > >  kvm_mmu_do_page_fault+0x105/0x230
> > > > > > > > >  kvm_mmu_page_fault+0x7d/0x620
> > > > > > > > >  ? vmx_deliver_interrupt+0x110/0x190
> > > > > > > > >  ? __apic_accept_irq+0x16c/0x270
> > > > > > > > >  ? vmx_vmexit+0x8d/0xc0
> > > > > > > > >  vmx_handle_exit+0x110/0x640
> > > > > > > > >  kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > > > > > >  kvm_vcpu_ioctl+0x263/0x6a0
> > > > > > > > >  ? futex_wake+0x81/0x180
> > > > > > > > >  __x64_sys_ioctl+0x4a7/0x9d0
> > > > > > > > >  ? __x64_sys_futex+0x73/0x1c0
> > > > > > > > >  ? kvm_on_user_return+0x86/0x90
> > > > > > > > >  do_syscall_64+0x4c/0x100
> > > > > > > > >  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > > > > > > RIP: 0033:0x7f679186a17b
> > > > > > > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > > > > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > > > > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > > > > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > > > > > > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > > > > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > > > > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > > > > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > > > > > >  </TASK>
> > > > > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > > > >
> > > > > > > > > The warnings show up only when the vdpa page-per-vq option is used (doorbell
> > > > > > > > > mapping to guest).
> > > > > > > > >
> > > > > > > > > The issue seems to have existed before, but was visible only with CONFIG_LOCKDEP
> > > > > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > > > > >
> > > > > > > > > The warning is triggered for the following call chain:
> > > > > > > > > vhost_vdpa_fault()
> > > > > > > > >  -> remap_pfn_range()
> > > > > > > > >   -> remap_pfn_range_notrack()
> > > > > > > > >    -> vm_flags_set()
> > > > > > > > >     -> vma_start_write()
> > > > > > > > >      -> __is_vma_write_locked()
> > > > > > > > >       -> mmap_assert_write_locked()
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I've been trying to follow how the mm write lock is dropped in the above call
> > > > > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > > > > >
> > > > > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > > > > similar.
> > > > > > > >
> > > > > > > > > Any ideas of what could have gone wrong here?
> > > > > > > >
> > > > > > > > Adding Peter for more thought here.
> > > > > > > >
> > > > > > >
> > > > > > > vfio-side fix was just queued for rc4:
> > > > > > >
> > > > > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > > > > >
> > > > > > Great, thanks for the pointer.
> > > > > >
> > > > > Yes, thanks!
> > > > >
> > > > > > Dragos, do you want to propose a similar fix for vDPA?
> > > > > >
> > > > > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > > > > promising anything though.
> > > > >
> > > > > Thanks,
> > > > > Dragos
> > > >
> > > > Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> > > > seems a bit much to ask from a random reporter,
> > >
> > > Probably, just asking since Dragos has done some investigation.
> > >
> > > > this race
> > > > likely can bite anyone.
> > > >
> > >
> > > Dragos, I've drafted a patch, please try to see if it works (I had
> > > tested it with LOCKDEP via vp_vdpa in L2).
> > >
> > > Thanks
> >
> > What is going on here that you decided to do an attachment as
> > opposed to inlining normally?
> 
> Actually, I planned to send a formal patch separately but stopped at the
> last second since it has only been tested with L2 + vp_vdpa in L1.
> 
> If inline really matters, I will do that next time.
> 
> Thanks

Jason, are you going to submit a patch now that it's been tested?

> >
> > --
> > MST
> >



* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
  2024-07-03 16:23                 ` Michael S. Tsirkin
@ 2024-07-04  0:10                   ` Jason Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2024-07-04  0:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Dragos Tatulea, kevin.tian@intel.com,
	virtualization@lists.linux-foundation.org, eperezma@redhat.com,
	peterx@redhat.com

On Thu, Jul 4, 2024 at 12:23 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jun 20, 2024 at 04:23:30PM +0800, Jason Wang wrote:
> > On Thu, Jun 20, 2024 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Jun 20, 2024 at 12:07:14PM +0800, Jason Wang wrote:
> > > > On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > > > > > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > > > > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > > > > > >
> > > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > > > > > >
> > > > > > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > > > > > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > > > > > >
> > > > > > > > > > ------------[ cut here ]------------
> > > > > > > > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > > > > > > > track_pfn_remap+0x12b/0x130
> > > > > > > > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > > > > > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > > > > > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle ip6table_nat
> > > > > > > > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack xt_MASQUERADE
> > > > > > > > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > > > > > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser
> > > > > > > > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib
> > > > > > > > > > ib_uverbs ib_core fuse mlx5_core
> > > > > > > > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G        W
> > > > > > > > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > > > > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > > > > > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > > > > > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > > > > > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > > > > > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > > > > > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > > > > > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > > > > > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > > > > > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > > > > > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > > > > > > > FS:  00007f678d800700(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
> > > > > > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > > > > > > > Call Trace:
> > > > > > > > > >  <TASK>
> > > > > > > > > >  ? __warn+0x78/0x110
> > > > > > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > > > > > >  ? report_bug+0x16d/0x180
> > > > > > > > > >  ? handle_bug+0x3c/0x60
> > > > > > > > > >  ? exc_invalid_op+0x14/0x70
> > > > > > > > > >  ? asm_exc_invalid_op+0x16/0x20
> > > > > > > > > >  ? track_pfn_remap+0x12b/0x130
> > > > > > > > > >  remap_pfn_range+0x41/0xa0
> > > > > > > > > >  vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > > > > > > >  __do_fault+0x2f/0xb0
> > > > > > > > > >  __handle_mm_fault+0x13d3/0x2210
> > > > > > > > > >  handle_mm_fault+0xb0/0x260
> > > > > > > > > >  fixup_user_fault+0x77/0x170
> > > > > > > > > >  hva_to_pfn+0x2c5/0x4b0
> > > > > > > > > >  kvm_faultin_pfn+0xd7/0x510
> > > > > > > > > >  kvm_tdp_page_fault+0x111/0x190
> > > > > > > > > >  kvm_mmu_do_page_fault+0x105/0x230
> > > > > > > > > >  kvm_mmu_page_fault+0x7d/0x620
> > > > > > > > > >  ? vmx_deliver_interrupt+0x110/0x190
> > > > > > > > > >  ? __apic_accept_irq+0x16c/0x270
> > > > > > > > > >  ? vmx_vmexit+0x8d/0xc0
> > > > > > > > > >  vmx_handle_exit+0x110/0x640
> > > > > > > > > >  kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > > > > > > >  kvm_vcpu_ioctl+0x263/0x6a0
> > > > > > > > > >  ? futex_wake+0x81/0x180
> > > > > > > > > >  __x64_sys_ioctl+0x4a7/0x9d0
> > > > > > > > > >  ? __x64_sys_futex+0x73/0x1c0
> > > > > > > > > >  ? kvm_on_user_return+0x86/0x90
> > > > > > > > > >  do_syscall_64+0x4c/0x100
> > > > > > > > > >  entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > > > > > > > RIP: 0033:0x7f679186a17b
> > > > > > > > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > > > > > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > > > > > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > > > > > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> > > > > > > > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > > > > > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > > > > > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > > > > > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > > > > > > >  </TASK>
> > > > > > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > > > > >
> > > > > > > > > > The warnings show up only when the vdpa page-per-vq option is used (doorbell
> > > > > > > > > > mapping to guest).
> > > > > > > > > >
> > > > > > > > > > The issue seems to have existed before, but was visible only with CONFIG_LOCKDEP
> > > > > > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > > > > > >
> > > > > > > > > > The warning is triggered for the following call chain:
> > > > > > > > > > vhost_vdpa_fault()
> > > > > > > > > >  -> remap_pfn_range()
> > > > > > > > > >   -> remap_pfn_range_notrack()
> > > > > > > > > >    -> vm_flags_set()
> > > > > > > > > >     -> vma_start_write()
> > > > > > > > > >      -> __is_vma_write_locked()
> > > > > > > > > >       -> mmap_assert_write_locked()
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > I've been trying to follow how the mm write lock is dropped in the above
> > > > > > > > > > > call chain or not taken at all. But I couldn't make much sense of it...
> > > > > > > > >
> > > > > > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > > > > > similar.
> > > > > > > > >
> > > > > > > > > > Any ideas of what could have gone wrong here?
> > > > > > > > >
> > > > > > > > > Adding Peter for more thought here.
> > > > > > > > >
> > > > > > > >
> > > > > > > > vfio-side fix was just queued for rc4:
> > > > > > > >
> > > > > > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > > > > > >
> > > > > > > Great, thanks for the pointer.
> > > > > > >
> > > > > > Yes, thanks!
> > > > > >
> > > > > > > Dragos, do you want to propose a similar fix for vDPA?
> > > > > > >
> > > > > > > Had a first look: the fixes look a bit daunting. I will try to "port" them,
> > > > > > > not promising anything though.
> > > > > >
> > > > > > Thanks,
> > > > > > Dragos
> > > > >
> > > > > Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> > > > > seems a bit much to ask from a random reporter,
> > > >
> > > > Probably, just asking since Dragos has done some investigation.
> > > >
> > > > > this race
> > > > > likely can bite anyone.
> > > > >
> > > >
> > > > Dragos, I've drafted a patch, please try to see if it works (I had
> > > > tested it with LOCKDEP via vp_vdpa in L2).
> > > >
> > > > Thanks
> > >
> > > What is going on here that you decided to do an attachment as
> > > opposed to inlining normally?
> >
> > Actually, I planned to send a formal patch separately but stopped at the
> > last second, since it has only been tested with L2 + vp_vdpa in L1.
> >
> > If inline really matters, I will do that next time.
> >
> > Thanks
>
> Jason are you going to submit a patch, now it's been tested?

I posted it yesterday:

https://patchwork.kernel.org/project/netdevbpf/patch/20240701033159.18133-1-jasowang@redhat.com/

Thanks

>
> > >
> > > --
> > > MST
> > >
>



end of thread, other threads:[~2024-07-04  0:11 UTC | newest]

Thread overview: 13+ messages
2024-06-17 15:50 mmap_assert_write_locked warnings during for vhost_vdpa_fault Dragos Tatulea
2024-06-18  1:17 ` Jason Wang
2024-06-18  2:03   ` Tian, Kevin
2024-06-18  2:39     ` Jason Wang
2024-06-19  9:14       ` Dragos Tatulea
2024-06-19  9:51         ` Michael S. Tsirkin
2024-06-20  4:07           ` Jason Wang
2024-06-20  5:44             ` Michael S. Tsirkin
2024-06-20  8:23               ` Jason Wang
2024-06-20  9:05                 ` Michael S. Tsirkin
2024-06-26 10:54                   ` Dragos Tatulea
2024-07-03 16:23                 ` Michael S. Tsirkin
2024-07-04  0:10                   ` Jason Wang
