* mmap_assert_write_locked warnings during for vhost_vdpa_fault
@ 2024-06-17 15:50 Dragos Tatulea
2024-06-18 1:17 ` Jason Wang
0 siblings, 1 reply; 13+ messages in thread
From: Dragos Tatulea @ 2024-06-17 15:50 UTC
To: jasowang@redhat.com, mst@redhat.com, eperezma@redhat.com
Cc: virtualization@lists.linux-foundation.org
Hi,
After commit ba168b52bf8e ("mm: use rwsem assertion macros for
mmap_lock") was submitted, we started getting a lot of the
following warnings about a missing mmap write lock during VM boot:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
track_pfn_remap+0x12b/0x130
Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle ip6table_nat
iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack xt_MASQUERADE
nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser
libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib
ib_uverbs ib_core fuse mlx5_core
CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G W
6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:track_pfn_remap+0x12b/0x130
Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
FS: 00007f678d800700(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
Call Trace:
<TASK>
? __warn+0x78/0x110
? track_pfn_remap+0x12b/0x130
? report_bug+0x16d/0x180
? handle_bug+0x3c/0x60
? exc_invalid_op+0x14/0x70
? asm_exc_invalid_op+0x16/0x20
? track_pfn_remap+0x12b/0x130
remap_pfn_range+0x41/0xa0
vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
__do_fault+0x2f/0xb0
__handle_mm_fault+0x13d3/0x2210
handle_mm_fault+0xb0/0x260
fixup_user_fault+0x77/0x170
hva_to_pfn+0x2c5/0x4b0
kvm_faultin_pfn+0xd7/0x510
kvm_tdp_page_fault+0x111/0x190
kvm_mmu_do_page_fault+0x105/0x230
kvm_mmu_page_fault+0x7d/0x620
? vmx_deliver_interrupt+0x110/0x190
? __apic_accept_irq+0x16c/0x270
? vmx_vmexit+0x8d/0xc0
vmx_handle_exit+0x110/0x640
kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
kvm_vcpu_ioctl+0x263/0x6a0
? futex_wake+0x81/0x180
__x64_sys_ioctl+0x4a7/0x9d0
? __x64_sys_futex+0x73/0x1c0
? kvm_on_user_return+0x86/0x90
do_syscall_64+0x4c/0x100
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f679186a17b
Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
</TASK>
---[ end trace 0000000000000000 ]---
The warnings show up only when the vdpa page-per-vq option is used (doorbell
mapping to guest).
The issue seems to have existed before, but was visible only with CONFIG_LOCKDEP
enabled. I tried to find out whether it was introduced in a more recent kernel,
but stopped after going back as far as 6.5: the issue was still visible there.
The warning is triggered for the following call chain:
vhost_vdpa_fault()
-> remap_pfn_range()
-> remap_pfn_range_notrack()
-> vm_flags_set()
-> vma_start_write()
-> __is_vma_write_locked()
-> mmap_assert_write_locked()
I've been trying to work out whether the mmap write lock is dropped somewhere in
the above call chain or is never taken at all, but I couldn't make much sense of
it...
Any ideas of what could have gone wrong here?
Thanks,
Dragos
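
As a reading aid for the call chain in the message above, here is a minimal
userspace sketch of the locking rule involved. It assumes that the page-fault
path runs with mmap_lock held for read only, that the mmap() path holds it for
write, and that changing VMA flags asserts the write lock. All toy_* names are
hypothetical stand-ins invented for illustration; this is a model, not kernel
code.

```c
#include <stdio.h>

/* Toy stand-ins for kernel objects; none of these are real kernel types. */
enum toy_lock_state { TOY_UNLOCKED, TOY_READ_LOCKED, TOY_WRITE_LOCKED };

struct toy_mm {
    enum toy_lock_state mmap_lock;
    unsigned int warnings;          /* counts failed write-lock assertions */
};

struct toy_vma {
    struct toy_mm *mm;
    unsigned long flags;
};

#define TOY_VM_PFNMAP 0x1UL

/* Models mmap_assert_write_locked(): complain unless held for write. */
static void toy_assert_write_locked(struct toy_mm *mm)
{
    if (mm->mmap_lock != TOY_WRITE_LOCKED) {
        mm->warnings++;
        fprintf(stderr, "WARNING: mmap_lock not held for write\n");
    }
}

/* Models vm_flags_set(): changing VMA flags requires the write lock. */
static void toy_vm_flags_set(struct toy_vma *vma, unsigned long flags)
{
    toy_assert_write_locked(vma->mm);
    vma->flags |= flags;
}

/* Models remap_pfn_range(): it updates VMA flags as a side effect. */
static void toy_remap_pfn_range(struct toy_vma *vma)
{
    toy_vm_flags_set(vma, TOY_VM_PFNMAP);
}

/* mmap() path: the caller holds the lock for write, so no warning. */
static void toy_mmap_path(struct toy_vma *vma)
{
    vma->mm->mmap_lock = TOY_WRITE_LOCKED;
    toy_remap_pfn_range(vma);
    vma->mm->mmap_lock = TOY_UNLOCKED;
}

/* Fault path: the caller holds the lock for read only, so the assertion fires. */
static void toy_fault_path(struct toy_vma *vma)
{
    vma->mm->mmap_lock = TOY_READ_LOCKED;
    toy_remap_pfn_range(vma);
    vma->mm->mmap_lock = TOY_UNLOCKED;
}
```

In this model, calling toy_mmap_path() leaves the warning counter untouched,
while calling toy_fault_path() increments it. On this reading the write lock
is not dropped along the way: the fault path only ever holds the lock for read.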
* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
2024-06-17 15:50 mmap_assert_write_locked warnings during for vhost_vdpa_fault Dragos Tatulea
@ 2024-06-18 1:17 ` Jason Wang
2024-06-18 2:03 ` Tian, Kevin
0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2024-06-18 1:17 UTC
To: Dragos Tatulea
Cc: mst@redhat.com, eperezma@redhat.com,
virtualization@lists.linux-foundation.org, Peter Xu
On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com> wrote:
>
> Hi,
>
> After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> mmap_lock") was submitted, we started getting a lot of the
> following warnings about a missing mmap write lock during VM boot:
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> track_pfn_remap+0x12b/0x130
> Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle ip6table_nat
> iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack xt_MASQUERADE
> nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser
> libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib
> ib_uverbs ib_core fuse mlx5_core
> CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G W
> 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> RIP: 0010:track_pfn_remap+0x12b/0x130
> Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> FS: 00007f678d800700(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> Call Trace:
> <TASK>
> ? __warn+0x78/0x110
> ? track_pfn_remap+0x12b/0x130
> ? report_bug+0x16d/0x180
> ? handle_bug+0x3c/0x60
> ? exc_invalid_op+0x14/0x70
> ? asm_exc_invalid_op+0x16/0x20
> ? track_pfn_remap+0x12b/0x130
> remap_pfn_range+0x41/0xa0
> vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> __do_fault+0x2f/0xb0
> __handle_mm_fault+0x13d3/0x2210
> handle_mm_fault+0xb0/0x260
> fixup_user_fault+0x77/0x170
> hva_to_pfn+0x2c5/0x4b0
> kvm_faultin_pfn+0xd7/0x510
> kvm_tdp_page_fault+0x111/0x190
> kvm_mmu_do_page_fault+0x105/0x230
> kvm_mmu_page_fault+0x7d/0x620
> ? vmx_deliver_interrupt+0x110/0x190
> ? __apic_accept_irq+0x16c/0x270
> ? vmx_vmexit+0x8d/0xc0
> vmx_handle_exit+0x110/0x640
> kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> kvm_vcpu_ioctl+0x263/0x6a0
> ? futex_wake+0x81/0x180
> __x64_sys_ioctl+0x4a7/0x9d0
> ? __x64_sys_futex+0x73/0x1c0
> ? kvm_on_user_return+0x86/0x90
> do_syscall_64+0x4c/0x100
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> RIP: 0033:0x7f679186a17b
> Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> </TASK>
> ---[ end trace 0000000000000000 ]---
>
> The warnings show up only when the vdpa page-per-vq option is used (doorbell
> mapping to guest).
>
> The issue seems to have existed before, but was visible only with CONFIG_LOCKDEP
> enabled. I tried finding if this was introduced in more recent kernels, but
> stopped after going as far back as 6.5: the issue was still visible there.
>
> The warning is triggered for the following call chain:
> vhost_vdpa_fault()
> -> remap_pfn_range()
> -> remap_pfn_range_notrack()
> -> vm_flags_set()
> -> vma_start_write()
> -> __is_vma_write_locked()
> -> mmap_assert_write_locked()
>
>
> I've been trying to follow how the mm write lock is dropped in the above call
> chain or not taken at all. But I couldn't make much sense of it...
I also had a glance at vfio_pci_mmap_fault(); it seems to do something similar.
> Any ideas of what could have gone wrong here?
Adding Peter for more thoughts here.
Thanks
>
> Thanks,
> Dragos
* RE: mmap_assert_write_locked warnings during for vhost_vdpa_fault
2024-06-18 1:17 ` Jason Wang
@ 2024-06-18 2:03 ` Tian, Kevin
2024-06-18 2:39 ` Jason Wang
0 siblings, 1 reply; 13+ messages in thread
From: Tian, Kevin @ 2024-06-18 2:03 UTC
To: Jason Wang, Dragos Tatulea
Cc: mst@redhat.com, eperezma@redhat.com,
virtualization@lists.linux-foundation.org, Peter Xu
> From: Jason Wang <jasowang@redhat.com>
> Sent: Tuesday, June 18, 2024 9:18 AM
>
> On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> wrote:
> >
> > Hi,
> >
> > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > mmap_lock") was submitted, we started getting a lot of the
> > following warnings about a missing mmap write lock during VM boot:
> >
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > track_pfn_remap+0x12b/0x130
> > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle
> ip6table_nat
> > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack
> xt_MASQUERADE
> > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm
> ib_iser
> > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm
> mlx5_ib
> > ib_uverbs ib_core fuse mlx5_core
> > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G W
> > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > RIP: 0010:track_pfn_remap+0x12b/0x130
> > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > FS: 00007f678d800700(0000) GS:ffff88852c880000(0000)
> knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > Call Trace:
> > <TASK>
> > ? __warn+0x78/0x110
> > ? track_pfn_remap+0x12b/0x130
> > ? report_bug+0x16d/0x180
> > ? handle_bug+0x3c/0x60
> > ? exc_invalid_op+0x14/0x70
> > ? asm_exc_invalid_op+0x16/0x20
> > ? track_pfn_remap+0x12b/0x130
> > remap_pfn_range+0x41/0xa0
> > vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > __do_fault+0x2f/0xb0
> > __handle_mm_fault+0x13d3/0x2210
> > handle_mm_fault+0xb0/0x260
> > fixup_user_fault+0x77/0x170
> > hva_to_pfn+0x2c5/0x4b0
> > kvm_faultin_pfn+0xd7/0x510
> > kvm_tdp_page_fault+0x111/0x190
> > kvm_mmu_do_page_fault+0x105/0x230
> > kvm_mmu_page_fault+0x7d/0x620
> > ? vmx_deliver_interrupt+0x110/0x190
> > ? __apic_accept_irq+0x16c/0x270
> > ? vmx_vmexit+0x8d/0xc0
> > vmx_handle_exit+0x110/0x640
> > kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > kvm_vcpu_ioctl+0x263/0x6a0
> > ? futex_wake+0x81/0x180
> > __x64_sys_ioctl+0x4a7/0x9d0
> > ? __x64_sys_futex+0x73/0x1c0
> > ? kvm_on_user_return+0x86/0x90
> > do_syscall_64+0x4c/0x100
> > entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > RIP: 0033:0x7f679186a17b
> > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > </TASK>
> > ---[ end trace 0000000000000000 ]---
> >
> > The warnings show up only when the vdpa page-per-vq option is used
> (doorbell
> > mapping to guest).
> >
> > The issue seems to have existed before, but was visible only with
> CONFIG_LOCKDEP
> > enabled. I tried finding if this was introduced in more recent kernels, but
> > stopped after going as far back as 6.5: the issue was still visible there.
> >
> > The warning is triggered for the following call chain:
> > vhost_vdpa_fault()
> > -> remap_pfn_range()
> > -> remap_pfn_range_notrack()
> > -> vm_flags_set()
> > -> vma_start_write()
> > -> __is_vma_write_locked()
> > -> mmap_assert_write_locked()
> >
> >
> > I've been trying to follow how the mm write lock is dropped in the above
> call
> > chain or not taken at all. But I couldn't make much sense of it...
>
> I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> similar.
>
> > Any ideas of what could have gone wrong here?
>
> Adding Peter for more thought here.
>
vfio-side fix was just queued for rc4:
https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
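
For readers of the archive, here is a toy userspace sketch of one general
shape a fix of this kind can take. It assumes that all VMA flag changes are
moved to mmap() time, where the write lock is held, and that the fault handler
only installs a mapping through a helper in the style of vmf_insert_pfn(),
which expects the flags to be set already. It is not the code from the series
linked above, and every toy_* name is hypothetical.

```c
#include <stdio.h>

/* Toy stand-ins for kernel objects; none of these are real kernel types. */
enum toy_lock_state { TOY_UNLOCKED, TOY_READ_LOCKED, TOY_WRITE_LOCKED };

struct toy_mm {
    enum toy_lock_state mmap_lock;
    unsigned int warnings;          /* counts failed write-lock assertions */
};

struct toy_vma {
    struct toy_mm *mm;
    unsigned long flags;
    unsigned long pfn;              /* PFN recorded by the last insert */
    int mapped;                     /* 1 once a mapping has been installed */
};

#define TOY_VM_PFNMAP 0x1UL

/* Models mmap_assert_write_locked(): complain unless held for write. */
static void toy_assert_write_locked(struct toy_mm *mm)
{
    if (mm->mmap_lock != TOY_WRITE_LOCKED) {
        mm->warnings++;
        fprintf(stderr, "WARNING: mmap_lock not held for write\n");
    }
}

/* mmap() time: the write lock is held, so flag changes are legal here. */
static void toy_mmap(struct toy_vma *vma)
{
    vma->mm->mmap_lock = TOY_WRITE_LOCKED;
    toy_assert_write_locked(vma->mm);
    vma->flags |= TOY_VM_PFNMAP;
    vma->mm->mmap_lock = TOY_UNLOCKED;
}

/*
 * Models a vmf_insert_pfn()-style helper: it never touches the flags and
 * only installs a mapping. Simplification: it returns -1 when the flags
 * were not prepared at mmap() time; the real helper reacts differently.
 */
static int toy_insert_pfn(struct toy_vma *vma, unsigned long pfn)
{
    if (!(vma->flags & TOY_VM_PFNMAP))
        return -1;
    vma->pfn = pfn;
    vma->mapped = 1;
    return 0;
}

/* Fault path: read lock only, and no flag modification at all. */
static int toy_fault(struct toy_vma *vma, unsigned long pfn)
{
    int ret;

    vma->mm->mmap_lock = TOY_READ_LOCKED;
    ret = toy_insert_pfn(vma, pfn);
    vma->mm->mmap_lock = TOY_UNLOCKED;
    return ret;
}
```

With the flags prepared in toy_mmap(), toy_fault() installs the mapping while
holding the lock for read only and never reaches the write-lock assertion.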
* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
2024-06-18 2:03 ` Tian, Kevin
@ 2024-06-18 2:39 ` Jason Wang
2024-06-19 9:14 ` Dragos Tatulea
0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2024-06-18 2:39 UTC
To: Tian, Kevin, Dragos Tatulea
Cc: mst@redhat.com, eperezma@redhat.com,
virtualization@lists.linux-foundation.org, Peter Xu
On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
>
> > From: Jason Wang <jasowang@redhat.com>
> > Sent: Tuesday, June 18, 2024 9:18 AM
> >
> > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > > mmap_lock") was submitted, we started getting a lot of the
> > > following warnings about a missing mmap write lock during VM boot:
> > >
> > > ------------[ cut here ]------------
> > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > track_pfn_remap+0x12b/0x130
> > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle
> > ip6table_nat
> > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack
> > xt_MASQUERADE
> > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm
> > ib_iser
> > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm
> > mlx5_ib
> > > ib_uverbs ib_core fuse mlx5_core
> > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G W
> > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > FS: 00007f678d800700(0000) GS:ffff88852c880000(0000)
> > knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > Call Trace:
> > > <TASK>
> > > ? __warn+0x78/0x110
> > > ? track_pfn_remap+0x12b/0x130
> > > ? report_bug+0x16d/0x180
> > > ? handle_bug+0x3c/0x60
> > > ? exc_invalid_op+0x14/0x70
> > > ? asm_exc_invalid_op+0x16/0x20
> > > ? track_pfn_remap+0x12b/0x130
> > > remap_pfn_range+0x41/0xa0
> > > vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > __do_fault+0x2f/0xb0
> > > __handle_mm_fault+0x13d3/0x2210
> > > handle_mm_fault+0xb0/0x260
> > > fixup_user_fault+0x77/0x170
> > > hva_to_pfn+0x2c5/0x4b0
> > > kvm_faultin_pfn+0xd7/0x510
> > > kvm_tdp_page_fault+0x111/0x190
> > > kvm_mmu_do_page_fault+0x105/0x230
> > > kvm_mmu_page_fault+0x7d/0x620
> > > ? vmx_deliver_interrupt+0x110/0x190
> > > ? __apic_accept_irq+0x16c/0x270
> > > ? vmx_vmexit+0x8d/0xc0
> > > vmx_handle_exit+0x110/0x640
> > > kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > kvm_vcpu_ioctl+0x263/0x6a0
> > > ? futex_wake+0x81/0x180
> > > __x64_sys_ioctl+0x4a7/0x9d0
> > > ? __x64_sys_futex+0x73/0x1c0
> > > ? kvm_on_user_return+0x86/0x90
> > > do_syscall_64+0x4c/0x100
> > > entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > RIP: 0033:0x7f679186a17b
> > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX:
> > 0000000000000010
> > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > </TASK>
> > > ---[ end trace 0000000000000000 ]---
> > >
> > > The warnings show up only when the vdpa page-per-vq option is used
> > (doorbell
> > > mapping to guest).
> > >
> > > The issue seems to have existed before, but was visible only with
> > CONFIG_LOCKDEP
> > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > stopped after going as far back as 6.5: the issue was still visible there.
> > >
> > > The warning is triggered for the following call chain:
> > > vhost_vdpa_fault()
> > > -> remap_pfn_range()
> > > -> remap_pfn_range_notrack()
> > > -> vm_flags_set()
> > > -> vma_start_write()
> > > -> __is_vma_write_locked()
> > > -> mmap_assert_write_locked()
> > >
> > >
> > > I've been trying to follow how the mm write lock is dropped in the above
> > call
> > > chain or not taken at all. But I couldn't make much sense of it...
> >
> > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > similar.
> >
> > > Any ideas of what could have gone wrong here?
> >
> > Adding Peter for more thought here.
> >
>
> vfio-side fix was just queued for rc4:
>
> https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
Great, thanks for the pointer.
Dragos, do you want to propose a similar fix for vDPA?
Thanks
* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
2024-06-18 2:39 ` Jason Wang
@ 2024-06-19 9:14 ` Dragos Tatulea
2024-06-19 9:51 ` Michael S. Tsirkin
0 siblings, 1 reply; 13+ messages in thread
From: Dragos Tatulea @ 2024-06-19 9:14 UTC
To: kevin.tian@intel.com, jasowang@redhat.com
Cc: virtualization@lists.linux-foundation.org, mst@redhat.com,
eperezma@redhat.com, peterx@redhat.com
On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> >
> > > From: Jason Wang <jasowang@redhat.com>
> > > Sent: Tuesday, June 18, 2024 9:18 AM
> > >
> > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > > > mmap_lock") was submitted, we started getting a lot of the
> > > > following warnings about a missing mmap write lock during VM boot:
> > > >
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > track_pfn_remap+0x12b/0x130
> > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle
> > > ip6table_nat
> > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack
> > > xt_MASQUERADE
> > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm
> > > ib_iser
> > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm
> > > mlx5_ib
> > > > ib_uverbs ib_core fuse mlx5_core
> > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G W
> > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > FS: 00007f678d800700(0000) GS:ffff88852c880000(0000)
> > > knlGS:0000000000000000
> > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > Call Trace:
> > > > <TASK>
> > > > ? __warn+0x78/0x110
> > > > ? track_pfn_remap+0x12b/0x130
> > > > ? report_bug+0x16d/0x180
> > > > ? handle_bug+0x3c/0x60
> > > > ? exc_invalid_op+0x14/0x70
> > > > ? asm_exc_invalid_op+0x16/0x20
> > > > ? track_pfn_remap+0x12b/0x130
> > > > remap_pfn_range+0x41/0xa0
> > > > vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > __do_fault+0x2f/0xb0
> > > > __handle_mm_fault+0x13d3/0x2210
> > > > handle_mm_fault+0xb0/0x260
> > > > fixup_user_fault+0x77/0x170
> > > > hva_to_pfn+0x2c5/0x4b0
> > > > kvm_faultin_pfn+0xd7/0x510
> > > > kvm_tdp_page_fault+0x111/0x190
> > > > kvm_mmu_do_page_fault+0x105/0x230
> > > > kvm_mmu_page_fault+0x7d/0x620
> > > > ? vmx_deliver_interrupt+0x110/0x190
> > > > ? __apic_accept_irq+0x16c/0x270
> > > > ? vmx_vmexit+0x8d/0xc0
> > > > vmx_handle_exit+0x110/0x640
> > > > kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > kvm_vcpu_ioctl+0x263/0x6a0
> > > > ? futex_wake+0x81/0x180
> > > > __x64_sys_ioctl+0x4a7/0x9d0
> > > > ? __x64_sys_futex+0x73/0x1c0
> > > > ? kvm_on_user_return+0x86/0x90
> > > > do_syscall_64+0x4c/0x100
> > > > entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > RIP: 0033:0x7f679186a17b
> > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX:
> > > 0000000000000010
> > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > </TASK>
> > > > ---[ end trace 0000000000000000 ]---
> > > >
> > > > The warnings show up only when the vdpa page-per-vq option is used
> > > (doorbell
> > > > mapping to guest).
> > > >
> > > > The issue seems to have existed before, but was visible only with
> > > CONFIG_LOCKDEP
> > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > >
> > > > The warning is triggered for the following call chain:
> > > > vhost_vdpa_fault()
> > > > -> remap_pfn_range()
> > > > -> remap_pfn_range_notrack()
> > > > -> vm_flags_set()
> > > > -> vma_start_write()
> > > > -> __is_vma_write_locked()
> > > > -> mmap_assert_write_locked()
> > > >
> > > >
> > > > I've been trying to follow how the mm write lock is dropped in the above
> > > call
> > > > chain or not taken at all. But I couldn't make much sense of it...
> > >
> > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > similar.
> > >
> > > > Any ideas of what could have gone wrong here?
> > >
> > > Adding Peter for more thought here.
> > >
> >
> > vfio-side fix was just queued for rc4:
> >
> > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
>
> Great, thanks for the pointer.
>
Yes, thanks!
> Dragos, do you want to propose a similar fix for vDPA?
>
Had a first look: the fixes look a bit daunting. I will try to "port" them,
though I'm not promising anything.
Thanks,
Dragos
* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
2024-06-19 9:14 ` Dragos Tatulea
@ 2024-06-19 9:51 ` Michael S. Tsirkin
2024-06-20 4:07 ` Jason Wang
0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2024-06-19 9:51 UTC
To: Dragos Tatulea
Cc: kevin.tian@intel.com, jasowang@redhat.com,
virtualization@lists.linux-foundation.org, eperezma@redhat.com,
peterx@redhat.com
On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > >
> > > > From: Jason Wang <jasowang@redhat.com>
> > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > >
> > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > following warnings about a missing mmap write lock during VM boot:
> > > > >
> > > > > ------------[ cut here ]------------
> > > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > > track_pfn_remap+0x12b/0x130
> > > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle
> > > > ip6table_nat
> > > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack
> > > > xt_MASQUERADE
> > > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm
> > > > ib_iser
> > > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm
> > > > mlx5_ib
> > > > > ib_uverbs ib_core fuse mlx5_core
> > > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G W
> > > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > > FS: 00007f678d800700(0000) GS:ffff88852c880000(0000)
> > > > knlGS:0000000000000000
> > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > > Call Trace:
> > > > > <TASK>
> > > > > ? __warn+0x78/0x110
> > > > > ? track_pfn_remap+0x12b/0x130
> > > > > ? report_bug+0x16d/0x180
> > > > > ? handle_bug+0x3c/0x60
> > > > > ? exc_invalid_op+0x14/0x70
> > > > > ? asm_exc_invalid_op+0x16/0x20
> > > > > ? track_pfn_remap+0x12b/0x130
> > > > > remap_pfn_range+0x41/0xa0
> > > > > vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > > __do_fault+0x2f/0xb0
> > > > > __handle_mm_fault+0x13d3/0x2210
> > > > > handle_mm_fault+0xb0/0x260
> > > > > fixup_user_fault+0x77/0x170
> > > > > hva_to_pfn+0x2c5/0x4b0
> > > > > kvm_faultin_pfn+0xd7/0x510
> > > > > kvm_tdp_page_fault+0x111/0x190
> > > > > kvm_mmu_do_page_fault+0x105/0x230
> > > > > kvm_mmu_page_fault+0x7d/0x620
> > > > > ? vmx_deliver_interrupt+0x110/0x190
> > > > > ? __apic_accept_irq+0x16c/0x270
> > > > > ? vmx_vmexit+0x8d/0xc0
> > > > > vmx_handle_exit+0x110/0x640
> > > > > kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > > kvm_vcpu_ioctl+0x263/0x6a0
> > > > > ? futex_wake+0x81/0x180
> > > > > __x64_sys_ioctl+0x4a7/0x9d0
> > > > > ? __x64_sys_futex+0x73/0x1c0
> > > > > ? kvm_on_user_return+0x86/0x90
> > > > > do_syscall_64+0x4c/0x100
> > > > > entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > > RIP: 0033:0x7f679186a17b
> > > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX:
> > > > 0000000000000010
> > > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > > </TASK>
> > > > > ---[ end trace 0000000000000000 ]---
> > > > >
> > > > > The warnings show up only when the vdpa page-per-vq option is used
> > > > (doorbell
> > > > > mapping to guest).
> > > > >
> > > > > The issue seems to have existed before, but was visible only with
> > > > CONFIG_LOCKDEP
> > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > >
> > > > > The warning is triggered for the following call chain:
> > > > > vhost_vdpa_fault()
> > > > > -> remap_pfn_range()
> > > > > -> remap_pfn_range_notrack()
> > > > > -> vm_flags_set()
> > > > > -> vma_start_write()
> > > > > -> __is_vma_write_locked()
> > > > > -> mmap_assert_write_locked()
> > > > >
> > > > >
> > > > > I've been trying to follow how the mm write lock is dropped in the above
> > > > call
> > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > >
> > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > similar.
> > > >
> > > > > Any ideas of what could have gone wrong here?
> > > >
> > > > Adding Peter for more thought here.
> > > >
> > >
> > > vfio-side fix was just queued for rc4:
> > >
> > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> >
> > Great, thanks for the pointer.
> >
> Yes, thanks!
>
> > Dragos, do you want to propose a similar fix for vDPA?
> >
> Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> promising anything though.
>
> Thanks,
> Dragos
Yeah, Jason, you wrote this code in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
so it seems a bit much to ask from a random reporter. This race
can likely bite anyone.
* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
2024-06-19 9:51 ` Michael S. Tsirkin
@ 2024-06-20 4:07 ` Jason Wang
2024-06-20 5:44 ` Michael S. Tsirkin
0 siblings, 1 reply; 13+ messages in thread
From: Jason Wang @ 2024-06-20 4:07 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Dragos Tatulea, kevin.tian@intel.com,
virtualization@lists.linux-foundation.org, eperezma@redhat.com,
peterx@redhat.com
[-- Attachment #1: Type: text/plain, Size: 7149 bytes --]
On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > >
> > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > >
> > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > After commit ba168b52bf8e "mm: use rwsem assertion macros for
> > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > >
> > > > > > [...]
> > > > > >
> > > > > > The warnings show up only when the vdpa page-per-vq option is used
> > > > > (doorbell
> > > > > > mapping to guest).
> > > > > >
> > > > > > The issue seems to have existed before, but was visible only with
> > > > > CONFIG_LOCKDEP
> > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > >
> > > > > > The warning is triggered for the following call chain:
> > > > > > vhost_vdpa_fault()
> > > > > > -> remap_pfn_range()
> > > > > > -> remap_pfn_range_notrack()
> > > > > > -> vm_flags_set()
> > > > > > -> vma_start_write()
> > > > > > -> __is_vma_write_locked()
> > > > > > -> mmap_assert_write_locked()
> > > > > >
> > > > > >
> > > > > > I've been trying to follow how the mm write lock is dropped in the above
> > > > > call
> > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > >
> > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > similar.
> > > > >
> > > > > > Any ideas of what could have gone wrong here?
> > > > >
> > > > > Adding Peter for more thought here.
> > > > >
> > > >
> > > > vfio-side fix was just queued for rc4:
> > > >
> > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > >
> > > Great, thanks for the pointer.
> > >
> > Yes, thanks!
> >
> > > Dragos, do you want to propose a similar fix for vDPA?
> > >
> > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > promising anything though.
> >
> > Thanks,
> > Dragos
>
> Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> seems a bit much to ask from a random reporter,
Probably, just asking since Dragos has done some investigation.
> this race
> likely can bite anyone.
>
Dragos, I've drafted a patch; please try it and see if it works (I have
tested it with LOCKDEP via vp_vdpa in L2).
Thanks
[-- Attachment #2: 0001-vhost-vdpa-switch-to-use-vmf_insert_pfn-in-the-fault.patch --]
[-- Type: application/octet-stream, Size: 1500 bytes --]
From a94a70372b702246436cb33ecbaa07d5c6127ce7 Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Wed, 19 Jun 2024 21:25:32 -0400
Subject: [PATCH] vhost-vdpa: switch to use vmf_insert_pfn() in the fault
handler
remap_pfn_range() should not be called in the fault handler, as it may
change vma->vm_flags, which can trigger a lockdep warning because the
mmap write lock is not held. There is actually no need to modify
vma->vm_flags, as the flags have already been set in mmap(). So this
patch switches to vmf_insert_pfn() instead.
Reported-by: Dragos Tatulea <dtatulea@nvidia.com>
Fixes: ddd89d0a059d ("vhost_vdpa: support doorbell mapping via mmap")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/vhost/vdpa.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 63a53680a85c..6b9c12acf438 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -1483,13 +1483,7 @@ static vm_fault_t vhost_vdpa_fault(struct vm_fault *vmf)
 
 	notify = ops->get_vq_notification(vdpa, index);
 
-	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-	if (remap_pfn_range(vma, vmf->address & PAGE_MASK,
-			    PFN_DOWN(notify.addr), PAGE_SIZE,
-			    vma->vm_page_prot))
-		return VM_FAULT_SIGBUS;
-
-	return VM_FAULT_NOPAGE;
+	return vmf_insert_pfn(vma, vmf->address & PAGE_MASK, PFN_DOWN(notify.addr));
 }
 
 static const struct vm_operations_struct vhost_vdpa_vm_ops = {
--
2.31.1
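As a rough illustration of the reasoning in the commit message above, here
is a small userspace C model. It is only a sketch under stated assumptions:
every name prefixed model_ or MODEL_ is hypothetical and is not a kernel
API, the mmap lock is reduced to a three-state enum, a 4 KiB page size is
assumed, and the fault path is assumed to hold the lock for reading only.
It shows why a helper that updates the VMA flags trips a write-lock
assertion when called from such a fault path, while a helper that only
installs one PFN does not.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only: none of these names are real kernel APIs. */

#define MODEL_PAGE_SHIFT  12	/* assumes 4 KiB pages */
#define MODEL_PAGE_SIZE   ((uint64_t)1 << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_MASK   (~(MODEL_PAGE_SIZE - 1))
#define MODEL_PFN_DOWN(x) ((uint64_t)(x) >> MODEL_PAGE_SHIFT)

enum model_lock { MODEL_UNLOCKED, MODEL_READ_LOCKED, MODEL_WRITE_LOCKED };

struct model_vma {
	enum model_lock mmap_lock;	/* how the caller holds the mmap lock */
	unsigned long vm_flags;		/* flags, normally set once at mmap() time */
	uint64_t mapped_addr;		/* page-aligned address that was mapped */
	uint64_t mapped_pfn;		/* page frame number installed for it */
	int warnings;			/* number of lock assertions that fired */
};

/* Stands in for a write-lock assertion: count a warning instead of WARN. */
void model_assert_write_locked(struct model_vma *vma)
{
	if (vma->mmap_lock != MODEL_WRITE_LOCKED)
		vma->warnings++;
}

/* Old path in the model: the remap helper also updates vm_flags,
 * so it asserts that the write lock is held. */
void model_remap_pfn_range(struct model_vma *vma, uint64_t addr,
			   uint64_t pfn, unsigned long flags)
{
	model_assert_write_locked(vma);
	vma->vm_flags |= flags;
	vma->mapped_addr = addr;
	vma->mapped_pfn = pfn;
}

/* New path in the model: install a single PFN, leave vm_flags alone. */
void model_insert_pfn(struct model_vma *vma, uint64_t addr, uint64_t pfn)
{
	vma->mapped_addr = addr;
	vma->mapped_pfn = pfn;
}

/* Fault handler before the change; returns the number of warnings seen. */
int model_fault_old(struct model_vma *vma, uint64_t fault_addr,
		    uint64_t notify_addr)
{
	vma->mmap_lock = MODEL_READ_LOCKED;	/* assumed fault-path locking */
	model_remap_pfn_range(vma, fault_addr & MODEL_PAGE_MASK,
			      MODEL_PFN_DOWN(notify_addr), 0x1UL);
	return vma->warnings;
}

/* Fault handler after the change; returns the number of warnings seen. */
int model_fault_new(struct model_vma *vma, uint64_t fault_addr,
		    uint64_t notify_addr)
{
	vma->mmap_lock = MODEL_READ_LOCKED;	/* assumed fault-path locking */
	model_insert_pfn(vma, fault_addr & MODEL_PAGE_MASK,
			 MODEL_PFN_DOWN(notify_addr));
	return vma->warnings;
}
```

In this model, calling model_fault_old() on a zero-initialized struct
model_vma records one warning, while model_fault_new() records none. Both
map the page-aligned fault address to MODEL_PFN_DOWN(notify_addr), which
mirrors the "vmf->address & PAGE_MASK" and "PFN_DOWN(notify.addr)"
arguments in the patch.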
* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
2024-06-20 4:07 ` Jason Wang
@ 2024-06-20 5:44 ` Michael S. Tsirkin
2024-06-20 8:23 ` Jason Wang
0 siblings, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2024-06-20 5:44 UTC (permalink / raw)
To: Jason Wang
Cc: Dragos Tatulea, kevin.tian@intel.com,
virtualization@lists.linux-foundation.org, eperezma@redhat.com,
peterx@redhat.com
On Thu, Jun 20, 2024 at 12:07:14PM +0800, Jason Wang wrote:
> On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > > >
> > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > > >
> > > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > After commit ba168b52bf8e "mm: use rwsem assertion macros for
> > > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > > >
> > > > > > > [...]
> > > > > > >
> > > > > > > The warnings show up only when the vdpa page-per-vq option is used
> > > > > > (doorbell
> > > > > > > mapping to guest).
> > > > > > >
> > > > > > > The issue seems to have existed before, but was visible only with
> > > > > > CONFIG_LOCKDEP
> > > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > > >
> > > > > > > The warning is triggered for the following call chain:
> > > > > > > vhost_vdpa_fault()
> > > > > > > -> remap_pfn_range()
> > > > > > > -> remap_pfn_range_notrack()
> > > > > > > -> vm_flags_set()
> > > > > > > -> vma_start_write()
> > > > > > > -> __is_vma_write_locked()
> > > > > > > -> mmap_assert_write_locked()
> > > > > > >
> > > > > > >
> > > > > > > I've been trying to follow how the mm write lock is dropped in the above
> > > > > > call
> > > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > > >
> > > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > > similar.
> > > > > >
> > > > > > > Any ideas of what could have gone wrong here?
> > > > > >
> > > > > > Adding Peter for more thought here.
> > > > > >
> > > > >
> > > > > vfio-side fix was just queued for rc4:
> > > > >
> > > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > > >
> > > > Great, thanks for the pointer.
> > > >
> > > Yes, thanks!
> > >
> > > > Dragos, do you want to propose a similar fix for vDPA?
> > > >
> > > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > > promising anything though.
> > >
> > > Thanks,
> > > Dragos
> >
> > Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> > seems a bit much to ask from a random reporter,
>
> Probably, just asking since Dragos has done some investigation.
>
> > this race
> > likely can bite anyone.
> >
>
> Dragos, I've drafted a patch, please try to see if it works (I had
> tested it with LOCKDEP via vp_vdpa in L2).
>
> Thanks
What is going on here that you decided to do an attachment as
opposed to inlining normally?
--
MST
* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
2024-06-20 5:44 ` Michael S. Tsirkin
@ 2024-06-20 8:23 ` Jason Wang
2024-06-20 9:05 ` Michael S. Tsirkin
2024-07-03 16:23 ` Michael S. Tsirkin
0 siblings, 2 replies; 13+ messages in thread
From: Jason Wang @ 2024-06-20 8:23 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Dragos Tatulea, kevin.tian@intel.com,
virtualization@lists.linux-foundation.org, eperezma@redhat.com,
peterx@redhat.com
On Thu, Jun 20, 2024 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jun 20, 2024 at 12:07:14PM +0800, Jason Wang wrote:
> > On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > > > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > > > >
> > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > > > >
> > > > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > After commit ba168b52bf8e "mm: use rwsem assertion macros for
> > > > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > > > >
> > > > > > > > [...]
> > > > > > > >
> > > > > > > > The warnings show up only when the vdpa page-per-vq option is used
> > > > > > > (doorbell
> > > > > > > > mapping to guest).
> > > > > > > >
> > > > > > > > The issue seems to have existed before, but was visible only with
> > > > > > > CONFIG_LOCKDEP
> > > > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > > > >
> > > > > > > > The warning is triggered for the following call chain:
> > > > > > > > vhost_vdpa_fault()
> > > > > > > > -> remap_pfn_range()
> > > > > > > > -> remap_pfn_range_notrack()
> > > > > > > > -> vm_flags_set()
> > > > > > > > -> vma_start_write()
> > > > > > > > -> __is_vma_write_locked()
> > > > > > > > -> mmap_assert_write_locked()
> > > > > > > >
> > > > > > > >
> > > > > > > > I've been trying to follow how the mm write lock is dropped in the above
> > > > > > > call
> > > > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > > > >
> > > > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > > > similar.
> > > > > > >
> > > > > > > > Any ideas of what could have gone wrong here?
> > > > > > >
> > > > > > > Adding Peter for more thought here.
> > > > > > >
> > > > > >
> > > > > > vfio-side fix was just queued for rc4:
> > > > > >
> > > > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > > > >
> > > > > Great, thanks for the pointer.
> > > > >
> > > > Yes, thanks!
> > > >
> > > > > Dragos, do you want to propose a similar fix for vDPA?
> > > > >
> > > > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > > > promising anything though.
> > > >
> > > > Thanks,
> > > > Dragos
> > >
> > > Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> > > seems a bit much to ask from a random reporter,
> >
> > Probably, just asking since Dragos has done some investigation.
> >
> > > this race
> > > likely can bite anyone.
> > >
> >
> > Dragos, I've drafted a patch, please try to see if it works (I had
> > tested it with LOCKDEP via vp_vdpa in L2).
> >
> > Thanks
>
> What is going on here that you decided to do an attachment as
> opposed to inlining normally?
Actually, I planned to send a formal patch separately but stopped at the
last second, since it has only been tested with L2 + vp_vdpa in L1.
If inlining really matters, I will do that next time.
Thanks
>
> --
> MST
>
* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
2024-06-20 8:23 ` Jason Wang
@ 2024-06-20 9:05 ` Michael S. Tsirkin
2024-06-26 10:54 ` Dragos Tatulea
2024-07-03 16:23 ` Michael S. Tsirkin
1 sibling, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2024-06-20 9:05 UTC (permalink / raw)
To: Jason Wang
Cc: Dragos Tatulea, kevin.tian@intel.com,
virtualization@lists.linux-foundation.org, eperezma@redhat.com,
peterx@redhat.com
On Thu, Jun 20, 2024 at 04:23:30PM +0800, Jason Wang wrote:
> On Thu, Jun 20, 2024 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jun 20, 2024 at 12:07:14PM +0800, Jason Wang wrote:
> > > On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > > > > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > > > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > > > > >
> > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > > > > >
> > > > > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > After commit ba168b52bf8e "mm: use rwsem assertion macros for
> > > > > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > > > > >
> > > > > > > > > [...]
> > > > > > > > >
> > > > > > > > > The warnings show up only when the vdpa page-per-vq option is used (doorbell
> > > > > > > > > mapping to guest).
> > > > > > > > >
> > > > > > > > > The issue seems to have existed before, but was visible only with CONFIG_LOCKDEP
> > > > > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > > > > >
> > > > > > > > > The warning is triggered for the following call chain:
> > > > > > > > > vhost_vdpa_fault()
> > > > > > > > > -> remap_pfn_range()
> > > > > > > > > -> remap_pfn_range_notrack()
> > > > > > > > > -> vm_flags_set()
> > > > > > > > > -> vma_start_write()
> > > > > > > > > -> __is_vma_write_locked()
> > > > > > > > > -> mmap_assert_write_locked()
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I've been trying to follow how the mm write lock is dropped in the above call
> > > > > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > > > > >
> > > > > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > > > > similar.
> > > > > > > >
> > > > > > > > > Any ideas of what could have gone wrong here?
> > > > > > > >
> > > > > > > > Adding Peter for more thought here.
> > > > > > > >
> > > > > > >
> > > > > > > vfio-side fix was just queued for rc4:
> > > > > > >
> > > > > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > > > > >
> > > > > > Great, thanks for the pointer.
> > > > > >
> > > > > Yes, thanks!
> > > > >
> > > > > > Dragos, do you want to propose a similar fix for vDPA?
> > > > > >
> > > > > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > > > > promising anything though.
> > > > >
> > > > > Thanks,
> > > > > Dragos
> > > >
> > > > Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> > > > seems a bit much to ask from a random reporter,
> > >
> > > Probably, just asking since Dragos has done some investigation.
> > >
> > > > this race
> > > > likely can bite anyone.
> > > >
> > >
> > > Dragos, I've drafted a patch, please try to see if it works (I had
> > > tested it with LOCKDEP via vp_vdpa in L2).
> > >
> > > Thanks
> >
> > What is going on here that you decided to do an attachment as
> > opposed to inlining normally?
>
> Actually, I planned to send a formal patch separately but stopped at the
> last second since it has only been tested with L2 + vp_vdpa in L1.

tag it as RFC, explain the testing status in the mail.

> If inline really matters, I will do that next time.

yes, this way people can comment.

> Thanks
>
> >
> > --
> > MST
> >
* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
2024-06-20 9:05 ` Michael S. Tsirkin
@ 2024-06-26 10:54 ` Dragos Tatulea
0 siblings, 0 replies; 13+ messages in thread
From: Dragos Tatulea @ 2024-06-26 10:54 UTC (permalink / raw)
To: mst@redhat.com, jasowang@redhat.com
Cc: kevin.tian@intel.com, virtualization@lists.linux-foundation.org,
eperezma@redhat.com, peterx@redhat.com
On Thu, 2024-06-20 at 05:05 -0400, Michael S. Tsirkin wrote:
> On Thu, Jun 20, 2024 at 04:23:30PM +0800, Jason Wang wrote:
> > On Thu, Jun 20, 2024 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Jun 20, 2024 at 12:07:14PM +0800, Jason Wang wrote:
> > > > On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > > > > > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > > > > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > > > > > >
> > > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > > > > > >
> > > > > > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > > > > > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > > > > > >
> > > > > > > > > > ------------[ cut here ]------------
> > > > > > > > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > > > > > > > track_pfn_remap+0x12b/0x130
> > > > > > > > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > > > > > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > > > > > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle
> > > > > > > > > ip6table_nat
> > > > > > > > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack
> > > > > > > > > xt_MASQUERADE
> > > > > > > > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > > > > > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm
> > > > > > > > > ib_iser
> > > > > > > > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm
> > > > > > > > > mlx5_ib
> > > > > > > > > > ib_uverbs ib_core fuse mlx5_core
> > > > > > > > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G W
> > > > > > > > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > > > > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > > > > > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > > > > > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > > > > > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > > > > > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > > > > > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > > > > > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > > > > > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > > > > > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > > > > > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > > > > > > > FS: 00007f678d800700(0000) GS:ffff88852c880000(0000)
> > > > > > > > > knlGS:0000000000000000
> > > > > > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > > > > > > > Call Trace:
> > > > > > > > > > <TASK>
> > > > > > > > > > ? __warn+0x78/0x110
> > > > > > > > > > ? track_pfn_remap+0x12b/0x130
> > > > > > > > > > ? report_bug+0x16d/0x180
> > > > > > > > > > ? handle_bug+0x3c/0x60
> > > > > > > > > > ? exc_invalid_op+0x14/0x70
> > > > > > > > > > ? asm_exc_invalid_op+0x16/0x20
> > > > > > > > > > ? track_pfn_remap+0x12b/0x130
> > > > > > > > > > remap_pfn_range+0x41/0xa0
> > > > > > > > > > vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > > > > > > > __do_fault+0x2f/0xb0
> > > > > > > > > > __handle_mm_fault+0x13d3/0x2210
> > > > > > > > > > handle_mm_fault+0xb0/0x260
> > > > > > > > > > fixup_user_fault+0x77/0x170
> > > > > > > > > > hva_to_pfn+0x2c5/0x4b0
> > > > > > > > > > kvm_faultin_pfn+0xd7/0x510
> > > > > > > > > > kvm_tdp_page_fault+0x111/0x190
> > > > > > > > > > kvm_mmu_do_page_fault+0x105/0x230
> > > > > > > > > > kvm_mmu_page_fault+0x7d/0x620
> > > > > > > > > > ? vmx_deliver_interrupt+0x110/0x190
> > > > > > > > > > ? __apic_accept_irq+0x16c/0x270
> > > > > > > > > > ? vmx_vmexit+0x8d/0xc0
> > > > > > > > > > vmx_handle_exit+0x110/0x640
> > > > > > > > > > kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > > > > > > > kvm_vcpu_ioctl+0x263/0x6a0
> > > > > > > > > > ? futex_wake+0x81/0x180
> > > > > > > > > > __x64_sys_ioctl+0x4a7/0x9d0
> > > > > > > > > > ? __x64_sys_futex+0x73/0x1c0
> > > > > > > > > > ? kvm_on_user_return+0x86/0x90
> > > > > > > > > > do_syscall_64+0x4c/0x100
> > > > > > > > > > entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > > > > > > > RIP: 0033:0x7f679186a17b
> > > > > > > > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > > > > > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > > > > > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > > > > > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX:
> > > > > > > > > 0000000000000010
> > > > > > > > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > > > > > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > > > > > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > > > > > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > > > > > > > </TASK>
> > > > > > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > > > > >
> > > > > > > > > > The warnings show up only when the vdpa page-per-vq option is used (doorbell
> > > > > > > > > > mapping to guest).
> > > > > > > > > >
> > > > > > > > > > The issue seems to have existed before, but was visible only with CONFIG_LOCKDEP
> > > > > > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > > > > > >
> > > > > > > > > > The warning is triggered for the following call chain:
> > > > > > > > > > vhost_vdpa_fault()
> > > > > > > > > > -> remap_pfn_range()
> > > > > > > > > > -> remap_pfn_range_notrack()
> > > > > > > > > > -> vm_flags_set()
> > > > > > > > > > -> vma_start_write()
> > > > > > > > > > -> __is_vma_write_locked()
> > > > > > > > > > -> mmap_assert_write_locked()
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I've been trying to follow how the mm write lock is dropped in the above call
> > > > > > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > > > > > >
> > > > > > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > > > > > similar.
> > > > > > > > >
> > > > > > > > > > Any ideas of what could have gone wrong here?
> > > > > > > > >
> > > > > > > > > Adding Peter for more thought here.
> > > > > > > > >
> > > > > > > >
> > > > > > > > vfio-side fix was just queued for rc4:
> > > > > > > >
> > > > > > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > > > > > >
> > > > > > > Great, thanks for the pointer.
> > > > > > >
> > > > > > Yes, thanks!
> > > > > >
> > > > > > > Dragos, do you want to propose a similar fix for vDPA?
> > > > > > >
> > > > > > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > > > > > promising anything though.
> > > > > >
> > > > > > Thanks,
> > > > > > Dragos
> > > > >
> > > > > Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> > > > > seems a bit much to ask from a random reporter,
> > > >
> > > > Probably, just asking since Dragos has done some investigation.
> > > >
> > > > > this race
> > > > > likely can bite anyone.
> > > > >
> > > >
> > > > Dragos, I've drafted a patch, please try to see if it works (I had
> > > > tested it with LOCKDEP via vp_vdpa in L2).
> > > >
> > > > Thanks
> > >
> > > What is going on here that you decided to do an attachment as
> > > opposed to inlining normally?
> >
> > Actually, I planned to send a formal patch separately but stopped at the
> > last second since it has only been tested with L2 + vp_vdpa in L1.
>
> tag it as RFC, explain the testing status in the mail.
>
> > If inline really matters, I will do that next time.
>
>
> yes, this way people can comment.
>

The fix works. Thanks Jason! FWIW:

Tested-by: Dragos Tatulea <dtatulea@nvidia.com>

Thanks,
Dragos

> > Thanks
> >
> > >
> > > --
> > > MST
> > >
>
* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
2024-06-20 8:23 ` Jason Wang
2024-06-20 9:05 ` Michael S. Tsirkin
@ 2024-07-03 16:23 ` Michael S. Tsirkin
2024-07-04 0:10 ` Jason Wang
1 sibling, 1 reply; 13+ messages in thread
From: Michael S. Tsirkin @ 2024-07-03 16:23 UTC (permalink / raw)
To: Jason Wang
Cc: Dragos Tatulea, kevin.tian@intel.com,
virtualization@lists.linux-foundation.org, eperezma@redhat.com,
peterx@redhat.com
On Thu, Jun 20, 2024 at 04:23:30PM +0800, Jason Wang wrote:
> On Thu, Jun 20, 2024 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jun 20, 2024 at 12:07:14PM +0800, Jason Wang wrote:
> > > On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > > > > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > > > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > > > > >
> > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > > > > >
> > > > > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > > > > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > > > > >
> > > > > > > > > ------------[ cut here ]------------
> > > > > > > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > > > > > > track_pfn_remap+0x12b/0x130
> > > > > > > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > > > > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > > > > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle
> > > > > > > > ip6table_nat
> > > > > > > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack
> > > > > > > > xt_MASQUERADE
> > > > > > > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > > > > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm
> > > > > > > > ib_iser
> > > > > > > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm
> > > > > > > > mlx5_ib
> > > > > > > > > ib_uverbs ib_core fuse mlx5_core
> > > > > > > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G W
> > > > > > > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > > > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > > > > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > > > > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > > > > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > > > > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > > > > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > > > > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > > > > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > > > > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > > > > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > > > > > > FS: 00007f678d800700(0000) GS:ffff88852c880000(0000)
> > > > > > > > knlGS:0000000000000000
> > > > > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > > > > > > Call Trace:
> > > > > > > > > <TASK>
> > > > > > > > > ? __warn+0x78/0x110
> > > > > > > > > ? track_pfn_remap+0x12b/0x130
> > > > > > > > > ? report_bug+0x16d/0x180
> > > > > > > > > ? handle_bug+0x3c/0x60
> > > > > > > > > ? exc_invalid_op+0x14/0x70
> > > > > > > > > ? asm_exc_invalid_op+0x16/0x20
> > > > > > > > > ? track_pfn_remap+0x12b/0x130
> > > > > > > > > remap_pfn_range+0x41/0xa0
> > > > > > > > > vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > > > > > > __do_fault+0x2f/0xb0
> > > > > > > > > __handle_mm_fault+0x13d3/0x2210
> > > > > > > > > handle_mm_fault+0xb0/0x260
> > > > > > > > > fixup_user_fault+0x77/0x170
> > > > > > > > > hva_to_pfn+0x2c5/0x4b0
> > > > > > > > > kvm_faultin_pfn+0xd7/0x510
> > > > > > > > > kvm_tdp_page_fault+0x111/0x190
> > > > > > > > > kvm_mmu_do_page_fault+0x105/0x230
> > > > > > > > > kvm_mmu_page_fault+0x7d/0x620
> > > > > > > > > ? vmx_deliver_interrupt+0x110/0x190
> > > > > > > > > ? __apic_accept_irq+0x16c/0x270
> > > > > > > > > ? vmx_vmexit+0x8d/0xc0
> > > > > > > > > vmx_handle_exit+0x110/0x640
> > > > > > > > > kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > > > > > > kvm_vcpu_ioctl+0x263/0x6a0
> > > > > > > > > ? futex_wake+0x81/0x180
> > > > > > > > > __x64_sys_ioctl+0x4a7/0x9d0
> > > > > > > > > ? __x64_sys_futex+0x73/0x1c0
> > > > > > > > > ? kvm_on_user_return+0x86/0x90
> > > > > > > > > do_syscall_64+0x4c/0x100
> > > > > > > > > entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > > > > > > RIP: 0033:0x7f679186a17b
> > > > > > > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > > > > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > > > > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > > > > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX:
> > > > > > > > 0000000000000010
> > > > > > > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > > > > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > > > > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > > > > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > > > > > > </TASK>
> > > > > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > > > >
> > > > > > > > > The warnings show up only when the vdpa page-per-vq option is used (doorbell
> > > > > > > > > mapping to guest).
> > > > > > > > >
> > > > > > > > > The issue seems to have existed before, but was visible only with CONFIG_LOCKDEP
> > > > > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > > > > >
> > > > > > > > > The warning is triggered for the following call chain:
> > > > > > > > > vhost_vdpa_fault()
> > > > > > > > > -> remap_pfn_range()
> > > > > > > > > -> remap_pfn_range_notrack()
> > > > > > > > > -> vm_flags_set()
> > > > > > > > > -> vma_start_write()
> > > > > > > > > -> __is_vma_write_locked()
> > > > > > > > > -> mmap_assert_write_locked()
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I've been trying to follow how the mm write lock is dropped in the above call
> > > > > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > > > > >
> > > > > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > > > > similar.
> > > > > > > >
> > > > > > > > > Any ideas of what could have gone wrong here?
> > > > > > > >
> > > > > > > > Adding Peter for more thought here.
> > > > > > > >
> > > > > > >
> > > > > > > vfio-side fix was just queued for rc4:
> > > > > > >
> > > > > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > > > > >
> > > > > > Great, thanks for the pointer.
> > > > > >
> > > > > Yes, thanks!
> > > > >
> > > > > > Dragos, do you want to propose a similar fix for vDPA?
> > > > > >
> > > > > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > > > > promising anything though.
> > > > >
> > > > > Thanks,
> > > > > Dragos
> > > >
> > > > Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> > > > seems a bit much to ask from a random reporter,
> > >
> > > Probably, just asking since Dragos has done some investigation.
> > >
> > > > this race
> > > > likely can bite anyone.
> > > >
> > >
> > > Dragos, I've drafted a patch, please try to see if it works (I had
> > > tested it with LOCKDEP via vp_vdpa in L2).
> > >
> > > Thanks
> >
> > What is going on here that you decided to do an attachment as
> > opposed to inlining normally?
>
> Actually, I planned to send a formal patch separately but stopped at the
> last second since it has only been tested with L2 + vp_vdpa in L1.
>
> If inline really matters, I will do that next time.
>
> Thanks

Jason, are you going to submit a patch now that it's been tested?

> >
> > --
> > MST
> >
* Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
2024-07-03 16:23 ` Michael S. Tsirkin
@ 2024-07-04 0:10 ` Jason Wang
0 siblings, 0 replies; 13+ messages in thread
From: Jason Wang @ 2024-07-04 0:10 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Dragos Tatulea, kevin.tian@intel.com,
virtualization@lists.linux-foundation.org, eperezma@redhat.com,
peterx@redhat.com
On Thu, Jul 4, 2024 at 12:23 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Jun 20, 2024 at 04:23:30PM +0800, Jason Wang wrote:
> > On Thu, Jun 20, 2024 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Thu, Jun 20, 2024 at 12:07:14PM +0800, Jason Wang wrote:
> > > > On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > > > > > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > > > > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > > > > > >
> > > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > > > > > >
> > > > > > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > After commit ba168b52bf8e ("mm: use rwsem assertion macros for
> > > > > > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > > > > > >
> > > > > > > > > > ------------[ cut here ]------------
> > > > > > > > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > > > > > > > track_pfn_remap+0x12b/0x130
> > > > > > > > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > > > > > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > > > > > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle
> > > > > > > > > ip6table_nat
> > > > > > > > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack
> > > > > > > > > xt_MASQUERADE
> > > > > > > > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > > > > > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm
> > > > > > > > > ib_iser
> > > > > > > > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm
> > > > > > > > > mlx5_ib
> > > > > > > > > > ib_uverbs ib_core fuse mlx5_core
> > > > > > > > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G W
> > > > > > > > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > > > > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > > > > > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > > > > > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > > > > > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > > > > > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > > > > > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > > > > > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > > > > > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > > > > > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > > > > > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > > > > > > > FS: 00007f678d800700(0000) GS:ffff88852c880000(0000)
> > > > > > > > > knlGS:0000000000000000
> > > > > > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > > > > > > > Call Trace:
> > > > > > > > > > <TASK>
> > > > > > > > > > ? __warn+0x78/0x110
> > > > > > > > > > ? track_pfn_remap+0x12b/0x130
> > > > > > > > > > ? report_bug+0x16d/0x180
> > > > > > > > > > ? handle_bug+0x3c/0x60
> > > > > > > > > > ? exc_invalid_op+0x14/0x70
> > > > > > > > > > ? asm_exc_invalid_op+0x16/0x20
> > > > > > > > > > ? track_pfn_remap+0x12b/0x130
> > > > > > > > > > remap_pfn_range+0x41/0xa0
> > > > > > > > > > vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > > > > > > > __do_fault+0x2f/0xb0
> > > > > > > > > > __handle_mm_fault+0x13d3/0x2210
> > > > > > > > > > handle_mm_fault+0xb0/0x260
> > > > > > > > > > fixup_user_fault+0x77/0x170
> > > > > > > > > > hva_to_pfn+0x2c5/0x4b0
> > > > > > > > > > kvm_faultin_pfn+0xd7/0x510
> > > > > > > > > > kvm_tdp_page_fault+0x111/0x190
> > > > > > > > > > kvm_mmu_do_page_fault+0x105/0x230
> > > > > > > > > > kvm_mmu_page_fault+0x7d/0x620
> > > > > > > > > > ? vmx_deliver_interrupt+0x110/0x190
> > > > > > > > > > ? __apic_accept_irq+0x16c/0x270
> > > > > > > > > > ? vmx_vmexit+0x8d/0xc0
> > > > > > > > > > vmx_handle_exit+0x110/0x640
> > > > > > > > > > kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > > > > > > > kvm_vcpu_ioctl+0x263/0x6a0
> > > > > > > > > > ? futex_wake+0x81/0x180
> > > > > > > > > > __x64_sys_ioctl+0x4a7/0x9d0
> > > > > > > > > > ? __x64_sys_futex+0x73/0x1c0
> > > > > > > > > > ? kvm_on_user_return+0x86/0x90
> > > > > > > > > > do_syscall_64+0x4c/0x100
> > > > > > > > > > entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > > > > > > > RIP: 0033:0x7f679186a17b
> > > > > > > > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > > > > > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > > > > > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > > > > > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX:
> > > > > > > > > 0000000000000010
> > > > > > > > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > > > > > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > > > > > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > > > > > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > > > > > > > </TASK>
> > > > > > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > > > > >
> > > > > > > > > > The warnings show up only when the vdpa page-per-vq option is used (doorbell
> > > > > > > > > > mapping to guest).
> > > > > > > > > >
> > > > > > > > > > The issue seems to have existed before, but was visible only with CONFIG_LOCKDEP
> > > > > > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > > > > > >
> > > > > > > > > > The warning is triggered for the following call chain:
> > > > > > > > > > vhost_vdpa_fault()
> > > > > > > > > > -> remap_pfn_range()
> > > > > > > > > > -> remap_pfn_range_notrack()
> > > > > > > > > > -> vm_flags_set()
> > > > > > > > > > -> vma_start_write()
> > > > > > > > > > -> __is_vma_write_locked()
> > > > > > > > > > -> mmap_assert_write_locked()
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I've been trying to follow how the mm write lock is dropped in the above call
> > > > > > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > > > > > >
> > > > > > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > > > > > similar.
> > > > > > > > >
> > > > > > > > > > Any ideas of what could have gone wrong here?
> > > > > > > > >
> > > > > > > > > Adding Peter for more thought here.
> > > > > > > > >
> > > > > > > >
> > > > > > > > vfio-side fix was just queued for rc4:
> > > > > > > >
> > > > > > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > > > > > >
> > > > > > > Great, thanks for the pointer.
> > > > > > >
> > > > > > Yes, thanks!
> > > > > >
> > > > > > > Dragos, do you want to propose a similar fix for vDPA?
> > > > > > >
> > > > > > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > > > > > promising anything though.
> > > > > >
> > > > > > Thanks,
> > > > > > Dragos
> > > > >
> > > > > Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> > > > > seems a bit much to ask from a random reporter,
> > > >
> > > > Probably, just asking since Dragos has done some investigation.
> > > >
> > > > > this race
> > > > > likely can bite anyone.
> > > > >
> > > >
> > > > Dragos, I've drafted a patch, please try to see if it works (I had
> > > > tested it with LOCKDEP via vp_vdpa in L2).
> > > >
> > > > Thanks
> > >
> > > What is going on here that you decided to do an attachment as
> > > opposed to inlining normally?
> >
> > Actually, I planned to send a formal patch separately but stopped at the
> > last second since it has only been tested with L2 + vp_vdpa in L1.
> >
> > If inline really matters, I will do that next time.
> >
> > Thanks
>
> Jason, are you going to submit a patch now that it's been tested?

I posted it yesterday:

https://patchwork.kernel.org/project/netdevbpf/patch/20240701033159.18133-1-jasowang@redhat.com/

Thanks

>
> > >
> > > --
> > > MST
> > >
>
Thread overview: 13+ messages
2024-06-17 15:50 mmap_assert_write_locked warnings during for vhost_vdpa_fault Dragos Tatulea
2024-06-18 1:17 ` Jason Wang
2024-06-18 2:03 ` Tian, Kevin
2024-06-18 2:39 ` Jason Wang
2024-06-19 9:14 ` Dragos Tatulea
2024-06-19 9:51 ` Michael S. Tsirkin
2024-06-20 4:07 ` Jason Wang
2024-06-20 5:44 ` Michael S. Tsirkin
2024-06-20 8:23 ` Jason Wang
2024-06-20 9:05 ` Michael S. Tsirkin
2024-06-26 10:54 ` Dragos Tatulea
2024-07-03 16:23 ` Michael S. Tsirkin
2024-07-04 0:10 ` Jason Wang