From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Dragos Tatulea <dtatulea@nvidia.com>,
"kevin.tian@intel.com" <kevin.tian@intel.com>,
"virtualization@lists.linux-foundation.org"
<virtualization@lists.linux-foundation.org>,
"eperezma@redhat.com" <eperezma@redhat.com>,
"peterx@redhat.com" <peterx@redhat.com>
Subject: Re: mmap_assert_write_locked warnings during for vhost_vdpa_fault
Date: Thu, 20 Jun 2024 05:05:15 -0400
Message-ID: <20240620050436-mutt-send-email-mst@kernel.org>
In-Reply-To: <CACGkMEvdsNi2Fd0ZoPJVZUARSRvNqJa4jVxE=8RCWr=dk2_kug@mail.gmail.com>

On Thu, Jun 20, 2024 at 04:23:30PM +0800, Jason Wang wrote:
> On Thu, Jun 20, 2024 at 1:44 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Thu, Jun 20, 2024 at 12:07:14PM +0800, Jason Wang wrote:
> > > On Wed, Jun 19, 2024 at 5:52 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Wed, Jun 19, 2024 at 09:14:41AM +0000, Dragos Tatulea wrote:
> > > > > On Tue, 2024-06-18 at 10:39 +0800, Jason Wang wrote:
> > > > > > On Tue, Jun 18, 2024 at 10:03 AM Tian, Kevin <kevin.tian@intel.com> wrote:
> > > > > > >
> > > > > > > > From: Jason Wang <jasowang@redhat.com>
> > > > > > > > Sent: Tuesday, June 18, 2024 9:18 AM
> > > > > > > >
> > > > > > > > On Mon, Jun 17, 2024 at 11:51 PM Dragos Tatulea <dtatulea@nvidia.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > After commit ba168b52bf8e "mm: use rwsem assertion macros for
> > > > > > > > > mmap_lock") was submitted, we started getting a lot of the
> > > > > > > > > following warnings about a missing mmap write lock during VM boot:
> > > > > > > > >
> > > > > > > > > ------------[ cut here ]------------
> > > > > > > > > WARNING: CPU: 1 PID: 58633 at include/linux/rwsem.h:85
> > > > > > > > > track_pfn_remap+0x12b/0x130
> > > > > > > > > Modules linked in: act_mirred act_skbedit vhost_vdpa cls_matchall
> > > > > > > > > nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vdpa
> > > > > > > > > openvswitch nsh vhost_net vhost vhost_iotlb tap ip6table_mangle
> > > > > > > > ip6table_nat
> > > > > > > > > iptable_mangle nf_tables ip6table_filter ip6_tables xt_conntrack
> > > > > > > > xt_MASQUERADE
> > > > > > > > > nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter
> > > > > > > > > rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm
> > > > > > > > ib_iser
> > > > > > > > > libiscsi ib_umad scsi_transport_iscsi ib_ipoib rdma_cm iw_cm ib_cm
> > > > > > > > mlx5_ib
> > > > > > > > > ib_uverbs ib_core fuse mlx5_core
> > > > > > > > > CPU: 1 PID: 58633 Comm: CPU 0/KVM Tainted: G W
> > > > > > > > > 6.10.0-rc1_for_upstream_min_debug_2024_05_29_17_06 #1
> > > > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > > > > > rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > > > RIP: 0010:track_pfn_remap+0x12b/0x130
> > > > > > > > > Code: 48 83 c4 08 b8 ea ff ff ff 5b 5d 41 5c 41 5d c3 48 83 c4 08 48 89 ef 48
> > > > > > > > > 89 f2 5b 31 c9 4c 89 c6 5d 41 5c 41 5d e9 f5 fb ff ff <0f> 0b eb 9b 90 0f 1f 44
> > > > > > > > > 00 00 80 3d ac 59 96 01 00 74 01 c3 48 89
> > > > > > > > > RSP: 0018:ffff888350f8b8e0 EFLAGS: 00010246
> > > > > > > > > RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
> > > > > > > > > RDX: ffff8881080ca300 RSI: 0000000000001000 RDI: 0000000544003000
> > > > > > > > > RBP: 0000000544003000 R08: ffff888106730a60 R09: 0000000000000000
> > > > > > > > > R10: ffff888116eeff60 R11: 0000000000000000 R12: ffff888350f8b918
> > > > > > > > > R13: ffff888149f99da8 R14: 0000000000001000 R15: 0000000000001000
> > > > > > > > > FS: 00007f678d800700(0000) GS:ffff88852c880000(0000)
> > > > > > > > knlGS:0000000000000000
> > > > > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > CR2: 00000000004e54f8 CR3: 0000000112290004 CR4: 0000000000372eb0
> > > > > > > > > Call Trace:
> > > > > > > > > <TASK>
> > > > > > > > > ? __warn+0x78/0x110
> > > > > > > > > ? track_pfn_remap+0x12b/0x130
> > > > > > > > > ? report_bug+0x16d/0x180
> > > > > > > > > ? handle_bug+0x3c/0x60
> > > > > > > > > ? exc_invalid_op+0x14/0x70
> > > > > > > > > ? asm_exc_invalid_op+0x16/0x20
> > > > > > > > > ? track_pfn_remap+0x12b/0x130
> > > > > > > > > remap_pfn_range+0x41/0xa0
> > > > > > > > > vhost_vdpa_fault+0x6c/0xa0 [vhost_vdpa]
> > > > > > > > > __do_fault+0x2f/0xb0
> > > > > > > > > __handle_mm_fault+0x13d3/0x2210
> > > > > > > > > handle_mm_fault+0xb0/0x260
> > > > > > > > > fixup_user_fault+0x77/0x170
> > > > > > > > > hva_to_pfn+0x2c5/0x4b0
> > > > > > > > > kvm_faultin_pfn+0xd7/0x510
> > > > > > > > > kvm_tdp_page_fault+0x111/0x190
> > > > > > > > > kvm_mmu_do_page_fault+0x105/0x230
> > > > > > > > > kvm_mmu_page_fault+0x7d/0x620
> > > > > > > > > ? vmx_deliver_interrupt+0x110/0x190
> > > > > > > > > ? __apic_accept_irq+0x16c/0x270
> > > > > > > > > ? vmx_vmexit+0x8d/0xc0
> > > > > > > > > vmx_handle_exit+0x110/0x640
> > > > > > > > > kvm_arch_vcpu_ioctl_run+0xdb0/0x1c20
> > > > > > > > > kvm_vcpu_ioctl+0x263/0x6a0
> > > > > > > > > ? futex_wake+0x81/0x180
> > > > > > > > > __x64_sys_ioctl+0x4a7/0x9d0
> > > > > > > > > ? __x64_sys_futex+0x73/0x1c0
> > > > > > > > > ? kvm_on_user_return+0x86/0x90
> > > > > > > > > do_syscall_64+0x4c/0x100
> > > > > > > > > entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > > > > > > > > RIP: 0033:0x7f679186a17b
> > > > > > > > > Code: 0f 1e fa 48 8b 05 1d ad 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff
> > > > > > > > > c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01
> > > > > > > > > c3 48 8b 0d ed ac 0c 00 f7 d8 64 89 01 48
> > > > > > > > > RSP: 002b:00007f678d7ff788 EFLAGS: 00000246 ORIG_RAX:
> > > > > > > > 0000000000000010
> > > > > > > > > RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007f679186a17b
> > > > > > > > > RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000059
> > > > > > > > > RBP: 000055da5ee22050 R08: 000055da44b28160 R09: 0000000000000000
> > > > > > > > > R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
> > > > > > > > > R13: 000055da452b05e0 R14: 0000000000000001 R15: 0000000000000000
> > > > > > > > > </TASK>
> > > > > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > > > >
> > > > > > > > > The warnings show up only when the vdpa page-per-vq option is used
> > > > > > > > (doorbell
> > > > > > > > > mapping to guest).
> > > > > > > > >
> > > > > > > > > The issue seems to have existed before, but was visible only with
> > > > > > > > CONFIG_LOCKDEP
> > > > > > > > > enabled. I tried finding if this was introduced in more recent kernels, but
> > > > > > > > > stopped after going as far back as 6.5: the issue was still visible there.
> > > > > > > > >
> > > > > > > > > The warning is triggered for the following call chain:
> > > > > > > > > vhost_vdpa_fault()
> > > > > > > > > -> remap_pfn_range()
> > > > > > > > > -> remap_pfn_range_notrack()
> > > > > > > > > -> vm_flags_set()
> > > > > > > > > -> vma_start_write()
> > > > > > > > > -> __is_vma_write_locked()
> > > > > > > > > -> mmap_assert_write_locked()
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I've been trying to follow how the mm write lock is dropped in the above
> > > > > > > > call
> > > > > > > > > chain or not taken at all. But I couldn't make much sense of it...
> > > > > > > >
> > > > > > > > I've also had a glance at vfio_pci_mmap_fault, it seems to do something
> > > > > > > > similar.
> > > > > > > >
> > > > > > > > > Any ideas of what could have gone wrong here?
> > > > > > > >
> > > > > > > > Adding Peter for more thought here.
> > > > > > > >
> > > > > > >
> > > > > > > vfio-side fix was just queued for rc4:
> > > > > > >
> > > > > > > https://lore.kernel.org/all/20240614155603.34567eb7.alex.williamson@redhat.com/T/
> > > > > >
> > > > > > Great, thanks for the pointer.
> > > > > >
> > > > > Yes, thanks!
> > > > >
> > > > > > Dragos, do you want to propose a similar fix for vDPA?
> > > > > >
> > > > > > Had a first look: the fixes look a bit daunting. I will try to "port" them, not
> > > > > > promising anything though.
> > > > >
> > > > > Thanks,
> > > > > Dragos
> > > >
> > > > Yea Jason, you coded this in ddd89d0a059d8e9740c75a97e0efe9bf07ee51f9,
> > > > seems a bit much to ask from a random reporter,
> > >
> > > Probably, just asking since Dragos has done some investigation.
> > >
> > > > this race
> > > > likely can bite anyone.
> > > >
> > >
> > > Dragos, I've drafted a patch, please try to see if it works (I had
> > > tested it with LOCKDEP via vp_vdpa in L2).
> > >
> > > Thanks
> >
> > What is going on here that you decided to do an attachment as
> > opposed to inlining normally?
>
> Actually, I plan to send a formal patch separately but stop at the
> last seconds since it is just tested by L2 + vp_vdpa in L1.

Tag it as RFC and explain the testing status in the mail.

> If inline really matters, I will do that next time.

Yes, this way people can comment.

> Thanks
>
> >
> > --
> > MST
> >