linux-s390.vger.kernel.org archive mirror
From: Niklas Schnelle <schnelle@linux.ibm.com>
To: David Hildenbrand <david@redhat.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Gerd Bayer <gbayer@linux.ibm.com>,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Suren Baghdasaryan <surenb@google.com>
Cc: linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org
Subject: Re: [PATCH v3 1/3] s390/pci: Fix s390_mmio_read/write syscall page fault handling
Date: Tue, 11 Jun 2024 17:37:20 +0200
Message-ID: <b38b571b753441314c090c3eb51c49c0e28a19d5.camel@linux.ibm.com>
In-Reply-To: <89c74380-6a60-4091-ba57-93c75d9a37d7@redhat.com>

On Tue, 2024-06-11 at 17:10 +0200, David Hildenbrand wrote:
> > > 
> > > which checks mmap_assert_write_locked().
> > > 
> > > Setting VMA flags would be racy with the mmap lock in read mode.
> > > 
> > > 
> > > remap_pfn_range() documents: "this is only safe if the mm semaphore is
> > > held when called." which doesn't spell out if it needs to be held in
> > > write mode (which I think it does) :)
> > 
> > Logically this makes sense to me. At the same time it looks like
> > fixup_user_fault() expects the caller to only hold mmap_read_lock() as
> > I do here. In there it even retakes mmap_read_lock(). But then wouldn't
> > any fault handling by its nature need to hold the write lock?
> 
> Well, if you're calling remap_pfn_range() right now the expectation is 
> that we hold it in write mode. :)
> 
> Staring at some random users, they all call it from mmap(), where you 
> hold the mmap lock in write mode.
> 
> 
> I wonder why we are not seeing that splat with vfio all of the time?
> 
> That mmap lock check was added "recently". In 1c71222e5f23 we started 
> using vm_flags_set(). That (including the mmap_assert_write_locked()) 
> check was added via bc292ab00f6c almost 1.5 years ago.
> 
> Maybe vfio is a bit special and was never really run with lockdep?
> 
> > 
> > > 
> > > 
> > > My best guess is: if you are using remap_pfn_range() from a fault
> > > handler (not during mmap time) you are doing something wrong, that's why
> > > you get that report.
> > 
> > @Alex: I guess so far the vfio_pci_mmap_fault() handler is only ever
> > triggered by "normal"/"actual" page faults where this isn't a problem?
> > Or could it be a problem there too?
> > 
> 
> I think we should see it there as well, unless I am missing something.

Well, good news for me, bad news for everyone else: I just reproduced
the same problem on my x86_64 workstation. I "ported over" (hacked
until it compiled) an x86 version of my trivial vfio-pci user-space
test code, which mmap()s BAR 0 of an NVMe device and does an MMIO read
of the NVMe version field at offset 8. On my x86_64 box this leads to
the following splat (still on v6.10-rc1):

[  555.396773] ------------[ cut here ]------------
[  555.396774] WARNING: CPU: 3 PID: 1424 at include/linux/rwsem.h:85 remap_pfn_range_notrack+0x625/0x650
[  555.396778] Modules linked in: vfio_pci <-- 8< -->
[  555.396877] CPU: 3 PID: 1424 Comm: vfio-test Tainted: G        W          6.10.0-rc1-niks-00007-gb19d6d864df1 #4 d09afec01ce27ca8218580af28295f25e2d2ed53
[  555.396880] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Creator, BIOS P3.40 01/28/2021
[  555.396881] RIP: 0010:remap_pfn_range_notrack+0x625/0x650
[  555.396884] Code: a8 00 00 00 75 39 44 89 e0 48 81 c4 b0 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d e9 26 a7 e5 00 cc 0f 0b 41 bc ea ff ff ff eb c9 <0f> 0b 49 8b 47 10 e9 72 fa ff ff e8 8b 56 b5 ff e9 c0 fa ff ff e8
[  555.396887] RSP: 0000:ffffaf8b04ed3bc0 EFLAGS: 00010246
[  555.396889] RAX: ffff9ea747cfe300 RBX: 00000000000ee200 RCX: 0000000000000100
[  555.396890] RDX: 00000000000ee200 RSI: ffff9ea747cfe300 RDI: ffff9ea76db58fd0
[  555.396892] RBP: 00000000ffffffea R08: 8000000000000035 R09: 0000000000000000
[  555.396894] R10: ffff9ea76d9bbf40 R11: ffffffff96e5ce50 R12: 0000000000004000
[  555.396895] R13: 00007f23b988a000 R14: ffff9ea76db58fd0 R15: ffff9ea76db58fd0
[  555.396897] FS:  00007f23b9561740(0000) GS:ffff9eb66e780000(0000) knlGS:0000000000000000
[  555.396899] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  555.396901] CR2: 00007f23b988a008 CR3: 0000000136bde000 CR4: 0000000000350ef0
[  555.396903] Call Trace:
[  555.396904]  <TASK>
[  555.396905]  ? __warn+0x18c/0x2a0
[  555.396908]  ? remap_pfn_range_notrack+0x625/0x650
[  555.396911]  ? report_bug+0x1bb/0x270
[  555.396915]  ? handle_bug+0x42/0x70
[  555.396917]  ? exc_invalid_op+0x1a/0x50
[  555.396920]  ? asm_exc_invalid_op+0x1a/0x20
[  555.396923]  ? __pfx_is_ISA_range+0x10/0x10
[  555.396926]  ? remap_pfn_range_notrack+0x625/0x650
[  555.396929]  ? asm_exc_invalid_op+0x1a/0x20
[  555.396933]  ? track_pfn_remap+0x170/0x180
[  555.396936]  remap_pfn_range+0x6f/0xc0
[  555.396940]  vfio_pci_mmap_fault+0xf3/0x1b0 [vfio_pci_core 6df3b7ac5dcecb63cb090734847a65c799a8fef2]
[  555.396946]  __do_fault+0x11b/0x210
[  555.396949]  do_pte_missing+0x239/0x1350
[  555.396953]  handle_mm_fault+0xb10/0x18b0
[  555.396959]  do_user_addr_fault+0x293/0x710
[  555.396963]  exc_page_fault+0x82/0x1c0
[  555.396966]  asm_exc_page_fault+0x26/0x30
[  555.396968] RIP: 0033:0x55b0ea8bb7ac
[  555.396972] Code: 00 00 b0 00 e8 e5 f8 ff ff 31 c0 48 83 c4 20 5d c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 89 7d f8 48 8b 45 f8 <8b> 00 89 c0 5d c3 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48
[  555.396974] RSP: 002b:00007fff80973530 EFLAGS: 00010202
[  555.396976] RAX: 00007f23b988a008 RBX: 00007fff80973738 RCX: 00007f23b988a000
[  555.396978] RDX: 0000000000000001 RSI: 00007fff809735e8 RDI: 00007f23b988a008
[  555.396979] RBP: 00007fff80973530 R08: 0000000000000005 R09: 0000000000000000
[  555.396981] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
[  555.396982] R13: 0000000000000000 R14: 00007f23b98c8000 R15: 000055b0ea8bddc0
[  555.396986]  </TASK>
[  555.396987] ---[ end trace 0000000000000000 ]---
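
For readers who want to picture the access that faults, here is a minimal
sketch of the read described above. It is not the actual test program: the
VFIO setup (container, group, device fd, and the mmap() of BAR 0) is left
out, and the helper names mmio_read32() and nvme_decode_vs() are made up
for illustration. The decoding assumes the version register layout from
the NVMe base specification (major in bits 31:16, minor in bits 15:8,
tertiary in bits 7:0). With a real mapping from vfio-pci, the first such
read is what enters vfio_pci_mmap_fault() in the trace above.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the NVMe version field within BAR 0, as stated in the message. */
#define NVME_REG_VS 0x08

/* Hypothetical helper type: decoded NVMe version number. */
struct nvme_version {
	unsigned int major;
	unsigned int minor;
	unsigned int tertiary;
};

/*
 * Hypothetical helper: one 32-bit read at a byte offset from a mapped base.
 * The volatile qualifier keeps the compiler from eliding or merging the load,
 * which matters when base points at device memory rather than RAM.
 */
static inline uint32_t mmio_read32(const volatile void *base, size_t offset)
{
	const volatile uint8_t *p = (const volatile uint8_t *)base;

	return *(const volatile uint32_t *)(p + offset);
}

/* Hypothetical helper: split the raw register value into its fields. */
static inline struct nvme_version nvme_decode_vs(uint32_t vs)
{
	struct nvme_version v = {
		.major = (vs >> 16) & 0xffff,
		.minor = (vs >> 8) & 0xff,
		.tertiary = vs & 0xff,
	};

	return v;
}
```

A caller would pass the pointer returned by mmap() on the vfio device fd as
base, i.e. nvme_decode_vs(mmio_read32(bar0, NVME_REG_VS)). In the splat the
faulting address CR2 ends in ...a008, which matches a read at offset 8 from
a page-aligned mapping.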


