Re: [PATCH] vfio/type1: conditional rescheduling while pinning

From: Alex Williamson <alex.williamson@redhat.com>
To: Keith Busch <kbusch@kernel.org>
Cc: Keith Busch <kbusch@meta.com>, kvm@vger.kernel.org
Subject: Re: [PATCH] vfio/type1: conditional rescheduling while pinning
Date: Wed, 19 Mar 2025 12:17:04 -0600
Message-ID: <20250319121704.7744c73e.alex.williamson@redhat.com>
In-Reply-To: <Z9rm-Y-B2et9uvKc@kbusch-mbp>

On Wed, 19 Mar 2025 09:47:05 -0600
Keith Busch <kbusch@kernel.org> wrote:

> On Mon, Mar 17, 2025 at 04:53:47PM -0600, Alex Williamson wrote:
> > On Mon, 17 Mar 2025 16:30:47 -0600
> > Keith Busch <kbusch@kernel.org> wrote:
> >   
> > > On Mon, Mar 17, 2025 at 03:44:17PM -0600, Alex Williamson wrote:  
> > > > On Wed, 12 Mar 2025 15:52:55 -0700    
> > > > > @@ -679,6 +679,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
> > > > >  
> > > > >  		if (unlikely(disable_hugepages))
> > > > >  			break;
> > > > > +		cond_resched();
> > > > >  	}
> > > > >  
> > > > >  out:    
> > > > 
> > > > Hey Keith, is this still necessary with:
> > > > 
> > > > https://lore.kernel.org/all/20250218222209.1382449-1-alex.williamson@redhat.com/    
> > > 
> > > Thank you for the suggestion. I'll try to fold this into a build, and
> > > see what happens. But from what I can tell, I'm not sure it will help.
> > > We're simply not getting large folios in this path and dealing with
> > > individual pages. Though it is a large contiguous range (~60GB, not
> > > > necessarily aligned). Should we expect to only be dealing with PUD and
> > > PMD levels with these kinds of mappings?  
> > 
> > IME with QEMU, PMD alignment basically happens without any effort and
> > gets 90+% of the way there, PUD alignment requires a bit of work[1].
> >    
> > > > This is currently in linux-next from the vfio next branch and should
> > > > pretty much eliminate any stalls related to DMA mapping MMIO BARs.
> > > > Also the code here has been refactored in next, so this doesn't apply
> > > > anyway, and if there is a resched still needed, this location would
> > > > only affect DMA mapping of memory, not device BARs.  Thanks,    
> > > 
> > > > Thanks for the heads-up. Regardless, it doesn't look like a bad place to
> > > cond_resched(), but may not trigger any cpu stall indicator outside this
> > > vfio fault path.  
> > 
> > Note that we already have a cond_resched() in vfio_iommu_map(), which
> > we'll hit any time we get a break in a contiguous mapping.  We may hit
> > that regularly enough that it's not an issue for RAM mapping, but I've
> > certainly seen soft lockups when we have many GiB of contiguous pfnmaps
> > prior to the series above.  Thanks,  
> 
> So far adding the additional patches has not changed anything. We've
> ensured we are using an address and length aligned to 2MB, but it sure
> looks like vfio's fault handler is only getting order-0 faults. I'm not
> finding anything immediately obvious about what we can change to get the
> desired higher order behavior, though. Any other hints or information I
> could provide?

Since you mention folding in the changes, are you working on an upstream
kernel or a downstream backport?  Huge pfnmap support was added in
v6.12 via [1].  Without that you'd never see better than an order-0
fault.  I hope that's it because with all the kernel pieces in place it
should "Just work".  Thanks,

Alex

[1] https://lore.kernel.org/all/20240826204353.2228736-1-peterx@redhat.com/
