Re: [PATCH] vfio/type1: Unpin zero pages

From: Jason Gunthorpe <jgg@ziepe.ca>
To: David Hildenbrand <david@redhat.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"lpivarc@redhat.com" <lpivarc@redhat.com>,
	"Liu, Jingqi" <jingqi.liu@intel.com>,
	"Lu, Baolu" <baolu.lu@intel.com>
Subject: Re: [PATCH] vfio/type1: Unpin zero pages
Date: Wed, 7 Sep 2022 09:48:59 -0300
Message-ID: <YxiTOyGqXHFkR/DY@ziepe.ca>
In-Reply-To: <b365f30b-da58-39c0-08e9-c622cc506afa@redhat.com>

On Wed, Sep 07, 2022 at 11:00:21AM +0200, David Hildenbrand wrote:
> > > I do wonder if that's a real issue, though. One approach would be to
> > > warn the VFIO users and allow for slightly exceeding the MEMLOCK limit
> > > for a while. Of course, that only works if we assume that such pinned
> > > zeropages are only extremely rarely longterm-pinned for a single VM
> > > instance by VFIO.
> > 
> > I'm confused, doesn't vfio increment the memlock for every page of VA
> > it pins? Why would it matter if the page was COW'd or not? It is
> > already accounted for today as though it was a unique page.
> > 
> > IOW if we add FOLL_FORCE it won't change the value of the memlock.
> 
> I only briefly skimmed over the code; Alex might be able to provide more
> details and correct me if I'm wrong:
> 
> vfio_pin_pages_remote() contains a comment:
> 
> "Reserved pages aren't counted against the user, externally pinned pages are
> already counted against the user."
> 
> is_invalid_reserved_pfn() should return "true" for the shared zeropage and
> prevent us from accounting it via vfio_lock_acct(). Otherwise,
> vfio_find_vpfn() seems to be in place to avoid double-accounting pages.

is_invalid_reserved_pfn() is supposed to return 'true' for PFNs that
cannot be returned from pin_user_pages():

/*
 * Some mappings aren't backed by a struct page, for example an mmap'd
 * MMIO range for our own or another device.  These use a different
 * pfn conversion and shouldn't be tracked as locked pages.
 * For compound pages, any driver that sets the reserved bit in head
 * page needs to set the reserved bit in all subpages to be safe.
 */
static bool is_invalid_reserved_pfn(unsigned long pfn)

What it is talking about by 'different pfn conversion' is the
follow_fault_pfn() path, not the PUP path.

So, it is some way for VFIO to keep track of when a pfn was returned
by PUP vs follow_fault_pfn(), because it treats those two paths quite
differently.
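
To make the accounting consequence concrete, here is a minimal userspace
sketch, not the kernel code. Every name in it (MODEL_MMIO_BASE,
model_is_invalid_reserved_pfn(), model_count_locked()) is hypothetical, and
the rule "pfn >= MODEL_MMIO_BASE means no struct page" is only an assumption
standing in for the follow_fault_pfn() case. It shows that only pfns the
classifier treats as coming from PUP get counted as locked:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumption for this model only: pfns at or above MODEL_MMIO_BASE stand in
 * for mappings that have no struct page (the follow_fault_pfn() case).
 */
#define MODEL_MMIO_BASE 1000UL

/* Hypothetical classifier: true means "do not track as a locked page". */
static bool model_is_invalid_reserved_pfn(unsigned long pfn)
{
	return pfn >= MODEL_MMIO_BASE;
}

/* Count how many pfns in a batch would be charged as locked pages. */
static size_t model_count_locked(const unsigned long *pfns, size_t n)
{
	size_t locked = 0;

	for (size_t i = 0; i < n; i++) {
		if (!model_is_invalid_reserved_pfn(pfns[i]))
			locked++;
	}
	return locked;
}
```

In this model, a pfn that the classifier wrongly reports as invalid/reserved
is never charged, which is the accounting question raised in the quoted text.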

I lost track of what the original cause of this bug is - however AFAIK
pin_user_pages() used to succeed when the zero page is mapped.

No other PUP user calls this hacky follow_fault_pfn() path, and we
expect things like O_DIRECT to work properly even when reading from VA
that has the zero page mapped.

So, if we go back far enough in the git history we will find a case
where PUP is returning something for the zero page, and that something
caused is_invalid_reserved_pfn() == false since VFIO did work at some
point.

IMHO we should simply go back to the historical behavior: make
is_invalid_reserved_pfn() check for the zero_pfn and return
false, meaning that PUP returned it.
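
Roughly what I have in mind, as a userspace model rather than a patch. The
mock_* helpers, struct mock_page and the four-entry mock_mem_map are all
hypothetical stand-ins for pfn_valid(), is_zero_pfn(), struct page and the
reserved bit; placing the zero page at pfn 0 and marking it reserved are
assumptions made only for this sketch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the reserved bit is modeled. */
struct mock_page {
	bool reserved;
};

#define MOCK_NR_PAGES 4UL
#define MOCK_ZERO_PFN 0UL	/* assumption: zero page lives at pfn 0 here */

static const struct mock_page mock_mem_map[MOCK_NR_PAGES] = {
	{ .reserved = true  },	/* pfn 0: shared zero page, assumed reserved */
	{ .reserved = false },	/* pfn 1: ordinary RAM */
	{ .reserved = true  },	/* pfn 2: some other reserved page */
	{ .reserved = false },	/* pfn 3: ordinary RAM */
};

static bool mock_pfn_valid(unsigned long pfn)
{
	return pfn < MOCK_NR_PAGES;
}

static bool mock_is_zero_pfn(unsigned long pfn)
{
	return pfn == MOCK_ZERO_PFN;
}

/* Model of the proposed check: the zero page counts as "returned by PUP". */
static bool is_invalid_reserved_pfn_model(unsigned long pfn)
{
	if (mock_is_zero_pfn(pfn))
		return false;

	if (mock_pfn_valid(pfn))
		return mock_mem_map[pfn].reserved;

	/* No struct page behind this pfn at all. */
	return true;
}
```

With the zero-pfn test first, pfn 0 is treated like ordinary RAM even though
its reserved bit is set, while pfn 2 and any pfn outside the map are still
reported as invalid/reserved.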

Jason

Thread overview: 20+ messages
2022-08-30  3:05 [PATCH] vfio/type1: Unpin zero pages Alex Williamson
2022-08-30  5:34 ` Sean Christopherson
2022-08-30  7:59 ` David Hildenbrand
2022-08-30 15:11   ` Alex Williamson
2022-08-30 15:43     ` David Hildenbrand
2022-09-02  0:53       ` Jason Gunthorpe
2022-09-02  8:24 ` Tian, Kevin
2022-09-02  8:32   ` David Hildenbrand
2022-09-02  9:09     ` Tian, Kevin
2022-09-06 23:30     ` Jason Gunthorpe
2022-09-07  9:00       ` David Hildenbrand
2022-09-07 12:48         ` Jason Gunthorpe [this message]
2022-09-07 15:55           ` Alex Williamson
2022-09-07 16:40             ` Jason Gunthorpe
2022-09-07 18:56               ` Alex Williamson
2022-09-07 19:14                 ` David Hildenbrand
2022-09-07 19:55                 ` Jason Gunthorpe
2022-09-07 20:24                   ` Alex Williamson
2022-09-07 23:07                     ` Jason Gunthorpe
2022-09-08  1:10                       ` Alex Williamson
