public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>,
	Jason Gunthorpe <jgg@ziepe.ca>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"lpivarc@redhat.com" <lpivarc@redhat.com>,
	"Liu, Jingqi" <jingqi.liu@intel.com>,
	"Lu, Baolu" <baolu.lu@intel.com>
Subject: Re: [PATCH] vfio/type1: Unpin zero pages
Date: Wed, 7 Sep 2022 21:14:54 +0200	[thread overview]
Message-ID: <7ed97088-b4f0-e36f-c935-87cd1e94c574@redhat.com> (raw)
In-Reply-To: <20220907125627.0579e592.alex.williamson@redhat.com>

On 07.09.22 20:56, Alex Williamson wrote:
> On Wed, 7 Sep 2022 13:40:52 -0300
> Jason Gunthorpe <jgg@ziepe.ca> wrote:
> 
>> On Wed, Sep 07, 2022 at 09:55:52AM -0600, Alex Williamson wrote:
>>
>>>> So, if we go back far enough in the git history we will find a case
>>>> where PUP is returning something for the zero page, and that something
>>>> caused is_invalid_reserved_pfn() == false since VFIO did work at some
>>>> point.
>>>
>>> Can we assume that?  It takes a while for a refcount leak on the zero
>>> page to cause an overflow.  My assumption is that it's never worked, we
>>> pin zero pages, don't account them against the locked memory limits
>>> because our is_invalid_reserved_pfn() test returns true, and therefore
>>> we don't unpin them.
>>
>> Oh, you think it has been buggy forever? That is not great..
>>   
>>>> IHMO we should simply go back to the historical behavior - make
>>>> is_invalid_reserved_pfn() check for the zero_pfn and return
>>>> false. Meaning that PUP returned it.
>>>
>>> We've never explicitly tested for zero_pfn and as David notes,
>>> accounting the zero page against the user's locked memory limits has
>>> user visible consequences.  VMs that worked with a specific locked
>>> memory limit may no longer work.
>>
>> Yes, but the question is if that is a strict ABI we have to preserve,
>> because if you take that view it also means because VFIO has this
>> historical bug that David can't fix the FOLL_FORCE issue either.
>>
>> If the view holds for memlock then it should hold for cgroups
>> also. This means the kernel can never change anything about
>> GFP_KERNEL_ACCOUNT allocations because it might impact userspace
>> having set a tight limit there.
>>
>> It means we can't fix the bug that VFIO is using the wrong storage for
>> memlock.
>>
>> It means qemu can't change anything about how it sets up this memory,
>> ie Kevin's idea to change the ordering.
>>
>> On the other hand the "abi break" we are talking about is that a user
>> might have to increase a configured limit in a config file after a
>> kernel upgrade.
>>
>> IDK what consensus exists here, I've never heard of anyone saying
>> these limits are a strict ABI like this.. I think at least for cgroup
>> that would be so invasive as to immediately be off the table.
> 
> I thought we'd already agreed that we were stuck with locked_vm for
> type1 and any compatibility mode of type1 due to this.  Native iommufd
> support can do the right thing since userspace will need to account for
> various new usage models anyway.
> 
> I've raised the issue with David for the zero page accounting, but I
> don't know what the solution is.  libvirt automatically adds a 1GB
> fudge factor to the VM locked memory limits to account for things like
> ROM mappings, or at least the non-zeropage backed portion of those
> ROMs.  I think that most management tools have adopted similar, so the
> majority of users shouldn't notice.  However this won't cover all
> users, so we certainly risk breaking userspace if we introduce hard
> page accounting of zero pages.
> 
> I think David suggested possibly allowing some degree of exceeding
> locked memory limits for zero page COWs.  Potentially type1 could do
> this as well; special case handling with some heuristically determined,
> module parameter defined limit.  We might also consider whether we
> could just ignore zero page mappings, maybe with a optional "strict"
> mode module option to generate an errno on such mappings.  Thanks,

So far I played with the ideas

a) allow slightly exceeding the limit and warn

b) weird vfio kernel parameter to control the zeropage behavior (old vs.
    new)

I certainly have in mind that we might need some toggle for vfio.

-- 
Thanks,

David / dhildenb


  reply	other threads:[~2022-09-07 19:15 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-30  3:05 [PATCH] vfio/type1: Unpin zero pages Alex Williamson
2022-08-30  5:34 ` Sean Christopherson
2022-08-30  7:59 ` David Hildenbrand
2022-08-30 15:11   ` Alex Williamson
2022-08-30 15:43     ` David Hildenbrand
2022-09-02  0:53       ` Jason Gunthorpe
2022-09-02  8:24 ` Tian, Kevin
2022-09-02  8:32   ` David Hildenbrand
2022-09-02  9:09     ` Tian, Kevin
2022-09-06 23:30     ` Jason Gunthorpe
2022-09-07  9:00       ` David Hildenbrand
2022-09-07 12:48         ` Jason Gunthorpe
2022-09-07 15:55           ` Alex Williamson
2022-09-07 16:40             ` Jason Gunthorpe
2022-09-07 18:56               ` Alex Williamson
2022-09-07 19:14                 ` David Hildenbrand [this message]
2022-09-07 19:55                 ` Jason Gunthorpe
2022-09-07 20:24                   ` Alex Williamson
2022-09-07 23:07                     ` Jason Gunthorpe
2022-09-08  1:10                       ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7ed97088-b4f0-e36f-c935-87cd1e94c574@redhat.com \
    --to=david@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=baolu.lu@intel.com \
    --cc=jgg@ziepe.ca \
    --cc=jingqi.liu@intel.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lpivarc@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox