Re: [RFC PATCH v2 1/6] KVM: gmem: Truncate pages on punch hole

From: Michael Roth <michael.roth@amd.com>
To: Sean Christopherson <seanjc@google.com>
Cc: <isaku.yamahata@intel.com>, <kvm@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <isaku.yamahata@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>, <erdemaktas@google.com>,
	Sagi Shahar <sagis@google.com>,
	David Matlack <dmatlack@google.com>,
	Kai Huang <kai.huang@intel.com>,
	Zhi Wang <zhi.wang.linux@gmail.com>, <chen.bo@intel.com>,
	<linux-coco@lists.linux.dev>,
	Chao Peng <chao.p.peng@linux.intel.com>,
	Ackerley Tng <ackerleytng@google.com>,
	Vishal Annapurve <vannapurve@google.com>,
	Yuan Yao <yuan.yao@linux.intel.com>,
	Jarkko Sakkinen <jarkko@kernel.org>,
	Xu Yilun <yilun.xu@intel.com>,
	Quentin Perret <qperret@google.com>, <wei.w.wang@intel.com>,
	Fuad Tabba <tabba@google.com>
Subject: Re: [RFC PATCH v2 1/6] KVM: gmem: Truncate pages on punch hole
Date: Thu, 5 Oct 2023 12:52:38 -0500
Message-ID: <20231005175238.7bb2zut4fb7ebdqc@amd.com>
In-Reply-To: <ZQy29msIoAGQUGR2@google.com>

On Thu, Sep 21, 2023 at 02:34:46PM -0700, Sean Christopherson wrote:
> On Thu, Sep 21, 2023, Sean Christopherson wrote:
> > On Thu, Sep 21, 2023, isaku.yamahata@intel.com wrote:
> > > From: Isaku Yamahata <isaku.yamahata@intel.com>
> > > 
> > > Although kvm_gmem_punch_hole() keeps all pages in mapping on punching hole,
> > > it's common expectation that pages are truncated.  Truncate pages on
> > > punching hole.  As page contents can be encrypted, avoid zeroing partial
> > > folio by refusing partial punch hole.
> > > 
> > > Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
> > > ---
> > >  virt/kvm/guest_mem.c | 14 ++++++++++++--
> > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
> > > index a819367434e9..01fb4ca861d0 100644
> > > --- a/virt/kvm/guest_mem.c
> > > +++ b/virt/kvm/guest_mem.c
> > > @@ -130,22 +130,32 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
> > >  static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
> > >  {
> > >  	struct list_head *gmem_list = &inode->i_mapping->private_list;
> > > +	struct address_space *mapping  = inode->i_mapping;
> > >  	pgoff_t start = offset >> PAGE_SHIFT;
> > >  	pgoff_t end = (offset + len) >> PAGE_SHIFT;
> > >  	struct kvm_gmem *gmem;
> > >  
> > > +	/*
> > > +	 * punch hole may result in zeroing partial area.  As pages can be
> > > +	 * encrypted, prohibit zeroing partial area.
> > > +	 */
> > > +	if (offset & ~PAGE_MASK || len & ~PAGE_MASK)
> > > +		return -EINVAL;
> > 
> > This should be unnecessary, kvm_gmem_fallocate() does
> > 
> > 	if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
> > 		return -EINVAL;
> > 
> > before invoking kvm_gmem_punch_hole().  If that's not working, i.e. your test
> > fails, then that code needs to be fixed.  I'll run your test to double-check,
> > but AFAICT this is unnecessary.
> 
> I confirmed that the testcase passes without the extra checks.  Just to close the
> loop, what prompted adding more checks to kvm_gmem_punch_hole()?

I don't know if it's the same issue that Isaku ran into, but for SNP we
hit a similar issue with the truncate_inode_pages_range(lstart, lend) call.

The issue in that case was a bit more subtle:

  - userspace does a hole-punch on a 4K range of its gmem FD, which happens
    to be backed by a 2MB folio.
  - truncate_inode_pages_range() gets called for that 4K range
  - truncate_inode_pages_range() does special handling on the folios at the
    start/end of the range in case they are partial and passes these to
    truncate_inode_partial_folio(folio, lstart, lend). In this case, there's
    just the 1 backing folio. But it *still* gets the special treatment, and
    so gets passed to truncate_inode_partial_folio().
  - truncate_inode_partial_folio() will then zero that 4K range, even though
    it is page-aligned, based on the following rationale in the comments:

        /*
         * We may be zeroing pages we're about to discard, but it avoids
         * doing a complex calculation here, and then doing the zeroing
         * anyway if the page split fails.
         */
        folio_zero_range(folio, offset, length);

  - after that, the .invalidate_folio callback is issued and the folio is
    split. The caller (truncate_inode_pages_range()) then does another pass
    through the whole range and can free the now-split folio, at which point
    the .free_folio callbacks are issued.
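
To make the ordering above concrete, here is a toy userspace model. It is
not kernel code: every model_* name is made up for illustration, and the
"folio" is just a 2MB byte array. It only shows that a hook run from the
invalidate step sees the 4K range already zeroed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy userspace model; none of these names are kernel APIs. */
#define MODEL_PAGE_SIZE  4096u
#define MODEL_FOLIO_SIZE (2u * 1024u * 1024u)	/* one 2MB folio */

struct model_folio {
	unsigned char data[MODEL_FOLIO_SIZE];
	bool zeroed_before_invalidate;	/* recorded by the invalidate hook */
};

/* Return true if every byte in [off, off + len) is zero. */
static bool model_range_is_zero(const struct model_folio *f,
				size_t off, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (f->data[off + i])
			return false;
	return true;
}

/* Stand-in for an arch hook that would run from .invalidate_folio. */
static void model_invalidate(struct model_folio *f, size_t off, size_t len)
{
	f->zeroed_before_invalidate = model_range_is_zero(f, off, len);
}

/* Mirrors the ordering described above: zero first, then invalidate. */
void model_truncate_partial(struct model_folio *f, size_t off, size_t len)
{
	assert(off + len <= MODEL_FOLIO_SIZE);
	memset(f->data + off, 0, len);	/* models the folio_zero_range() step */
	model_invalidate(f, off, len);	/* the callback only runs afterwards */
}
```

Filling a model_folio with a non-zero pattern and calling
model_truncate_partial(&f, MODEL_PAGE_SIZE, MODEL_PAGE_SIZE) leaves
f.zeroed_before_invalidate set, i.e. the hook was too late to act before the
write to the page.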

Because of that, we can't rely on .invalidate_folio/.free_folio to handle
putting the page back into a normal host-accessible state, because the
zero'ing will happen beforehand. That's why we ended up needing to do this
for the SNP patches, to make sure arch-specific invalidation callbacks are
issued before the truncation occurs:

  https://github.com/mdroth/linux/commit/4ebcc04b84dd691fc6daccb9b7438402520b0704#diff-77306411fdaeb7f322a1ca756dead9feb75363aa6117b703ac118576153ddb37R233

I'd planned to post those as a separate RFC to discuss, but when I came across
this it seemed like it might be relevant to what the TDX folks might have run
into here.

If not for the zero'ing logic mentioned above, for SNP at least, the
.free_folio() callback ends up working pretty nicely for both truncation and
fput(). It even plays nicely with the live update use-case, where the
destination gmem instance shares the inode->i_mapping: iput() won't trigger
truncate_inode_pages_final() until the last reference goes away, so we don't
have to do anything special in kvm_gmem_release() to determine when we
should/shouldn't issue the arch-invalidations to clean up things like the
RMP table.

It seems like the above zero'ing logic could be reworked to only zero non-page
aligned ranges (as the comments above truncate_inode_pages_range() claim
should be the case), which would avoid the issue for the gmem use-case. But I
wonder if some explicit "dont-zero-these-pages" flag might be more robust.
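
As a rough illustration of the "only zero non-page-aligned ranges" idea,
here is a userspace sketch that splits a byte range into the sub-page head
and tail that would still need zeroing. The sketch_* names and the fixed 4K
page size are assumptions for the example; this is not how the mm code is
structured today.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

struct sketch_range {
	size_t off;
	size_t len;
};

/*
 * Split [off, off + len) into the sub-page head and tail pieces.  Whole
 * pages in between are skipped, so a fully page-aligned range yields two
 * empty pieces and nothing would be zeroed.
 */
void sketch_unaligned_parts(size_t off, size_t len,
			    struct sketch_range *head,
			    struct sketch_range *tail)
{
	const size_t mask = ~(size_t)(SKETCH_PAGE_SIZE - 1);
	size_t end = off + len;
	size_t head_end = (off + SKETCH_PAGE_SIZE - 1) & mask;	/* round up */
	size_t tail_start = end & mask;				/* round down */

	assert(head && tail);

	/* Range ends inside its first page: the head covers all of it. */
	if (head_end > end)
		head_end = end;
	head->off = off;
	head->len = head_end - off;

	/* Don't let the tail overlap bytes already covered by the head. */
	if (tail_start < head_end)
		tail_start = head_end;
	tail->off = tail_start;
	tail->len = end - tail_start;
}
```

With this split, the 4K hole-punch inside a 2MB folio from the scenario
above (offset and length both page-aligned) produces empty head and tail
pieces, while a range such as offset 100, length 8192 still gets its
partial first and last pages reported.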

Or maybe there's some other way we should be going about this?

