From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: linux-next@vger.kernel.org, akpm@linux-foundation.org,
jack@suse.cz, kirill@shutemov.name, borntraeger@de.ibm.com,
david@redhat.com, aarcange@redhat.com, linux-mm@kvack.org,
frankja@linux.ibm.com, sfr@canb.auug.org.au, jhubbard@nvidia.com,
linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
Will Deacon <will@kernel.org>,
Sean Christopherson <sean.j.christopherson@intel.com>
Subject: Re: [PATCH v4 2/2] mm/gup/writeback: add callbacks for inaccessible pages
Date: Wed, 15 Apr 2020 11:26:39 +0200 [thread overview]
Message-ID: <20200415112639.525e25bc@p-imbrenda> (raw)
In-Reply-To: <93dc9885-adb4-8b9d-a62a-e40301053551@intel.com>
On Tue, 14 Apr 2020 11:50:16 -0700
Dave Hansen <dave.hansen@intel.com> wrote:
> On 4/14/20 9:03 AM, Claudio Imbrenda wrote:
> > On Mon, 13 Apr 2020 13:22:24 -0700
> > Dave Hansen <dave.hansen@intel.com> wrote:
> >
> >> On 3/6/20 5:25 AM, Claudio Imbrenda wrote:
> >>> On s390x the function is not supposed to fail, so it is ok to use
> >>> a WARN_ON on failure. If we ever need some more finegrained
> >>> handling we can tackle this when we know the details.
> >>
> >> Could you explain a bit why the function can't fail?
> >
> > the concept of "making accessible" is only to make sure that
> > accessing the page will not trigger faults or I/O or DMA errors. in
> > general it does not mean freely accessing the content of the page
> > in cleartext.
> >
> > on s390x, protected guest pages can be shared. the guest has to
> > actively share its pages, and in that case those pages are both
> > part of the protected VM and freely accessible by the host.
>
> Oh, that's interesting.
>
> It sounds like there are three separate concepts:
> 1. Protection
> 2. Sharing
> 3. Accessibility
>
> Protected pages may be shared and the request of the guest.
> Shared pages' plaintext can be accessed by the host. For unshared
> pages, the host can only see ciphertext.
>
> I wonder if Documentation/virt/kvm/s390-pv.rst can be beefed up with
> some of this information. It seems a bit sparse on this topic.
that is definitely something that can be fixed.
I will improve the documentation and make sure it properly explains
all the details of how protected VMs work on s390x.
> As it stands, if I were modifying generic code, I don't think I'd have
> even a chance of getting an arch_make_page_accessible() in the right
> spot.
>
> > in our case "making the page accessible" means:
> ...
> > - if the page was not shared, first encrypt it and then make it
> > accessible to the host (both operations performed securely and
> > atomically by the hardware)
>
> What happens to the guest's view of the page when this happens? Does
> it keep seeing plaintext?
>
> > then the page can be swapped out, or used for direct I/O (obviously
> > if you do I/O on a page that was not shared, you cannot expect good
> > things to happen, since you basically corrupt the memory of the
> > guest).
>
> So why even allow access to the encrypted contents if the host can't
> do anything useful with it? Is there some reason for going to the
> trouble of encrypting it and exposing it to the host?
you should not overwrite it, but you can/should write it out verbatim,
e.g. for swap
> > on s390x performing I/O directly on protected pages results in (in
> > practice) unrecoverable I/O errors, so we want to avoid it at all
> > costs.
>
> This is understandable, but we usually steer I/O operations in places
> like the DMA API, not in the core VM.
>
> We *have* the concept of pages to which I/O can't be done. There are
> plenty of crippled devices where we have to bounce data into a low
> buffer before it can go where we really want it to. I think the AMD
> SEV patches do this, for instance.
>
> > accessing protected pages from the CPU triggers an exception that
> > can be handled (and we do handle it, in fact)
> >
> > now imagine a buggy or malicious qemu process crashing the whole
> > machine just because it did I/O to/from a protected page. we
> > clearly don't want that.
>
> Is DMA disallowed to *all* protected pages? Even pages which the
> guest has explicitly shared with the host?
>
>
> >>> @@ -2807,6 +2807,13 @@ int __test_set_page_writeback(struct page
> >>> *page, bool keep_write) inc_zone_page_state(page,
> >>> NR_ZONE_WRITE_PENDING); }
> >>> unlock_page_memcg(page);
> >>> + access_ret = arch_make_page_accessible(page);
> >>> + /*
> >>> + * If writeback has been triggered on a page that cannot
> >>> be made
> >>> + * accessible, it is too late to recover here.
> >>> + */
> >>> + VM_BUG_ON_PAGE(access_ret != 0, page);
> >>> +
> >>> return ret;
> >>>
> >>> }
> >>
> >> This seems like a really odd place to do this. Writeback is
> >> specific to block I/O. I would have thought there were other
> >> kinds of devices that matter, not just block devices.
> >
> > well, yes and no. for writeback (block I/O and swap) this is the
> > right place. at this point we know that the page is present and
> > nobody else has started doing I/O yet, and I/O will happen
> > soon-ish. so we make the page accessible. there is no turning back
> > here, unlike pinning. we are not allowed to fail, we can't
>
> This description sounds really incomplete to me.
>
> Not all swap involved device I/O. For instance, zswap doesn't involve
> any devices. Would zswap need this hook?
please feel free to write to me privately if you have any further
questions or doubts :)
best regards,
Claudio Imbrenda
next prev parent reply other threads:[~2020-04-15 9:26 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-06 13:25 [PATCH v4 0/2] add callbacks for inaccessible pages Claudio Imbrenda
2020-03-06 13:25 ` [PATCH v4 1/2] mm/gup: fixup for 9947ea2c1e608e32 "mm/gup: track FOLL_PIN pages" Claudio Imbrenda
2020-03-06 13:25 ` [PATCH v4 2/2] mm/gup/writeback: add callbacks for inaccessible pages Claudio Imbrenda
2020-04-13 20:22 ` Dave Hansen
2020-04-14 16:03 ` Claudio Imbrenda
2020-04-14 18:50 ` Dave Hansen
2020-04-15 9:26 ` Claudio Imbrenda [this message]
2020-04-15 11:39 ` Janosch Frank
2020-04-15 21:52 ` Dave Hansen
2020-04-15 22:17 ` Peter Zijlstra
2020-04-15 23:34 ` Dave Hansen
2020-04-16 12:15 ` Claudio Imbrenda
2020-04-16 14:20 ` Dave Hansen
2020-04-16 14:59 ` Claudio Imbrenda
2020-04-16 15:36 ` Dave Hansen
2020-04-16 16:34 ` Claudio Imbrenda
2020-04-16 19:02 ` Dave Hansen
2020-04-21 21:31 ` Dave Hansen
2020-04-28 19:43 ` Dave Hansen
2020-04-28 20:02 ` Christian Borntraeger
2020-04-28 23:39 ` Claudio Imbrenda
2020-04-29 0:42 ` Dave Hansen
2020-04-16 11:51 ` Claudio Imbrenda
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200415112639.525e25bc@p-imbrenda \
--to=imbrenda@linux.ibm.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=borntraeger@de.ibm.com \
--cc=dave.hansen@intel.com \
--cc=david@redhat.com \
--cc=frankja@linux.ibm.com \
--cc=jack@suse.cz \
--cc=jhubbard@nvidia.com \
--cc=kirill@shutemov.name \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-next@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=sean.j.christopherson@intel.com \
--cc=sfr@canb.auug.org.au \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).