public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Hansen <dave.hansen@intel.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	luto@kernel.org, peterz@infradead.org,
	sathyanarayanan.kuppuswamy@linux.intel.com, aarcange@redhat.com,
	ak@linux.intel.com, dan.j.williams@intel.com, david@redhat.com,
	hpa@zytor.com, jgross@suse.com, jmattson@google.com,
	joro@8bytes.org, jpoimboe@redhat.com, knsathya@kernel.org,
	pbonzini@redhat.com, sdeep@vmware.com, seanjc@google.com,
	tony.luck@intel.com, vkuznets@redhat.com, wanpengli@tencent.com,
	thomas.lendacky@amd.com, brijesh.singh@amd.com, x86@kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCHv5 11/30] x86/tdx: Handle in-kernel MMIO
Date: Wed, 9 Mar 2022 17:06:11 -0800	[thread overview]
Message-ID: <da0056e8-58cf-2c95-fe66-4dad1ae9c4da@intel.com> (raw)
In-Reply-To: <20220310005145.hzv2lzxgs7uxblfr@black.fi.intel.com>

On 3/9/22 16:51, Kirill A. Shutemov wrote:
> On Tue, Mar 08, 2022 at 01:26:28PM -0800, Dave Hansen wrote:
>> Memory encryption has zero to do with this.  The TDX isolation
>> mechanisms are totally discrete from memory encryption, although they
>> are "neighbors" of sorts.
> 
> Hm. I don't see why you say encryption is not relevant. VMM (host kernel)
> has ultimate access to guest memory cypher text. It can read it as
> cypher text without any issue (using KeyID-0).
> 
> Could you elaborate on the point?

I think you're just confusing what TDX has with MKTME.  The whitepaper says:

> The TD-bit associated with the line in memory seeks to
> detect software or devices attempting to read memory
> encrypted with private KeyID, using a shared KeyID, to reveal
> the ciphertext. On such accesses, the MKTME returns a fixed
> pattern to prevent ciphertext analysis.

I think several firstborn were sacrificed to get that bit.  Let's not
forget why we have it. :)

>>> Rather than touching the entire kernel, it might also be possible to
>>> just go after drivers that use MMIO in TDX guests.  Right now, that's
>>> limited only to virtio and some x86-specific drivers.
>>>
>>> All virtio MMIO appears to be done through a single function, which
>>> makes virtio eminently easy to patch.
>>>
>>> This approach will be adopted in the future, removing the bulk of
>>> MMIO #VEs. The #VE-based MMIO will remain serving non-virtio use cases.
>>
>> This still doesn't *quite* do it for me for a justification.  Why can't
>> the non-virtio cases be converted as well?  Why doesn't the "patching
>> MMIO sites" work for x86 code too?
>>
>> You really need to convince us that *this* approach will be required
>> forever.
> 
> What if I add:
> 
> 	Many drivers can potentially be used inside TDX guest (e.g. via device
> 	passthough or random device emulation by VMM), but very few will.
> 	Patching every possible driver is not practical. #VE-based MMIO provides
> 	functionality for everybody. Performance-critical cases can be optimized
> 	as needed.

This problem was laid out as having three cases:
1. virtio
2. x86-specific drivers
3. random drivers (everything else)

#1 could be done with paravirt
#2 is unspecified and unknown
#3 use doesn't as far as I know exist in TDX guests today

>>> +static bool handle_mmio(struct pt_regs *regs, struct ve_info *ve)
>>> +{
>>> +	char buffer[MAX_INSN_SIZE];
>>> +	unsigned long *reg, val;
>>> +	struct insn insn = {};
>>> +	enum mmio_type mmio;
>>> +	int size, extend_size;
>>> +	u8 extend_val = 0;
>>> +
>>> +	if (copy_from_kernel_nofault(buffer, (void *)regs->ip, MAX_INSN_SIZE))
>>> +		return false;
>>> +
>>> +	if (insn_decode(&insn, buffer, MAX_INSN_SIZE, INSN_MODE_64))
>>> +		return false;
>>> +
>>> +	mmio = insn_decode_mmio(&insn, &size);
>>> +	if (WARN_ON_ONCE(mmio == MMIO_DECODE_FAILED))
>>> +		return false;
>>> +
>>> +	if (mmio != MMIO_WRITE_IMM && mmio != MMIO_MOVS) {
>>> +		reg = insn_get_modrm_reg_ptr(&insn, regs);
>>> +		if (!reg)
>>> +			return false;
>>> +	}
>>> +
>>> +	ve->instr_len = insn.length;
>>> +
>>> +	switch (mmio) {
>>> +	case MMIO_WRITE:
>>> +		memcpy(&val, reg, size);
>>> +		return mmio_write(size, ve->gpa, val);
>>> +	case MMIO_WRITE_IMM:
>>> +		val = insn.immediate.value;
>>> +		return mmio_write(size, ve->gpa, val);
>>> +	case MMIO_READ:
>>> +	case MMIO_READ_ZERO_EXTEND:
>>> +	case MMIO_READ_SIGN_EXTEND:
>>> +		break;
>>> +	case MMIO_MOVS:
>>> +	case MMIO_DECODE_FAILED:
>>> +		/*
>>> +		 * MMIO was accessed with an instruction that could not be
>>> +		 * decoded or handled properly. It was likely not using io.h
>>> +		 * helpers or accessed MMIO accidentally.
>>> +		 */
>>> +		return false;
>>> +	default:
>>> +		/* Unknown insn_decode_mmio() decode value? */
>>> +		BUG();
>>> +	}
>>
>> BUG()s are bad.  The set of insn_decode_mmio() return codes is known at
>> compile time.  If we're really on the lookout for unknown values, why
>> not just:
>>
>> 	BUILD_BUG_ON(NR_MMIO_TYPES != 6); // or whatever
> 
> This doesn't work.
> 
> We can pretend that the function only forced to return values from the
> enum. The truth is that it can return whatever int it wants. Type system
> in C is too week to guarantee anything here. The BUG() is backstop for it.
> 
> This BUILD_BUG_ON() is useless. Compiler complains about missing case in
> the switch anyway.
> 
>> Also, there are *lots* of ways for this function to just fall over and
>> fail.  Why does this particular failure mode deserve a BUG()?
>>
>> Is there a reason a BUG() is better than returning failure which
>> presumably sets off the #GP-like logic?
> 
> BUG() here makes it clear that the handler itself is buggy. Returning
> false and kicking in #GP-like logic indicates that something wrong with
> the code that triggered #VE. I think it is an important distinction.

OK, then how about a WARN_ON() which is followed by the #GP?

Let's say insn_decode_mmio() does something insane like:

	return -EINVAL;

Should we really be killing the kernel for that?

>> Also, now that I've read this a few times, I've been confused by the
>> same thing a few times.  This is handling instructions that might read
>> or write or do both, correct?
>>
>> Should that be made explicit in a function comment?
> 
> Hm. Okay. Something like
> 
> /* Handle reads from and writes to MMIO region. */
> 
> before the function?

There are really two halves to the function.  It would be nice to be
explicit about how the function is laid out and why reads take a bit
more handling at the end than writes.

That might be obvious to someone who has written an emulator or two but
it would be nice to include for the uninitiated.

  reply	other threads:[~2022-03-10  1:06 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-02 14:27 [PATCHv5 00/30] TDX Guest: TDX core support Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 01/30] x86/tdx: Detect running as a TDX guest in early boot Kirill A. Shutemov
2022-03-04 15:43   ` Borislav Petkov
2022-03-04 15:47     ` Dave Hansen
2022-03-04 16:02       ` Borislav Petkov
2022-03-07 22:24         ` [PATCHv5.1 " Kirill A. Shutemov
2022-03-09 18:22           ` Borislav Petkov
2022-03-02 14:27 ` [PATCHv5 02/30] x86/tdx: Provide common base for SEAMCALL and TDCALL C wrappers Kirill A. Shutemov
2022-03-08 19:56   ` Dave Hansen
2022-03-10 12:32   ` Borislav Petkov
2022-03-10 14:44     ` Kirill A. Shutemov
2022-03-10 14:51       ` Borislav Petkov
2022-03-02 14:27 ` [PATCHv5 03/30] x86/tdx: Add __tdx_module_call() and __tdx_hypercall() helper functions Kirill A. Shutemov
2022-03-08 20:03   ` Dave Hansen
2022-03-10 15:30   ` Borislav Petkov
2022-03-10 21:20     ` Kirill A. Shutemov
2022-03-10 21:48       ` Kirill A. Shutemov
2022-03-15 15:56         ` Borislav Petkov
2022-03-12 10:41       ` Borislav Petkov
2022-03-02 14:27 ` [PATCHv5 04/30] x86/tdx: Extend the confidential computing API to support TDX guests Kirill A. Shutemov
2022-03-08 20:17   ` Dave Hansen
2022-03-09 16:01     ` [PATCHv5.1 " Kirill A. Shutemov
2022-03-09 18:36       ` Dave Hansen
2022-03-09 23:51         ` [PATCHv5.2 " Kirill A. Shutemov
2022-03-10  0:07           ` Dave Hansen
2022-03-15 19:41           ` Borislav Petkov
2022-03-02 14:27 ` [PATCHv5 05/30] x86/tdx: Exclude shared bit from __PHYSICAL_MASK Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 06/30] x86/traps: Refactor exc_general_protection() Kirill A. Shutemov
2022-03-08 20:18   ` Dave Hansen
2022-03-02 14:27 ` [PATCHv5 07/30] x86/traps: Add #VE support for TDX guest Kirill A. Shutemov
2022-03-08 20:29   ` Dave Hansen
2022-03-02 14:27 ` [PATCHv5 08/30] x86/tdx: Add HLT support for TDX guests Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 09/30] x86/tdx: Add MSR " Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 10/30] x86/tdx: Handle CPUID via #VE Kirill A. Shutemov
2022-03-08 20:33   ` Dave Hansen
2022-03-09 16:15     ` [PATCH] " Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 11/30] x86/tdx: Handle in-kernel MMIO Kirill A. Shutemov
2022-03-08 21:26   ` Dave Hansen
2022-03-10  0:51     ` Kirill A. Shutemov
2022-03-10  1:06       ` Dave Hansen [this message]
2022-03-10 16:48         ` Kirill A. Shutemov
2022-03-10 17:53           ` Dave Hansen
2022-03-11 17:18             ` Kirill A. Shutemov
2022-03-11 17:22               ` Dave Hansen
2022-03-11 18:01               ` Dave Hansen
2022-03-02 14:27 ` [PATCHv5 12/30] x86/tdx: Detect TDX at early kernel decompression time Kirill A. Shutemov
2022-03-07 22:27   ` [PATCHv5.1 " Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 13/30] x86: Adjust types used in port I/O helpers Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 14/30] x86: Consolidate " Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 15/30] x86/boot: Port I/O: allow to hook up alternative helpers Kirill A. Shutemov
2022-03-02 17:42   ` Josh Poimboeuf
2022-03-02 19:41     ` Dave Hansen
2022-03-02 20:02       ` Josh Poimboeuf
2022-03-02 14:27 ` [PATCHv5 16/30] x86/boot: Port I/O: add decompression-time support for TDX Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 17/30] x86/tdx: Port I/O: add runtime hypercalls Kirill A. Shutemov
2022-03-08 21:30   ` Dave Hansen
2022-03-02 14:27 ` [PATCHv5 18/30] x86/tdx: Port I/O: add early boot support Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 19/30] x86/tdx: Wire up KVM hypercalls Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 20/30] x86/boot: Add a trampoline for booting APs via firmware handoff Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 21/30] x86/acpi, x86/boot: Add multiprocessor wake-up support Kirill A. Shutemov
2022-03-02 14:27 ` [PATCHv5 22/30] x86/boot: Set CR0.NE early and keep it set during the boot Kirill A. Shutemov
2022-03-08 21:37   ` Dave Hansen
2022-03-02 14:27 ` [PATCHv5 23/30] x86/boot: Avoid #VE during boot for TDX platforms Kirill A. Shutemov
2022-03-07  9:29   ` Xiaoyao Li
2022-03-07 22:33     ` Kirill A. Shutemov
2022-03-08  1:19       ` Xiaoyao Li
2022-03-08 16:41         ` Kirill A. Shutemov
2022-03-07 22:36     ` [PATCHv5.1 " Kirill A. Shutemov
2022-03-02 14:28 ` [PATCHv5 24/30] x86/topology: Disable CPU online/offline control for TDX guests Kirill A. Shutemov
2022-03-02 14:28 ` [PATCHv5 25/30] x86/tdx: Make pages shared in ioremap() Kirill A. Shutemov
2022-03-08 22:02   ` Dave Hansen
2022-03-02 14:28 ` [PATCHv5 26/30] x86/mm/cpa: Add support for TDX shared memory Kirill A. Shutemov
2022-03-09 19:44   ` Dave Hansen
2022-03-02 14:28 ` [PATCHv5 27/30] x86/kvm: Use bounce buffers for TD guest Kirill A. Shutemov
2022-03-09 20:07   ` Dave Hansen
2022-03-10 14:29     ` Tom Lendacky
2022-03-10 14:51       ` Christoph Hellwig
2022-03-02 14:28 ` [PATCHv5 28/30] x86/tdx: ioapic: Add shared bit for IOAPIC base address Kirill A. Shutemov
2022-03-09 20:39   ` Dave Hansen
2022-03-02 14:28 ` [PATCHv5 29/30] ACPICA: Avoid cache flush inside virtual machines Kirill A. Shutemov
2022-03-02 16:13   ` Dan Williams
2022-03-09 20:56   ` Dave Hansen
2022-03-02 14:28 ` [PATCHv5 30/30] Documentation/x86: Document TDX kernel architecture Kirill A. Shutemov
2022-03-09 21:49   ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=da0056e8-58cf-2c95-fe66-4dad1ae9c4da@intel.com \
    --to=dave.hansen@intel.com \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=bp@alien8.de \
    --cc=brijesh.singh@amd.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jgross@suse.com \
    --cc=jmattson@google.com \
    --cc=joro@8bytes.org \
    --cc=jpoimboe@redhat.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=knsathya@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=sdeep@vmware.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=tony.luck@intel.com \
    --cc=vkuznets@redhat.com \
    --cc=wanpengli@tencent.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox