From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:54197) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fzJqk-0004zW-F5 for qemu-devel@nongnu.org; Mon, 10 Sep 2018 06:55:25 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fzJg0-0008Uq-9G for qemu-devel@nongnu.org; Mon, 10 Sep 2018 06:44:20 -0400
Received: from mx3-rdu2.redhat.com ([66.187.233.73]:60486 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fzJg0-0008Ta-0C for qemu-devel@nongnu.org; Mon, 10 Sep 2018 06:44:16 -0400
Date: Mon, 10 Sep 2018 11:44:09 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20180910104408.GE2482@work-vm>
References: <20180904084648.680c564b@redhat.com> <20180906095802.7dffbf1b@redhat.com> <20180906085937.GA2639@work-vm> <20180906094245.GB2639@work-vm> <20180906172304.GF2639@work-vm> <43c91014-2ca4-681b-51e7-b8cde063014c@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
In-Reply-To: <43c91014-2ca4-681b-51e7-b8cde063014c@redhat.com>
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH v10 6/6] tpm: add ACPI memory clear interface
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Laszlo Ersek
Cc: Marc-André Lureau , Eduardo Habkost , "Michael S. Tsirkin" , Stefan Berger , QEMU , Paolo Bonzini , Igor Mammedov , Richard Henderson

* Laszlo Ersek (lersek@redhat.com) wrote:
> On 09/06/18 19:23, Dr. David Alan Gilbert wrote:
> > * Marc-André Lureau (marcandre.lureau@gmail.com) wrote:
> >> Hi
> >>
> >> On Thu, Sep 6, 2018 at 1:42 PM Dr. David Alan Gilbert
> >> wrote:
> >>>
> >>> * Marc-André Lureau (marcandre.lureau@gmail.com) wrote:
> >>>> Hi
> >>>>
> >>>> On Thu, Sep 6, 2018 at 12:59 PM Dr.
David Alan Gilbert
> >>>> wrote:
> >>>>>
> >>>>> * Marc-André Lureau (marcandre.lureau@gmail.com) wrote:
> >>>>>> Hi
> >>>>>>
> >>>>>> On Thu, Sep 6, 2018 at 11:58 AM Igor Mammedov wrote:
> >>>>>>>
> >>>>>>> On Thu, 6 Sep 2018 07:50:09 +0400
> >>>>>>> Marc-André Lureau wrote:
> >>>>>>>
> >>>>>>>> Hi
> >>>>>>>>
> >>>>>>>> On Tue, Sep 4, 2018 at 10:47 AM Igor Mammedov wrote:
> >>>>>>>>>
> >>>>>>>>> On Fri, 31 Aug 2018 19:24:24 +0200
> >>>>>>>>> Marc-André Lureau wrote:
> >>>>>>>>>
> >>>>>>>>>> This allows to pass the last failing test from the Windows HLK TPM 2.0
> >>>>>>>>>> TCG PPI 1.3 tests.
> >>>>>>>>>>
> >>>>>>>>>> The interface is described in the "TCG Platform Reset Attack
> >>>>>>>>>> Mitigation Specification", chapter 6 "ACPI _DSM Function". According
> >>>>>>>>>> to Laszlo, it's not so easy to implement in OVMF, he suggested to do
> >>>>>>>>>> it in qemu instead.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Marc-André Lureau
> >>>>>>>>>> ---
> >>>>>>>>>>  hw/tpm/tpm_ppi.h     |  2 ++
> >>>>>>>>>>  hw/i386/acpi-build.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>>>  hw/tpm/tpm_crb.c     |  1 +
> >>>>>>>>>>  hw/tpm/tpm_ppi.c     | 23 ++++++++++++++++++++++
> >>>>>>>>>>  hw/tpm/tpm_tis.c     |  1 +
> >>>>>>>>>>  docs/specs/tpm.txt   |  2 ++
> >>>>>>>>>>  hw/tpm/trace-events  |  3 +++
> >>>>>>>>>>  7 files changed, 78 insertions(+)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/hw/tpm/tpm_ppi.h b/hw/tpm/tpm_ppi.h
> >>>>>>>>>> index f6458bf87e..3239751e9f 100644
> >>>>>>>>>> --- a/hw/tpm/tpm_ppi.h
> >>>>>>>>>> +++ b/hw/tpm/tpm_ppi.h
> >>>>>>>>>> @@ -23,4 +23,6 @@ typedef struct TPMPPI {
> >>>>>>>>>>  bool tpm_ppi_init(TPMPPI *tpmppi, struct MemoryRegion *m,
> >>>>>>>>>>                    hwaddr addr, Object *obj, Error **errp);
> >>>>>>>>>>
> >>>>>>>>>> +void tpm_ppi_reset(TPMPPI *tpmppi);
> >>>>>>>>>> +
> >>>>>>>>>>  #endif /* TPM_TPM_PPI_H */
> >>>>>>>>>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> >>>>>>>>>> index c5e9a6e11d..2ab3e8fae7 100644
> >>>>>>>>>> ---
a/hw/i386/acpi-build.c
> >>>>>>>>>> +++ b/hw/i386/acpi-build.c
> >>>>>>>>>> @@ -1824,6 +1824,13 @@ build_tpm_ppi(TPMIf *tpm, Aml *dev)
> >>>>>>>>>>      pprq = aml_name("PPRQ");
> >>>>>>>>>>      pprm = aml_name("PPRM");
> >>>>>>>>>>
> >>>>>>>>>> +    aml_append(dev,
> >>>>>>>>>> +               aml_operation_region("TPP3", AML_SYSTEM_MEMORY,
> >>>>>>>>>> +                                    aml_int(TPM_PPI_ADDR_BASE + 0x15a),
> >>>>>>>>>> +                                    0x1));
> >>>>>>>>>> +    field = aml_field("TPP3", AML_BYTE_ACC, AML_NOLOCK, AML_PRESERVE);
> >>>>>>>>>> +    aml_append(field, aml_named_field("MOVV", 8));
> >>>>>>>>>> +    aml_append(dev, field);
> >>>>>>>>>>      /*
> >>>>>>>>>>       * DerefOf in Windows is broken with SYSTEM_MEMORY. Use a dynamic
> >>>>>>>>>>       * operation region inside of a method for getting FUNC[op].
> >>>>>>>>>> @@ -2166,7 +2173,46 @@ build_tpm_ppi(TPMIf *tpm, Aml *dev)
> >>>>>>>>>>              aml_append(ifctx, aml_return(aml_buffer(1, zerobyte)));
> >>>>>>>>>>          }
> >>>>>>>>>>          aml_append(method, ifctx);
> >>>>>>>>>> +
> >>>>>>>>>> +        ifctx = aml_if(
> >>>>>>>>>> +            aml_equal(uuid,
> >>>>>>>>>> +                      aml_touuid("376054ED-CC13-4675-901C-4756D7F2D45D")));
> >>>>>>>>>> +        {
> >>>>>>>>>> +            /* standard DSM query function */
> >>>>>>>>>> +            ifctx2 = aml_if(aml_equal(function, zero));
> >>>>>>>>>> +            {
> >>>>>>>>>> +                uint8_t byte_list[1] = { 0x03 };
> >>>>>>>>>> +                aml_append(ifctx2, aml_return(aml_buffer(1, byte_list)));
> >>>>>>>>>> +            }
> >>>>>>>>>> +            aml_append(ifctx, ifctx2);
> >>>>>>>>>> +
> >>>>>>>>>> +            /*
> >>>>>>>>>> +             * TCG Platform Reset Attack Mitigation Specification 1.0 Ch.6
> >>>>>>>>>> +             *
> >>>>>>>>>> +             * Arg 2 (Integer): Function Index = 1
> >>>>>>>>>> +             * Arg 3 (Package): Arguments = Package: Type: Integer
> >>>>>>>>>> +             *                  Operation Value of the Request
> >>>>>>>>>> +             * Returns: Type: Integer
> >>>>>>>>>> +             *          0: Success
> >>>>>>>>>> +             *          1: General Failure
> >>>>>>>>>> +             */
> >>>>>>>>>> +            ifctx2 = aml_if(aml_equal(function, one));
> >>>>>>>>>> +            {
> >>>>>>>>>> +                aml_append(ifctx2,
> >>>>>>>>>> +
aml_store(aml_derefof(aml_index(arguments, zero)),
> >>>>>>>>>> +                                     op));
> >>>>>>>>>> +                {
> >>>>>>>>>> +                    aml_append(ifctx2, aml_store(op, aml_name("MOVV")));
> >>>>>>>>>> +
> >>>>>>>>>> +                    /* 0: success */
> >>>>>>>>>> +                    aml_append(ifctx2, aml_return(zero));
> >>>>>>>>>> +                }
> >>>>>>>>>> +            }
> >>>>>>>>>> +            aml_append(ifctx, ifctx2);
> >>>>>>>>>> +        }
> >>>>>>>>>> +        aml_append(method, ifctx);
> >>>>>>>>>>      }
> >>>>>>>>>> +
> >>>>>>>>>>      aml_append(dev, method);
> >>>>>>>>>>  }
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/hw/tpm/tpm_crb.c b/hw/tpm/tpm_crb.c
> >>>>>>>>>> index b243222fd6..48f6a716ad 100644
> >>>>>>>>>> --- a/hw/tpm/tpm_crb.c
> >>>>>>>>>> +++ b/hw/tpm/tpm_crb.c
> >>>>>>>>>> @@ -233,6 +233,7 @@ static void tpm_crb_reset(void *dev)
> >>>>>>>>>>  {
> >>>>>>>>>>      CRBState *s = CRB(dev);
> >>>>>>>>>>
> >>>>>>>>>> +    tpm_ppi_reset(&s->ppi);
> >>>>>>>>>>      tpm_backend_reset(s->tpmbe);
> >>>>>>>>>>
> >>>>>>>>>>      memset(s->regs, 0, sizeof(s->regs));
> >>>>>>>>>> diff --git a/hw/tpm/tpm_ppi.c b/hw/tpm/tpm_ppi.c
> >>>>>>>>>> index 8b46b9dd4b..ce43bc5729 100644
> >>>>>>>>>> --- a/hw/tpm/tpm_ppi.c
> >>>>>>>>>> +++ b/hw/tpm/tpm_ppi.c
> >>>>>>>>>> @@ -16,8 +16,30 @@
> >>>>>>>>>>  #include "qapi/error.h"
> >>>>>>>>>>  #include "cpu.h"
> >>>>>>>>>>  #include "sysemu/memory_mapping.h"
> >>>>>>>>>> +#include "sysemu/reset.h"
> >>>>>>>>>>  #include "migration/vmstate.h"
> >>>>>>>>>>  #include "tpm_ppi.h"
> >>>>>>>>>> +#include "trace.h"
> >>>>>>>>>> +
> >>>>>>>>>> +void tpm_ppi_reset(TPMPPI *tpmppi)
> >>>>>>>>>> +{
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> +    char *ptr = memory_region_get_ram_ptr(&tpmppi->ram);
> >>>>>>>>> nvdimm seems to use cpu_physical_memory_read() to access guest
> >>>>>>>>> accessible memory, so question is what's difference?
> >>>>>>>>
> >>>>>>>> cpu_physical_memory_read() is higher level, doing dispatch on address
> >>>>>>>> and length checks.
> >>>>>>>>
> >>>>>>>> This is a bit unnecessary, as ppi->buf could be accessed directly.
> >>>>>>> [...]
> >>>>>>>>>> +        memset(block->host_addr, 0,
> >>>>>>>>>> +               block->target_end - block->target_start);
> >>>>>>>>>> +    }
> >>>>>>> my concern here is that if we directly touch guest memory here
> >>>>>>> we might get in trouble on migration without dirtying modified
> >>>>>>> ranges
> >>>>>>
> >>>>>> It is a read-only of one byte.
> >>>>>> by the time the reset handler is called, the memory must have been
> >>>>>> already migrated.
> >>>>>
> >>>>> Looks like a write to me?
> >>>>
> >>>> the PPI RAM memory is read for the "memory clear" byte
> >>>> The whole guest RAM is reset to 0 if set.
> >>>
> >>> Oh, I see; hmm.
> >>> How do you avoid zeroing things like persistent memory? Or ROMs? Or EFI
> >>> pflash?
> >>
> >> guest_phys_blocks_append() only cares about RAM (see
> >> guest_phys_blocks_region_add)
> >
> > Hmm, promising; it uses:
> >         if (!memory_region_is_ram(section->mr) ||
> >             memory_region_is_ram_device(section->mr)) {
> >             return;
> >         }
> >
> > so ram_device is used by vfio and vhost-user; I don't see anything else.
> > pflash inits as a rom_device so that's probably OK.
> > But things like backends/hostmem-file.c just use
> > memory_region_init_ram_from_file even if they're shared or PMEM.
> > So, I think this would wipe an attached PMEM device - do you want to or
> > not?
>
> I think that question could be put, "does the reset attack mitigation
> spec recommend / require clearing persistent memory"? I've got no idea.
> The reset attack is that the platform is re-set (forcibly, uncleanly)
> while the original OS holds some secrets in memory, then the attacker's
> OS (or firmware application) is loaded, and those scavenge the
> leftovers. Would the original OS keep secrets in PMEM? I don't know.

No, I don't know either; and I think part of the answer might depend on
what PMEM is being used for; if it's being used as actual storage with a
filesystem or database on it, you probably don't want to clear it - I
mean that's what the (P)ersistence is for.
> I guess all address space that a guest OS could consider as "read-write
> memory" should be wiped. I think guest_phys_blocks_append() is a good
> choice here; originally I wrote that for supporting guest RAM dump; see
> commit c5d7f60f06142. Note that the is_ram_device bit was added
> separately, see commits e4dc3f5909ab9 and 21e00fa55f3fd -- but that's a
> change in the "right" direction here, because it *restricts* what
> regions qualify (for dumping, and here for clearing).

Yeah, I'm wondering if we should add a check for the pmem flag at the
same place.

Having said that, I don't think that's a question for this patch
series; if we agree that guest_phys_blocks* is the right thing to use
then it's a separate question about adding the pmem check there.

(I didn't know about guest_phys_block* and would probably have just
used qemu_ram_foreach_block)

Dave

> >
> >>>
> >>>>> Also, don't forget that a guest reset can happen during a migration.
> >>>>
> >>>> Hmm, does cpu_physical_memory_read() guarantee the memory has been migrated?
> >>>> Is there a way to wait for migration to be completed in a reset handler?
> >>>
> >>> No; remember that migration can take a significant amount of time (many
> >>> minutes) as you stuff many GB of RAM down a network.
> >>>
> >>> So you can be in the situation where:
> >>>   a) Migration starts
> >>>   b) Migration sends a copy of most of RAM across
> >>>   c) Guest dirties lots of RAM in parallel with b
> >>>   d) migration sends some of the RAM again
> >>>   e) guest reboots
> >>>   f) migration keeps sending ram across
> >>>   g) Migration finally completes and starts on destination
> >>>
> >>> a-f are all happening on the source side as the guest is still running
> >>> and doing whatever it wants (including reboots).
> >>>
> >>> Given something like acpi-build.c's acpi_ram_update's call to
> >>> memory_region_set_dirty, would that work for you?
> >>
> >> after the memset(), it should then call:
> >>
> >> memory_region_set_dirty(block->mr, 0, block->target_end - block->target_start);
> >>
> >> looks about right?
> >
> > I think so.
>
> I'll admit I can't follow the dirtying discussion. But, I'm reminded of
> another commit of Dave's, namely 90c647db8d59 ("Fix pflash migration",
> 2016-04-15). Would the same trick apply here?
>
> (
>
> BTW, I'm sorry about not having followed this series closely -- I feel
> bad that we can't solve this within the firmware, but we really can't.
> The issue is that this memory clearing would have to occur really early
> into the firmware, at the latest in the PEI phase.
>
> However, both the 32-bit and the 64-bit variants of OVMF's PEI phase
> have access only to the lowest 4GB of the memory address space. Mapping
> all RAM (even with a "sliding bank") for clearing it would be a real
> mess. To that, add the fact that OVMF's PEI phase executes from RAM (not
> from pflash -- in OVMF, only SEC executes from pflash, and it
> decompresses the PEI + DXE firmware volumes to RAM), so PEI would have
> to clear all RAM *except* the areas its own self occupies, such as code,
> global variables (static data), heap, stack, other processor data
> structures (IDT, GDT, page tables, ...). And, no gaps inside those
> should be left out either, because the previous OS might have left
> secrets there...
>
> This is actually easier for firmware that runs on physical hardware; for
> two reasons. First, on physical hardware, the PEI phase of the firmware
> runs from pflash (it might use CAR, Cache-As-RAM, for a small temporary
> r/w storage), so it doesn't have to worry about scribbling over itself.
> Second, on physical hardware, the memory controller too has to be booted
> up -- the PEI code that does this is all vendors' most closely guarded
> secret, and *never* open source --; and when the firmware massages
> chipset registers for that, it can use non-architected means to get the
> RAM to clear itself.
>
> In comparison, in QEMU/KVM guests, the RAM just *is*. While that's a
> huge relief most of the time, in this case, the fact that the RAM can
> start out as nonzero is a big problem. Hence my plea to implement the
> feature in QEMU.
>
> )
>
> Thanks,
> Laszlo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK