From: "Michael S. Tsirkin" <mst@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>,
ehabkost@redhat.com, kvm@vger.kernel.org, gleb@kernel.org,
mtosatti@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com,
pbonzini@redhat.com, dan.j.williams@intel.com,
Laszlo Ersek <lersek@redhat.com>,
rth@twiddle.net
Subject: Re: [Qemu-devel] How to reserve guest physical region for ACPI
Date: Thu, 7 Jan 2016 12:54:30 +0200 [thread overview]
Message-ID: <20160107123214-mutt-send-email-mst@redhat.com> (raw)
In-Reply-To: <20160107113025.32cc5b52@nial.brq.redhat.com>
On Thu, Jan 07, 2016 at 11:30:25AM +0100, Igor Mammedov wrote:
> On Tue, 5 Jan 2016 18:43:02 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Tue, Jan 05, 2016 at 05:30:25PM +0100, Igor Mammedov wrote:
> > > > > bios-linker-loader is a great interface for initializing some
> > > > > guest owned data and linking it together but I think it adds
> > > > > unnecessary complexity and is misused if it's used to handle
> > > > > device owned data/on device memory in this and VMGID cases.
> > > >
> > > > I want a generic interface for guest to enumerate these things. linker
> > > > seems quite reasonable but if you see a reason why it won't do, or want
> > > > to propose a better interface, fine.
> > > >
> > > > PCI would do, too - though windows guys had concerns about
> > > > returning PCI BARs from ACPI.
> > > There were potential issues with pSeries bootloader that treated
> > > PCI_CLASS_MEMORY_RAM as conventional RAM but it was fixed.
> > > Could you point out to discussion about windows issues?
> > >
> > > What VMGEN patches that used PCI for mapping purposes were
> > > stuck at, was that it was suggested to use PCI_CLASS_MEMORY_RAM
> > > class id but we couldn't agree on it.
> > >
> > > VMGEN v13 with full discussion is here
> > > https://patchwork.ozlabs.org/patch/443554/
> > > So to continue with this route we would need to pick some other
> > > driver less class id so windows won't prompt for driver or
> > > maybe supply our own driver stub to guarantee that no one
> > > would touch it. Any suggestions?
> >
> > Pick any device/vendor id pair for which windows specifies no driver.
> > There's a small risk that this will conflict with some
> > guest but I think it's minimal.
> device/vendor id pair was QEMU specific so doesn't conflicts with anything
> issue we were trying to solve was to prevent windows asking for driver
> even though it does so only once if told not to ask again.
>
> That's why PCI_CLASS_MEMORY_RAM was selected as it's generic driver-less
> device descriptor in INF file which matches as the last resort if
> there isn't any other diver that's matched device by device/vendor id pair.
I think this is the only class in this inf.
If you can't use it, you must use an existing device/vendor id pair,
there's some risk involved but probably not much.
> >
> >
> > > >
> > > >
> > > > > There was RFC on list to make BIOS boot from NVDIMM already
> > > > > doing some ACPI table lookup/parsing. Now if they were forced
> > > > > to also parse and execute AML to initialize QEMU with guest
> > > > > allocated address that would complicate them quite a bit.
> > > >
> > > > If they just need to find a table by name, it won't be
> > > > too bad, would it?
> > > that's what they were doing scanning memory for static NVDIMM table.
> > > However if it were DataTable, BIOS side would have to execute
> > > AML so that the table address could be told to QEMU.
> >
> > Not at all. You can find any table by its signature without
> > parsing AML.
> yep, and then BIOS would need to tell its address to QEMU
> writing to IO port which is allocated statically in QEMU
> for this purpose and is described in AML only on guest side.
io ports are an ABI too but they are way easier to
maintain.
> >
> >
> > > In case of direct mapping or PCI BAR there is no need to initialize
> > > QEMU side from AML.
> > > That also saves us IO port where this address should be written
> > > if bios-linker-loader approach is used.
> > >
> > > >
> > > > > While with NVDIMM control memory region mapped directly by QEMU,
> > > > > respective patches don't need in any way to initialize QEMU,
> > > > > all they would need just read necessary data from control region.
> > > > >
> > > > > Also using bios-linker-loader takes away some usable RAM
> > > > > from guest and in the end that doesn't scale,
> > > > > the more devices I add the less usable RAM is left for guest OS
> > > > > while all the device needs is a piece of GPA address space
> > > > > that would belong to it.
> > > >
> > > > I don't get this comment. I don't think it's MMIO that is wanted.
> > > > If it's backed by qemu virtual memory then it's RAM.
> > > Then why don't allocate video card VRAM the same way and try to explain
> > > user that a guest started with '-m 128 -device cirrus-vga,vgamem_mb=64Mb'
> > > only has 64Mb of available RAM because of we think that on device VRAM
> > > is also RAM.
> > >
> > > Maybe I've used MMIO term wrongly here but it roughly reflects the idea
> > > that on device memory (whether it's VRAM, NVDIMM control block or VMGEN
> > > area) is not allocated from guest's usable RAM (as described in E820)
> > > but rather directly mapped in guest's GPA and doesn't consume available
> > > RAM as guest sees it. That's also the way it's done on real hardware.
> > >
> > > What we need in case of VMGEN ID and NVDIMM is on device memory
> > > that could be directly accessed by guest.
> > > Both direct mapping or PCI BAR do that job and we could use simple
> > > static AML without any patching.
> >
> > At least with VMGEN the issue is that there's an AML method
> > that returns the physical address.
> > Then if guest OS moves the BAR (which is legal), it will break
> > since caller has no way to know it's related to the BAR.
> I've found a following MS doc "Firmware Allocation of PCI Device Resources in Windows". It looks like when MS implemented resource rebalancing in
> Vista they pushed a compat change to PCI specs.
> That ECN is called "Ignore PCI Boot Configuration_DSM Function"
> and can be found here:
> https://pcisig.com/sites/default/files/specification_documents/ECR-Ignorebootconfig-final.pdf
>
> It looks like it's possible to forbid rebalancing per
> device/bridge if it has _DMS method that returns "do not
> ignore the boot configuration of PCI resources".
I'll have to study this but we don't want that
globally, do we?
This restricts hotplug functionality significantly.
>
> > > > > >
> > > > > > See patch at the bottom that might be handy.
> > > > > >
> > > > > > > he also innovated a way to use 64-bit address in DSDT/SSDT.rev = 1:
> > > > > > > | when writing ASL one shall make sure that only XP supported
> > > > > > > | features are in global scope, which is evaluated when tables
> > > > > > > | are loaded and features of rev2 and higher are inside methods.
> > > > > > > | That way XP doesn't crash as far as it doesn't evaluate unsupported
> > > > > > > | opcode and one can guard those opcodes checking _REV object if neccesary.
> > > > > > > (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg01010.html)
> > > > > >
> > > > > > Yes, this technique works.
> > > > > >
> > > > > > An alternative is to add an XSDT, XP ignores that.
> > > > > > XSDT at the moment breaks OVMF (because it loads both
> > > > > > the RSDT and the XSDT, which is wrong), but I think
> > > > > > Laszlo was working on a fix for that.
> > > > > Using XSDT would increase ACPI tables occupied RAM
> > > > > as it would duplicate DSDT + non XP supported AML
> > > > > at global namespace.
> > > >
> > > > Not at all - I posted patches linking to same
> > > > tables from both RSDT and XSDT at some point.
> > > > Only the list of pointers would be different.
> > > if you put XP incompatible AML in separate SSDT and link it
> > > only from XSDT than that would work but if incompatibility
> > > is in DSDT, one would have to provide compat DSDT for RSDT
> > > an incompat DSDT for XSDT.
> >
> > So don't do this.
> well spec says "An ACPI-compatible OS must use the XSDT if present",
> which I read as tables pointed by RSDT MUST be pointed by XSDT
> as well and RSDT MUST NOT not be used.
>
> so if we put incompatible changes in a separate SSDT and put
> it only in XSDT that might work. Showstopper here is OVMF which
> has issues with it as Laszlo pointed out.
But that's just a bug.
> Also since Windows implements only subset of spec XSDT trick
> would cover only XP based versions while the rest will see and
> use XSDT pointed tables which still could have incompatible
> AML with some of the later windows versions.
We'll have to see what these are exactly.
If it's methods in SSDT we can check the version supported
by the ASPM.
>
> >
> > > So far policy was don't try to run guest OS on QEMU
> > > configuration that isn't supported by it.
> >
> > It's better if guests don't see some features but
> > don't crash. It's not always possible of course but
> > we should try to avoid this.
> >
> > > For example we use VAR_PACKAGE when running with more
> > > than 255 VCPUs (commit b4f4d5481) which BSODs XP.
> >
> > Yes. And it's because we violate the spec, DSDT
> > should not have this stuff.
> >
> > > So we can continue with that policy with out resorting to
> > > using both RSDT and XSDT,
> > > It would be even easier as all AML would be dynamically
> > > generated and DSDT would only contain AML elements for
> > > a concrete QEMU configuration.
> >
> > I'd prefer XSDT but I won't nack it if you do it in DSDT.
> > I think it's not spec compliant but guests do not
> > seem to care.
> >
> > > > > So far we've managed keep DSDT compatible with XP while
> > > > > introducing features from v2 and higher ACPI revisions as
> > > > > AML that is only evaluated on demand.
> > > > > We can continue doing so unless we have to unconditionally
> > > > > add incompatible AML at global scope.
> > > > >
> > > >
> > > > Yes.
> > > >
> > > > > >
> > > > > > > Michael, Paolo, what do you think about these ideas?
> > > > > > >
> > > > > > > Thanks!
> > > > > >
> > > > > >
> > > > > >
> > > > > > So using a patch below, we can add Name(PQRS, 0x0) at the top of the
> > > > > > SSDT (or bottom, or add a separate SSDT just for that). It returns the
> > > > > > current offset so we can add that to the linker.
> > > > > >
> > > > > > Won't work if you append the Name to the Aml structure (these can be
> > > > > > nested to arbitrary depth using aml_append), so using plain GArray for
> > > > > > this API makes sense to me.
> > > > > >
> > > > > > --->
> > > > > >
> > > > > > acpi: add build_append_named_dword, returning an offset in buffer
> > > > > >
> > > > > > This is a very limited form of support for runtime patching -
> > > > > > similar in functionality to what we can do with ACPI_EXTRACT
> > > > > > macros in python, but implemented in C.
> > > > > >
> > > > > > This is to allow ACPI code direct access to data tables -
> > > > > > which is exactly what DataTableRegion is there for, except
> > > > > > no known windows release so far implements DataTableRegion.
> > > > > unsupported means Windows will BSOD, so it's practically
> > > > > unusable unless MS will patch currently existing Windows
> > > > > versions.
> > > >
> > > > Yes. That's why my patch allows patching SSDT without using
> > > > DataTableRegion.
> > > >
> > > > > Another thing about DataTableRegion is that ACPI tables are
> > > > > supposed to have static content which matches checksum in
> > > > > table the header while you are trying to use it for dynamic
> > > > > data. It would be cleaner/more compatible to teach
> > > > > bios-linker-loader to just allocate memory and patch AML
> > > > > with the allocated address.
> > > >
> > > > Yes - if address is static, you need to put it outside
> > > > the table. Can come right before or right after this.
> > > >
> > > > > Also if OperationRegion() is used, then one has to patch
> > > > > DefOpRegion directly as RegionOffset must be Integer,
> > > > > using variable names is not permitted there.
> > > >
> > > > I am not sure the comment was understood correctly.
> > > > The comment says really "we can't use DataTableRegion
> > > > so here is an alternative".
> > > so how are you going to access data at which patched
> > > NameString point to?
> > > for that you'd need a normal patched OperationRegion
> > > as well since DataTableRegion isn't usable.
> >
> > For VMGENID you would patch the method that
> > returns the address - you do not need an op region
> > as you never access it.
> >
> > I don't know about NVDIMM. Maybe OperationRegion can
> > use the patched NameString? Will need some thought.
> >
> > > >
> > > > >
> > > > > >
> > > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > >
> > > > > > ---
> > > > > >
> > > > > > diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> > > > > > index 1b632dc..f8998ea 100644
> > > > > > --- a/include/hw/acpi/aml-build.h
> > > > > > +++ b/include/hw/acpi/aml-build.h
> > > > > > @@ -286,4 +286,7 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre);
> > > > > > void
> > > > > > build_rsdt(GArray *table_data, GArray *linker, GArray *table_offsets);
> > > > > >
> > > > > > +int
> > > > > > +build_append_named_dword(GArray *array, const char *name_format, ...);
> > > > > > +
> > > > > > #endif
> > > > > > diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> > > > > > index 0d4b324..7f9fa65 100644
> > > > > > --- a/hw/acpi/aml-build.c
> > > > > > +++ b/hw/acpi/aml-build.c
> > > > > > @@ -262,6 +262,32 @@ static void build_append_int(GArray *table, uint64_t value)
> > > > > > }
> > > > > > }
> > > > > >
> > > > > > +/* Build NAME(XXXX, 0x0) where 0x0 is encoded as a qword,
> > > > > > + * and return the offset to 0x0 for runtime patching.
> > > > > > + *
> > > > > > + * Warning: runtime patching is best avoided. Only use this as
> > > > > > + * a replacement for DataTableRegion (for guests that don't
> > > > > > + * support it).
> > > > > > + */
> > > > > > +int
> > > > > > +build_append_named_qword(GArray *array, const char *name_format, ...)
> > > > > > +{
> > > > > > + int offset;
> > > > > > + va_list ap;
> > > > > > +
> > > > > > + va_start(ap, name_format);
> > > > > > + build_append_namestringv(array, name_format, ap);
> > > > > > + va_end(ap);
> > > > > > +
> > > > > > + build_append_byte(array, 0x0E); /* QWordPrefix */
> > > > > > +
> > > > > > + offset = array->len;
> > > > > > + build_append_int_noprefix(array, 0x0, 8);
> > > > > > + assert(array->len == offset + 8);
> > > > > > +
> > > > > > + return offset;
> > > > > > +}
> > > > > > +
> > > > > > static GPtrArray *alloc_list;
> > > > > >
> > > > > > static Aml *aml_alloc(void)
> > > > > >
> > > > > >
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > > the body of a message to majordomo@vger.kernel.org
> > > > More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
next prev parent reply other threads:[~2016-01-07 10:54 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-12-02 7:20 [Qemu-devel] [PATCH v9 0/5] implement vNVDIMM Xiao Guangrong
2015-12-02 7:20 ` [Qemu-devel] [PATCH v9 1/5] nvdimm: implement NVDIMM device abstract Xiao Guangrong
2015-12-02 7:20 ` [Qemu-devel] [PATCH v9 2/5] acpi: support specified oem table id for build_header Xiao Guangrong
2015-12-02 7:20 ` [Qemu-devel] [PATCH v9 3/5] nvdimm acpi: build ACPI NFIT table Xiao Guangrong
2015-12-02 7:20 ` [Qemu-devel] [PATCH v9 4/5] nvdimm acpi: build ACPI nvdimm devices Xiao Guangrong
2015-12-02 7:21 ` [Qemu-devel] [PATCH v9 5/5] nvdimm: add maintain info Xiao Guangrong
2015-12-10 3:11 ` [Qemu-devel] [PATCH v9 0/5] implement vNVDIMM Xiao Guangrong
2015-12-21 14:13 ` Xiao Guangrong
2015-12-28 2:39 ` [Qemu-devel] How to reserve guest physical region for ACPI Xiao Guangrong
2015-12-28 12:50 ` Michael S. Tsirkin
2015-12-30 15:55 ` Igor Mammedov
2015-12-30 19:52 ` Michael S. Tsirkin
2016-01-04 20:17 ` Laszlo Ersek
2016-01-05 17:08 ` Igor Mammedov
2016-01-05 17:22 ` Laszlo Ersek
2016-01-06 13:39 ` Igor Mammedov
2016-01-06 14:43 ` Laszlo Ersek
2016-01-07 13:51 ` Igor Mammedov
2016-01-07 17:33 ` Laszlo Ersek
2016-01-05 16:30 ` Igor Mammedov
2016-01-05 16:43 ` Michael S. Tsirkin
2016-01-05 17:07 ` Laszlo Ersek
2016-01-05 17:07 ` Xiao Guangrong
2016-01-07 9:21 ` Igor Mammedov
2016-01-08 4:21 ` Xiao Guangrong
2016-01-08 9:42 ` Laszlo Ersek
2016-01-08 15:59 ` Igor Mammedov
2016-01-07 10:30 ` Igor Mammedov
2016-01-07 10:54 ` Michael S. Tsirkin [this message]
2016-01-07 13:42 ` Igor Mammedov
2016-01-07 17:11 ` Laszlo Ersek
2016-01-07 17:08 ` Laszlo Ersek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160107123214-mutt-send-email-mst@redhat.com \
--to=mst@redhat.com \
--cc=dan.j.williams@intel.com \
--cc=ehabkost@redhat.com \
--cc=gleb@kernel.org \
--cc=guangrong.xiao@linux.intel.com \
--cc=imammedo@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=lersek@redhat.com \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=rth@twiddle.net \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).