qemu-devel.nongnu.org archive mirror
From: Laszlo Ersek <lersek@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: "peter.maydell@linaro.org" <peter.maydell@linaro.org>,
	"xuwei \(O\)" <xuwei5@huawei.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Catalin Marinas <Catalin.Marinas@arm.com>,
	"ard.biesheuvel@linaro.org" <ard.biesheuvel@linaro.org>,
	"will.deacon@arm.com" <will.deacon@arm.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
	Linuxarm <linuxarm@huawei.com>, linux-mm <linux-mm@kvack.org>,
	"qemu-arm@nongnu.org" <qemu-arm@nongnu.org>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Robin Murphy <robin.murphy@arm.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [Qemu-devel] [Question] Memory hotplug clarification for Qemu ARM/virt
Date: Thu, 9 May 2019 23:48:13 +0200
Message-ID: <cd2aa867-5367-b470-0a2b-33897697c23f@redhat.com>
In-Reply-To: <20190509183520.6dc47f2e@Igors-MacBook-Pro>

On 05/09/19 18:35, Igor Mammedov wrote:
> On Wed, 8 May 2019 22:26:12 +0200
> Laszlo Ersek <lersek@redhat.com> wrote:
> 
>> On 05/08/19 14:50, Robin Murphy wrote:
>>> Hi Shameer,
>>>
>>> On 08/05/2019 11:15, Shameerali Kolothum Thodi wrote:
>>>> Hi,
>>>>
>>>> This series here[0] attempts to add support for PCDIMM in QEMU for
>>>> ARM/Virt platform and has stumbled upon an issue as it is not clear(at
>>>> least
>>>> from Qemu/EDK2 point of view) how in physical world the hotpluggable
>>>> memory is handled by kernel.
>>>>
>>>> The proposed implementation in Qemu, builds the SRAT and DSDT parts
>>>> and uses GED device to trigger the hotplug. This works fine.
>>>>
>>>> But when we added the DT node corresponding to the PCDIMM(cold plug
>>>> scenario), we noticed that Guest kernel see this memory during early boot
>>>> even if we are booting with ACPI. Because of this, hotpluggable memory
>>>> may end up in zone normal and make it non-hot-un-pluggable even if Guest
>>>> boots with ACPI.
>>>>
>>>> Further discussions[1] revealed that, EDK2 UEFI has no means to
>>>> interpret the
>>>> ACPI content from Qemu(this is designed to do so) and uses DT info to
>>>> build the GetMemoryMap(). To solve this, introduced "hotpluggable"
>>>> property
>>>> to DT memory node(patches #7 & #8 from [0]) so that UEFI can
>>>> differentiate
>>>> the nodes and exclude the hotpluggable ones from GetMemoryMap().
>>>>
>>>> But then Laszlo rightly pointed out that in order to accommodate the
>>>> changes
>>>> into UEFI we need to know how exactly Linux expects/handles all the
>>>> hotpluggable memory scenarios. Please find the discussion here[2].
>>>>
>>>> For ease, I am just copying the relevant comment from Laszlo below,
>>>>
>>>> /******
>>>> "Given patches #7 and #8, as I understand them, the firmware cannot
>>>> distinguish
>>>>   hotpluggable & present, from hotpluggable & absent. The firmware can
>>>> only
>>>>   skip both hotpluggable cases. That's fine in that the firmware will
>>>> hog neither
>>>>   type -- but is that OK for the OS as well, for both ACPI boot and DT
>>>> boot?
>>>>
>>>> Consider in particular the "hotpluggable & present, ACPI boot" case.
>>>> Assuming
>>>> we modify the firmware to skip "hotpluggable" altogether, the UEFI memmap
>>>> will not include the range despite it being present at boot.
>>>> Presumably, ACPI
>>>> will refer to the range somehow, however. Will that not confuse the OS?
>>>>
>>>> When Igor raised this earlier, I suggested that
>>>> hotpluggable-and-present should
>>>> be added by the firmware, but also allocated immediately, as
>>>> EfiBootServicesData
>>>> type memory. This will prevent other drivers in the firmware from
>>>> allocating AcpiNVS
>>>> or Reserved chunks from the same memory range, the UEFI memmap will
>>>> contain
>>>> the range as EfiBootServicesData, and then the OS can release that
>>>> allocation in
>>>> one go early during boot.
>>>>
>>>> But this really has to be clarified from the Linux kernel's
>>>> expectations. Please
>>>> formalize all of the following cases:
>>>>
>>>> OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as  DT/ACPI should report as
>>>> -----------------  ------------------  -------------------------------  ------------------------
>>>> DT                 present             ?                                ?
>>>> DT                 absent              ?                                ?
>>>> ACPI               present             ?                                ?
>>>> ACPI               absent              ?                                ?
>>>>
>>>> Again, this table is dictated by Linux."
>>>>
>>>> ******/
>>>>
>>>> Could you please take a look at this and let us know what is expected
>>>> here from
>>>> a Linux kernel view point.
>>>
>>> For arm64, so far we've not even been considering DT-based hotplug - as
>>> far as I'm aware there would still be a big open question there around
>>> notification mechanisms and how to describe them. The DT stuff so far
>>> has come from the PowerPC folks, so it's probably worth seeing what
>>> their ideas are.
>>>
>>> ACPI-wise I've always assumed/hoped that hotplug-related things should
>>> be sufficiently well-specified in UEFI that "do whatever x86/IA-64 do"
>>> would be enough for us.
>>
>> As far as I can see in UEFI v2.8 -- and I had checked the spec before
>> dumping the table with the many question marks on Shameer --, all the
>> hot-plug language in the spec refers to USB and PCI hot-plug in the
>> preboot environment. There is not a single word about hot-plug at OS
>> runtime (regarding any device or component type), nor about memory
>> hot-plug (at any time).
>>
>> Looking to x86 appears valid -- so what does the Linux kernel expect on
>> that architecture, in the "ACPI" rows of the table?
> 
> I could only answer from QEMU x86 perspective.
> QEMU for x86 guests currently doesn't add hot-pluggable RAM into E820
> because of different linux guests tend to cannibalize it, making it non
> unpluggable. The last culprit I recall was KASLR.
> 
> So I'd refrain from reporting hotpluggable RAM in GetMemoryMap() if
> it's possible (it's probably hack (spec doesn't say anything about it)
> but it mostly works for Linux (plug/unplug) and Windows guest also
> fine with plug part (no unplug there)).

I can accept this as a perfectly valid design. Which would mean, QEMU should mark each hotpluggable RAM range in the DTB for the firmware with the special new property, regardless of its initial ("cold") plugged-ness, and then the firmware will not expose the range in the GCD memory space map, and consequently in the UEFI memmap either.

IOW, our table is, thus far:

OS boot (DT/ACPI)  hotpluggable & ...  GetMemoryMap() should report as  DT/ACPI should report as
-----------------  ------------------  -------------------------------  ------------------------
DT                 present             ABSENT                           ?
DT                 absent              ABSENT                           ?
ACPI               present             ABSENT                           PRESENT
ACPI               absent              ABSENT                           ABSENT
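
To make the rule behind the table concrete, here is a minimal Python sketch; it is an illustration only, not firmware code. The `RamRange` type, the function names, and the addresses are all hypothetical; the `hotpluggable` field merely stands in for the new DT property. The DT rows of the last column are returned as `None` because they are still open questions.

```python
from typing import List, NamedTuple, Optional, Tuple


class RamRange(NamedTuple):
    base: int           # guest-physical base address (illustrative value)
    size: int           # size in bytes (illustrative value)
    hotpluggable: bool  # stands in for the new DT property (assumption)
    present: bool       # cold-plugged at boot?


def uefi_memmap(ranges: List[RamRange]) -> List[RamRange]:
    """Ranges the firmware would expose in the UEFI memmap.

    Every hotpluggable range is skipped, whether or not it is present.
    """
    return [r for r in ranges if not r.hotpluggable]


def table_row(boot: str, present: bool) -> Tuple[str, Optional[str]]:
    """Return (GetMemoryMap() column, DT/ACPI column) for a hotpluggable range."""
    memmap = "ABSENT"  # same answer in all four rows
    if boot == "ACPI":
        return memmap, "PRESENT" if present else "ABSENT"
    return memmap, None  # DT rows: not yet decided ("?")


ranges = [
    RamRange(0x4000_0000, 1 << 30, hotpluggable=False, present=True),
    RamRange(0x8000_0000, 1 << 30, hotpluggable=True, present=True),
    RamRange(0xC000_0000, 1 << 30, hotpluggable=True, present=False),
]
visible = uefi_memmap(ranges)  # only the first, non-hotpluggable range
```

The point of the sketch is that the firmware-side decision depends on the property alone, never on whether the range happens to be plugged at boot.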

In the firmware, I only need to care about the GetMemoryMap() column, so I can work with this. Can someone please file a feature request at <https://bugzilla.tianocore.org/>, for the ArmVirtPkg Package, with these details?

Thanks
Laszlo

> 
> As for physical systems, there are out there ones that do report
> hotpluggable RAM in GetMemoryMap().
> 
>> Shameer: if you (Huawei) are represented on the USWG / ASWG, I suggest
>> re-raising the question on those lists too; at least the "ACPI" rows of
>> the table.
>>
>> Thanks!
>> Laszlo
>>
>>>
>>> Robin.
>>>
>>>> (Hi Laszlo/Igor/Eric, please feel free to add/change if I have missed
>>>> any valid
>>>> points above).
>>>>
>>>> Thanks,
>>>> Shameer
>>>> [0] https://patchwork.kernel.org/cover/10890919/
>>>> [1] https://patchwork.kernel.org/patch/10863299/
>>>> [2] https://patchwork.kernel.org/patch/10890937/
>>>>
>>>>
>>
> 



