qemu-devel.nongnu.org archive mirror
From: Anshuman Khandual <khandual@linux.vnet.ibm.com>
To: Michael Roth <mdroth@linux.vnet.ibm.com>,
	Nathan Fontenot <nfont@linux.vnet.ibm.com>,
	bharata@linux.vnet.ibm.com, qemu-devel@nongnu.org
Cc: qemu-ppc@nongnu.org, david@gibson.dropbear.id.au
Subject: Re: [Qemu-devel] [RFC PATCH v0] spapr: Disable memory hotplug when HTAB size is insufficient
Date: Wed, 09 Sep 2015 14:36:19 +0530
Message-ID: <55EFF68B.9030104@linux.vnet.ibm.com>
In-Reply-To: <20150904161249.10296.77510@loki>

On 09/04/2015 09:42 PM, Michael Roth wrote:
> Quoting Nathan Fontenot (2015-09-04 10:49:18)
>> On 09/04/2015 10:33 AM, Michael Roth wrote:
>>> Quoting Nathan Fontenot (2015-09-03 13:50:59)
>>>> On 09/01/2015 10:28 PM, Bharata B Rao wrote:
>>>>> On Mon, Aug 24, 2015 at 09:01:51AM +0530, Bharata B Rao wrote:
>>>>>> The hash table size allocated to guest depends on the maxmem size.
>>>>>> If the host isn't able to allocate the required hash table size but
>>>>>> instead allocates less than the optimal requested size, then it will
>>>>>> not be possible to grow the RAM until maxmem via memory hotplug.
>>>>>> Attempts to hotplug memory till maxmem could fail and this failure
>>>>>> isn't being currently handled gracefully by the guest kernel thereby
>>>>>> causing guest kernel oops.
>>>>>>
>>>>>> This should eventually get fixed when we move to completely in-kernel
>>>>>> memory hotplug instead of the current method where userspace tool drmgr
>>>>>> drives the hotplug. Until the in-kernel memory hotplug is available
>>>>>> for PowerKVM, disable memory hotplug when requested hash table size
>>>>>> isn't allocated.
>>>>>
>>>>> David - Do you have any views on how to go about this ? Due to the way
>>>>> we do hotplug currently using drmgr, it appears that it is very difficult
>>>>> to have a graceful recovery within the guest kernel when memory hotplug
>>>>> request can't be fulfilled due to insufficient HTAB size. (Anshuman can
>>>>> elaborate on this with the exact description on why it is so hard to
>>>>> recover).
>>>>>
>>>>> Do you think disabling memory hotplug upfront is a reasonable workaround
>>>>> for this problem ?
>>>>>
>>>>> Nathan - When you enable in-kernel memory hotplug for PowerKVM, will you
>>>>> be exporting something for the userspace (capability ?) to check and
>>>>> determine the presence of in-kernel memory hotplug feature so that we
>>>>> can depend on graceful recovery instead of upfront disablement of
>>>>> memory hotplug from QEMU ?
>>>>>
>>>>
>>>> I did not have any plans currently to export something indicating we are
>>>> using the in-kernel memory hotplug code.
>>>>
>>>> Perhaps this is something we should consider adding to the PAPR update
>>>> proposal that is being worked? Something to indicate we can gracefully handle
>>>> adding memory beyond HTAB size.
>>>
>>> That might make sense, but I'm curious what constitutes graceful
>>> recovery in this context. What can we do with in-kernel hotplug that's not
>>> possible with userspace tools? If it's graceful failure, is there really
>>> nothing that can be done by QEMU at the DRC level to get the same
>>> result?
>>
>> I don't have an answer for how to recover gracefully or if it will be possible.
> 
> Sorry, I meant it as a general question. Bharata mentioned Anshuman might have
> some further details?

Graceful recovery in the kernel seems to be difficult (though I cannot
say whether it is impossible) because of the way we have implemented
the memory hotplug function with the help of the userspace tool called
'drmgr'. After receiving the platform notification, it performs memory
hotplug in two distinct steps.

(1) Update the /proc/ofdt
(2) Write into /sys/devices/system/memory/probe

Both of the above steps try to add the new memory block into the kernel
(generic and arch-specific representations). If step (2) fails, we
restore /proc/ofdt to the state it had before the hotplug operation
started. In short, this does not gracefully roll back all the changes
made in step (1) and step (2). One of the reasons is that the operation
happens in two distinct steps from user space.

Had it been attempted in a single step, the kernel would have reverted
any changes right away before returning to userspace. The new in-kernel
memory hotplug method follows this principle.
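
To illustrate the difference between the two approaches described above,
here is a toy model in Python. It is only a sketch and is not drmgr or
kernel code: the ToyKernel class, its three sets, the block-count limit
standing in for the HTAB size, and both function names are hypothetical,
chosen just to show why restoring one value from user space can leave
stale state behind while a single-step operation can undo everything.

```python
class HotplugError(Exception):
    """Raised when the (toy) kernel cannot accept another memory block."""


class ToyKernel:
    """Hypothetical stand-in for kernel state touched by memory hotplug."""

    def __init__(self, htab_capacity_blocks):
        # Assumption: HTAB size is modelled as a maximum number of blocks.
        self.htab_capacity_blocks = htab_capacity_blocks
        self.device_tree = set()     # what the device-tree update records
        self.arch_blocks = set()     # arch-specific representation
        self.generic_blocks = set()  # generic representation

    def snapshot(self):
        # Independent copies, so later changes do not alter the snapshot.
        return (set(self.device_tree), set(self.arch_blocks),
                set(self.generic_blocks))

    def update_device_tree(self, block):
        # Step (1): the update also touches the arch-specific state.
        self.device_tree.add(block)
        self.arch_blocks.add(block)

    def probe(self, block):
        # Step (2): generic state is touched before the failure is detected.
        self.generic_blocks.add(block)
        if len(self.generic_blocks) > self.htab_capacity_blocks:
            raise HotplugError("HTAB too small for block %r" % (block,))


def two_step_hotplug(kernel, block):
    """Userspace-driven path: on failure only the device tree is restored."""
    kernel.update_device_tree(block)
    try:
        kernel.probe(block)
    except HotplugError:
        kernel.device_tree.discard(block)  # arch/generic state stays behind
        return False
    return True


def single_step_hotplug(kernel, block):
    """Single-step path: on failure every change is reverted before return."""
    saved = kernel.snapshot()
    try:
        kernel.update_device_tree(block)
        kernel.probe(block)
    except HotplugError:
        (kernel.device_tree, kernel.arch_blocks,
         kernel.generic_blocks) = saved
        return False
    return True
```

With a capacity of one block, a second two_step_hotplug() call fails and
leaves the block in arch_blocks and generic_blocks, whereas a failing
single_step_hotplug() call leaves the state exactly as it was.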
