qemu-devel.nongnu.org archive mirror
From: Shivaprasad G Bhat <sbhat@linux.ibm.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: xiaoguangrong.eric@gmail.com, mst@redhat.com,
	qemu-devel@nongnu.org,
	Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>,
	Bharata B Rao <bharata.rao@gmail.com>,
	qemu-ppc@nongnu.org, David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH v3 2/3] spapr: Add NVDIMM device support
Date: Thu, 12 Dec 2019 14:22:56 +0530	[thread overview]
Message-ID: <463b35ba-c35d-a251-a8ee-c65e5134cf8c@linux.ibm.com> (raw)
In-Reply-To: <20191211090509.1845b913@redhat.com>



On 12/11/2019 01:35 PM, Igor Mammedov wrote:
> On Wed, 11 Dec 2019 09:44:11 +0530
> Shivaprasad G Bhat <sbhat@linux.ibm.com> wrote:
>
>> On 12/06/2019 07:22 AM, David Gibson wrote:
>>> On Wed, Nov 27, 2019 at 09:50:54AM +0530, Bharata B Rao wrote:
>>>> On Fri, Nov 22, 2019 at 10:42 AM David Gibson
>>>> <david@gibson.dropbear.id.au> wrote:
>>>>> Ok.  A number of queries about this.
>>>>>
>>>>> 1) The PAPR spec for ibm,dynamic-memory-v2 says that the first word in
>>>>> each entry is the number of LMBs, but for NVDIMMs you use the
>>>>> not-necessarily-equal scm_block_size instead.  Does the NVDIMM
>>>>> amendment for PAPR really specify to use different block sizes for
>>>>> these cases?  (In which case that's a really stupid spec decision, but
>>>>> that wouldn't surprise me at this point).
>>>> SCM block sizes can be different from LMB sizes, but here we enforce
>>>> that the SCM device size (excluding metadata) is a multiple of the LMB
>>>> size so that we don't end up with a memory range that is not aligned
>>>> to the LMB size.
>>> Right, but it still doesn't make sense to use scm_block_size when you
>>> create the dynamic-memory-v2 property.
>> Right, I should use the LMB size here, as I will be creating holes to
>> disallow DIMMs from claiming those LMBs by marking them INVALID, as
>> Bharata suggested before.
>>
>>>    As far as the thing
>>> interpreting that goes, it *must* be LMB size, not SCM block size.  If
>>> those are required to be the same at this point, you should use an
>>> assert().
>> The SCM block size should be a multiple of the LMB size; it need not be
>> equal. I'll add an assert for that, checking that they are equal for now.
>> I see no benefit right now in having a larger SCM block size, as the
>> bind/unbind are already done before the bind hcall.
>>
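To make that concrete, here is a rough sketch of the kind of check/assert I
have in mind for the plug/realize path (illustrative only -- the property
lookup, the local names, and the omitted label-area handling are assumptions,
not the final patch):

    /* Sketch only: validate the NVDIMM size against the LMB size.  Names
     * are illustrative; label-area handling is omitted.
     */
    uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;    /* spapr LMB size */
    uint64_t scm_block_size = lmb_size;             /* kept equal for now */
    uint64_t size = object_property_get_uint(OBJECT(nvdimm),
                                             PC_DIMM_SIZE_PROP, &error_abort);

    /* The SCM block size must be a multiple of (here: equal to) the LMB size. */
    g_assert(scm_block_size % lmb_size == 0);

    /* The usable size must be LMB aligned so the range maps cleanly onto
     * ibm,dynamic-memory-v2 entries.
     */
    if (size % lmb_size) {
        error_setg(errp, "NVDIMM size excluding the label area must be a "
                   "multiple of %" PRIu64 "MB", lmb_size / MiB);
        return;
    }

With scm_block_size kept equal to the LMB size for now, the assert mostly
documents the constraint discussed above.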
>>>>> 2) Similarly, the ibm,dynamic-memory-v2 description says that the
>>>>> memory block described by the entry has a whole batch of contiguous
>>>>> DRCs starting at the DRC index given and continuing for #LMBs DRCs.
>>>>> For NVDIMMs it appears that you just have one DRC for the whole
>>>>> NVDIMM.  Is that right?
>>>> One NVDIMM has one DRC. In our case, we need to mark the LMBs
>>>> corresponding to that address range in ibm,dynamic-memory-v2 as
>>>> reserved and invalid.
>>> Ok, that fits very weirdly with the DRC allocation for the rest of
>>> pluggable memory, but I suppose that's PAPR for you.
>>>
>>> Having these in together is very inscrutable though, and relies on a
>>> heap of non-obvious constraints about placement of DIMMs and NVDIMMs
>>> relative to each other.  I really wonder if it would be better to have
>>> a completely different address range for the NVDIMMs.
>> The backend object for both DIMM and NVDIMM is memory-backend-*, and
>> they use addresses from the same space. Separating them would mean
>> using/introducing a different backend object. I don't think we have a
>> choice here.
> What address space(s) are we talking about here, exactly?
>  From my point of view, memory-backend-* provides a RAM block at
> some HVA, which shouldn't have anything to do with how NVDIMM
> partitions it and maps it to GPA.

Ah, you are right! I got confused with the HVA.

Nonetheless, I don't see a need for placing the vNVDIMM in a different
guest physical address range, as the existing code has support for marking
memory ranges distinctly for DIMM vs. NVDIMM.

On another note, x86 does it the same way too: there is no separate
range defined there.
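
Roughly, what I mean by "marking distinctly" is that the QOM type of the
plugged device is enough to tell the two apart while walking the shared
device-memory window. A sketch (the "dev" below just stands for the memory
device being looked at, not actual patch code):

    /* Illustrative only: both device types sit in the same guest physical
     * (device-memory) window; the object type decides how the range is
     * described in the device tree.
     */
    if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
        /* NVDIMM: cover the range with RESERVED|DRC_INVALID LMB entries */
    } else if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
        /* regular DIMM: ordinary hotpluggable LMB entries */
    }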

>
>
>>>>> 3) You're not setting *any* extra flags on the entry.  How is the
>>>>> guest supposed to know which are NVDIMM entries and which are regular
>>>>> DIMM entries?  AFAICT in this version the NVDIMM slots are
>>>>> indistinguishable from the unassigned hotplug memory (which makes the
>>>>> difference in LMB and DRC numbering even more troubling).
>>>> For the NVDIMM case, this patch should populate the LMB set in
>>>> ibm,dynamic-memory-v2 with something like the below:
>>>>
>>>>     elem = spapr_get_drconf_cell(size / lmb_size, addr, 0, -1,
>>>>                                  SPAPR_LMB_FLAGS_RESERVED |
>>>>                                  SPAPR_LMB_FLAGS_DRC_INVALID);
>>>>
>>>> This will ensure that the NVDIMM range will never be considered as
>>>> valid memory range for memory hotplug.
>>> Hrm.  Ok so we already have code that does that for any gaps between
>>> DIMMs.  I don't think there's actually anything that that code will do
>>> differently than the code you have for NVDIMMs, so you could just skip
>>> over the NVDIMMs here and it should do the right thing.
>>>
>>> The *interpretation* of those entries will become different: for space
>>> into which a regular DIMM is later inserted, we'll assume the DRC
>>> index given is a base and there are more DRCs following it, but for
>>> NVDIMMs we'll assume the same DRC throughout.  This is nuts, but IIUC
>>> that's what PAPR says and we can't do much about it.
>> My current patch is buggy, as Bharata pointed out. The NVDIMM DRCs are
>> not to be populated here; instead, the LMB DRCs should be marked
>> RESERVED and INVALID so that malicious attempts to online the LMBs in
>> those NVDIMM address ranges cannot succeed.
>>
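In other words, the population loop would treat the NVDIMM span like any
other gap and describe it with a single reserved/invalid run of LMBs,
roughly like this (sketch only; the nvdimm_addr/nvdimm_size/lmb_size locals
are placeholders):

    /* Sketch: one RESERVED|DRC_INVALID run covering the whole NVDIMM span.
     * The count is in LMB units (not SCM blocks), and no LMB DRC index is
     * handed out for it.
     */
    elem = spapr_get_drconf_cell(nvdimm_size / lmb_size, nvdimm_addr,
                                 0 /* no DRC */, -1 /* no assoc. node */,
                                 SPAPR_LMB_FLAGS_RESERVED |
                                 SPAPR_LMB_FLAGS_DRC_INVALID);
    QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
    nr_entries++;
    cur_addr = nvdimm_addr + nvdimm_size;
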
>>>   
>>>>> 4) AFAICT these are _present_ NVDIMMs, so why is
>>>>> SPAPR_LMB_FLAGS_ASSIGNED not set for them?  (and why is the node
>>>>> forced to -1, regardless of di->node).
>>>>>   
>>>>>>            QSIMPLEQ_INSERT_TAIL(&drconf_queue, elem, entry);
>>>>>>            nr_entries++;
>>>>>>            cur_addr = addr + size;
>>>>>> @@ -1261,6 +1273,85 @@ static void spapr_dt_hypervisor(SpaprMachineState *spapr, void *fdt)
>>>>>>        }
>>>>>>    }
>>>>>>
>>>>>> +static void spapr_create_nvdimm_dr_connectors(SpaprMachineState *spapr)
>>>>>> +{
>>>>>> +    MachineState *machine = MACHINE(spapr);
>>>>>> +    int i;
>>>>>> +
>>>>>> +    for (i = 0; i < machine->ram_slots; i++) {
>>>>>> +        spapr_dr_connector_new(OBJECT(spapr), TYPE_SPAPR_DRC_PMEM, i);
>>>>> What happens if you try to plug an NVDIMM to one of these slots, but a
>>>>> regular DIMM has already taken it?
>>>> NVDIMM hotplug won't get that occupied slot.
>>> Ok.
>>>   
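
Right -- the slot bookkeeping is the generic pc-dimm one, so an occupied
slot is simply never offered again. Very roughly (made-up helper name, just
to illustrate the behaviour):

    /* Illustrative only: find_free_ram_slot() is a made-up name standing in
     * for the generic pre-plug slot lookup; an already-claimed slot makes
     * the NVDIMM plug fail cleanly instead of colliding with the DIMM.
     */
    Error *local_err = NULL;
    int slot = find_free_ram_slot(requested_slot, machine->ram_slots,
                                  &local_err);
    if (local_err) {
        error_propagate(errp, local_err);   /* e.g. "slot N already in use" */
        return;
    }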



Thread overview: 18+ messages
2019-10-14 18:37 [PATCH v3 0/3] ppc: spapr: virtual NVDIMM support Shivaprasad G Bhat
2019-10-14 18:37 ` [PATCH v3 1/3] mem: move nvdimm_device_list to utilities Shivaprasad G Bhat
2019-11-19  2:58   ` David Gibson
2019-11-19  7:13   ` Igor Mammedov
2019-11-20  8:01     ` Shivaprasad G Bhat
2019-11-20  9:35       ` Igor Mammedov
2019-10-14 18:37 ` [PATCH v3 2/3] spapr: Add NVDIMM device support Shivaprasad G Bhat
2019-11-22  4:30   ` David Gibson
2019-11-27  4:20     ` Bharata B Rao
2019-12-06  1:52       ` David Gibson
2019-12-11  4:14         ` Shivaprasad G Bhat
2019-12-11  8:05           ` Igor Mammedov
2019-12-12  8:52             ` Shivaprasad G Bhat [this message]
2020-01-03  0:45               ` David Gibson
2019-12-16 11:15     ` Shivaprasad G Bhat
2019-10-14 18:38 ` [PATCH v3 3/3] spapr: Add Hcalls to support PAPR NVDIMM device Shivaprasad G Bhat
2019-11-22  5:11   ` David Gibson
2019-12-17  6:10     ` Shivaprasad G Bhat
