From mboxrd@z Thu Jan 1 00:00:00 1970
From: Xiao Guangrong
Subject: Re: [Qemu-devel] [PATCH v2 06/11] nvdimm acpi: initialize the resource used by NVDIMM ACPI
Date: Mon, 22 Feb 2016 18:30:03 +0800
Message-ID: <56CAE32B.9080401@linux.intel.com>
References: <20160215133722-mutt-send-email-mst@redhat.com> <20160215143234.29320a5f@nial.brq.redhat.com> <56C1F469.2040602@linux.intel.com> <20160215182404.0878474f@nial.brq.redhat.com> <56C21A7D.5040902@linux.intel.com> <20160216120047.5a50eccf@nial.brq.redhat.com> <56C3D522.6090401@linux.intel.com> <20160217192356-mutt-send-email-mst@redhat.com> <56C54298.3000904@linux.intel.com> <20160218110523.058a4716@nial.brq.redhat.com> <20160219100211-mutt-send-email-mst@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Igor Mammedov, ehabkost@redhat.com, KVM list, Gleb Natapov, mtosatti@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, Paolo Bonzini, rth@twiddle.net
To: Dan Williams, "Michael S. Tsirkin"
Return-path:
Received: from mga04.intel.com ([192.55.52.120]:28529 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751208AbcBVKi0 (ORCPT ); Mon, 22 Feb 2016 05:38:26 -0500
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On 02/19/2016 04:43 PM, Dan Williams wrote:
> On Fri, Feb 19, 2016 at 12:08 AM, Michael S. Tsirkin wrote:
>> On Thu, Feb 18, 2016 at 11:05:23AM +0100, Igor Mammedov wrote:
>>> On Thu, 18 Feb 2016 12:03:36 +0800
>>> Xiao Guangrong wrote:
>>>
>>>> On 02/18/2016 01:26 AM, Michael S. Tsirkin wrote:
>>>>> On Wed, Feb 17, 2016 at 10:04:18AM +0800, Xiao Guangrong wrote:
>>>>>>>>> As for the rest could that commands go via MMIO that we usually
>>>>>>>>> use for control path?
>>>>>>>>
>>>>>>>> So both input data and output data go through single MMIO, we need to
>>>>>>>> introduce a protocol to pass these data, that is complex?
>>>>>>>>
>>>>>>>> And is any MMIO we can reuse (more complexer?) or we should allocate this
>>>>>>>> MMIO page (the old question - where to allocated?)?
>>>>>>> Maybe you could reuse/extend memhotplug IO interface,
>>>>>>> or alternatively as Michael suggested add a vendor specific PCI_Config,
>>>>>>> I'd suggest PM device for that (hw/acpi/[piix4.c|ihc9.c])
>>>>>>> which I like even better since you won't need to care about which ports
>>>>>>> to allocate at all.
>>>>>>
>>>>>> Well, if Michael does not object, i will do it in the next version. :)
>>>>>
>>>>> Sorry, the thread's so long by now that I'm no longer sure what does "it" refer to.
>>>>
>>>> Never mind i saw you were busy on other loops.
>>>>
>>>> "It" means the suggestion of Igor that "map each label area right after each
>>>> NVDIMM's data memory"
>>> Michael pointed out that putting label right after each NVDIMM
>>> might burn up to 256GB of address space due to DIMM's alignment for 256 NVDIMMs.
>>> However if address for each label is picked with pc_dimm_get_free_addr()
>>> and label's MemoryRegion alignment is default 2MB then all labels
>>> would be allocated close to each other within a single 1GB range.
>>>
>>> That would burn only 1GB for 500 labels which is more than possible 256 NVDIMMs.
>>
>> I thought about it, once we support hotplug, this means that one will
>> have to pre-declare how much is needed so QEMU can mark the correct
>> memory reserved, that would be nasty. Maybe we always pre-reserve 1Gbyte.
>> Okay but next time we need something, do we steal another Gigabyte?
>> It seems too much, I'll think it over on the weekend.
>>
>> Really, most other devices manage to get by with 4K chunks just fine, I
>> don't see why do we are so special and need to steal gigabytes of
>> physically contigious phy ranges.
>
> What's the driving use case for labels in the guest? For example,
> NVDIMM-N devices are supported by the kernel without labels.

Yes, I see that the Linux driver supports label-less vNVDIMM, which is
exactly what QEMU currently does.

However, label-less operation is only a Linux-specific implementation (it
completely bypasses the namespace); other OS vendors (e.g. Microsoft) will use
label storage to address their own requirements, or they will not follow the
namespace spec at all.

Another reason is that labels are essential for PBLK support.

BTW, label support can be configured dynamically and will be disabled by
default.

>
> I certainly would not want to sacrifice 1GB alignment for a label area.
>

Yup, me too.
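
The address-space figures quoted in this thread (256GB versus 1GB) can be checked with a few lines of arithmetic. This is only a rough sketch, not QEMU code: the helper names are hypothetical, the 1GB DIMM alignment and 2MB label alignment are the values mentioned in the thread, and the 128KB label size is an assumption chosen purely for illustration.

```python
GIB = 1 << 30
MIB = 1 << 20
KIB = 1 << 10


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def span_label_after_each_dimm(count, label_size, dimm_align=GIB):
    # Assumption: the label sits right after its NVDIMM's data and the next
    # NVDIMM must start on a dimm_align boundary, so every label costs a
    # whole alignment unit of guest physical address space.
    return count * align_up(label_size, dimm_align)


def span_labels_packed(count, label_size, label_align=2 * MIB):
    # Assumption: labels are placed back to back, each aligned only to
    # label_align (the 2MB default mentioned in the thread).
    return count * align_up(label_size, label_align)


label_size = 128 * KIB  # hypothetical label area size

print(span_label_after_each_dimm(256, label_size) // GIB)  # 256 (GB)
print(span_labels_packed(256, label_size) // MIB)          # 512 (MB)
print(span_labels_packed(500, label_size) // MIB)          # 1000 (MB)
```

Under these assumptions, 500 packed labels stay just below 1GB, which matches the "1GB for 500 labels" estimate, while placing a label after each of 256 NVDIMMs costs 256GB.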