* [Qemu-devel] [RFC PATCH 1/1] nvdimm: let qemu requiring section alignment of pmem resource.
From: Zhang Yi @ 2018-06-11 10:54 UTC
To: xiaoguangrong.eric, dan.j.williams, ross.zwisler, stefanha, yu.c.zhang
Cc: ehabkost, linux-nvdimm, mst, qemu-devel, Zhang Yi, imammedo

Nvdimm driver use Memory hot-plug APIs to map it's pmem resource,
which at a section granularity.

When QEMU emulated the vNVDIMM device, decrease the label-storage,
QEMU will put the vNVDIMMs directly next to one another in physical
address space, which means that the boundary between them won't
align to the 128 MB memory section size.

Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
---
 hw/mem/nvdimm.c         | 2 +-
 include/hw/mem/nvdimm.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index 4087aca..ff6e171 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -109,7 +109,7 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
     NVDIMMDevice *nvdimm = NVDIMM(dimm);
     uint64_t align, pmem_size, size = memory_region_size(mr);
 
-    align = memory_region_get_alignment(mr);
+    align = MAX(memory_region_get_alignment(mr), NVDIMM_ALIGN_SIZE);
 
     pmem_size = size - nvdimm->label_size;
     nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index 3c82751..1d384e4 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -41,6 +41,7 @@
  * at least 128KB in size, which holds around 1000 labels."
  */
 #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
+#define NVDIMM_ALIGN_SIZE      (128UL << 20)
 
 #define TYPE_NVDIMM      "nvdimm"
 #define NVDIMM(obj) OBJECT_CHECK(NVDIMMDevice, (obj), TYPE_NVDIMM)
-- 
2.7.4

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
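As a reading aid for the first hunk, here is a minimal sketch of what raising `align` could do to the guest-visible pmem size. It assumes, which the diff context does not show, that `nvdimm_realize()` goes on to round `pmem_size` down to `align`. The helpers `align_down()` and `pmem_size_after_align()` are hypothetical names for illustration, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the constant added by the patch: 128 MiB. */
#define NVDIMM_ALIGN_SIZE (128ULL << 20)

/* Round v down to a multiple of align (align must be a power of two). */
static uint64_t align_down(uint64_t v, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return v & ~(align - 1);
}

/*
 * Hypothetical helper: guest-visible pmem size for a backend of `size`
 * bytes with `label_size` bytes reserved at its end, ASSUMING the result
 * is rounded down to the larger of the region alignment and
 * NVDIMM_ALIGN_SIZE.
 */
static uint64_t pmem_size_after_align(uint64_t size, uint64_t label_size,
                                      uint64_t region_align)
{
    uint64_t align = region_align > NVDIMM_ALIGN_SIZE ? region_align
                                                      : NVDIMM_ALIGN_SIZE;

    assert(size > label_size);
    return align_down(size - label_size, align);
}
```

Under these assumptions, a 4 GiB backend with a 128 KiB label area would expose 4 GiB minus 128 MiB of pmem: the size stays section-aligned at the cost of almost one full section.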
* Re: [RFC PATCH 1/1] nvdimm: let qemu requiring section alignment of pmem resource.
From: Stefan Hajnoczi @ 2018-06-11 16:26 UTC
To: Zhang Yi
Cc: xiaoguangrong.eric, mst, linux-nvdimm, qemu-devel, yu.c.zhang, imammedo, dan.j.williams, ross.zwisler, ehabkost

On Mon, Jun 11, 2018 at 06:54:25PM +0800, Zhang Yi wrote:
> Nvdimm driver use Memory hot-plug APIs to map it's pmem resource,
> which at a section granularity.
>
> When QEMU emulated the vNVDIMM device, decrease the label-storage,
> QEMU will put the vNVDIMMs directly next to one another in physical
> address space, which means that the boundary between them won't
> align to the 128 MB memory section size.

I'm having a hard time parsing this.

Where does the "128 MB memory section size" come from?  ACPI?
A chipset-specific value?

> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com>
> ---
>  hw/mem/nvdimm.c         | 2 +-
>  include/hw/mem/nvdimm.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> index 4087aca..ff6e171 100644
> --- a/hw/mem/nvdimm.c
> +++ b/hw/mem/nvdimm.c
> @@ -109,7 +109,7 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
>      NVDIMMDevice *nvdimm = NVDIMM(dimm);
>      uint64_t align, pmem_size, size = memory_region_size(mr);
>
> -    align = memory_region_get_alignment(mr);
> +    align = MAX(memory_region_get_alignment(mr), NVDIMM_ALIGN_SIZE);
>
>      pmem_size = size - nvdimm->label_size;
>      nvdimm->label_data = memory_region_get_ram_ptr(mr) + pmem_size;
> diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
> index 3c82751..1d384e4 100644
> --- a/include/hw/mem/nvdimm.h
> +++ b/include/hw/mem/nvdimm.h
> @@ -41,6 +41,7 @@
>   * at least 128KB in size, which holds around 1000 labels."
>   */
>  #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
> +#define NVDIMM_ALIGN_SIZE      (128UL << 20)
>
>  #define TYPE_NVDIMM      "nvdimm"
>  #define NVDIMM(obj) OBJECT_CHECK(NVDIMMDevice, (obj), TYPE_NVDIMM)
> --
> 2.7.4
>
* Re: [Qemu-devel] [RFC PATCH 1/1] nvdimm: let qemu requiring section alignment of pmem resource.
From: Dan Williams @ 2018-06-12 2:55 UTC
To: Stefan Hajnoczi
Cc: Xiao Guangrong, Michael S. Tsirkin, linux-nvdimm, Qemu Developers, Zhang Yi, Igor Mammedov, Eduardo Habkost

On Mon, Jun 11, 2018 at 9:26 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Mon, Jun 11, 2018 at 06:54:25PM +0800, Zhang Yi wrote:
>> Nvdimm driver use Memory hot-plug APIs to map it's pmem resource,
>> which at a section granularity.
>>
>> When QEMU emulated the vNVDIMM device, decrease the label-storage,
>> QEMU will put the vNVDIMMs directly next to one another in physical
>> address space, which means that the boundary between them won't
>> align to the 128 MB memory section size.
>
> I'm having a hard time parsing this.
>
> Where does the "128 MB memory section size" come from?  ACPI?
> A chipset-specific value?
>

The devm_memremap_pages() implementation uses the memory hotplug core
to allocate the 'struct page' array/map for persistent memory. Memory
hotplug can only be performed in terms of sections, 128MB on x86_64.
There is some limited support for allowing devm_memremap_pages() to
overlap 'System RAM' within a given section, but it does not currently
support multiple devm_memremap_pages() calls overlapping within the
same section. There is currently a kernel bug where we do not handle
this unsupported configuration gracefully. The fix will cause
configurations that try to overlap 2 persistent memory ranges in the
same section to fail.

The proposed fix is trying to make sure that QEMU does not run afoul
of this constraint.

There is currently no line of sight to reduce the minimum memory
hotplug alignment size to less than 128M. Also, as other architectures
outside of x86_64 add devm_memremap_pages() support, the minimum
section alignment constraint might change and is a property of a guest
OS. My understanding is that some guest OSes might expect an even
larger persistent memory minimum alignment.
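To make the constraint concrete, the following sketch checks whether two physical ranges touch a common memory section. It assumes the 128 MiB x86_64 section size mentioned above (a 27-bit shift). `section_of()` and `ranges_share_section()` are hypothetical helpers written for this illustration, not kernel or QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 128 MiB sections, the x86_64 value quoted in the thread. */
#define SECTION_SIZE_BITS 27

/* Index of the memory section that contains physical address addr. */
static uint64_t section_of(uint64_t addr)
{
    return addr >> SECTION_SIZE_BITS;
}

/*
 * True if the non-empty ranges [a, a + a_len) and [b, b + b_len)
 * touch at least one common section.
 */
static bool ranges_share_section(uint64_t a, uint64_t a_len,
                                 uint64_t b, uint64_t b_len)
{
    assert(a_len > 0 && b_len > 0);

    uint64_t a_first = section_of(a), a_last = section_of(a + a_len - 1);
    uint64_t b_first = section_of(b), b_last = section_of(b + b_len - 1);

    return a_first <= b_last && b_first <= a_last;
}
```

For example, a pmem range of 1 GiB minus a 128 KiB label starting at 4 GiB ends inside a section. A second range placed immediately after it shares that section, while one placed at the next 128 MiB boundary (5 GiB) does not.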
* Re: [Qemu-devel] [RFC PATCH 1/1] nvdimm: let qemu requiring section alignment of pmem resource.
From: Zhang,Yi @ 2018-06-12 13:27 UTC
To: Dan Williams, Stefan Hajnoczi
Cc: Xiao Guangrong, linux-nvdimm, Michael S. Tsirkin, Qemu Developers, Igor Mammedov, Eduardo Habkost

On Mon, 2018-06-11 at 19:55 -0700, Dan Williams wrote:
> On Mon, Jun 11, 2018 at 9:26 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Mon, Jun 11, 2018 at 06:54:25PM +0800, Zhang Yi wrote:
> > > Nvdimm driver use Memory hot-plug APIs to map it's pmem resource,
> > > which at a section granularity.
> > >
> > > When QEMU emulated the vNVDIMM device, decrease the label-storage,
> > > QEMU will put the vNVDIMMs directly next to one another in physical
> > > address space, which means that the boundary between them won't
> > > align to the 128 MB memory section size.
> > I'm having a hard time parsing this.
> >
> > Where does the "128 MB memory section size" come from?  ACPI?
> > A chipset-specific value?
> >
> The devm_memremap_pages() implementation use the memory hotplug core
> to allocate the 'struct page' array/map for persistent memory. Memory
> hotplug can only be performed in terms of sections, 128MB on x86_64.
> There is some limited support for allowing devm_memremap_pages() to
> overlap 'System RAM' within a given section, but it does not currently
> support multiple devm_memremap_pages() calls overlapping within the
> same section. There is currently a kernel bug where we do not handle
> this unsupported configuration gracefully. The fix will cause
> configurations configurations that try to overlap 2 persistent memory
> ranges in the same section to fail.
>
> The proposed fix is trying to make sure that QEMU does not run afoul
> of this constraint.
>
> There is currently no line of sight to reduce the minimum memory
> hotplug alignment size to less than 128M. Also, as other architectures
> outside of x86_64 add devm_memremap_pages() support, the minimum
> section alignment constraint might change and is a property of a guest
> OS. My understanding is that some guest OSes might expect an even
> larger persistent memory minimum alignment.
>

Thanks for the explanation, Dan. I still have a question: why do we
overlap the unaligned area instead of dropping it and aligning to the
next section?
* Re: [RFC PATCH 1/1] nvdimm: let qemu requiring section alignment of pmem resource.
From: Haozhong Zhang @ 2018-06-12 15:04 UTC
To: Dan Williams, Zhang Yi
Cc: Xiao Guangrong, linux-nvdimm, Michael S. Tsirkin, Qemu Developers, yu.c.zhang, Stefan Hajnoczi, Igor Mammedov, Ross Zwisler, Eduardo Habkost

On 06/11/18 19:55, Dan Williams wrote:
> On Mon, Jun 11, 2018 at 9:26 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Mon, Jun 11, 2018 at 06:54:25PM +0800, Zhang Yi wrote:
> >> Nvdimm driver use Memory hot-plug APIs to map it's pmem resource,
> >> which at a section granularity.
> >>
> >> When QEMU emulated the vNVDIMM device, decrease the label-storage,
> >> QEMU will put the vNVDIMMs directly next to one another in physical
> >> address space, which means that the boundary between them won't
> >> align to the 128 MB memory section size.
> >
> > I'm having a hard time parsing this.
> >
> > Where does the "128 MB memory section size" come from?  ACPI?
> > A chipset-specific value?
> >
>
> The devm_memremap_pages() implementation use the memory hotplug core
> to allocate the 'struct page' array/map for persistent memory. Memory
> hotplug can only be performed in terms of sections, 128MB on x86_64.

IIUC, it also affects the normal RAM hotplug to a Linux VM on QEMU. If
that is the case, it will be helpful to lift this option to pc-dimm.

Thanks,
Haozhong

> There is some limited support for allowing devm_memremap_pages() to
> overlap 'System RAM' within a given section, but it does not currently
> support multiple devm_memremap_pages() calls overlapping within the
> same section. There is currently a kernel bug where we do not handle
> this unsupported configuration gracefully. The fix will cause
> configurations configurations that try to overlap 2 persistent memory
> ranges in the same section to fail.
>
> The proposed fix is trying to make sure that QEMU does not run afoul
> of this constraint.
>
> There is currently no line of sight to reduce the minimum memory
> hotplug alignment size to less than 128M. Also, as other architectures
> outside of x86_64 add devm_memremap_pages() support, the minimum
> section alignment constraint might change and is a property of a guest
> OS. My understanding is that some guest OSes might expect an even
> larger persistent memory minimum alignment.
>
* Re: [RFC PATCH 1/1] nvdimm: let qemu requiring section alignment of pmem resource.
From: Stefan Hajnoczi @ 2018-06-13 14:16 UTC
To: Haozhong Zhang
Cc: Xiao Guangrong, linux-nvdimm, Michael S. Tsirkin, Qemu Developers, Zhang Yi, yu.c.zhang, Igor Mammedov, Dan Williams, Ross Zwisler, Eduardo Habkost

On Tue, Jun 12, 2018 at 11:04:25PM +0800, Haozhong Zhang wrote:
> On 06/11/18 19:55, Dan Williams wrote:
> > On Mon, Jun 11, 2018 at 9:26 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > On Mon, Jun 11, 2018 at 06:54:25PM +0800, Zhang Yi wrote:
> > >> Nvdimm driver use Memory hot-plug APIs to map it's pmem resource,
> > >> which at a section granularity.
> > >>
> > >> When QEMU emulated the vNVDIMM device, decrease the label-storage,
> > >> QEMU will put the vNVDIMMs directly next to one another in physical
> > >> address space, which means that the boundary between them won't
> > >> align to the 128 MB memory section size.
> > >
> > > I'm having a hard time parsing this.
> > >
> > > Where does the "128 MB memory section size" come from?  ACPI?
> > > A chipset-specific value?
> > >
> >
> > The devm_memremap_pages() implementation use the memory hotplug core
> > to allocate the 'struct page' array/map for persistent memory. Memory
> > hotplug can only be performed in terms of sections, 128MB on x86_64.
>
> IIUC, it also affects the normal RAM hotplug to a Linux VM on QEMU. If
> that is the case, it will be helpful to lift this option to pc-dimm.

I agree. There should be one place in QEMU for the machine-specific
hotplug memory alignment value.

It would be best to track down the property of the hardware that
determines these alignment values instead of letting current Linux
software limitations determine QEMU's behavior. That way we know that
QEMU emulates real hardware accurately and will work with any guest OS.
I imagine it depends on the chipset (i.e. QEMU machine type).

If it proves hard to pinpoint the hardware limit, then please include a
comment in the code explaining that a value that works with Linux guests
is being used for now. This will allow people reading the code to
understand where this behavior comes from.

Stefan
* Re: [Qemu-devel] [RFC PATCH 1/1] nvdimm: let qemu requiring section alignment of pmem resource.
From: Igor Mammedov @ 2018-06-13 16:30 UTC
To: Haozhong Zhang
Cc: Xiao Guangrong, Michael S. Tsirkin, linux-nvdimm, Qemu Developers, Zhang Yi, Stefan Hajnoczi, Eduardo Habkost

On Tue, 12 Jun 2018 23:04:25 +0800
Haozhong Zhang <hzzhan9@gmail.com> wrote:

> On 06/11/18 19:55, Dan Williams wrote:
> > On Mon, Jun 11, 2018 at 9:26 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > On Mon, Jun 11, 2018 at 06:54:25PM +0800, Zhang Yi wrote:
> > >> Nvdimm driver use Memory hot-plug APIs to map it's pmem resource,
> > >> which at a section granularity.
> > >>
> > >> When QEMU emulated the vNVDIMM device, decrease the label-storage,
> > >> QEMU will put the vNVDIMMs directly next to one another in physical
> > >> address space, which means that the boundary between them won't
> > >> align to the 128 MB memory section size.
> > >
> > > I'm having a hard time parsing this.
> > >
> > > Where does the "128 MB memory section size" come from?  ACPI?
> > > A chipset-specific value?
> > >
> >
> > The devm_memremap_pages() implementation use the memory hotplug core
> > to allocate the 'struct page' array/map for persistent memory. Memory
> > hotplug can only be performed in terms of sections, 128MB on x86_64.
>
> IIUC, it also affects the normal RAM hotplug to a Linux VM on QEMU. If
> that is the case, it will be helpful to lift this option to pc-dimm.

The default alignment to a page-size boundary exists because QEMU has
no idea what alignment requirements the guest OS has, and these
requirements can vary greatly depending on which guest OS is running.
With some guests it works just fine even with 2M alignments/DIMM sizes.
So it is up to the upper layers, which know what guest OS is running,
to pick the sizes of the plugged DIMMs. If a particular Linux version's
minimum block size is 128M, then management needs to plug DIMMs whose
size is a multiple of that. That should satisfy whatever alignment
requirement the guest OS has.

In the case of nvdimm, we need to fix address allocation in QEMU to
account for the label size, which broke the above rule and led to an
"overlap" over the label area of the nvdimm, which isn't mapped into
guest address space, but that's probably it.

PS: a question not related to the patch.
Intel guys contributed most of the nvdimm code and continue to actively
develop it. Can we have a designated maintainer for the nvdimm part
from Intel, in addition to authors who just code/merge a feature and
become unreachable shortly after that?

> Thanks,
> Haozhong
>
> > There is some limited support for allowing devm_memremap_pages() to
> > overlap 'System RAM' within a given section, but it does not currently
> > support multiple devm_memremap_pages() calls overlapping within the
> > same section. There is currently a kernel bug where we do not handle
> > this unsupported configuration gracefully. The fix will cause
> > configurations configurations that try to overlap 2 persistent memory
> > ranges in the same section to fail.
> >
> > The proposed fix is trying to make sure that QEMU does not run afoul
> > of this constraint.
> >
> > There is currently no line of sight to reduce the minimum memory
> > hotplug alignment size to less than 128M. Also, as other architectures
> > outside of x86_64 add devm_memremap_pages() support, the minimum
> > section alignment constraint might change and is a property of a guest
> > OS. My understanding is that some guest OSes might expect an even
> > larger persistent memory minimum alignment.
> >
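The sizing rule described above, and the way the label area breaks it, can be sketched as follows. This is only an illustration under stated assumptions: the guest block size is passed in by the caller, the label area is taken to sit at the end of the backend and not be mapped into the guest, and all three helpers are hypothetical names rather than QEMU or management-layer APIs. In particular, `backend_size_for()` shows one possible sizing workaround and is not something proposed in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if size is a non-zero multiple of the guest's memory block size. */
static bool is_block_multiple(uint64_t size, uint64_t block)
{
    assert(block != 0);
    return size != 0 && size % block == 0;
}

/*
 * Part of an NVDIMM backend that is mapped into guest address space,
 * assuming the label area at the end of the backend is excluded.
 */
static uint64_t mapped_pmem_size(uint64_t backend_size, uint64_t label_size)
{
    assert(backend_size > label_size);
    return backend_size - label_size;
}

/*
 * Hypothetical workaround: backend size to request so that the mapped
 * part is exactly `blocks` whole guest blocks.
 */
static uint64_t backend_size_for(uint64_t blocks, uint64_t block,
                                 uint64_t label_size)
{
    return blocks * block + label_size;
}
```

With a 128 MiB block size, a 1 GiB plain DIMM satisfies the rule, but a 1 GiB NVDIMM backend with a 128 KiB label maps 1 GiB minus 128 KiB, which does not. Requesting 1 GiB plus 128 KiB restores a mapped size of exactly eight blocks.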
end of thread, other threads:[~2018-06-13 16:30 UTC | newest]

Thread overview: 14+ messages
2018-06-11 10:54 [Qemu-devel] [RFC PATCH 1/1] nvdimm: let qemu requiring section alignment of pmem resource Zhang Yi
2018-06-11 10:54 ` Zhang Yi
2018-06-11 16:26 ` Stefan Hajnoczi
2018-06-11 16:26 ` [Qemu-devel] " Stefan Hajnoczi
2018-06-12  2:55 ` Dan Williams
2018-06-12  2:55 ` Dan Williams
2018-06-12 13:27 ` Zhang,Yi
2018-06-12 13:27 ` Zhang,Yi
2018-06-12 15:04 ` Haozhong Zhang
2018-06-12 15:04 ` [Qemu-devel] " Haozhong Zhang
2018-06-13 14:16 ` Stefan Hajnoczi
2018-06-13 14:16 ` [Qemu-devel] " Stefan Hajnoczi
2018-06-13 16:30 ` Igor Mammedov
2018-06-13 16:30 ` Igor Mammedov