From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35603) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cOCEc-0004sA-OO for qemu-devel@nongnu.org; Mon, 02 Jan 2017 18:41:48 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cOCEZ-0006IH-Lj for qemu-devel@nongnu.org; Mon, 02 Jan 2017 18:41:46 -0500 Date: Tue, 3 Jan 2017 10:34:23 +1100 From: David Gibson Message-ID: <20170102233423.GH12761@umbus.fritz.box> References: <20161222052212.49006-1-aik@ozlabs.ru> <20161222052212.49006-2-aik@ozlabs.ru> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="+QwZB9vYiNIzNXIj" Content-Disposition: inline In-Reply-To: <20161222052212.49006-2-aik@ozlabs.ru> Subject: Re: [Qemu-devel] [PATCH qemu 1/2] exec, kvm, target-ppc: Move getrampagesize() to common code List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org, pbonzini@redhat.com --+QwZB9vYiNIzNXIj Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Dec 22, 2016 at 04:22:11PM +1100, Alexey Kardashevskiy wrote: > getrampagesize() returns the largest supported page size and mainly > used to know if huge pages are enabled. >=20 > However is implemented in target-ppc/kvm.c and not available > in TCG or other architectures. >=20 > This renames and moves gethugepagesize() to mmap-alloc.c where > fd-based analog of it is already implemented. This renames and moves > getrampagesize() to exec.c as it seems to be the common place for > helpers like this. >=20 > This first user for it is going to be a spapr-pci-host-bridge which > needs to know the largest RAM page size so the guest could try > using bigger IOMMU pages to save memory. >=20 > Signed-off-by: Alexey Kardashevskiy Reviewed-by: David Gibson Seems sensible to me, but I'm not comfortable merging this via my tree since it touches such core code. Probably should go via Paolo. > --- > include/exec/ram_addr.h | 1 + > include/qemu/mmap-alloc.h | 2 + > exec.c | 82 ++++++++++++++++++++++++++++++++++++ > target-ppc/kvm.c | 105 ++--------------------------------------= ------ > util/mmap-alloc.c | 25 +++++++++++ > 5 files changed, 113 insertions(+), 102 deletions(-) >=20 > diff --git a/include/exec/ram_addr.h b/include/exec/ram_addr.h > index 54d7108a9e..3935cbcfcd 100644 > --- a/include/exec/ram_addr.h > +++ b/include/exec/ram_addr.h > @@ -91,6 +91,7 @@ typedef struct RAMList { > } RAMList; > extern RAMList ram_list; > =20 > +long qemu_getrampagesize(void); > ram_addr_t last_ram_offset(void); > void qemu_mutex_lock_ramlist(void); > void qemu_mutex_unlock_ramlist(void); > diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h > index 933c024ac5..50385e3f81 100644 > --- a/include/qemu/mmap-alloc.h > +++ b/include/qemu/mmap-alloc.h > @@ -5,6 +5,8 @@ > =20 > size_t qemu_fd_getpagesize(int fd); > =20 > +size_t qemu_mempath_getpagesize(const char *mem_path); > + > void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared); > =20 > void qemu_ram_munmap(void *ptr, size_t size); > diff --git a/exec.c b/exec.c > index 08c558eecf..d73b477a70 100644 > --- a/exec.c > +++ b/exec.c > @@ -32,6 +32,7 @@ > #endif > #include "sysemu/kvm.h" > #include "sysemu/sysemu.h" > +#include "sysemu/numa.h" > #include "qemu/timer.h" > #include "qemu/config-file.h" > #include "qemu/error-report.h" > @@ -1218,6 +1219,87 @@ void qemu_mutex_unlock_ramlist(void) > } > =20 > #ifdef __linux__ > +/* > + * FIXME TOCTTOU: this iterates over memory backends' mem-path, which > + * may or may not name the same files / on the same filesystem now as > + * when we actually open and map them. Iterate over the file > + * descriptors instead, and use qemu_fd_getpagesize(). > + */ > +static int find_max_supported_pagesize(Object *obj, void *opaque) > +{ > + char *mem_path; > + long *hpsize_min =3D opaque; > + > + if (object_dynamic_cast(obj, TYPE_MEMORY_BACKEND)) { > + mem_path =3D object_property_get_str(obj, "mem-path", NULL); > + if (mem_path) { > + long hpsize =3D qemu_mempath_getpagesize(mem_path); > + if (hpsize < *hpsize_min) { > + *hpsize_min =3D hpsize; > + } > + } else { > + *hpsize_min =3D getpagesize(); > + } > + } > + > + return 0; > +} > + > +long qemu_getrampagesize(void) > +{ > + long hpsize =3D LONG_MAX; > + long mainrampagesize; > + Object *memdev_root; > + > + if (mem_path) { > + mainrampagesize =3D qemu_mempath_getpagesize(mem_path); > + } else { > + mainrampagesize =3D getpagesize(); > + } > + > + /* it's possible we have memory-backend objects with > + * hugepage-backed RAM. these may get mapped into system > + * address space via -numa parameters or memory hotplug > + * hooks. we want to take these into account, but we > + * also want to make sure these supported hugepage > + * sizes are applicable across the entire range of memory > + * we may boot from, so we take the min across all > + * backends, and assume normal pages in cases where a > + * backend isn't backed by hugepages. > + */ > + memdev_root =3D object_resolve_path("/objects", NULL); > + if (memdev_root) { > + object_child_foreach(memdev_root, find_max_supported_pagesize, &= hpsize); > + } > + if (hpsize =3D=3D LONG_MAX) { > + /* No additional memory regions found =3D=3D> Report main RAM pa= ge size */ > + return mainrampagesize; > + } > + > + /* If NUMA is disabled or the NUMA nodes are not backed with a > + * memory-backend, then there is at least one node using "normal" RA= M, > + * so if its page size is smaller we have got to report that size in= stead. > + */ > + if (hpsize > mainrampagesize && > + (nb_numa_nodes =3D=3D 0 || numa_info[0].node_memdev =3D=3D NULL)= ) { > + static bool warned; > + if (!warned) { > + error_report("Huge page support disabled (n/a for main memor= y)."); > + warned =3D true; > + } > + return mainrampagesize; > + } > + > + return hpsize; > +} > +#else > +long qemu_getrampagesize(void) > +{ > + return getpagesize(); > +} > +#endif > + > +#ifdef __linux__ > static int64_t get_file_size(int fd) > { > int64_t size =3D lseek(fd, 0, SEEK_END); > diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c > index 6e91a4d8bb..e0abffa8ad 100644 > --- a/target-ppc/kvm.c > +++ b/target-ppc/kvm.c > @@ -42,6 +42,7 @@ > #include "trace.h" > #include "exec/gdbstub.h" > #include "exec/memattrs.h" > +#include "exec/ram_addr.h" > #include "sysemu/hostmem.h" > #include "qemu/cutils.h" > #if defined(TARGET_PPC64) > @@ -325,106 +326,6 @@ static void kvm_get_smmu_info(PowerPCCPU *cpu, stru= ct kvm_ppc_smmu_info *info) > kvm_get_fallback_smmu_info(cpu, info); > } > =20 > -static long gethugepagesize(const char *mem_path) > -{ > - struct statfs fs; > - int ret; > - > - do { > - ret =3D statfs(mem_path, &fs); > - } while (ret !=3D 0 && errno =3D=3D EINTR); > - > - if (ret !=3D 0) { > - fprintf(stderr, "Couldn't statfs() memory path: %s\n", > - strerror(errno)); > - exit(1); > - } > - > -#define HUGETLBFS_MAGIC 0x958458f6 > - > - if (fs.f_type !=3D HUGETLBFS_MAGIC) { > - /* Explicit mempath, but it's ordinary pages */ > - return getpagesize(); > - } > - > - /* It's hugepage, return the huge page size */ > - return fs.f_bsize; > -} > - > -/* > - * FIXME TOCTTOU: this iterates over memory backends' mem-path, which > - * may or may not name the same files / on the same filesystem now as > - * when we actually open and map them. Iterate over the file > - * descriptors instead, and use qemu_fd_getpagesize(). > - */ > -static int find_max_supported_pagesize(Object *obj, void *opaque) > -{ > - char *mem_path; > - long *hpsize_min =3D opaque; > - > - if (object_dynamic_cast(obj, TYPE_MEMORY_BACKEND)) { > - mem_path =3D object_property_get_str(obj, "mem-path", NULL); > - if (mem_path) { > - long hpsize =3D gethugepagesize(mem_path); > - if (hpsize < *hpsize_min) { > - *hpsize_min =3D hpsize; > - } > - } else { > - *hpsize_min =3D getpagesize(); > - } > - } > - > - return 0; > -} > - > -static long getrampagesize(void) > -{ > - long hpsize =3D LONG_MAX; > - long mainrampagesize; > - Object *memdev_root; > - > - if (mem_path) { > - mainrampagesize =3D gethugepagesize(mem_path); > - } else { > - mainrampagesize =3D getpagesize(); > - } > - > - /* it's possible we have memory-backend objects with > - * hugepage-backed RAM. these may get mapped into system > - * address space via -numa parameters or memory hotplug > - * hooks. we want to take these into account, but we > - * also want to make sure these supported hugepage > - * sizes are applicable across the entire range of memory > - * we may boot from, so we take the min across all > - * backends, and assume normal pages in cases where a > - * backend isn't backed by hugepages. > - */ > - memdev_root =3D object_resolve_path("/objects", NULL); > - if (memdev_root) { > - object_child_foreach(memdev_root, find_max_supported_pagesize, &= hpsize); > - } > - if (hpsize =3D=3D LONG_MAX) { > - /* No additional memory regions found =3D=3D> Report main RAM pa= ge size */ > - return mainrampagesize; > - } > - > - /* If NUMA is disabled or the NUMA nodes are not backed with a > - * memory-backend, then there is at least one node using "normal" RA= M, > - * so if its page size is smaller we have got to report that size in= stead. > - */ > - if (hpsize > mainrampagesize && > - (nb_numa_nodes =3D=3D 0 || numa_info[0].node_memdev =3D=3D NULL)= ) { > - static bool warned; > - if (!warned) { > - error_report("Huge page support disabled (n/a for main memor= y)."); > - warned =3D true; > - } > - return mainrampagesize; > - } > - > - return hpsize; > -} > - > static bool kvm_valid_page_size(uint32_t flags, long rampgsize, uint32_t= shift) > { > if (!(flags & KVM_PPC_PAGE_SIZES_REAL)) { > @@ -454,7 +355,7 @@ static void kvm_fixup_page_sizes(PowerPCCPU *cpu) > has_smmu_info =3D true; > } > =20 > - rampagesize =3D getrampagesize(); > + rampagesize =3D qemu_getrampagesize(); > =20 > /* Convert to QEMU form */ > memset(&env->sps, 0, sizeof(env->sps)); > @@ -2177,7 +2078,7 @@ uint64_t kvmppc_rma_size(uint64_t current_size, uns= igned int hash_shift) > /* Find the largest hardware supported page size that's less than > * or equal to the (logical) backing page size of guest RAM */ > kvm_get_smmu_info(POWERPC_CPU(first_cpu), &info); > - rampagesize =3D getrampagesize(); > + rampagesize =3D qemu_getrampagesize(); > best_page_shift =3D 0; > =20 > for (i =3D 0; i < KVM_PPC_PAGE_SIZES_MAX_SZ; i++) { > diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c > index 5a85aa3c89..564c79109c 100644 > --- a/util/mmap-alloc.c > +++ b/util/mmap-alloc.c > @@ -39,6 +39,31 @@ size_t qemu_fd_getpagesize(int fd) > return getpagesize(); > } > =20 > +size_t qemu_mempath_getpagesize(const char *mem_path) > +{ > +#ifdef CONFIG_LINUX > + struct statfs fs; > + int ret; > + > + do { > + ret =3D statfs(mem_path, &fs); > + } while (ret !=3D 0 && errno =3D=3D EINTR); > + > + if (ret !=3D 0) { > + fprintf(stderr, "Couldn't statfs() memory path: %s\n", > + strerror(errno)); > + exit(1); > + } > + > + if (fs.f_type =3D=3D HUGETLBFS_MAGIC) { > + /* It's hugepage, return the huge page size */ > + return fs.f_bsize; > + } > +#endif > + > + return getpagesize(); > +} > + > void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared) > { > /* --=20 David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson --+QwZB9vYiNIzNXIj Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJYauN8AAoJEGw4ysog2bOSA3YP/1Zi9VBAGowGEtPLz+MVCp3M HqB6SonxAxrkBlZT94KLEK+5dXKt6UCi5ALG3OubmmYy1nf16gOOOfnVzJ/CW9zr mSGCSWBDfDOjC0LF2DOXwlp69NDleu/Vb0vLxoIYO19JbjB40vHsCj+clBMjZoMb TJ+jgBFFKutcOz1E0SnETDMO4VYdiRa+3uH7l/69mcph9/nMiCDvgsWTZERqfw2D XhVFJTZAICRh/iDwmhzOmOsokJiLS+93wliLTEYO8ZyX8iA7Ycof/lhIRiyLykJH B0y39d8BAYtcWqLbaUdl4ThEmyic+hvcfhxulzDeS4AOSXCaJ2ccVUnIwKsMYd3D jnCH0TnuOC8D8zz3j3pM29lcmKNpB9yD6D3i4TrwoOOzb60FldKUc5cqQyji2WNV YqMQNbLjMP8xlmNnYZZVtFuwssiBtVF8h8ADyvQcDtfsiwqJH2b9VCs/1Wre2/+Z 7WuNPxqe0CRxot02JtcmF35SXrR9G5b2olpSZ9hiLl8YNMVxd3h7gRTajpxNjZZw +GvGYQNKta4tykJr5qSyrx07/TSFrBPO9RMHEMO1mf/YEUarGNgticUi0PXLGRin yMEJJyQJQcv8Z+H8b02xkBcg4k/IUag0qgJZlzz6g1APvmd2JHokQT9rggS+FEr3 wgd4/QQuRvTM6AWLmFSA =qRBt -----END PGP SIGNATURE----- --+QwZB9vYiNIzNXIj--