From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:49667)
  by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
  id 1eJ9lT-0007Ja-0x
  for qemu-devel@nongnu.org; Sun, 26 Nov 2017 22:07:24 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
  (envelope-from )
  id 1eJ9lO-0004zf-6E
  for qemu-devel@nongnu.org; Sun, 26 Nov 2017 22:07:23 -0500
Received: from mga07.intel.com ([134.134.136.100]:24852)
  by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
  (Exim 4.71) (envelope-from )
  id 1eJ9lN-0004zW-Q5
  for qemu-devel@nongnu.org; Sun, 26 Nov 2017 22:07:18 -0500
Date: Mon, 27 Nov 2017 11:06:35 +0800
From: Zhong Yang
Message-ID: <20171127030635.GA29806@yangzhon-Virtual>
References: <1511505030-3669-1-git-send-email-yang.zhong@intel.com>
  <5A1A5C6E.9060409@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5A1A5C6E.9060409@huawei.com>
Subject: Re: [Qemu-devel] [PATCH v3] rcu: reduce more than 7MB heap memory by malloc_trim()
To: Shannon Zhao
Cc: qemu-devel@nongnu.org, pbonzini@redhat.com, weidong.huang@huawei.com,
  arei.gonglei@huawei.com, liujunjie23@huawei.com,
  wangxinxin.wang@huawei.com, stone.xulei@huawei.com,
  zhang.zhanghailiang@huawei.com, stefanha@redhat.com, berrange@redhat.com,
  yang.zhong@intel.com

On Sun, Nov 26, 2017 at 02:17:18PM +0800, Shannon Zhao wrote:
> Hi,
>
> On 2017/11/24 14:30, Yang Zhong wrote:
> > Since there are some issues in memory alloc/free mechanism
> > in glibc for little chunk memory, if Qemu frequently
> > alloc/free little chunk memory, the glibc doesn't alloc
> > little chunk memory from free list of glibc and still
> > allocate from OS, which makes the heap size bigger and bigger.
> >
> > This patch introduces malloc_trim(), which will free heap memory.
> >
> > Below are test results from smaps file.
> > (1)without patch
> > 55f0783e1000-55f07992a000 rw-p 00000000 00:00 0 [heap]
> > Size: 21796 kB
> > Rss: 14260 kB
> > Pss: 14260 kB
> >
> > (2)with patch
> > 55cc5fadf000-55cc61008000 rw-p 00000000 00:00 0 [heap]
> > Size: 21668 kB
> > Rss: 6940 kB
> > Pss: 6940 kB
> >
> > Signed-off-by: Yang Zhong
> > ---
> >  configure  | 29 +++++++++++++++++++++++++++++
> >  util/rcu.c |  6 ++++++
> >  2 files changed, 35 insertions(+)
> >
> > diff --git a/configure b/configure
> > index 0c6e757..6292ab0 100755
> > --- a/configure
> > +++ b/configure
> > @@ -426,6 +426,7 @@ vxhs=""
> >  supported_cpu="no"
> >  supported_os="no"
> >  bogus_os="no"
> > +malloc_trim="yes"
> >
> >  # parse CC options first
> >  for opt do
> > @@ -3857,6 +3858,30 @@ if test "$tcmalloc" = "yes" && test "$jemalloc" = "yes" ; then
> >      exit 1
> >  fi
> >
> > +# Even if malloc_trim() is available, these non-libc memory allocators
> > +# do not support it.
> > +if test "$tcmalloc" = "yes" || test "$jemalloc" = "yes" ; then
> > +    if test "$malloc_trim" = "yes" ; then
> > +        echo "Disabling malloc_trim with non-libc memory allocator"
> > +    fi
> > +    malloc_trim="no"
> > +fi
> > +
> > +#######################################
> > +# malloc_trim
> > +
> > +if test "$malloc_trim" != "no" ; then
> > +    cat > $TMPC << EOF
> > +#include <malloc.h>
> > +int main(void) { malloc_trim(0); return 0; }
> > +EOF
> > +    if compile_prog "" "" ; then
> > +        malloc_trim="yes"
> > +    else
> > +        malloc_trim="no"
> > +    fi
> > +fi
> > +
> >  ##########################################
> >  # tcmalloc probe
> >
> > @@ -6012,6 +6037,10 @@ if test "$opengl" = "yes" ; then
> >    fi
> >  fi
> >
> > +if test "$malloc_trim" = "yes" ; then
> > +  echo "CONFIG_MALLOC_TRIM=y" >> $config_host_mak
> > +fi
> > +
> >  if test "$avx2_opt" = "yes" ; then
> >    echo "CONFIG_AVX2_OPT=y" >> $config_host_mak
> >  fi
> > diff --git a/util/rcu.c b/util/rcu.c
> > index ca5a63e..f403b77 100644
> > --- a/util/rcu.c
> > +++ b/util/rcu.c
> > @@ -32,6 +32,9 @@
> >  #include "qemu/atomic.h"
> >  #include "qemu/thread.h"
> >  #include "qemu/main-loop.h"
> > +#if defined(CONFIG_MALLOC_TRIM)
> > +#include <malloc.h>
> > +#endif
> >
> >  /*
> >   * Global grace period counter. Bit 0 is always one in rcu_gp_ctr.
> > @@ -272,6 +275,9 @@ static void *call_rcu_thread(void *opaque)
> >              node->func(node);
> >          }
> >          qemu_mutex_unlock_iothread();
> > +#if defined(CONFIG_MALLOC_TRIM)
> > +        malloc_trim(4 * 1024 * 1024);
> > +#endif
> >      }
> >      abort();
> >  }
> >
>
> Looks like this patch introduces a performance regression. With this
> patch the time of booting a VM with 60 scsi disks on ARM64 is increased
> by 200+ seconds.
>

Hello Shannon,

Thanks for your reply!

Regarding your concern, I ran VM boot-time comparison tests; the results
are below:

#test command
./qemu-system-x86_64 -enable-kvm -cpu host -m 2G -smp cpus=4,cores=4,\
threads=1,sockets=1 -drive format=raw,\
file=test.img,index=0,media=disk -nographic

#without patch
root@intel-internal-corei7-64:~# systemd-analyze
Startup finished in 4.979s (kernel) + 1.214s (userspace) = 6.193s
root@intel-internal-corei7-64:~# systemd-analyze
Startup finished in 4.922s (kernel) + 1.175s (userspace) = 6.097s
root@intel-internal-corei7-64:~# systemd-analyze
Startup finished in 4.990s (kernel) + 1.301s (userspace) = 6.291s
root@intel-internal-corei7-64:~# systemd-analyze
Startup finished in 5.063s (kernel) + 1.336s (userspace) = 6.400s
root@intel-internal-corei7-64:~# systemd-analyze
Startup finished in 4.820s (kernel) + 1.237s (userspace) = 6.057s
avg: kernel 4.9548s, userspace 1.2526s

#with this patch
root@intel-internal-corei7-64:~# systemd-analyze
Startup finished in 5.099s (kernel) + 1.579s (userspace) = 6.679s
root@intel-internal-corei7-64:~# systemd-analyze
Startup finished in 5.003s (kernel) + 1.343s (userspace) = 6.347s
root@intel-internal-corei7-64:~# systemd-analyze
Startup finished in 4.853s (kernel) + 1.220s (userspace) = 6.074s
root@intel-internal-corei7-64:~# systemd-analyze
Startup finished in 4.836s (kernel) + 1.111s (userspace) = 5.948s
root@intel-internal-corei7-64:~# systemd-analyze
Startup finished in 4.917s (kernel) + 1.166s (userspace) = 6.083s
avg: kernel 4.9416s, userspace 1.2838s

The results above show almost no performance regression on the x86
platform: the averages differ by only about 0.03s in userspace time and
about 0.01s in kernel time.

Sorry, I do not have any ARM-based platform at hand, so I cannot provide
the corresponding data. Thanks!

Regards,

Yang

> Thanks,
> --
> Shannon