From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from zen.linaroharston ([51.148.130.216]) by smtp.gmail.com with ESMTPSA id r1sm7343933wrx.11.2020.02.27.04.19.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 27 Feb 2020 04:19:48 -0800 (PST) Received: from zen (localhost [127.0.0.1]) by zen.linaroharston (Postfix) with ESMTP id A5F5F1FF87; Thu, 27 Feb 2020 12:19:43 +0000 (GMT) References: <20200226181020.19592-1-alex.bennee@linaro.org> <20200226181020.19592-5-alex.bennee@linaro.org> User-agent: mu4e 1.3.8; emacs 27.0.60 From: Alex =?utf-8?Q?Benn=C3=A9e?= To: Niek Linnenbank Cc: QEMU Developers , Paolo Bonzini , qemu-arm , Richard Henderson Subject: Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit In-reply-to: Date: Thu, 27 Feb 2020 12:19:43 +0000 Message-ID: <87tv3cw7c0.fsf@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-TUID: NWSrvk65TOkM Niek Linnenbank writes: > Hi Alex, > > On Wed, Feb 26, 2020 at 7:13 PM Alex Benn=C3=A9e = wrote: > >> While 32mb is certainly usable a full system boot ends up flushing the >> codegen buffer nearly 100 times. Increase the default on 64 bit hosts >> to take advantage of all that spare memory. After this change I can >> boot my tests system without any TB flushes. >> > > That great, with this change I'm seeing a performance improvement when > running the avocado tests for cubieboard. > It runs about 4-5 seconds faster. My host is Ubuntu 18.04 on 64-bit. > > I don't know much about the internals of TCG nor how it actually uses the > cache, > but it seems logical to me that increasing the cache size would improve > performance. > > What I'm wondering is: will this also result in TCG translating larger > chunks in one shot, so potentially > taking more time to do the translation? If so, could it perhaps affect mo= re > latency sensitive code? No - the size of the translation blocks is governed by the guest code and where it ends a basic block. In system mode we also care about crossing guest page boundaries. >> Signed-off-by: Alex Benn=C3=A9e >> > Tested-by: Niek Linnenbank > > >> --- >> accel/tcg/translate-all.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c >> index 4ce5d1b3931..f7baa512059 100644 >> --- a/accel/tcg/translate-all.c >> +++ b/accel/tcg/translate-all.c >> @@ -929,7 +929,11 @@ static void page_lock_pair(PageDesc **ret_p1, >> tb_page_addr_t phys1, >> # define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1) >> #endif >> >> +#if TCG_TARGET_REG_BITS =3D=3D 32 >> #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB) >> +#else >> +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (2 * GiB) >> +#endif >> > > The qemu process now takes up more virtual memory, about ~2.5GiB in my > test, which can be expected with this change. > > Is it very likely that the TCG cache will be filled quickly and completel= y? > I'm asking because I also use Qemu to do automated testing > where the nodes are 64-bit but each have only 2GiB physical RAM. Well so this is the interesting question and as ever it depends. For system emulation the buffer will just slowly fill-up over time until exhausted and which point it will flush and reset. Each time the guest needs to flush a page and load fresh code in we will generate more translated code. If the guest isn't under load and never uses all it's RAM for code then in theory the pages of the mmap that are never filled never need to be actualised by the host kernel. You can view the behaviour by running "info jit" from the HMP monitor in your tests. The "TB Flush" value shows the number of times this has happened along with other information about translation state. > > Regards, > Niek > > >> >> #define DEFAULT_CODE_GEN_BUFFER_SIZE \ >> (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \ >> -- >> 2.20.1 >> >> >> --=20 Alex Benn=C3=A9e From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.6 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 171F5C4BA24 for ; Thu, 27 Feb 2020 12:20:43 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D377E24699 for ; Thu, 27 Feb 2020 12:20:42 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="LNJZHVt9" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D377E24699 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:58412 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j7I9i-0003BV-2Q for qemu-devel@archiver.kernel.org; Thu, 27 Feb 2020 07:20:42 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:33243) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1j7I8u-0002FL-Lu for qemu-devel@nongnu.org; Thu, 27 Feb 2020 07:19:54 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1j7I8t-0003fx-DP for qemu-devel@nongnu.org; Thu, 27 Feb 2020 07:19:52 -0500 Received: from mail-wm1-x341.google.com ([2a00:1450:4864:20::341]:35084) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1j7I8t-0003eV-5h for qemu-devel@nongnu.org; Thu, 27 Feb 2020 07:19:51 -0500 Received: by mail-wm1-x341.google.com with SMTP id m3so3183786wmi.0 for ; Thu, 27 Feb 2020 04:19:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=references:user-agent:from:to:cc:subject:in-reply-to:date :message-id:mime-version:content-transfer-encoding; bh=vLmvs05i9+uOUgE5FPuOrMEdPV3Xgh5AZzSByUzDhmQ=; b=LNJZHVt9b5TesS8YCPdneUr9nB0BUN8h88K7tPUbO4uX/kQ41PYuTmC7wxXbQOnQ3d jxqKtwuvm7cyubTV9u9ROu+mnFItYTi5iLEwlBvFS0AtgQBKbWTidyWlLvMMjhdkvOtP k1zbPsK3k9JpyAphthGMtkh6ca+9PqSIBeP3pcTGolCOYRS4DA4cXkX8WDhLjGFBDmbl LemU9ndAneKW7iGtaontWItvWuWVJQXee55RzlO1P6gbgu+Ds7l53OYgnrF2qGCe60rP qnmtLaLXTjeHEESdx+8IspiuDO98vnoN2alzUKZWk29sDNN+6iPn96bk0sQK309k8omi 6DCA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:references:user-agent:from:to:cc:subject :in-reply-to:date:message-id:mime-version:content-transfer-encoding; bh=vLmvs05i9+uOUgE5FPuOrMEdPV3Xgh5AZzSByUzDhmQ=; b=ilxUFOn8odzHe9JTEwiUcW7NYikO4hcQVE7VcbJ+AXdDg63r8rvyaEllmdmRdTe394 uYq48BmDuOs6bJ7oO2qbw+0+noLzODEzYIY9OeMsmvor4ARW1Kgo+5QR46ziNzVwFxMT 71At0qlV0dp/PDioheIaNWnxUV04lxQgihYEd4AfLAyiAtfUNvO0yOSRrJPIYNymQ8pS F0XAM4KLKMHqvdkGMUHtTZ9DtwlOVmTWczt4uSUwNYb2Ave0a2Y81NWj9YgngQxJUn5n SELfJ+58toJPDqPkltvtxsUi1aQG96nSo5pxXFi3lBaOUkhpOUg7ByUPEaG8IYznQgFL hZnw== X-Gm-Message-State: APjAAAUd4Zp8lUIMExfw59293Ru5IGzNVAn9yrpfy4G1HI4kvwztJ9Tc Ew/Mt0IcVIE8ORZhFpeoGB+rMA== X-Google-Smtp-Source: APXvYqxJXzusppW8Hqi39hdC9RIco49kt4MxxYnjio203tbtoyGiqFE+fAWeWA2PVkxz+wl3wsLqew== X-Received: by 2002:a1c:9d43:: with SMTP id g64mr4889109wme.62.1582805989756; Thu, 27 Feb 2020 04:19:49 -0800 (PST) Received: from zen.linaroharston ([51.148.130.216]) by smtp.gmail.com with ESMTPSA id r1sm7343933wrx.11.2020.02.27.04.19.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 27 Feb 2020 04:19:48 -0800 (PST) Received: from zen (localhost [127.0.0.1]) by zen.linaroharston (Postfix) with ESMTP id A5F5F1FF87; Thu, 27 Feb 2020 12:19:43 +0000 (GMT) References: <20200226181020.19592-1-alex.bennee@linaro.org> <20200226181020.19592-5-alex.bennee@linaro.org> User-agent: mu4e 1.3.8; emacs 27.0.60 From: Alex =?utf-8?Q?Benn=C3=A9e?= To: Niek Linnenbank Subject: Re: [PATCH v1 4/4] accel/tcg: increase default code gen buffer size for 64 bit In-reply-to: Date: Thu, 27 Feb 2020 12:19:43 +0000 Message-ID: <87tv3cw7c0.fsf@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 2a00:1450:4864:20::341 X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Paolo Bonzini , qemu-arm , QEMU Developers , Richard Henderson Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Niek Linnenbank writes: > Hi Alex, > > On Wed, Feb 26, 2020 at 7:13 PM Alex Benn=C3=A9e = wrote: > >> While 32mb is certainly usable a full system boot ends up flushing the >> codegen buffer nearly 100 times. Increase the default on 64 bit hosts >> to take advantage of all that spare memory. After this change I can >> boot my tests system without any TB flushes. >> > > That great, with this change I'm seeing a performance improvement when > running the avocado tests for cubieboard. > It runs about 4-5 seconds faster. My host is Ubuntu 18.04 on 64-bit. > > I don't know much about the internals of TCG nor how it actually uses the > cache, > but it seems logical to me that increasing the cache size would improve > performance. > > What I'm wondering is: will this also result in TCG translating larger > chunks in one shot, so potentially > taking more time to do the translation? If so, could it perhaps affect mo= re > latency sensitive code? No - the size of the translation blocks is governed by the guest code and where it ends a basic block. In system mode we also care about crossing guest page boundaries. >> Signed-off-by: Alex Benn=C3=A9e >> > Tested-by: Niek Linnenbank > > >> --- >> accel/tcg/translate-all.c | 4 ++++ >> 1 file changed, 4 insertions(+) >> >> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c >> index 4ce5d1b3931..f7baa512059 100644 >> --- a/accel/tcg/translate-all.c >> +++ b/accel/tcg/translate-all.c >> @@ -929,7 +929,11 @@ static void page_lock_pair(PageDesc **ret_p1, >> tb_page_addr_t phys1, >> # define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1) >> #endif >> >> +#if TCG_TARGET_REG_BITS =3D=3D 32 >> #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB) >> +#else >> +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (2 * GiB) >> +#endif >> > > The qemu process now takes up more virtual memory, about ~2.5GiB in my > test, which can be expected with this change. > > Is it very likely that the TCG cache will be filled quickly and completel= y? > I'm asking because I also use Qemu to do automated testing > where the nodes are 64-bit but each have only 2GiB physical RAM. Well so this is the interesting question and as ever it depends. For system emulation the buffer will just slowly fill-up over time until exhausted and which point it will flush and reset. Each time the guest needs to flush a page and load fresh code in we will generate more translated code. If the guest isn't under load and never uses all it's RAM for code then in theory the pages of the mmap that are never filled never need to be actualised by the host kernel. You can view the behaviour by running "info jit" from the HMP monitor in your tests. The "TB Flush" value shows the number of times this has happened along with other information about translation state. > > Regards, > Niek > > >> >> #define DEFAULT_CODE_GEN_BUFFER_SIZE \ >> (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \ >> -- >> 2.20.1 >> >> >> --=20 Alex Benn=C3=A9e