From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CC377C433E2 for ; Fri, 17 Jul 2020 14:28:28 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 973032070E for ; Fri, 17 Jul 2020 14:28:28 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=linaro.org header.i=@linaro.org header.b="SJOMoL6w" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 973032070E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linaro.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:41900 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jwRLf-0002Qg-TV for qemu-devel@archiver.kernel.org; Fri, 17 Jul 2020 10:28:27 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:52814) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jwRL1-0001z8-Bv for qemu-devel@nongnu.org; Fri, 17 Jul 2020 10:27:47 -0400 Received: from mail-wr1-x429.google.com ([2a00:1450:4864:20::429]:44287) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jwRKz-0001Ol-4j for qemu-devel@nongnu.org; Fri, 17 Jul 2020 10:27:47 -0400 Received: by mail-wr1-x429.google.com with SMTP id b6so11330114wrs.11 for ; Fri, 17 Jul 2020 07:27:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=references:user-agent:from:to:cc:subject:in-reply-to:date :message-id:mime-version:content-transfer-encoding; bh=8SOgN09JSMSPjx+VUbm82cpMSQRYwxa/6o23S/Dkg78=; b=SJOMoL6wZ9Sagf8n1Br87ryC5mxdgLUMv+ENUVuNZXyAltKpXTnrVxWwIQ8B671Jr3 0fWSoRnFs1O4/aJdFBSWd/xccz2n3dEiLvxBYQImc9hTP67YjO2T11Kf5U9VW5a/EJhP hWoPNoOTp74i/MnVE319D3AK34ET7JD0tFsAWb4X4x87IPX+lQfwngJ9VnWaZlwWXfxU gfuWcQiuSEviJPG1AwRQHsfRTm29x7OHrxJK2tU+LSVdEM4AKyynqycIbZMLgI0hGoWN 80+es8n3KOHae8cfDS2U0a049gFhc3BqGe7QHNwXeLqJSH6vT0HN5pJpAk8Iba57C2mp gkAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:references:user-agent:from:to:cc:subject :in-reply-to:date:message-id:mime-version:content-transfer-encoding; bh=8SOgN09JSMSPjx+VUbm82cpMSQRYwxa/6o23S/Dkg78=; b=BYwGBKgKWRA95zI2gGbGgeVNECRnleacMdDK1xb3cP83ryXWpdlj46vA6Fctdg2whb CC9VA0dJIaMfrQJ38fFO4k0DIHnO+C+JM6RYHQSVn8YjIn6+j2++nLGF7hWR9MAkXMeM rphQQidLnED79rWfm80jxCCJSZmNDYtMQ/QjHRaZfBNMcw3rbqYZr0z3jE8af3uUTG0d flZrJdjosonYFjO52HdgVyc+El85S65pdmU9q1NCGiNtsNL/aPWLCQXlrE27bYgO15av dWdK8zw+oj2Md8yUS2/Jwc4kQO1/6ZSpPYVQxPBOOdZv6McL2bAGZrY2HWgSKN6xDgKY MuRw== X-Gm-Message-State: AOAM530du3E9d0qmRuU7kktnSsU5dI/TRpPOXCTNbLRjgkW5012dnyJk cCZH7Ea8jUUAqLDB/9L7rHuUJw== X-Google-Smtp-Source: ABdhPJxpuAu2MdWqGUn/HVzJAddGk2aZ8PDT5//CG9R5waxbj41h1paszzs8aywkMJXsaNtTVTpGOA== X-Received: by 2002:adf:82f5:: with SMTP id 108mr10538480wrc.218.1594996063008; Fri, 17 Jul 2020 07:27:43 -0700 (PDT) Received: from zen.linaroharston ([51.148.130.216]) by smtp.gmail.com with ESMTPSA id t14sm2825645wrv.14.2020.07.17.07.27.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 17 Jul 2020 07:27:42 -0700 (PDT) Received: from zen (localhost [127.0.0.1]) by zen.linaroharston (Postfix) with ESMTP id 3CE0C1FF7E; Fri, 17 Jul 2020 15:27:41 +0100 (BST) References: <871rlbwhlp.fsf@linaro.org> User-agent: mu4e 1.5.4; emacs 28.0.50 From: Alex =?utf-8?Q?Benn=C3=A9e?= To: Christian Ehrhardt Subject: Re: TB Cache size grows out of control with qemu 5.0 In-reply-to: Date: Fri, 17 Jul 2020 15:27:41 +0100 Message-ID: <87k0z2usgy.fsf@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Received-SPF: pass client-ip=2a00:1450:4864:20::429; envelope-from=alex.bennee@linaro.org; helo=mail-wr1-x429.google.com X-detected-operating-system: by eggs.gnu.org: No matching host in p0f cache. That's all we know. X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Paolo Bonzini , Richard Henderson , qemu-devel Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" Christian Ehrhardt writes: > On Thu, Jul 16, 2020 at 6:27 PM Alex Benn=C3=A9e = wrote: > >> >> Christian Ehrhardt writes: >> >> > On Wed, Jul 15, 2020 at 5:58 PM BALATON Zoltan >> wrote: >> > >> >> See commit 47a2def4533a2807e48954abd50b32ecb1aaf29a and the next two >> >> following it. >> >> >> > >> > Thank you Zoltan for pointing out this commit, I agree that this seems >> to be >> > the trigger for the issues I'm seeing. Unfortunately the common CI host >> size >> > is 1-2G. For example on Ubuntu Autopkgtests 1.5G. >> > Those of them running guests do so in 0.5-1G size in TCG mode >> > (as they often can't rely on having KVM available). >> > >> > The 1G TB buffer + 0.5G actual guest size + lack of dynamic downsizing >> > on memory pressure (never existed) makes these systems go OOM-Killing >> > the qemu process. >> >> Ooops. I admit the assumption was that most people running system >> emulation would be doing it on beefier machines. >> >> > The patches indicated that the TB flushes on a full guest boot are a >> > good indicator of the TB size efficiency. From my old checks I had: >> > >> > - Qemu 4.2 512M guest with 32M default overwritten by ram-size/4 >> > TB flush count 14, 14, 16 >> > - Qemu 5.0 512M guest with 1G default >> > TB flush count 1, 1, 1 >> > >> > I agree that ram/4 seems odd, especially on huge guests that is a lot >> > potentially wasted. And most environments have a bit of breathing >> > room 1G is too big in small host systems and the common CI system falls >> > into this category. So I tuned it down to 256M for a test. >> > >> > - Qemu 4.2 512M guest with tb-size 256M >> > TB flush count 5, 5, 5 >> > - Qemu 5.0 512M guest with tb-size 256M >> > TB flush count 5, 5, 5 >> > - Qemu 5.0 512M guest with 256M default in code >> > TB flush count 5, 5, 5 >> > >> > So performance wise the results are as much in-between as you'd think >> from a >> > TB size in between. And the memory consumption which (for me) is the >> actual >> > current issue to fix would be back in line again as expected. >> >> So I'm glad you have the workaround. >> >> > >> > So on one hand I'm suggesting something like: >> > --- a/accel/tcg/translate-all.c >> > +++ b/accel/tcg/translate-all.c >> > @@ -944,7 +944,7 @@ static void page_lock_pair(PageDesc **re >> > * Users running large scale system emulation may want to tweak their >> > * runtime setup via the tb-size control on the command line. >> > */ >> > -#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (1 * GiB) >> > +#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (256 * MiB) >> >> The problem we have is any number we pick here is arbitrary. And while >> it did regress your use-case changing it again just pushes a performance >> regression onto someone else. > > > Thanks for your feedback Alex! > > That is true "for you" since 5.0 is released from upstreams POV. > But from the downstreams POV no 5.0 exists for Ubuntu yet and I'd break > many places releasing it like that. > Sadly the performance gain to the other cases will most likely go unnotic= ed. > > >> The most (*) 64 bit desktop PCs have 16Gb >> of RAM, almost all have more than 8gb. And there is a workaround. >> > > Due to our work around virtualization the values representing > "most 64 bit desktop PCs" aren't the only thing that matters :-) > > ... > > >> > This is a bit more tricky than it seems as ram_sizes is no more >> > present in that context but it is enough to discuss it. >> > That should serve all cases - small and large - better as a pure >> > static default of 1G or always ram/4? >> >> I'm definitely against re-introducing ram_size into the mix. The >> original commit (a1b18df9a4) that broke this introduced an ordering >> dependency which we don't want to bring back. >> > > I agree with that reasoning, but currently without any size dependency > the "arbitrary value" we picked to be 1G is even more fixed than it was > before. > Compared to pre v5.0 for now I can only decide to > a) tune it down -> performance impact for huge guests > b) keep it at 1G -> functional breakage with small hosts > > I'd be more amenable to something that took into account host memory and >> limited the default if it was smaller than a threshold. Is there a way >> to probe that that doesn't involve slurping /proc/meminfo? >> > > I agree that a host-size dependency might be the better way to go, > yet I have no great cross-platform resilient way to get that. > Maybe we can make it like "if I can get some value consider it, > otherwise use the current default". > That would improve many places already, while keeping the rest at the > current behavior. > > >> > >> > P.S. I added Alex being the Author of the offending patch and >> Richard/Paolo >> > for being listed in the Maintainers file for TCG. >> >> >> > From Zoltan (unifying the thread a bit): > >> I agree that this should be dependent on host memory size not guest >> ram_size but it might be tricky to get that value because different host >> OSes would need different ways. > > Well - where it isn't available we will continue to take the default > qemu 5.0 already had. If required on other platforms as well they can add > their way of host memory detection into this as needed. > >> Maybe a new qemu_host_mem_size portability >> function will be needed that implements this for different host OSes. >> POSIX may or may not have sysconf _SC_PHYS_PAGES and _SC_AVPHYS_PAGES > > We should not try to get into the business of _SC_AVPHYS_PAGES and > try to understand/assume what might be cache or otherwise (re)usable. > Since we only look for some alignment to hosts size _SC_PHYS_PAGES should > be good enough and available in more places than the other options. > >> and linux has sysinfo but don't know how reliable these are. > > sysconf is slightly more widely available than sysinfo and has enough for > what we need. > > > I have combined the thoughts above into a patch and it works well in > my tests. > > 32G Host: > pages 8187304.000000 > pagesize 4096.000000 > max_default 4191899648 > final tb_size 1073741824 > > (qemu) info jit > Translation buffer state: > gen code size 210425059/1073736659 > TB count 368273 > TB avg target size 20 max=3D1992 bytes > TB avg host size 330 bytes (expansion ratio: 16.1) > cross page TB count 1656 (0%) > direct jump count 249813 (67%) (2 jumps=3D182112 49%) > TB hash buckets 197613/262144 (75.38% head buckets used) > TB hash occupancy 34.15% avg chain occ. Histogram: [0,10)%|=E2=96=86 = =E2=96=88 > =E2=96=85=E2=96=81=E2=96=83=E2=96=81=E2=96=81|[90,100]% > TB hash avg chain 1.020 buckets. Histogram: 1|=E2=96=88=E2=96=81=E2=96= =81|3 > > Statistics: > TB flush count 1 > TB invalidate count 451673 > TLB full flushes 0 > TLB partial flushes 154819 > TLB elided flushes 191627 > [TCG profiler not compiled] > > =3D> 1G TB size not changed compared to v5.0 - as intended > > > But on a small 1.5G Host it now works without OOM: > > pages 379667.000000 > pagesize 4096.000000 > max_default 194389504 > final tb_size 194389504 > > (qemu) info jit > Translation buffer state: > gen code size 86179731/194382803 > TB count 149995 > TB avg target size 20 max=3D1992 bytes > TB avg host size 333 bytes (expansion ratio: 16.5) > cross page TB count 716 (0%) > direct jump count 98854 (65%) (2 jumps=3D74962 49%) > TB hash buckets 58842/65536 (89.79% head buckets used) > TB hash occupancy 51.46% avg chain occ. Histogram: [0,10)%|=E2=96=83 = =E2=96=87 > =E2=96=88=E2=96=82=E2=96=86=E2=96=81=E2=96=84|[90,100]% > TB hash avg chain 1.091 buckets. Histogram: 1|=E2=96=88=E2=96=81=E2=96= =81|3 > > Statistics: > TB flush count 10 > TB invalidate count 31733 > TLB full flushes 0 > TLB partial flushes 180891 > TLB elided flushes 244107 > [TCG profiler not compiled] > > =3D> ~185M which is way more reasonable given the host size > > The patch will have a rather large comment in it, I'm not sure if the full > comment is needed, but I wanted to leave a trace what&why is going > on in the code for the next one who comes by. > > Submitting as a proper patch to the list in a bit ... Ahh did you not get CC'ed by my patch this morning? --=20 Alex Benn=C3=A9e