Re: [OE-core][dunfell 13/14] qemu: add 34Kf-64tlb fictitious cpu type

From: "akuster" <akuster808@gmail.com>
To: Steve Sakoman <steve@sakoman.com>,
	openembedded-core@lists.openembedded.org
Subject: Re: [OE-core][dunfell 13/14] qemu: add 34Kf-64tlb fictitious cpu type
Date: Fri, 9 Oct 2020 12:23:02 -0700
Message-ID: <1d2bfd98-69d7-2f43-7b6b-7ea5c2b2719e@gmail.com>
In-Reply-To: <182869203b57860695e818b620e40b398ef4e921.1602253014.git.steve@sakoman.com>

Can we make an arch change like this in a stable release?

-armin

On 10/9/20 7:18 AM, Steve Sakoman wrote:
> From: Victor Kamensky <kamensky@cisco.com>
>
> In Yocto Project PR 13992 it was reported that qemumips
> in autobuilder runs almost twice as slow as qemumips64 and
> sometimes hits the timeout.
>
> Upon investigation of qemu-system with perf, gdb, and
> SystemTap, and comparing the behavior of the qemumips and
> qemumips64 machines, it was noticed that the qemu soft mmu code
> behaves quite differently: in the qemumips case the tlbwr
> instruction is called 16 times more often. It turns out that in
> the qemumips64 case qemu runs with a cpu type that has 64 TLB
> entries, but in the qemumips case qemu runs with a cpu type that
> has only 16 TLB entries.
>
> The idea of the proposed qemu patch is to introduce a fictitious
> 34Kf-64tlb cpu type that is defined exactly like 34Kf but has
> 64 TLB entries instead of the original 16.
>
> Testing of core-image-full-cmdline:do_testimage with
> 34Kf-64tlb shows an improvement of roughly 40% in real
> test execution time.
>
> Note for future porters of the patch: the easiest way to update
> the patch and stay in sync with the 34Kf definition is to copy
> the 34Kf machine definition and apply the following changes to
> it (just change the CP0C1_MMU bits value from 15 to 63):
>
> [kamensky@coreos-lnx2 qemu]$ diff ~/34Kf.c ~/34Kf-64tlb.c
> 2c2
> <         .name = "34Kf",
> >         .name = "34Kf-64tlb",
> 6c6
> <         .CP0_Config1 = MIPS_CONFIG1 | (1 << CP0C1_FP) | (15 << CP0C1_MMU) |
> >         .CP0_Config1 = MIPS_CONFIG1 | (1 << CP0C1_FP) | (63 << CP0C1_MMU) |
>
> Fixes https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992
>
> Upstream Status: Inappropriate
>
> Signed-off-by: Victor Kamensky <kamensky@cisco.com>
> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
> (cherry picked from commit 4470a04943352224955f17e004962f0f9e1c9b0c)
> Signed-off-by: Steve Sakoman <steve@sakoman.com>
> ---
>  meta/recipes-devtools/qemu/qemu.inc           |   1 +
>  ...tlb-fictitious-cpu-type-like-34Kf-bu.patch | 118 ++++++++++++++++++
>  2 files changed, 119 insertions(+)
>  create mode 100644 meta/recipes-devtools/qemu/qemu/0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch
>
> diff --git a/meta/recipes-devtools/qemu/qemu.inc b/meta/recipes-devtools/qemu/qemu.inc
> index 7ce89c0023..7c21b66a0c 100644
> --- a/meta/recipes-devtools/qemu/qemu.inc
> +++ b/meta/recipes-devtools/qemu/qemu.inc
> @@ -48,6 +48,7 @@ SRC_URI = "https://download.qemu.org/${BPN}-${PV}.tar.xz \
>  	   file://CVE-2020-14364.patch \
>  	   file://CVE-2020-14415.patch \
>  	   file://CVE-2020-16092.patch \
> +	   file://0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch \
>  	   "
>  UPSTREAM_CHECK_REGEX = "qemu-(?P<pver>\d+(\.\d+)+)\.tar"
>  
> diff --git a/meta/recipes-devtools/qemu/qemu/0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch b/meta/recipes-devtools/qemu/qemu/0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch
> new file mode 100644
> index 0000000000..b6312e1543
> --- /dev/null
> +++ b/meta/recipes-devtools/qemu/qemu/0001-mips-add-34Kf-64tlb-fictitious-cpu-type-like-34Kf-bu.patch
> @@ -0,0 +1,118 @@
> +From b3fcc7d96523ad8e3ea28c09d495ef08529d01ce Mon Sep 17 00:00:00 2001
> +From: Victor Kamensky <kamensky@cisco.com>
> +Date: Wed, 7 Oct 2020 10:19:42 -0700
> +Subject: [PATCH] mips: add 34Kf-64tlb fictitious cpu type like 34Kf but with
> + 64 TLBs
> +
> +In Yocto Project CI runs it was observed that a test run
> +of a 32 bit mips image takes almost twice as long as a 64 bit
> +mips image with the same logical load, and CI execution
> +hits the timeout.
> +
> +See https://bugzilla.yoctoproject.org/show_bug.cgi?id=13992
> +
> +The Yocto Project uses the 34Kf cpu type to run the 32 bit mips image,
> +and the MIPS64R2-generic cpu type to run the 64 bit mips64 image.
> +
> +Investigating the differences in qemu behavior between mips
> +and mips64 produced two prominent observations: under
> +logically similar load (same definition and configuration
> +of user-land image), in the mips case the get_physical_address
> +function is called almost twice as often, meaning
> +twice as many memory accesses are involved. Also, the
> +number of tlbwr instructions executed (r4k_helper_tlbwr
> +qemu function) is almost 16 times bigger in the mips case than in
> +mips64.
> +
> +It turns out that the 34Kf cpu has 16 TLBs, but
> +MIPS64R2-generic has 64 TLBs. That explains why
> +so many more tlbwr instructions had to be executed by the kernel TLB refill
> +handler in the 32 bit mips case.
> +
> +The idea of the fix is to come up with a new 34Kf-64tlb fictitious
> +cpu type that behaves exactly like 34Kf but
> +contains 64 TLBs to reduce TLB thrashing. After all, adding
> +more TLBs to soft mmu is easy.
> +
> +An experiment with a significant, non-trivial load in the Yocto
> +environment, running the do_testimage load, shows that the 34Kf-64tlb
> +cpu performs 40% or so better than the original 34Kf cpu with respect to
> +test execution real time.
> +
> +It is not ideal to have a cpu type that does not exist in the
> +wild, but given the performance gains it seems to be justified.
> +
> +Signed-off-by: Victor Kamensky <kamensky@cisco.com>
> +---
> + target/mips/translate_init.inc.c | 55 ++++++++++++++++++++++++++++++++++++++++
> + 1 file changed, 55 insertions(+)
> +
> +diff --git a/target/mips/translate_init.inc.c b/target/mips/translate_init.inc.c
> +index 637caccd89..b73ab48231 100644
> +--- a/target/mips/translate_init.inc.c
> ++++ b/target/mips/translate_init.inc.c
> +@@ -297,6 +297,61 @@ const mips_def_t mips_defs[] =
> +         .insn_flags = CPU_MIPS32R2 | ASE_MIPS16 | ASE_DSP | ASE_MT,
> +         .mmu_type = MMU_TYPE_R4000,
> +     },
> ++    /*
> ++     * Verbatim copy of "34Kf" cpu, only bumped up number of TLB entries
> ++     * from 16 to 64 (see CP0_Config1 value at CP0C1_MMU bits) to improve
> ++     * performance by reducing number of TLB refill exceptions and
> ++     * eliminating need to run all corresponding TLB refill handling
> ++     * instructions.
> ++     */
> ++    {
> ++        .name = "34Kf-64tlb",
> ++        .CP0_PRid = 0x00019500,
> ++        .CP0_Config0 = MIPS_CONFIG0 | (0x1 << CP0C0_AR) |
> ++                       (MMU_TYPE_R4000 << CP0C0_MT),
> ++        .CP0_Config1 = MIPS_CONFIG1 | (1 << CP0C1_FP) | (63 << CP0C1_MMU) |
> ++                       (0 << CP0C1_IS) | (3 << CP0C1_IL) | (1 << CP0C1_IA) |
> ++                       (0 << CP0C1_DS) | (3 << CP0C1_DL) | (1 << CP0C1_DA) |
> ++                       (1 << CP0C1_CA),
> ++        .CP0_Config2 = MIPS_CONFIG2,
> ++        .CP0_Config3 = MIPS_CONFIG3 | (1 << CP0C3_VInt) | (1 << CP0C3_MT) |
> ++                       (1 << CP0C3_DSPP),
> ++        .CP0_LLAddr_rw_bitmask = 0,
> ++        .CP0_LLAddr_shift = 0,
> ++        .SYNCI_Step = 32,
> ++        .CCRes = 2,
> ++        .CP0_Status_rw_bitmask = 0x3778FF1F,
> ++        .CP0_TCStatus_rw_bitmask = (0 << CP0TCSt_TCU3) | (0 << CP0TCSt_TCU2) |
> ++                    (1 << CP0TCSt_TCU1) | (1 << CP0TCSt_TCU0) |
> ++                    (0 << CP0TCSt_TMX) | (1 << CP0TCSt_DT) |
> ++                    (1 << CP0TCSt_DA) | (1 << CP0TCSt_A) |
> ++                    (0x3 << CP0TCSt_TKSU) | (1 << CP0TCSt_IXMT) |
> ++                    (0xff << CP0TCSt_TASID),
> ++        .CP1_fcr0 = (1 << FCR0_F64) | (1 << FCR0_L) | (1 << FCR0_W) |
> ++                    (1 << FCR0_D) | (1 << FCR0_S) | (0x95 << FCR0_PRID),
> ++        .CP1_fcr31 = 0,
> ++        .CP1_fcr31_rw_bitmask = 0xFF83FFFF,
> ++        .CP0_SRSCtl = (0xf << CP0SRSCtl_HSS),
> ++        .CP0_SRSConf0_rw_bitmask = 0x3fffffff,
> ++        .CP0_SRSConf0 = (1U << CP0SRSC0_M) | (0x3fe << CP0SRSC0_SRS3) |
> ++                    (0x3fe << CP0SRSC0_SRS2) | (0x3fe << CP0SRSC0_SRS1),
> ++        .CP0_SRSConf1_rw_bitmask = 0x3fffffff,
> ++        .CP0_SRSConf1 = (1U << CP0SRSC1_M) | (0x3fe << CP0SRSC1_SRS6) |
> ++                    (0x3fe << CP0SRSC1_SRS5) | (0x3fe << CP0SRSC1_SRS4),
> ++        .CP0_SRSConf2_rw_bitmask = 0x3fffffff,
> ++        .CP0_SRSConf2 = (1U << CP0SRSC2_M) | (0x3fe << CP0SRSC2_SRS9) |
> ++                    (0x3fe << CP0SRSC2_SRS8) | (0x3fe << CP0SRSC2_SRS7),
> ++        .CP0_SRSConf3_rw_bitmask = 0x3fffffff,
> ++        .CP0_SRSConf3 = (1U << CP0SRSC3_M) | (0x3fe << CP0SRSC3_SRS12) |
> ++                    (0x3fe << CP0SRSC3_SRS11) | (0x3fe << CP0SRSC3_SRS10),
> ++        .CP0_SRSConf4_rw_bitmask = 0x3fffffff,
> ++        .CP0_SRSConf4 = (0x3fe << CP0SRSC4_SRS15) |
> ++                    (0x3fe << CP0SRSC4_SRS14) | (0x3fe << CP0SRSC4_SRS13),
> ++        .SEGBITS = 32,
> ++        .PABITS = 32,
> ++        .insn_flags = CPU_MIPS32R2 | ASE_MIPS16 | ASE_DSP | ASE_MT,
> ++        .mmu_type = MMU_TYPE_R4000,
> ++    },
> +     {
> +         .name = "74Kf",
> +         .CP0_PRid = 0x00019700,
> +-- 
> +2.14.5
> +
>
> 
>
