Subject: Patch "powerpc/64s: Improve RFI L1-D cache flush fallback" has been added to the 4.4-stable tree
To: christophe.leroy@c-s.fr, diana.craciun@nxp.com, gregkh@linuxfoundation.org, linuxppc-dev@ozlabs.org, mpe@ellerman.id.au, msuchanek@suse.de, npiggin@gmail.com
Date: Mon, 29 Apr 2019 11:51:23 +0200
In-Reply-To: <20190421142037.21881-3-mpe@ellerman.id.au>
Message-ID: <155653148322018@kroah.com>
Cc: stable-commits@vger.kernel.org


This is a note to let you know that I've just added the patch titled

    powerpc/64s: Improve RFI L1-D cache flush fallback

to the 4.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     powerpc-64s-improve-rfi-l1-d-cache-flush-fallback.patch
and it can be found in the queue-4.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let stable@vger.kernel.org know about it.


From foo@baz Mon 29 Apr 2019 11:38:37 AM CEST
From: Michael Ellerman
Date: Mon, 22 Apr 2019 00:19:47 +1000
Subject: powerpc/64s: Improve RFI L1-D cache flush fallback
To: stable@vger.kernel.org, gregkh@linuxfoundation.org
Cc: linuxppc-dev@ozlabs.org, diana.craciun@nxp.com, msuchanek@suse.de, npiggin@gmail.com, christophe.leroy@c-s.fr
Message-ID: <20190421142037.21881-3-mpe@ellerman.id.au>

From: Nicholas Piggin

commit bdcb1aefc5b3f7d0f1dc8b02673602bca2ff7a4b upstream.

The fallback RFI flush is used when firmware does not provide a way
to flush the cache. It's a "displacement flush" that evicts useful
data by displacing it with an uninteresting buffer.

The flush has to take care to work with implementation specific cache
replacement policies, so the recipe has been in flux. The initial
slow but conservative approach is to touch all lines of a congruence
class, with dependencies between each load.
It has since been determined that a linear pattern of loads without
dependencies is sufficient, and is significantly faster.

Measuring the speed of a null syscall with RFI fallback flush enabled
gives the relative improvement:

P8 - 1.83x
P9 - 1.75x

The flush also becomes simpler and more adaptable to different cache
geometries.

Signed-off-by: Nicholas Piggin
Signed-off-by: Michael Ellerman
Signed-off-by: Greg Kroah-Hartman
---
 arch/powerpc/include/asm/paca.h      |    3 -
 arch/powerpc/kernel/asm-offsets.c    |    3 -
 arch/powerpc/kernel/exceptions-64s.S |   76 ++++++++++++++++-------------------
 arch/powerpc/kernel/setup_64.c       |   13 -----
 arch/powerpc/xmon/xmon.c             |    2
 5 files changed, 39 insertions(+), 58 deletions(-)

--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -199,8 +199,7 @@ struct paca_struct {
 	 */
 	u64 exrfi[13] __aligned(0x80);
 	void *rfi_flush_fallback_area;
-	u64 l1d_flush_congruence;
-	u64 l1d_flush_sets;
+	u64 l1d_flush_size;
 #endif
 };

--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -245,8 +245,7 @@ int main(void)
 	DEFINE(PACA_IN_MCE, offsetof(struct paca_struct, in_mce));
 	DEFINE(PACA_RFI_FLUSH_FALLBACK_AREA, offsetof(struct paca_struct, rfi_flush_fallback_area));
 	DEFINE(PACA_EXRFI, offsetof(struct paca_struct, exrfi));
-	DEFINE(PACA_L1D_FLUSH_CONGRUENCE, offsetof(struct paca_struct, l1d_flush_congruence));
-	DEFINE(PACA_L1D_FLUSH_SETS, offsetof(struct paca_struct, l1d_flush_sets));
+	DEFINE(PACA_L1D_FLUSH_SIZE, offsetof(struct paca_struct, l1d_flush_size));
 #endif
 	DEFINE(PACAHWCPUID, offsetof(struct paca_struct, hw_cpu_id));
 	DEFINE(PACAKEXECSTATE, offsetof(struct paca_struct, kexec_state));
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1571,39 +1571,37 @@ rfi_flush_fallback:
 	std	r9,PACA_EXRFI+EX_R9(r13)
 	std	r10,PACA_EXRFI+EX_R10(r13)
 	std	r11,PACA_EXRFI+EX_R11(r13)
-	std	r12,PACA_EXRFI+EX_R12(r13)
-	std	r8,PACA_EXRFI+EX_R13(r13)
 	mfctr	r9
 	ld	r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13)
-	ld	r11,PACA_L1D_FLUSH_SETS(r13)
-	ld	r12,PACA_L1D_FLUSH_CONGRUENCE(r13)
-	/*
-	 * The load adresses are at staggered offsets within cachelines,
-	 * which suits some pipelines better (on others it should not
-	 * hurt).
-	 */
-	addi	r12,r12,8
+	ld	r11,PACA_L1D_FLUSH_SIZE(r13)
+	srdi	r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */
 	mtctr	r11
 	DCBT_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */

 	/* order ld/st prior to dcbt stop all streams with flushing */
 	sync
-1:	li	r8,0
-	.rept	8 /* 8-way set associative */
-	ldx	r11,r10,r8
-	add	r8,r8,r12
-	xor	r11,r11,r11	// Ensure r11 is 0 even if fallback area is not
-	add	r8,r8,r11	// Add 0, this creates a dependency on the ldx
-	.endr
-	addi	r10,r10,128 /* 128 byte cache line */
+
+	/*
+	 * The load adresses are at staggered offsets within cachelines,
+	 * which suits some pipelines better (on others it should not
+	 * hurt).
+	 */
+1:
+	ld	r11,(0x80 + 8)*0(r10)
+	ld	r11,(0x80 + 8)*1(r10)
+	ld	r11,(0x80 + 8)*2(r10)
+	ld	r11,(0x80 + 8)*3(r10)
+	ld	r11,(0x80 + 8)*4(r10)
+	ld	r11,(0x80 + 8)*5(r10)
+	ld	r11,(0x80 + 8)*6(r10)
+	ld	r11,(0x80 + 8)*7(r10)
+	addi	r10,r10,0x80*8
 	bdnz	1b

 	mtctr	r9
 	ld	r9,PACA_EXRFI+EX_R9(r13)
 	ld	r10,PACA_EXRFI+EX_R10(r13)
 	ld	r11,PACA_EXRFI+EX_R11(r13)
-	ld	r12,PACA_EXRFI+EX_R12(r13)
-	ld	r8,PACA_EXRFI+EX_R13(r13)
 	GET_SCRATCH0(r13);
 	rfid

@@ -1614,39 +1612,37 @@ hrfi_flush_fallback:
 	std	r9,PACA_EXRFI+EX_R9(r13)
 	std	r10,PACA_EXRFI+EX_R10(r13)
 	std	r11,PACA_EXRFI+EX_R11(r13)
-	std	r12,PACA_EXRFI+EX_R12(r13)
-	std	r8,PACA_EXRFI+EX_R13(r13)
 	mfctr	r9
 	ld	r10,PACA_RFI_FLUSH_FALLBACK_AREA(r13)
-	ld	r11,PACA_L1D_FLUSH_SETS(r13)
-	ld	r12,PACA_L1D_FLUSH_CONGRUENCE(r13)
-	/*
-	 * The load adresses are at staggered offsets within cachelines,
-	 * which suits some pipelines better (on others it should not
-	 * hurt).
-	 */
-	addi	r12,r12,8
+	ld	r11,PACA_L1D_FLUSH_SIZE(r13)
+	srdi	r11,r11,(7 + 3) /* 128 byte lines, unrolled 8x */
 	mtctr	r11
 	DCBT_STOP_ALL_STREAM_IDS(r11) /* Stop prefetch streams */

 	/* order ld/st prior to dcbt stop all streams with flushing */
 	sync
-1:	li	r8,0
-	.rept	8 /* 8-way set associative */
-	ldx	r11,r10,r8
-	add	r8,r8,r12
-	xor	r11,r11,r11	// Ensure r11 is 0 even if fallback area is not
-	add	r8,r8,r11	// Add 0, this creates a dependency on the ldx
-	.endr
-	addi	r10,r10,128 /* 128 byte cache line */
+
+	/*
+	 * The load adresses are at staggered offsets within cachelines,
+	 * which suits some pipelines better (on others it should not
+	 * hurt).
+	 */
+1:
+	ld	r11,(0x80 + 8)*0(r10)
+	ld	r11,(0x80 + 8)*1(r10)
+	ld	r11,(0x80 + 8)*2(r10)
+	ld	r11,(0x80 + 8)*3(r10)
+	ld	r11,(0x80 + 8)*4(r10)
+	ld	r11,(0x80 + 8)*5(r10)
+	ld	r11,(0x80 + 8)*6(r10)
+	ld	r11,(0x80 + 8)*7(r10)
+	addi	r10,r10,0x80*8
 	bdnz	1b

 	mtctr	r9
 	ld	r9,PACA_EXRFI+EX_R9(r13)
 	ld	r10,PACA_EXRFI+EX_R10(r13)
 	ld	r11,PACA_EXRFI+EX_R11(r13)
-	ld	r12,PACA_EXRFI+EX_R12(r13)
-	ld	r8,PACA_EXRFI+EX_R13(r13)
 	GET_SCRATCH0(r13);
 	hrfid

--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -902,19 +902,8 @@ static void init_fallback_flush(void)
 	memset(l1d_flush_fallback_area, 0, l1d_size * 2);

 	for_each_possible_cpu(cpu) {
-		/*
-		 * The fallback flush is currently coded for 8-way
-		 * associativity. Different associativity is possible, but it
-		 * will be treated as 8-way and may not evict the lines as
-		 * effectively.
-		 *
-		 * 128 byte lines are mandatory.
-		 */
-		u64 c = l1d_size / 8;
-
 		paca[cpu].rfi_flush_fallback_area = l1d_flush_fallback_area;
-		paca[cpu].l1d_flush_congruence = c;
-		paca[cpu].l1d_flush_sets = c / 128;
+		paca[cpu].l1d_flush_size = l1d_size;
 	}
 }

--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -2146,8 +2146,6 @@ static void dump_one_paca(int cpu)
 		printf(" slb_cache[%d]: = 0x%016lx\n", i, p->slb_cache[i]);

 	DUMP(p, rfi_flush_fallback_area, "px");
-	DUMP(p, l1d_flush_congruence, "llx");
-	DUMP(p, l1d_flush_sets, "llx");
 #endif
 	DUMP(p, dscr_default, "llx");
 #ifdef CONFIG_PPC_BOOK3E


Patches currently in stable-queue which might be from mpe@ellerman.id.au are

queue-4.4/powerpc-64s-add-support-for-a-store-forwarding-barrier-at-kernel-entry-exit.patch
queue-4.4/powerpc-64-make-stf-barrier-ppc_book3s_64-specific.patch
queue-4.4/powerpc-pseries-set-or-clear-security-feature-flags.patch
queue-4.4/powerpc-fsl-fix-spectre_v2-mitigations-reporting.patch
queue-4.4/powerpc-64s-patch-barrier_nospec-in-modules.patch
queue-4.4/powerpc-pseries-support-firmware-disable-of-rfi-flush.patch
queue-4.4/powerpc-rfi-flush-call-setup_rfi_flush-after-lpm-migration.patch
queue-4.4/powerpc-pseries-query-hypervisor-for-count-cache-flush-settings.patch
queue-4.4/powerpc-powernv-set-or-clear-security-feature-flags.patch
queue-4.4/powerpc-64s-add-support-for-software-count-cache-flush.patch
queue-4.4/powerpc64s-show-ori31-availability-in-spectre_v1-sysfs-file-not-v2.patch
queue-4.4/powerpc-fsl-flush-the-branch-predictor-at-each-kernel-entry-64bit.patch
queue-4.4/powerpc-fsl-update-spectre-v2-reporting.patch
queue-4.4/powerpc-64s-wire-up-cpu_show_spectre_v2.patch
queue-4.4/powerpc-64-make-meltdown-reporting-book3s-64-specific.patch
queue-4.4/powerpc-rfi-flush-make-it-possible-to-call-setup_rfi_flush-again.patch
queue-4.4/powerpc-64s-add-support-for-ori-barrier_nospec-patching.patch
queue-4.4/powerpc-use-barrier_nospec-in-copy_from_user.patch
queue-4.4/powerpc-64s-fix-section-mismatch-warnings-from-setup_rfi_flush.patch
queue-4.4/powerpc-avoid-code-patching-freed-init-sections.patch
queue-4.4/powerpc-fsl-add-macro-to-flush-the-branch-predictor.patch
queue-4.4/powerpc-xmon-add-rfi-flush-related-fields-to-paca-dump.patch
queue-4.4/powerpc-fsl-add-barrier_nospec-implementation-for-nxp-powerpc-book3e.patch
queue-4.4/powerpc-security-fix-spectre_v2-reporting.patch
queue-4.4/powerpc-add-security-feature-flags-for-spectre-meltdown.patch
queue-4.4/powerpc-powernv-use-the-security-flags-in-pnv_setup_rfi_flush.patch
queue-4.4/powerpc-64-disable-the-speculation-barrier-from-the-command-line.patch
queue-4.4/powerpc-fsl-fix-the-flush-of-branch-predictor.patch
queue-4.4/powerpc-pseries-use-the-security-flags-in-pseries_setup_rfi_flush.patch
queue-4.4/powerpc-64-add-config_ppc_barrier_nospec.patch
queue-4.4/powerpc-64s-move-cpu_show_meltdown.patch
queue-4.4/powerpc-64-use-barrier_nospec-in-syscall-entry.patch
queue-4.4/powerpc-fsl-add-nospectre_v2-command-line-argument.patch
queue-4.4/powerpc-64s-add-new-security-feature-flags-for-count-cache-flush.patch
queue-4.4/powerpc-fsl-add-infrastructure-to-fixup-branch-predictor-flush.patch
queue-4.4/powerpc-rfi-flush-differentiate-enabled-and-patched-flush-types.patch
queue-4.4/powerpc-64s-enhance-the-information-in-cpu_show_spectre_v1.patch
queue-4.4/powerpc-64-call-setup_barrier_nospec-from-setup_arch.patch
queue-4.4/powerpc-rfi-flush-always-enable-fallback-flush-on-pseries.patch
queue-4.4/powerpc-64s-improve-rfi-l1-d-cache-flush-fallback.patch
queue-4.4/powerpc-asm-add-a-patch_site-macro-helpers-for-patching-instructions.patch
queue-4.4/powerpc-pseries-add-new-h_get_cpu_characteristics-flags.patch
queue-4.4/powerpc-64s-enable-barrier_nospec-based-on-firmware-settings.patch
queue-4.4/powerpc-powernv-support-firmware-disable-of-rfi-flush.patch
queue-4.4/powerpc-rfi-flush-move-the-logic-to-avoid-a-redo-into-the-debugfs-code.patch
queue-4.4/powerpc-powernv-query-firmware-for-count-cache-flush-settings.patch
queue-4.4/powerpc-64s-wire-up-cpu_show_spectre_v1.patch
queue-4.4/powerpc-64s-add-barrier_nospec.patch
queue-4.4/powerpc-64s-enhance-the-information-in-cpu_show_meltdown.patch
queue-4.4/powerpc-move-default-security-feature-flags.patch
queue-4.4/powerpc-pseries-fix-clearing-of-security-feature-flags.patch
queue-4.4/powerpc-pseries-restore-default-security-feature-flags-on-setup.patch
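
Illustrative note, not part of the patch: the linear load pattern introduced
in exceptions-64s.S can be modelled in plain C to check its arithmetic. The
sketch below is a userspace model only. The helper names flush_iterations()
and linear_flush_touches_each_line_once() are hypothetical and do not exist
in the kernel. It assumes 128-byte lines and the 8x unroll, as the patch
does, and replays the offsets (0x80 + 8)*k with a step of 0x80*8 to confirm
that each line of an l1d_size-byte region is loaded exactly once.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LINE_BYTES 0x80u  /* 128-byte cache line, as the patch assumes */
#define UNROLL     8u     /* loads per loop iteration in the asm */

/* Loop count the asm derives with "srdi r11,r11,(7 + 3)". */
static uint64_t flush_iterations(uint64_t l1d_size)
{
	return l1d_size >> (7 + 3);
}

/*
 * Hypothetical helper: replays the load offsets of the unrolled loop and
 * returns 1 if every 128-byte line in [0, l1d_size) is loaded exactly
 * once, 0 otherwise, or -1 if the bookkeeping array cannot be allocated.
 */
static int linear_flush_touches_each_line_once(uint64_t l1d_size)
{
	uint64_t nlines = l1d_size / LINE_BYTES;
	unsigned char *hits = calloc(nlines ? nlines : 1, 1);
	uint64_t base = 0;
	int ok = 1;

	if (!hits)
		return -1;

	for (uint64_t i = 0; i < flush_iterations(l1d_size); i++) {
		for (uint64_t k = 0; k < UNROLL; k++) {
			/* models: ld r11,(0x80 + 8)*k(r10) */
			uint64_t off = base + (LINE_BYTES + 8) * k;

			assert(off / LINE_BYTES < nlines);
			hits[off / LINE_BYTES]++;
		}
		base += LINE_BYTES * UNROLL;	/* models: addi r10,r10,0x80*8 */
	}

	for (uint64_t line = 0; line < nlines; line++)
		if (hits[line] != 1)
			ok = 0;

	free(hits);
	return ok;
}
```

The extra 8 bytes per load give the staggered offsets mentioned in the asm
comment. Because 8*7 is still less than 128, load k of each iteration stays
inside line k, so the stagger does not skip or repeat a line. The cache sizes
passed to the helpers (for example 32 KiB or 64 KiB) are example inputs, not
values taken from the patch.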