public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <Alexander.Levin@microsoft.com>
To: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>, Richard Biener <rguenther@suse.de>,
	Jakub Jelinek <jakub@gcc.gnu.org>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Sasha Levin <Alexander.Levin@microsoft.com>
Subject: [PATCH AUTOSEL for 4.14 88/97] crypto: aes-generic - build with -Os on gcc-7+
Date: Mon, 19 Mar 2018 15:56:27 +0000	[thread overview]
Message-ID: <20180319155411.12348-88-alexander.levin@microsoft.com> (raw)
In-Reply-To: <20180319155411.12348-1-alexander.levin@microsoft.com>

From: Arnd Bergmann <arnd@arndb.de>

[ Upstream commit 148b974deea927f5dbb6c468af2707b488bfa2de ]

While testing other changes, I discovered that gcc-7.2.1 produces badly
optimized code for aes_encrypt/aes_decrypt. This is especially true when
CONFIG_UBSAN_SANITIZE_ALL is enabled, where it leads to extremely
large stack usage that in turn might cause kernel stack overflows:

crypto/aes_generic.c: In function 'aes_encrypt':
crypto/aes_generic.c:1371:1: warning: the frame size of 4880 bytes is larger than 2048 bytes [-Wframe-larger-than=]
crypto/aes_generic.c: In function 'aes_decrypt':
crypto/aes_generic.c:1441:1: warning: the frame size of 4864 bytes is larger than 2048 bytes [-Wframe-larger-than=]

I verified that this problem exists on all architectures that are
supported by gcc-7.2, though arm64 in particular is less affected than
the others. I also found that gcc-7.1 and gcc-8 do not show the extreme
stack usage but still produce worse code than earlier versions for this
file, apparently because of optimization passes that generally provide
a substantial improvement in object code quality but understandably fail
to find any shortcuts in the AES algorithm.

Possible workarounds include

a) disabling -ftree-pre and -ftree-sra optimizations, this was an earlier
   patch I tried, which reliably fixed the stack usage, but caused a
   serious performance regression in some versions, as later testing
   found.

b) disabling UBSAN on this file or all ciphers, as suggested by Ard
   Biesheuvel. This would lead to massively better crypto performance in
   UBSAN-enabled kernels and avoid the stack usage, but there is a concern
   over whether we should exclude arbitrary files from UBSAN at all.

c) Forcing the optimization level in a different way. Similar to a),
   but rather than deselecting specific optimization stages,
   this now uses "gcc -Os" for this file, regardless of the
   CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE/SIZE option. This is a reliable
   workaround for the stack consumption on all architecture, and I've
   retested the performance results now on x86, cycles/byte (lower is
   better) for cbc(aes-generic) with 256 bit keys:

			-O2     -Os
	gcc-6.3.1	14.9	15.1
	gcc-7.0.1	14.7	15.3
	gcc-7.1.1	15.3	14.7
	gcc-7.2.1	16.8	15.9
	gcc-8.0.0	15.5	15.6

This implements the option c) by enabling forcing -Os on all compiler
versions starting with gcc-7.1. As a workaround for PR83356, it would
only be needed for gcc-7.2+ with UBSAN enabled, but since it also shows
better performance on gcc-7.1 without UBSAN, it seems appropriate to
use the faster version here as well.

Side note: during testing, I also played with the AES code in libressl,
which had a similar performance regression from gcc-6 to gcc-7.2,
but was three times slower overall. It might be interesting to
investigate that further and possibly port the Linux implementation
into that.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
Cc: Richard Biener <rguenther@suse.de>
Cc: Jakub Jelinek <jakub@gcc.gnu.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
---
 crypto/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/Makefile b/crypto/Makefile
index da190be60ce2..adaf2c63baeb 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -98,6 +98,7 @@ obj-$(CONFIG_CRYPTO_TWOFISH_COMMON) += twofish_common.o
 obj-$(CONFIG_CRYPTO_SERPENT) += serpent_generic.o
 CFLAGS_serpent_generic.o := $(call cc-option,-fsched-pressure)  # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
 obj-$(CONFIG_CRYPTO_AES) += aes_generic.o
+CFLAGS_aes_generic.o := $(call cc-ifversion, -ge, 0701, -Os) # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
 obj-$(CONFIG_CRYPTO_AES_TI) += aes_ti.o
 obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia_generic.o
 obj-$(CONFIG_CRYPTO_CAST_COMMON) += cast_common.o
-- 
2.14.1

  parent reply	other threads:[~2018-03-19 21:56 UTC|newest]

Thread overview: 97+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-19 15:54 [PATCH AUTOSEL for 4.14 01/97] i40iw: Fix sequence number for the first partial FPDU Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 02/97] i40iw: Correct Q1/XF object count equation Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 03/97] i40iw: Validate correct IRD/ORD connection parameters Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 04/97] clk: meson: mpll: use 64-bit maths in params_from_rate Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 05/97] ARM: dts: ls1021a: add "fsl,ls1021a-esdhc" compatible string to esdhc node Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 06/97] Bluetooth: Add a new 04ca:3015 QCA_ROME device Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 07/97] ipv6: Reinject IPv6 packets if IPsec policy matches after SNAT Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 08/97] thermal: power_allocator: fix one race condition issue for thermal_instances list Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 09/97] perf probe: Find versioned symbols from map Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 10/97] perf probe: Add warning message if there is unexpected event name Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 11/97] perf evsel: Enable ignore_missing_thread for pid option Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 12/97] net: hns3: free the ring_data structrue when change tqps Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 13/97] net: hns3: fix for getting auto-negotiation state in hclge_get_autoneg Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 14/97] l2tp: fix missing print session offset info Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 15/97] rds; Reset rs->rs_bound_addr in rds_add_bound() failure path Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 16/97] ACPI / video: Default lcd_only to true on Win8-ready and newer machines Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 17/97] net/mlx4_en: Change default QoS settings Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 18/97] VFS: close race between getcwd() and d_move() Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 19/97] watchdog: dw_wdt: add stop watchdog operation Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 20/97] clk: divider: fix incorrect usage of container_of Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 21/97] clk: sunxi-ng: fix the A64/H5 clock description of DE2 CCU Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 22/97] PM / devfreq: Fix potential NULL pointer dereference in governor_store Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 23/97] selftests/net: fix bugs in address and port initialization Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 24/97] RDMA/cma: Mark end of CMA ID messages Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 25/97] hwmon: (ina2xx) Make calibration register value fixed Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 26/97] clk: sunxi-ng: a83t: Add M divider to TCON1 clock Sasha Levin
2018-03-19 15:54 ` [PATCH AUTOSEL for 4.14 27/97] media: videobuf2-core: don't go out of the buffer range Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 28/97] ASoC: Intel: Skylake: Disable clock gating during firmware and library download Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 29/97] ASoC: Intel: cht_bsw_rt5645: Analog Mic support Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 30/97] spi: sh-msiof: Fix timeout failures for TX-only DMA transfers Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 31/97] scsi: libiscsi: Allow sd_shutdown on bad transport Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 32/97] scsi: mpt3sas: Proper handling of set/clear of "ATA command pending" flag Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 33/97] scsi: qla2xxx: Fix NULL pointer access for fcport structure Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 34/97] irqchip/gic-v3: Fix the driver probe() fail due to disabled GICC entry Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 35/97] ACPI: EC: Fix debugfs_create_*() usage Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 36/97] mac80211: Fix setting TX power on monitor interfaces Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 37/97] vfb: fix video mode and line_length being set when loaded Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 38/97] ACPICA: Recognize the Windows 10 version 1607 and 1703 OSI strings Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 39/97] gpio: label descriptors using the device name Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 40/97] powernv-cpufreq: Add helper to extract pstate from PMSR Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 41/97] IB/rdmavt: Allocate CQ memory on the correct node Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 42/97] blk-mq: avoid to map CPU into stale hw queue Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 43/97] blk-mq: fix race between updating nr_hw_queues and switching io sched Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 44/97] ipv6: Set nexthop flags during route creation Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 45/97] backlight: tdo24m: Fix the SPI CS between transfers Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 46/97] pinctrl: baytrail: Enable glitch filter for GPIOs used as interrupts Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 47/97] nvme_fcloop: disassocate local port structs Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 48/97] nvme_fcloop: fix abort race condition Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 49/97] tpm: return a TPM_RC_COMMAND_CODE response if command is not implemented Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 50/97] perf report: Fix a no annotate browser displayed issue Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 51/97] iio: imu: st_lsm6dsx: fix endianness in st_lsm6dsx_read_oneshot() Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 52/97] staging: lustre: disable preempt while sampling processor id Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 53/97] ASoC: Intel: sst: Fix the return value of 'sst_send_byte_stream_mrfld()' Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 54/97] netfilter: core: only allow one nat hook per hook point Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 55/97] power: supply: axp288_charger: Properly stop work on probe-error / remove Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 56/97] rt2x00: do not pause queue unconditionally on error path Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 57/97] wl1251: check return from call to wl1251_acx_arp_ip_filter Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 58/97] xfs: include inobt buffers in ifree tx log reservation Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 59/97] xfs: fix up agi unlinked list reservations Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 60/97] net/mlx5: Fix race for multiple RoCE enable Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 61/97] net: hns3: Fix an error of total drop packet statistics Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 62/97] net: hns3: Fix a loop index error of tqp statistics query Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 63/97] net: hns3: Fix an error macro definition of HNS3_TQP_STAT Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 64/97] net: hns3: fix for changing MTU Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 65/97] bcache: ret IOERR when read meets metadata error Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 66/97] bcache: stop writeback thread after detaching Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 67/97] bcache: segregate flash only volume write streams Sasha Levin
2018-03-19 15:55 ` [PATCH AUTOSEL for 4.14 68/97] scsi: libsas: fix memory leak in sas_smp_get_phy_events() Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 69/97] scsi: libsas: fix error when getting phy events Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 70/97] scsi: libsas: initialize sas_phy status according to response of DISCOVER Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 71/97] blk-mq: fix kernel oops in blk_mq_tag_idle() Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 72/97] tty: n_gsm: Allow ADM response in addition to UA for control dlci Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 73/97] block, bfq: put async queues for root bfq groups too Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 74/97] EDAC, mv64x60: Fix an error handling path Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 75/97] uio_hv_generic: check that host supports monitor page Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 76/97] i40evf: don't rely on netif_running() outside rtnl_lock() Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 77/97] cxgb4vf: Fix SGE FL buffer initialization logic for 64K pages Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 78/97] clk: fix reentrancy of clk_enable() on UP systems Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 79/97] scsi: megaraid_sas: Error handling for invalid ldcount provided by firmware in RAID map Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 80/97] scsi: megaraid_sas: unload flag should be set after scsi_remove_host is called Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 81/97] RDMA/cma: Fix rdma_cm path querying for RoCE Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 82/97] gpio: thunderx: fix error return code in thunderx_gpio_probe() Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 83/97] x86/gart: Exclude GART aperture from vmcore Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 84/97] sdhci: Advertise 2.0v supply on SDIO host controller Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 85/97] ibmvnic: Don't handle RX interrupts when not up Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 86/97] Input: goodix - disable IRQs while suspended Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 87/97] mtd: mtd_oobtest: Handle bitflips during reads Sasha Levin
2018-03-19 15:56 ` Sasha Levin [this message]
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 89/97] genirq/affinity: assign vectors to all possible CPUs Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 90/97] perf tools: Fix copyfile_offset update of output offset Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 91/97] signal/parisc: Document a conflict with SI_USER with SIGFPE Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 92/97] signal/metag: " Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 93/97] signal/powerpc: Document conflicts with SI_USER and SIGFPE and SIGTRAP Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 94/97] signal/arm: Document conflicts with SI_USER and SIGFPE Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 95/97] xfs: account finobt blocks properly in perag reservation Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 96/97] tcmu: release blocks for partially setup cmds Sasha Levin
2018-03-19 15:56 ` [PATCH AUTOSEL for 4.14 97/97] thermal: int3400_thermal: fix error handling in int3400_thermal_probe() Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180319155411.12348-88-alexander.levin@microsoft.com \
    --to=alexander.levin@microsoft.com \
    --cc=ard.biesheuvel@linaro.org \
    --cc=arnd@arndb.de \
    --cc=herbert@gondor.apana.org.au \
    --cc=jakub@gcc.gnu.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rguenther@suse.de \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox