* [rafael-pm:bleeding-edge] BUILD SUCCESS acc71791e32e8c9a8fbe04ced8b00c49dfbcc52b
From: kernel test robot @ 2024-04-03 8:20 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: linux-acpi, devel, linux-pm
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git bleeding-edge
branch HEAD: acc71791e32e8c9a8fbe04ced8b00c49dfbcc52b Merge branch 'acpi-thermal' into bleeding-edge
elapsed time: 729m
configs tested: 162
configs skipped: 3
The following configs have been built successfully.
More configs may be tested in the coming days.
tested configs:
alpha allnoconfig gcc
alpha allyesconfig gcc
alpha defconfig gcc
arc allmodconfig gcc
arc allnoconfig gcc
arc allyesconfig gcc
arc defconfig gcc
arc nsimosci_hs_smp_defconfig gcc
arc randconfig-001-20240403 gcc
arc randconfig-002-20240403 gcc
arm allmodconfig gcc
arm allnoconfig clang
arm allyesconfig gcc
arm defconfig clang
arm integrator_defconfig clang
arm randconfig-001-20240403 gcc
arm randconfig-002-20240403 gcc
arm randconfig-003-20240403 clang
arm randconfig-004-20240403 gcc
arm realview_defconfig clang
arm s5pv210_defconfig gcc
arm64 allmodconfig clang
arm64 allnoconfig gcc
arm64 defconfig gcc
arm64 randconfig-001-20240403 clang
arm64 randconfig-002-20240403 clang
arm64 randconfig-003-20240403 gcc
arm64 randconfig-004-20240403 clang
csky allmodconfig gcc
csky allnoconfig gcc
csky allyesconfig gcc
csky defconfig gcc
csky randconfig-001-20240403 gcc
csky randconfig-002-20240403 gcc
hexagon allmodconfig clang
hexagon allnoconfig clang
hexagon allyesconfig clang
hexagon defconfig clang
hexagon randconfig-001-20240403 clang
hexagon randconfig-002-20240403 clang
i386 allmodconfig gcc
i386 allnoconfig gcc
i386 allyesconfig gcc
i386 buildonly-randconfig-001-20240403 gcc
i386 buildonly-randconfig-002-20240403 clang
i386 buildonly-randconfig-003-20240403 clang
i386 buildonly-randconfig-004-20240403 clang
i386 buildonly-randconfig-005-20240403 gcc
i386 buildonly-randconfig-006-20240403 clang
i386 defconfig clang
i386 randconfig-001-20240403 gcc
i386 randconfig-002-20240403 clang
i386 randconfig-003-20240403 gcc
i386 randconfig-004-20240403 gcc
i386 randconfig-005-20240403 clang
i386 randconfig-006-20240403 gcc
i386 randconfig-011-20240403 gcc
i386 randconfig-012-20240403 clang
i386 randconfig-013-20240403 gcc
i386 randconfig-014-20240403 clang
i386 randconfig-015-20240403 gcc
i386 randconfig-016-20240403 clang
loongarch allmodconfig gcc
loongarch allnoconfig gcc
loongarch defconfig gcc
loongarch randconfig-001-20240403 gcc
loongarch randconfig-002-20240403 gcc
m68k allmodconfig gcc
m68k allnoconfig gcc
m68k allyesconfig gcc
m68k bvme6000_defconfig gcc
m68k defconfig gcc
microblaze allmodconfig gcc
microblaze allnoconfig gcc
microblaze allyesconfig gcc
microblaze defconfig gcc
mips allnoconfig gcc
mips allyesconfig gcc
nios2 allmodconfig gcc
nios2 allnoconfig gcc
nios2 allyesconfig gcc
nios2 defconfig gcc
nios2 randconfig-001-20240403 gcc
nios2 randconfig-002-20240403 gcc
openrisc allnoconfig gcc
openrisc allyesconfig gcc
openrisc defconfig gcc
openrisc or1ksim_defconfig gcc
parisc allmodconfig gcc
parisc allnoconfig gcc
parisc allyesconfig gcc
parisc defconfig gcc
parisc randconfig-001-20240403 gcc
parisc randconfig-002-20240403 gcc
parisc64 defconfig gcc
powerpc allmodconfig gcc
powerpc allnoconfig gcc
powerpc allyesconfig clang
powerpc cm5200_defconfig clang
powerpc eiger_defconfig clang
powerpc ep8248e_defconfig gcc
powerpc kmeter1_defconfig gcc
powerpc mpc8315_rdb_defconfig clang
powerpc randconfig-001-20240403 gcc
powerpc randconfig-002-20240403 gcc
powerpc randconfig-003-20240403 clang
powerpc tqm8555_defconfig clang
powerpc64 randconfig-001-20240403 gcc
powerpc64 randconfig-002-20240403 clang
powerpc64 randconfig-003-20240403 gcc
riscv allmodconfig clang
riscv allnoconfig gcc
riscv allyesconfig clang
riscv defconfig clang
riscv randconfig-001-20240403 clang
riscv randconfig-002-20240403 clang
s390 allmodconfig clang
s390 allnoconfig clang
s390 allyesconfig gcc
s390 defconfig clang
s390 randconfig-001-20240403 clang
s390 randconfig-002-20240403 clang
sh allmodconfig gcc
sh allnoconfig gcc
sh allyesconfig gcc
sh defconfig gcc
sh kfr2r09-romimage_defconfig gcc
sh kfr2r09_defconfig gcc
sh lboxre2_defconfig gcc
sh randconfig-001-20240403 gcc
sh randconfig-002-20240403 gcc
sh se7724_defconfig gcc
sparc allmodconfig gcc
sparc allnoconfig gcc
sparc defconfig gcc
sparc64 allmodconfig gcc
sparc64 allyesconfig gcc
sparc64 defconfig gcc
sparc64 randconfig-001-20240403 gcc
sparc64 randconfig-002-20240403 gcc
um allmodconfig clang
um allnoconfig clang
um allyesconfig gcc
um defconfig clang
um i386_defconfig gcc
um randconfig-001-20240403 gcc
um randconfig-002-20240403 clang
um x86_64_defconfig clang
x86_64 allnoconfig clang
x86_64 allyesconfig clang
x86_64 buildonly-randconfig-001-20240403 gcc
x86_64 buildonly-randconfig-002-20240403 gcc
x86_64 buildonly-randconfig-003-20240403 clang
x86_64 buildonly-randconfig-004-20240403 gcc
x86_64 buildonly-randconfig-005-20240403 clang
x86_64 buildonly-randconfig-006-20240403 gcc
x86_64 defconfig gcc
x86_64 rhel-8.3-rust clang
xtensa allnoconfig gcc
xtensa audio_kc705_defconfig gcc
xtensa randconfig-001-20240403 gcc
xtensa randconfig-002-20240403 gcc
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply
* [PATCH 27/34] cpufreq: intel_pstate: hide unused intel_pstate_cpu_oob_ids[]
From: Arnd Bergmann @ 2024-04-03 8:06 UTC (permalink / raw)
To: linux-kernel, Srinivas Pandruvada, Len Brown, Rafael J. Wysocki,
Viresh Kumar
Cc: Arnd Bergmann, Doug Smythies, Zhenguo Yao, Tero Kristo,
Jiri Slaby (SUSE), linux-pm
In-Reply-To: <20240403080702.3509288-1-arnd@kernel.org>
From: Arnd Bergmann <arnd@arndb.de>
The reference to this variable is hidden in an #ifdef:
drivers/cpufreq/intel_pstate.c:2440:32: error: 'intel_pstate_cpu_oob_ids' defined but not used [-Werror=unused-const-variable=]
Use the same check around the definition.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/cpufreq/intel_pstate.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index dbbf299f4219..29ce9edc6f68 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2437,6 +2437,7 @@ static const struct x86_cpu_id intel_pstate_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, intel_pstate_cpu_ids);
+#ifdef CONFIG_ACPI
static const struct x86_cpu_id intel_pstate_cpu_oob_ids[] __initconst = {
X86_MATCH(BROADWELL_D, core_funcs),
X86_MATCH(BROADWELL_X, core_funcs),
@@ -2445,6 +2446,7 @@ static const struct x86_cpu_id intel_pstate_cpu_oob_ids[] __initconst = {
X86_MATCH(SAPPHIRERAPIDS_X, core_funcs),
{}
};
+#endif
static const struct x86_cpu_id intel_pstate_cpu_ee_disable_ids[] = {
X86_MATCH(KABYLAKE, core_funcs),
--
2.39.2
^ permalink raw reply related
* [PATCH 09/34] power: rt9455: hide unused rt9455_boost_voltage_values
From: Arnd Bergmann @ 2024-04-03 8:06 UTC (permalink / raw)
To: linux-kernel, Sebastian Reichel, Anda-Maria Nicolae,
Krzysztof Kozlowski
Cc: Arnd Bergmann, David Lechner, Rob Herring, Uwe Kleine-König,
linux-pm
In-Reply-To: <20240403080702.3509288-1-arnd@kernel.org>
From: Arnd Bergmann <arnd@arndb.de>
The rt9455_boost_voltage_values[] array is only used when USB PHY
support is enabled, causing a W=1 warning otherwise:
drivers/power/supply/rt9455_charger.c:200:18: error: 'rt9455_boost_voltage_values' defined but not used [-Werror=unused-const-variable=]
Enclose the definition in the same #ifdef as the references to it.
Fixes: e86d69dd786e ("power_supply: Add support for Richtek RT9455 battery charger")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/power/supply/rt9455_charger.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/power/supply/rt9455_charger.c b/drivers/power/supply/rt9455_charger.c
index c345a77f9f78..e4dbacd50a43 100644
--- a/drivers/power/supply/rt9455_charger.c
+++ b/drivers/power/supply/rt9455_charger.c
@@ -192,6 +192,7 @@ static const int rt9455_voreg_values[] = {
4450000, 4450000, 4450000, 4450000, 4450000, 4450000, 4450000, 4450000
};
+#if IS_ENABLED(CONFIG_USB_PHY)
/*
* When the charger is in boost mode, REG02[7:2] represent boost output
* voltage.
@@ -207,6 +208,7 @@ static const int rt9455_boost_voltage_values[] = {
5600000, 5600000, 5600000, 5600000, 5600000, 5600000, 5600000, 5600000,
5600000, 5600000, 5600000, 5600000, 5600000, 5600000, 5600000, 5600000,
};
+#endif
/* REG07[3:0] (VMREG) in uV */
static const int rt9455_vmreg_values[] = {
--
2.39.2
^ permalink raw reply related
* [PATCH 00/34] address all -Wunused-const warnings
From: Arnd Bergmann @ 2024-04-03 8:06 UTC (permalink / raw)
To: linux-kernel
Cc: Arnd Bergmann, Michael Ellerman, Christophe Leroy, Damien Le Moal,
Jiri Kosina, Greg Kroah-Hartman, Corey Minyard, Peter Huewe,
Jarkko Sakkinen, Tero Kristo, Stephen Boyd, Ian Abbott,
H Hartley Sweeten, Srinivas Pandruvada, Len Brown,
Rafael J. Wysocki, John Allen, Herbert Xu, Vinod Koul,
Ard Biesheuvel, Bjorn Andersson, Moritz Fischer, Liviu Dudau,
Benjamin Tissoires, Andi Shyti, Michael Hennerich, Peter Rosin,
Lars-Peter Clausen, Jonathan Cameron, Dmitry Torokhov,
Markuss Broks, Alexandre Torgue, Lee Jones, Jakub Kicinski,
Shyam Sundar S K, Iyappan Subramanian, Yisen Zhuang,
Stanislaw Gruszka, Kalle Valo, Sebastian Reichel, Tony Lindgren,
Mark Brown, Alexandre Belloni, Xiang Chen, Martin K. Petersen,
Neil Armstrong, Heiko Stuebner, Krzysztof Kozlowski,
Vaibhav Hiremath, Alex Elder, Jiri Slaby, Jacky Huang,
Helge Deller, Christoph Hellwig, Robin Murphy, Steven Rostedt,
Masami Hiramatsu, Andrew Morton, Kees Cook, Trond Myklebust,
Anna Schumaker, Masahiro Yamada, Nathan Chancellor, Takashi Iwai,
linuxppc-dev, linux-ide, openipmi-developer, linux-integrity,
linux-omap, linux-clk, linux-pm, linux-crypto, dmaengine,
linux-efi, linux-arm-msm, linux-fpga, dri-devel, linux-input,
linux-i2c, linux-iio, linux-stm32, linux-arm-kernel, netdev,
linux-leds, linux-wireless, linux-rtc, linux-scsi, linux-spi,
linux-amlogic, linux-rockchip, linux-samsung-soc, greybus-dev,
linux-staging, linux-serial, linux-usb, linux-fbdev, iommu,
linux-trace-kernel, kasan-dev, linux-hardening, linux-nfs,
linux-kbuild, alsa-devel, linux-sound
From: Arnd Bergmann <arnd@arndb.de>
Compilers traditionally warn for unused 'static' variables, but not
if they are constant. The reason here is a custom for C++ programmers
to define named constants as 'static const' variables in header files
instead of using macros or enums.
In W=1 builds, we get warnings only static const variables in C
files, but not in headers, which is a good compromise, but this still
produces warning output in at least 30 files. These warnings are
almost all harmless, but also trivial to fix, and there is no
good reason to warn only about the non-const variables being unused.
I've gone through all the files that I found using randconfig and
allmodconfig builds and created patches to avoid these warnings,
with the goal of retaining a clean build once the option is enabled
by default.
Unfortunately, there is one fairly large patch ("drivers: remove
incorrect of_match_ptr/ACPI_PTR annotations") that touches
34 individual drivers that all need the same one-line change.
If necessary, I can split it up by driver or by subsystem,
but at least for reviewing I would keep it as one piece for
the moment.
Please merge the individual patches through subsystem trees.
I expect that some of these will have to go through multiple
revisions before they are picked up, so anything that gets
applied early saves me from resending.
Arnd
Arnd Bergmann (31):
powerpc/fsl-soc: hide unused const variable
ubsan: fix unused variable warning in test module
platform: goldfish: remove ACPI_PTR() annotations
i2c: pxa: hide unused icr_bits[] variable
3c515: remove unused 'mtu' variable
tracing: hide unused ftrace_event_id_fops
Input: synaptics: hide unused smbus_pnp_ids[] array
power: rt9455: hide unused rt9455_boost_voltage_values
efi: sysfb: don't build when EFI is disabled
clk: ti: dpll: fix incorrect #ifdef checks
apm-emulation: hide an unused variable
sisfb: hide unused variables
dma/congiguous: avoid warning about unused size_bytes
leds: apu: remove duplicate DMI lookup data
iio: ad5755: hook up of_device_id lookup to platform driver
greybus: arche-ctrl: move device table to its right location
lib: checksum: hide unused expected_csum_ipv6_magic[]
sunrpc: suppress warnings for unused procfs functions
comedi: ni_atmio: avoid warning for unused device_ids[] table
iwlegacy: don't warn for unused variables with DEBUG_FS=n
drm/komeda: don't warn for unused debugfs files
firmware: qcom_scm: mark qcom_scm_qseecom_allowlist as __maybe_unused
crypto: ccp - drop platform ifdef checks
usb: gadget: omap_udc: remove unused variable
isdn: kcapi: don't build unused procfs code
cpufreq: intel_pstate: hide unused intel_pstate_cpu_oob_ids[]
net: xgbe: remove extraneous #ifdef checks
Input: imagis - remove incorrect ifdef checks
sata: mv: drop unnecessary #ifdef checks
ASoC: remove incorrect of_match_ptr/ACPI_PTR annotations
spi: remove incorrect of_match_ptr annotations
drivers: remove incorrect of_match_ptr/ACPI_PTR annotations
kbuild: always enable -Wunused-const-variable
Krzysztof Kozlowski (1):
Input: stmpe-ts - mark OF related data as maybe unused
arch/powerpc/sysdev/fsl_msi.c | 2 +
drivers/ata/sata_mv.c | 64 +++++++++----------
drivers/char/apm-emulation.c | 5 +-
drivers/char/ipmi/ipmb_dev_int.c | 2 +-
drivers/char/tpm/tpm_ftpm_tee.c | 2 +-
drivers/clk/ti/dpll.c | 10 ++-
drivers/comedi/drivers/ni_atmio.c | 2 +-
drivers/cpufreq/intel_pstate.c | 2 +
drivers/crypto/ccp/sp-platform.c | 14 +---
drivers/dma/img-mdc-dma.c | 2 +-
drivers/firmware/efi/Makefile | 3 +-
drivers/firmware/efi/sysfb_efi.c | 2 -
drivers/firmware/qcom/qcom_scm.c | 2 +-
drivers/fpga/versal-fpga.c | 2 +-
.../gpu/drm/arm/display/komeda/komeda_dev.c | 8 ---
drivers/hid/hid-google-hammer.c | 6 +-
drivers/i2c/busses/i2c-pxa.c | 2 +-
drivers/i2c/muxes/i2c-mux-ltc4306.c | 2 +-
drivers/i2c/muxes/i2c-mux-reg.c | 2 +-
drivers/iio/dac/ad5755.c | 1 +
drivers/input/mouse/synaptics.c | 2 +
drivers/input/touchscreen/imagis.c | 4 +-
drivers/input/touchscreen/stmpe-ts.c | 2 +-
drivers/input/touchscreen/wdt87xx_i2c.c | 2 +-
drivers/isdn/capi/Makefile | 3 +-
drivers/isdn/capi/kcapi.c | 7 +-
drivers/leds/leds-apu.c | 3 +-
drivers/mux/adg792a.c | 2 +-
drivers/net/ethernet/3com/3c515.c | 3 -
drivers/net/ethernet/amd/xgbe/xgbe-platform.c | 8 ---
drivers/net/ethernet/apm/xgene-v2/main.c | 2 +-
drivers/net/ethernet/hisilicon/hns_mdio.c | 2 +-
drivers/net/wireless/intel/iwlegacy/4965-rs.c | 15 +----
drivers/net/wireless/intel/iwlegacy/common.h | 2 -
drivers/platform/goldfish/goldfish_pipe.c | 2 +-
drivers/power/supply/rt9455_charger.c | 2 +
drivers/regulator/pbias-regulator.c | 2 +-
drivers/regulator/twl-regulator.c | 2 +-
drivers/regulator/twl6030-regulator.c | 2 +-
drivers/rtc/rtc-fsl-ftm-alarm.c | 2 +-
drivers/scsi/hisi_sas/hisi_sas_v1_hw.c | 2 +-
drivers/scsi/hisi_sas/hisi_sas_v2_hw.c | 2 +-
drivers/spi/spi-armada-3700.c | 2 +-
drivers/spi/spi-img-spfi.c | 2 +-
drivers/spi/spi-meson-spicc.c | 2 +-
drivers/spi/spi-meson-spifc.c | 2 +-
drivers/spi/spi-orion.c | 2 +-
drivers/spi/spi-pic32-sqi.c | 2 +-
drivers/spi/spi-pic32.c | 2 +-
drivers/spi/spi-rockchip.c | 2 +-
drivers/spi/spi-s3c64xx.c | 2 +-
drivers/spi/spi-st-ssc4.c | 2 +-
drivers/staging/greybus/arche-apb-ctrl.c | 1 +
drivers/staging/greybus/arche-platform.c | 9 +--
drivers/staging/pi433/pi433_if.c | 2 +-
drivers/tty/serial/amba-pl011.c | 6 +-
drivers/tty/serial/ma35d1_serial.c | 2 +-
drivers/usb/gadget/udc/omap_udc.c | 10 +--
drivers/video/fbdev/sis/init301.c | 3 +-
kernel/dma/contiguous.c | 2 +-
kernel/trace/trace_events.c | 4 ++
lib/checksum_kunit.c | 2 +
lib/test_ubsan.c | 2 +-
net/sunrpc/cache.c | 10 +--
scripts/Makefile.extrawarn | 1 -
sound/soc/atmel/sam9x5_wm8731.c | 2 +-
sound/soc/codecs/rt5514-spi.c | 2 +-
sound/soc/qcom/lpass-sc7280.c | 2 +-
sound/soc/samsung/aries_wm8994.c | 2 +-
69 files changed, 121 insertions(+), 169 deletions(-)
--
2.39.2
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Corey Minyard <minyard@acm.org>
Cc: Peter Huewe <peterhuewe@gmx.de>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Tero Kristo <kristo@kernel.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Ian Abbott <abbotti@mev.co.uk>
Cc: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: John Allen <john.allen@amd.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Vinod Koul <vkoul@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Bjorn Andersson <andersson@kernel.org>
Cc: Moritz Fischer <mdf@kernel.org>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Andi Shyti <andi.shyti@kernel.org>
Cc: Michael Hennerich <michael.hennerich@analog.com>
Cc: Peter Rosin <peda@axentia.se>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Markuss Broks <markuss.broks@gmail.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Lee Jones <lee@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Cc: Iyappan Subramanian <iyappan@os.amperecomputing.com>
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Stanislaw Gruszka <stf_xl@wp.pl>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Xiang Chen <chenxiang66@hisilicon.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Vaibhav Hiremath <hvaibhav.linux@gmail.com>
Cc: Alex Elder <elder@kernel.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Jacky Huang <ychuang3@nuvoton.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Takashi Iwai <tiwai@suse.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-ide@vger.kernel.org
Cc: openipmi-developer@lists.sourceforge.net
Cc: linux-integrity@vger.kernel.org
Cc: linux-omap@vger.kernel.org
Cc: linux-clk@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: dmaengine@vger.kernel.org
Cc: linux-efi@vger.kernel.org
Cc: linux-arm-msm@vger.kernel.org
Cc: linux-fpga@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-input@vger.kernel.org
Cc: linux-i2c@vger.kernel.org
Cc: linux-iio@vger.kernel.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Cc: linux-leds@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: linux-rtc@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: linux-spi@vger.kernel.org
Cc: linux-amlogic@lists.infradead.org
Cc: linux-rockchip@lists.infradead.org
Cc: linux-samsung-soc@vger.kernel.org
Cc: greybus-dev@lists.linaro.org
Cc: linux-staging@lists.linux.dev
Cc: linux-serial@vger.kernel.org
Cc: linux-usb@vger.kernel.org
Cc: linux-fbdev@vger.kernel.org
Cc: iommu@lists.linux.dev
Cc: linux-trace-kernel@vger.kernel.org
Cc: kasan-dev@googlegroups.com
Cc: linux-hardening@vger.kernel.org
Cc: linux-nfs@vger.kernel.org
Cc: linux-kbuild@vger.kernel.org
Cc: alsa-devel@alsa-project.org
Cc: linux-sound@vger.kernel.org
^ permalink raw reply
* kernel/sched/core.c:961:15: error: incompatible pointer to integer conversion passing 'typeof
From: Naresh Kamboju @ 2024-04-03 7:38 UTC (permalink / raw)
To: open list, Linux Regressions, lkft-triage, clang-built-linux,
Linux PM
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Nathan Chancellor, Nick Desaulniers
The riscv clang-17 defconfig build failed due to following warnings / errors
on the Linux next-20240402.
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
riscv:
build:
* clang-17-lkftconfig - Failed
* rv32-clang-17-defconfig - Failed
* clang-17-tinyconfig - Failed
* rv32-clang-17-tinyconfig - Failed
* clang-17-defconfig - Failed
* clang-17-allnoconfig - Failed
* rv32-clang-17-allnoconfig - Failed
Build log:
-------
kernel/sched/core.c:961:15: error: incompatible pointer to integer
conversion passing 'typeof (*((__ai_ptr)))' (aka 'struct wake_q_node
*') to parameter of type 'uintptr_t' (aka 'unsigned long')
[-Wint-conversion]
961 | if (unlikely(cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL)))
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Steps to reproduce:
---------
# tuxmake --runtime podman --target-arch riscv --toolchain clang-17
--kconfig defconfig LLVM=1 LLVM_IAS=1
Links:
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240402/testrun/23264917/suite/build/test/clang-17-defconfig/details/
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20240402/testrun/23264917/suite/build/test/clang-17-defconfig/log
--
Linaro LKFT
https://lkft.linaro.org
^ permalink raw reply
* Re: [PATCH 00/18] platform/chrome: provide ID table for avoiding fallback match
From: patchwork-bot+chrome-platform @ 2024-04-03 7:20 UTC (permalink / raw)
To: Tzung-Bi Shih
Cc: bleung, groeck, linus.walleij, brgl, hverkuil-cisco, mchehab, sre,
alexandre.belloni, chrome-platform, pmalani, linux-gpio,
linux-media, linux-pm, linux-rtc, krzk
In-Reply-To: <20240329075630.2069474-1-tzungbi@kernel.org>
Hello:
This series was applied to chrome-platform/linux.git (for-next)
by Tzung-Bi Shih <tzungbi@kernel.org>:
On Fri, 29 Mar 2024 15:56:12 +0800 you wrote:
> Inspired by [1]. Turn all MODULE_ALIAS() in ChromeOS EC platform drivers into
> proper platform_device_id table and MODULE_DEVICE_TABLE().
>
> The series is basically looking for drivers from:
> - `struct mfd_cell` in drivers/mfd/cros_ec_dev.c.
> - grep -R MODULE_ALIAS drivers/platform/chrome/.
>
> [...]
Here is the summary with links:
- [01/18] media: platform: cros-ec: provide ID table for avoiding fallback match
(no matching commit)
- [02/18] gpio: cros-ec: provide ID table for avoiding fallback match
(no matching commit)
- [03/18] rtc: cros-ec: provide ID table for avoiding fallback match
(no matching commit)
- [04/18] platform/chrome: cros_ec_sensorhub: provide ID table for avoiding fallback match
(no matching commit)
- [05/18] power: supply: cros_usbpd: provide ID table for avoiding fallback match
(no matching commit)
- [06/18] platform/chrome: cros_usbpd_logger: provide ID table for avoiding fallback match
(no matching commit)
- [07/18] platform/chrome: cros_usbpd_notify: provide ID table for avoiding fallback match
(no matching commit)
- [08/18] platform/chrome: cros_ec_chardev: provide ID table for avoiding fallback match
(no matching commit)
- [09/18] platform/chrome: cros_ec_debugfs: provide ID table for avoiding fallback match
(no matching commit)
- [10/18] platform/chrome: cros_ec_sysfs: provide ID table for avoiding fallback match
(no matching commit)
- [11/18] power: supply: cros_pchg: provide ID table for avoiding fallback match
(no matching commit)
- [12/18] platform/chrome: cros_ec_lightbar: provide ID table for avoiding fallback match
(no matching commit)
- [13/18] platform/chrome: cros_ec_vbc: provide ID table for avoiding fallback match
(no matching commit)
- [14/18] platform/chrome: cros_kbd_led_backlight: provide ID table for avoiding fallback match
https://git.kernel.org/chrome-platform/c/d91ca83599cd
- [15/18] platform/chrome: wilco_ec: telemetry: provide ID table for avoiding fallback match
(no matching commit)
- [16/18] platform/chrome: wilco_ec: debugfs: provide ID table for avoiding fallback match
(no matching commit)
- [17/18] platform/chrome: wilco_ec: event: remove redundant MODULE_ALIAS
(no matching commit)
- [18/18] platform/chrome/wilco_ec: core: provide ID table for avoiding fallback match
(no matching commit)
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH 00/18] platform/chrome: provide ID table for avoiding fallback match
From: patchwork-bot+chrome-platform @ 2024-04-03 7:20 UTC (permalink / raw)
To: Tzung-Bi Shih
Cc: bleung, groeck, linus.walleij, brgl, hverkuil-cisco, mchehab, sre,
alexandre.belloni, chrome-platform, pmalani, linux-gpio,
linux-media, linux-pm, linux-rtc, krzk
In-Reply-To: <20240329075630.2069474-1-tzungbi@kernel.org>
Hello:
This series was applied to chrome-platform/linux.git (for-kernelci)
by Tzung-Bi Shih <tzungbi@kernel.org>:
On Fri, 29 Mar 2024 15:56:12 +0800 you wrote:
> Inspired by [1]. Turn all MODULE_ALIAS() in ChromeOS EC platform drivers into
> proper platform_device_id table and MODULE_DEVICE_TABLE().
>
> The series is basically looking for drivers from:
> - `struct mfd_cell` in drivers/mfd/cros_ec_dev.c.
> - grep -R MODULE_ALIAS drivers/platform/chrome/.
>
> [...]
Here is the summary with links:
- [01/18] media: platform: cros-ec: provide ID table for avoiding fallback match
(no matching commit)
- [02/18] gpio: cros-ec: provide ID table for avoiding fallback match
(no matching commit)
- [03/18] rtc: cros-ec: provide ID table for avoiding fallback match
(no matching commit)
- [04/18] platform/chrome: cros_ec_sensorhub: provide ID table for avoiding fallback match
(no matching commit)
- [05/18] power: supply: cros_usbpd: provide ID table for avoiding fallback match
(no matching commit)
- [06/18] platform/chrome: cros_usbpd_logger: provide ID table for avoiding fallback match
(no matching commit)
- [07/18] platform/chrome: cros_usbpd_notify: provide ID table for avoiding fallback match
(no matching commit)
- [08/18] platform/chrome: cros_ec_chardev: provide ID table for avoiding fallback match
(no matching commit)
- [09/18] platform/chrome: cros_ec_debugfs: provide ID table for avoiding fallback match
(no matching commit)
- [10/18] platform/chrome: cros_ec_sysfs: provide ID table for avoiding fallback match
(no matching commit)
- [11/18] power: supply: cros_pchg: provide ID table for avoiding fallback match
(no matching commit)
- [12/18] platform/chrome: cros_ec_lightbar: provide ID table for avoiding fallback match
(no matching commit)
- [13/18] platform/chrome: cros_ec_vbc: provide ID table for avoiding fallback match
(no matching commit)
- [14/18] platform/chrome: cros_kbd_led_backlight: provide ID table for avoiding fallback match
https://git.kernel.org/chrome-platform/c/d91ca83599cd
- [15/18] platform/chrome: wilco_ec: telemetry: provide ID table for avoiding fallback match
(no matching commit)
- [16/18] platform/chrome: wilco_ec: debugfs: provide ID table for avoiding fallback match
(no matching commit)
- [17/18] platform/chrome: wilco_ec: event: remove redundant MODULE_ALIAS
(no matching commit)
- [18/18] platform/chrome/wilco_ec: core: provide ID table for avoiding fallback match
(no matching commit)
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply
* Re: [PATCH v6 1/6] dt-bindings: interconnect: Add Qualcomm IPQ9574 support
From: Krzysztof Kozlowski @ 2024-04-03 7:09 UTC (permalink / raw)
To: Varadarajan Narayanan, andersson, konrad.dybcio, mturquette,
sboyd, robh, krzysztof.kozlowski+dt, conor+dt, djakov,
dmitry.baryshkov, quic_anusha, linux-arm-msm, linux-clk,
devicetree, linux-kernel, linux-pm
In-Reply-To: <20240402103406.3638821-2-quic_varada@quicinc.com>
On 02/04/2024 12:34, Varadarajan Narayanan wrote:
> +#define ICC_NSSNOC_NSSCC 10
> +#define ICC_NSSNOC_SNOC_0 11
> +#define ICC_NSSNOC_SNOC_1 12
> +#define ICC_NSSNOC_PCNOC_1 13
> +#define ICC_NSSNOC_QOSGEN_REF 14
> +#define ICC_NSSNOC_TIMEOUT_REF 15
> +#define ICC_NSSNOC_XO_DCD 16
> +#define ICC_NSSNOC_ATB 17
> +#define ICC_MEM_NOC_NSSNOC 18
> +#define ICC_NSSNOC_MEMNOC 19
> +#define ICC_NSSNOC_MEM_NOC_1 20
> +
> +#define ICC_NSSNOC_PPE 0
> +#define ICC_NSSNOC_PPE_CFG 1
> +#define ICC_NSSNOC_NSS_CSR 2
> +#define ICC_NSSNOC_IMEM_QSB 3
> +#define ICC_NSSNOC_IMEM_AHB 4
> +
> +#define MASTER(x) ((ICC_ ## x) * 2)
> +#define SLAVE(x) (MASTER(x) + 1)
You already received comment to make your bindings consistent with other
Qualcomm bindings. Now you repeat the same mistake.
No, that is neither consistent nor greppble.
Best regards,
Krzysztof
^ permalink raw reply
* [rafael-pm:intel_pstate-testing 14/14] drivers/cpufreq/intel_pstate.c:3301:undefined reference to `arch_rebuild_sched_domains'
From: kernel test robot @ 2024-04-03 6:58 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: oe-kbuild-all, linux-acpi, devel, linux-pm
tree: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git intel_pstate-testing
head: e35f6a1b27a71e7c8cb1880197e01d93b593cc85
commit: e35f6a1b27a71e7c8cb1880197e01d93b593cc85 [14/14] cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems
config: x86_64-randconfig-014-20240403 (https://download.01.org/0day-ci/archive/20240403/202404031421.nMNiWZMY-lkp@intel.com/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240403/202404031421.nMNiWZMY-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404031421.nMNiWZMY-lkp@intel.com/
All errors (new ones prefixed by >>):
ld: drivers/cpufreq/intel_pstate.o: in function `intel_pstate_register_driver':
>> drivers/cpufreq/intel_pstate.c:3301:(.text+0x409c): undefined reference to `arch_rebuild_sched_domains'
vim +3301 drivers/cpufreq/intel_pstate.c
3267
3268 static int intel_pstate_register_driver(struct cpufreq_driver *driver)
3269 {
3270 int ret;
3271
3272 if (driver == &intel_pstate)
3273 intel_pstate_sysfs_expose_hwp_dynamic_boost();
3274
3275 memset(&global, 0, sizeof(global));
3276 global.max_perf_pct = 100;
3277 global.turbo_disabled = turbo_is_disabled();
3278 global.no_turbo = global.turbo_disabled;
3279
3280 arch_set_max_freq_ratio(global.turbo_disabled);
3281
3282 intel_pstate_driver = driver;
3283 ret = cpufreq_register_driver(intel_pstate_driver);
3284 if (ret) {
3285 intel_pstate_driver_cleanup();
3286 return ret;
3287 }
3288
3289 global.min_perf_pct = min_perf_pct_min();
3290
3291 /*
3292 * On hybrid systems, use asym capacity instead of ITMT, but because
3293 * the capacity of SMT threads is not deterministic even approximately,
3294 * do not do that when SMT is in use.
3295 */
3296 if (hwp_is_hybrid && !sched_smt_active()) {
3297 sched_clear_itmt_support();
3298
3299 hybrid_init_cpu_scaling();
3300
> 3301 arch_rebuild_sched_domains();
3302 }
3303
3304 return 0;
3305 }
3306
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply
* Re: [regression] 6.8.1: fails to hibernate with pm_runtime_force_suspend+0x0/0x120 returns -16
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-04-03 4:49 UTC (permalink / raw)
To: Martin Steigerwald, linux-pm, regressions, linux-kernel
In-Reply-To: <22240355.EfDdHjke4D@lichtvoll.de>
On 02.04.24 21:42, Martin Steigerwald wrote:
> Linux regression tracking (Thorsten Leemhuis) - 19.03.24, 09:40:06 CEST:
>> On 16.03.24 17:12, Martin Steigerwald wrote:
>>> Martin Steigerwald - 16.03.24, 17:02:44 CET:
>>>> ThinkPad T14 AMD Gen 1 fails to hibernate with self-compiled 6.8.1.
>>>> Hibernation works correctly with self-compiled 6.7.9.
>>>
>>> Apparently 6.8.1 does not even reboot correctly anymore. runit on
>>> Devuan. It says it is doing the system reboot but then nothing
>>> happens.
>>>
>>> As for hibernation the kernel cancels the attempt and returns back to
>>> user space desktop session.
>>>
>>>> Trying to use "no_console_suspend" to debug next. Will not do bisect
>>>> between major kernel releases on a production machine.
>>
>> FWIW, without a bisection I guess no developer will take a closer look
>> (but I might be wrong and you lucky here!), as any change in those
>> hundreds of drivers used on that machine can possibly lead to problems
>> like yours. So without a bisection we are likely stuck here, unless
>> someone else runs into the same problem and bisects or fixes it. Sorry,
>> but that's just how it is.
>
> I have been asked this repeatedly with previous bug reports. My issue
> with bisecting between major kernel versions is this:
>
> When I look around here I see no second ThinkPad T14 AMD Gen 1 here I
> could use for testing. Also doing a kernel bisect using a GRML live iso…
> not really.
>
> The one I reported this from is a production machine with a 4 TB NVMe
> SSD which contains a lot of data. I am not willing to risk data loss or
> (silent) file system corruption by bisecting between major kernel
> releases. Bisecting between major kernel releases in my understanding
> would require to test various releases between in this example 6.7 and
> 6.8 and even between 6.7 and 6.8-rc1. At least in my understand anything
> between 6.7 and 6.8-rc1 is not guaranteed to be even be somewhat stable.
It's hard to qualify and always a matter of personal viewpoint/opinion,
but I'd say: kernel from the merge window are pretty stable and
reliable. But sure, accidents that eat data happen and they happen
slightly more often during merge windows because the rate of change is
higher. But in the end they do not happen often, which is why Fedora
rawhide for example ships merge window kernels all the time.
> I
> am not usually installing an rc1 kernel on a production machine, but
> rather wait for at least rc2/3 nowadays. Its a balanced risk calculation.
> And rc2/3 or later appears to be a risk I am willing to take. But
> something between stable and rc1? Nope.
Well, that's up to you -- but the reality is also that developers are
not obliged to look into regressions report closely, unless someone
bisected it.
> It is not even that rare. 6.7 some rc failed with hibernation as well.
Maybe too few people (or too few of those that run the latest kernels)
use hibernate these days (I haven't for more than 15 years), which is
why it's not tested much.
> With exactly the same machine. I refused to do a bisect as well in that
> case. At some later time the issue was fixed without me doing anything
> more.
Maybe you were lucky, maybe someone else bisected and reported the problem.
> Now my question is this: Without me willing to bisect in that case, is
> a bug report even useful? Otherwise I may just switch this last machine
> to distribution kernels. It would save a lot of time for me. This private
> and freelancer production machine is the last left-over machine with self-
> compiled kernels.
>
> So far I still thought I would somehow be contributing to Linux kernel
> quality with detailed bug reports that take time to write, but apparently
> I am not. Can you clarify?
Not really, as it always depends on the situation. There are bugs (like
https://lore.kernel.org/all/08275279-7462-4f4a-a0ee-8aa015f829bc@leemhuis.info/
) where a report without a bisection is enough. But there are others
where it's unlikely that anyone will take a closer look; a lot of those
reg. suspend/hibernate fall into this category, as problems in that area
can be cause by any subsystem and its drivers -- which is why the power
management people can't look into most of those, as then they quickly
wouldn't get anything else done while spending time on bugs most of the
time other people caused.
Ciao, Thorsten
^ permalink raw reply
* [rafael-pm:intel_pstate-testing 14/14] intel_pstate.c:undefined reference to `arch_rebuild_sched_domains'
From: kernel test robot @ 2024-04-03 0:50 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: oe-kbuild-all, linux-acpi, devel, linux-pm
tree: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git intel_pstate-testing
head: e35f6a1b27a71e7c8cb1880197e01d93b593cc85
commit: e35f6a1b27a71e7c8cb1880197e01d93b593cc85 [14/14] cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems
config: i386-buildonly-randconfig-001-20240403 (https://download.01.org/0day-ci/archive/20240403/202404030858.0AJW6re4-lkp@intel.com/config)
compiler: gcc-13 (Ubuntu 13.2.0-4ubuntu3) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240403/202404030858.0AJW6re4-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202404030858.0AJW6re4-lkp@intel.com/
All errors (new ones prefixed by >>):
ld: drivers/cpufreq/intel_pstate.o: in function `intel_pstate_register_driver':
>> intel_pstate.c:(.text+0x232e): undefined reference to `arch_rebuild_sched_domains'
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply
* Re: [PATCH 3/3] arm64: cpuidle: Add arm_poll_idle
From: Ankur Arora @ 2024-04-02 23:17 UTC (permalink / raw)
To: Mark Rutland
Cc: Haris Okanovic, linux-kernel, linux-pm, linux-assembly, peterz,
Ali Saidi, Geoff Blake, Brian Silver
In-Reply-To: <Zgw--fHBH9kEQsi0@FVFF77S0Q05N>
Mark Rutland <mark.rutland@arm.com> writes:
> On Mon, Apr 01, 2024 at 08:47:06PM -0500, Haris Okanovic wrote:
>> An arm64 cpuidle driver with two states: (1) First polls for new runable
>> tasks up to 100 us (by default) before (2) a wfi idle and awoken by
>> interrupt (the current arm64 behavior). It allows CPUs to return from
>> idle more quickly by avoiding the longer interrupt wakeup path, which
>> may require EL1/EL2 transition in certain VM scenarios.
>
> Please start off with an explanation of the problem you're trying to solve
> (which IIUC is to wake up more quickly in certain cases), before describing the
> solution. That makes it *significantly* easier for people to review this, since
> once you have the problem statement in mind it's much easier to understand how
> the solution space follows from that.
>
>> Poll duration is optionally configured at load time via the poll_limit
>> module parameter.
>
> Why should this be a configurable parameter?
>
> (note, at this point you haven't introduced any of the data below, so the
> trade-off isn't clear to anyone).
>
>> The default 100 us duration was experimentally chosen, by measuring QPS
>> (queries per sec) of the MLPerf bert inference benchmark, which seems
>> particularly susceptible to this change; see procedure below. 100 us is
>> the inflection point where QPS stopped growing in a range of tested
>> values. All results are from AWS m7g.16xlarge instances (Graviton3 SoC)
>> with dedicated tenancy (dedicated hardware).
>>
>> | before | 10us | 25us | 50us | 100us | 125us | 150us | 200us | 300us |
>> | 5.87 | 5.91 | 5.96 | 6.01 | 6.06 | 6.07 | 6.06 | 6.06 | 6.06 |
>>
>> Perf's scheduler benchmarks also improve with a range of poll_limit
>> values >= 10 us. Higher limits produce near identical results within a
>> 3% noise margin. The following tables are `perf bench sched` results,
>> run times in seconds.
>>
>> `perf bench sched messaging -l 80000`
>> | AWS instance | SoC | Before | After | % Change |
>> | c6g.16xl (VM) | Graviton2 | 18.974 | 18.400 | none |
>> | c7g.16xl (VM) | Graviton3 | 13.852 | 13.859 | none |
>> | c6g.metal | Graviton2 | 17.621 | 16.744 | none |
>> | c7g.metal | Graviton3 | 13.430 | 13.404 | none |
>>
>> `perf bench sched pipe -l 2500000`
>> | AWS instance | SoC | Before | After | % Change |
>> | c6g.16xl (VM) | Graviton2 | 30.158 | 15.181 | -50% |
>> | c7g.16xl (VM) | Graviton3 | 18.289 | 12.067 | -34% |
>> | c6g.metal | Graviton2 | 17.609 | 15.170 | -14% |
>> | c7g.metal | Graviton3 | 14.103 | 12.304 | -13% |
>>
>> `perf bench sched seccomp-notify -l 2500000`
>> | AWS instance | SoC | Before | After | % Change |
>> | c6g.16xl (VM) | Graviton2 | 28.784 | 13.754 | -52% |
>> | c7g.16xl (VM) | Graviton3 | 16.964 | 11.430 | -33% |
>> | c6g.metal | Graviton2 | 15.717 | 13.536 | -14% |
>> | c7g.metal | Graviton3 | 13.301 | 11.491 | -14% |
>
> Ok, so perf numbers for a busy workload go up.
>
> What happens for idle state residency on a mostly idle system?
>
>> Steps to run MLPerf bert inference on Ubuntu 22.04:
>> sudo apt install build-essential python3 python3-pip
>> pip install "pybind11[global]" tensorflow transformers
>> export TF_ENABLE_ONEDNN_OPTS=1
>> export DNNL_DEFAULT_FPMATH_MODE=BF16
>> git clone https://github.com/mlcommons/inference.git --recursive
>> cd inference
>> git checkout v2.0
>> cd loadgen
>> CFLAGS="-std=c++14" python3 setup.py bdist_wheel
>> pip install dist/*.whl
>> cd ../language/bert
>> make setup
>> python3 run.py --backend=tf --scenario=SingleStream
>>
>> Suggested-by: Ali Saidi <alisaidi@amazon.com>
>> Reviewed-by: Ali Saidi <alisaidi@amazon.com>
>> Reviewed-by: Geoff Blake <blakgeof@amazon.com>
>> Cc: Brian Silver <silverbr@amazon.com>
>> Signed-off-by: Haris Okanovic <harisokn@amazon.com>
>> ---
>> drivers/cpuidle/Kconfig.arm | 13 ++
>> drivers/cpuidle/Makefile | 1 +
>> drivers/cpuidle/cpuidle-arm-polling.c | 171 ++++++++++++++++++++++++++
>> 3 files changed, 185 insertions(+)
>> create mode 100644 drivers/cpuidle/cpuidle-arm-polling.c
>>
>> diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
>> index a1ee475d180d..484666dda38d 100644
>> --- a/drivers/cpuidle/Kconfig.arm
>> +++ b/drivers/cpuidle/Kconfig.arm
>> @@ -14,6 +14,19 @@ config ARM_CPUIDLE
>> initialized by calling the CPU operations init idle hook
>> provided by architecture code.
>>
>> +config ARM_POLL_CPUIDLE
>> + bool "ARM64 CPU idle Driver with polling"
>> + depends on ARM64
>> + depends on ARM_ARCH_TIMER_EVTSTREAM
>> + select CPU_IDLE_MULTIPLE_DRIVERS
>> + help
>> + Select this to enable a polling cpuidle driver for ARM64:
>> + The first state polls TIF_NEED_RESCHED for best latency on short
>> + sleep intervals. The second state falls back to arch_cpu_idle() to
>> + wait for interrupt. This is can be helpful in workloads that
>> + frequently block/wake at short intervals or VMs where wakeup IPIs
>> + are more expensive.
>
> Why is this a separate driver rather than an optional feature in the existing
> driver?
>
> The fact that this duplicates a bunch of code indicates to me that this should
> not be a separate driver.
Also, the cpuidle-haltpoll driver is meant to do something quite similar.
That driver polls adaptively based on the haltpoll governor's tuning of
the polling period.
However, cpuidle-haltpoll is currently x86 only. Mihai (also from Oracle)
posted patches [1] adding support for ARM64.
Haris, could you take a look at it and see if it does what you are
looking for? The polling path in the linked version also uses
smp_cond_load_relaxed() so even the mechanisms for both of these
are fairly similar.
(I'll be sending out the next version shortly. Happy to Cc you if you
would like to try that out.)
Thanks
Ankur
[1] https://lore.kernel.org/lkml/1707982910-27680-1-git-send-email-mihai.carabas@oracle.com/
>
>> +
>> config ARM_PSCI_CPUIDLE
>> bool "PSCI CPU idle Driver"
>> depends on ARM_PSCI_FW
>> diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
>> index d103342b7cfc..23c21422792d 100644
>> --- a/drivers/cpuidle/Makefile
>> +++ b/drivers/cpuidle/Makefile
>> @@ -22,6 +22,7 @@ obj-$(CONFIG_ARM_U8500_CPUIDLE) += cpuidle-ux500.o
>> obj-$(CONFIG_ARM_AT91_CPUIDLE) += cpuidle-at91.o
>> obj-$(CONFIG_ARM_EXYNOS_CPUIDLE) += cpuidle-exynos.o
>> obj-$(CONFIG_ARM_CPUIDLE) += cpuidle-arm.o
>> +obj-$(CONFIG_ARM_POLL_CPUIDLE) += cpuidle-arm-polling.o
>> obj-$(CONFIG_ARM_PSCI_CPUIDLE) += cpuidle-psci.o
>> obj-$(CONFIG_ARM_PSCI_CPUIDLE_DOMAIN) += cpuidle-psci-domain.o
>> obj-$(CONFIG_ARM_TEGRA_CPUIDLE) += cpuidle-tegra.o
>> diff --git a/drivers/cpuidle/cpuidle-arm-polling.c b/drivers/cpuidle/cpuidle-arm-polling.c
>> new file mode 100644
>> index 000000000000..bca128568114
>> --- /dev/null
>> +++ b/drivers/cpuidle/cpuidle-arm-polling.c
>> @@ -0,0 +1,171 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * ARM64 CPU idle driver using wfe polling
>> + *
>> + * Copyright 2024 Amazon.com, Inc. or its affiliates. All rights reserved.
>> + *
>> + * Authors:
>> + * Haris Okanovic <harisokn@amazon.com>
>> + * Brian Silver <silverbr@amazon.com>
>> + *
>> + * Based on cpuidle-arm.c
>> + * Copyright (C) 2014 ARM Ltd.
>> + * Author: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>> + */
>> +
>> +#include <linux/cpu.h>
>> +#include <linux/cpu_cooling.h>
>> +#include <linux/cpuidle.h>
>> +#include <linux/sched/clock.h>
>> +
>> +#include <asm/cpuidle.h>
>> +#include <asm/readex.h>
>> +
>> +#include "dt_idle_states.h"
>> +
>> +/* Max duration of the wfe() poll loop in us, before transitioning to
>> + * arch_cpu_idle()/wfi() sleep.
>> + */
>
> /*
> * Comments should have the leading '/*' on a separate line.
> * See https://www.kernel.org/doc/html/v6.8/process/coding-style.html#commenting
> */
>
>> +#define DEFAULT_POLL_LIMIT_US 100
>> +static unsigned int poll_limit __read_mostly = DEFAULT_POLL_LIMIT_US;
>> +
>> +/*
>> + * arm_idle_wfe_poll - Polls state in wfe loop until reschedule is
>> + * needed or timeout
>> + */
>> +static int __cpuidle arm_idle_wfe_poll(struct cpuidle_device *dev,
>> + struct cpuidle_driver *drv, int idx)
>> +{
>> + u64 time_start, time_limit;
>> +
>> + time_start = local_clock();
>> + dev->poll_time_limit = false;
>> +
>> + local_irq_enable();
>
> Why enable IRQs here? We don't do that in the regular cpuidle-arm driver, nor
> the cpuidle-psci driver, and there's no explanation here or in the commit message.
>
> How does this interact with RCU? Is that still watching or are we in an
> extended quiescent state? For PSCI idle states we enter an EQS, and it seems
> like we probably should here...
>
>> +
>> + if (current_set_polling_and_test())
>> + goto end;
>> +
>> + time_limit = cpuidle_poll_time(drv, dev);
>> +
>> + do {
>> + // exclusive read arms the monitor for wfe
>> + if (__READ_ONCE_EX(current_thread_info()->flags) & _TIF_NEED_RESCHED)
>> + goto end;
>> +
>> + // may exit prematurely, see ARM_ARCH_TIMER_EVTSTREAM
>> + wfe();
>> + } while (local_clock() - time_start < time_limit);
>
> .. and if the EVTSTREAM is disabled, we'll sit in WFE forever rather than
> entering a deeper idle state, which doesn't seem desirable.
>
> It's worth noting that now that we have WFET, we'll probably want to disable
> the EVTSTREAM by default at some point, at least in some configurations, since
> that'll be able to sit in a WFE state for longer while also reliably waking up
> when required.
>
> I suspect we want something like an smp_load_acquire_timeout() here to do the
> wait in arch code (allowing us to use WFET), and enabling this state will
> depend on either having WFET or EVTSTREAM.
>
>> +
>> + dev->poll_time_limit = true;
>> +
>> +end:
>> + current_clr_polling();
>> + return idx;
>> +}
>> +
>> +/*
>> + * arm_idle_wfi - Places cpu in lower power state until interrupt,
>> + * a fallback to polling
>> + */
>> +static int __cpuidle arm_idle_wfi(struct cpuidle_device *dev,
>> + struct cpuidle_driver *drv, int idx)
>> +{
>> + if (current_clr_polling_and_test()) {
>> + local_irq_enable();
>> + return idx;
>> + }
>
> Same as above, why enable IRQs here?
>
>> + arch_cpu_idle();
>> + return idx;
>
> .. and if we need to enable IRQs in the other cases above, why do we *not*
> need to enable them here?
>
>> +}
>> +
>> +static struct cpuidle_driver arm_poll_idle_driver __initdata = {
>> + .name = "arm_poll_idle",
>> + .owner = THIS_MODULE,
>> + .states = {
>> + {
>> + .enter = arm_idle_wfe_poll,
>> + .exit_latency = 0,
>> + .target_residency = 0,
>> + .exit_latency_ns = 0,
>> + .power_usage = UINT_MAX,
>> + .flags = CPUIDLE_FLAG_POLLING,
>> + .name = "WFE",
>> + .desc = "ARM WFE",
>> + },
>> + {
>> + .enter = arm_idle_wfi,
>> + .exit_latency = DEFAULT_POLL_LIMIT_US,
>> + .target_residency = DEFAULT_POLL_LIMIT_US,
>> + .power_usage = UINT_MAX,
>> + .name = "WFI",
>> + .desc = "ARM WFI",
>> + },
>> + },
>> + .state_count = 2,
>> +};
>
> How does this interact with the existing driver?
>
> How does DEFAULT_POLL_LIMIT_US compare with PSCI idle states?
>
>> +
>> +/*
>> + * arm_poll_init_cpu - Initializes arm cpuidle polling driver for one cpu
>> + */
>> +static int __init arm_poll_init_cpu(int cpu)
>> +{
>> + int ret;
>> + struct cpuidle_driver *drv;
>> +
>> + drv = kmemdup(&arm_poll_idle_driver, sizeof(*drv), GFP_KERNEL);
>> + if (!drv)
>> + return -ENOMEM;
>> +
>> + drv->cpumask = (struct cpumask *)cpumask_of(cpu);
>> + drv->states[1].exit_latency = poll_limit;
>> + drv->states[1].target_residency = poll_limit;
>> +
>> + ret = cpuidle_register(drv, NULL);
>> + if (ret) {
>> + pr_err("failed to register driver: %d, cpu %d\n", ret, cpu);
>> + goto out_kfree_drv;
>> + }
>> +
>> + pr_info("registered driver cpu %d\n", cpu);
>
> This does not need to be printed for each CPU.
>
> Mark.
>
>> +
>> + cpuidle_cooling_register(drv);
>> +
>> + return 0;
>> +
>> +out_kfree_drv:
>> + kfree(drv);
>> + return ret;
>> +}
>> +
>> +/*
>> + * arm_poll_init - Initializes arm cpuidle polling driver
>> + */
>> +static int __init arm_poll_init(void)
>> +{
>> + int cpu, ret;
>> + struct cpuidle_driver *drv;
>> + struct cpuidle_device *dev;
>> +
>> + for_each_possible_cpu(cpu) {
>> + ret = arm_poll_init_cpu(cpu);
>> + if (ret)
>> + goto out_fail;
>> + }
>> +
>> + return 0;
>> +
>> +out_fail:
>> + pr_info("de-register all");
>> + while (--cpu >= 0) {
>> + dev = per_cpu(cpuidle_devices, cpu);
>> + drv = cpuidle_get_cpu_driver(dev);
>> + cpuidle_unregister(drv);
>> + kfree(drv);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +module_param(poll_limit, uint, 0444);
>> +device_initcall(arm_poll_init);
>> --
>> 2.34.1
>>
>>
--
ankur
^ permalink raw reply
* Re: [PATCH v3 0/6] thermal: More separation between the core and drivers
From: Rafael J. Wysocki @ 2024-04-02 19:42 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Srinivas Pandruvada, Daniel Lezcano, Lukasz Luba,
AngeloGioacchino Del Regno, Rafael J. Wysocki
In-Reply-To: <4558251.LvFx2qVVIh@kreacher>
On Tue, Apr 2, 2024 at 9:04 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> Hi Everyone,
>
> This is an update of
>
> https://lore.kernel.org/linux-pm/4558384.LvFx2qVVIh@kreacher/
>
> and
>
> https://lore.kernel.org/linux-pm/2331888.ElGaqSPkdT@kreacher/
>
> which rebases the first patch on top of 6.9-rc2, adds 3 patches and adjusts
> the third patch from v2.
>
> The original description of the first two patches still applies:
>
> > Patch [1/2] is based on the observation that the threshold field in struct
> > thermal_trip really should be core-internal and to make that happen it
> > introduces a wrapper structure around struct thermal_trip for internal
> > use in the core.
> >
> > Patch [2/2] moves the definition of the new structure and the struct
> > thermal_zone_device one to a local header file in the core to enforce
> > more separation between the core and drivers.
> >
> > The patches are not expected to introduce any observable differences in
> > behavior, so please let me know if you see any of that.
>
> Note that these patches were first sent before the merge window and have not
> really changed since then (except for a minor rebase of the first patch in
> this series). Moreover, no comments regarding the merit of these patches
> have been made shared, so if this continues, I will be considering them as
> good to go by the end of this week.
>
> Patch [3/6] is a rewrite of comments regarding trip crossing and threshold
> computations.
>
> Patch [4/6] updates the trip crossing detection code to consolidate the
> threshold initialization with trip crossing on the way up.
>
> Patch [5/6] ([3/3] in v2) adds a mechanism to sort notifications and debug
> calls taking place during one invocation of __thermal_zone_device_update() so
> they always go in temperature order.
>
> Patch [6/6] relocates the critical and trip point handling to avoid a
> redundant temperature check.
>
> The series applies on top of 6.9-rc2 and I'm planning to create a test
> branch containing it.
As promised:
https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=thermal-core-testing
^ permalink raw reply
* Re: [regression] 6.8.1: fails to hibernate with pm_runtime_force_suspend+0x0/0x120 returns -16
From: Martin Steigerwald @ 2024-04-02 19:42 UTC (permalink / raw)
To: linux-pm, regressions, linux-kernel,
Linux regressions mailing list
In-Reply-To: <be5a7213-78b3-4917-b95b-ec31cd2350e4@leemhuis.info>
Hi Thorsten, hi,
Linux regression tracking (Thorsten Leemhuis) - 19.03.24, 09:40:06 CEST:
> On 16.03.24 17:12, Martin Steigerwald wrote:
> > Martin Steigerwald - 16.03.24, 17:02:44 CET:
> >> ThinkPad T14 AMD Gen 1 fails to hibernate with self-compiled 6.8.1.
> >> Hibernation works correctly with self-compiled 6.7.9.
> >
> > Apparently 6.8.1 does not even reboot correctly anymore. runit on
> > Devuan. It says it is doing the system reboot but then nothing
> > happens.
> >
> > As for hibernation the kernel cancels the attempt and returns back to
> > user space desktop session.
> >
> >> Trying to use "no_console_suspend" to debug next. Will not do bisect
> >> between major kernel releases on a production machine.
>
> FWIW, without a bisection I guess no developer will take a closer look
> (but I might be wrong and you lucky here!), as any change in those
> hundreds of drivers used on that machine can possibly lead to problems
> like yours. So without a bisection we are likely stuck here, unless
> someone else runs into the same problem and bisects or fixes it. Sorry,
> but that's just how it is.
I have been asked this repeatedly with previous bug reports. My issue
with bisecting between major kernel versions is this:
When I look around here I see no second ThinkPad T14 AMD Gen 1 here I
could use for testing. Also doing a kernel bisect using a GRML live iso…
not really.
The one I reported this from is a production machine with a 4 TB NVMe
SSD which contains a lot of data. I am not willing to risk data loss or
(silent) file system corruption by bisecting between major kernel
releases. Bisecting between major kernel releases in my understanding
would require to test various releases between in this example 6.7 and
6.8 and even between 6.7 and 6.8-rc1. At least in my understand anything
between 6.7 and 6.8-rc1 is not guaranteed to be even be somewhat stable. I
am not usually installing an rc1 kernel on a production machine, but
rather wait for at least rc2/3 nowadays. Its a balanced risk calculation.
And rc2/3 or later appears to be a risk I am willing to take. But
something between stable and rc1? Nope.
It is not even that rare. 6.7 some rc failed with hibernation as well.
With exactly the same machine. I refused to do a bisect as well in that
case. At some later time the issue was fixed without me doing anything
more.
Now my question is this: Without me willing to bisect in that case, is
a bug report even useful? Otherwise I may just switch this last machine
to distribution kernels. It would save a lot of time for me. This private
and freelancer production machine is the last left-over machine with self-
compiled kernels.
So far I still thought I would somehow be contributing to Linux kernel
quality with detailed bug reports that take time to write, but apparently
I am not. Can you clarify?
Ciao,
--
Martin
^ permalink raw reply
* [PATCH v3 2/6] thermal: core: Make struct thermal_zone_device definition internal
From: Rafael J. Wysocki @ 2024-04-02 18:57 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Srinivas Pandruvada, Daniel Lezcano, Lukasz Luba,
AngeloGioacchino Del Regno
In-Reply-To: <4558251.LvFx2qVVIh@kreacher>
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Move the definitions of struct thermal_trip_desc and struct
thermal_zone_device to an internal header file in the thermal core,
as they don't need to be accessible to any code other than the thermal
core and so they don't need to be present in a global header.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
v2 -> v3: Minor changelog update
v1 -> v2: No changes
---
drivers/thermal/thermal_core.h | 85 +++++++++++++++++++++++++++++++++++++++
drivers/thermal/thermal_trace.h | 2
include/linux/thermal.h | 87 ----------------------------------------
3 files changed, 89 insertions(+), 85 deletions(-)
Index: linux-pm/drivers/thermal/thermal_core.h
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.h
+++ linux-pm/drivers/thermal/thermal_core.h
@@ -15,6 +15,91 @@
#include "thermal_netlink.h"
#include "thermal_debugfs.h"
+struct thermal_trip_desc {
+ struct thermal_trip trip;
+ int threshold;
+};
+
+/**
+ * struct thermal_zone_device - structure for a thermal zone
+ * @id: unique id number for each thermal zone
+ * @type: the thermal zone device type
+ * @device: &struct device for this thermal zone
+ * @removal: removal completion
+ * @trip_temp_attrs: attributes for trip points for sysfs: trip temperature
+ * @trip_type_attrs: attributes for trip points for sysfs: trip type
+ * @trip_hyst_attrs: attributes for trip points for sysfs: trip hysteresis
+ * @mode: current mode of this thermal zone
+ * @devdata: private pointer for device private data
+ * @num_trips: number of trip points the thermal zone supports
+ * @passive_delay_jiffies: number of jiffies to wait between polls when
+ * performing passive cooling.
+ * @polling_delay_jiffies: number of jiffies to wait between polls when
+ * checking whether trip points have been crossed (0 for
+ * interrupt driven systems)
+ * @temperature: current temperature. This is only for core code,
+ * drivers should use thermal_zone_get_temp() to get the
+ * current temperature
+ * @last_temperature: previous temperature read
+ * @emul_temperature: emulated temperature when using CONFIG_THERMAL_EMULATION
+ * @passive: 1 if you've crossed a passive trip point, 0 otherwise.
+ * @prev_low_trip: the low current temperature if you've crossed a passive
+ trip point.
+ * @prev_high_trip: the above current temperature if you've crossed a
+ passive trip point.
+ * @need_update: if equals 1, thermal_zone_device_update needs to be invoked.
+ * @ops: operations this &thermal_zone_device supports
+ * @tzp: thermal zone parameters
+ * @governor: pointer to the governor for this thermal zone
+ * @governor_data: private pointer for governor data
+ * @thermal_instances: list of &struct thermal_instance of this thermal zone
+ * @ida: &struct ida to generate unique id for this zone's cooling
+ * devices
+ * @lock: lock to protect thermal_instances list
+ * @node: node in thermal_tz_list (in thermal_core.c)
+ * @poll_queue: delayed work for polling
+ * @notify_event: Last notification event
+ * @suspended: thermal zone suspend indicator
+ * @trips: array of struct thermal_trip objects
+ */
+struct thermal_zone_device {
+ int id;
+ char type[THERMAL_NAME_LENGTH];
+ struct device device;
+ struct completion removal;
+ struct attribute_group trips_attribute_group;
+ struct thermal_attr *trip_temp_attrs;
+ struct thermal_attr *trip_type_attrs;
+ struct thermal_attr *trip_hyst_attrs;
+ enum thermal_device_mode mode;
+ void *devdata;
+ int num_trips;
+ unsigned long passive_delay_jiffies;
+ unsigned long polling_delay_jiffies;
+ int temperature;
+ int last_temperature;
+ int emul_temperature;
+ int passive;
+ int prev_low_trip;
+ int prev_high_trip;
+ atomic_t need_update;
+ struct thermal_zone_device_ops ops;
+ struct thermal_zone_params *tzp;
+ struct thermal_governor *governor;
+ void *governor_data;
+ struct list_head thermal_instances;
+ struct ida ida;
+ struct mutex lock;
+ struct list_head node;
+ struct delayed_work poll_queue;
+ enum thermal_notify_event notify_event;
+ bool suspended;
+#ifdef CONFIG_THERMAL_DEBUGFS
+ struct thermal_debugfs *debugfs;
+#endif
+ struct thermal_trip_desc trips[] __counted_by(num_trips);
+};
+
/* Default Thermal Governor */
#if defined(CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE)
#define DEFAULT_THERMAL_GOVERNOR "step_wise"
Index: linux-pm/include/linux/thermal.h
===================================================================
--- linux-pm.orig/include/linux/thermal.h
+++ linux-pm/include/linux/thermal.h
@@ -73,17 +73,14 @@ struct thermal_trip {
void *priv;
};
-struct thermal_trip_desc {
- struct thermal_trip trip;
- int threshold;
-};
-
#define THERMAL_TRIP_FLAG_RW_TEMP BIT(0)
#define THERMAL_TRIP_FLAG_RW_HYST BIT(1)
#define THERMAL_TRIP_FLAG_RW (THERMAL_TRIP_FLAG_RW_TEMP | \
THERMAL_TRIP_FLAG_RW_HYST)
+struct thermal_zone_device;
+
struct thermal_zone_device_ops {
int (*bind) (struct thermal_zone_device *,
struct thermal_cooling_device *);
@@ -130,86 +127,6 @@ struct thermal_cooling_device {
};
/**
- * struct thermal_zone_device - structure for a thermal zone
- * @id: unique id number for each thermal zone
- * @type: the thermal zone device type
- * @device: &struct device for this thermal zone
- * @removal: removal completion
- * @trip_temp_attrs: attributes for trip points for sysfs: trip temperature
- * @trip_type_attrs: attributes for trip points for sysfs: trip type
- * @trip_hyst_attrs: attributes for trip points for sysfs: trip hysteresis
- * @mode: current mode of this thermal zone
- * @devdata: private pointer for device private data
- * @num_trips: number of trip points the thermal zone supports
- * @passive_delay_jiffies: number of jiffies to wait between polls when
- * performing passive cooling.
- * @polling_delay_jiffies: number of jiffies to wait between polls when
- * checking whether trip points have been crossed (0 for
- * interrupt driven systems)
- * @temperature: current temperature. This is only for core code,
- * drivers should use thermal_zone_get_temp() to get the
- * current temperature
- * @last_temperature: previous temperature read
- * @emul_temperature: emulated temperature when using CONFIG_THERMAL_EMULATION
- * @passive: 1 if you've crossed a passive trip point, 0 otherwise.
- * @prev_low_trip: the low current temperature if you've crossed a passive
- trip point.
- * @prev_high_trip: the above current temperature if you've crossed a
- passive trip point.
- * @need_update: if equals 1, thermal_zone_device_update needs to be invoked.
- * @ops: operations this &thermal_zone_device supports
- * @tzp: thermal zone parameters
- * @governor: pointer to the governor for this thermal zone
- * @governor_data: private pointer for governor data
- * @thermal_instances: list of &struct thermal_instance of this thermal zone
- * @ida: &struct ida to generate unique id for this zone's cooling
- * devices
- * @lock: lock to protect thermal_instances list
- * @node: node in thermal_tz_list (in thermal_core.c)
- * @poll_queue: delayed work for polling
- * @notify_event: Last notification event
- * @suspended: thermal zone suspend indicator
- * @trips: array of struct thermal_trip objects
- */
-struct thermal_zone_device {
- int id;
- char type[THERMAL_NAME_LENGTH];
- struct device device;
- struct completion removal;
- struct attribute_group trips_attribute_group;
- struct thermal_attr *trip_temp_attrs;
- struct thermal_attr *trip_type_attrs;
- struct thermal_attr *trip_hyst_attrs;
- enum thermal_device_mode mode;
- void *devdata;
- int num_trips;
- unsigned long passive_delay_jiffies;
- unsigned long polling_delay_jiffies;
- int temperature;
- int last_temperature;
- int emul_temperature;
- int passive;
- int prev_low_trip;
- int prev_high_trip;
- atomic_t need_update;
- struct thermal_zone_device_ops ops;
- struct thermal_zone_params *tzp;
- struct thermal_governor *governor;
- void *governor_data;
- struct list_head thermal_instances;
- struct ida ida;
- struct mutex lock;
- struct list_head node;
- struct delayed_work poll_queue;
- enum thermal_notify_event notify_event;
- bool suspended;
-#ifdef CONFIG_THERMAL_DEBUGFS
- struct thermal_debugfs *debugfs;
-#endif
- struct thermal_trip_desc trips[] __counted_by(num_trips);
-};
-
-/**
* struct thermal_governor - structure that holds thermal governor information
* @name: name of the governor
* @bind_to_tz: callback called when binding to a thermal zone. If it
Index: linux-pm/drivers/thermal/thermal_trace.h
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_trace.h
+++ linux-pm/drivers/thermal/thermal_trace.h
@@ -9,6 +9,8 @@
#include <linux/thermal.h>
#include <linux/tracepoint.h>
+#include "thermal_core.h"
+
TRACE_DEFINE_ENUM(THERMAL_TRIP_CRITICAL);
TRACE_DEFINE_ENUM(THERMAL_TRIP_HOT);
TRACE_DEFINE_ENUM(THERMAL_TRIP_PASSIVE);
^ permalink raw reply
* [PATCH v3 3/6] thermal: core: Rewrite comments in handle_thermal_trip()
From: Rafael J. Wysocki @ 2024-04-02 18:59 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Srinivas Pandruvada, Daniel Lezcano, Lukasz Luba,
AngeloGioacchino Del Regno
In-Reply-To: <4558251.LvFx2qVVIh@kreacher>
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Make the comments regarding trip crossing and threshold updates in
handle_thermal_trip() slightly more clear.
No functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
v2 -> v3: New patch
---
drivers/thermal/thermal_core.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)
Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -368,6 +368,13 @@ static void handle_thermal_trip(struct t
if (trip->temperature == THERMAL_TEMP_INVALID)
return;
+ /*
+ * If the trip temperature or hysteresis has been updated recently,
+ * the threshold needs to be computed again using the new values.
+ * However, its initial value still reflects the old ones and that
+ * is what needs to be compared with the previous zone temperature
+ * to decide which action to take.
+ */
if (tz->last_temperature == THERMAL_TEMP_INVALID) {
/* Initialization. */
td->threshold = trip->temperature;
@@ -375,11 +382,9 @@ static void handle_thermal_trip(struct t
td->threshold -= trip->hysteresis;
} else if (tz->last_temperature < td->threshold) {
/*
- * The trip threshold is equal to the trip temperature, unless
- * the latter has changed in the meantime. In either case,
- * the trip is crossed if the current zone temperature is at
- * least equal to its temperature, but otherwise ensure that
- * the threshold and the trip temperature will be equal.
+ * There is no mitigation under way, so it needs to be started
+ * if the zone temperature exceeds the trip one. The new
+ * threshold is then set to the low temperature of the trip.
*/
if (tz->temperature >= trip->temperature) {
thermal_notify_tz_trip_up(tz, trip);
@@ -390,14 +395,9 @@ static void handle_thermal_trip(struct t
}
} else {
/*
- * The previous zone temperature was above or equal to the trip
- * threshold, which would be equal to the "low temperature" of
- * the trip (its temperature minus its hysteresis), unless the
- * trip temperature or hysteresis had changed. In either case,
- * the trip is crossed if the current zone temperature is below
- * the low temperature of the trip, but otherwise ensure that
- * the trip threshold will be equal to the low temperature of
- * the trip.
+ * Mitigation is under way, so it needs to stop if the zone
+ * temperature falls below the low temperature of the trip.
+ * In that case, the trip temperature becomes the new threshold.
*/
if (tz->temperature < trip->temperature - trip->hysteresis) {
thermal_notify_tz_trip_down(tz, trip);
^ permalink raw reply
* [PATCH v3 4/6] thermal: core: Send trip crossing notifications at init time if needed
From: Rafael J. Wysocki @ 2024-04-02 19:02 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Srinivas Pandruvada, Daniel Lezcano, Lukasz Luba,
AngeloGioacchino Del Regno
In-Reply-To: <4558251.LvFx2qVVIh@kreacher>
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If a trip point is already exceeded by the zone temperature at the
initialization time, no trip crossing notification is send regarding
this even though mitigation should be started then.
Address this by rearranging the code in handle_thermal_trip() to
send a trip crossing notification for trip points already exceeded
by the zone temperature initially which also allows to reduce its
size by using the observation that the initialization and regular
trip crossing on the way up become the same case then.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
v2 -> v3: New patch
---
drivers/thermal/thermal_core.c | 37 ++++++++++++++++---------------------
1 file changed, 16 insertions(+), 21 deletions(-)
Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -364,6 +364,7 @@ static void handle_thermal_trip(struct t
struct thermal_trip_desc *td)
{
const struct thermal_trip *trip = &td->trip;
+ int old_threshold;
if (trip->temperature == THERMAL_TEMP_INVALID)
return;
@@ -375,25 +376,11 @@ static void handle_thermal_trip(struct t
* is what needs to be compared with the previous zone temperature
* to decide which action to take.
*/
- if (tz->last_temperature == THERMAL_TEMP_INVALID) {
- /* Initialization. */
- td->threshold = trip->temperature;
- if (tz->temperature >= td->threshold)
- td->threshold -= trip->hysteresis;
- } else if (tz->last_temperature < td->threshold) {
- /*
- * There is no mitigation under way, so it needs to be started
- * if the zone temperature exceeds the trip one. The new
- * threshold is then set to the low temperature of the trip.
- */
- if (tz->temperature >= trip->temperature) {
- thermal_notify_tz_trip_up(tz, trip);
- thermal_debug_tz_trip_up(tz, trip);
- td->threshold = trip->temperature - trip->hysteresis;
- } else {
- td->threshold = trip->temperature;
- }
- } else {
+ old_threshold = td->threshold;
+ td->threshold = trip->temperature;
+
+ if (tz->last_temperature >= old_threshold &&
+ tz->last_temperature != THERMAL_TEMP_INVALID) {
/*
* Mitigation is under way, so it needs to stop if the zone
* temperature falls below the low temperature of the trip.
@@ -402,10 +389,18 @@ static void handle_thermal_trip(struct t
if (tz->temperature < trip->temperature - trip->hysteresis) {
thermal_notify_tz_trip_down(tz, trip);
thermal_debug_tz_trip_down(tz, trip);
- td->threshold = trip->temperature;
} else {
- td->threshold = trip->temperature - trip->hysteresis;
+ td->threshold -= trip->hysteresis;
}
+ } else if (tz->temperature >= trip->temperature) {
+ /*
+ * There is no mitigation under way, so it needs to be started
+ * if the zone temperature exceeds the trip one. The new
+ * threshold is then set to the low temperature of the trip.
+ */
+ thermal_notify_tz_trip_up(tz, trip);
+ thermal_debug_tz_trip_up(tz, trip);
+ td->threshold -= trip->hysteresis;
}
if (trip->type == THERMAL_TRIP_CRITICAL || trip->type == THERMAL_TRIP_HOT)
^ permalink raw reply
* [PATCH v3 5/6] thermal: core: Sort trip point crossing notifications by temperature
From: Rafael J. Wysocki @ 2024-04-02 19:03 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Srinivas Pandruvada, Daniel Lezcano, Lukasz Luba,
AngeloGioacchino Del Regno
In-Reply-To: <4558251.LvFx2qVVIh@kreacher>
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If multiple trip points are crossed in one go and the trips table in
the thermal zone device object is not sorted, the corresponding trip
point crossing notifications sent to user space will not be ordered
either.
Moreover, if the trips table is sorted by trip temperature in ascending
order, the trip crossing notifications on the way up will be sent in that
order too, but the trip crossing notifications on the way down will be
sent in the reverse order.
This is generally confusing and it is better to make the kernel send the
notifications in the order of growing (on the way up) or falling (on the
way down) trip temperature.
To achieve that, instead of sending a trip crossing notification and
recording a trip crossing event in the statistics right away from
handle_thermal_trip(), put the trip in question on a list that will be
sorted by __thermal_zone_device_update() after processing all of the trips
and before sending the notifications and recording trip crossing events.
Link: https://lore.kernel.org/linux-pm/20240306085428.88011-1-daniel.lezcano@linaro.org/
Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
v2 -> v3: Rebase, update changelog, add notify_temp to struct thermal_trip_desc
v1 -> v2: New patch
---
drivers/thermal/thermal_core.c | 41 +++++++++++++++++++++++++++++++++++------
drivers/thermal/thermal_core.h | 2 ++
2 files changed, 37 insertions(+), 6 deletions(-)
Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -15,6 +15,7 @@
#include <linux/slab.h>
#include <linux/kdev_t.h>
#include <linux/idr.h>
+#include <linux/list_sort.h>
#include <linux/thermal.h>
#include <linux/reboot.h>
#include <linux/string.h>
@@ -361,7 +362,9 @@ static void handle_critical_trips(struct
}
static void handle_thermal_trip(struct thermal_zone_device *tz,
- struct thermal_trip_desc *td)
+ struct thermal_trip_desc *td,
+ struct list_head *way_up_list,
+ struct list_head *way_down_list)
{
const struct thermal_trip *trip = &td->trip;
int old_threshold;
@@ -387,8 +390,8 @@ static void handle_thermal_trip(struct t
* In that case, the trip temperature becomes the new threshold.
*/
if (tz->temperature < trip->temperature - trip->hysteresis) {
- thermal_notify_tz_trip_down(tz, trip);
- thermal_debug_tz_trip_down(tz, trip);
+ list_add(&td->notify_list_node, way_down_list);
+ td->notify_temp = trip->temperature - trip->hysteresis;
} else {
td->threshold -= trip->hysteresis;
}
@@ -398,8 +401,8 @@ static void handle_thermal_trip(struct t
* if the zone temperature exceeds the trip one. The new
* threshold is then set to the low temperature of the trip.
*/
- thermal_notify_tz_trip_up(tz, trip);
- thermal_debug_tz_trip_up(tz, trip);
+ list_add_tail(&td->notify_list_node, way_up_list);
+ td->notify_temp = trip->temperature;
td->threshold -= trip->hysteresis;
}
@@ -452,10 +455,24 @@ static void thermal_zone_device_init(str
pos->initialized = false;
}
+static int thermal_trip_notify_cmp(void *ascending, const struct list_head *a,
+ const struct list_head *b)
+{
+ struct thermal_trip_desc *tda = container_of(a, struct thermal_trip_desc,
+ notify_list_node);
+ struct thermal_trip_desc *tdb = container_of(b, struct thermal_trip_desc,
+ notify_list_node);
+ int ret = tdb->notify_temp - tda->notify_temp;
+
+ return ascending ? ret : -ret;
+}
+
void __thermal_zone_device_update(struct thermal_zone_device *tz,
enum thermal_notify_event event)
{
struct thermal_trip_desc *td;
+ LIST_HEAD(way_down_list);
+ LIST_HEAD(way_up_list);
if (tz->suspended)
return;
@@ -470,7 +487,19 @@ void __thermal_zone_device_update(struct
tz->notify_event = event;
for_each_trip_desc(tz, td)
- handle_thermal_trip(tz, td);
+ handle_thermal_trip(tz, td, &way_up_list, &way_down_list);
+
+ list_sort(&way_up_list, &way_up_list, thermal_trip_notify_cmp);
+ list_for_each_entry(td, &way_up_list, notify_list_node) {
+ thermal_notify_tz_trip_up(tz, &td->trip);
+ thermal_debug_tz_trip_up(tz, &td->trip);
+ }
+
+ list_sort(NULL, &way_down_list, thermal_trip_notify_cmp);
+ list_for_each_entry(td, &way_down_list, notify_list_node) {
+ thermal_notify_tz_trip_down(tz, &td->trip);
+ thermal_debug_tz_trip_down(tz, &td->trip);
+ }
monitor_thermal_zone(tz);
}
Index: linux-pm/drivers/thermal/thermal_core.h
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.h
+++ linux-pm/drivers/thermal/thermal_core.h
@@ -17,6 +17,8 @@
struct thermal_trip_desc {
struct thermal_trip trip;
+ struct list_head notify_list_node;
+ int notify_temp;
int threshold;
};
^ permalink raw reply
* [PATCH v3 0/6] thermal: More separation between the core and drivers
From: Rafael J. Wysocki @ 2024-04-02 18:54 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Srinivas Pandruvada, Daniel Lezcano, Lukasz Luba,
AngeloGioacchino Del Regno
Hi Everyone,
This is an update of
https://lore.kernel.org/linux-pm/4558384.LvFx2qVVIh@kreacher/
and
https://lore.kernel.org/linux-pm/2331888.ElGaqSPkdT@kreacher/
which rebases the first patch on top of 6.9-rc2, adds 3 patches and adjusts
the third patch from v2.
The original description of the first two patches still applies:
> Patch [1/2] is based on the observation that the threshold field in struct
> thermal_trip really should be core-internal and to make that happen it
> introduces a wrapper structure around struct thermal_trip for internal
> use in the core.
>
> Patch [2/2] moves the definition of the new structure and the struct
> thermal_zone_device one to a local header file in the core to enforce
> more separation between the core and drivers.
>
> The patches are not expected to introduce any observable differences in
> behavior, so please let me know if you see any of that.
Note that these patches were first sent before the merge window and have not
really changed since then (except for a minor rebase of the first patch in
this series). Moreover, no comments regarding the merit of these patches
have been made shared, so if this continues, I will be considering them as
good to go by the end of this week.
Patch [3/6] is a rewrite of comments regarding trip crossing and threshold
computations.
Patch [4/6] updates the trip crossing detection code to consolidate the
threshold initialization with trip crossing on the way up.
Patch [5/6] ([3/3] in v2) adds a mechanism to sort notifications and debug
calls taking place during one invocation of __thermal_zone_device_update() so
they always go in temperature order.
Patch [6/6] relocates the critical and trip point handling to avoid a
redundant temperature check.
The series applies on top of 6.9-rc2 and I'm planning to create a test
branch containing it.
Thanks!
^ permalink raw reply
* [PATCH v3 1/6] thermal: core: Move threshold out of struct thermal_trip
From: Rafael J. Wysocki @ 2024-04-02 18:56 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Srinivas Pandruvada, Daniel Lezcano, Lukasz Luba,
AngeloGioacchino Del Regno
In-Reply-To: <4558251.LvFx2qVVIh@kreacher>
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The threshold field in struct thermal_trip is only used internally by
the thermal core and it is better to prevent drivers from misusing it.
It also takes some space unnecessarily in the trip tables passed by
drivers to the core during thermal zone registration.
For this reason, introduce struct thermal_trip_desc as a wrapper around
struct thermal_trip, move the threshold field directly into it and make
the thermal core store struct thermal_trip_desc objects in the internal
thermal zone trip tables. Adjust all of the code using trip tables in
the thermal core accordingly.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
v2 -> v3: Rebase on top of 6.9-rc2, minor changelog update.
v1 -> v2: No changes.
---
drivers/thermal/gov_fair_share.c | 7 +++--
drivers/thermal/gov_power_allocator.c | 6 ++--
drivers/thermal/thermal_core.c | 46 +++++++++++++++++++---------------
drivers/thermal/thermal_core.h | 7 +++--
drivers/thermal/thermal_debugfs.c | 6 ++--
drivers/thermal/thermal_helpers.c | 8 +++--
drivers/thermal/thermal_netlink.c | 6 ++--
drivers/thermal/thermal_sysfs.c | 20 +++++++-------
drivers/thermal/thermal_trip.c | 15 +++++------
include/linux/thermal.h | 9 ++++--
10 files changed, 78 insertions(+), 52 deletions(-)
Index: linux-pm/include/linux/thermal.h
===================================================================
--- linux-pm.orig/include/linux/thermal.h
+++ linux-pm/include/linux/thermal.h
@@ -61,7 +61,6 @@ enum thermal_notify_event {
* struct thermal_trip - representation of a point in temperature domain
* @temperature: temperature value in miliCelsius
* @hysteresis: relative hysteresis in miliCelsius
- * @threshold: trip crossing notification threshold miliCelsius
* @type: trip point type
* @priv: pointer to driver data associated with this trip
* @flags: flags representing binary properties of the trip
@@ -69,12 +68,16 @@ enum thermal_notify_event {
struct thermal_trip {
int temperature;
int hysteresis;
- int threshold;
enum thermal_trip_type type;
u8 flags;
void *priv;
};
+struct thermal_trip_desc {
+ struct thermal_trip trip;
+ int threshold;
+};
+
#define THERMAL_TRIP_FLAG_RW_TEMP BIT(0)
#define THERMAL_TRIP_FLAG_RW_HYST BIT(1)
@@ -203,7 +206,7 @@ struct thermal_zone_device {
#ifdef CONFIG_THERMAL_DEBUGFS
struct thermal_debugfs *debugfs;
#endif
- struct thermal_trip trips[] __counted_by(num_trips);
+ struct thermal_trip_desc trips[] __counted_by(num_trips);
};
/**
Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -361,17 +361,19 @@ static void handle_critical_trips(struct
}
static void handle_thermal_trip(struct thermal_zone_device *tz,
- struct thermal_trip *trip)
+ struct thermal_trip_desc *td)
{
+ const struct thermal_trip *trip = &td->trip;
+
if (trip->temperature == THERMAL_TEMP_INVALID)
return;
if (tz->last_temperature == THERMAL_TEMP_INVALID) {
/* Initialization. */
- trip->threshold = trip->temperature;
- if (tz->temperature >= trip->threshold)
- trip->threshold -= trip->hysteresis;
- } else if (tz->last_temperature < trip->threshold) {
+ td->threshold = trip->temperature;
+ if (tz->temperature >= td->threshold)
+ td->threshold -= trip->hysteresis;
+ } else if (tz->last_temperature < td->threshold) {
/*
* The trip threshold is equal to the trip temperature, unless
* the latter has changed in the meantime. In either case,
@@ -382,9 +384,9 @@ static void handle_thermal_trip(struct t
if (tz->temperature >= trip->temperature) {
thermal_notify_tz_trip_up(tz, trip);
thermal_debug_tz_trip_up(tz, trip);
- trip->threshold = trip->temperature - trip->hysteresis;
+ td->threshold = trip->temperature - trip->hysteresis;
} else {
- trip->threshold = trip->temperature;
+ td->threshold = trip->temperature;
}
} else {
/*
@@ -400,9 +402,9 @@ static void handle_thermal_trip(struct t
if (tz->temperature < trip->temperature - trip->hysteresis) {
thermal_notify_tz_trip_down(tz, trip);
thermal_debug_tz_trip_down(tz, trip);
- trip->threshold = trip->temperature;
+ td->threshold = trip->temperature;
} else {
- trip->threshold = trip->temperature - trip->hysteresis;
+ td->threshold = trip->temperature - trip->hysteresis;
}
}
@@ -458,7 +460,7 @@ static void thermal_zone_device_init(str
void __thermal_zone_device_update(struct thermal_zone_device *tz,
enum thermal_notify_event event)
{
- struct thermal_trip *trip;
+ struct thermal_trip_desc *td;
if (tz->suspended)
return;
@@ -472,8 +474,8 @@ void __thermal_zone_device_update(struct
tz->notify_event = event;
- for_each_trip(tz, trip)
- handle_thermal_trip(tz, trip);
+ for_each_trip_desc(tz, td)
+ handle_thermal_trip(tz, td);
monitor_thermal_zone(tz);
}
@@ -766,7 +768,7 @@ int thermal_zone_bind_cooling_device(str
if (trip_index < 0 || trip_index >= tz->num_trips)
return -EINVAL;
- return thermal_bind_cdev_to_trip(tz, &tz->trips[trip_index], cdev,
+ return thermal_bind_cdev_to_trip(tz, &tz->trips[trip_index].trip, cdev,
upper, lower, weight);
}
EXPORT_SYMBOL_GPL(thermal_zone_bind_cooling_device);
@@ -825,7 +827,7 @@ int thermal_zone_unbind_cooling_device(s
if (trip_index < 0 || trip_index >= tz->num_trips)
return -EINVAL;
- return thermal_unbind_cdev_from_trip(tz, &tz->trips[trip_index], cdev);
+ return thermal_unbind_cdev_from_trip(tz, &tz->trips[trip_index].trip, cdev);
}
EXPORT_SYMBOL_GPL(thermal_zone_unbind_cooling_device);
@@ -1221,16 +1223,19 @@ static void thermal_set_delay_jiffies(un
int thermal_zone_get_crit_temp(struct thermal_zone_device *tz, int *temp)
{
- int i, ret = -EINVAL;
+ const struct thermal_trip_desc *td;
+ int ret = -EINVAL;
if (tz->ops.get_crit_temp)
return tz->ops.get_crit_temp(tz, temp);
mutex_lock(&tz->lock);
- for (i = 0; i < tz->num_trips; i++) {
- if (tz->trips[i].type == THERMAL_TRIP_CRITICAL) {
- *temp = tz->trips[i].temperature;
+ for_each_trip_desc(tz, td) {
+ const struct thermal_trip *trip = &td->trip;
+
+ if (trip->type == THERMAL_TRIP_CRITICAL) {
+ *temp = trip->temperature;
ret = 0;
break;
}
@@ -1274,7 +1279,9 @@ thermal_zone_device_register_with_trips(
const struct thermal_zone_params *tzp,
int passive_delay, int polling_delay)
{
+ const struct thermal_trip *trip = trips;
struct thermal_zone_device *tz;
+ struct thermal_trip_desc *td;
int id;
int result;
struct thermal_governor *governor;
@@ -1339,7 +1346,8 @@ thermal_zone_device_register_with_trips(
tz->device.class = thermal_class;
tz->devdata = devdata;
tz->num_trips = num_trips;
- memcpy(tz->trips, trips, num_trips * sizeof(*trips));
+ for_each_trip_desc(tz, td)
+ td->trip = *trip++;
thermal_set_delay_jiffies(&tz->passive_delay_jiffies, passive_delay);
thermal_set_delay_jiffies(&tz->polling_delay_jiffies, polling_delay);
Index: linux-pm/drivers/thermal/thermal_core.h
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.h
+++ linux-pm/drivers/thermal/thermal_core.h
@@ -120,8 +120,11 @@ void thermal_governor_update_tz(struct t
enum thermal_notify_event reason);
/* Helpers */
-#define for_each_trip(__tz, __trip) \
- for (__trip = __tz->trips; __trip - __tz->trips < __tz->num_trips; __trip++)
+#define for_each_trip_desc(__tz, __td) \
+ for (__td = __tz->trips; __td - __tz->trips < __tz->num_trips; __td++)
+
+#define trip_to_trip_desc(__trip) \
+ container_of(__trip, struct thermal_trip_desc, trip)
void __thermal_zone_set_trips(struct thermal_zone_device *tz);
int thermal_zone_trip_id(const struct thermal_zone_device *tz,
Index: linux-pm/drivers/thermal/thermal_sysfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_sysfs.c
+++ linux-pm/drivers/thermal/thermal_sysfs.c
@@ -88,7 +88,7 @@ trip_point_type_show(struct device *dev,
if (sscanf(attr->attr.name, "trip_point_%d_type", &trip_id) != 1)
return -EINVAL;
- switch (tz->trips[trip_id].type) {
+ switch (tz->trips[trip_id].trip.type) {
case THERMAL_TRIP_CRITICAL:
return sprintf(buf, "critical\n");
case THERMAL_TRIP_HOT:
@@ -120,7 +120,7 @@ trip_point_temp_store(struct device *dev
mutex_lock(&tz->lock);
- trip = &tz->trips[trip_id];
+ trip = &tz->trips[trip_id].trip;
if (temp != trip->temperature) {
if (tz->ops.set_trip_temp) {
@@ -150,7 +150,7 @@ trip_point_temp_show(struct device *dev,
if (sscanf(attr->attr.name, "trip_point_%d_temp", &trip_id) != 1)
return -EINVAL;
- return sprintf(buf, "%d\n", tz->trips[trip_id].temperature);
+ return sprintf(buf, "%d\n", tz->trips[trip_id].trip.temperature);
}
static ssize_t
@@ -171,7 +171,7 @@ trip_point_hyst_store(struct device *dev
mutex_lock(&tz->lock);
- trip = &tz->trips[trip_id];
+ trip = &tz->trips[trip_id].trip;
if (hyst != trip->hysteresis) {
trip->hysteresis = hyst;
@@ -194,7 +194,7 @@ trip_point_hyst_show(struct device *dev,
if (sscanf(attr->attr.name, "trip_point_%d_hyst", &trip_id) != 1)
return -EINVAL;
- return sprintf(buf, "%d\n", tz->trips[trip_id].hysteresis);
+ return sprintf(buf, "%d\n", tz->trips[trip_id].trip.hysteresis);
}
static ssize_t
@@ -393,7 +393,7 @@ static const struct attribute_group *the
*/
static int create_trip_attrs(struct thermal_zone_device *tz)
{
- const struct thermal_trip *trip;
+ const struct thermal_trip_desc *td;
struct attribute **attrs;
/* This function works only for zones with at least one trip */
@@ -429,8 +429,8 @@ static int create_trip_attrs(struct ther
return -ENOMEM;
}
- for_each_trip(tz, trip) {
- int indx = thermal_zone_trip_id(tz, trip);
+ for_each_trip_desc(tz, td) {
+ int indx = thermal_zone_trip_id(tz, &td->trip);
/* create trip type attribute */
snprintf(tz->trip_type_attrs[indx].name, THERMAL_NAME_LENGTH,
@@ -452,7 +452,7 @@ static int create_trip_attrs(struct ther
tz->trip_temp_attrs[indx].name;
tz->trip_temp_attrs[indx].attr.attr.mode = S_IRUGO;
tz->trip_temp_attrs[indx].attr.show = trip_point_temp_show;
- if (trip->flags & THERMAL_TRIP_FLAG_RW_TEMP) {
+ if (td->trip.flags & THERMAL_TRIP_FLAG_RW_TEMP) {
tz->trip_temp_attrs[indx].attr.attr.mode |= S_IWUSR;
tz->trip_temp_attrs[indx].attr.store =
trip_point_temp_store;
@@ -467,7 +467,7 @@ static int create_trip_attrs(struct ther
tz->trip_hyst_attrs[indx].name;
tz->trip_hyst_attrs[indx].attr.attr.mode = S_IRUGO;
tz->trip_hyst_attrs[indx].attr.show = trip_point_hyst_show;
- if (trip->flags & THERMAL_TRIP_FLAG_RW_HYST) {
+ if (td->trip.flags & THERMAL_TRIP_FLAG_RW_HYST) {
tz->trip_hyst_attrs[indx].attr.attr.mode |= S_IWUSR;
tz->trip_hyst_attrs[indx].attr.store =
trip_point_hyst_store;
Index: linux-pm/drivers/thermal/thermal_debugfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_debugfs.c
+++ linux-pm/drivers/thermal/thermal_debugfs.c
@@ -744,7 +744,7 @@ static void tze_seq_stop(struct seq_file
static int tze_seq_show(struct seq_file *s, void *v)
{
struct thermal_zone_device *tz = s->private;
- struct thermal_trip *trip;
+ struct thermal_trip_desc *td;
struct tz_episode *tze;
const char *type;
int trip_id;
@@ -757,7 +757,9 @@ static int tze_seq_show(struct seq_file
seq_printf(s, "| trip | type | temp(°mC) | hyst(°mC) | duration | avg(°mC) | min(°mC) | max(°mC) |\n");
- for_each_trip(tz, trip) {
+ for_each_trip_desc(tz, td) {
+ const struct thermal_trip *trip = &td->trip;
+
/*
* There is no possible mitigation happening at the
* critical trip point, so the stats will be always
Index: linux-pm/drivers/thermal/thermal_netlink.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_netlink.c
+++ linux-pm/drivers/thermal/thermal_netlink.c
@@ -442,7 +442,7 @@ out_cancel_nest:
static int thermal_genl_cmd_tz_get_trip(struct param *p)
{
struct sk_buff *msg = p->msg;
- const struct thermal_trip *trip;
+ const struct thermal_trip_desc *td;
struct thermal_zone_device *tz;
struct nlattr *start_trip;
int id;
@@ -462,7 +462,9 @@ static int thermal_genl_cmd_tz_get_trip(
mutex_lock(&tz->lock);
- for_each_trip(tz, trip) {
+ for_each_trip_desc(tz, td) {
+ const struct thermal_trip *trip = &td->trip;
+
if (nla_put_u32(msg, THERMAL_GENL_ATTR_TZ_TRIP_ID,
thermal_zone_trip_id(tz, trip)) ||
nla_put_u32(msg, THERMAL_GENL_ATTR_TZ_TRIP_TYPE, trip->type) ||
Index: linux-pm/drivers/thermal/thermal_trip.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_trip.c
+++ linux-pm/drivers/thermal/thermal_trip.c
@@ -13,11 +13,11 @@ int for_each_thermal_trip(struct thermal
int (*cb)(struct thermal_trip *, void *),
void *data)
{
- struct thermal_trip *trip;
+ struct thermal_trip_desc *td;
int ret;
- for_each_trip(tz, trip) {
- ret = cb(trip, data);
+ for_each_trip_desc(tz, td) {
+ ret = cb(&td->trip, data);
if (ret)
return ret;
}
@@ -63,7 +63,7 @@ EXPORT_SYMBOL_GPL(thermal_zone_get_num_t
*/
void __thermal_zone_set_trips(struct thermal_zone_device *tz)
{
- const struct thermal_trip *trip;
+ const struct thermal_trip_desc *td;
int low = -INT_MAX, high = INT_MAX;
int ret;
@@ -72,7 +72,8 @@ void __thermal_zone_set_trips(struct the
if (!tz->ops.set_trips)
return;
- for_each_trip(tz, trip) {
+ for_each_trip_desc(tz, td) {
+ const struct thermal_trip *trip = &td->trip;
int trip_low;
trip_low = trip->temperature - trip->hysteresis;
@@ -110,7 +111,7 @@ int __thermal_zone_get_trip(struct therm
if (!tz || trip_id < 0 || trip_id >= tz->num_trips || !trip)
return -EINVAL;
- *trip = tz->trips[trip_id];
+ *trip = tz->trips[trip_id].trip;
return 0;
}
EXPORT_SYMBOL_GPL(__thermal_zone_get_trip);
@@ -135,7 +136,7 @@ int thermal_zone_trip_id(const struct th
* Assume the trip to be located within the bounds of the thermal
* zone's trips[] table.
*/
- return trip - tz->trips;
+ return trip_to_trip_desc(trip) - tz->trips;
}
void thermal_zone_trip_updated(struct thermal_zone_device *tz,
const struct thermal_trip *trip)
Index: linux-pm/drivers/thermal/gov_fair_share.c
===================================================================
--- linux-pm.orig/drivers/thermal/gov_fair_share.c
+++ linux-pm/drivers/thermal/gov_fair_share.c
@@ -17,10 +17,13 @@
static int get_trip_level(struct thermal_zone_device *tz)
{
- const struct thermal_trip *trip, *level_trip = NULL;
+ const struct thermal_trip *level_trip = NULL;
+ const struct thermal_trip_desc *td;
int trip_level = -1;
- for_each_trip(tz, trip) {
+ for_each_trip_desc(tz, td) {
+ const struct thermal_trip *trip = &td->trip;
+
if (trip->temperature >= tz->temperature)
continue;
Index: linux-pm/drivers/thermal/gov_power_allocator.c
===================================================================
--- linux-pm.orig/drivers/thermal/gov_power_allocator.c
+++ linux-pm/drivers/thermal/gov_power_allocator.c
@@ -496,9 +496,11 @@ static void get_governor_trips(struct th
const struct thermal_trip *first_passive = NULL;
const struct thermal_trip *last_passive = NULL;
const struct thermal_trip *last_active = NULL;
- const struct thermal_trip *trip;
+ const struct thermal_trip_desc *td;
+
+ for_each_trip_desc(tz, td) {
+ const struct thermal_trip *trip = &td->trip;
- for_each_trip(tz, trip) {
switch (trip->type) {
case THERMAL_TRIP_PASSIVE:
if (!first_passive) {
Index: linux-pm/drivers/thermal/thermal_helpers.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_helpers.c
+++ linux-pm/drivers/thermal/thermal_helpers.c
@@ -50,7 +50,7 @@ get_thermal_instance(struct thermal_zone
mutex_lock(&tz->lock);
mutex_lock(&cdev->lock);
- trip = &tz->trips[trip_index];
+ trip = &tz->trips[trip_index].trip;
list_for_each_entry(pos, &tz->thermal_instances, tz_node) {
if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) {
@@ -82,7 +82,7 @@ EXPORT_SYMBOL(get_thermal_instance);
*/
int __thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
{
- const struct thermal_trip *trip;
+ const struct thermal_trip_desc *td;
int crit_temp = INT_MAX;
int ret = -EINVAL;
@@ -91,7 +91,9 @@ int __thermal_zone_get_temp(struct therm
ret = tz->ops.get_temp(tz, temp);
if (IS_ENABLED(CONFIG_THERMAL_EMULATION) && tz->emul_temperature) {
- for_each_trip(tz, trip) {
+ for_each_trip_desc(tz, td) {
+ const struct thermal_trip *trip = &td->trip;
+
if (trip->type == THERMAL_TRIP_CRITICAL) {
crit_temp = trip->temperature;
break;
^ permalink raw reply
* [PATCH v3 6/6] thermal: core: Relocate critical and hot trip handling
From: Rafael J. Wysocki @ 2024-04-02 19:04 UTC (permalink / raw)
To: Linux PM
Cc: LKML, Srinivas Pandruvada, Daniel Lezcano, Lukasz Luba,
AngeloGioacchino Del Regno
In-Reply-To: <4558251.LvFx2qVVIh@kreacher>
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Modify handle_thermal_trip() to call handle_critical_trips() only after
finding that the trip temperature has been crossed on the way up and
remove the redundant temperature check from the latter.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/thermal/thermal_core.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -349,10 +349,6 @@ void thermal_zone_device_critical_reboot
static void handle_critical_trips(struct thermal_zone_device *tz,
const struct thermal_trip *trip)
{
- /* If we have not crossed the trip_temp, we do not care. */
- if (trip->temperature <= 0 || tz->temperature < trip->temperature)
- return;
-
trace_thermal_zone_trip(tz, thermal_zone_trip_id(tz, trip), trip->type);
if (trip->type == THERMAL_TRIP_CRITICAL)
@@ -404,12 +400,15 @@ static void handle_thermal_trip(struct t
list_add_tail(&td->notify_list_node, way_up_list);
td->notify_temp = trip->temperature;
td->threshold -= trip->hysteresis;
+
+ if (trip->type == THERMAL_TRIP_CRITICAL ||
+ trip->type == THERMAL_TRIP_HOT) {
+ handle_critical_trips(tz, trip);
+ return;
+ }
}
- if (trip->type == THERMAL_TRIP_CRITICAL || trip->type == THERMAL_TRIP_HOT)
- handle_critical_trips(tz, trip);
- else
- handle_non_critical_trips(tz, trip);
+ handle_non_critical_trips(tz, trip);
}
static void update_temperature(struct thermal_zone_device *tz)
^ permalink raw reply
* RE: [PATCH] tools/power/turbostat: Fix uncore frequency file string
From: Ernst, Justin @ 2024-04-02 18:57 UTC (permalink / raw)
To: Len Brown; +Cc: linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <CAJvTdK=R+XQZ4Vov8iXGiMADShgrwSoDL8-Jqfhii7YruRLDsg@mail.gmail.com>
> From: Len Brown <lenb@kernel.org>
> Sent: Tuesday, April 2, 2024 12:46 PM
>
> Thanks for the patch, Justin,
>
> Looks like the probe part of this was already fixed in my git tree, so
> I lopped off that hunk and kept your 1st hunk.
>
> Let me know if it works, or if I screwed it up.
It works! I tested on a 16-socket system with "..., package_10_die_00/, package_11_die_00/, etc" directories present.
Thanks for the link to your latest tree. It made it very easy to build and test your patch.
Cheers,
-Justin
>
> latest is in this tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git/
>
> thanks,
> -Len
>
> On Fri, Mar 8, 2024 at 3:50 PM Justin Ernst <justin.ernst@hpe.com> wrote:
> >
> > Running turbostat on a 16 socket HPE Scale-up Compute 3200 (SapphireRapids) fails with:
> > turbostat: /sys/devices/system/cpu/intel_uncore_frequency/package_010_die_00/current_freq_khz: open
> failed: No such file or directory
> >
> > We observe the sysfs uncore frequency directories named:
> > ...
> > package_09_die_00/
> > package_10_die_00/
> > package_11_die_00/
> > ...
> > package_15_die_00/
> >
> > The culprit is an incorrect sprintf format string "package_0%d_die_0%d" used
> > with each instance of reading uncore frequency files. uncore-frequency-common.c
> > creates the sysfs directory with the format "package_%02d_die_%02d". Once the
> > package value reaches double digits, the formats diverge.
> >
> > Change each instance of "package_0%d_die_0%d" to "package_%02d_die_%02d".
> >
> > Signed-off-by: Justin Ernst <justin.ernst@hpe.com>
> > ---
> > tools/power/x86/turbostat/turbostat.c | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
> > index 7a334377f92b..2a15a23cb726 100644
> > --- a/tools/power/x86/turbostat/turbostat.c
> > +++ b/tools/power/x86/turbostat/turbostat.c
> > @@ -2599,7 +2599,7 @@ unsigned long long get_uncore_mhz(int package, int die)
> > {
> > char path[128];
> >
> > - sprintf(path,
> "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/current_freq_khz", package,
> > + sprintf(path,
> "/sys/devices/system/cpu/intel_uncore_frequency/package_%02d_die_%02d/current_freq_khz", package,
> > die);
> >
> > return (snapshot_sysfs_counter(path) / 1000);
> > @@ -4589,20 +4589,20 @@ static void probe_intel_uncore_frequency(void)
> > for (j = 0; j < topo.num_die; ++j) {
> > int k, l;
> >
> > - sprintf(path,
> "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/min_freq_khz",
> > + sprintf(path,
> "/sys/devices/system/cpu/intel_uncore_frequency/package_%02d_die_%02d/min_freq_khz",
> > i, j);
> > k = read_sysfs_int(path);
> > - sprintf(path,
> "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/max_freq_khz",
> > + sprintf(path,
> "/sys/devices/system/cpu/intel_uncore_frequency/package_%02d_die_%02d/max_freq_khz",
> > i, j);
> > l = read_sysfs_int(path);
> > fprintf(outf, "Uncore Frequency pkg%d die%d: %d - %d MHz ", i, j, k / 1000,
> l / 1000);
> >
> > sprintf(path,
> > -
> "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/initial_min_freq_khz",
> > +
> "/sys/devices/system/cpu/intel_uncore_frequency/package_%02d_die_%02d/initial_min_freq_khz",
> > i, j);
> > k = read_sysfs_int(path);
> > sprintf(path,
> > -
> "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/initial_max_freq_khz",
> > +
> "/sys/devices/system/cpu/intel_uncore_frequency/package_%02d_die_%02d/initial_max_freq_khz",
> > i, j);
> > l = read_sysfs_int(path);
> > fprintf(outf, "(%d - %d MHz)\n", k / 1000, l / 1000);
> > --
> > 2.26.2
> >
>
>
> --
> Len Brown, Intel
^ permalink raw reply
* Re: [PATCH] tools/power/turbostat: Fix uncore frequency file string
From: Len Brown @ 2024-04-02 17:45 UTC (permalink / raw)
To: Justin Ernst; +Cc: linux-pm, linux-kernel
In-Reply-To: <20240308204957.2007212-1-justin.ernst@hpe.com>
Thanks for the patch, Justin,
Looks like the probe part of this was already fixed in my git tree, so
I lopped off that hunk and kept your 1st hunk.
Let me know if it works, or if I screwed it up.
latest is in this tree:
https://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux.git/
thanks,
-Len
On Fri, Mar 8, 2024 at 3:50 PM Justin Ernst <justin.ernst@hpe.com> wrote:
>
> Running turbostat on a 16 socket HPE Scale-up Compute 3200 (SapphireRapids) fails with:
> turbostat: /sys/devices/system/cpu/intel_uncore_frequency/package_010_die_00/current_freq_khz: open failed: No such file or directory
>
> We observe the sysfs uncore frequency directories named:
> ...
> package_09_die_00/
> package_10_die_00/
> package_11_die_00/
> ...
> package_15_die_00/
>
> The culprit is an incorrect sprintf format string "package_0%d_die_0%d" used
> with each instance of reading uncore frequency files. uncore-frequency-common.c
> creates the sysfs directory with the format "package_%02d_die_%02d". Once the
> package value reaches double digits, the formats diverge.
>
> Change each instance of "package_0%d_die_0%d" to "package_%02d_die_%02d".
>
> Signed-off-by: Justin Ernst <justin.ernst@hpe.com>
> ---
> tools/power/x86/turbostat/turbostat.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
> index 7a334377f92b..2a15a23cb726 100644
> --- a/tools/power/x86/turbostat/turbostat.c
> +++ b/tools/power/x86/turbostat/turbostat.c
> @@ -2599,7 +2599,7 @@ unsigned long long get_uncore_mhz(int package, int die)
> {
> char path[128];
>
> - sprintf(path, "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/current_freq_khz", package,
> + sprintf(path, "/sys/devices/system/cpu/intel_uncore_frequency/package_%02d_die_%02d/current_freq_khz", package,
> die);
>
> return (snapshot_sysfs_counter(path) / 1000);
> @@ -4589,20 +4589,20 @@ static void probe_intel_uncore_frequency(void)
> for (j = 0; j < topo.num_die; ++j) {
> int k, l;
>
> - sprintf(path, "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/min_freq_khz",
> + sprintf(path, "/sys/devices/system/cpu/intel_uncore_frequency/package_%02d_die_%02d/min_freq_khz",
> i, j);
> k = read_sysfs_int(path);
> - sprintf(path, "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/max_freq_khz",
> + sprintf(path, "/sys/devices/system/cpu/intel_uncore_frequency/package_%02d_die_%02d/max_freq_khz",
> i, j);
> l = read_sysfs_int(path);
> fprintf(outf, "Uncore Frequency pkg%d die%d: %d - %d MHz ", i, j, k / 1000, l / 1000);
>
> sprintf(path,
> - "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/initial_min_freq_khz",
> + "/sys/devices/system/cpu/intel_uncore_frequency/package_%02d_die_%02d/initial_min_freq_khz",
> i, j);
> k = read_sysfs_int(path);
> sprintf(path,
> - "/sys/devices/system/cpu/intel_uncore_frequency/package_0%d_die_0%d/initial_max_freq_khz",
> + "/sys/devices/system/cpu/intel_uncore_frequency/package_%02d_die_%02d/initial_max_freq_khz",
> i, j);
> l = read_sysfs_int(path);
> fprintf(outf, "(%d - %d MHz)\n", k / 1000, l / 1000);
> --
> 2.26.2
>
--
Len Brown, Intel
^ permalink raw reply
* Re: [PATCH 3/3] arm64: cpuidle: Add arm_poll_idle
From: Mark Rutland @ 2024-04-02 17:23 UTC (permalink / raw)
To: Haris Okanovic
Cc: linux-kernel, linux-pm, linux-assembly, peterz, Ali Saidi,
Geoff Blake, Brian Silver
In-Reply-To: <20240402014706.3969151-3-harisokn@amazon.com>
On Mon, Apr 01, 2024 at 08:47:06PM -0500, Haris Okanovic wrote:
> An arm64 cpuidle driver with two states: (1) First polls for new runable
> tasks up to 100 us (by default) before (2) a wfi idle and awoken by
> interrupt (the current arm64 behavior). It allows CPUs to return from
> idle more quickly by avoiding the longer interrupt wakeup path, which
> may require EL1/EL2 transition in certain VM scenarios.
Please start off with an explanation of the problem you're trying to solve
(which IIUC is to wake up more quickly in certain cases), before describing the
solution. That makes it *significantly* easier for people to review this, since
once you have the problem statement in mind it's much easier to understand how
the solution space follows from that.
> Poll duration is optionally configured at load time via the poll_limit
> module parameter.
Why should this be a configurable parameter?
(note, at this point you haven't introduced any of the data below, so the
trade-off isn't clear to anyone).
> The default 100 us duration was experimentally chosen, by measuring QPS
> (queries per sec) of the MLPerf bert inference benchmark, which seems
> particularly susceptible to this change; see procedure below. 100 us is
> the inflection point where QPS stopped growing in a range of tested
> values. All results are from AWS m7g.16xlarge instances (Graviton3 SoC)
> with dedicated tenancy (dedicated hardware).
>
> | before | 10us | 25us | 50us | 100us | 125us | 150us | 200us | 300us |
> | 5.87 | 5.91 | 5.96 | 6.01 | 6.06 | 6.07 | 6.06 | 6.06 | 6.06 |
>
> Perf's scheduler benchmarks also improve with a range of poll_limit
> values >= 10 us. Higher limits produce near identical results within a
> 3% noise margin. The following tables are `perf bench sched` results,
> run times in seconds.
>
> `perf bench sched messaging -l 80000`
> | AWS instance | SoC | Before | After | % Change |
> | c6g.16xl (VM) | Graviton2 | 18.974 | 18.400 | none |
> | c7g.16xl (VM) | Graviton3 | 13.852 | 13.859 | none |
> | c6g.metal | Graviton2 | 17.621 | 16.744 | none |
> | c7g.metal | Graviton3 | 13.430 | 13.404 | none |
>
> `perf bench sched pipe -l 2500000`
> | AWS instance | SoC | Before | After | % Change |
> | c6g.16xl (VM) | Graviton2 | 30.158 | 15.181 | -50% |
> | c7g.16xl (VM) | Graviton3 | 18.289 | 12.067 | -34% |
> | c6g.metal | Graviton2 | 17.609 | 15.170 | -14% |
> | c7g.metal | Graviton3 | 14.103 | 12.304 | -13% |
>
> `perf bench sched seccomp-notify -l 2500000`
> | AWS instance | SoC | Before | After | % Change |
> | c6g.16xl (VM) | Graviton2 | 28.784 | 13.754 | -52% |
> | c7g.16xl (VM) | Graviton3 | 16.964 | 11.430 | -33% |
> | c6g.metal | Graviton2 | 15.717 | 13.536 | -14% |
> | c7g.metal | Graviton3 | 13.301 | 11.491 | -14% |
Ok, so perf numbers for a busy workload go up.
What happens for idle state residency on a mostly idle system?
> Steps to run MLPerf bert inference on Ubuntu 22.04:
> sudo apt install build-essential python3 python3-pip
> pip install "pybind11[global]" tensorflow transformers
> export TF_ENABLE_ONEDNN_OPTS=1
> export DNNL_DEFAULT_FPMATH_MODE=BF16
> git clone https://github.com/mlcommons/inference.git --recursive
> cd inference
> git checkout v2.0
> cd loadgen
> CFLAGS="-std=c++14" python3 setup.py bdist_wheel
> pip install dist/*.whl
> cd ../language/bert
> make setup
> python3 run.py --backend=tf --scenario=SingleStream
>
> Suggested-by: Ali Saidi <alisaidi@amazon.com>
> Reviewed-by: Ali Saidi <alisaidi@amazon.com>
> Reviewed-by: Geoff Blake <blakgeof@amazon.com>
> Cc: Brian Silver <silverbr@amazon.com>
> Signed-off-by: Haris Okanovic <harisokn@amazon.com>
> ---
> drivers/cpuidle/Kconfig.arm | 13 ++
> drivers/cpuidle/Makefile | 1 +
> drivers/cpuidle/cpuidle-arm-polling.c | 171 ++++++++++++++++++++++++++
> 3 files changed, 185 insertions(+)
> create mode 100644 drivers/cpuidle/cpuidle-arm-polling.c
>
> diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
> index a1ee475d180d..484666dda38d 100644
> --- a/drivers/cpuidle/Kconfig.arm
> +++ b/drivers/cpuidle/Kconfig.arm
> @@ -14,6 +14,19 @@ config ARM_CPUIDLE
> initialized by calling the CPU operations init idle hook
> provided by architecture code.
>
> +config ARM_POLL_CPUIDLE
> + bool "ARM64 CPU idle Driver with polling"
> + depends on ARM64
> + depends on ARM_ARCH_TIMER_EVTSTREAM
> + select CPU_IDLE_MULTIPLE_DRIVERS
> + help
> + Select this to enable a polling cpuidle driver for ARM64:
> + The first state polls TIF_NEED_RESCHED for best latency on short
> + sleep intervals. The second state falls back to arch_cpu_idle() to
> + wait for interrupt. This is can be helpful in workloads that
> + frequently block/wake at short intervals or VMs where wakeup IPIs
> + are more expensive.
Why is this a separate driver rather than an optional feature in the existing
driver?
The fact that this duplicates a bunch of code indicates to me that this should
not be a separate driver.
> +
> config ARM_PSCI_CPUIDLE
> bool "PSCI CPU idle Driver"
> depends on ARM_PSCI_FW
> diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
> index d103342b7cfc..23c21422792d 100644
> --- a/drivers/cpuidle/Makefile
> +++ b/drivers/cpuidle/Makefile
> @@ -22,6 +22,7 @@ obj-$(CONFIG_ARM_U8500_CPUIDLE) += cpuidle-ux500.o
> obj-$(CONFIG_ARM_AT91_CPUIDLE) += cpuidle-at91.o
> obj-$(CONFIG_ARM_EXYNOS_CPUIDLE) += cpuidle-exynos.o
> obj-$(CONFIG_ARM_CPUIDLE) += cpuidle-arm.o
> +obj-$(CONFIG_ARM_POLL_CPUIDLE) += cpuidle-arm-polling.o
> obj-$(CONFIG_ARM_PSCI_CPUIDLE) += cpuidle-psci.o
> obj-$(CONFIG_ARM_PSCI_CPUIDLE_DOMAIN) += cpuidle-psci-domain.o
> obj-$(CONFIG_ARM_TEGRA_CPUIDLE) += cpuidle-tegra.o
> diff --git a/drivers/cpuidle/cpuidle-arm-polling.c b/drivers/cpuidle/cpuidle-arm-polling.c
> new file mode 100644
> index 000000000000..bca128568114
> --- /dev/null
> +++ b/drivers/cpuidle/cpuidle-arm-polling.c
> @@ -0,0 +1,171 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * ARM64 CPU idle driver using wfe polling
> + *
> + * Copyright 2024 Amazon.com, Inc. or its affiliates. All rights reserved.
> + *
> + * Authors:
> + * Haris Okanovic <harisokn@amazon.com>
> + * Brian Silver <silverbr@amazon.com>
> + *
> + * Based on cpuidle-arm.c
> + * Copyright (C) 2014 ARM Ltd.
> + * Author: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> + */
> +
> +#include <linux/cpu.h>
> +#include <linux/cpu_cooling.h>
> +#include <linux/cpuidle.h>
> +#include <linux/sched/clock.h>
> +
> +#include <asm/cpuidle.h>
> +#include <asm/readex.h>
> +
> +#include "dt_idle_states.h"
> +
> +/* Max duration of the wfe() poll loop in us, before transitioning to
> + * arch_cpu_idle()/wfi() sleep.
> + */
/*
* Comments should have the leading '/*' on a separate line.
* See https://www.kernel.org/doc/html/v6.8/process/coding-style.html#commenting
*/
> +#define DEFAULT_POLL_LIMIT_US 100
> +static unsigned int poll_limit __read_mostly = DEFAULT_POLL_LIMIT_US;
> +
> +/*
> + * arm_idle_wfe_poll - Polls state in wfe loop until reschedule is
> + * needed or timeout
> + */
> +static int __cpuidle arm_idle_wfe_poll(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv, int idx)
> +{
> + u64 time_start, time_limit;
> +
> + time_start = local_clock();
> + dev->poll_time_limit = false;
> +
> + local_irq_enable();
Why enable IRQs here? We don't do that in the regular cpuidle-arm driver, nor
the cpuidle-psci driver, and there's no explanation here or in the commit message.
How does this interact with RCU? Is that still watching or are we in an
extended quiescent state? For PSCI idle states we enter an EQS, and it seems
like we probably should here...
> +
> + if (current_set_polling_and_test())
> + goto end;
> +
> + time_limit = cpuidle_poll_time(drv, dev);
> +
> + do {
> + // exclusive read arms the monitor for wfe
> + if (__READ_ONCE_EX(current_thread_info()->flags) & _TIF_NEED_RESCHED)
> + goto end;
> +
> + // may exit prematurely, see ARM_ARCH_TIMER_EVTSTREAM
> + wfe();
> + } while (local_clock() - time_start < time_limit);
... and if the EVTSTREAM is disabled, we'll sit in WFE forever rather than
entering a deeper idle state, which doesn't seem desirable.
It's worth noting that now that we have WFET, we'll probably want to disable
the EVTSTREAM by default at some point, at least in some configurations, since
that'll be able to sit in a WFE state for longer while also reliably waking up
when required.
I suspect we want something like an smp_load_acquire_timeout() here to do the
wait in arch code (allowing us to use WFET), and enabling this state will
depend on either having WFET or EVTSTREAM.
> +
> + dev->poll_time_limit = true;
> +
> +end:
> + current_clr_polling();
> + return idx;
> +}
> +
> +/*
> + * arm_idle_wfi - Places cpu in lower power state until interrupt,
> + * a fallback to polling
> + */
> +static int __cpuidle arm_idle_wfi(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv, int idx)
> +{
> + if (current_clr_polling_and_test()) {
> + local_irq_enable();
> + return idx;
> + }
Same as above, why enable IRQs here?
> + arch_cpu_idle();
> + return idx;
... and if we need to enable IRQs in the other cases above, why do we *not*
need to enable them here?
> +}
> +
> +static struct cpuidle_driver arm_poll_idle_driver __initdata = {
> + .name = "arm_poll_idle",
> + .owner = THIS_MODULE,
> + .states = {
> + {
> + .enter = arm_idle_wfe_poll,
> + .exit_latency = 0,
> + .target_residency = 0,
> + .exit_latency_ns = 0,
> + .power_usage = UINT_MAX,
> + .flags = CPUIDLE_FLAG_POLLING,
> + .name = "WFE",
> + .desc = "ARM WFE",
> + },
> + {
> + .enter = arm_idle_wfi,
> + .exit_latency = DEFAULT_POLL_LIMIT_US,
> + .target_residency = DEFAULT_POLL_LIMIT_US,
> + .power_usage = UINT_MAX,
> + .name = "WFI",
> + .desc = "ARM WFI",
> + },
> + },
> + .state_count = 2,
> +};
How does this interact with the existing driver?
How does DEFAULT_POLL_LIMIT_US compare with PSCI idle states?
> +
> +/*
> + * arm_poll_init_cpu - Initializes arm cpuidle polling driver for one cpu
> + */
> +static int __init arm_poll_init_cpu(int cpu)
> +{
> + int ret;
> + struct cpuidle_driver *drv;
> +
> + drv = kmemdup(&arm_poll_idle_driver, sizeof(*drv), GFP_KERNEL);
> + if (!drv)
> + return -ENOMEM;
> +
> + drv->cpumask = (struct cpumask *)cpumask_of(cpu);
> + drv->states[1].exit_latency = poll_limit;
> + drv->states[1].target_residency = poll_limit;
> +
> + ret = cpuidle_register(drv, NULL);
> + if (ret) {
> + pr_err("failed to register driver: %d, cpu %d\n", ret, cpu);
> + goto out_kfree_drv;
> + }
> +
> + pr_info("registered driver cpu %d\n", cpu);
This does not need to be printed for each CPU.
Mark.
> +
> + cpuidle_cooling_register(drv);
> +
> + return 0;
> +
> +out_kfree_drv:
> + kfree(drv);
> + return ret;
> +}
> +
> +/*
> + * arm_poll_init - Initializes arm cpuidle polling driver
> + */
> +static int __init arm_poll_init(void)
> +{
> + int cpu, ret;
> + struct cpuidle_driver *drv;
> + struct cpuidle_device *dev;
> +
> + for_each_possible_cpu(cpu) {
> + ret = arm_poll_init_cpu(cpu);
> + if (ret)
> + goto out_fail;
> + }
> +
> + return 0;
> +
> +out_fail:
> + pr_info("de-register all");
> + while (--cpu >= 0) {
> + dev = per_cpu(cpuidle_devices, cpu);
> + drv = cpuidle_get_cpu_driver(dev);
> + cpuidle_unregister(drv);
> + kfree(drv);
> + }
> +
> + return ret;
> +}
> +
> +module_param(poll_limit, uint, 0444);
> +device_initcall(arm_poll_init);
> --
> 2.34.1
>
>
^ permalink raw reply
* Re: [PATCH 2/3] arm64: add __READ_ONCE_EX()
From: Mark Rutland @ 2024-04-02 16:48 UTC (permalink / raw)
To: Haris Okanovic; +Cc: linux-kernel, linux-pm, linux-assembly, peterz
In-Reply-To: <20240402014706.3969151-2-harisokn@amazon.com>
On Mon, Apr 01, 2024 at 08:47:05PM -0500, Haris Okanovic wrote:
> Perform an exclusive load, which atomically loads a word and arms the
> execusive monitor to enable wfe() polling of an address.
>
> Adding this macro in preparation for an arm64 cpuidle driver which
> supports a wfe() based polling state.
>
> https://developer.arm.com/documentation/dht0008/a/arm-synchronization-primitives/exclusive-accesses/exclusive-monitors
>
> Signed-off-by: Haris Okanovic <harisokn@amazon.com>
> ---
> arch/arm64/include/asm/readex.h | 46 +++++++++++++++++++++++++++++++++
> 1 file changed, 46 insertions(+)
> create mode 100644 arch/arm64/include/asm/readex.h
>
> diff --git a/arch/arm64/include/asm/readex.h b/arch/arm64/include/asm/readex.h
> new file mode 100644
> index 000000000000..51963c3107e1
> --- /dev/null
> +++ b/arch/arm64/include/asm/readex.h
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Based on arch/arm64/include/asm/rwonce.h
> + *
> + * Copyright (C) 2020 Google LLC.
> + * Copyright (C) 2024 Amazon.com, Inc. or its affiliates.
> + */
> +
> +#ifndef __ASM_READEX_H
> +#define __ASM_READEX_H
> +
> +#define __LOAD_EX(sfx, regs...) "ldaxr" #sfx "\t" #regs
> +
> +#define __READ_ONCE_EX(x) \
> +({ \
> + typeof(&(x)) __x = &(x); \
> + int atomic = 1; \
> + union { __unqual_scalar_typeof(*__x) __val; char __c[1]; } __u; \
> + switch (sizeof(x)) { \
> + case 1: \
> + asm volatile(__LOAD_EX(b, %w0, %1) \
> + : "=r" (*(__u8 *)__u.__c) \
> + : "Q" (*__x) : "memory"); \
> + break; \
> + case 2: \
> + asm volatile(__LOAD_EX(h, %w0, %1) \
> + : "=r" (*(__u16 *)__u.__c) \
> + : "Q" (*__x) : "memory"); \
> + break; \
> + case 4: \
> + asm volatile(__LOAD_EX(, %w0, %1) \
> + : "=r" (*(__u32 *)__u.__c) \
> + : "Q" (*__x) : "memory"); \
> + break; \
> + case 8: \
> + asm volatile(__LOAD_EX(, %0, %1) \
> + : "=r" (*(__u64 *)__u.__c) \
> + : "Q" (*__x) : "memory"); \
> + break; \
> + default: \
> + atomic = 0; \
> + } \
> + atomic ? (typeof(*__x))__u.__val : (*(volatile typeof(__x))__x);\
> +})
Why can't you use the existing smp_cond_load_relaxed() or
smp_cond_load_acquire()?
I don't believe this is necessary.
Mark.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox