* [PATCH 6.1 000/219] 6.1.54-rc1 review
@ 2023-09-17 19:12 Greg Kroah-Hartman
2023-09-17 20:47 ` SeongJae Park
` (11 more replies)
0 siblings, 12 replies; 39+ messages in thread
From: Greg Kroah-Hartman @ 2023-09-17 19:12 UTC
To: stable
Cc: Greg Kroah-Hartman, patches, linux-kernel, torvalds, akpm, linux,
shuah, patches, lkft-triage, pavel, jonathanh, f.fainelli,
sudipm.mukherjee, srw, rwarsow, conor
This is the start of the stable review cycle for the 6.1.54 release.
There are 219 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.54-rc1.gz
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.

thanks,

greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linux 6.1.54-rc1
Wesley Chalmers <wesley.chalmers@amd.com>
drm/amd/display: Fix a bug when searching for insert_above_mpcc
Maciej W. Rozycki <macro@orcam.me.uk>
MIPS: Only fiddle with CHECKFLAGS if `need-compiler'
Kuniyuki Iwashima <kuniyu@amazon.com>
kcm: Fix error handling for SOCK_DGRAM in kcm_sendmsg().
Vadim Fedorenko <vadim.fedorenko@linux.dev>
ixgbe: fix timestamp configuration code
Kuniyuki Iwashima <kuniyu@amazon.com>
tcp: Fix bind() regression for v4-mapped-v6 non-wildcard address.
Kuniyuki Iwashima <kuniyu@amazon.com>
tcp: Fix bind() regression for v4-mapped-v6 wildcard address.
Kuniyuki Iwashima <kuniyu@amazon.com>
tcp: Factorise sk_family-independent comparison in inet_bind2_bucket_match(_addr_any).
Kuniyuki Iwashima <kuniyu@amazon.com>
ipv6: Remove in6addr_any alternatives.
Eric Dumazet <edumazet@google.com>
ipv6: fix ip6_sock_set_addr_preferences() typo
Sascha Hauer <s.hauer@pengutronix.de>
net: macb: fix sleep inside spinlock
Harini Katakam <harini.katakam@xilinx.com>
net: macb: Enable PTP unicast
Liu Jian <liujian56@huawei.com>
net/tls: do not free tls_rec on async operation in bpf_exec_tx_verdict()
Geert Uytterhoeven <geert+renesas@glider.be>
platform/mellanox: NVSW_SN2201 should depend on ACPI
Shravan Kumar Ramani <shravankr@nvidia.com>
platform/mellanox: mlxbf-pmc: Fix reading of unprogrammed events
Shravan Kumar Ramani <shravankr@nvidia.com>
platform/mellanox: mlxbf-pmc: Fix potential buffer overflows
Liming Sun <limings@nvidia.com>
platform/mellanox: mlxbf-tmfifo: Drop jumbo frames
Liming Sun <limings@nvidia.com>
platform/mellanox: mlxbf-tmfifo: Drop the Rx packet if no more descriptors
Shigeru Yoshida <syoshida@redhat.com>
kcm: Fix memory leak in error path of kcm_sendmsg()
Hayes Wang <hayeswang@realtek.com>
r8152: check budget for r8152_poll()
Vladimir Oltean <vladimir.oltean@nxp.com>
net: dsa: sja1105: block FDB accesses that are concurrent with a switch reset
Vladimir Oltean <vladimir.oltean@nxp.com>
net: dsa: sja1105: serialize sja1105_port_mcast_flood() with other FDB accesses
Vladimir Oltean <vladimir.oltean@nxp.com>
net: dsa: sja1105: fix multicast forwarding working only for last added mdb entry
Vladimir Oltean <vladimir.oltean@nxp.com>
net: dsa: sja1105: propagate exact error code from sja1105_dynamic_config_poll_valid()
Vladimir Oltean <vladimir.oltean@nxp.com>
net: dsa: sja1105: hide all multicast addresses from "bridge fdb show"
Ciprian Regus <ciprian.regus@analog.com>
net:ethernet:adi:adin1110: Fix forwarding offload
Yang Yingliang <yangyingliang@huawei.com>
net: ethernet: adi: adin1110: use eth_broadcast_addr() to assign broadcast address
Ziyang Xuan <william.xuanziyang@huawei.com>
hsr: Fix uninit-value access in fill_frame_info()
Hangyu Hua <hbh25y@gmail.com>
net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_hwlro_get_fdir_all()
Hangyu Hua <hbh25y@gmail.com>
net: ethernet: mvpp2_main: fix possible OOB write in mvpp2_ethtool_get_rxnfc()
Vincent Whitchurch <vincent.whitchurch@axis.com>
net: stmmac: fix handling of zero coalescing tx-usecs
Guangguan Wang <guangguan.wang@linux.alibaba.com>
net/smc: use smc_lgr_list.lock to protect smc_lgr_list.list iterate in smcr_port_add
Björn Töpel <bjorn@rivosinc.com>
selftests: Keep symlinks, when possible
Björn Töpel <bjorn@rivosinc.com>
kselftest/runner.sh: Propagate SIGTERM to runner child
Liu Jian <liujian56@huawei.com>
net: ipv4: fix one memleak in __inet_del_ifa()
Jinjie Ruan <ruanjinjie@huawei.com>
kunit: Fix wild-memory-access bug in kunit_free_suite_set()
Hamza Mahfooz <hamza.mahfooz@amd.com>
drm/amdgpu: register a dirty framebuffer callback for fbcon
Gabe Teeger <gabe.teeger@amd.com>
drm/amd/display: Remove wait while locked
Wenjing Liu <wenjing.liu@amd.com>
drm/amd/display: always switch off ODM before committing more streams
Namhyung Kim <namhyung@kernel.org>
perf hists browser: Fix the number of entries for 'e' key
Namhyung Kim <namhyung@kernel.org>
perf tools: Handle old data in PERF_RECORD_ATTR
Namhyung Kim <namhyung@kernel.org>
perf test shell stat_bpf_counters: Fix test on Intel
Namhyung Kim <namhyung@kernel.org>
perf hists browser: Fix hierarchy mode header
Maciej W. Rozycki <macro@orcam.me.uk>
MIPS: Fix CONFIG_CPU_DADDI_WORKAROUNDS `modules_install' regression
Sean Christopherson <seanjc@google.com>
KVM: SVM: Skip VMSA init in sev_es_init_vmcb() if pointer is NULL
Sean Christopherson <seanjc@google.com>
KVM: SVM: Set target pCPU during IRTE update if target vCPU is running
Sean Christopherson <seanjc@google.com>
KVM: nSVM: Load L1's TSC multiplier based on L1 state, not L2 state
Sean Christopherson <seanjc@google.com>
KVM: nSVM: Check instead of asserting on nested TSC scaling support
Sean Christopherson <seanjc@google.com>
KVM: SVM: Get source vCPUs from source VM for SEV-ES intrahost migration
Sean Christopherson <seanjc@google.com>
KVM: SVM: Don't inject #UD if KVM attempts to skip SEV guest insn
Sean Christopherson <seanjc@google.com>
KVM: SVM: Take and hold ir_list_lock when updating vCPU's Physical ID entry
Hamza Mahfooz <hamza.mahfooz@amd.com>
drm/amd/display: prevent potential division by zero errors
Melissa Wen <mwen@igalia.com>
drm/amd/display: enable cursor degamma for DCN3+ DRM legacy gamma
William Zhang <william.zhang@broadcom.com>
mtd: rawnand: brcmnand: Fix ECC level field setting for v7.2 controller
William Zhang <william.zhang@broadcom.com>
mtd: rawnand: brcmnand: Fix potential false time out warning
Linus Walleij <linus.walleij@linaro.org>
mtd: spi-nor: Correct flags for Winbond w25q128
William Zhang <william.zhang@broadcom.com>
mtd: rawnand: brcmnand: Fix potential out-of-bounds access in oob write
William Zhang <william.zhang@broadcom.com>
mtd: rawnand: brcmnand: Fix crash during the panic_write
Liu Ying <victor.liu@nxp.com>
drm/mxsfb: Disable overlay plane in mxsfb_plane_overlay_atomic_disable()
Anand Jain <anand.jain@oracle.com>
btrfs: use the correct superblock to compare fsid in btrfs_validate_super
Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: re-enable metadata over-commit for zoned mode
Josef Bacik <josef@toxicpanda.com>
btrfs: set page extent mapped after read_folio in relocate_one_page
Filipe Manana <fdmanana@suse.com>
btrfs: don't start transaction when joining with TRANS_JOIN_NOSTART
Boris Burkov <boris@bur.io>
btrfs: free qgroup rsv on io failure
Boris Burkov <boris@bur.io>
btrfs: fix start transaction qgroup rsv double free
Naohiro Aota <naohiro.aota@wdc.com>
btrfs: zoned: do not zone finish data relocation block group
ruanmeisi <ruan.meisi@zte.com.cn>
fuse: nlookup missing decrement in fuse_direntplus_link
Damien Le Moal <dlemoal@kernel.org>
ata: pata_ftide010: Add missing MODULE_DESCRIPTION
Damien Le Moal <dlemoal@kernel.org>
ata: sata_gemini: Add missing MODULE_DESCRIPTION
Michael Schmitz <schmitzmic@gmail.com>
ata: pata_falcon: fix IO base selection for Q40
Werner Fischer <devlists@wefi.net>
ata: ahci: Add Elkhart Lake AHCI controller
Christian Marangi <ansuelsmth@gmail.com>
hwspinlock: qcom: add missing regmap config for SFPB MMIO implementation
Nathan Chancellor <nathan@kernel.org>
lib: test_scanf: Add explicit type cast to result initialization in test_number_prefix()
Jaegeuk Kim <jaegeuk@kernel.org>
f2fs: avoid false alarm of circular locking
Jaegeuk Kim <jaegeuk@kernel.org>
f2fs: flush inode if atomic file is aborted
Luís Henriques <lhenriques@suse.de>
ext4: fix memory leaks in ext4_fname_{setup_filename,prepare_lookup}
Wang Jianjian <wangjianjian0@foxmail.com>
ext4: add correct group descriptors and reserved GDT blocks to system zone
Zhang Yi <yi.zhang@huawei.com>
jbd2: correct the end of the journal recovery scan range
Zhihao Cheng <chengzhihao1@huawei.com>
jbd2: check 'jh->b_transaction' before removing it from checkpoint
Zhang Yi <yi.zhang@huawei.com>
jbd2: fix checkpoint cleanup performance regression
Hien Huynh <hien.huynh.px@renesas.com>
dmaengine: sh: rz-dmac: Fix destination and source data size setting
Walter Chang <walter.chang@mediatek.com>
clocksource/drivers/arm_arch_timer: Disable timer before programming CVAL
Pavel Kozlov <pavel.kozlov@synopsys.com>
ARC: atomics: Add compiler barrier to atomic operations...
Saeed Mahameed <saeedm@nvidia.com>
net/mlx5: Free IRQ rmap and notifier on kernel shutdown
Kalesh Singh <kaleshsingh@google.com>
Multi-gen LRU: avoid race in inc_min_seq()
Petr Tesarik <petr.tesarik.ext@huawei.com>
sh: boards: Fix CEU buffer size passed to dma_declare_coherent_memory()
Jie Wang <wangjie125@huawei.com>
net: hns3: remove GSO partial feature bit
Yisen Zhuang <yisen.zhuang@huawei.com>
net: hns3: fix the port information display when sfp is absent
Jijie Shao <shaojijie@huawei.com>
net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue
Hao Chen <chenhao418@huawei.com>
net: hns3: fix debugfs concurrency issue between kfree buffer and read
Hao Chen <chenhao418@huawei.com>
net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read()
Jian Shen <shenjian15@huawei.com>
net: hns3: fix tx timeout issue
Wander Lairson Costa <wander@redhat.com>
netfilter: nfnetlink_osf: avoid OOB read
Florian Westphal <fw@strlen.de>
netfilter: nftables: exthdr: fix 4-byte stack OOB write
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
bpf: Assign bpf_tramp_run_ctx::saved_run_ctx before recursion check.
Sebastian Andrzej Siewior <bigeasy@linutronix.de>
bpf: Invoke __bpf_prog_exit_sleepable_recur() on recursion in kern_sys_bpf().
Martin KaFai Lau <martin.lau@kernel.org>
bpf: Remove prog->active check for bpf_lsm and bpf_iter
Vladimir Oltean <vladimir.oltean@nxp.com>
net: dsa: sja1105: complete tc-cbs offload support on SJA1110
Vladimir Oltean <vladimir.oltean@nxp.com>
net: dsa: sja1105: fix -ENOSPC when replacing the same tc-cbs too many times
Vladimir Oltean <vladimir.oltean@nxp.com>
net: dsa: sja1105: fix bandwidth discrepancy between tc-cbs software and offload
Eric Dumazet <edumazet@google.com>
ip_tunnels: use DEV_STATS_INC()
Ariel Marcovitch <arielmarcovitch@gmail.com>
idr: fix param name in idr_alloc_cyclic() doc
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
s390/zcrypt: don't leak memory if dev_set_name() fails
Olga Zaborska <olga.zaborska@intel.com>
igb: Change IGB_MIN to allow set rx/tx value between 64 and 80
Olga Zaborska <olga.zaborska@intel.com>
igbvf: Change IGBVF_MIN to allow set rx/tx value between 64 and 80
Olga Zaborska <olga.zaborska@intel.com>
igc: Change IGC_MIN to allow set rx/tx value between 64 and 80
Geetha sowjanya <gakula@marvell.com>
octeontx2-af: Fix truncation of smq in CN10K NIX AQ enqueue mbox handler
Shigeru Yoshida <syoshida@redhat.com>
kcm: Destroy mutex in kcm_exit_net()
valis <sec@valis.email>
net: sched: sch_qfq: Fix UAF in qfq_dequeue()
Kuniyuki Iwashima <kuniyu@amazon.com>
af_unix: Fix data race around sk->sk_err.
Kuniyuki Iwashima <kuniyu@amazon.com>
af_unix: Fix data-races around sk->sk_shutdown.
Kuniyuki Iwashima <kuniyu@amazon.com>
af_unix: Fix data-race around unix_tot_inflight.
Kuniyuki Iwashima <kuniyu@amazon.com>
af_unix: Fix data-races around user->unix_inflight.
John Fastabend <john.fastabend@gmail.com>
bpf, sockmap: Fix skb refcnt race after locking changes
Oleksij Rempel <linux@rempel-privat.de>
net: phy: micrel: Correct bit assignments for phy_device flags
Alex Henrie <alexhenrie24@gmail.com>
net: ipv6/addrconf: avoid integer underflow in ipv6_create_tempaddr
Liang Chen <liangchen.linux@gmail.com>
veth: Fixing transmit return status for dropped packets
Eric Dumazet <edumazet@google.com>
gve: fix frag_list chaining
Corinna Vinschen <vinschen@redhat.com>
igb: disable virtualization features on 82580
Sriram Yagnaraman <sriram.yagnaraman@est.tech>
ipv6: ignore dst hint for multipath routes
Sriram Yagnaraman <sriram.yagnaraman@est.tech>
ipv4: ignore dst hint for multipath routes
Eric Dumazet <edumazet@google.com>
mptcp: annotate data-races around msk->rmem_fwd_alloc
Eric Dumazet <edumazet@google.com>
net: annotate data-races around sk->sk_forward_alloc
Eric Dumazet <edumazet@google.com>
net: use sk_forward_alloc_get() in sk_get_meminfo()
Sean Christopherson <seanjc@google.com>
drm/i915/gvt: Drop unused helper intel_vgpu_reset_gtt()
Sean Christopherson <seanjc@google.com>
drm/i915/gvt: Put the page reference obtained by KVM's gfn_to_pfn()
Sean Christopherson <seanjc@google.com>
drm/i915/gvt: Verify pfn is "valid" before dereferencing "struct page"
Xiubo Li <xiubli@redhat.com>
ceph: make members in struct ceph_mds_request_args_ext a union
Magnus Karlsson <magnus.karlsson@intel.com>
xsk: Fix xsk_diag use-after-free error during socket cleanup
Florian Westphal <fw@strlen.de>
net: fib: avoid warn splat in flow dissector
Eric Dumazet <edumazet@google.com>
net: read sk->sk_family once in sk_mc_loop()
Eric Dumazet <edumazet@google.com>
ipv4: annotate data-races around fi->fib_dead
Eric Dumazet <edumazet@google.com>
sctp: annotate data-races around sk->sk_wmem_queued
Eric Dumazet <edumazet@google.com>
net/sched: fq_pie: avoid stalls in fq_pie_timer()
Katya Orlova <e.orlova@ispras.ru>
smb: propagate error code of extract_sharename()
Paulo Alcantara <pc@cjr.nz>
cifs: use fs_context for automounts
Yu Kuai <yukuai3@huawei.com>
blk-throttle: consider 'carryover_ios/bytes' in throtl_trim_slice()
Yu Kuai <yukuai3@huawei.com>
blk-throttle: use calculate_io/bytes_allowed() for throtl_trim_slice()
Andrzej Hajda <andrzej.hajda@intel.com>
drm/i915: mark requests for GuC virtual engines to avoid use-after-free
Namhyung Kim <namhyung@kernel.org>
perf test stat_bpf_counters_cgrp: Enhance perf stat cgroup BPF counter test
Kajol Jain <kjain@linux.ibm.com>
perf test stat_bpf_counters_cgrp: Fix shellcheck issue about logical operators
Vladimir Zapolskiy <vz@mleia.com>
pwm: lpc32xx: Remove handling of PWM channels
Raag Jadav <raag.jadav@intel.com>
watchdog: intel-mid_wdt: add MODULE_ALIAS() to allow auto-load
Arnaldo Carvalho de Melo <acme@redhat.com>
perf top: Don't pass an ERR_PTR() directly to perf_session__delete()
Kajol Jain <kjain@linux.ibm.com>
perf vendor events: Drop STORES_PER_INST metric event for power10 platform
Kajol Jain <kjain@linux.ibm.com>
perf vendor events: Drop some of the JSON/events for power10 platform
Kajol Jain <kjain@linux.ibm.com>
perf vendor events: Update the JSON/events descriptions for power10 platform
Sean Christopherson <seanjc@google.com>
x86/virt: Drop unnecessary check on extended CPUID level in cpu_has_svm()
Arnaldo Carvalho de Melo <acme@redhat.com>
perf annotate bpf: Don't enclose non-debug code with an assert()
Dmitry Torokhov <dmitry.torokhov@gmail.com>
Input: tca6416-keypad - fix interrupt enable disbalance
Dmitry Torokhov <dmitry.torokhov@gmail.com>
Input: tca6416-keypad - always expect proper IRQ number in i2c client
Ying Liu <victor.liu@nxp.com>
backlight: gpio_backlight: Drop output GPIO direction check for initial power state
Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
pwm: atmel-tcb: Fix resource freeing in error path and remove
Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
pwm: atmel-tcb: Harmonize resource allocation order
Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
pwm: atmel-tcb: Convert to platform remove callback returning void
Arnaldo Carvalho de Melo <acme@redhat.com>
perf trace: Really free the evsel->priv area
Arnaldo Carvalho de Melo <acme@redhat.com>
perf trace: Use zfree() to reduce chances of use after free
Jeff LaBundy <jeff@labundy.com>
Input: iqs7222 - configure power mode before triggering ATI
Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
kconfig: fix possible buffer overflow
Jonathan Marek <jonathan@marek.ca>
mailbox: qcom-ipcc: fix incorrect num_chans counting
Andreas Gruenbacher <agruenba@redhat.com>
gfs2: low-memory forced flush fixes
Andreas Gruenbacher <agruenba@redhat.com>
gfs2: Switch to wait_event in gfs2_logd
Christophe JAILLET <christophe.jaillet@wanadoo.fr>
tpm_crb: Fix an error handling path in crb_acpi_add()
Masahiro Yamada <masahiroy@kernel.org>
kbuild: do not run depmod for 'make modules_sign'
Masahiro Yamada <masahiroy@kernel.org>
kbuild: rpm-pkg: define _arch conditionally
Eric Dumazet <edumazet@google.com>
net: deal with integer overflows in kmalloc_reserve()
Eric Dumazet <edumazet@google.com>
net: factorize code in kmalloc_reserve()
Eric Dumazet <edumazet@google.com>
net: remove osize variable in __alloc_skb()
Eric Dumazet <edumazet@google.com>
net: add SKB_HEAD_ALIGN() helper
Qiang Yu <quic_qianyu@quicinc.com>
bus: mhi: host: Skip MHI reset if device is in RDDM
Fedor Pchelkin <pchelkin@ispras.ru>
NFSv4/pnfs: minor fix for cleanup path in nfs4_get_device_info
Trond Myklebust <trond.myklebust@hammerspace.com>
NFS: Fix a potential data corruption
Johan Hovold <johan+linaro@kernel.org>
clk: qcom: mss-sc7180: fix missing resume during probe
Johan Hovold <johan+linaro@kernel.org>
clk: qcom: q6sstop-qcs404: fix missing resume during probe
Johan Hovold <johan+linaro@kernel.org>
clk: qcom: lpasscc-sc7280: fix missing resume during probe
Johan Hovold <johan+linaro@kernel.org>
clk: qcom: dispcc-sm8450: fix runtime PM imbalance on probe errors
Chris Lew <quic_clew@quicinc.com>
soc: qcom: qmi_encdec: Restrict string length in decode
Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
clk: qcom: gcc-mdm9615: use proper parent for pll0_vote clock
Marco Felsch <m.felsch@pengutronix.de>
clk: imx: pll14xx: align pdiv with reference manual
Ahmad Fatoum <a.fatoum@pengutronix.de>
clk: imx: pll14xx: dynamically configure PLL for 393216000/361267200Hz
Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
dt-bindings: clock: xlnx,versal-clk: drop select:false
Raag Jadav <raag.jadav@intel.com>
pinctrl: cherryview: fix address_space_handler() argument
Bharath SM <bharathsm@microsoft.com>
cifs: update desired access while requesting for directory lease
Helge Deller <deller@gmx.de>
parisc: led: Reduce CPU overhead for disk & lan LED computation
Helge Deller <deller@gmx.de>
parisc: led: Fix LAN receive and transmit LEDs
Andrew Donnellan <ajd@linux.ibm.com>
lib/test_meminit: allocate pages up to order MAX_ORDER
Muchun Song <muchun.song@linux.dev>
mm: hugetlb_vmemmap: fix a race between vmemmap pmd split
Michal Hocko <mhocko@suse.com>
memcg: drop kmem.limit_in_bytes
Steve French <stfrench@microsoft.com>
send channel sequence number in SMB3 requests after reconnects
Chris Paterson <chris.paterson2@renesas.com>
arm64: dts: renesas: rzg2l: Fix txdv-skew-psec typos
Johan Hovold <johan+linaro@kernel.org>
clk: qcom: turingcc-qcs404: fix missing resume during probe
Sheetal <sheetal@nvidia.com>
ASoC: tegra: Fix SFC conversion for few rates
Thomas Zimmermann <tzimmermann@suse.de>
drm/ast: Fix DRAM init on AST2200
Johan Hovold <johan+linaro@kernel.org>
clk: qcom: camcc-sc7180: fix async resume during probe
Thomas Zimmermann <tzimmermann@suse.de>
fbdev/ep93xx-fb: Do not assign to struct fb_info.dev
Chengming Zhou <zhouchengming@bytedance.com>
null_blk: fix poll request timeout handling
Quinn Tran <qutran@marvell.com>
scsi: qla2xxx: Fix firmware resource tracking
Quinn Tran <qutran@marvell.com>
scsi: qla2xxx: Error code did not return to upper layer
Nilesh Javali <njavali@marvell.com>
scsi: qla2xxx: Fix smatch warn for qla_init_iocb_limit()
Quinn Tran <qutran@marvell.com>
scsi: qla2xxx: Flush mailbox commands on chip reset
Manish Rangankar <mrangankar@marvell.com>
scsi: qla2xxx: Remove unsupported ql2xenabledif option
Quinn Tran <qutran@marvell.com>
scsi: qla2xxx: Fix TMF leak through
Quinn Tran <qutran@marvell.com>
scsi: qla2xxx: Fix session hang in gnl
Quinn Tran <qutran@marvell.com>
scsi: qla2xxx: Turn off noisy message log
Quinn Tran <qutran@marvell.com>
scsi: qla2xxx: Fix erroneous link up failure
Quinn Tran <qutran@marvell.com>
scsi: qla2xxx: Fix command flush during TMF
Quinn Tran <qutran@marvell.com>
scsi: qla2xxx: fix inconsistent TMF timeout
Quinn Tran <qutran@marvell.com>
scsi: qla2xxx: Fix deletion race condition
Quinn Tran <qutran@marvell.com>
scsi: qla2xxx: Limit TMF to 8 per function
Quinn Tran <qutran@marvell.com>
scsi: qla2xxx: Adjust IOCB resource on qpair create
Gurchetan Singh <gurchetansingh@chromium.org>
drm/virtio: Conditionally allocate virtio_gpu_fence
Pavel Begunkov <asml.silence@gmail.com>
io_uring: Don't set affinity on a dying sqpoll thread
Pavel Begunkov <asml.silence@gmail.com>
io_uring/sqpoll: fix io-wq affinity when IORING_SETUP_SQPOLL is used
Pavel Begunkov <asml.silence@gmail.com>
io_uring: break out of iowq iopoll on teardown
Pavel Begunkov <asml.silence@gmail.com>
io_uring/net: don't overflow multishot accept
Pavel Begunkov <asml.silence@gmail.com>
io_uring: revert "io_uring fix multishot accept ordering"
Pavel Begunkov <asml.silence@gmail.com>
io_uring: always lock in io_apoll_task_func
Kalesh Singh <kaleshsingh@google.com>
Multi-gen LRU: fix per-zone reclaim
Yu Zhao <yuzhao@google.com>
mm: multi-gen LRU: rename lrugen->lists[] to lrugen->folios[]
Quan Tian <qtian@vmware.com>
net/ipv6: SKB symmetric hash should incorporate transport ports
-------------
Diffstat:
Documentation/admin-guide/cgroup-v1/memory.rst | 2 -
.../devicetree/bindings/clock/xlnx,versal-clk.yaml | 2 -
Documentation/mm/multigen_lru.rst | 8 +-
Makefile | 6 +-
arch/arc/include/asm/atomic-llsc.h | 6 +-
arch/arc/include/asm/atomic64-arcv2.h | 6 +-
arch/arm64/boot/dts/renesas/rzg2l-smarc-som.dtsi | 4 +-
arch/arm64/boot/dts/renesas/rzg2lc-smarc-som.dtsi | 2 +-
arch/arm64/boot/dts/renesas/rzg2ul-smarc-som.dtsi | 4 +-
arch/arm64/net/bpf_jit_comp.c | 9 +-
arch/mips/Makefile | 6 +-
arch/parisc/include/asm/led.h | 4 +-
arch/sh/boards/mach-ap325rxa/setup.c | 2 +-
arch/sh/boards/mach-ecovec24/setup.c | 6 +-
arch/sh/boards/mach-kfr2r09/setup.c | 2 +-
arch/sh/boards/mach-migor/setup.c | 2 +-
arch/sh/boards/mach-se/7724/setup.c | 6 +-
arch/x86/include/asm/virtext.h | 6 -
arch/x86/kvm/svm/avic.c | 59 +++++-
arch/x86/kvm/svm/nested.c | 9 +-
arch/x86/kvm/svm/sev.c | 9 +-
arch/x86/kvm/svm/svm.c | 35 ++-
arch/x86/net/bpf_jit_comp.c | 19 +-
block/blk-throttle.c | 99 ++++-----
drivers/ata/ahci.c | 2 +
drivers/ata/pata_falcon.c | 50 +++--
drivers/ata/pata_ftide010.c | 1 +
drivers/ata/sata_gemini.c | 1 +
drivers/block/null_blk/main.c | 12 +-
drivers/bus/mhi/host/pm.c | 5 +
drivers/char/tpm/tpm_crb.c | 5 +-
drivers/clk/imx/clk-pll14xx.c | 13 +-
drivers/clk/qcom/camcc-sc7180.c | 2 +-
drivers/clk/qcom/dispcc-sm8450.c | 13 +-
drivers/clk/qcom/gcc-mdm9615.c | 2 +-
drivers/clk/qcom/lpasscc-sc7280.c | 16 +-
drivers/clk/qcom/mss-sc7180.c | 13 +-
drivers/clk/qcom/q6sstop-qcs404.c | 15 +-
drivers/clk/qcom/turingcc-qcs404.c | 13 +-
drivers/clocksource/arm_arch_timer.c | 7 +
drivers/dma/sh/rz-dmac.c | 11 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 26 ++-
.../drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 7 +
drivers/gpu/drm/amd/display/dc/Makefile | 1 +
drivers/gpu/drm/amd/display/dc/core/dc.c | 68 ++++--
drivers/gpu/drm/amd/display/dc/dcn10/dcn10_mpc.c | 5 +-
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 11 -
.../drm/amd/display/modules/freesync/freesync.c | 9 +-
drivers/gpu/drm/ast/ast_post.c | 2 +-
drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 +
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 3 +
drivers/gpu/drm/i915/gvt/gtt.c | 27 +--
drivers/gpu/drm/i915/gvt/gtt.h | 1 -
drivers/gpu/drm/i915/i915_request.c | 7 +-
drivers/gpu/drm/mxsfb/mxsfb_kms.c | 9 +
drivers/gpu/drm/virtio/virtgpu_ioctl.c | 30 +--
drivers/hwspinlock/qcom_hwspinlock.c | 9 +
drivers/input/keyboard/tca6416-keypad.c | 31 +--
drivers/input/misc/iqs7222.c | 8 +-
drivers/mailbox/qcom-ipcc.c | 4 +-
drivers/mtd/nand/raw/brcmnand/brcmnand.c | 112 ++++++----
drivers/mtd/spi-nor/winbond.c | 5 +-
drivers/net/dsa/sja1105/sja1105.h | 4 +
drivers/net/dsa/sja1105/sja1105_dynamic_config.c | 93 ++++----
drivers/net/dsa/sja1105/sja1105_main.c | 120 ++++++++---
drivers/net/dsa/sja1105/sja1105_spi.c | 4 +
drivers/net/ethernet/adi/adin1110.c | 10 +-
drivers/net/ethernet/cadence/macb.h | 4 +
drivers/net/ethernet/cadence/macb_main.c | 18 +-
drivers/net/ethernet/google/gve/gve_rx_dqo.c | 5 +-
drivers/net/ethernet/hisilicon/hns3/hnae3.h | 1 +
drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c | 7 +-
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 19 +-
drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 4 +-
.../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 20 +-
.../ethernet/hisilicon/hns3/hns3pf/hclge_debugfs.c | 14 +-
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 5 +-
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.h | 2 -
drivers/net/ethernet/intel/igb/igb.h | 4 +-
drivers/net/ethernet/intel/igb/igb_main.c | 5 +-
drivers/net/ethernet/intel/igbvf/igbvf.h | 4 +-
drivers/net/ethernet/intel/igc/igc.h | 4 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 28 +--
drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 5 +
.../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 21 +-
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 3 +
.../ethernet/mellanox/mlx5/core/en/tc_tun_encap.c | 5 +-
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 26 ++-
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 +-
drivers/net/usb/r8152.c | 3 +
drivers/net/veth.c | 4 +-
drivers/parisc/led.c | 4 +-
drivers/pinctrl/intel/pinctrl-cherryview.c | 5 +-
drivers/platform/mellanox/Kconfig | 4 +-
drivers/platform/mellanox/mlxbf-pmc.c | 41 ++--
drivers/platform/mellanox/mlxbf-tmfifo.c | 90 +++++---
drivers/pwm/pwm-atmel-tcb.c | 70 +++---
drivers/pwm/pwm-lpc32xx.c | 16 +-
drivers/s390/crypto/zcrypt_api.c | 1 +
drivers/scsi/qla2xxx/qla_attr.c | 2 -
drivers/scsi/qla2xxx/qla_dbg.c | 2 +-
drivers/scsi/qla2xxx/qla_def.h | 21 +-
drivers/scsi/qla2xxx/qla_dfs.c | 10 +
drivers/scsi/qla2xxx/qla_gbl.h | 1 +
drivers/scsi/qla2xxx/qla_init.c | 234 +++++++++++++--------
drivers/scsi/qla2xxx/qla_inline.h | 57 ++++-
drivers/scsi/qla2xxx/qla_iocb.c | 1 +
drivers/scsi/qla2xxx/qla_isr.c | 7 +-
drivers/scsi/qla2xxx/qla_mbx.c | 7 +-
drivers/scsi/qla2xxx/qla_nvme.c | 3 +-
drivers/scsi/qla2xxx/qla_os.c | 26 ++-
drivers/scsi/qla2xxx/qla_target.c | 14 +-
drivers/soc/qcom/qmi_encdec.c | 4 +-
drivers/video/backlight/gpio_backlight.c | 3 +-
drivers/video/fbdev/ep93xx-fb.c | 1 -
drivers/watchdog/intel-mid_wdt.c | 1 +
fs/btrfs/disk-io.c | 5 +-
fs/btrfs/extent-tree.c | 43 ++--
fs/btrfs/inode.c | 7 +
fs/btrfs/relocation.c | 12 +-
fs/btrfs/space-info.c | 6 +-
fs/btrfs/transaction.c | 26 ++-
fs/btrfs/zoned.c | 16 +-
fs/ext4/balloc.c | 15 +-
fs/ext4/block_validity.c | 8 +-
fs/ext4/crypto.c | 4 +
fs/ext4/ext4.h | 2 +
fs/f2fs/f2fs.h | 24 ++-
fs/f2fs/inline.c | 3 +-
fs/f2fs/segment.c | 2 +
fs/fuse/readdir.c | 10 +-
fs/gfs2/aops.c | 4 +-
fs/gfs2/log.c | 25 +--
fs/jbd2/checkpoint.c | 22 +-
fs/jbd2/recovery.c | 12 +-
fs/nfs/direct.c | 20 +-
fs/nfs/pnfs_dev.c | 2 +-
fs/smb/client/cached_dir.c | 2 +-
fs/smb/client/cifs_dfs_ref.c | 100 ++++-----
fs/smb/client/cifsglob.h | 1 +
fs/smb/client/connect.c | 1 +
fs/smb/client/fscache.c | 2 +-
fs/smb/client/smb2ops.c | 11 +-
fs/smb/client/smb2pdu.c | 11 +
fs/smb/common/smb2pdu.h | 22 ++
include/linux/bpf.h | 24 +--
include/linux/bpf_verifier.h | 13 ++
include/linux/ceph/ceph_fs.h | 24 ++-
include/linux/ipv6.h | 1 +
include/linux/micrel_phy.h | 6 +-
include/linux/mm_inline.h | 4 +-
include/linux/mmzone.h | 8 +-
include/linux/skbuff.h | 8 +
include/linux/tca6416_keypad.h | 1 -
include/net/ip.h | 1 +
include/net/ip6_fib.h | 14 +-
include/net/ip_fib.h | 5 +-
include/net/ip_tunnels.h | 15 +-
include/net/ipv6.h | 7 +-
include/net/sock.h | 12 +-
include/trace/events/fib.h | 5 +-
include/trace/events/fib6.h | 5 +-
io_uring/io-wq.c | 17 +-
io_uring/io-wq.h | 3 +-
io_uring/io_uring.c | 31 ++-
io_uring/net.c | 8 +-
io_uring/poll.c | 3 +-
io_uring/sqpoll.c | 17 ++
io_uring/sqpoll.h | 1 +
kernel/bpf/syscall.c | 7 +-
kernel/bpf/trampoline.c | 81 +++++--
lib/idr.c | 2 +-
lib/kunit/test.c | 3 +-
lib/test_meminit.c | 2 +-
lib/test_scanf.c | 2 +-
mm/hugetlb_vmemmap.c | 34 ++-
mm/memcontrol.c | 10 -
mm/vmscan.c | 50 +++--
net/core/flow_dissector.c | 3 +-
net/core/skbuff.c | 49 ++---
net/core/skmsg.c | 12 +-
net/core/sock.c | 19 +-
net/ethtool/ioctl.c | 10 +-
net/hsr/hsr_forward.c | 1 +
net/ipv4/devinet.c | 10 +-
net/ipv4/fib_semantics.c | 5 +-
net/ipv4/fib_trie.c | 3 +-
net/ipv4/inet_hashtables.c | 43 ++--
net/ipv4/ip_input.c | 3 +-
net/ipv4/route.c | 1 +
net/ipv4/tcp_output.c | 2 +-
net/ipv4/udp.c | 6 +-
net/ipv6/addrconf.c | 2 +-
net/ipv6/ip6_input.c | 3 +-
net/ipv6/route.c | 3 +
net/kcm/kcmsock.c | 15 +-
net/mptcp/protocol.c | 23 +-
net/netfilter/nfnetlink_osf.c | 8 +
net/netfilter/nft_exthdr.c | 22 +-
net/sched/sch_fq_pie.c | 27 ++-
net/sched/sch_plug.c | 2 +-
net/sched/sch_qfq.c | 22 +-
net/sctp/proc.c | 2 +-
net/sctp/socket.c | 10 +-
net/smc/smc_core.c | 2 +
net/tls/tls_sw.c | 4 +-
net/unix/af_unix.c | 2 +-
net/unix/scm.c | 6 +-
net/xdp/xsk_diag.c | 3 +
scripts/kconfig/preprocess.c | 3 +
scripts/package/mkspec | 2 +-
sound/soc/tegra/tegra210_sfc.c | 31 ++-
sound/soc/tegra/tegra210_sfc.h | 4 +-
tools/perf/builtin-top.c | 1 +
tools/perf/builtin-trace.c | 15 +-
.../pmu-events/arch/powerpc/power10/cache.json | 4 +-
.../arch/powerpc/power10/floating_point.json | 7 -
.../pmu-events/arch/powerpc/power10/frontend.json | 30 +--
.../pmu-events/arch/powerpc/power10/marked.json | 30 +--
.../pmu-events/arch/powerpc/power10/memory.json | 6 +-
.../pmu-events/arch/powerpc/power10/metrics.json | 6 -
.../pmu-events/arch/powerpc/power10/others.json | 53 +++--
.../pmu-events/arch/powerpc/power10/pipeline.json | 30 +--
.../perf/pmu-events/arch/powerpc/power10/pmc.json | 4 +-
.../arch/powerpc/power10/translation.json | 11 +-
tools/perf/tests/shell/stat_bpf_counters.sh | 4 +-
tools/perf/tests/shell/stat_bpf_counters_cgrp.sh | 28 ++-
tools/perf/ui/browsers/hists.c | 60 +++---
tools/perf/util/annotate.c | 10 +-
tools/perf/util/header.c | 11 +-
tools/testing/selftests/kselftest/runner.sh | 3 +-
tools/testing/selftests/lib.mk | 4 +-
232 files changed, 2129 insertions(+), 1340 deletions(-)
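
The closing summary line of the diffstat can be checked mechanically, for example when comparing it against a locally generated `git diff --stat`. The following is a minimal sketch, assuming the usual `N files changed, N insertions(+), N deletions(-)` shape of that line; `parse_diffstat_summary` is a hypothetical helper name, not part of any kernel tooling.

```python
import re

# Matches the last line of a diffstat. The insertion and deletion parts
# are optional because a diffstat omits whichever count is zero.
SUMMARY_RE = re.compile(
    r"(\d+) files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?"
)


def parse_diffstat_summary(line):
    """Return file, insertion, deletion and net line counts (hypothetical helper)."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("not a diffstat summary line: %r" % line)
    files, insertions, deletions = (int(g) if g else 0 for g in m.groups())
    return {
        "files": files,
        "insertions": insertions,
        "deletions": deletions,
        "net": insertions - deletions,
    }


summary = parse_diffstat_summary(
    "232 files changed, 2129 insertions(+), 1340 deletions(-)"
)
print(summary)
```

For the line above, the net change works out to 2129 - 1340 = 789 added lines.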
* Re: [PATCH 6.1 000/219] 6.1.54-rc1 review
2023-09-17 19:12 [PATCH 6.1 000/219] 6.1.54-rc1 review Greg Kroah-Hartman
@ 2023-09-17 20:47 ` SeongJae Park
2023-09-18 5:34 ` Takeshi Ogasawara
` (10 subsequent siblings)
11 siblings, 0 replies; 39+ messages in thread
From: SeongJae Park @ 2023-09-17 20:47 UTC
To: Greg Kroah-Hartman
Cc: stable, patches, linux-kernel, torvalds, akpm, linux, shuah,
patches, lkft-triage, pavel, jonathanh, f.fainelli,
sudipm.mukherjee, srw, rwarsow, conor, damon, SeongJae Park
Hello,
On Sun, 17 Sep 2023 21:12:07 +0200 Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> This is the start of the stable review cycle for the 6.1.54 release.
> There are 219 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.54-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
> and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine.
Attaching the test results summary below. Please note that I retrieved the
kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park <sj@kernel.org>
[1] https://github.com/awslabs/damon-tests/tree/next/corr
[2] 89fc7c511aa5 ("Linux 6.1.54-rc1")
Thanks,
SJ
[...]
---
# .config:1408:warning: override: reassigning to symbol CGROUPS
ok 15 selftests: damon-tests: build_nomemcg.sh
# kselftest dir '/home/sjpark/damon-tests-cont/linux/tools/testing/selftests/damon-tests' is in dirty state.
# the log is at '/home/sjpark/log'.
ok 1 selftests: damon: debugfs_attrs.sh
ok 2 selftests: damon: debugfs_schemes.sh
ok 3 selftests: damon: debugfs_target_ids.sh
ok 4 selftests: damon: debugfs_empty_targets.sh
ok 5 selftests: damon: debugfs_huge_count_read_write.sh
ok 6 selftests: damon: debugfs_duplicate_context_creation.sh
ok 7 selftests: damon: sysfs.sh
ok 1 selftests: damon-tests: kunit.sh
ok 2 selftests: damon-tests: huge_count_read_write.sh
ok 3 selftests: damon-tests: buffer_overflow.sh
ok 4 selftests: damon-tests: rm_contexts.sh
ok 5 selftests: damon-tests: record_null_deref.sh
ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh
ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh
ok 8 selftests: damon-tests: damo_tests.sh
ok 9 selftests: damon-tests: masim-record.sh
ok 10 selftests: damon-tests: build_i386.sh
ok 11 selftests: damon-tests: build_m68k.sh
ok 12 selftests: damon-tests: build_arm64.sh
ok 13 selftests: damon-tests: build_i386_idle_flag.sh
ok 14 selftests: damon-tests: build_i386_highpte.sh
ok 15 selftests: damon-tests: build_nomemcg.sh
PASS
_remote_run_corr.sh SUCCESS
* Re: [PATCH 6.1 000/219] 6.1.54-rc1 review
2023-09-17 19:12 [PATCH 6.1 000/219] 6.1.54-rc1 review Greg Kroah-Hartman
2023-09-17 20:47 ` SeongJae Park
@ 2023-09-18 5:34 ` Takeshi Ogasawara
2023-09-18 6:42 ` Bagas Sanjaya
` (9 subsequent siblings)
11 siblings, 0 replies; 39+ messages in thread
From: Takeshi Ogasawara @ 2023-09-18 5:34 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: stable, patches, linux-kernel, torvalds, akpm, linux, shuah,
patches, lkft-triage, pavel, jonathanh, f.fainelli,
sudipm.mukherjee, srw, rwarsow, conor
Hi Greg
On Mon, Sep 18, 2023 at 5:03 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> This is the start of the stable review cycle for the 6.1.54 release.
> There are 219 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.54-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
6.1.54-rc1 tested.
Build successfully completed.
Boot successfully completed.
No dmesg regressions.
Video output normal.
Sound output normal.
Lenovo ThinkPad X1 Carbon Gen10 (Intel i7-1260P, x86_64, Arch Linux)
Thanks
Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com>
* Re: [PATCH 6.1 000/219] 6.1.54-rc1 review
2023-09-17 19:12 [PATCH 6.1 000/219] 6.1.54-rc1 review Greg Kroah-Hartman
2023-09-17 20:47 ` SeongJae Park
2023-09-18 5:34 ` Takeshi Ogasawara
@ 2023-09-18 6:42 ` Bagas Sanjaya
2023-09-18 11:24 ` Conor Dooley
` (8 subsequent siblings)
11 siblings, 0 replies; 39+ messages in thread
From: Bagas Sanjaya @ 2023-09-18 6:42 UTC (permalink / raw)
To: Greg Kroah-Hartman, stable
Cc: patches, linux-kernel, torvalds, akpm, linux, shuah, patches,
lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw,
rwarsow, conor
On Sun, Sep 17, 2023 at 09:12:07PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.1.54 release.
> There are 219 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
Successfully compiled and installed bindeb-pkgs on my computer (Acer
Aspire E15, Intel Core i3 Haswell). No noticeable regressions.
Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>
--
An old man doll... just what I always wanted! - Clara
* Re: [PATCH 6.1 000/219] 6.1.54-rc1 review
2023-09-17 19:12 [PATCH 6.1 000/219] 6.1.54-rc1 review Greg Kroah-Hartman
` (2 preceding siblings ...)
2023-09-18 6:42 ` Bagas Sanjaya
@ 2023-09-18 11:24 ` Conor Dooley
2023-09-18 12:08 ` Ron Economos
` (7 subsequent siblings)
11 siblings, 0 replies; 39+ messages in thread
From: Conor Dooley @ 2023-09-18 11:24 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: stable, patches, linux-kernel, torvalds, akpm, linux, shuah,
patches, lkft-triage, pavel, jonathanh, f.fainelli,
sudipm.mukherjee, srw, rwarsow, conor
On Sun, Sep 17, 2023 at 09:12:07PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.1.54 release.
> There are 219 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Thanks,
Conor.
* Re: [PATCH 6.1 000/219] 6.1.54-rc1 review
2023-09-17 19:12 [PATCH 6.1 000/219] 6.1.54-rc1 review Greg Kroah-Hartman
` (3 preceding siblings ...)
2023-09-18 11:24 ` Conor Dooley
@ 2023-09-18 12:08 ` Ron Economos
2023-09-18 12:48 ` Jon Hunter
` (6 subsequent siblings)
11 siblings, 0 replies; 39+ messages in thread
From: Ron Economos @ 2023-09-18 12:08 UTC (permalink / raw)
To: Greg Kroah-Hartman, stable
Cc: patches, linux-kernel, torvalds, akpm, linux, shuah, patches,
lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw,
rwarsow, conor
On 9/17/23 12:12 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.1.54 release.
> There are 219 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.54-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos <re@w6rz.net>
* Re: [PATCH 6.1 000/219] 6.1.54-rc1 review
2023-09-17 19:12 [PATCH 6.1 000/219] 6.1.54-rc1 review Greg Kroah-Hartman
` (4 preceding siblings ...)
2023-09-18 12:08 ` Ron Economos
@ 2023-09-18 12:48 ` Jon Hunter
2023-09-18 18:34 ` Florian Fainelli
` (5 subsequent siblings)
11 siblings, 0 replies; 39+ messages in thread
From: Jon Hunter @ 2023-09-18 12:48 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Greg Kroah-Hartman, patches, linux-kernel, torvalds, akpm, linux,
shuah, patches, lkft-triage, pavel, jonathanh, f.fainelli,
sudipm.mukherjee, srw, rwarsow, conor, linux-tegra, stable
On Sun, 17 Sep 2023 21:12:07 +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.1.54 release.
> There are 219 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.54-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
All tests passing for Tegra ...
Test results for stable-v6.1:
11 builds: 11 pass, 0 fail
28 boots: 28 pass, 0 fail
130 tests: 130 pass, 0 fail
Linux version: 6.1.54-rc1-g89fc7c511aa5
Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000,
tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000,
tegra20-ventana, tegra210-p2371-2180,
tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Jon
* Re: [PATCH 6.1 000/219] 6.1.54-rc1 review
2023-09-17 19:12 [PATCH 6.1 000/219] 6.1.54-rc1 review Greg Kroah-Hartman
` (5 preceding siblings ...)
2023-09-18 12:48 ` Jon Hunter
@ 2023-09-18 18:34 ` Florian Fainelli
2023-09-18 18:41 ` Guenter Roeck
` (4 subsequent siblings)
11 siblings, 0 replies; 39+ messages in thread
From: Florian Fainelli @ 2023-09-18 18:34 UTC (permalink / raw)
To: Greg Kroah-Hartman, stable
Cc: patches, linux-kernel, torvalds, akpm, linux, shuah, patches,
lkft-triage, pavel, jonathanh, sudipm.mukherjee, srw, rwarsow,
conor
On 9/17/2023 12:12 PM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.1.54 release.
> There are 219 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.54-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on
BMIPS_GENERIC:
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
--
Florian
* Re: [PATCH 6.1 000/219] 6.1.54-rc1 review
2023-09-17 19:12 [PATCH 6.1 000/219] 6.1.54-rc1 review Greg Kroah-Hartman
` (6 preceding siblings ...)
2023-09-18 18:34 ` Florian Fainelli
@ 2023-09-18 18:41 ` Guenter Roeck
2023-09-18 20:56 ` Naresh Kamboju
` (3 subsequent siblings)
11 siblings, 0 replies; 39+ messages in thread
From: Guenter Roeck @ 2023-09-18 18:41 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: stable, patches, linux-kernel, torvalds, akpm, shuah, patches,
lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw,
rwarsow, conor
On Sun, Sep 17, 2023 at 09:12:07PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.1.54 release.
> There are 219 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000.
> Anything received after that time might be too late.
>
Build results:
total: 157 pass: 157 fail: 0
Qemu test results:
total: 529 pass: 529 fail: 0
Tested-by: Guenter Roeck <linux@roeck-us.net>
Guenter
* Re: [PATCH 6.1 000/219] 6.1.54-rc1 review
2023-09-17 19:12 [PATCH 6.1 000/219] 6.1.54-rc1 review Greg Kroah-Hartman
` (7 preceding siblings ...)
2023-09-18 18:41 ` Guenter Roeck
@ 2023-09-18 20:56 ` Naresh Kamboju
2023-09-18 22:21 ` Shuah Khan
` (2 subsequent siblings)
11 siblings, 0 replies; 39+ messages in thread
From: Naresh Kamboju @ 2023-09-18 20:56 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: stable, patches, linux-kernel, torvalds, akpm, linux, shuah,
patches, lkft-triage, pavel, jonathanh, f.fainelli,
sudipm.mukherjee, srw, rwarsow, conor
On Mon, 18 Sept 2023 at 01:30, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> This is the start of the stable review cycle for the 6.1.54 release.
> There are 219 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.54-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
## Build
* kernel: 6.1.54-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-6.1.y
* git commit: 89fc7c511aa5cd0b21e82ec42611db04d9e3b7c2
* git describe: v6.1.52-813-g89fc7c511aa5
* test details:
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.1.y/build/v6.1.52-813-g89fc7c511aa5
## Test Regressions (compared to v6.1.52)
## Metric Regressions (compared to v6.1.52)
## Test Fixes (compared to v6.1.52)
## Metric Fixes (compared to v6.1.52)
## Test result summary
total: 206086, pass: 176646, fail: 2859, skip: 26303, xfail: 278
## Build Summary
* arc: 10 total, 10 passed, 0 failed
* arm: 284 total, 283 passed, 1 failed
* arm64: 89 total, 87 passed, 2 failed
* i386: 67 total, 65 passed, 2 failed
* mips: 56 total, 54 passed, 2 failed
* parisc: 7 total, 7 passed, 0 failed
* powerpc: 70 total, 68 passed, 2 failed
* riscv: 28 total, 26 passed, 2 failed
* s390: 28 total, 27 passed, 1 failed
* sh: 26 total, 24 passed, 2 failed
* sparc: 14 total, 14 passed, 0 failed
* x86_64: 76 total, 72 passed, 4 failed
## Test suites summary
* boot
* kselftest-android
* kselftest-arm64
* kselftest-breakpoints
* kselftest-capabilities
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-drivers-dma-buf
* kselftest-exec
* kselftest-fpu
* kselftest-ftrace
* kselftest-futex
* kselftest-gpio
* kselftest-intel_pstate
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* kselftest-kexec
* kselftest-lib
* kselftest-membarrier
* kselftest-memfd
* kselftest-memory-hotplug
* kselftest-mincore
* kselftest-mount
* kselftest-mqueue
* kselftest-net
* kselftest-net-forwarding
* kselftest-net-mptcp
* kselftest-netfilter
* kselftest-nsfs
* kselftest-openat2
* kselftest-pid_namespace
* kselftest-pidfd
* kselftest-proc
* kselftest-pstore
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sync
* kselftest-sysctl
* kselftest-tc-testing
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-user_events
* kselftest-vDSO
* kselftest-vm
* kselftest-watchdog
* kselftest-x86
* kunit
* kvm-unit-tests
* libgpiod
* log-parser-boot
* log-parser-test
* ltp-cap_bounds
* ltp-commands
* ltp-containers
* ltp-controllers
* ltp-cpuhotplug
* ltp-crypto
* ltp-cve
* ltp-dio
* ltp-fcntl-locktests
* ltp-filecaps
* ltp-fs
* ltp-fs_bind
* ltp-fs_perms_simple
* ltp-fsx
* ltp-hugetlb
* ltp-io
* ltp-ipc
* ltp-math
* ltp-mm
* ltp-nptl
* ltp-pty
* ltp-sched
* ltp-securebits
* ltp-smoke
* ltp-syscalls
* ltp-tracing
* perf
* rcutorture
--
Linaro LKFT
https://lkft.linaro.org
* Re: [PATCH 6.1 000/219] 6.1.54-rc1 review
2023-09-17 19:12 [PATCH 6.1 000/219] 6.1.54-rc1 review Greg Kroah-Hartman
` (8 preceding siblings ...)
2023-09-18 20:56 ` Naresh Kamboju
@ 2023-09-18 22:21 ` Shuah Khan
[not found] ` <20230917191042.204185566@linuxfoundation.org>
2023-09-21 13:04 ` [PATCH 6.1 000/219] 6.1.54-rc1 review Conor Dooley
11 siblings, 0 replies; 39+ messages in thread
From: Shuah Khan @ 2023-09-18 22:21 UTC (permalink / raw)
To: Greg Kroah-Hartman, stable
Cc: patches, linux-kernel, torvalds, akpm, linux, shuah, patches,
lkft-triage, pavel, jonathanh, f.fainelli, sudipm.mukherjee, srw,
rwarsow, conor, Shuah Khan
On 9/17/23 13:12, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.1.54 release.
> There are 219 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Tue, 19 Sep 2023 19:10:04 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.54-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
thanks,
-- Shuah
* [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
[not found] ` <20230917191042.204185566@linuxfoundation.org>
@ 2023-09-20 8:11 ` Jeremi Piotrowski
2023-09-20 8:43 ` Michal Hocko
2023-09-22 11:14 ` Linux regression tracking #adding (Thorsten Leemhuis)
0 siblings, 2 replies; 39+ messages in thread
From: Jeremi Piotrowski @ 2023-09-20 8:11 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: stable, patches, Michal Hocko, Shakeel Butt, Johannes Weiner,
Roman Gushchin, Muchun Song, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Sun, Sep 17, 2023 at 09:12:40PM +0200, Greg Kroah-Hartman wrote:
> 6.1-stable review patch. If anyone has any objections, please let me know.
>
> ------------------
Hi Greg/Michal,
This commit breaks userspace which makes it a bad commit for mainline and an
even worse commit for stable.
We ingested 6.1.54 into our nightly testing and found that runc fails to gather
cgroup statistics (when reading kmem.limit_in_bytes). The same code is vendored
into kubelet and kubelet fails to start if this operation fails. 6.1.53 is
fine.
> Address this by wiping out the file completely and effectively get back to
> pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
On reads, the runc code checks for MEMCG_KMEM=n by checking
kmem.usage_in_bytes. If it is present then runc expects the other cgroup files
to be there (including kmem.limit_in_bytes). So this change is not effectively
the same.
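To make the failure mode concrete, here is a minimal Python sketch of the read-side check described above. It is not the runc code (runc is written in Go); the function names `read_u64` and `kmem_stats_strict` are hypothetical and only illustrate the logic: the presence of kmem.usage_in_bytes is taken as proof that the rest of the kmem files exist.

```python
import os


def read_u64(path):
    """Read a single-value cgroup file and return it as an int."""
    with open(path) as f:
        return int(f.read().strip())


def kmem_stats_strict(cgroup_dir):
    """Illustrative only: gather kmem stats the way the text describes.

    A missing memory.kmem.usage_in_bytes is treated as MEMCG_KMEM=n and
    tolerated. If the usage file exists, the limit file is assumed to
    exist as well, so a missing limit file surfaces as an error.
    """
    usage_path = os.path.join(cgroup_dir, "memory.kmem.usage_in_bytes")
    if not os.path.exists(usage_path):
        return None  # kmem accounting treated as disabled

    limit_path = os.path.join(cgroup_dir, "memory.kmem.limit_in_bytes")
    return {
        "usage": read_u64(usage_path),
        # Raises FileNotFoundError when only the limit file is gone,
        # which is the situation after this patch.
        "limit": read_u64(limit_path),
    }
```

With a directory that has the usage file but no limit file, the call raises instead of returning, which would correspond to the statistics failure we see on 6.1.54.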
Here's a link to the PR that would be needed to handle this change in userspace
(not merged yet and would need to be propagated through the ecosystem):
https://github.com/opencontainers/runc/pull/4018.
Jeremi
>
> From: Michal Hocko <mhocko@suse.com>
>
> commit 86327e8eb94c52eca4f93cfece2e29d1bf52acbf upstream.
>
> kmem.limit_in_bytes (v1 way to limit kernel memory usage) has been
> deprecated since 58056f77502f ("memcg, kmem: further deprecate
> kmem.limit_in_bytes") merged in 5.16. We haven't heard about any serious
> users since then but it seems that the mere presence of the file is
> causing more harm than good. We (SUSE) have had several bug reports from
> customers where Docker based containers started to fail because a write to
> kmem.limit_in_bytes has failed.
>
> This was unexpected because runc code only expects ENOENT (kmem disabled)
> or EBUSY (tasks already running within cgroup). So a new error code was
> unexpected and the whole container startup failed. This has been later
> addressed by
> https://github.com/opencontainers/runc/commit/52390d68040637dfc77f9fda6bbe70952423d380
> so current Docker runtimes do not suffer from the problem anymore. There
> are still older versions of Docker in use and likely hard to get rid of
> completely.
>
> Address this by wiping out the file completely and effectively get back to
> pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
>
> I would recommend backporting to stable trees which have picked up
> 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes").
>
> [mhocko@suse.com: restore _KMEM switch case]
> Link: https://lkml.kernel.org/r/ZKe5wxdbvPi5Cwd7@dhcp22.suse.cz
> Link: https://lkml.kernel.org/r/20230704115240.14672-1-mhocko@kernel.org
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> Acked-by: Shakeel Butt <shakeelb@google.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
> Cc: Muchun Song <muchun.song@linux.dev>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
> Documentation/admin-guide/cgroup-v1/memory.rst | 2 --
> mm/memcontrol.c | 10 ----------
> 2 files changed, 12 deletions(-)
>
> --- a/Documentation/admin-guide/cgroup-v1/memory.rst
> +++ b/Documentation/admin-guide/cgroup-v1/memory.rst
> @@ -91,8 +91,6 @@ Brief summary of control files.
> memory.oom_control set/show oom controls.
> memory.numa_stat show the number of memory usage per numa
> node
> - memory.kmem.limit_in_bytes This knob is deprecated and writing to
> - it will return -ENOTSUPP.
> memory.kmem.usage_in_bytes show current kernel memory allocation
> memory.kmem.failcnt show the number of kernel memory usage
> hits limits
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3841,10 +3841,6 @@ static ssize_t mem_cgroup_write(struct k
> case _MEMSWAP:
> ret = mem_cgroup_resize_max(memcg, nr_pages, true);
> break;
> - case _KMEM:
> - /* kmem.limit_in_bytes is deprecated. */
> - ret = -EOPNOTSUPP;
> - break;
> case _TCP:
> ret = memcg_update_tcp_max(memcg, nr_pages);
> break;
> @@ -5056,12 +5052,6 @@ static struct cftype mem_cgroup_legacy_f
> },
> #endif
> {
> - .name = "kmem.limit_in_bytes",
> - .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
> - .write = mem_cgroup_write,
> - .read_u64 = mem_cgroup_read_u64,
> - },
> - {
> .name = "kmem.usage_in_bytes",
> .private = MEMFILE_PRIVATE(_KMEM, RES_USAGE),
> .read_u64 = mem_cgroup_read_u64,
>
>
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 8:11 ` [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes Jeremi Piotrowski
@ 2023-09-20 8:43 ` Michal Hocko
2023-09-20 9:25 ` Greg Kroah-Hartman
2023-09-20 10:04 ` Jeremi Piotrowski
2023-09-22 11:14 ` Linux regression tracking #adding (Thorsten Leemhuis)
1 sibling, 2 replies; 39+ messages in thread
From: Michal Hocko @ 2023-09-20 8:43 UTC (permalink / raw)
To: Jeremi Piotrowski
Cc: Greg Kroah-Hartman, stable, patches, Shakeel Butt,
Johannes Weiner, Roman Gushchin, Muchun Song, Tejun Heo,
Andrew Morton, linux-kernel, regressions, mathieu.tortuyaux
On Wed 20-09-23 01:11:01, Jeremi Piotrowski wrote:
> On Sun, Sep 17, 2023 at 09:12:40PM +0200, Greg Kroah-Hartman wrote:
> > 6.1-stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
>
> Hi Greg/Michal,
>
> This commit breaks userspace which makes it a bad commit for mainline and an
> even worse commit for stable.
>
> We ingested 6.1.54 into our nightly testing and found that runc fails to gather
> cgroup statistics (when reading kmem.limit_in_bytes). The same code is vendored
> into kubelet and kubelet fails to start if this operation fails. 6.1.53 is
> fine.
Could you expand some more on why the file is read? It hasn't supported
writes for some time, so how does reading it help in any sense?
Anyway, I do agree that the stable backport should be reverted.
> > Address this by wiping out the file completely and effectively get back to
> > pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
>
> On reads, the runc code checks for MEMCG_KMEM=n by checking
> kmem.usage_in_bytes. If it is present then runc expects the other cgroup files
> to be there (including kmem.limit_in_bytes). So this change is not effectively
> the same.
>
> Here's a link to the PR that would be needed to handle this change in userspace
> (not merged yet and would need to be propagated through the ecosystem):
>
> https://github.com/opencontainers/runc/pull/4018.
Thanks. Does that mean the revert is still necessary for the Linus tree
or do you expect that the fix can be merged and propagated in a
reasonable time?
--
Michal Hocko
SUSE Labs
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 8:43 ` Michal Hocko
@ 2023-09-20 9:25 ` Greg Kroah-Hartman
2023-09-20 10:21 ` Jeremi Piotrowski
2023-09-20 10:04 ` Jeremi Piotrowski
1 sibling, 1 reply; 39+ messages in thread
From: Greg Kroah-Hartman @ 2023-09-20 9:25 UTC (permalink / raw)
To: Michal Hocko, Jeremi Piotrowski
Cc: stable, patches, Shakeel Butt, Johannes Weiner, Roman Gushchin,
Muchun Song, Tejun Heo, Andrew Morton, linux-kernel, regressions,
mathieu.tortuyaux
On Wed, Sep 20, 2023 at 10:43:56AM +0200, Michal Hocko wrote:
> On Wed 20-09-23 01:11:01, Jeremi Piotrowski wrote:
> > On Sun, Sep 17, 2023 at 09:12:40PM +0200, Greg Kroah-Hartman wrote:
> > > 6.1-stable review patch. If anyone has any objections, please let me know.
> > >
> > > ------------------
> >
> > Hi Greg/Michal,
> >
> > This commit breaks userspace which makes it a bad commit for mainline and an
> > even worse commit for stable.
> >
> > We ingested 6.1.54 into our nightly testing and found that runc fails to gather
> > cgroup statistics (when reading kmem.limit_in_bytes). The same code is vendored
> > into kubelet and kubelet fails to start if this operation fails. 6.1.53 is
> > fine.
>
> Could you expand some more on why the file is read? It hasn't supported
> writes for some time, so how does reading it help in any sense?
>
> Anyway, I do agree that the stable backport should be reverted.
That will just postpone the breakage, we really shouldn't break
userspace.
That being said, having userspace "break" because a file is no longer
present is not good coding style on the userspace side at all. That's
why we have sysfs and single-value-files now, if the file isn't present,
then userspace instantly notices and can handle it. Much easier than
the old-style multi-fields-in-one-file problem.
> > > Address this by wiping out the file completely and effectively get back to
> > > pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
The fact that this is a valid option (i.e. no file) with that config
option disabled makes me want to keep this as well, as how does
userspace handle this option disabled at all? Or old kernels?
I can drop this from stable kernels, but again, this feels like the runc
developers are just postponing the problem...
thanks,
greg k-h
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 8:43 ` Michal Hocko
2023-09-20 9:25 ` Greg Kroah-Hartman
@ 2023-09-20 10:04 ` Jeremi Piotrowski
2023-09-20 11:07 ` Michal Hocko
1 sibling, 1 reply; 39+ messages in thread
From: Jeremi Piotrowski @ 2023-09-20 10:04 UTC (permalink / raw)
To: Michal Hocko
Cc: Greg Kroah-Hartman, stable, patches, Shakeel Butt,
Johannes Weiner, Roman Gushchin, Muchun Song, Tejun Heo,
Andrew Morton, linux-kernel, regressions, mathieu.tortuyaux
On 9/20/2023 10:43 AM, Michal Hocko wrote:
> On Wed 20-09-23 01:11:01, Jeremi Piotrowski wrote:
>> On Sun, Sep 17, 2023 at 09:12:40PM +0200, Greg Kroah-Hartman wrote:
>>> 6.1-stable review patch. If anyone has any objections, please let me know.
>>>
>>> ------------------
>>
>> Hi Greg/Michal,
>>
>> This commit breaks userspace which makes it a bad commit for mainline and an
>> even worse commit for stable.
>>
>> We ingested 6.1.54 into our nightly testing and found that runc fails to gather
>> cgroup statistics (when reading kmem.limit_in_bytes). The same code is vendored
>> into kubelet and kubelet fails to start if this operation fails. 6.1.53 is
>> fine.
>
> Could you expand some more on why the file is read? It hasn't supported
> writes for some time, so how does reading it help in any sense?
>
> Anyway, I do agree that the stable backport should be reverted.
>
This file is read together with all the other memcg files. Each prefix:
    memory
    memory.memsw
    memory.kmem
    memory.kmem.tcp
is combined with these suffixes:
    .usage_in_bytes
    .max_usage_in_bytes
    .failcnt
    .limit_in_bytes
and read; the values are then forwarded on to other components for scheduling
decisions. You want to know the limit when checking the usage (is the usage
close to the limit or not).
Userspace tolerates MEMCG/MEMCG_KMEM being disabled, but having a single file out of the
set missing is an anomaly. So maybe we could keep the dummy file just for the
sake of consistency? Cgroupv1 is legacy after all.
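The naming scheme above can be sketched in a few lines of Python. This is an illustration, not the runc implementation: `memcg_file_names` and `read_memcg_values` are hypothetical names, and the choice to map a missing file to `None` is an assumption about what a tolerant reader might do.

```python
import os

PREFIXES = ("memory", "memory.memsw", "memory.kmem", "memory.kmem.tcp")
SUFFIXES = (".usage_in_bytes", ".max_usage_in_bytes", ".failcnt",
            ".limit_in_bytes")


def memcg_file_names():
    """Return every prefix/suffix combination (4 x 4 = 16 file names)."""
    return [prefix + suffix for prefix in PREFIXES for suffix in SUFFIXES]


def read_memcg_values(cgroup_dir):
    """Read each memcg file; a missing file maps to None (assumed policy)."""
    values = {}
    for name in memcg_file_names():
        try:
            with open(os.path.join(cgroup_dir, name)) as f:
                values[name] = int(f.read().strip())
        except FileNotFoundError:
            values[name] = None
    return values
```

A reader written this way would keep working when memory.kmem.limit_in_bytes disappears, but the consumer of the values still has to decide what a usage without a limit means.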
>>> Address this by wiping out the file completely and effectively get back to
>>> pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
>>
>> On reads, the runc code checks for MEMCG_KMEM=n by checking
>> kmem.usage_in_bytes. If it is present then runc expects the other cgroup files
>> to be there (including kmem.limit_in_bytes). So this change is not effectively
>> the same.
>>
>> Here's a link to the PR that would be needed to handle this change in userspace
>> (not merged yet and would need to be propagated through the ecosystem):
>>
>> https://github.com/opencontainers/runc/pull/4018.
>
> Thanks. Does that mean the revert is still necessary for the Linus tree
> or do you expect that the fix can be merged and propagated in a
> reasonable time?
>
We can probably get runc and currently supported kubernetes versions patched in time
before 6.6 (or the next LTS kernel) hits LTS distros.
But there's still a bunch of users running cgroupv1 with unsupported kubernetes
versions that are still taking kernel updates as they come, so this might get reported
again next year if it stays in mainline.
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 9:25 ` Greg Kroah-Hartman
@ 2023-09-20 10:21 ` Jeremi Piotrowski
2023-09-20 10:45 ` Greg Kroah-Hartman
0 siblings, 1 reply; 39+ messages in thread
From: Jeremi Piotrowski @ 2023-09-20 10:21 UTC (permalink / raw)
To: Greg Kroah-Hartman, Michal Hocko
Cc: stable, patches, Shakeel Butt, Johannes Weiner, Roman Gushchin,
Muchun Song, Tejun Heo, Andrew Morton, linux-kernel, regressions,
mathieu.tortuyaux
On 9/20/2023 11:25 AM, Greg Kroah-Hartman wrote:
> On Wed, Sep 20, 2023 at 10:43:56AM +0200, Michal Hocko wrote:
>> On Wed 20-09-23 01:11:01, Jeremi Piotrowski wrote:
>>> On Sun, Sep 17, 2023 at 09:12:40PM +0200, Greg Kroah-Hartman wrote:
>>>> 6.1-stable review patch. If anyone has any objections, please let me know.
>>>>
>>>> ------------------
>>>
>>> Hi Greg/Michal,
>>>
>>> This commit breaks userspace which makes it a bad commit for mainline and an
>>> even worse commit for stable.
>>>
>>> We ingested 6.1.54 into our nightly testing and found that runc fails to gather
>>> cgroup statistics (when reading kmem.limit_in_bytes). The same code is vendored
>>> into kubelet and kubelet fails to start if this operation fails. 6.1.53 is
>>> fine.
>>
>> Could you expand some more on why the file is read? It hasn't supported
>> writes for some time, so how does reading it help in any sense?
>>
>> Anyway, I do agree that the stable backport should be reverted.
>
> That will just postpone the breakage, we really shouldn't break
> userspace.
>
> That being said, having userspace "break" because a file is no longer
> present is not good coding style on the userspace side at all. That's
> why we have sysfs and single-value-files now, if the file isn't present,
> then userspace instantly notices and can handle it. Much easier than
> the old-style multi-fields-in-one-file problem.
>
The memcg files in this case are single-value, but userspace expects to be able
to read memcg limits when it can read the usage (indicating MEMCG is enabled).
If it can't - then something is off, and the node is marked unhealthy.
>>>> Address this by wiping out the file completely and effectively get back to
>>>> pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
>
> The fact that this is a valid option (i.e. no file) with that config
> option disabled makes me want to keep this as well, as how does
> userspace handle this option disabled at all? Or old kernels?
>
Userspace has had to handle the case of MEMCG_KMEM=n, but that had 2 cases so far:
limits/usage/max_usage/failcnt files are all available or none of them are available.
Now it needs to handle 3 of 4 files being available, but only for kmem (and not plain
memory, memsw or kmem.tcp). That's an inconsistency.
> I can drop this from stable kernels, but again, this feels like the runc
> developers are just postponing the problem...
>
Since cgroups v1 is deprecated, I think the runc developers haven't touched this part
of the code in years and expected it to keep working while they wait for the long tail
of usage to die out.
> thanks,
>
> greg k-h
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 10:21 ` Jeremi Piotrowski
@ 2023-09-20 10:45 ` Greg Kroah-Hartman
2023-09-20 11:08 ` Michal Hocko
0 siblings, 1 reply; 39+ messages in thread
From: Greg Kroah-Hartman @ 2023-09-20 10:45 UTC (permalink / raw)
To: Jeremi Piotrowski
Cc: Michal Hocko, stable, patches, Shakeel Butt, Johannes Weiner,
Roman Gushchin, Muchun Song, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Wed, Sep 20, 2023 at 12:21:37PM +0200, Jeremi Piotrowski wrote:
> On 9/20/2023 11:25 AM, Greg Kroah-Hartman wrote:
> > On Wed, Sep 20, 2023 at 10:43:56AM +0200, Michal Hocko wrote:
> >> On Wed 20-09-23 01:11:01, Jeremi Piotrowski wrote:
> >>> On Sun, Sep 17, 2023 at 09:12:40PM +0200, Greg Kroah-Hartman wrote:
> >>>> 6.1-stable review patch. If anyone has any objections, please let me know.
> >>>>
> >>>> ------------------
> >>>
> >>> Hi Greg/Michal,
> >>>
> >>> This commit breaks userspace which makes it a bad commit for mainline and an
> >>> even worse commit for stable.
> >>>
> >>> We ingested 6.1.54 into our nightly testing and found that runc fails to gather
> >>> cgroup statistics (when reading kmem.limit_in_bytes). The same code is vendored
> >>> into kubelet and kubelet fails to start if this operation fails. 6.1.53 is
> >>> fine.
> >>
> >> Could you expand some more on why is the file read? It doesn't support
> >> writing to it for some time so how does reading it helps in any sense?
> >>
> >> Anyway, I do agree that the stable backport should be reverted.
> >
> > That will just postpone the breakage, we really shouldn't break
> > userspace.
> >
> > That being said, having userspace "break" because a file is no longer
> > present is not good coding style on the userspace side at all. That's
> > why we have sysfs and single-value-files now, if the file isn't present,
> > then userspace instantly notices and can handle it. Much easier than
> > the old-style multi-fields-in-one-file problem.
> >
>
> The memcg files in this case are single-value, but userspace expects to be able
> to read memcg limits when it can read the usage (indicating MEMCG is enabled).
> If it can't - then something is off, and the node is marked unhealthy.
>
> >>>> Address this by wiping out the file completely and effectively get back to
> >>>> pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
> >
> > The fact that this is a valid option (i.e. no file) with that config
> > option disabled makes me want to keep this as well, as how does
> > userspace handle this option disabled at all? Or old kernels?
> >
>
> Userspace has had to handle the case of MEMCG_KMEM=n, but that had 2 cases so far:
>
> limits/usage/max_usage/failcnt files are all available or none of them are available.
>
> Now it needs to handle 3 of 4 files being available, but only for kmem (and not plain
> memory, memsw or kmem.tcp). That's an inconsistency.
>
> > I can drop this from stable kernels, but again, this feels like the runc
> > developers are just postponing the problem...
> >
>
> Since cgroups v1 is deprecated, I think the runc developers haven't touched this part
> of the code in years and expected it to keep working while they wait for the long tail
> of usage to die out.
Ok, then we should revert this, I'll go drop it in the stable trees, it
should also be reverted in Linus's tree too.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 10:04 ` Jeremi Piotrowski
@ 2023-09-20 11:07 ` Michal Hocko
2023-09-20 13:25 ` Jeremi Piotrowski
0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2023-09-20 11:07 UTC (permalink / raw)
To: Jeremi Piotrowski
Cc: Greg Kroah-Hartman, stable, patches, Shakeel Butt,
Johannes Weiner, Roman Gushchin, Muchun Song, Tejun Heo,
Andrew Morton, linux-kernel, regressions, mathieu.tortuyaux
On Wed 20-09-23 12:04:48, Jeremi Piotrowski wrote:
> On 9/20/2023 10:43 AM, Michal Hocko wrote:
> > On Wed 20-09-23 01:11:01, Jeremi Piotrowski wrote:
> >> On Sun, Sep 17, 2023 at 09:12:40PM +0200, Greg Kroah-Hartman wrote:
> >>> 6.1-stable review patch. If anyone has any objections, please let me know.
> >>>
> >>> ------------------
> >>
> >> Hi Greg/Michal,
> >>
> >> This commit breaks userspace which makes it a bad commit for mainline and an
> >> even worse commit for stable.
> >>
> >> We ingested 6.1.54 into our nightly testing and found that runc fails to gather
> >> cgroup statistics (when reading kmem.limit_in_bytes). The same code is vendored
> >> into kubelet and kubelet fails to start if this operation fails. 6.1.53 is
> >> fine.
> >
> > Could you expand some more on why is the file read? It doesn't support
> > writing to it for some time so how does reading it helps in any sense?
> >
> > Anyway, I do agree that the stable backport should be reverted.
> >
>
> This file is read together with all the other memcg files. Each prefix:
>
> memory
> memory.memsw
> memory.kmem
> memory.kmem.tcp
>
> is combined with these suffixes
>
> .usage_in_bytes
> .max_usage_in_bytes
> .failcnt
> .limit_in_bytes
>
> and read, the values are then forwarded on to other components for scheduling decisions.
> You want to know the limit when checking the usage (is the usage close to the limit or not).
You know there is no kmem limit as there is no way to set it for some
time (since 5.16 - i.e. 2 years ago). I can see that users following old
kernels could have missed that though.
> Userspace tolerates MEMCG/MEMCG_KMEM being disabled, but having a single file out of the
> set missing is an anomaly. So maybe we could keep the dummy file just for the
> sake of consistency? Cgroupv1 is legacy after all.
What we had was a dummy file. It did not accept writes, so it
would always have reported unlimited. The reason I decided to remove
the file was that there were other users who could not handle the
write failure but are just fine with the file missing. So we are
effectively between a rock and a hard place here. Either way something is
broken. The other SW got fixed as well but, similar to your case, it takes
some time for the change to be absorbed by all 3rd party users.
> >>> Address this by wiping out the file completely and effectively get back to
> >>> pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
> >>
> >> On reads, the runc code checks for MEMCG_KMEM=n by checking
> >> kmem.usage_in_bytes.
Just one side note. Config options get renamed and their semantics
change over time, so I would just recommend never depending
on any specific one.
> >> If it is present then runc expects the other cgroup files
> >> to be there (including kmem.limit_in_bytes). So this change is not effectively
> >> the same.
> >>
> >> Here's a link to the PR that would be needed to handle this change in userspace
> >> (not merged yet and would need to be propagated through the ecosystem):
> >>
> >> https://github.com/opencontainers/runc/pull/4018.
> >
> > Thanks. Does that mean the revert is still necessary for the Linus tree
> > or do you expect that the fix can be merged and propagated in a
> > reasonable time?
> >
>
> We can probably get runc and currently supported kubernetes versions patched in time
> before 6.6 (or the next LTS kernel) hits LTS distros.
>
> But there's still a bunch of users running cgroupv1 with unsupported kubernetes
> versions that are still taking kernel updates as they come, so this might get reported
> again next year if it stays in mainline.
I can see how 3rd party users are hard to get aligned but having a fix
available should allow them to apply it or is there any actual roadblock
for them to adapt as soon as they hit the issue?
I mean, normally I would be just fine reverting this API change because
it is disruptive but the only way to have the file available and not
break somebody is to revert 58056f77502f ("memcg, kmem: further
deprecate kmem.limit_in_bytes") as well. Or to ignore any value written
there but that sounds rather dubious. Although one could argue this
would mimic nokmem kernel option.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 10:45 ` Greg Kroah-Hartman
@ 2023-09-20 11:08 ` Michal Hocko
2023-09-20 11:16 ` Greg Kroah-Hartman
0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2023-09-20 11:08 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Jeremi Piotrowski, stable, patches, Shakeel Butt, Johannes Weiner,
Roman Gushchin, Muchun Song, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Wed 20-09-23 12:45:08, Greg KH wrote:
[...]
> Ok, then we should revert this, I'll go drop it in the stable trees, it
> should also be reverted in Linus's tree too.
A simple revert would break other users, as noted in my other response, so
please wait with sending reverts to Linus until we agree on the least painful
solution.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 11:08 ` Michal Hocko
@ 2023-09-20 11:16 ` Greg Kroah-Hartman
0 siblings, 0 replies; 39+ messages in thread
From: Greg Kroah-Hartman @ 2023-09-20 11:16 UTC (permalink / raw)
To: Michal Hocko
Cc: Jeremi Piotrowski, stable, patches, Shakeel Butt, Johannes Weiner,
Roman Gushchin, Muchun Song, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Wed, Sep 20, 2023 at 01:08:43PM +0200, Michal Hocko wrote:
> On Wed 20-09-23 12:45:08, Greg KH wrote:
> [...]
> > Ok, then we should revert this, I'll go drop it in the stable trees, it
> > should also be reverted in Linus's tree too.
>
> A simple revert would break other users as noted in other response so
> wait with sending reverts to Linus before we agree on the least painful
> solution.
A revert should cause the systems that stopped working to start working
again, so I'll keep the revert in the stable trees and wait for you to
work out the real solution in Linus's tree and then backport all of them
as needed.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 11:07 ` Michal Hocko
@ 2023-09-20 13:25 ` Jeremi Piotrowski
2023-09-20 13:47 ` Michal Hocko
0 siblings, 1 reply; 39+ messages in thread
From: Jeremi Piotrowski @ 2023-09-20 13:25 UTC (permalink / raw)
To: Michal Hocko
Cc: Greg Kroah-Hartman, stable, patches, Shakeel Butt,
Johannes Weiner, Roman Gushchin, Muchun Song, Tejun Heo,
Andrew Morton, linux-kernel, regressions, mathieu.tortuyaux
On 9/20/2023 1:07 PM, Michal Hocko wrote:
> On Wed 20-09-23 12:04:48, Jeremi Piotrowski wrote:
>> On 9/20/2023 10:43 AM, Michal Hocko wrote:
>>> On Wed 20-09-23 01:11:01, Jeremi Piotrowski wrote:
>>>> On Sun, Sep 17, 2023 at 09:12:40PM +0200, Greg Kroah-Hartman wrote:
>>>>> 6.1-stable review patch. If anyone has any objections, please let me know.
>>>>>
>>>>> ------------------
>>>>
>>>> Hi Greg/Michal,
>>>>
>>>> This commit breaks userspace which makes it a bad commit for mainline and an
>>>> even worse commit for stable.
>>>>
>>>> We ingested 6.1.54 into our nightly testing and found that runc fails to gather
>>>> cgroup statistics (when reading kmem.limit_in_bytes). The same code is vendored
>>>> into kubelet and kubelet fails to start if this operation fails. 6.1.53 is
>>>> fine.
>>>
>>> Could you expand some more on why is the file read? It doesn't support
>>> writing to it for some time so how does reading it helps in any sense?
>>>
>>> Anyway, I do agree that the stable backport should be reverted.
>>>
>>
>> This file is read together with all the other memcg files. Each prefix:
>>
>> memory
>> memory.memsw
>> memory.kmem
>> memory.kmem.tcp
>>
>> is combined with these suffixes
>>
>> .usage_in_bytes
>> .max_usage_in_bytes
>> .failcnt
>> .limit_in_bytes
>>
>> and read, the values are then forwarded on to other components for scheduling decisions.
>> You want to know the limit when checking the usage (is the usage close to the limit or not).
>
> You know there is no kmem limit as there is no way to set it for some
> time (since 5.16 - i.e. 2 years ago). I can see that users following old
> kernels could have missed that though.
I know what you mean, but I think this generally went unnoticed because the limit file is read
unconditionally, but only written when a kmem limit is explicitly requested for a specific
container, which is rarely (if ever) done.
Regarding following old kernels: a majority of kubernetes users are still on 5.15 and only slowly
started shifting to >=6.1 very recently (this summer). This is mostly driven by distro vendor
policies which tend to follow the pattern of "follow LTS kernels but don't switch to the next
LTS immediately".
I know this is far from ideal for reporting these kinds of issues; I would love to report
them as soon as a kernel release happens.
>
>> Userspace tolerates MEMCG/MEMCG_KMEM being disabled, but having a single file out of the
>> set missing is an anomaly. So maybe we could keep the dummy file just for the
>> sake of consistency? Cgroupv1 is legacy after all.
>
> What we had was a dummy file. It didn't allow to write any value so it
> would have always reported unlimited. The reason I've decided to remove
> the file was that there were other users not being able to handle the
> write failure while they are just fine not having the file. So we are
> effectively between a rock and hard place here. Either way something is
> broken. The other SW got fixed as well but similar to your case it takes
> some time to absorb the change through all 3rd party users.
>
>>>>> Address this by wiping out the file completely and effectively get back to
>>>>> pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
>>>>
>>>> On reads, the runc code checks for MEMCG_KMEM=n by checking
>>>> kmem.usage_in_bytes.
>
> Just one side note. Config options get renamed and their semantic
> changes over time so I would just recommend to never make any
> dependencies on any specific one.
>
Right, what I meant is that the logic is as follows, checking the "usage"
file to determine whether the controller is available:
	value, err := fscommon.GetCgroupParamUint(path, usage)
	if err != nil {
		if name != "" && os.IsNotExist(err) {
			// Ignore ENOENT as swap and kmem controllers
			// are optional in the kernel.
			return cgroups.MemoryData{}, nil
		}
		return cgroups.MemoryData{}, err
	}
and if it is, then it proceeds to read "limit_in_bytes" and the others.
>>>> If it is present then runc expects the other cgroup files
>>>> to be there (including kmem.limit_in_bytes). So this change is not effectively
>>>> the same.
>>>>
>>>> Here's a link to the PR that would be needed to handle this change in userspace
>>>> (not merged yet and would need to be propagated through the ecosystem):
>>>>
>>>> https://github.com/opencontainers/runc/pull/4018.
>>>
>>> Thanks. Does that mean the revert is still necessary for the Linus tree
>>> or do you expect that the fix can be merged and propagated in a
>>> reasonable time?
>>>
>>
>> We can probably get runc and currently supported kubernetes versions patched in time
>> before 6.6 (or the next LTS kernel) hits LTS distros.
>>
>> But there's still a bunch of users running cgroupv1 with unsupported kubernetes
>> versions that are still taking kernel updates as they come, so this might get reported
>> again next year if it stays in mainline.
>
> I can see how 3rd party users are hard to get aligned but having a fix
> available should allow them to apply it or is there any actual roadblock
> for them to adapt as soon as they hit the issue?
>
The issue with this is that these users are running a frozen set of kubernetes (+runc)
binaries, but still pull kernel updates from the base distro. These kubernetes versions
are out of maintenance so the code will not get fixed and no one will release fixed
binaries.
> I mean, normally I would be just fine reverting this API change because
> it is disruptive but the only way to have the file available and not
> break somebody is to revert 58056f77502f ("memcg, kmem: further
> deprecate kmem.limit_in_bytes") as well. Or to ignore any value written
> there but that sounds rather dubious. Although one could argue this
> would mimic nokmem kernel option.
>
I just want to make sure we don't introduce yet another new behavior in this legacy
system. I have not seen breakage due to 58056f77502f. Mimicking nokmem sounds good, but
does this mean "don't enforce limits" (that should be fine) or "ignore writes to the limit"
(= don't even store the written limit)? The latter might have unintended consequences.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 13:25 ` Jeremi Piotrowski
@ 2023-09-20 13:47 ` Michal Hocko
2023-09-20 15:32 ` Shakeel Butt
2023-09-22 23:00 ` Roman Gushchin
0 siblings, 2 replies; 39+ messages in thread
From: Michal Hocko @ 2023-09-20 13:47 UTC (permalink / raw)
To: Jeremi Piotrowski, Shakeel Butt, Johannes Weiner, Roman Gushchin,
Muchun Song
Cc: Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Wed 20-09-23 15:25:23, Jeremi Piotrowski wrote:
> On 9/20/2023 1:07 PM, Michal Hocko wrote:
[...]
> > I mean, normally I would be just fine reverting this API change because
> > it is disruptive but the only way to have the file available and not
> > break somebody is to revert 58056f77502f ("memcg, kmem: further
> > deprecate kmem.limit_in_bytes") as well. Or to ignore any value written
> > there but that sounds rather dubious. Although one could argue this
> > would mimic nokmem kernel option.
> >
>
> I just want to make sure we don't introduce yet another new behavior in this legacy
> system. I have not seen breakage due to 58056f77502f. Mimicking nokmem sounds good but
> does this mean "don't enforce limits" (that should be fine) or "ignore writes to the limit"
> (=don't even store the written limit). The latter might have unintended consequences.
Yes, it would mean that the limit is never enforced. Bad as it is, the
thing is that the hard limit on kernel memory is broken by design and
unfixable. It causes all sorts of unexpected kernel allocation
failures, so it is simply unsafe to use.
All that being said I can see the following options
1) keep the current upstream status and not export the file
2) revert both 58056f77502f and 86327e8eb94 and make it clear
that kmem.limit_in_bytes is unsupported so failures or misbehavior
as a result of the limit being hit are likely not going to be
investigated or fixed.
3) revert as in 2) but never enforce the limit (so basically nokmem
semantics)
Shakeel, Johannes, Roman, Muchun Song what do you think?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 13:47 ` Michal Hocko
@ 2023-09-20 15:32 ` Shakeel Butt
2023-09-20 16:55 ` Michal Hocko
2023-09-22 23:00 ` Roman Gushchin
1 sibling, 1 reply; 39+ messages in thread
From: Shakeel Butt @ 2023-09-20 15:32 UTC (permalink / raw)
To: Michal Hocko
Cc: Jeremi Piotrowski, Johannes Weiner, Roman Gushchin, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Wed, Sep 20, 2023 at 6:47 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 20-09-23 15:25:23, Jeremi Piotrowski wrote:
> > On 9/20/2023 1:07 PM, Michal Hocko wrote:
> [...]
> > > I mean, normally I would be just fine reverting this API change because
> > > it is disruptive but the only way to have the file available and not
> > > break somebody is to revert 58056f77502f ("memcg, kmem: further
> > > deprecate kmem.limit_in_bytes") as well. Or to ignore any value written
> > > there but that sounds rather dubious. Although one could argue this
> > > would mimic nokmem kernel option.
> > >
> >
> > I just want to make sure we don't introduce yet another new behavior in this legacy
> > system. I have not seen breakage due to 58056f77502f. Mimicking nokmem sounds good but
> > does this mean "don't enforce limits" (that should be fine) or "ignore writes to the limit"
> > (=don't even store the written limit). The latter might have unintended consequences.
>
> Yes it would mean that the limit is never enforced. Bad as it is the
> thing is that the hard limit on kernel memory is broken by design and
> unfixable. This causes all sorts of unexpected kernel allocation
> failures that this is simply unsafe to use.
>
> All that being said I can see the following options
> 1) keep the current upstream status and not export the file
> 2) revert both 58056f77502f and 86327e8eb94 and make it clear
> that kmem.limit_in_bytes is unsupported so failures or misbehavior
> as a result of the limit being hit are likely not going to be
> investigated or fixed.
> 3) reverting like in 2) but never enforce the limit (so basically nokmem
> semantic)
>
> Shakeel, Johannes, Roman, Muchun Song what do you think?
I think the safe option would be to revert 86327e8eb94 for now and put
pr_warn_once even for the read of kmem.limit_in_bytes? We can retry
86327e8eb94 in a year or so.
However personally I would prefer option 1. Also I don't think
reverting 58056f77502f would give any benefit.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 15:32 ` Shakeel Butt
@ 2023-09-20 16:55 ` Michal Hocko
2023-09-20 19:46 ` Shakeel Butt
0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2023-09-20 16:55 UTC (permalink / raw)
To: Shakeel Butt
Cc: Jeremi Piotrowski, Johannes Weiner, Roman Gushchin, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Wed 20-09-23 08:32:42, Shakeel Butt wrote:
> Also I don't think reverting 58056f77502f would give any benefit.
Not reverting 58056f77502f would re-introduce the regression in some
non-patched versions of Docker runtimes which cannot handle ENOTSUPP.
So I think we need to revert both or neither of them. I would prefer the
latter (option 1) as the fix is trivial, but I do understand the headache
of chasing all those outdated deployments or vendor code forks.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 16:55 ` Michal Hocko
@ 2023-09-20 19:46 ` Shakeel Butt
2023-09-20 20:08 ` Michal Hocko
0 siblings, 1 reply; 39+ messages in thread
From: Shakeel Butt @ 2023-09-20 19:46 UTC (permalink / raw)
To: Michal Hocko
Cc: Jeremi Piotrowski, Johannes Weiner, Roman Gushchin, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Wed, Sep 20, 2023 at 9:55 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 20-09-23 08:32:42, Shakeel Butt wrote:
> > Also I don't think reverting 58056f77502f would give any benefit.
>
> Not reverting 58056f77502f would re-introduce the regression in some
> non-patched versions of Docker runtimes which cannot handle ENOTSUPP.
> So I think we need to revert both or none of them. I would prefer the
> latter (option 1) as the fix is trivial but I do understand headache
> of chasing all those outdated deployments or vendor code forks.
I think that would be too conservative an approach but I don't
have a strong opinion against it. Also just to be clear we are not
talking about full revert of 58056f77502f but just the returning of
EOPNOTSUPP, right?
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 19:46 ` Shakeel Butt
@ 2023-09-20 20:08 ` Michal Hocko
2023-09-20 21:46 ` Shakeel Butt
0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2023-09-20 20:08 UTC (permalink / raw)
To: Shakeel Butt
Cc: Jeremi Piotrowski, Johannes Weiner, Roman Gushchin, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Wed 20-09-23 12:46:23, Shakeel Butt wrote:
> On Wed, Sep 20, 2023 at 9:55 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 20-09-23 08:32:42, Shakeel Butt wrote:
> > > Also I don't think reverting 58056f77502f would give any benefit.
> >
> > Not reverting 58056f77502f would re-introduce the regression in some
> > non-patched versions of Docker runtimes which cannot handle ENOTSUPP.
> > So I think we need to revert both or none of them. I would prefer the
> > latter (option 1) as the fix is trivial but I do understand headache
> > of chasing all those outdated deployments or vendor code forks.
>
> I think that would be too conservative an approach but I don't
Well, TBH I do not really see any difference between one set of broken
userspace or the other. Both are making assumptions about our interfaces
and they do not overlap, unfortunately.
> have a strong opinion against it. Also just to be clear we are not
> talking about full revert of 58056f77502f but just the returning of
> EOPNOTSUPP, right?
If we allow the limit to be set without returning a failure then we
still have options 2 and 3 on how to deal with that. One of them is to
enforce the limit.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 20:08 ` Michal Hocko
@ 2023-09-20 21:46 ` Shakeel Butt
2023-09-21 7:52 ` Michal Hocko
0 siblings, 1 reply; 39+ messages in thread
From: Shakeel Butt @ 2023-09-20 21:46 UTC (permalink / raw)
To: Michal Hocko
Cc: Jeremi Piotrowski, Johannes Weiner, Roman Gushchin, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Wed, Sep 20, 2023 at 1:08 PM Michal Hocko <mhocko@suse.com> wrote:
>
[...]
> > have a strong opinion against it. Also just to be clear we are not
> > talking about full revert of 58056f77502f but just the returning of
> > EOPNOTSUPP, right?
>
> If we allow the limit to be set without returning a failure then we
> still have options 2 and 3 on how to deal with that. One of them is to
> enforce the limit.
>
Option 3 is a partial revert of 58056f77502f where we keep the no
limit enforcement and remove the EOPNOTSUPP return on write. Let's go
with option 3. In addition, let's add pr_warn_once on the read of
kmem.limit_in_bytes as well.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 21:46 ` Shakeel Butt
@ 2023-09-21 7:52 ` Michal Hocko
2023-09-21 10:43 ` Jeremi Piotrowski
0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2023-09-21 7:52 UTC (permalink / raw)
To: Shakeel Butt
Cc: Jeremi Piotrowski, Johannes Weiner, Roman Gushchin, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Wed 20-09-23 14:46:52, Shakeel Butt wrote:
> On Wed, Sep 20, 2023 at 1:08 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> [...]
> > > have a strong opinion against it. Also just to be clear we are not
> > > talking about full revert of 58056f77502f but just the returning of
> > > EOPNOTSUPP, right?
> >
> > If we allow the limit to be set without returning a failure then we
> > still have options 2 and 3 on how to deal with that. One of them is to
> > enforce the limit.
> >
>
> Option 3 is a partial revert of 58056f77502f where we keep the no
> limit enforcement and remove the EOPNOTSUPP return on write. Let's go
> with option 3. In addition, let's add pr_warn_once on the read of
> kmem.limit_in_bytes as well.
How about this?
---
From 81ae0797d8da1b9cfbf357b4be4787a5bbf46bb4 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 21 Sep 2023 09:38:29 +0200
Subject: [PATCH] mm, memcg: reconsider kmem.limit_in_bytes deprecation
This reverts commits 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
and partially reverts 58056f77502f ("memcg, kmem: further deprecate
kmem.limit_in_bytes") which have incrementally removed support for the
kernel memory accounting hard limit. Unfortunately it has turned out
that there is still userspace depending on the existence of
memory.kmem.limit_in_bytes [1]. The underlying functionality is not
really required but the non-existent file just confuses the userspace
which fails in the result. The patch to fix this on the userspace side
has been submitted but it is hard to predict how it will propagate
through the maze of 3rd party consumers of the software.
Now, reverting alone 86327e8eb94c is not an option because there is
another set of userspace which cannot cope with ENOTSUPP returned when
writing to the file. Therefore we have to go and revisit 58056f77502f
as well. There are two ways to go ahead. Either we give up on the
deprecation and fully revert 58056f77502f as well or we can keep
kmem.limit_in_bytes but make the write a noop and warn about the fact.
This should work for both known breaking workloads which depend on the
existence but do not depend on the hard limit enforcement.
[1] http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Fixes: 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
Fixes: 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes")
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
Documentation/admin-guide/cgroup-v1/memory.rst | 7 +++++++
mm/memcontrol.c | 12 ++++++++++++
2 files changed, 19 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 5f502bf68fbc..ff456871bf4b 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -92,6 +92,13 @@ Brief summary of control files.
memory.oom_control set/show oom controls.
memory.numa_stat show the number of memory usage per numa
node
+ memory.kmem.limit_in_bytes Deprecated knob to set and read the kernel
+ memory hard limit. Kernel hard limit is not
+ supported since 5.16. Writing any value to
+ the file will not have any effect same as if
+ nokmem kernel parameter was specified.
+ Kernel memory is still charged and reported
+ by memory.kmem.usage_in_bytes.
memory.kmem.usage_in_bytes show current kernel memory allocation
memory.kmem.failcnt show the number of kernel memory usage
hits limits
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a4d3282493b6..ac7f14b2338d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3097,6 +3097,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
unsigned int nr_pages)
{
+ struct page_counter *counter;
struct mem_cgroup *memcg;
int ret;
@@ -3107,6 +3108,10 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
goto out;
memcg_account_kmem(memcg, nr_pages);
+
+ /* There is no way to set up kmem hard limit so this operation cannot fail */
+ if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+ WARN_ON(!page_counter_try_charge(&memcg->kmem, nr_pages, &counter));
out:
css_put(&memcg->css);
@@ -3867,6 +3872,13 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
case _MEMSWAP:
ret = mem_cgroup_resize_max(memcg, nr_pages, true);
break;
+ case _KMEM:
+ pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. "
+ "Writing any value to this file has no effect. "
+ "Please report your usecase to linux-mm@kvack.org if you "
+ "depend on this functionality.\n");
+ ret = 0;
+ break;
case _TCP:
ret = memcg_update_tcp_max(memcg, nr_pages);
break;
--
2.30.2
--
Michal Hocko
SUSE Labs
^ permalink raw reply related [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-21 7:52 ` Michal Hocko
@ 2023-09-21 10:43 ` Jeremi Piotrowski
2023-09-21 11:21 ` Michal Hocko
0 siblings, 1 reply; 39+ messages in thread
From: Jeremi Piotrowski @ 2023-09-21 10:43 UTC (permalink / raw)
To: Michal Hocko, Shakeel Butt
Cc: Johannes Weiner, Roman Gushchin, Muchun Song, Greg Kroah-Hartman,
stable, patches, Tejun Heo, Andrew Morton, linux-kernel,
regressions, mathieu.tortuyaux
On 9/21/2023 9:52 AM, Michal Hocko wrote:
> On Wed 20-09-23 14:46:52, Shakeel Butt wrote:
>> On Wed, Sep 20, 2023 at 1:08 PM Michal Hocko <mhocko@suse.com> wrote:
>>>
>> [...]
>>>> have a strong opinion against it. Also just to be clear we are not
>>>> talking about full revert of 58056f77502f but just the returning of
>>>> EOPNOTSUPP, right?
>>>
>>> If we allow the limit to be set without returning a failure then we
>>> still have options 2 and 3 on how to deal with that. One of them is to
>>> enforce the limit.
>>>
>>
>> Option 3 is a partial revert of 58056f77502f where we keep the no
>> limit enforcement and remove the EOPNOTSUPP return on write. Let's go
>> with option 3. In addition, let's add pr_warn_once on the read of
>> kmem.limit_in_bytes as well.
>
> How about this?
> ---
I'm OK with this approach. You're missing this in the patch below:
// static struct cftype mem_cgroup_legacy_files[] = {
+ {
+ .name = "kmem.limit_in_bytes",
+ .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
+ .write = mem_cgroup_write,
+ .read_u64 = mem_cgroup_read_u64,
+ },
Thanks,
Jeremi
> From 81ae0797d8da1b9cfbf357b4be4787a5bbf46bb4 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 21 Sep 2023 09:38:29 +0200
> Subject: [PATCH] mm, memcg: reconsider kmem.limit_in_bytes deprecation
>
> This reverts commits 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
> and partially reverts 58056f77502f ("memcg, kmem: further deprecate
> kmem.limit_in_bytes") which have incrementally removed support for the
> kernel memory accounting hard limit. Unfortunately it has turned out
> that there is still userspace depending on the existence of
> memory.kmem.limit_in_bytes [1]. The underlying functionality is not
> really required but the non-existent file just confuses the userspace
> which fails in the result. The patch to fix this on the userspace side
> has been submitted but it is hard to predict how it will propagate
> through the maze of 3rd party consumers of the software.
>
> Now, reverting alone 86327e8eb94c is not an option because there is
> another set of userspace which cannot cope with ENOTSUPP returned when
> writing to the file. Therefore we have to go and revisit 58056f77502f
> as well. There are two ways to go ahead. Either we give up on the
> deprecation and fully revert 58056f77502f as well or we can keep
> kmem.limit_in_bytes but make the write a noop and warn about the fact.
> This should work for both known breaking workloads which depend on the
> existence but do not depend on the hard limit enforcement.
>
> [1] http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
> Fixes: 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
> Fixes: 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes")
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
> Documentation/admin-guide/cgroup-v1/memory.rst | 7 +++++++
> mm/memcontrol.c | 12 ++++++++++++
> 2 files changed, 19 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
> index 5f502bf68fbc..ff456871bf4b 100644
> --- a/Documentation/admin-guide/cgroup-v1/memory.rst
> +++ b/Documentation/admin-guide/cgroup-v1/memory.rst
> @@ -92,6 +92,13 @@ Brief summary of control files.
> memory.oom_control set/show oom controls.
> memory.numa_stat show the number of memory usage per numa
> node
> + memory.kmem.limit_in_bytes Deprecated knob to set and read the kernel
> + memory hard limit. Kernel hard limit is not
> + supported since 5.16. Writing any value to
> + the file will not have any effect same as if
> + nokmem kernel parameter was specified.
> + Kernel memory is still charged and reported
> + by memory.kmem.usage_in_bytes.
> memory.kmem.usage_in_bytes show current kernel memory allocation
> memory.kmem.failcnt show the number of kernel memory usage
> hits limits
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a4d3282493b6..ac7f14b2338d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3097,6 +3097,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
> static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> unsigned int nr_pages)
> {
> + struct page_counter *counter;
> struct mem_cgroup *memcg;
> int ret;
>
> @@ -3107,6 +3108,10 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> goto out;
>
> memcg_account_kmem(memcg, nr_pages);
> +
> + /* There is no way to set up kmem hard limit so this operation cannot fail */
> + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> + WARN_ON(!page_counter_try_charge(&memcg->kmem, nr_pages, &counter));
> out:
> css_put(&memcg->css);
>
> @@ -3867,6 +3872,13 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
> case _MEMSWAP:
> ret = mem_cgroup_resize_max(memcg, nr_pages, true);
> break;
> + case _KMEM:
> + pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. "
> + "Writing any value to this file has no effect. "
> + "Please report your usecase to linux-mm@kvack.org if you "
> + "depend on this functionality.\n");
> + ret = 0;
> + break;
> case _TCP:
> ret = memcg_update_tcp_max(memcg, nr_pages);
> break;
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-21 10:43 ` Jeremi Piotrowski
@ 2023-09-21 11:21 ` Michal Hocko
2023-09-21 17:25 ` Shakeel Butt
2023-09-22 13:30 ` Johannes Weiner
0 siblings, 2 replies; 39+ messages in thread
From: Michal Hocko @ 2023-09-21 11:21 UTC (permalink / raw)
To: Jeremi Piotrowski
Cc: Shakeel Butt, Johannes Weiner, Roman Gushchin, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Thu 21-09-23 12:43:05, Jeremi Piotrowski wrote:
> On 9/21/2023 9:52 AM, Michal Hocko wrote:
> > On Wed 20-09-23 14:46:52, Shakeel Butt wrote:
> >> On Wed, Sep 20, 2023 at 1:08 PM Michal Hocko <mhocko@suse.com> wrote:
> >>>
> >> [...]
> >>>> have a strong opinion against it. Also just to be clear we are not
> >>>> talking about full revert of 58056f77502f but just the returning of
> >>>> EOPNOTSUPP, right?
> >>>
> >>> If we allow the limit to be set without returning a failure then we
> >>> still have options 2 and 3 on how to deal with that. One of them is to
> >>> enforce the limit.
> >>>
> >>
> >> Option 3 is a partial revert of 58056f77502f where we keep the no
> >> limit enforcement and remove the EOPNOTSUPP return on write. Let's go
> >> with option 3. In addition, let's add pr_warn_once on the read of
> >> kmem.limit_in_bytes as well.
> >
> > How about this?
> > ---
>
> I'm OK with this approach. You're missing this in the patch below:
>
> // static struct cftype mem_cgroup_legacy_files[] = {
>
> + {
> + .name = "kmem.limit_in_bytes",
> + .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
> + .write = mem_cgroup_write,
> + .read_u64 = mem_cgroup_read_u64,
> + },
Of course. I've lost the hunk while massaging the revert. Thanks for
spotting. Updated version below. Btw. I've decided to not pr_{warn,info}
on the read side because realistically I do not think this will help all
that much. I am worried we will get stuck with this forever because
there will always be somebody stuck on unpatched userspace.
---
From bb6702b698efd31f3f90f4f1dd36ffe223397bec Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 21 Sep 2023 09:38:29 +0200
Subject: [PATCH] mm, memcg: reconsider kmem.limit_in_bytes deprecation
This reverts commits 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
and partially reverts 58056f77502f ("memcg, kmem: further deprecate
kmem.limit_in_bytes") which have incrementally removed support for the
kernel memory accounting hard limit. Unfortunately it has turned out
that there is still userspace depending on the existence of
memory.kmem.limit_in_bytes [1]. The underlying functionality is not
really required but the non-existent file just confuses the userspace
which fails in the result. The patch to fix this on the userspace side
has been submitted but it is hard to predict how it will propagate
through the maze of 3rd party consumers of the software.
Now, reverting alone 86327e8eb94c is not an option because there is
another set of userspace which cannot cope with ENOTSUPP returned when
writing to the file. Therefore we have to go and revisit 58056f77502f
as well. There are two ways to go ahead. Either we give up on the
deprecation and fully revert 58056f77502f as well or we can keep
kmem.limit_in_bytes but make the write a noop and warn about the fact.
This should work for both known breaking workloads which depend on the
existence but do not depend on the hard limit enforcement.
[1] http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Fixes: 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
Fixes: 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes")
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
Documentation/admin-guide/cgroup-v1/memory.rst | 7 +++++++
mm/memcontrol.c | 18 ++++++++++++++++++
2 files changed, 25 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 5f502bf68fbc..ff456871bf4b 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -92,6 +92,13 @@ Brief summary of control files.
memory.oom_control set/show oom controls.
memory.numa_stat show the number of memory usage per numa
node
+ memory.kmem.limit_in_bytes Deprecated knob to set and read the kernel
+ memory hard limit. Kernel hard limit is not
+ supported since 5.16. Writing any value to
+ the file will not have any effect same as if
+ nokmem kernel parameter was specified.
+ Kernel memory is still charged and reported
+ by memory.kmem.usage_in_bytes.
memory.kmem.usage_in_bytes show current kernel memory allocation
memory.kmem.failcnt show the number of kernel memory usage
hits limits
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a4d3282493b6..0b161705ef36 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3097,6 +3097,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
unsigned int nr_pages)
{
+ struct page_counter *counter;
struct mem_cgroup *memcg;
int ret;
@@ -3107,6 +3108,10 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
goto out;
memcg_account_kmem(memcg, nr_pages);
+
+ /* There is no way to set up kmem hard limit so this operation cannot fail */
+ if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
+ WARN_ON(!page_counter_try_charge(&memcg->kmem, nr_pages, &counter));
out:
css_put(&memcg->css);
@@ -3867,6 +3872,13 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
case _MEMSWAP:
ret = mem_cgroup_resize_max(memcg, nr_pages, true);
break;
+ case _KMEM:
+ pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. "
+ "Writing any value to this file has no effect. "
+ "Please report your usecase to linux-mm@kvack.org if you "
+ "depend on this functionality.\n");
+ ret = 0;
+ break;
case _TCP:
ret = memcg_update_tcp_max(memcg, nr_pages);
break;
@@ -5077,6 +5089,12 @@ static struct cftype mem_cgroup_legacy_files[] = {
.seq_show = memcg_numa_stat_show,
},
#endif
+ {
+ .name = "kmem.limit_in_bytes",
+ .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
+ .write = mem_cgroup_write,
+ .read_u64 = mem_cgroup_read_u64,
+ },
{
.name = "kmem.usage_in_bytes",
.private = MEMFILE_PRIVATE(_KMEM, RES_USAGE),
--
2.30.2
--
Michal Hocko
SUSE Labs
^ permalink raw reply related [flat|nested] 39+ messages in thread
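A note for readers of the thread: the write-side behaviour proposed in the
patch above (accept any value, warn once, change nothing) can be modelled in
plain userspace C. This is only a minimal sketch under stated assumptions, not
kernel code; struct kmem_model and kmem_limit_write() are hypothetical names
invented for the illustration, and the bool flag merely stands in for
pr_warn_once().

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct kmem_model {
	unsigned long limit;	/* never changed by a write */
	bool warned;		/* stands in for pr_warn_once() */
};

/* Returns 0 (success) for every value, like the proposed _KMEM case. */
static int kmem_limit_write(struct kmem_model *m, unsigned long nr_pages)
{
	(void)nr_pages;		/* the written value is deliberately ignored */

	if (!m->warned) {
		fprintf(stderr,
			"kmem.limit_in_bytes is deprecated; writes have no effect\n");
		m->warned = true;
	}
	return 0;
}
```

A caller that only checks the return value of the write keeps working, which
is the property the known breaking workloads depend on; the stored limit is
never touched, so nothing can be enforced.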
* Re: [PATCH 6.1 000/219] 6.1.54-rc1 review
2023-09-17 19:12 [PATCH 6.1 000/219] 6.1.54-rc1 review Greg Kroah-Hartman
` (10 preceding siblings ...)
[not found] ` <20230917191042.204185566@linuxfoundation.org>
@ 2023-09-21 13:04 ` Conor Dooley
11 siblings, 0 replies; 39+ messages in thread
From: Conor Dooley @ 2023-09-21 13:04 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: stable, patches, linux-kernel, torvalds, akpm, linux, shuah,
patches, lkft-triage, pavel, jonathanh, f.fainelli,
sudipm.mukherjee, srw, rwarsow
On Sun, Sep 17, 2023 at 09:12:07PM +0200, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 6.1.54 release.
> There are 219 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Thanks,
Conor.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-21 11:21 ` Michal Hocko
@ 2023-09-21 17:25 ` Shakeel Butt
2023-09-21 19:50 ` Michal Hocko
2023-09-22 13:30 ` Johannes Weiner
1 sibling, 1 reply; 39+ messages in thread
From: Shakeel Butt @ 2023-09-21 17:25 UTC (permalink / raw)
To: Michal Hocko
Cc: Jeremi Piotrowski, Johannes Weiner, Roman Gushchin, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Thu, Sep 21, 2023 at 4:21 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 21-09-23 12:43:05, Jeremi Piotrowski wrote:
> > On 9/21/2023 9:52 AM, Michal Hocko wrote:
> > > On Wed 20-09-23 14:46:52, Shakeel Butt wrote:
> > >> On Wed, Sep 20, 2023 at 1:08 PM Michal Hocko <mhocko@suse.com> wrote:
> > >>>
> > >> [...]
> > >>>> have a strong opinion against it. Also just to be clear we are not
> > >>>> talking about full revert of 58056f77502f but just the returning of
> > >>>> EOPNOTSUPP, right?
> > >>>
> > >>> If we allow the limit to be set without returning a failure then we
> > >>> still have options 2 and 3 on how to deal with that. One of them is to
> > >>> enforce the limit.
> > >>>
> > >>
> > >> Option 3 is a partial revert of 58056f77502f where we keep the no
> > >> limit enforcement and remove the EOPNOTSUPP return on write. Let's go
> > >> with option 3. In addition, let's add pr_warn_once on the read of
> > >> kmem.limit_in_bytes as well.
> > >
> > > How about this?
> > > ---
> >
> > I'm OK with this approach. You're missing this in the patch below:
> >
> > // static struct cftype mem_cgroup_legacy_files[] = {
> >
> > + {
> > + .name = "kmem.limit_in_bytes",
> > + .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
> > + .write = mem_cgroup_write,
> > + .read_u64 = mem_cgroup_read_u64,
> > + },
>
> Of course. I've lost the hunk while massaging the revert. Thanks for
> spotting. Updated version below. Btw. I've decided to not pr_{warn,info}
> on the read side because realistically I do not think this will help all
> that much. I am worried we will get stuck with this forever because
> there will always be somebody stuck on unpatched userspace.
> ---
> From bb6702b698efd31f3f90f4f1dd36ffe223397bec Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Thu, 21 Sep 2023 09:38:29 +0200
> Subject: [PATCH] mm, memcg: reconsider kmem.limit_in_bytes deprecation
>
> This reverts commits 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
> and partially reverts 58056f77502f ("memcg, kmem: further deprecate
> kmem.limit_in_bytes") which have incrementally removed support for the
> kernel memory accounting hard limit. Unfortunately it has turned out
> that there is still userspace depending on the existence of
> memory.kmem.limit_in_bytes [1]. The underlying functionality is not
> really required but the non-existent file just confuses the userspace
> which fails in the result. The patch to fix this on the userspace side
> has been submitted but it is hard to predict how it will propagate
> through the maze of 3rd party consumers of the software.
>
> Now, reverting alone 86327e8eb94c is not an option because there is
> another set of userspace which cannot cope with ENOTSUPP returned when
> writing to the file. Therefore we have to go and revisit 58056f77502f
> as well. There are two ways to go ahead. Either we give up on the
> deprecation and fully revert 58056f77502f as well or we can keep
> kmem.limit_in_bytes but make the write a noop and warn about the fact.
> This should work for both known breaking workloads which depend on the
> existence but do not depend on the hard limit enforcement.
>
> [1] http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
> Fixes: 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
> Fixes: 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes")
> Signed-off-by: Michal Hocko <mhocko@suse.com>
With one request below:
Acked-by: Shakeel Butt <shakeelb@google.com>
> ---
> Documentation/admin-guide/cgroup-v1/memory.rst | 7 +++++++
> mm/memcontrol.c | 18 ++++++++++++++++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
> index 5f502bf68fbc..ff456871bf4b 100644
> --- a/Documentation/admin-guide/cgroup-v1/memory.rst
> +++ b/Documentation/admin-guide/cgroup-v1/memory.rst
> @@ -92,6 +92,13 @@ Brief summary of control files.
> memory.oom_control set/show oom controls.
> memory.numa_stat show the number of memory usage per numa
> node
> + memory.kmem.limit_in_bytes Deprecated knob to set and read the kernel
> + memory hard limit. Kernel hard limit is not
> + supported since 5.16. Writing any value to
> + the file will not have any effect same as if
> + nokmem kernel parameter was specified.
> + Kernel memory is still charged and reported
> + by memory.kmem.usage_in_bytes.
> memory.kmem.usage_in_bytes show current kernel memory allocation
> memory.kmem.failcnt show the number of kernel memory usage
> hits limits
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a4d3282493b6..0b161705ef36 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3097,6 +3097,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
> static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> unsigned int nr_pages)
> {
> + struct page_counter *counter;
> struct mem_cgroup *memcg;
> int ret;
>
> @@ -3107,6 +3108,10 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> goto out;
>
> memcg_account_kmem(memcg, nr_pages);
> +
> + /* There is no way to set up kmem hard limit so this operation cannot fail */
> + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> + WARN_ON(!page_counter_try_charge(&memcg->kmem, nr_pages, &counter));
WARN_ON_ONCE() please.
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-21 17:25 ` Shakeel Butt
@ 2023-09-21 19:50 ` Michal Hocko
0 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2023-09-21 19:50 UTC (permalink / raw)
To: Shakeel Butt
Cc: Jeremi Piotrowski, Johannes Weiner, Roman Gushchin, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Thu 21-09-23 10:25:11, Shakeel Butt wrote:
> On Thu, Sep 21, 2023 at 4:21 AM Michal Hocko <mhocko@suse.com> wrote:
[...]
> With one request below:
>
> Acked-by: Shakeel Butt <shakeelb@google.com>
Thanks.
> > @@ -3107,6 +3108,10 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> > goto out;
> >
> > memcg_account_kmem(memcg, nr_pages);
> > +
> > + /* There is no way to set up kmem hard limit so this operation cannot fail */
> > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > + WARN_ON(!page_counter_try_charge(&memcg->kmem, nr_pages, &counter));
>
> WARN_ON_ONCE() please.
Sure. This shouldn't really trigger, but it is true that if something
unexpected happens then it is likely to flood the log so _ONCE is safer.
I will wait for others to comment before I send the official patch.
To be completely honest I am not super happy about this way of handling
stuff, but considering the level of brokenness this seems like the
safest option. Especially when nobody really wants to use the kernel
memory hard limit AFAIU.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 8:11 ` [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes Jeremi Piotrowski
2023-09-20 8:43 ` Michal Hocko
@ 2023-09-22 11:14 ` Linux regression tracking #adding (Thorsten Leemhuis)
1 sibling, 0 replies; 39+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-09-22 11:14 UTC (permalink / raw)
To: regressions; +Cc: linux-kernel
[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]
On 20.09.23 10:11, Jeremi Piotrowski wrote:
> On Sun, Sep 17, 2023 at 09:12:40PM +0200, Greg Kroah-Hartman wrote:
>> 6.1-stable review patch. If anyone has any objections, please let me know.
>>
>> ------------------
>
> Hi Greg/Michal,
>
> This commit breaks userspace which makes it a bad commit for mainline and an
> even worse commit for stable.
>
> We ingested 6.1.54 into our nightly testing and found that runc fails to gather
> cgroup statistics (when reading kmem.limit_in_bytes). The same code is vendored
> into kubelet and kubelet fails to start if this operation fails. 6.1.53 is
> fine.
>
>> Address this by wiping out the file completely and effectively get back to
>> pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
>
> On reads, the runc code checks for MEMCG_KMEM=n by checking
> kmem.usage_in_bytes. If it is present then runc expects the other cgroup files
> to be there (including kmem.limit_in_bytes). So this change is not effectively
> the same.
>
> Here's a link to the PR that would be needed to handle this change in userspace
> (not merged yet and would need to be propagated through the ecosystem):
>
> https://github.com/opencontainers/runc/pull/4018.
Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:
#regzbot ^introduced 86327e8eb94c52
#regzbot title mm, memcg: runc fails to gather cgroup statistics
#regzbot fix: mm, memcg: reconsider kmem.limit_in_bytes deprecation
#regzbot ignore-activity
FWIW, the proposed fix can be found here:
https://lore.kernel.org/all/ZQwnUpX7FlzIOWXP@dhcp22.suse.cz/
This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
^ permalink raw reply [flat|nested] 39+ messages in thread
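A note for readers of the thread: the failure reported above comes from
userspace treating a missing kmem.limit_in_bytes as a hard error. A tolerant
reader could distinguish "file absent" from "read failed". The following is a
minimal sketch under that assumption; read_cgroup_u64() is a hypothetical
helper written for this illustration and is not the code from runc or from
the pull request linked above.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: read a single unsigned value from a cgroup v1 file.
 * Returns 0 on success, 1 if the file does not exist, -1 on any other error.
 */
static int read_cgroup_u64(const char *path, unsigned long long *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return errno == ENOENT ? 1 : -1;

	ok = fscanf(f, "%llu", val) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would treat a return value of 1 as "no kmem limit available on this
kernel" and carry on gathering the remaining statistics instead of aborting.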
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-21 11:21 ` Michal Hocko
2023-09-21 17:25 ` Shakeel Butt
@ 2023-09-22 13:30 ` Johannes Weiner
2023-09-25 7:40 ` Michal Hocko
1 sibling, 1 reply; 39+ messages in thread
From: Johannes Weiner @ 2023-09-22 13:30 UTC (permalink / raw)
To: Michal Hocko
Cc: Jeremi Piotrowski, Shakeel Butt, Roman Gushchin, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Thu, Sep 21, 2023 at 01:21:54PM +0200, Michal Hocko wrote:
> @@ -3097,6 +3097,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
> static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> unsigned int nr_pages)
> {
> + struct page_counter *counter;
> struct mem_cgroup *memcg;
> int ret;
>
> @@ -3107,6 +3108,10 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> goto out;
>
> memcg_account_kmem(memcg, nr_pages);
> +
> + /* There is no way to set up kmem hard limit so this operation cannot fail */
> + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> + WARN_ON(!page_counter_try_charge(&memcg->kmem, nr_pages, &counter));
This hunk doesn't look quite right.
static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
{
mod_memcg_state(memcg, MEMCG_KMEM, nr_pages);
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
if (nr_pages > 0)
page_counter_charge(&memcg->kmem, nr_pages);
else
page_counter_uncharge(&memcg->kmem, -nr_pages);
}
}
Other than that, please add
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
^ permalink raw reply [flat|nested] 39+ messages in thread
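A note for readers of the thread: one way to read the objection above is that
memcg_account_kmem() already charges memcg->kmem on cgroup v1, so the added
page_counter_try_charge() call would charge the same pages a second time. The
userspace model below illustrates that reading only; struct counter and the
three functions are hypothetical stand-ins, not the kernel's page_counter API.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct page_counter; it only tracks usage. */
struct counter {
	long usage;
};

static void counter_charge(struct counter *c, long nr_pages)
{
	c->usage += nr_pages;
}

/* Models memcg_account_kmem() on cgroup v1: it already charges kmem. */
static void account_kmem(struct counter *kmem, long nr_pages)
{
	counter_charge(kmem, nr_pages);
}

/* Models the hunk under review: account, then charge a second time. */
static void charge_pages_with_extra_hunk(struct counter *kmem, long nr_pages)
{
	account_kmem(kmem, nr_pages);
	counter_charge(kmem, nr_pages);	/* the questioned extra charge */
}
```

In this model, charging 8 pages through charge_pages_with_extra_hunk() leaves
usage at 16, while account_kmem() alone leaves it at 8, which is why the extra
hunk is only needed on trees that lack memcg_account_kmem().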
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-20 13:47 ` Michal Hocko
2023-09-20 15:32 ` Shakeel Butt
@ 2023-09-22 23:00 ` Roman Gushchin
2023-09-25 7:41 ` Michal Hocko
1 sibling, 1 reply; 39+ messages in thread
From: Roman Gushchin @ 2023-09-22 23:00 UTC (permalink / raw)
To: Michal Hocko
Cc: Jeremi Piotrowski, Shakeel Butt, Johannes Weiner, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Wed, Sep 20, 2023 at 03:47:37PM +0200, Michal Hocko wrote:
> On Wed 20-09-23 15:25:23, Jeremi Piotrowski wrote:
> > On 9/20/2023 1:07 PM, Michal Hocko wrote:
> [...]
> > > I mean, normally I would be just fine reverting this API change because
> > > it is disruptive but the only way to have the file available and not
> > > break somebody is to revert 58056f77502f ("memcg, kmem: further
> > > deprecate kmem.limit_in_bytes") as well. Or to ignore any value written
> > > there but that sounds rather dubious. Although one could argue this
> > > would mimic nokmem kernel option.
> > >
> >
> > I just want to make sure we don't introduce yet another new behavior in this legacy
> > system. I have not seen breakage due to 58056f77502f. Mimicking nokmem sounds good but
> > does this mean "don't enforce limits" (that should be fine) or "ignore writes to the limit"
> > (=don't even store the written limit). The latter might have unintended consequences.
>
> Yes it would mean that the limit is never enforced. Bad as it is the
> thing is that the hard limit on kernel memory is broken by design and
> unfixable. This causes all sorts of unexpected kernel allocation
> failures, so it is simply unsafe to use.
>
> All that being said I can see the following options
> 1) keep the current upstream status and not export the file
> 2) revert both 58056f77502f and 86327e8eb94 and make it clear
> that kmem.limit_in_bytes is unsupported so failures or misbehavior
> as a result of the limit being hit are likely not going to be
> investigated or fixed.
> 3) reverting like in 2) but never enforce the limit (so basically nokmem
> semantic)
Since it's a part of cgroup v1 interface, which is in a frozen state as a whole,
and there is no significant (performance, code complexity) benefit of
additionally deprecating kmem.limit_in_bytes, I vote for 2).
1) is also an option.
Thanks!
^ permalink raw reply [flat|nested] 39+ messages in thread
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-22 13:30 ` Johannes Weiner
@ 2023-09-25 7:40 ` Michal Hocko
0 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2023-09-25 7:40 UTC (permalink / raw)
To: Johannes Weiner, Andrew Morton
Cc: Jeremi Piotrowski, Shakeel Butt, Roman Gushchin, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, linux-kernel,
regressions, mathieu.tortuyaux
On Fri 22-09-23 09:30:17, Johannes Weiner wrote:
> On Thu, Sep 21, 2023 at 01:21:54PM +0200, Michal Hocko wrote:
> > @@ -3097,6 +3097,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
> > static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> > unsigned int nr_pages)
> > {
> > + struct page_counter *counter;
> > struct mem_cgroup *memcg;
> > int ret;
> >
> > @@ -3107,6 +3108,10 @@ static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
> > goto out;
> >
> > memcg_account_kmem(memcg, nr_pages);
> > +
> > + /* There is no way to set up kmem hard limit so this operation cannot fail */
> > + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > + WARN_ON(!page_counter_try_charge(&memcg->kmem, nr_pages, &counter));
>
> This hunk doesn't look quite right.
>
> static void memcg_account_kmem(struct mem_cgroup *memcg, int nr_pages)
> {
> mod_memcg_state(memcg, MEMCG_KMEM, nr_pages);
> if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> if (nr_pages > 0)
> page_counter_charge(&memcg->kmem, nr_pages);
> else
> page_counter_uncharge(&memcg->kmem, -nr_pages);
> }
> }
>
> Other than that, please add
Good point. I have missed a8c49af3be5f ("memcg: add per-memcg total kernel memory stat")
introduced in 4.18
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Fixed version below. Andrew, it seems we have a good consensus for this.
Could you queue this up and send it to Linus please?
---
From 8c3cbe68bba0fe5103d8fe73a06b3608ed49bda0 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 21 Sep 2023 09:38:29 +0200
Subject: [PATCH] mm, memcg: reconsider kmem.limit_in_bytes deprecation
This reverts commits 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
and partially reverts 58056f77502f ("memcg, kmem: further deprecate
kmem.limit_in_bytes") which have incrementally removed support for the
kernel memory accounting hard limit. Unfortunately it has turned out
that there is still userspace depending on the existence of
memory.kmem.limit_in_bytes [1]. The underlying functionality is not
really required, but the missing file confuses userspace, which then
fails. The patch to fix this on the userspace side
has been submitted but it is hard to predict how it will propagate
through the maze of 3rd party consumers of the software.
Reverting 86327e8eb94c alone is not an option because there is
another set of userspace which cannot cope with ENOTSUPP returned when
writing to the file. Therefore we have to revisit 58056f77502f
as well. There are two ways to go ahead: either we give up on the
deprecation and fully revert 58056f77502f, or we keep
kmem.limit_in_bytes but make the write a noop and warn about the fact.
The latter should work for both known breaking workloads, which depend on
the existence of the file but not on hard limit enforcement.
Note to backporters to stable trees: a8c49af3be5f ("memcg: add per-memcg
total kernel memory stat"), introduced in 5.18, added memcg_account_kmem,
so the accounting is no longer done by obj_cgroup_charge_pages directly
for v1. Prior kernels need to add it explicitly (thanks to Johannes for
pointing this out).
[1] http://lkml.kernel.org/r/20230920081101.GA12096@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Cc: stable
Fixes: 86327e8eb94c ("memcg: drop kmem.limit_in_bytes")
Fixes: 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes")
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
Documentation/admin-guide/cgroup-v1/memory.rst | 7 +++++++
mm/memcontrol.c | 14 ++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 5f502bf68fbc..ff456871bf4b 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -92,6 +92,13 @@ Brief summary of control files.
memory.oom_control set/show oom controls.
memory.numa_stat show the number of memory usage per numa
node
+ memory.kmem.limit_in_bytes Deprecated knob to set and read the kernel
+ memory hard limit. Kernel hard limit is not
+ supported since 5.16. Writing any value to
+ this file has no effect, the same as if the
+ nokmem kernel parameter was specified.
+ Kernel memory is still charged and reported
+ by memory.kmem.usage_in_bytes.
memory.kmem.usage_in_bytes show current kernel memory allocation
memory.kmem.failcnt show the number of kernel memory usage
hits limits
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a4d3282493b6..63bdaab2a906 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3097,6 +3097,7 @@ static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg,
static int obj_cgroup_charge_pages(struct obj_cgroup *objcg, gfp_t gfp,
unsigned int nr_pages)
{
+ struct page_counter *counter;
struct mem_cgroup *memcg;
int ret;
@@ -3867,6 +3868,13 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of,
case _MEMSWAP:
ret = mem_cgroup_resize_max(memcg, nr_pages, true);
break;
+ case _KMEM:
+ pr_warn_once("kmem.limit_in_bytes is deprecated and will be removed. "
+ "Writing any value to this file has no effect. "
+ "Please report your usecase to linux-mm@kvack.org if you "
+ "depend on this functionality.\n");
+ ret = 0;
+ break;
case _TCP:
ret = memcg_update_tcp_max(memcg, nr_pages);
break;
@@ -5077,6 +5085,12 @@ static struct cftype mem_cgroup_legacy_files[] = {
.seq_show = memcg_numa_stat_show,
},
#endif
+ {
+ .name = "kmem.limit_in_bytes",
+ .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
+ .write = mem_cgroup_write,
+ .read_u64 = mem_cgroup_read_u64,
+ },
{
.name = "kmem.usage_in_bytes",
.private = MEMFILE_PRIVATE(_KMEM, RES_USAGE),
--
2.30.2
--
Michal Hocko
SUSE Labs
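
For illustration only, a userspace caller that copes with every kernel behaviour discussed in this thread (file missing, write rejected, write accepted but ignored) might look roughly like the sketch below. The helper try_set_kmem_limit() and its result enum are hypothetical and are not taken from any real container runtime; the sketch deliberately does not test for a specific errno on a rejected write.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum kmem_limit_result {
	KMEM_WRITE_OK,       /* write accepted (the kernel may still ignore it) */
	KMEM_FILE_MISSING,   /* kernel does not export the file at all */
	KMEM_WRITE_REJECTED  /* file exists, but open or write failed */
};

/*
 * Try to write a kmem limit to the given control file. The caller is
 * expected to treat every result other than KMEM_WRITE_OK as non-fatal,
 * since the hard limit is deprecated and not enforced.
 */
enum kmem_limit_result try_set_kmem_limit(const char *path, long long bytes)
{
	char buf[32];
	int len, fd;
	ssize_t written;

	assert(path != NULL);

	len = snprintf(buf, sizeof(buf), "%lld", bytes);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return errno == ENOENT ? KMEM_FILE_MISSING : KMEM_WRITE_REJECTED;

	written = write(fd, buf, (size_t)len);
	close(fd);

	return written == (ssize_t)len ? KMEM_WRITE_OK : KMEM_WRITE_REJECTED;
}
```

With the patch above applied, a call on a cgroup v1 memory.kmem.limit_in_bytes file would be expected to land in the KMEM_WRITE_OK case while the kernel logs a one-time deprecation warning; on kernels without the file, the same caller sees KMEM_FILE_MISSING and can carry on.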
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-22 23:00 ` Roman Gushchin
@ 2023-09-25 7:41 ` Michal Hocko
2023-09-26 2:49 ` Roman Gushchin
0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2023-09-25 7:41 UTC (permalink / raw)
To: Roman Gushchin
Cc: Jeremi Piotrowski, Shakeel Butt, Johannes Weiner, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Fri 22-09-23 16:00:30, Roman Gushchin wrote:
> On Wed, Sep 20, 2023 at 03:47:37PM +0200, Michal Hocko wrote:
> > On Wed 20-09-23 15:25:23, Jeremi Piotrowski wrote:
> > > On 9/20/2023 1:07 PM, Michal Hocko wrote:
> > [...]
> > > > I mean, normally I would be just fine reverting this API change because
> > > > it is disruptive but the only way to have the file available and not
> > > > break somebody is to revert 58056f77502f ("memcg, kmem: further
> > > > deprecate kmem.limit_in_bytes") as well. Or to ignore any value written
> > > > there but that sounds rather dubious. Although one could argue this
> > > > would mimic nokmem kernel option.
> > > >
> > >
> > > I just want to make sure we don't introduce yet another new behavior in this legacy
> > > system. I have not seen breakage due to 58056f77502f. Mimicking nokmem sounds good but
> > > does this mean "don't enforce limits" (that should be fine) or "ignore writes to the limit"
> > > (=don't even store the written limit). The latter might have unintended consequences.
> >
> > Yes it would mean that the limit is never enforced. Bad as it is the
> > thing is that the hard limit on kernel memory is broken by design and
> > unfixable. It causes all sorts of unexpected kernel allocation
> > failures, so it is simply unsafe to use.
> >
> > All that being said I can see the following options
> > 1) keep the current upstream status and not export the file
> > 2) revert both 58056f77502f and 86327e8eb94 and make it clear
> > that kmem.limit_in_bytes is unsupported so failures or misbehavior
> > as a result of the limit being hit are likely not going to be
> > investigated or fixed.
> > 3) reverting like in 2) but never enforce the limit (so basically nokmem
> > semantics)
>
> Since it's a part of cgroup v1 interface, which is in a frozen state as a whole,
> and there is no significant (performance, code complexity) benefit of
> additionally deprecating kmem.limit_in_bytes, I vote for 2).
> 1) is also an option.
We have a stronger agreement over 3)
http://lkml.kernel.org/r/ZRE5VJozPZt9bRPy@dhcp22.suse.cz. Please speak
up if you disagree.
--
Michal Hocko
SUSE Labs
* Re: [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes
2023-09-25 7:41 ` Michal Hocko
@ 2023-09-26 2:49 ` Roman Gushchin
0 siblings, 0 replies; 39+ messages in thread
From: Roman Gushchin @ 2023-09-26 2:49 UTC (permalink / raw)
To: Michal Hocko
Cc: Jeremi Piotrowski, Shakeel Butt, Johannes Weiner, Muchun Song,
Greg Kroah-Hartman, stable, patches, Tejun Heo, Andrew Morton,
linux-kernel, regressions, mathieu.tortuyaux
On Mon, Sep 25, 2023 at 09:41:24AM +0200, Michal Hocko wrote:
> On Fri 22-09-23 16:00:30, Roman Gushchin wrote:
> > On Wed, Sep 20, 2023 at 03:47:37PM +0200, Michal Hocko wrote:
> > > On Wed 20-09-23 15:25:23, Jeremi Piotrowski wrote:
> > > > On 9/20/2023 1:07 PM, Michal Hocko wrote:
> > > [...]
> > > > > I mean, normally I would be just fine reverting this API change because
> > > > > it is disruptive but the only way to have the file available and not
> > > > > break somebody is to revert 58056f77502f ("memcg, kmem: further
> > > > > deprecate kmem.limit_in_bytes") as well. Or to ignore any value written
> > > > > there but that sounds rather dubious. Although one could argue this
> > > > > would mimic nokmem kernel option.
> > > > >
> > > >
> > > > I just want to make sure we don't introduce yet another new behavior in this legacy
> > > > > system. I have not seen breakage due to 58056f77502f. Mimicking nokmem sounds good but
> > > > > does this mean "don't enforce limits" (that should be fine) or "ignore writes to the limit"
> > > > > (=don't even store the written limit). The latter might have unintended consequences.
> > >
> > > Yes it would mean that the limit is never enforced. Bad as it is the
> > > thing is that the hard limit on kernel memory is broken by design and
> > > > unfixable. It causes all sorts of unexpected kernel allocation
> > > > failures, so it is simply unsafe to use.
> > >
> > > All that being said I can see the following options
> > > 1) keep the current upstream status and not export the file
> > > 2) revert both 58056f77502f and 86327e8eb94 and make it clear
> > > that kmem.limit_in_bytes is unsupported so failures or misbehavior
> > > as a result of the limit being hit are likely not going to be
> > > investigated or fixed.
> > > > 3) reverting like in 2) but never enforce the limit (so basically nokmem
> > > > semantics)
> >
> > Since it's a part of cgroup v1 interface, which is in a frozen state as a whole,
> > and there is no significant (performance, code complexity) benefit of
> > additionally deprecating kmem.limit_in_bytes, I vote for 2).
> > 1) is also an option.
>
> We have a stronger agreement over 3)
> http://lkml.kernel.org/r/ZRE5VJozPZt9bRPy@dhcp22.suse.cz. Please speak
> up if you disagree.
This works for me too.
Thank you!
Btw, it seems like going forward we should be more resistant to any
cgroup v1 changes and just leave it as it is.
Thanks.
Thread overview: 39+ messages
2023-09-17 19:12 [PATCH 6.1 000/219] 6.1.54-rc1 review Greg Kroah-Hartman
2023-09-17 20:47 ` SeongJae Park
2023-09-18 5:34 ` Takeshi Ogasawara
2023-09-18 6:42 ` Bagas Sanjaya
2023-09-18 11:24 ` Conor Dooley
2023-09-18 12:08 ` Ron Economos
2023-09-18 12:48 ` Jon Hunter
2023-09-18 18:34 ` Florian Fainelli
2023-09-18 18:41 ` Guenter Roeck
2023-09-18 20:56 ` Naresh Kamboju
2023-09-18 22:21 ` Shuah Khan
[not found] ` <20230917191042.204185566@linuxfoundation.org>
2023-09-20 8:11 ` [REGRESSION] Re: [PATCH 6.1 033/219] memcg: drop kmem.limit_in_bytes Jeremi Piotrowski
2023-09-20 8:43 ` Michal Hocko
2023-09-20 9:25 ` Greg Kroah-Hartman
2023-09-20 10:21 ` Jeremi Piotrowski
2023-09-20 10:45 ` Greg Kroah-Hartman
2023-09-20 11:08 ` Michal Hocko
2023-09-20 11:16 ` Greg Kroah-Hartman
2023-09-20 10:04 ` Jeremi Piotrowski
2023-09-20 11:07 ` Michal Hocko
2023-09-20 13:25 ` Jeremi Piotrowski
2023-09-20 13:47 ` Michal Hocko
2023-09-20 15:32 ` Shakeel Butt
2023-09-20 16:55 ` Michal Hocko
2023-09-20 19:46 ` Shakeel Butt
2023-09-20 20:08 ` Michal Hocko
2023-09-20 21:46 ` Shakeel Butt
2023-09-21 7:52 ` Michal Hocko
2023-09-21 10:43 ` Jeremi Piotrowski
2023-09-21 11:21 ` Michal Hocko
2023-09-21 17:25 ` Shakeel Butt
2023-09-21 19:50 ` Michal Hocko
2023-09-22 13:30 ` Johannes Weiner
2023-09-25 7:40 ` Michal Hocko
2023-09-22 23:00 ` Roman Gushchin
2023-09-25 7:41 ` Michal Hocko
2023-09-26 2:49 ` Roman Gushchin
2023-09-22 11:14 ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-09-21 13:04 ` [PATCH 6.1 000/219] 6.1.54-rc1 review Conor Dooley