public inbox for linux-kernel@vger.kernel.org
* Linux 6.18-rc6
@ 2025-11-16 22:42 Linus Torvalds
  2025-11-17  8:20 ` David Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Linus Torvalds @ 2025-11-16 22:42 UTC
  To: Linux Kernel Mailing List

So we have a slightly larger rc6 than usual, but I think it's just the
random noise and a result of pull request timings rather than due to
any issues with the release. But I guess we have a couple of weeks
remaining to find out....

The fixes are all over the place, and nothing stands out, except for
how it really is pretty varied. It's all small one- and few-liners,
with the biggest patches all being to the selftests rather than actual
kernel code. In fact, the selftests account for over a quarter of the
rc6 patch.

Outside of the selftests, drivers account for (only) a quarter of the
rest of the changes - again, I think this is more a sign of pull
request timings rather than anything else, and the reason rc6 is on
the larger side is just that we ended up having a number of varied
fixes pulls just happen to come in this week.

Another quarter of the kernel changes is from architecture fixes
(arm64, loongarch and x86), and the rest is really pretty mixed random
stuff (networking, bpf, core kernel, filesystems, core VM - just a
little bit of everything).

Shortlog with the detailed overview appended below as usual,

                 Linus

---

Adrian Barnaś (2):
      arm64: Fail module loading if dynamic SCS patching fails
      arm64: Reject modules with internal alternative callbacks

Akiva Goldberger (1):
      mlx5: Fix default values in create CQ

Aksh Garg (2):
      net: ethernet: ti: am65-cpsw-qos: fix IET verify/response timeout
      net: ethernet: ti: am65-cpsw-qos: fix IET verify retry mechanism

Aleksei Nikiforov (1):
      mm/kmsan: fix kmsan kmalloc hook when no stack depots are allocated yet

Alex Mastro (4):
      vfio: selftests: add iova range query helpers
      vfio: selftests: fix map limit tests to use last available iova
      vfio: selftests: add iova allocator
      vfio: selftests: replace iova=vaddr with allocated iovas

Alexander Sverdlin (1):
      selftests: net: local_termination: Wait for interfaces to come up

Alvaro Gamez Machado (1):
      spi: xilinx: increase number of retries before declaring stall

Andrew Donnellan (1):
      entry: Fix ifndef around arch_xfer_to_guest_mode_handle_work() stub

André Draszik (1):
      pmdomain: samsung: plug potential memleak during probe

Ankit Khushwaha (1):
      selftests/user_events: fix type cast for write_index packed member in perf_test

Arnaldo Carvalho de Melo (2):
      perf build: Don't fail fast path feature detection when binutils-devel is not available
      tools headers UAPI: Sync KVM's vmx.h with the kernel to pick SEAMCALL exit reason

Baojun Xu (2):
      ALSA: hda/tas2781: Add new quirk for HP new projects
      ALSA: hda/tas2781: Correct the wrong project ID

Benjamin Berg (1):
      wifi: mac80211: skip rate verification for not captured PSDUs

Bibo Mao (5):
      LoongArch: KVM: Set page with write attribute if dirty track disabled
      LoongArch: KVM: Add delay until timer interrupt injected
      LoongArch: KVM: Restore guest PMU if it is enabled
      LoongArch: KVM: Skip PMU checking on vCPU context switch
      LoongArch: KVM: Fix max supported vCPUs set with EIOINTC

Bjorn Helgaas (5):
      PCI/ASPM: Cache L0s/L1 Supported so advertised link states can be overridden
      PCI/ASPM: Add pcie_aspm_remove_cap() to override advertised link states
      PCI/ASPM: Convert quirks to override advertised link states
      PCI/ASPM: Avoid L0s and L1 on Freescale [1957:0451] Root Ports
      PCI/ASPM: Avoid L0s and L1 on PA Semi [1959:a002] Root Ports

Boris Brezillon (1):
      drm/panthor: Flush shmem writes before mapping buffers CPU-uncached

Borislav Petkov (AMD) (1):
      x86/microcode/AMD: Add Zen5 model 0x44, stepping 0x1 minrev

Breno Leitao (4):
      net: netpoll: fix incorrect refcount handling causing incorrect cleanup
      selftest: netcons: refactor target creation
      selftest: netcons: create a torture test
      selftest: netcons: add test for netconsole over bonded interfaces

Buday Csaba (1):
      net: mdio: fix resource leak in mdiobus_register_device()

Caleb Sander Mateos (1):
      io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs

Carlos Llamas (1):
      scripts/decode_stacktrace.sh: fix build ID and PC source parsing

Carolina Jubran (1):
      net/mlx5e: Fix missing error assignment in mlx5e_xfrm_add_state()

Catalin Marinas (2):
      arm64: Use load LSE atomics for the non-return per-CPU atomic operations
      mm/huge_memory: initialise the tags of the huge zero folio

Chao Gao (1):
      KVM: x86: Call out MSR_IA32_S_CET is not handled by XSAVES

Chris Li (1):
      MAINTAINERS: add Chris and Kairui as the swap maintainer

Chuang Wang (1):
      ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe

Chuck Lever (3):
      NFSD: Skip close replay processing if XDR encoding fails
      NFSD: Never cache a COMPOUND when the SEQUENCE operation fails
      Revert "SUNRPC: Make RPCSEC_GSS_KRB5 select CRYPTO instead of depending on it"

Chunhai Guo (1):
      MAINTAINERS: erofs: add myself as reviewer

Claudiu Beznea (1):
      ASoC: da7213: Use component driver suspend/resume

Cosmin Ratiu (1):
      net/mlx5e: Trim the length of the num_doorbell error

Cryolitia PukNgae (1):
      hwmon: (gpd-fan) initialize EC on driver load for Win 4

D. Wythe (1):
      net/smc: fix mismatch between CLC header and proposal

Dai Ngo (1):
      NFS: Fix LTP test failures when timestamps are delegated

Dave Jiang (2):
      cxl: Adjust offset calculation for poison injection
      acpi/hmat: Fix lockdep warning for hmem_register_resource()

David Hildenbrand (Red Hat) (2):
      mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb
      MAINTAINERS: update David Hildenbrand's email address

Dev Jain (1):
      mm/mremap: honour writable bit in mremap pte batching

Eduard Zingerman (2):
      bpf: account for current allocated stack depth in widen_imprecise_scalars()
      selftests/bpf: Test widen_imprecise_scalars() with different stack depth

Edward Adam Davis (2):
      cifs: client: fix memory leak in smb3_fs_context_parse_param
      nilfs2: avoid having an active sc_timer before freeing sci

Eric Dumazet (3):
      sctp: prevent possible shift-out-of-bounds in sctp_transport_update_rto
      net_sched: limit try_bulk_dequeue_skb() batches
      bpf: Add bpf_prog_run_data_pointers()

Eslam Khafagy (1):
      posix-timers: Plug potential memory leak in do_timer_create()

Fangyu Yu (2):
      RISC-V: KVM: Read HGEIP CSR on the correct cpu
      RISC-V: KVM: Remove automatic I/O mapping for VM_PFNMAP

Felix Maurer (2):
      hsr: Fix supervision frame sending on HSRv0
      hsr: Follow standard for HSRv0 supervision frames

Feng Jiang (2):
      riscv: Build loader.bin exclusively for Canaan K210
      riscv: Remove redundant judgment for the default build target

Filipe Manana (1):
      btrfs: do not update last_log_commit when logging inode due to a new name

Gal Pressman (4):
      docs: netlink: Couple of intro-specs documentation fixes
      net/mlx5e: Fix maxrate wraparound in threshold between units
      net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps
      net/mlx5e: Fix potentially misleading debug message

Gao Xiang (1):
      erofs: avoid infinite loop due to incomplete zstd-compressed data

Gautham R. Shenoy (4):
      ACPI: CPPC: Detect preferred core availability on online CPUs
      ACPI: CPPC: Check _CPC validity for only the online CPUs
      ACPI: CPPC: Perform fast check switch only for online CPUs
      ACPI: CPPC: Limit perf ctrs in PCC check only to online CPUs

Gopi Krishna Menon (1):
      hwmon: (gpd-fan) Fix compilation error in non-ACPI builds

Haein Lee (1):
      ALSA: usb-audio: Fix NULL pointer dereference in snd_usb_mixer_controls_badd

Hans de Goede (2):
      spi: Try to get ACPI GPIO IRQ earlier
      spi: Add TODO comment about ACPI GPIO setup

Hao Ge (1):
      codetag: debug: handle existing CODETAG_EMPTY in mark_objexts_empty for slabobj_ext

Haotian Zhang (4):
      regulator: fixed: fix GPIO descriptor leak on register failure
      ASoC: cs4271: Fix regulator leak on probe failure
      ASoC: codecs: va-macro: fix resource leak in probe error path
      ASoC: rsnd: fix OF node reference leak in rsnd_ssiu_probe()

Harish Kasiviswanathan (1):
      drm/amdkfd: Fix GPU mappings for APU after prefetch

Harry Yoo (1):
      mm/slub: fix memory leak in free_to_pcs_bulk()

Heiko Carstens (1):
      s390/mm: Fix __ptep_rdp() inline assembly

Henrique Carvalho (1):
      smb: client: fix cifs_pick_channel when channel needs reconnect

Horatiu Vultur (1):
      net: phy: micrel: lan8814 fix reset of the QSGMII interface

Huacai Chen (5):
      LoongArch: Clarify 3 MSG interrupt features
      LoongArch: Use physical addresses for CSR_MERRENTRY/CSR_TLBRENTRY
      LoongArch: Consolidate early_ioremap()/ioremap_prot()
      LoongArch: Consolidate max_pfn & max_low_pfn calculation
      LoongArch: Use correct accessor to read FWPC/MWPC

Ian Forbes (3):
      drm/vmwgfx: Validate command header size against SVGA_CMD_MAX_DATASIZE
      drm/vmwgfx: Use kref in vmw_bo_dirty
      drm/vmwgfx: Restore Guest-Backed only cursor plane support

Ian Rogers (1):
      perf libbfd: Ensure libbfd is initialized prior to use

Ilan Peer (1):
      wifi: mac80211_hwsim: Fix possible NULL dereference

Imre Deak (1):
      drm/i915/dp_mst: Disable Panel Replay

Isaac J. Manjarres (1):
      mm/mm_init: fix hash table order logging in alloc_large_system_hash()

Ivan Lipski (1):
      drm/amd/display: Allow VRR params change if unsynced with the stream

James Clark (1):
      dma-mapping: Allow use of DMA_BIT_MASK(64) in global scope

Jani Nikula (1):
      drm/i915/psr: fix pipe to vblank conversion

Jens Axboe (2):
      MAINTAINERS: correct git location for block layer tree
      io_uring/rw: ensure allocated iovec gets cleared for early failure

Jesse.Zhang (1):
      drm/amdgpu: fix lock warning in amdgpu_userq_fence_driver_process

Jiayuan Chen (3):
      mptcp: Disallow MPTCP subflows from sockmap
      mptcp: Fix proto fallback detection with BPF
      selftests/bpf: Add mptcp test with sockmap

Jiri Olsa (4):
      Revert "perf/x86: Always store regs->ip in perf_callchain_kernel()"
      x86/fgraph,bpf: Fix stack ORC unwind from kprobe_multi return probe
      selftests/bpf: Add stacktrace ips test for kprobe_multi/kretprobe_multi
      selftests/bpf: Add stacktrace ips test for raw_tp

Johan Hovold (1):
      mmc: wmt-sdmmc: fix compile test default

Johannes Berg (2):
      wifi: mac80211: reject address change while connecting
      wifi: iwlwifi: mvm: fix beacon template/fixed rate

Jonas Gorski (1):
      net: dsa: tag_brcm: do not mark link local traffic as offloaded

Jonathan Kim (1):
      drm/amdkfd: relax checks for over allocation of save area

Joshua Rogers (2):
      smb: server: rdma: avoid unmapping posted recv on accept failure
      ksmbd: close accepted socket when per-IP limit rejects connection

Junjie Cao (1):
      wifi: iwlwifi: fix aux ROC time event iterator usage

Kairui Song (2):
      mm/shmem: fix THP allocation and fallback loop
      mm, swap: fix potential UAF issue for VMA readahead

Kaushlendra Kumar (1):
      ACPI: MRRM: Fix memory leaks and improve error handling

Kiryl Shutsemau (3):
      mm/memory: do not populate page table entries beyond i_size
      mm/truncate: unmap large folio on split failure
      MAINTAINERS: Update name spelling

Kriish Sharma (1):
      ethtool: fix incorrect kernel-doc style comment in ethtool.h

Kuniyuki Iwashima (2):
      tipc: Fix use-after-free in tipc_mon_reinit_self().
      af_unix: Initialise scc_index in unix_add_edge().

Lance Yang (1):
      mm/secretmem: fix use-after-free race in fault handler

Linus Torvalds (1):
      Linux 6.18-rc6

Luiz Augusto von Dentz (2):
      Bluetooth: hci_conn: Fix not cleaning up PA_LINK connections
      Bluetooth: hci_event: Fix not handling PA Sync Lost event

Lushih Hsieh (1):
      ALSA: usb-audio: Add native DSD quirks for PureAudio DAC series

Magnus Lindholm (1):
      MAINTAINERS: Add Magnus Lindholm as maintainer for alpha port

Marc Zyngier (3):
      KVM: arm64: Make all 32bit ID registers fully writable
      KVM: arm64: Set ID_{AA64PFR0,PFR1}_EL1.GIC when GICv3 is configured
      KVM: arm64: Limit clearing of ID_{AA64PFR0,PFR1}_EL1.GIC to userspace irqchip

Marek Szyprowski (1):
      pmdomain: samsung: Rework legacy splash-screen handover workaround

Mario Limonciello (1):
      x86/CPU/AMD: Add additional fixed RDSEED microcode revisions

Mario Limonciello (AMD) (3):
      PM: hibernate: Emit an error when image writing fails
      PM: hibernate: Use atomic64_t for compressed_size variable
      PM: hibernate: Fix style issues in save_compressed_image()

Mark Brown (2):
      KVM: arm64: selftests: Add SCTLR2_EL2 to get-reg-list
      KVM: arm64: selftests: Filter ZCR_EL2 in get-reg-list

Martin Kaiser (1):
      maple_tree: fix tracepoint string pointers

Matthieu Baerts (NGI0) (6):
      selftests: mptcp: connect: fix fallback note due to OoO
      selftests: mptcp: join: rm: set backup flag
      selftests: mptcp: join: endpoints: longer transfer
      selftests: mptcp: join: userspace: longer transfer
      selftests: mptcp: connect: trunc: read all recv data
      selftests: mptcp: join: properly kill background tasks

Max Chou (1):
      Bluetooth: btrtl: Avoid loading the config file on security chips

Maxim Levitsky (1):
      KVM: SVM: switch to raw spinlock for svm->ir_list_lock

Maximilian Dittgen (1):
      KVM: selftests: fix MAPC RDbase target formatting in vgic_lpi_stress

Miaoqian Lin (3):
      pmdomain: imx: Fix reference count leak in imx_gpc_remove
      crypto: hisilicon/qm - Fix device reference leak in qm_get_qos_value
      ASoC: sdw_utils: fix device reference leak in is_sdca_endpoint_present()

Mike Snitzer (5):
      nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE support
      nfs/localio: add refcounting for each iocb IO associated with NFS pgio header
      nfs/localio: backfill missing partial read support for misaligned DIO
      nfs/localio: Ensure DIO WRITE's IO on stable storage upon completion
      nfs/localio: do not issue misaligned DIO out-of-order

Miri Korenblit (1):
      wifi: iwlwifi: mld: always take beacon ies in link grading

Mykyta Yatsenko (2):
      bpf:add _impl suffix for bpf_task_work_schedule* kfuncs
      bpf: add _impl suffix for bpf_stream_vprintk() kfunc

Naohiro Aota (2):
      btrfs: zoned: fix conventional zone capacity calculation
      btrfs: zoned: fix stripe width calculation

Nate Karstens (1):
      strparser: Fix signed/unsigned mismatch bug

Nathan Chancellor (1):
      riscv: Fix CONFIG_AS_HAS_INSN for new .insn usage

NeilBrown (2):
      nfsd: fix refcount leak in nfsd_set_fh_dentry()
      nfsd: ensure SEQUENCE replay sends a valid reply.

Nick Hu (1):
      irqchip/riscv-intc: Add missing free() callback in riscv_intc_domain_ops

Nicolas Dichtel (1):
      bonding: fix mii_status when slave is down

Nicolas Escande (1):
      wifi: ath11k: zero init info->status in wmi_process_mgmt_tx_comp()

Niranjan H Y (2):
      ASoC: tas2783A: Fix issues in firmware parsing
      ASoC: SDCA: bug fix while parsing mipi-sdca-control-cn-list

Niravkumar L Rabara (2):
      EDAC/altera: Handle OCRAM ECC enable after warm reset
      EDAC/altera: Use INTTEST register for Ethernet and USB SBE injection

Nitin Gote (2):
      drm/xe/xe3: Add WA_14024681466 for Xe3_LPG
      drm/xe/xe3lpg: Extend Wa_15016589081 for xe3lpg

Olga Kornievskaia (2):
      nfsd: add missing FATTR4_WORD2_CLONE_BLKSIZE from supported attributes
      NFSD: free copynotify stateid in nfs4_free_ol_stateid()

Oliver Upton (3):
      KVM: arm64: vgic-v3: Reinstate IRQ lock ordering for LPI xarray
      KVM: arm64: vgic-v3: Release reserved slot outside of lpi_xa's lock
      MAINTAINERS: Switch myself to using kernel.org address

Pasha Tatashin (4):
      kho: warn and fail on metadata or preserved memory in scratch area
      kho: increase metadata bitmap size to PAGE_SIZE
      kho: allocate metadata directly from the buddy allocator
      lib/test_kho: check if KHO is enabled

Pauli Virtanen (7):
      ALSA: usb-audio: add min_mute quirk for SteelSeries Arctis
      Bluetooth: MGMT: cancel mesh send timer when hdev removed
      Bluetooth: 6lowpan: reset link-local header on ipv6 recv path
      Bluetooth: 6lowpan: fix BDADDR_LE vs ADDR_LE_DEV address type confusion
      Bluetooth: L2CAP: export l2cap_chan_hold for modules
      Bluetooth: 6lowpan: Don't hold spin lock over sleeping functions
      Bluetooth: 6lowpan: add missing l2cap_chan_lock()

Pavel Begunkov (1):
      io_uring/query: return number of available queries

Pawel Dembicki (1):
      wifi: mwl8k: inject DSSS Parameter Set element into beacons if missing

Pedro Demarchi Gomes (1):
      ksm: use range-walk function to jump over holes in scan_get_next_rmap_item

Peter Oberparleiter (1):
      gcov: add support for GCC 15

Pierre-Eric Pelloux-Prayer (1):
      drm/amdgpu: jump to the correct label on failure

Pratyush Yadav (3):
      kho: fix out-of-bounds access of vmalloc chunk
      kho: fix unpreservation of higher-order vmalloc preservations
      kho: warn and exit when unpreserved page wasn't preserved

Punit Agrawal (2):
      Revert "ACPI: Suppress misleading SPCR console message when SPCR table is absent"
      arm64: acpi: Drop message logging SPCR default console

Qiang Ma (1):
      LoongArch: kexec: Print out debugging message if required

Qinxin Xia (1):
      dma-mapping: benchmark: Restore padding to ensure uABI remained consistent

Quanmin Yan (2):
      mm/damon/stat: change last_refresh_jiffies to a global variable
      mm/damon/sysfs: change next_update_jiffies to a global variable

Rakuram Eswaran (1):
      mmc: pxamci: Simplify pxamci_probe() error handling using devm APIs

Randy Dunlap (1):
      drm/client: fix MODULE_PARM_DESC string for "active"

Ranganath V N (2):
      net: sched: act_connmark: initialize struct tc_ife to fix kernel leak
      net: sched: act_ife: initialize struct tc_ife to fix KMSAN kernel-infoleak

Raphael Pinsonneault-Thibeault (1):
      Bluetooth: btusb: reorder cleanup in btusb_disconnect to avoid UAF

Ravi Bangoria (2):
      perf lock: Fix segfault due to missing kernel map
      perf test: Fix lock contention test

Richard Fitzgerald (1):
      ASoC: doc: cs35l56: Update firmware filename description for B0 silicon

Robin Gong (1):
      spi: imx: keep dma request disabled before dma transfer setup

Ryan Roberts (3):
      arm64: mm: Don't sleep in split_kernel_leaf_mapping() when in atomic context
      arm64: mm: Optimize range_split_to_ptes()
      arm64: mm: Tidy up force_pte_mapping()

Sami Tolvanen (1):
      gendwarfksyms: Skip files with no exports

Samuel Holland (1):
      RISC-V: KVM: Fix check for local interrupts on riscv32

Sascha Bischoff (1):
      KVM: arm64: vgic-v3: Trap all if no in-kernel irqchip

Sathishkumar S (1):
      drm/amdgpu/jpeg: Add parse_cs for JPEG5_0_1

Sean Christopherson (7):
      KVM: VMX: Inject #UD if guest tries to execute SEAMCALL or TDCALL
      KVM: x86: Unload "FPU" state on INIT if and only if its currently in-use
      KVM: x86: Harden KVM against imbalanced load/put of guest FPU state
      KVM: SVM: Initialize per-CPU svm_data at the end of hardware setup
      KVM: SVM: Unregister KVM's GALog notifier on kvm-amd.ko exit
      KVM: SVM: Make avic_ga_log_notifier() local to avic.c
      KVM: guest_memfd: Remove bindings on memslot deletion when gmem is dying

Sebastian Ene (1):
      KVM: arm64: Check the untrusted offset in FF-A memory share

Shawn Lin (3):
      mmc: sdhci-of-dwcmshc: Change DLL_STRBIN_TAPNUM_DEFAULT to 0x4
      mmc: dw_mmc-rockchip: Fix wrong internal phase calculate
      PCI/ASPM: Avoid L0s and L1 on Hi1105 [19e5:1105] Wi-Fi

Shenghao Ding (1):
      ASoC: tas2781: fix getting the wrong device number

Shuai Xue (1):
      acpi,srat: Fix incorrect device handle check for Generic Initiator

Shubhrajyoti Datta (1):
      EDAC/versalnet: Handle split messages for non-standard errors

Song Liu (3):
      ftrace: Fix BPF fexit with livepatch
      ftrace: bpf: Fix IPMODIFY + DIRECT in modify_ftrace_direct()
      selftests/bpf: Add tests for livepatch + bpf trampoline

Sourabh Jain (1):
      crash: fix crashkernel resource shrink

Srinivas Pandruvada (1):
      cpufreq: intel_pstate: Check IDA only before MSR_IA32_PERF_CTL writes

Stefan Metzmacher (2):
      smb: server: let smb_direct_disconnect_rdma_connection() turn CREATED into DISCONNECTED
      smb: client: let smbd_disconnect_rdma_connection() turn CREATED into DISCONNECTED

Steven Rostedt (1):
      selftests/tracing: Run sample events to clear page cache events

Sudeep Holla (1):
      pmdomain: arm: scmi: Fix genpd leak on provider registration failure

Sukrit Bhatnagar (1):
      KVM: VMX: Fix check for valid GVA on an EPT violation

Sultan Alsawaf (1):
      drm/amd/amdgpu: Ensure isp_kernel_buffer_alloc() creates a new BO

Takashi Iwai (2):
      ALSA: hda/hdmi: Fix breakage at probing nvhdmi-mcp driver
      ALSA: usb-audio: Fix potential overflow of PCM transfer buffer

Takashi Sakamoto (1):
      firewire: core: fix to update generation field in topology map

Tangudu Tilak Tirumalesh (1):
      drm/xe/xe3: Extend wa_14023061436

Thomas Falcon (1):
      perf header: Write bpf_prog (infos|btfs)_cnt to data file

Tianyang Zhang (1):
      LoongArch: Let {pte,pmd}_modify() record the status of _PAGE_DIRTY

Tiezhu Yang (1):
      LoongArch: Refine the init_hw_perf_events() function

Trond Myklebust (6):
      pnfs: Fix TLS logic in _nfs4_pnfs_v3_ds_connect()
      pnfs: Fix TLS logic in _nfs4_pnfs_v4_ds_connect()
      pnfs: Set transport security policy to RPC_XPRTSEC_NONE unless using TLS
      NFS: Check the TLS certificate fields in nfs_match_client()
      NFSv2/v3: Fix error handling in nfs_atomic_open_v23()
      NFSv4: Fix an incorrect parameter when calling nfs4_call_sync()

Victor Nogueira (2):
      net/sched: Abort __tc_modify_qdisc if parent is a clsact/ingress qdisc
      selftests/tc-testing: Create tests trying to add children to clsact/ingress qdiscs

Ville Syrjälä (1):
      firewire: core: Initialize topology_map.lock

Vincent Donnefort (1):
      KVM: arm64: Check range args for pKVM mem transitions

Vishal Moola (Oracle) (1):
      LoongArch: Remove __GFP_HIGHMEM masking in pud_alloc_one()

Vitaly Prosyak (1):
      drm/amdgpu: disable peer-to-peer access for DCC-enabled GC12 VRAM surfaces

Wei Fang (1):
      net: fec: correct rx_bytes statistic for the case SHIFT16 is set

Wei Yang (1):
      fs/proc: fix uaf in proc_readdir_de()

Xi Ruoyao (1):
      rust: Add -fno-isolate-erroneous-paths-dereference to bindgen_skip_c_flags

Xuan Zhuo (1):
      virtio-net: fix incorrect flags recording in big mode

Yang Shi (1):
      arm64: kprobes: check the return value of set_memory_rox()

Yang Xiuwei (1):
      NFS: sysfs: fix leak when nfs_client kobject add fails

Yiqi Sun (1):
      smb: fix invalid username check in smb3_fs_context_parse_param()

Yosry Ahmed (3):
      KVM: SVM: Mark VMCB_LBR dirty when MSR_IA32_DEBUGCTLMSR is updated
      KVM: nSVM: Always recalculate LBR MSR intercepts in svm_update_lbrv()
      KVM: nSVM: Fix and simplify LBR virtualization handling with nested

Youling Tang (1):
      LoongArch: kexec: Initialize the kexec_buf structure

Zahari Doychev (1):
      tools: ynl: call nested attribute free function for indexed arrays

Zi Yan (3):
      mm/huge_memory: do not change split_huge_page*() target order silently
      mm/huge_memory: preserve PG_has_hwpoisoned if a folio is split to >0 order
      mm/huge_memory: fix folio split check for anon folios in swapcache

Zilin Guan (3):
      btrfs: scrub: put bio after errors in scrub_raid56_parity_stripe()
      btrfs: release root after error in data_reloc_print_warning_inode()
      net/handshake: Fix memory leak in tls_handshake_accept()

shechenglong (2):
      arm64: proton-pack: Drop print when !CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY
      arm64: proton-pack: Fix hard lockup due to print in scheduler context


* Re: Linux 6.18-rc6
  2025-11-16 22:42 Linux 6.18-rc6 Linus Torvalds
@ 2025-11-17  8:20 ` David Wang
  2025-11-17 10:33   ` Linus Torvalds
  2025-11-17 18:13 ` Guenter Roeck
  2025-11-18 17:23 ` Stephanie Gawroriski
  2 siblings, 1 reply; 39+ messages in thread
From: David Wang @ 2025-11-17  8:20 UTC
  To: torvalds; +Cc: linux-kernel

Hi,

After upgrading to 6.18-rc6, all my Go programs started to crash; even the Go compiler crashes,
and when I started a bisect, `make vmlinux` crashed too.
I am running the bisect with 6.18-rc5 now. Any chance this has already been caught/fixed?


Thanks
David



* Re: Linux 6.18-rc6
  2025-11-17  8:20 ` David Wang
@ 2025-11-17 10:33   ` Linus Torvalds
  2025-11-17 12:56     ` David Wang
  0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2025-11-17 10:33 UTC
  To: David Wang; +Cc: linux-kernel

On Mon, 17 Nov 2025 at 00:20, David Wang <00107082@163.com> wrote:
>
> After upgrading to 6.18-rc6, all my Go programs started to crash; even the Go compiler crashes,
> and when I started a bisect, `make vmlinux` crashed too.

Funky funky. Certainly doesn't happen here.

> I am running the bisect with 6.18-rc5 now. Any chance this has already been caught/fixed?

Please do run the bisect (obviously you'll have to build using a
kernel that works for you), I am not aware of anybody reporting
anything like this.

               Linus


* Re: Linux 6.18-rc6
  2025-11-17 10:33   ` Linus Torvalds
@ 2025-11-17 12:56     ` David Wang
  2025-11-17 13:30       ` David Hildenbrand (Red Hat)
  2025-11-17 16:42       ` Linus Torvalds
  0 siblings, 2 replies; 39+ messages in thread
From: David Wang @ 2025-11-17 12:56 UTC
  To: Linus Torvalds, catalin.marinas, david, lance.yang, b-padhi, akpm
  Cc: linux-kernel


At 2025-11-17 18:33:58, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:
>On Mon, 17 Nov 2025 at 00:20, David Wang <00107082@163.com> wrote:
>>
>> After upgrading to 6.18-rc6, all my Go programs started to crash; even the Go compiler crashes,
>> and when I started a bisect, `make vmlinux` crashed too.
>
>Funky funky. Certainly doesn't happen here.
>
>> I am running the bisect with 6.18-rc5 now. Any chance this has already been caught/fixed?
>
>Please do run the bisect (obviously you'll have to build using a
>kernel that works for you), I am not aware of anybody reporting
>anything like this.

Hi,

Bisect narrowed it down to 
[adfb6609c6809e107ded9a1cd46f519c882e64ea] mm/huge_memory: initialise the tags of the huge zero folio

It seems to happen to programs built with older versions of Go; mine is 1.18.4, and I cannot reproduce it with go1.25.4.

When I upgraded to 6.18-rc6, the go1.18.4 compiler and programs would crash with:

    fatal error: arena already initialized
    
    runtime stack:
    runtime.throw({0x9ea9fe?, 0x0?})
	    /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0x7ffdf458a248 sp=0x7ffdf458a218 pc=0x435ab1
    runtime.(*mheap).sysAlloc(0xddec20, 0x0?)
	    /usr/local/go/src/runtime/malloc.go:749 +0x2e9 fp=0x7ffdf458a2e0 sp=0x7ffdf458a248 pc=0x40dee9
    ...


And when I ran `make vmlinux`, I got something like this:
	  AR      vmlinux.a
	  LD      vmlinux.o
	  GEN     .vmlinux.objs
	  MODPOST vmlinux.symvers
	  CC      .vmlinux.export.o
	  UPD     include/generated/utsversion.h
	  CC      init/version-timestamp.o
	  KSYMS   .tmp_vmlinux0.kallsyms.S
	  AS      .tmp_vmlinux0.kallsyms.o
	  LD      .tmp_vmlinux1
	  NM      .tmp_vmlinux1.syms
	  KSYMS   .tmp_vmlinux1.kallsyms.S
	  AS      .tmp_vmlinux1.kallsyms.o
	.tmp_vmlinux1.kallsyms.S:361542:19: internal compiler error: Aborted
	361542 |         .byte 0x00, 0x99, 0xe9  /* T__pfx___x64_sys_inotify_add_watch */
	       |                   ^
	0x7f983a551def ???
		./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
	0x7f983a5a695c __pthread_kill_implementation
		./nptl/pthread_kill.c:44
	0x7f983a551cc1 __GI_raise
		../sysdeps/posix/raise.c:26
	0x7f983a53a4ab __GI_abort
		./stdlib/abort.c:73
	0x7f983a53bca7 __libc_start_call_main
		../sysdeps/nptl/libc_start_call_main.h:58
	0x7f983a53bd64 __libc_start_main_impl
		../csu/libc-start.c:360
	Please submit a full bug report, with preprocessed source (by using -freport-bug).
	Please include the complete backtrace with any bug report.
	See <file:///usr/share/doc/gcc-14/README.Bugs> for instructions.
	make[2]: *** [scripts/Makefile.vmlinux:72: vmlinux.unstripped] Error 1
	make[1]: *** [/home/linan/codes/linux-kernel/linux/Makefile:1242: vmlinux] Error 2
	make: *** [Makefile:248: __sub-make] Error 2
$ gcc --version
gcc (Debian 14.2.0-19) 14.2.0

(It does not always happen; sometimes I don't get this crash, and it does not always fail on the same line.)

And nothing abnormal shows up in the kernel messages.

Reverting adfb6609c6809e107ded9a1cd46f519c882e64ea fixes my crashing Go programs.



FYI
David
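
Since the report above notes that the failure does not always reproduce, a bisect step can wrongly mark a bad kernel as good if it is judged on a single run. The following is a minimal sketch of that idea, not the procedure actually used in this thread: `is_bad`, `first_bad_commit`, `run_once`, and the simulated workload are all hypothetical names introduced for illustration. A commit counts as good only after several clean runs, and the search assumes every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def is_bad(commit: str, run_once: Callable[[str], bool], attempts: int = 5) -> bool:
    """Return True if any of `attempts` runs on `commit` fails.

    `run_once(commit)` is a hypothetical callback that returns True when
    the workload (for example, a build) succeeds on that commit.
    """
    return any(not run_once(commit) for _ in range(attempts))


def first_bad_commit(
    commits: Sequence[str],
    run_once: Callable[[str], bool],
    attempts: int = 5,
) -> str:
    """Binary-search `commits` (oldest first) for the first bad one.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is bad as well.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid], run_once, attempts):
            bad = mid
        else:
            good = mid
    return commits[bad]


if __name__ == "__main__":
    # Simulated history: commits c0..c9, regression introduced at c6.
    history = [f"c{i}" for i in range(10)]
    culprit = 6
    calls = {"n": 0}

    def flaky(commit: str) -> bool:
        # Broken commits fail only on every third run overall, which
        # mimics an intermittent crash in a deterministic way.
        calls["n"] += 1
        broken = history.index(commit) >= culprit
        return not (broken and calls["n"] % 3 == 0)

    print(first_bad_commit(history, flaky, attempts=5))  # prints "c6"
```

With `attempts=1` the same search can land on the wrong commit whenever a broken commit happens to pass once, so the retry count trades bisect time for confidence in each "good" verdict.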


* Re: Linux 6.18-rc6
  2025-11-17 12:56     ` David Wang
@ 2025-11-17 13:30       ` David Hildenbrand (Red Hat)
  2025-11-17 13:45         ` David Wang
  2025-11-17 16:42       ` Linus Torvalds
  1 sibling, 1 reply; 39+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-17 13:30 UTC
  To: David Wang, Linus Torvalds, catalin.marinas, lance.yang, b-padhi,
	akpm
  Cc: linux-kernel, Jan Polensky

On 17.11.25 13:56, David Wang wrote:
> 
> At 2025-11-17 18:33:58, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:
>> On Mon, 17 Nov 2025 at 00:20, David Wang <00107082@163.com> wrote:
>>>
>>> After upgrading to 6.18-rc6, all my Go programs started to crash; even the Go compiler crashes,
>>> and when I started a bisect, `make vmlinux` crashed too.
>>
>> Funky funky. Certainly doesn't happen here.
>>
>>> I am running the bisect with 6.18-rc5 now. Any chance this has already been caught/fixed?
>>
>> Please do run the bisect (obviously you'll have to build using a
>> kernel that works for you), I am not aware of anybody reporting
>> anything like this.
> 
> Hi,
> 
> Bisect narrowed it down to
> [adfb6609c6809e107ded9a1cd46f519c882e64ea] mm/huge_memory: initialise the tags of the huge zero folio
> 
> It seems to happen to programs built with older versions of Go; mine is 1.18.4, and I cannot reproduce it with go1.25.4.
> 
> When I upgraded to 6.18-rc6, the go1.18.4 compiler and programs would crash with:
> 
>      fatal error: arena already initialized
>      
>      runtime stack:
>      runtime.throw({0x9ea9fe?, 0x0?})
> 	    /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0x7ffdf458a248 sp=0x7ffdf458a218 pc=0x435ab1
>      runtime.(*mheap).sysAlloc(0xddec20, 0x0?)
> 	    /usr/local/go/src/runtime/malloc.go:749 +0x2e9 fp=0x7ffdf458a2e0 sp=0x7ffdf458a248 pc=0x40dee9
>      ...
> 
> 
> And when I ran `make vmlinux`, I got something like this:
> 	  AR      vmlinux.a
> 	  LD      vmlinux.o
> 	  GEN     .vmlinux.objs
> 	  MODPOST vmlinux.symvers
> 	  CC      .vmlinux.export.o
> 	  UPD     include/generated/utsversion.h
> 	  CC      init/version-timestamp.o
> 	  KSYMS   .tmp_vmlinux0.kallsyms.S
> 	  AS      .tmp_vmlinux0.kallsyms.o
> 	  LD      .tmp_vmlinux1
> 	  NM      .tmp_vmlinux1.syms
> 	  KSYMS   .tmp_vmlinux1.kallsyms.S
> 	  AS      .tmp_vmlinux1.kallsyms.o
> 	.tmp_vmlinux1.kallsyms.S:361542:19: internal compiler error: Aborted
> 	361542 |         .byte 0x00, 0x99, 0xe9  /* T__pfx___x64_sys_inotify_add_watch */
> 	       |                   ^
> 	0x7f983a551def ???
> 		./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
> 	0x7f983a5a695c __pthread_kill_implementation
> 		./nptl/pthread_kill.c:44
> 	0x7f983a551cc1 __GI_raise
> 		../sysdeps/posix/raise.c:26
> 	0x7f983a53a4ab __GI_abort
> 		./stdlib/abort.c:73
> 	0x7f983a53bca7 __libc_start_call_main
> 		../sysdeps/nptl/libc_start_call_main.h:58
> 	0x7f983a53bd64 __libc_start_main_impl
> 		../csu/libc-start.c:360
> 	Please submit a full bug report, with preprocessed source (by using -freport-bug).
> 	Please include the complete backtrace with any bug report.
> 	See <file:///usr/share/doc/gcc-14/README.Bugs> for instructions.
> 	make[2]: *** [scripts/Makefile.vmlinux:72: vmlinux.unstripped] Error 1
> 	make[1]: *** [/home/linan/codes/linux-kernel/linux/Makefile:1242: vmlinux] Error 2
> 	make: *** [Makefile:248: __sub-make] Error 2
> $ gcc --version
> gcc (Debian 14.2.0-19) 14.2.0
> 
> (It dose not always happens, sometimes I don't get this crash and It dose not always err on same line.)
> 
> And nothing abnormal shows up in kernel message.
> 
> Revert adfb6609c6809e107ded9a1cd46f519c882e64ea can fix my crashing go programs.


I just replied privately to a similar report with the following:

Hi,

I observed something similar while testing on Friday between rc4 (good)
and rc5+ (bad).


I'm sure this is the known issue with
adfb6609c6809e107ded9a1cd46f519c882e64ea that we already discussed here [1].


@Jan, can you send the fix out today? Otherwise I can take care of this
so we get this fixed asap.

[1] https://lkml.kernel.org/r/20251109003613.1461433-1-japo@linux.ibm.com


-- 
Cheers

David


* Re: Linux 6.18-rc6
  2025-11-17 13:30       ` David Hildenbrand (Red Hat)
@ 2025-11-17 13:45         ` David Wang
  2025-11-17 14:08           ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 39+ messages in thread
From: David Wang @ 2025-11-17 13:45 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Linus Torvalds, catalin.marinas, lance.yang, b-padhi, akpm,
	linux-kernel, Jan Polensky



At 2025-11-17 21:30:42, "David Hildenbrand (Red Hat)" <david@kernel.org> wrote:
>On 17.11.25 13:56, David Wang wrote:
>> 
>> At 2025-11-17 18:33:58, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:
>>> On Mon, 17 Nov 2025 at 00:20, David Wang <00107082@163.com> wrote:
>>>>
>>>> After upgrade to 6.18-rc6, all my golang programs start to crash, even Go compiler crashes;
>>>> and when I started bisect, `make vmlinux` crashes too.
>>>
>>> Funky funky. Certainly doesn't happen here.
>>>
>>>> I am running bisect with 6.18-rc5 now, any chance this has already caught/fixed?
>>>
>>> Please do run the bisect (obviously you'll have to build using a
>>> kernel that works for you), I am not aware of anybody reporting
>>> anything like this.
>> 
>> Hi,
>> 
>> Bisect narrowed it down to
>> [adfb6609c6809e107ded9a1cd46f519c882e64ea] mm/huge_memory: initialise the tags of the huge zero folio
>> 
>> It seems happen to program build with older version of go, mine is 1.18.4;   and I cannot reproduce it with go1.25.4.
>> 
>> When I upgraded to 6.18-rc6,  go1.18.4 compiler/program would crash with,
>> 
>>      fatal error: arena already initialized
>>      
>>      runtime stack:
>>      runtime.throw({0x9ea9fe?, 0x0?})
>> 	    /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0x7ffdf458a248 sp=0x7ffdf458a218 pc=0x435ab1
>>      runtime.(*mheap).sysAlloc(0xddec20, 0x0?)
>> 	    /usr/local/go/src/runtime/malloc.go:749 +0x2e9 fp=0x7ffdf458a2e0 sp=0x7ffdf458a248 pc=0x40dee9
>>      ...
>> 
>> 
>> And when I `make vmlinux`, I got something like this :
>> 	  AR      vmlinux.a
>> 	  LD      vmlinux.o
>> 	  GEN     .vmlinux.objs
>> 	  MODPOST vmlinux.symvers
>> 	  CC      .vmlinux.export.o
>> 	  UPD     include/generated/utsversion.h
>> 	  CC      init/version-timestamp.o
>> 	  KSYMS   .tmp_vmlinux0.kallsyms.S
>> 	  AS      .tmp_vmlinux0.kallsyms.o
>> 	  LD      .tmp_vmlinux1
>> 	  NM      .tmp_vmlinux1.syms
>> 	  KSYMS   .tmp_vmlinux1.kallsyms.S
>> 	  AS      .tmp_vmlinux1.kallsyms.o
>> 	.tmp_vmlinux1.kallsyms.S:361542:19: internal compiler error: Aborted
>> 	361542 |         .byte 0x00, 0x99, 0xe9  /* T__pfx___x64_sys_inotify_add_watch */
>> 	       |                   ^
>> 	0x7f983a551def ???
>> 		./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
>> 	0x7f983a5a695c __pthread_kill_implementation
>> 		./nptl/pthread_kill.c:44
>> 	0x7f983a551cc1 __GI_raise
>> 		../sysdeps/posix/raise.c:26
>> 	0x7f983a53a4ab __GI_abort
>> 		./stdlib/abort.c:73
>> 	0x7f983a53bca7 __libc_start_call_main
>> 		../sysdeps/nptl/libc_start_call_main.h:58
>> 	0x7f983a53bd64 __libc_start_main_impl
>> 		../csu/libc-start.c:360
>> 	Please submit a full bug report, with preprocessed source (by using -freport-bug).
>> 	Please include the complete backtrace with any bug report.
>> 	See <file:///usr/share/doc/gcc-14/README.Bugs> for instructions.
>> 	make[2]: *** [scripts/Makefile.vmlinux:72: vmlinux.unstripped] Error 1
>> 	make[1]: *** [/home/linan/codes/linux-kernel/linux/Makefile:1242: vmlinux] Error 2
>> 	make: *** [Makefile:248: __sub-make] Error 2
>> $ gcc --version
>> gcc (Debian 14.2.0-19) 14.2.0
>> 
>> (It dose not always happens, sometimes I don't get this crash and It dose not always err on same line.)
>> 
>> And nothing abnormal shows up in kernel message.
>> 
>> Revert adfb6609c6809e107ded9a1cd46f519c882e64ea can fix my crashing go programs.
>
>
>I just replied privately to a similar report the following:
>
>Hi,
>
>I observed something similar while testing on Friday between rc4 (good)
>and rc5+ (bad).
>
>
>I'm sure this it the known issue of
>adfb6609c6809e107ded9a1cd46f519c882e64ea we discussed already here [1].
>
>
>@Jan, can you send the fix out today? Otherwise I can take care of this
>so we get this fixed asap.
>
>[1] https://lkml.kernel.org/r/20251109003613.1461433-1-japo@linux.ibm.com
>
>
>-- 
>Cheers
>
>David

Good to know~

My system is AMD, I would be glad to test the patch when it is ready.


* Re: Linux 6.18-rc6
  2025-11-17 13:45         ` David Wang
@ 2025-11-17 14:08           ` David Hildenbrand (Red Hat)
  2025-11-17 15:28             ` David Wang
                               ` (3 more replies)
  0 siblings, 4 replies; 39+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-17 14:08 UTC (permalink / raw)
  To: David Wang
  Cc: Linus Torvalds, catalin.marinas, lance.yang, b-padhi, akpm,
	linux-kernel, Jan Polensky

On 17.11.25 14:45, David Wang wrote:
> 
> 
> At 2025-11-17 21:30:42, "David Hildenbrand (Red Hat)" <david@kernel.org> wrote:
>> On 17.11.25 13:56, David Wang wrote:
>>>
>>> At 2025-11-17 18:33:58, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:
>>>> On Mon, 17 Nov 2025 at 00:20, David Wang <00107082@163.com> wrote:
>>>>>
>>>>> After upgrade to 6.18-rc6, all my golang programs start to crash, even Go compiler crashes;
>>>>> and when I started bisect, `make vmlinux` crashes too.
>>>>
>>>> Funky funky. Certainly doesn't happen here.
>>>>
>>>>> I am running bisect with 6.18-rc5 now, any chance this has already caught/fixed?
>>>>
>>>> Please do run the bisect (obviously you'll have to build using a
>>>> kernel that works for you), I am not aware of anybody reporting
>>>> anything like this.
>>>
>>> Hi,
>>>
>>> Bisect narrowed it down to
>>> [adfb6609c6809e107ded9a1cd46f519c882e64ea] mm/huge_memory: initialise the tags of the huge zero folio
>>>
>>> It seems happen to program build with older version of go, mine is 1.18.4;   and I cannot reproduce it with go1.25.4.
>>>
>>> When I upgraded to 6.18-rc6,  go1.18.4 compiler/program would crash with,
>>>
>>>       fatal error: arena already initialized
>>>       
>>>       runtime stack:
>>>       runtime.throw({0x9ea9fe?, 0x0?})
>>> 	    /usr/local/go/src/runtime/panic.go:992 +0x71 fp=0x7ffdf458a248 sp=0x7ffdf458a218 pc=0x435ab1
>>>       runtime.(*mheap).sysAlloc(0xddec20, 0x0?)
>>> 	    /usr/local/go/src/runtime/malloc.go:749 +0x2e9 fp=0x7ffdf458a2e0 sp=0x7ffdf458a248 pc=0x40dee9
>>>       ...
>>>
>>>
>>> And when I `make vmlinux`, I got something like this :
>>> 	  AR      vmlinux.a
>>> 	  LD      vmlinux.o
>>> 	  GEN     .vmlinux.objs
>>> 	  MODPOST vmlinux.symvers
>>> 	  CC      .vmlinux.export.o
>>> 	  UPD     include/generated/utsversion.h
>>> 	  CC      init/version-timestamp.o
>>> 	  KSYMS   .tmp_vmlinux0.kallsyms.S
>>> 	  AS      .tmp_vmlinux0.kallsyms.o
>>> 	  LD      .tmp_vmlinux1
>>> 	  NM      .tmp_vmlinux1.syms
>>> 	  KSYMS   .tmp_vmlinux1.kallsyms.S
>>> 	  AS      .tmp_vmlinux1.kallsyms.o
>>> 	.tmp_vmlinux1.kallsyms.S:361542:19: internal compiler error: Aborted
>>> 	361542 |         .byte 0x00, 0x99, 0xe9  /* T__pfx___x64_sys_inotify_add_watch */
>>> 	       |                   ^
>>> 	0x7f983a551def ???
>>> 		./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
>>> 	0x7f983a5a695c __pthread_kill_implementation
>>> 		./nptl/pthread_kill.c:44
>>> 	0x7f983a551cc1 __GI_raise
>>> 		../sysdeps/posix/raise.c:26
>>> 	0x7f983a53a4ab __GI_abort
>>> 		./stdlib/abort.c:73
>>> 	0x7f983a53bca7 __libc_start_call_main
>>> 		../sysdeps/nptl/libc_start_call_main.h:58
>>> 	0x7f983a53bd64 __libc_start_main_impl
>>> 		../csu/libc-start.c:360
>>> 	Please submit a full bug report, with preprocessed source (by using -freport-bug).
>>> 	Please include the complete backtrace with any bug report.
>>> 	See <file:///usr/share/doc/gcc-14/README.Bugs> for instructions.
>>> 	make[2]: *** [scripts/Makefile.vmlinux:72: vmlinux.unstripped] Error 1
>>> 	make[1]: *** [/home/linan/codes/linux-kernel/linux/Makefile:1242: vmlinux] Error 2
>>> 	make: *** [Makefile:248: __sub-make] Error 2
>>> $ gcc --version
>>> gcc (Debian 14.2.0-19) 14.2.0
>>>
>>> (It dose not always happens, sometimes I don't get this crash and It dose not always err on same line.)
>>>
>>> And nothing abnormal shows up in kernel message.
>>>
>>> Revert adfb6609c6809e107ded9a1cd46f519c882e64ea can fix my crashing go programs.
>>
>>
>> I just replied privately to a similar report the following:
>>
>> Hi,
>>
>> I observed something similar while testing on Friday between rc4 (good)
>> and rc5+ (bad).
>>
>>
>> I'm sure this it the known issue of
>> adfb6609c6809e107ded9a1cd46f519c882e64ea we discussed already here [1].
>>
>>
>> @Jan, can you send the fix out today? Otherwise I can take care of this
>> so we get this fixed asap.
>>
>> [1] https://lkml.kernel.org/r/20251109003613.1461433-1-japo@linux.ibm.com
>>
>>
>> -- 
>> Cheers
>>
>> David
> 
> Good to know~
> 
> My system is AMD, I would be glad to test the patch when it is ready.

To not lose too much time, I just pushed the following patch to

https://github.com/davidhildenbrand/linux.git zerotags

It would be great if you could give that a spin, I'm still
cross-compiling it on a bunch of targets.


 From 58e62699f77738188730d489accd01ad8e3cdeeb Mon Sep 17 00:00:00 2001
From: "David Hildenbrand (Red Hat)" <david@kernel.org>
Date: Mon, 17 Nov 2025 14:49:35 +0100
Subject: [PATCH] mm/huge_memory: fix __GFP_ZEROTAGS on architectures without
  memory tags

Unfortunately, __GFP_ZEROTAGS is not properly ignored on architectures
without memory tags (i.e., on all architectures except arm64), and
ends up calling an empty stub tag_clear_highpage().

Common code in post_alloc_hook() assumes that a call to
tag_clear_highpage() clears both the tags and the memory, and
therefore skips the actual clearing of the memory.

So ever since we started allocating the huge zero folio with __GFP_ZEROTAGS
that implies that we will not be clearing out the content of the huge
zero folio.

Fix it by properly ignoring __GFP_ZEROTAGS if there is no architecture
support, so we compile out the handling in the page allocator code
completely and just zero these pages ordinarily.

Make the default tag_clear_highpage() BUILD_BUG() and guard it by
a new Kconfig option.

Thanks to Jan Polensky for debugging the issue and sending an initial fix.

Reported-by: Jan Polensky <japo@linux.ibm.com>
Closes: https://lore.kernel.org/r/20251109003613.1461433-1-japo@linux.ibm.com
Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/r/6c09aaea.aa4a.19a91e379ab.Coremail.00107082@163.com
Debugged-by: Jan Polensky <japo@linux.ibm.com>
Fixes: 1579227fe0f0 ("mm/huge_memory: initialise the tags of the huge zero folio")
Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org>
---
  arch/Kconfig                   | 4 ++++
  arch/arm64/Kconfig             | 1 +
  arch/arm64/include/asm/page.h  | 1 -
  include/linux/gfp_types.h      | 6 ++++++
  include/linux/highmem.h        | 3 ++-
  include/trace/events/mmflags.h | 9 ++++++++-
  6 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 61130b88964b9..37a3d0b72fab1 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -348,6 +348,10 @@ config ARCH_HAS_SET_MEMORY
  config ARCH_HAS_SET_DIRECT_MAP
  	bool
  
+# Select if memory tags (e.g., GFP_ZEROTAGS) are supported
+config ARCH_HAS_MEMORY_TAGS
+	bool
+
  #
  # Select if the architecture provides the arch_dma_set_uncached symbol to
  # either provide an uncached segment alias for a DMA allocation, or
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 6663ffd23f252..dea73ff9291d6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -35,6 +35,7 @@ config ARM64
  	select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON
  	select ARCH_HAS_KEEPINITRD
  	select ARCH_HAS_MEMBARRIER_SYNC_CORE
+	select ARCH_HAS_MEMORY_TAGS
  	select ARCH_HAS_MEM_ENCRYPT
  	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
  	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 2312e6ee595fd..52ecb524af344 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -34,7 +34,6 @@ struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
  #define vma_alloc_zeroed_movable_folio vma_alloc_zeroed_movable_folio
  
  void tag_clear_highpage(struct page *to);
-#define __HAVE_ARCH_TAG_CLEAR_HIGHPAGE
  
  #define clear_user_page(page, vaddr, pg)	clear_page(page)
  #define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 65db9349f9053..9f75c4b5ab2d7 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -47,7 +47,9 @@ enum {
  	___GFP_HARDWALL_BIT,
  	___GFP_THISNODE_BIT,
  	___GFP_ACCOUNT_BIT,
+#ifdef CONFIG_ARCH_HAS_MEMORY_TAGS
  	___GFP_ZEROTAGS_BIT,
+#endif
  #ifdef CONFIG_KASAN_HW_TAGS
  	___GFP_SKIP_ZERO_BIT,
  	___GFP_SKIP_KASAN_BIT,
@@ -85,7 +87,11 @@ enum {
  #define ___GFP_HARDWALL		BIT(___GFP_HARDWALL_BIT)
  #define ___GFP_THISNODE		BIT(___GFP_THISNODE_BIT)
  #define ___GFP_ACCOUNT		BIT(___GFP_ACCOUNT_BIT)
+#ifdef CONFIG_ARCH_HAS_MEMORY_TAGS
  #define ___GFP_ZEROTAGS		BIT(___GFP_ZEROTAGS_BIT)
+#else
+#define ___GFP_ZEROTAGS		0
+#endif
  #ifdef CONFIG_KASAN_HW_TAGS
  #define ___GFP_SKIP_ZERO	BIT(___GFP_SKIP_ZERO_BIT)
  #define ___GFP_SKIP_KASAN	BIT(___GFP_SKIP_KASAN_BIT)
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 105cc4c00cc34..f3eaa605d68c1 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -249,10 +249,11 @@ static inline void clear_highpage_kasan_tagged(struct page *page)
  	kunmap_local(kaddr);
  }
  
-#ifndef __HAVE_ARCH_TAG_CLEAR_HIGHPAGE
+#ifndef CONFIG_ARCH_HAS_MEMORY_TAGS
  
  static inline void tag_clear_highpage(struct page *page)
  {
+	BUILD_BUG();
  }
  
  #endif
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index aa441f593e9a6..c8436c2f428d7 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -36,8 +36,14 @@
  	TRACE_GFP_EM(NOMEMALLOC)		\
  	TRACE_GFP_EM(HARDWALL)			\
  	TRACE_GFP_EM(THISNODE)			\
-	TRACE_GFP_EM(ACCOUNT)			\
+	TRACE_GFP_EM(ACCOUNT)
+
+#ifdef CONFIG_ARCH_HAS_MEMORY_TAGS
+# define TRACE_GFP_FLAGS_TAGS			\
  	TRACE_GFP_EM(ZEROTAGS)
+#else
+# define TRACE_GFP_FLAGS_TAGS
+#endif
  
  #ifdef CONFIG_KASAN_HW_TAGS
  # define TRACE_GFP_FLAGS_KASAN			\
@@ -63,6 +69,7 @@
  
  #define TRACE_GFP_FLAGS				\
  	TRACE_GFP_FLAGS_GENERAL			\
+	TRACE_GFP_FLAGS_TAGS			\
  	TRACE_GFP_FLAGS_KASAN			\
  	TRACE_GFP_FLAGS_LOCKDEP			\
  	TRACE_GFP_FLAGS_SLAB
-- 
2.51.0
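
For readers following along, the failure mode described in the commit
message can be modelled in a few lines of userspace C. This is only a
sketch under stated assumptions, not the kernel code: the STUB_* flag
values, the buffer standing in for a page, and all function names are
hypothetical, and the real post_alloc_hook() does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define STUB_PAGE_SIZE    4096
/* Hypothetical stand-ins for the GFP flags; the values are illustrative. */
#define STUB_GFP_ZERO     0x1u
#define STUB_GFP_ZEROTAGS 0x2u

/* Models the generic stub before the fix: it does nothing at all. */
void stub_tag_clear_page(unsigned char *page)
{
	(void)page;
}

/*
 * Simplified model of the decision described in the commit message:
 * when tags are to be zeroed, the tag helper is trusted to zero the
 * data as well, so the ordinary clearing step is skipped.
 */
void stub_post_alloc(unsigned char *page, unsigned int gfp)
{
	bool init = (gfp & STUB_GFP_ZERO) != 0;

	assert(page != NULL);
	if (init && (gfp & STUB_GFP_ZEROTAGS) != 0) {
		stub_tag_clear_page(page);
		init = false;	/* assumes the helper zeroed the data too */
	}
	if (init)
		memset(page, 0, STUB_PAGE_SIZE);
}

/* Returns true if a recycled page is fully zeroed after "allocation". */
bool stub_alloc_is_zeroed(unsigned int gfp)
{
	static unsigned char page[STUB_PAGE_SIZE];
	size_t i;

	memset(page, 0xaa, sizeof(page));	/* stale data from a previous user */
	stub_post_alloc(page, gfp);
	for (i = 0; i < sizeof(page); i++)
		if (page[i] != 0)
			return false;
	return true;
}
```

With only STUB_GFP_ZERO set, the page comes back zeroed. With
STUB_GFP_ZEROTAGS added, the empty stub runs, the ordinary clearing is
skipped, and the stale 0xaa bytes survive.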




-- 
Cheers

David


* Re: Linux 6.18-rc6
  2025-11-17 14:08           ` David Hildenbrand (Red Hat)
@ 2025-11-17 15:28             ` David Wang
  2025-11-17 16:59             ` Xi Ruoyao
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 39+ messages in thread
From: David Wang @ 2025-11-17 15:28 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Linus Torvalds, catalin.marinas, lance.yang, b-padhi, akpm,
	linux-kernel, Jan Polensky




At 2025-11-17 22:08:22, "David Hildenbrand (Red Hat)" <david@kernel.org> wrote:
>On 17.11.25 14:45, David Wang wrote:

>> 
>> Good to know~
>> 
>> My system is AMD, I would be glad to test the patch when it is ready.
>
>To not lose too much time, I just pushed the following patch to
>
>https://github.com/davidhildenbrand/linux.git zerotags
>
>It would be great if you could give that a spin, I'm still
>cross-compiling it on a bunch of targets.
>
>
> From 58e62699f77738188730d489accd01ad8e3cdeeb Mon Sep 17 00:00:00 2001
>From: "David Hildenbrand (Red Hat)" <david@kernel.org>
>Date: Mon, 17 Nov 2025 14:49:35 +0100
>Subject: [PATCH] mm/huge_memory: fix __GFP_ZEROTAGS on architectures without
>  memory tags
>
>Unfortunately, __GFP_ZEROTAGS is not properly ignored on architectures
>without memory tags (i.e., on all architectures except arm64), and
>ends up calling an empty stub tag_clear_highpage().
>
>Common code in post_alloc_hook() assumes that when we call
>tag_clear_highpage(), that both the tags and the memory were clear --
>to then skip actual clearing of the memory.
>
>So ever since we started allocating the huge zero folio with __GFP_ZEROTAGS
>that implies that we will not be clearing out the content of the huge
>zero folio.
>
>Fix it by properly ignoring __GFP_ZEROTAGS if there is no architecture
>support, so we compile out the handling in the page allocator code
>completely and just zero these pages ordinarily.
>
>Make the default tag_clear_highpage() BUILD_BUG() and guard it by
>a new Kconfig option.
>
>Thanks to Jan Polensky for debugging the issue and sending an initial fix.
>
>Reported-by: Jan Polensky <japo@linux.ibm.com>
>Closes: https://lore.kernel.org/r/20251109003613.1461433-1-japo@linux.ibm.com
>Reported-by: David Wang <00107082@163.com>
>Closes: https://lore.kernel.org/r/6c09aaea.aa4a.19a91e379ab.Coremail.00107082@163.com
>Debugged-by: Jan Polensky <japo@linux.ibm.com>
>Fixes: 1579227fe0f0 ("mm/huge_memory: initialise the tags of the huge zero folio")
>Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org>
>---
>  arch/Kconfig                   | 4 ++++
>  arch/arm64/Kconfig             | 1 +
>  arch/arm64/include/asm/page.h  | 1 -
>  include/linux/gfp_types.h      | 6 ++++++
>  include/linux/highmem.h        | 3 ++-
>  include/trace/events/mmflags.h | 9 ++++++++-
>  6 files changed, 21 insertions(+), 3 deletions(-)
>
>diff --git a/arch/Kconfig b/arch/Kconfig
>index 61130b88964b9..37a3d0b72fab1 100644
>--- a/arch/Kconfig
>+++ b/arch/Kconfig
>@@ -348,6 +348,10 @@ config ARCH_HAS_SET_MEMORY
>  config ARCH_HAS_SET_DIRECT_MAP
>  	bool
>  
>+# Select if memory tags (e.g., GFP_ZEROTAGS) are supported
>+config ARCH_HAS_MEMORY_TAGS
>+	bool
>+
>  #
>  # Select if the architecture provides the arch_dma_set_uncached symbol to
>  # either provide an uncached segment alias for a DMA allocation, or
>diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>index 6663ffd23f252..dea73ff9291d6 100644
>--- a/arch/arm64/Kconfig
>+++ b/arch/arm64/Kconfig
>@@ -35,6 +35,7 @@ config ARM64
>  	select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON
>  	select ARCH_HAS_KEEPINITRD
>  	select ARCH_HAS_MEMBARRIER_SYNC_CORE
>+	select ARCH_HAS_MEMORY_TAGS
>  	select ARCH_HAS_MEM_ENCRYPT
>  	select ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS
>  	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
>diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
>index 2312e6ee595fd..52ecb524af344 100644
>--- a/arch/arm64/include/asm/page.h
>+++ b/arch/arm64/include/asm/page.h
>@@ -34,7 +34,6 @@ struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
>  #define vma_alloc_zeroed_movable_folio vma_alloc_zeroed_movable_folio
>  
>  void tag_clear_highpage(struct page *to);
>-#define __HAVE_ARCH_TAG_CLEAR_HIGHPAGE
>  
>  #define clear_user_page(page, vaddr, pg)	clear_page(page)
>  #define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
>diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
>index 65db9349f9053..9f75c4b5ab2d7 100644
>--- a/include/linux/gfp_types.h
>+++ b/include/linux/gfp_types.h
>@@ -47,7 +47,9 @@ enum {
>  	___GFP_HARDWALL_BIT,
>  	___GFP_THISNODE_BIT,
>  	___GFP_ACCOUNT_BIT,
>+#ifdef CONFIG_ARCH_HAS_MEMORY_TAGS
>  	___GFP_ZEROTAGS_BIT,
>+#endif
>  #ifdef CONFIG_KASAN_HW_TAGS
>  	___GFP_SKIP_ZERO_BIT,
>  	___GFP_SKIP_KASAN_BIT,
>@@ -85,7 +87,11 @@ enum {
>  #define ___GFP_HARDWALL		BIT(___GFP_HARDWALL_BIT)
>  #define ___GFP_THISNODE		BIT(___GFP_THISNODE_BIT)
>  #define ___GFP_ACCOUNT		BIT(___GFP_ACCOUNT_BIT)
>+#ifdef CONFIG_ARCH_HAS_MEMORY_TAGS
>  #define ___GFP_ZEROTAGS		BIT(___GFP_ZEROTAGS_BIT)
>+#else
>+#define ___GFP_ZEROTAGS		0
>+#endif
>  #ifdef CONFIG_KASAN_HW_TAGS
>  #define ___GFP_SKIP_ZERO	BIT(___GFP_SKIP_ZERO_BIT)
>  #define ___GFP_SKIP_KASAN	BIT(___GFP_SKIP_KASAN_BIT)
>diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>index 105cc4c00cc34..f3eaa605d68c1 100644
>--- a/include/linux/highmem.h
>+++ b/include/linux/highmem.h
>@@ -249,10 +249,11 @@ static inline void clear_highpage_kasan_tagged(struct page *page)
>  	kunmap_local(kaddr);
>  }
>  
>-#ifndef __HAVE_ARCH_TAG_CLEAR_HIGHPAGE
>+#ifndef CONFIG_ARCH_HAS_MEMORY_TAGS
>  
>  static inline void tag_clear_highpage(struct page *page)
>  {
>+	BUILD_BUG();
>  }
>  
>  #endif
>diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
>index aa441f593e9a6..c8436c2f428d7 100644
>--- a/include/trace/events/mmflags.h
>+++ b/include/trace/events/mmflags.h
>@@ -36,8 +36,14 @@
>  	TRACE_GFP_EM(NOMEMALLOC)		\
>  	TRACE_GFP_EM(HARDWALL)			\
>  	TRACE_GFP_EM(THISNODE)			\
>-	TRACE_GFP_EM(ACCOUNT)			\
>+	TRACE_GFP_EM(ACCOUNT)
>+
>+#ifdef CONFIG_ARCH_HAS_MEMORY_TAGS
>+# define TRACE_GFP_FLAGS_TAGS			\
>  	TRACE_GFP_EM(ZEROTAGS)
>+#else
>+# define TRACE_GFP_FLAGS_TAGS
>+#endif
>  
>  #ifdef CONFIG_KASAN_HW_TAGS
>  # define TRACE_GFP_FLAGS_KASAN			\
>@@ -63,6 +69,7 @@
>  
>  #define TRACE_GFP_FLAGS				\
>  	TRACE_GFP_FLAGS_GENERAL			\
>+	TRACE_GFP_FLAGS_TAGS			\
>  	TRACE_GFP_FLAGS_KASAN			\
>  	TRACE_GFP_FLAGS_LOCKDEP			\
>  	TRACE_GFP_FLAGS_SLAB
>-- 
>2.51.0
>
>
>
I managed to manually apply the patch on top of 6.18-rc6. So far so good: no Go program crashes observed yet.
I will keep monitoring my system and report back if anything goes wrong.


David Wang 





* Re: Linux 6.18-rc6
  2025-11-17 12:56     ` David Wang
  2025-11-17 13:30       ` David Hildenbrand (Red Hat)
@ 2025-11-17 16:42       ` Linus Torvalds
  1 sibling, 0 replies; 39+ messages in thread
From: Linus Torvalds @ 2025-11-17 16:42 UTC (permalink / raw)
  To: David Wang
  Cc: catalin.marinas, david, lance.yang, b-padhi, akpm, linux-kernel

On Mon, 17 Nov 2025 at 04:56, David Wang <00107082@163.com> wrote:
>
> Bisect narrowed it down to commit adfb6609c680 ("mm/huge_memory:
> initialise the tags of the huge zero folio")

Ok, we clearly have a fix for this, but I just wanted to thank you for
the quick report and bisection (and verification of said fix)

                 Linus


* Re: Linux 6.18-rc6
  2025-11-17 14:08           ` David Hildenbrand (Red Hat)
  2025-11-17 15:28             ` David Wang
@ 2025-11-17 16:59             ` Xi Ruoyao
  2025-11-17 21:19               ` Joan Bruguera Micó
  2025-11-17 17:28             ` Linus Torvalds
  2025-11-18  3:59             ` Carlos Llamas
  3 siblings, 1 reply; 39+ messages in thread
From: Xi Ruoyao @ 2025-11-17 16:59 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat), David Wang
  Cc: Linus Torvalds, catalin.marinas, lance.yang, b-padhi, akpm,
	linux-kernel, Jan Polensky

On Mon, 2025-11-17 at 15:08 +0100, David Hildenbrand (Red Hat) wrote:

> (It dose not always happens, sometimes I don't get this crash and It
> dose not always err on same line.)

On my system what happened was very random.  Some symptoms:

- ibus-libpinyin sometimes crashed after inputting one or two keys
- epiphany (GNOME browser) sometimes crashed on startup
- GCC sometimes hung building itself (on the insn-match-*.cc files)
- "make" sometimes segfaulted
- GDB sometimes segfaulted (detected when trying to analyze the core
dump from above)

Some "interesting" aspects of them:

- once a symptom showed up, it kept reproducing until reboot
- when GDB happened to work, it showed that ibus-libpinyin, epiphany,
and make all crashed in their (different) hashtable implementations

I tried to bisect but failed because it was too random.  After applying
the patch I rebooted 10 times and the symptoms have not shown up again
(at least not yet).  So I'm 90% sure that what hit me is the same issue.

Tested-by: Xi Ruoyao <xry111@xry111.site>

-- 
Xi Ruoyao <xry111@xry111.site>


* Re: Linux 6.18-rc6
  2025-11-17 14:08           ` David Hildenbrand (Red Hat)
  2025-11-17 15:28             ` David Wang
  2025-11-17 16:59             ` Xi Ruoyao
@ 2025-11-17 17:28             ` Linus Torvalds
  2025-11-17 17:53               ` David Hildenbrand (Red Hat)
  2025-11-18  3:59             ` Carlos Llamas
  3 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2025-11-17 17:28 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: David Wang, catalin.marinas, lance.yang, b-padhi, akpm,
	linux-kernel, Jan Polensky

On Mon, 17 Nov 2025 at 06:08, David Hildenbrand (Red Hat)
<david@kernel.org> wrote:
>
> To not lose too much time, I just pushed the following patch to
>
> https://github.com/davidhildenbrand/linux.git zerotags

Hmm. Why isn't the fix for this simply this (intentionally
whitespace-damaged - don't apply mindlessly) one-liner:

  --- a/include/linux/highmem.h
  +++ b/include/linux/highmem.h
  @@ -253,5 +253,6 @@ static inline void
clear_highpage_kasan_tagged(struct page *page)

   static inline void tag_clear_highpage(struct page *page)
   {
  +     clear_highpage(page);
   }

because even when the *real* tag_clear_highpage() triggers, it falls down to

        if (!system_supports_mte()) {
                clear_highpage(page);
                return;
        }

so basically I think the fundamental bug here is that our fallback
tag_clear_highpage() was just buggy and didn't do what it was supposed
to do.

That one-liner would seem to be a lot simpler and more robust than
making this configuration-dependent. Just make the fallback do the
right thing - blammo, problem solved.

Am I missing something?
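
To make the suggestion above concrete, here is a small userspace model
of a fallback that clears the page data itself. It is a sketch only:
FALLBACK_PAGE_SIZE and both function names are made up for
illustration, and memset() stands in for clear_highpage().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FALLBACK_PAGE_SIZE 4096

/* Fallback modelled on the one-liner: no tags exist, so just clear the data. */
void fallback_tag_clear_page(unsigned char *page)
{
	assert(page != NULL);
	memset(page, 0, FALLBACK_PAGE_SIZE);
}

/*
 * Caller that relies on the helper having zeroed the data and does no
 * further clearing of its own. Returns true if the page ends up zeroed.
 */
bool fallback_path_leaves_page_zeroed(void)
{
	static unsigned char page[FALLBACK_PAGE_SIZE];
	size_t i;

	memset(page, 0x5a, sizeof(page));	/* stale data from a previous user */
	fallback_tag_clear_page(page);
	for (i = 0; i < sizeof(page); i++)
		if (page[i] != 0)
			return false;
	return true;
}
```

Because the fallback now honours the contract its caller assumes, the
page is zeroed without any configuration-dependent code in the caller.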

            Linus


* Re: Linux 6.18-rc6
  2025-11-17 17:28             ` Linus Torvalds
@ 2025-11-17 17:53               ` David Hildenbrand (Red Hat)
  2025-11-17 17:59                 ` Linus Torvalds
  0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-17 17:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Wang, catalin.marinas, lance.yang, b-padhi, akpm,
	linux-kernel, Jan Polensky

On 17.11.25 18:28, Linus Torvalds wrote:
> On Mon, 17 Nov 2025 at 06:08, David Hildenbrand (Red Hat)
> <david@kernel.org> wrote:
>>
>> To not lose too much time, I just pushed the following patch to
>>
>> https://github.com/davidhildenbrand/linux.git zerotags
> 
> Hmm. Why isn't the fix for this simply this (intentionally
> whitespace-damaged - don't apply mindlessly) one-liner:
> 
>    --- a/include/linux/highmem.h
>    +++ b/include/linux/highmem.h
>    @@ -253,5 +253,6 @@ static inline void
> clear_highpage_kasan_tagged(struct page *page)
> 
>     static inline void tag_clear_highpage(struct page *page)
>     {
>    +     clear_highpage(page);
>     }
> 
> because even when the *real* tag_clear_highpage() triggers, it falls down to
> 
>          if (!system_supports_mte()) {
>                  clear_highpage(page);
>                  return;
>          }
> 
> so basically I think the fundamental bug here is that our fallback
> tag_clear_highpage() was just buggy and didn't do what it was supposed
> to do.
> 
> That one-liner would seem to be a lot simpler and more robust than
> making this configuration-dependent. Just make the fallback do the
> right thing - blammo, problem solved.
> 
> Am I missing something?

I had the same in mind for a second, but then I looked at 
kernel_init_pages() with the kasan_disable_current() handling and 
concluded that it's clearer to just disallow tag_clear_highpage() being 
abused in the first place and reduce the effective code footprint of 
post_alloc_hook().
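
As a rough illustration of that compile-out approach, the following
userspace sketch defines the tag flag as 0 unless a hypothetical
GATED_ARCH_HAS_MEMORY_TAGS macro is set, so the tag branch can never be
taken and the page is zeroed the ordinary way. All names and flag
values are assumptions for the example. The patch itself uses
BUILD_BUG() in the stub rather than a call counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define GATED_PAGE_SIZE 4096
#define GATED_GFP_ZERO  0x1u

/* Define GATED_ARCH_HAS_MEMORY_TAGS to model an architecture with tags. */
#ifdef GATED_ARCH_HAS_MEMORY_TAGS
#define GATED_GFP_ZEROTAGS 0x2u
#else
#define GATED_GFP_ZEROTAGS 0u
#endif

static int gated_tag_calls;

/* Stand-in for the tag helper; counts calls so the test can observe them. */
void gated_tag_clear_page(unsigned char *page)
{
	gated_tag_calls++;
	memset(page, 0, GATED_PAGE_SIZE);	/* a real helper would also clear tags */
}

void gated_post_alloc(unsigned char *page, unsigned int gfp)
{
	bool init = (gfp & GATED_GFP_ZERO) != 0;

	assert(page != NULL);
	/* Constant false when GATED_GFP_ZEROTAGS is 0, so this is dead code. */
	if (init && (gfp & GATED_GFP_ZEROTAGS) != 0) {
		gated_tag_clear_page(page);
		init = false;
	}
	if (init)
		memset(page, 0, GATED_PAGE_SIZE);
}

/* Returns true if a recycled page is fully zeroed after "allocation". */
bool gated_alloc_is_zeroed(unsigned int gfp)
{
	static unsigned char page[GATED_PAGE_SIZE];
	size_t i;

	memset(page, 0xaa, sizeof(page));	/* stale data from a previous user */
	gated_post_alloc(page, gfp);
	for (i = 0; i < sizeof(page); i++)
		if (page[i] != 0)
			return false;
	return true;
}

int gated_tag_helper_calls(void)
{
	return gated_tag_calls;
}
```

Built without the macro, a caller that passes the tag flag still gets a
zeroed page and the tag helper is never reached.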

-- 
Cheers

David


* Re: Linux 6.18-rc6
  2025-11-17 17:53               ` David Hildenbrand (Red Hat)
@ 2025-11-17 17:59                 ` Linus Torvalds
  2025-11-17 18:24                   ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2025-11-17 17:59 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: David Wang, catalin.marinas, lance.yang, b-padhi, akpm,
	linux-kernel, Jan Polensky

On Mon, 17 Nov 2025 at 09:53, David Hildenbrand (Red Hat)
<david@kernel.org> wrote:
>
> I had the same in mind for a second, but then I looked at
> kernel_init_pages() with the kasan_disable_current() handling and
> concluded that it's clearer to just disallow tag_clear_highpage() being
> abused in the first place and reduce the effective code footprint of
> post_alloc_hook().

See, I had the exact opposite reaction: I think the one-liner is
better not just because it's simpler, but exactly *because* of the
mess that is kernel_init_pages().

IOW, that one-liner is either correct *without* all that crud - and
it's unnecessary for the __GFP_ZEROTAGS case because that only happens
at init time - or it shows a bug in the arm64 code.

Either way it's a win. Either it's simpler, or it gives us better coverage.

No?

                   Linus


* Re: Linux 6.18-rc6
  2025-11-16 22:42 Linux 6.18-rc6 Linus Torvalds
  2025-11-17  8:20 ` David Wang
@ 2025-11-17 18:13 ` Guenter Roeck
  2025-11-18 17:23 ` Stephanie Gawroriski
  2 siblings, 0 replies; 39+ messages in thread
From: Guenter Roeck @ 2025-11-17 18:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Sun, Nov 16, 2025 at 02:42:18PM -0800, Linus Torvalds wrote:
> So we have a slightly larger rc6 than usual, but I think it's just the
> random noise and a result of pull request timings rather than due to
> any issues with the release. But I guess we have a couple of weeks
> remaining to find out....
> 

Build results:
	total: 163 pass: 162 fail: 1
Failed builds:
	i386:allyesconfig
Qemu test results:
	total: 612 pass: 607 fail: 5
Failed tests:
	arm:fuji-bmc:aspeed_g5_defconfig:net=nic:aspeed-bmc-facebook-fuji:initrd
	arm:fuji-bmc:aspeed_g5_defconfig:sd2:net=nic:aspeed-bmc-facebook-fuji:ext2
	arm:fuji-bmc:aspeed_g5_defconfig:usb1:net=nic:aspeed-bmc-facebook-fuji:ext2
	arm:fuji-bmc:aspeed_g5_defconfig:mem1G:mtd128:net=nic:aspeed-bmc-facebook-fuji:ext2
	arm:fuji-bmc:aspeed_g5_defconfig:mem1G:mtd128,0,8,1:net=nic:aspeed-bmc-facebook-fuji:f2fs
Unit test results:
	pass: 665747 fail: 0

The problems are all known, and fixes have been submitted.
From my fixes branch:

1737339334d9 (fixes-v6.18) of: Skip devicetree kunit tests when RISCV+ACPI doesn't populate root node
f676380882c0 power: supply: intel_dc_ti_battery: fix 64bit divisions
286b0aaa6ff5 ARM: dts: aspeed: fuji-data64: Enable mac3 controller

SHAs are of course local. The unit tests affected by the devicetree patch
are disabled in my testbed and not reported.

Guenter

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-17 17:59                 ` Linus Torvalds
@ 2025-11-17 18:24                   ` David Hildenbrand (Red Hat)
  2025-11-17 19:17                     ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-17 18:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Wang, catalin.marinas, lance.yang, b-padhi, akpm,
	linux-kernel, Jan Polensky

On 17.11.25 18:59, Linus Torvalds wrote:
> On Mon, 17 Nov 2025 at 09:53, David Hildenbrand (Red Hat)
> <david@kernel.org> wrote:
>>
>> I had the same in mind for a second, but then I looked at
>> kernel_init_pages() with the kasan_disable_current() handling and
>> concluded that it's clearer to just disallow tag_clear_highpage() being
>> abused in the first place and reduce the effective code footprint of
>> post_alloc_hook().
> 
> See, I had the exact opposite reaction: I think the one-liner is
> better not just because it's simpler, but exactly *because* of the
> mess that is kernel_init_pages().

Heh, I intuitively avoid runtime checks on the fast paths where avoidable :)

> 
> IOW, that one-liner is either correct *without* all that crud - and
> it's unnecessary for the __GFP_ZEROTAGS case because that only happens
> at init time - or it shows a bug in the arm64 code.

What sticks out is that we perform the tag_clear_highpage() before we do 
all the KASAN poison magic. And we perform the kernel_init_pages() after 
the kasan magic.

The clear_highpage() fallback in tag_clear_highpage() was just recently 
added by Catalin (in the same commit we are fixing here IIRC).

I am no expert on KASAN, but I would suspect that it is important for us 
to clear the pages after doing the 
kasan_unpoison_pages/page_kasan_tag_reset.

I'll have to dig into the history of tag_clear_highpage() a bit to 
understand how this would interact with non-hw-tag-based KASAN. IIRC, 
amd64 also supports SW-tag KASAN.


> 
> Either way it's a win. Either it's simpler, or it gives us better coverage.
> 
> No?

Staring a bit more at the arm64 fallback in tag_clear_highpage() I agree 
that it might help to find arm64 issues we might run into in the fallback.

But the interaction of KASAN config options is not particularly easy to
understand -- nor is whether this would actually break non-arm configs. I'll
have to dig ...

-- 
Cheers

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-17 18:24                   ` David Hildenbrand (Red Hat)
@ 2025-11-17 19:17                     ` David Hildenbrand (Red Hat)
  2025-11-18  1:10                       ` Linus Torvalds
  0 siblings, 1 reply; 39+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-17 19:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Wang, catalin.marinas, lance.yang, b-padhi, akpm,
	linux-kernel, Jan Polensky

On 17.11.25 19:24, David Hildenbrand (Red Hat) wrote:
> On 17.11.25 18:59, Linus Torvalds wrote:
>> On Mon, 17 Nov 2025 at 09:53, David Hildenbrand (Red Hat)
>> <david@kernel.org> wrote:
>>>
>>> I had the same in mind for a second, but then I looked at
>>> kernel_init_pages() with the kasan_disable_current() handling and
>>> concluded that it's clearer to just disallow tag_clear_highpage() being
>>> abused in the first place and reduce the effective code footprint of
>>> post_alloc_hook().
>>
>> See, I had the exact opposite reaction: I think the one-liner is
>> better not just because it's simpler, but exactly *because* of the
>> mess that is kernel_init_pages().
> 
> Heh, I intuitively avoid runtime checks on the fast paths where avoidable :)
> 
>>
>> IOW, that one-liner is either correct *without* all that crud - and
>> it's unnecessary for the __GFP_ZEROTAGS case because that only happens
>> at init time - or it shows a bug in the arm64 code.
> 
> What sticks out is that we perform the tag_clear_highpage() before we do
> all the KASAN poison magic. And we perform the kernel_init_pages() after
> the kasan magic.
> 
> The clear_highpage() fallback in tag_clear_highpage() was just recently
> added by Catalin (in the same commit we are fixing here IIRC).
> 
> I am no expert on KASAN, but I would suspect that it is important for us
> to clear the pages after doing the
> kasan_unpoison_pages/page_kasan_tag_reset.
> 
> I'll have to dig into the history of tag_clear_highpage() a bit to
> understand how this would interact with non-hw-tag-based KASAN. IIRC,
> amd64 also supports SW-tag KASAN.

So, I briefly tried on x86 with KASAN and the one-liner. I was assuming 
that KASAN would complain because we are clearing the page before doing 
the kasan_unpoison_pages() (IOW, writing to a KASAN-poisoned page).

It didn't trigger, and I assume it is because clear_highpage() on x86 
will not be instrumented by KASAN (my theory).

The comment in kernel_init_pages() indicates that s390x uses memset() 
for that purpose and I would assume that that one would be instrumented.

So I'll give it a try on s390x with KASAN. I should be able to get my 
hands on a system later today.

-- 
Cheers

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-17 16:59             ` Xi Ruoyao
@ 2025-11-17 21:19               ` Joan Bruguera Micó
  0 siblings, 0 replies; 39+ messages in thread
From: Joan Bruguera Micó @ 2025-11-17 21:19 UTC (permalink / raw)
  To: xry111
  Cc: 00107082, akpm, b-padhi, catalin.marinas, david, japo, lance.yang,
	linux-kernel, torvalds

>> (It does not always happen; sometimes I don't get this crash, and it
>> does not always err on the same line.)
>
> On my system what happened was very random.  Some symptoms:

FWIW, I found this consistently reproduced the issue on my system:

#!/bin/sh
seq 1000000 > patterns
echo x > haystack
grep --file patterns haystack

Regards,
- Joan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-17 19:17                     ` David Hildenbrand (Red Hat)
@ 2025-11-18  1:10                       ` Linus Torvalds
  2025-11-18  4:13                         ` David Wang
  2025-11-18  7:28                         ` David Hildenbrand (Red Hat)
  0 siblings, 2 replies; 39+ messages in thread
From: Linus Torvalds @ 2025-11-18  1:10 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: David Wang, catalin.marinas, lance.yang, b-padhi, akpm,
	linux-kernel, Jan Polensky

[-- Attachment #1: Type: text/plain, Size: 3447 bytes --]

On Mon, 17 Nov 2025 at 11:17, David Hildenbrand (Red Hat)
<david@kernel.org> wrote:
>
> So, I briefly tried on x86 with KASAN and the one-liner. I was assuming
> that KASAN would complain because we are clearing the page before doing
> the kasan_unpoison_pages() (IOW, writing to a KASAN-poisoned page).
>
> It didn't trigger, and I assume it is because clear_highpage() on x86
> will not be instrumented by KASAN (my theory).
>
> The comment in kernel_init_pages() indicates that s390x uses memset()
> for that purpose and I would assume that that one would be instrumented.

So I have thought about this some more, and I am not entirely happy
about any of this, but I think the way forward is to

 (a) make tag_clear_highpage() just do multiple pages in one go (and
rename it as tag_clear_highpage*s*() in the process)

 (b) make it have an actual return value to indicate whether it
initialized things

which means that the post_alloc_hook() code just becomes

        if (zero_tags)
                init = tag_clear_highpages(page, 1 << order);

and then the generic fallback becomes just

  static inline bool tag_clear_highpages(struct page *page, int numpages)
  {
         return false;
  }

which makes this all a complete no-op for architectures that don't do
this memory tagging.

And the one architecture that *does* do it - arm64 - actually
simplifies too, because now instead of being called in a loop - and
having that

        if (!system_supports_mte()) {
                 clear_highpage(page);
                 return;
        }

in every iteration of the loop, it now just gets called *once*, and it
instead just does

        if (!system_supports_mte())
                return false;

and then it does the *clearing* in a loop instead.

End result: that all looks much saner to me, and should avoid all the
issues with KASAN (well, arm64 currently clearly depends on
mte_zero_clear_page_tags() being assembly code that doesn't trigger
KASAN anyway).
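
For readers following along, here is a minimal userspace sketch of that
contract: the helper reports whether it initialized the pages, and the
caller uses that to decide whether ordinary zeroing is still needed. It
assumes that "init" in post_alloc_hook() means "memory still needs
zeroing", which is how the removed "init = false" lines in the diff read;
under that assumption the flag is cleared only when the helper returns
true, which would be the opposite polarity from assigning the return value
straight to "init". All model_* names and MODEL_PAGE_SIZE are hypothetical
stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64	/* tiny stand-in for PAGE_SIZE */

/* Hypothetical stand-in for system_supports_mte(); set by the caller. */
static bool model_has_mte;

/*
 * Model of the proposed helper: clears every page and returns true when
 * the (pretend) hardware feature exists, otherwise touches nothing and
 * returns false.
 */
static bool model_tag_clear_highpages(unsigned char *pages, int numpages)
{
	assert(numpages > 0);
	if (!model_has_mte)
		return false;
	for (int i = 0; i < numpages; i++)
		memset(pages + (size_t)i * MODEL_PAGE_SIZE, 0, MODEL_PAGE_SIZE);
	return true;
}

/*
 * Model of the caller. ASSUMPTION: "init" means "pages still need
 * zeroing", so it is cleared only if the helper reports it did the work.
 */
static void model_post_alloc(unsigned char *pages, unsigned int order,
			     bool init, bool zero_tags)
{
	int numpages = 1 << order;

	if (zero_tags && model_tag_clear_highpages(pages, numpages))
		init = false;
	if (init)	/* still not initialized: plain zeroing */
		memset(pages, 0, (size_t)numpages * MODEL_PAGE_SIZE);
}
```

With model_has_mte set to false, model_post_alloc(buf, 2, true, true)
still leaves all four model pages zeroed, because the fallback path runs.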

But maybe it looks saner to me just because I've written that code now.

Anyway, here's my suggested patch. I still prefer this over having
more config variables and #ifdef's. I'd much rather have code that
just does the right thing and becomes null and void when it's
effectively disabled by not having hardware support.

Comments?

This is all entirely untested, but I did build it on both x86-64 and
arm64. So it must be perfect. Right?

Side note: I really *detest* that stupid "__HAVE_ARCH_XYZ" pattern. I
hate it. Why do people insist on that stupid pattern? We *have* a name
already: the name of the thing that the architecture implements. Don't
make up a new one with all caps and a __HAVE_ARCH_ prefix. If an
architecture implements the feature "xyz", it should just do "define
xyz xyz" and be done with it, and then code can test whether it is
implemented by just doing "#ifdef xyz".

But I did *not* change that stupid existing pattern. I left it alone,
and just added the 'S' since now it's multiple pages.  But I really do
want to bring this up again, because it's so silly to make up new
names to say "I defined that other name". Just *use* the name.

If you implement "xyz" as a macro, you're done. And if it's
implemented as an inline function, just add the "#define xyz xyz" to
show that you did it.
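
A minimal sketch of the pattern, collapsed into one file for
illustration: the "arch" part implements one helper and announces it by
defining a macro with the same name, and the "generic" part supplies
fallbacks only for names nobody defined. The helpers zero_buf() and
fill_buf(), and which_versions_ran(), are hypothetical and exist only for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Arch" header (hypothetical): implements zero_buf() itself ... */
static inline int zero_buf(unsigned char *p, size_t n)
{
	memset(p, 0, n);
	return 1;	/* 1 = arch version ran */
}
/* ... and says so by defining the very same name. */
#define zero_buf zero_buf

/* "Generic" header: fallbacks only for names nobody has defined. */
#ifndef zero_buf
static inline int zero_buf(unsigned char *p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		p[i] = 0;
	return 0;	/* 0 = generic fallback ran */
}
#endif

#ifndef fill_buf
static inline int fill_buf(unsigned char *p, size_t n, unsigned char v)
{
	memset(p, v, n);
	return 0;	/* 0 = generic fallback ran */
}
#endif

/* Usage: returns 1 if arch zero_buf() and generic fill_buf() were picked. */
static int which_versions_ran(void)
{
	unsigned char b[4] = { 1, 2, 3, 4 };
	int arch = zero_buf(b, sizeof(b));
	int generic;

	assert(b[0] == 0 && b[3] == 0);
	generic = fill_buf(b, sizeof(b), 7);
	assert(b[2] == 7);
	return arch == 1 && generic == 0;
}
```

Grepping for zero_buf finds both the implementation and the test for its
presence, since there is only the one name.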

Don't make up new names that only make it harder to grep for things,
and make things pointlessly have two different names.

Please.

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 3536 bytes --]

 arch/arm64/include/asm/page.h |  4 ++--
 arch/arm64/mm/fault.c         | 21 +++++++++++----------
 include/linux/highmem.h       |  6 ++++--
 mm/page_alloc.c               |  9 ++-------
 4 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/include/asm/page.h b/arch/arm64/include/asm/page.h
index 2312e6ee595f..258cca4b4873 100644
--- a/arch/arm64/include/asm/page.h
+++ b/arch/arm64/include/asm/page.h
@@ -33,8 +33,8 @@ struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
 						unsigned long vaddr);
 #define vma_alloc_zeroed_movable_folio vma_alloc_zeroed_movable_folio
 
-void tag_clear_highpage(struct page *to);
-#define __HAVE_ARCH_TAG_CLEAR_HIGHPAGE
+bool tag_clear_highpages(struct page *to, int numpages);
+#define __HAVE_ARCH_TAG_CLEAR_HIGHPAGES
 
 #define clear_user_page(page, vaddr, pg)	clear_page(page)
 #define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 125dfa6c613b..a193b6a5d1e6 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -967,20 +967,21 @@ struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
 	return vma_alloc_folio(flags, 0, vma, vaddr);
 }
 
-void tag_clear_highpage(struct page *page)
+bool tag_clear_highpages(struct page *page, int numpages)
 {
 	/*
 	 * Check if MTE is supported and fall back to clear_highpage().
 	 * get_huge_zero_folio() unconditionally passes __GFP_ZEROTAGS and
-	 * post_alloc_hook() will invoke tag_clear_highpage().
+	 * post_alloc_hook() will invoke tag_clear_highpages().
 	 */
-	if (!system_supports_mte()) {
-		clear_highpage(page);
-		return;
-	}
+	if (!system_supports_mte())
+		return false;
 
-	/* Newly allocated page, shouldn't have been tagged yet */
-	WARN_ON_ONCE(!try_page_mte_tagging(page));
-	mte_zero_clear_page_tags(page_address(page));
-	set_page_mte_tagged(page);
+	/* Newly allocated pages, shouldn't have been tagged yet */
+	for (int i = 0; i < numpages; i++, page++) {
+		WARN_ON_ONCE(!try_page_mte_tagging(page));
+		mte_zero_clear_page_tags(page_address(page));
+		set_page_mte_tagged(page);
+	}
+	return true;
 }
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 105cc4c00cc3..abc20f9810fd 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -249,10 +249,12 @@ static inline void clear_highpage_kasan_tagged(struct page *page)
 	kunmap_local(kaddr);
 }
 
-#ifndef __HAVE_ARCH_TAG_CLEAR_HIGHPAGE
+#ifndef __HAVE_ARCH_TAG_CLEAR_HIGHPAGES
 
-static inline void tag_clear_highpage(struct page *page)
+/* Return false to let people know we did not initialize the pages */
+static inline bool tag_clear_highpages(struct page *page, int numpages)
 {
+	return false;
 }
 
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 600d9e981c23..4319cfa7f77d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1822,14 +1822,9 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
 	 * If memory tags should be zeroed
 	 * (which happens only when memory should be initialized as well).
 	 */
-	if (zero_tags) {
-		/* Initialize both memory and memory tags. */
-		for (i = 0; i != 1 << order; ++i)
-			tag_clear_highpage(page + i);
+	if (zero_tags)
+		init = tag_clear_highpages(page, 1 << order);
 
-		/* Take note that memory was initialized by the loop above. */
-		init = false;
-	}
 	if (!should_skip_kasan_unpoison(gfp_flags) &&
 	    kasan_unpoison_pages(page, order, init)) {
 		/* Take note that memory was initialized by KASAN. */

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-17 14:08           ` David Hildenbrand (Red Hat)
                               ` (2 preceding siblings ...)
  2025-11-17 17:28             ` Linus Torvalds
@ 2025-11-18  3:59             ` Carlos Llamas
  3 siblings, 0 replies; 39+ messages in thread
From: Carlos Llamas @ 2025-11-18  3:59 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: David Wang, Linus Torvalds, catalin.marinas, lance.yang, b-padhi,
	akpm, linux-kernel, Jan Polensky

On Mon, Nov 17, 2025 at 03:08:22PM +0100, David Hildenbrand (Red Hat) wrote:
> To not lose too much time, I just pushed the following patch to
> 
> https://github.com/davidhildenbrand/linux.git zerotags
> 
> It would be great if you could give that a spin, I'm still
> cross-compiling it on a bunch of targets.

I was chasing the following kselftest issue:

  Testing move-pmd on anon... ERROR: nr 6144 move failed 5332351020490051840 1
   (errno=17, @linux/tools/testing/selftests/mm/uffd-unit-tests.c:1198)

... and ended up bisecting to adfb6609c680 ("mm/huge_memory: initialise
the tags of the huge zero folio") too.

FWIW, your patch fixed the failures. I'm running an x86_64 image on
crosvm.

Cheers,
Carlos Llamas

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-18  1:10                       ` Linus Torvalds
@ 2025-11-18  4:13                         ` David Wang
  2025-11-18 13:55                           ` David Wang
  2025-11-18  7:28                         ` David Hildenbrand (Red Hat)
  1 sibling, 1 reply; 39+ messages in thread
From: David Wang @ 2025-11-18  4:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Hildenbrand (Red Hat), catalin.marinas, lance.yang, b-padhi,
	akpm, linux-kernel, Jan Polensky



At 2025-11-18 09:10:50, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:
>On Mon, 17 Nov 2025 at 11:17, David Hildenbrand (Red Hat)
><david@kernel.org> wrote:
>>
>> So, I briefly tried on x86 with KASAN and the one-liner. I was assuming
>> that KASAN would complain because we are clearing the page before doing
>> the kasan_unpoison_pages() (IOW, writing to a KASAN-poisoned page).
>>
>> It didn't trigger, and I assume it is because clear_highpage() on x86
>> will not be instrumented by KASAN (my theory).
>>
>> The comment in kernel_init_pages() indicates that s390x uses memset()
>> for that purpose and I would assume that that one would be instrumented.
>
>So I have thought about this some more, and I am not entirely happy
>about any of this, but I think the way forward is to
>
> (a) make tag_clear_highpage() just do multiple pages in one go (and
>rename it as tag_clear_highpage*s*() in the process)
>
> (b) make it have an actual return value to indicate whether it
>initialized things
>
>which means that the post_alloc_hook() code just becomes
>
>        if (zero_tags)
>                init = tag_clear_highpages(page, 1 << order);
>
>and then the generic fallback becomes just
>
>  static inline bool tag_clear_highpages(struct page *page, int numpages)
>  {
>         return false;
>  }
>
>which makes this all a complete no-op for architectures that don't do
>this memory tagging.
>
>And the one architecture that *does* do it - arm64 - actually
>simplifies too, because now instead of being called in a loop - and
>having that
>
>        if (!system_supports_mte()) {
>                 clear_highpage(page);
>                 return;
>        }
>
>in every iteration of the loop, it now just gets called *once*, and it
>instead just does
>
>        if (!system_supports_mte())
>                return false;
>
>and then it does the *clearing* in a loop instead.
>
>End result: that all looks much saner to me, and should avoid all the
>issues with KASAN (well, arm64 currently clearly depends on
>mte_zero_clear_page_tags() being assembly code that doesn't trigger
>KASAN anyway).
>
>But maybe it looks saner to me just because I've written that code now.
>
>Anyway, here's my suggested patch. I still prefer this over having
>more config variables and #ifdef's. I'd much rather have code that
>just does the right thing and becomes null and void when it's
>effectively disabled by not having hardware support.
>
>Comments?
>
>This is all entirely untested, but I did build it on both x86-64 and
>arm64. So it must be perfect. Right?


I tried this patch; my prometheus service crashes with:
        fatal error: acquireSudog: found s.elem != nil in cache
It seems some memory is still not properly zeroed (I guess).
But this time, my old Go compiler works fine.


FYI
David W






>
>Side note: I really *detest* that stupid "__HAVE_ARCH_XYZ" pattern. I
>hate it. Why do people insist on that stupid pattern? We *have* a name
>already: the name of the thing that the architecture implements. Don't
>make up a new one with all caps and a __HAVE_ARCH_ prefix. If an
>architecture implements the feature "xyz", it should just do "define
>xyz xyz" and be done with it, and then code can test whether it is
>implemented by just doing "#ifdef xyz".
>
>But I did *not* change that stupid existing pattern. I left it alone,
>and just added the 'S' since now it's multiple pages.  But I really do
>want to bring this up again, because it's so silly to make up new
>names to say "I defined that other name". Just *use* the name.
>
>If you implement "xyz" as a macro, you're done. And if it's
>implemented as an inline function, just add the "#define xyz xyz" to
>show that you did it.
>
>Don't make up new names that only make it harder to grep for things,
>and make things pointlessly have two different names.
>
>Please.
>
>              Linus

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-18  1:10                       ` Linus Torvalds
  2025-11-18  4:13                         ` David Wang
@ 2025-11-18  7:28                         ` David Hildenbrand (Red Hat)
  2025-11-18 16:49                           ` Linus Torvalds
  1 sibling, 1 reply; 39+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-18  7:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Wang, catalin.marinas, lance.yang, b-padhi, akpm,
	linux-kernel, Jan Polensky

On 18.11.25 02:10, Linus Torvalds wrote:
> On Mon, 17 Nov 2025 at 11:17, David Hildenbrand (Red Hat)
> <david@kernel.org> wrote:
>>
>> So, I briefly tried on x86 with KASAN and the one-liner. I was assuming
>> that KASAN would complain because we are clearing the page before doing
>> the kasan_unpoison_pages() (IOW, writing to a KASAN-poisoned page).
>>
>> It didn't trigger, and I assume it is because clear_highpage() on x86
>> will not be instrumented by KASAN (my theory).
>>
>> The comment in kernel_init_pages() indicates that s390x uses memset()
>> for that purpose and I would assume that that one would be instrumented.
> 
> So I have thought about this some more, and I am not entirely happy
> about any of this, but I think the way forward is to
> 
>   (a) make tag_clear_highpage() just do multiple pages in one go (and
> rename it as tag_clear_highpage*s*() in the process)

That sounds reasonable given that the only caller we have wants to iterate.

> 
>   (b) make it have an actual return value to indicate whether it
> initialized things

Works for me.

> 
> which means that the post_alloc_hook() code just becomes
> 
>          if (zero_tags)
>                  init = tag_clear_highpages(page, 1 << order);
> 
> and then the generic fallback becomes just
> 
>    static inline bool tag_clear_highpages(struct page *page, int numpages)
>    {
>           return false;
>    }
> 
> which makes this all a complete no-op for architectures that don't do
> this memory tagging.
> 
> And the one architecture that *does* do it - arm64 - actually
> simplifies too, because now instead of being called in a loop - and
> having that
> 
>          if (!system_supports_mte()) {
>                   clear_highpage(page);
>                   return;
>          }
> 
> in every iteration of the loop, it now just gets called *once*, and it
> instead just does
> 
>          if (!system_supports_mte())
>                  return false;
> 
> and then it does the *clearing* in a loop instead.

Ack.

> 
> End result: that all looks much saner to me, and should avoid all the
> issues with KASAN (well, arm64 currently clearly depends on
> mte_zero_clear_page_tags() being assembly code that doesn't trigger
> KASAN anyway).
> 
> But maybe it looks saner to me just because I've written that code now.

:)

It should optimize out on !arm64 and optimize arm64 as well (fewer
function calls for higher-order pages), so that's clearly better.
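
To put a rough number on that, here is a toy count of helper calls and
capability checks for the two shapes, assuming an order-9 allocation (512
pages). Everything named model_*, old_* or new_* is hypothetical and only
stands in for the real helpers; no page is actually cleared.

```c
#include <assert.h>
#include <stdbool.h>

static bool model_mte = true;		/* stand-in for the MTE capability */
static unsigned int model_checks;	/* capability tests performed */
static unsigned int model_calls;	/* helper calls made */

static bool model_supports_mte(void)
{
	model_checks++;
	return model_mte;
}

/* Old shape: one helper call, and one capability test, per page. */
static void old_clear_one(void)
{
	model_calls++;
	(void)model_supports_mte();
}

static void old_caller(unsigned int order)
{
	assert(order < 31);
	for (unsigned int i = 0; i != 1u << order; ++i)
		old_clear_one();
}

/* New shape: one call and one test; the per-page loop lives inside. */
static bool new_clear_many(unsigned int numpages)
{
	unsigned int cleared = 0;

	model_calls++;
	if (!model_supports_mte())
		return false;
	for (unsigned int i = 0; i < numpages; i++)
		cleared++;	/* stands in for clearing one page */
	return cleared == numpages;
}

static bool new_caller(unsigned int order)
{
	assert(order < 31);
	return new_clear_many(1u << order);
}
```

For order 9 the old shape makes 512 calls and 512 checks, while the new
shape makes one of each.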

> 
> Anyway, here's my suggested patch. I still prefer this over having
> more config variables and #ifdef's. I'd much rather have code that
> just does the right thing and becomes null and void when it's
> effectively disabled by not having hardware support.
> 
> Comments?

Works for me and saves me from continuing my fight with KASAN on s390x I 
started yesterday evening to find out if the one-liner would be 
problematic with KASAN poisoning.

I very much prefer to let kernel_init_pages() handle ordinary (non-tag) 
initialization after KASAN did its unpoison magic.


Do you want to quickly send that patch with linux-mm on CC or do you 
just want to commit it? If you're busy I can quickly send it around.

In any case, feel free to add my

Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org>

> 
> This is all entirely untested, but I did build it on both x86-64 and
> arm64. So it must be perfect. Right?
> 
> Side note: I really *detest* that stupid "__HAVE_ARCH_XYZ" pattern. I
> hate it. Why do people insist on that stupid pattern? We *have* a name
> already: the name of the thing that the architecture implements. Don't
> make up a new one with all caps and a __HAVE_ARCH_ prefix. If an
> architecture implements the feature "xyz", it should just do "define
> xyz xyz" and be done with it, and then code can test whether it is
> implemented by just doing "#ifdef xyz".
> 
> But I did *not* change that stupid existing pattern. I left it alone,
> and just added the 'S' since now it's multiple pages.  But I really do
> want to bring this up again, because it's so silly to make up new
> names to say "I defined that other name". Just *use* the name.

I stumbled over that just recently myself, and it's just done extremely 
inconsistently even within MM.

Maybe this one is worth spelling out in the coding style, as I was 
recently also unsure what the best practice is in the end. Let me see if 
I can find time for that.

> 
> If you implement "xyz" as a macro, you're done. And if it's
> implemented as an inline function, just add the "#define xyz xyz" to
> show that you did it.

In general, I agree if it's about real "features" that consist of a 
single function. I think it's a different story once a feature actually 
consists of multiple functions that can be cleanly abstracted in a 
config option.


-- 
Cheers

David

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-18  4:13                         ` David Wang
@ 2025-11-18 13:55                           ` David Wang
  2025-11-18 14:12                             ` David Hildenbrand (Red Hat)
  0 siblings, 1 reply; 39+ messages in thread
From: David Wang @ 2025-11-18 13:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Hildenbrand (Red Hat), catalin.marinas, lance.yang, b-padhi,
	akpm, linux-kernel, Jan Polensky



At 2025-11-18 12:13:15, "David Wang" <00107082@163.com> wrote:
>
>
>At 2025-11-18 09:10:50, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:
>>On Mon, 17 Nov 2025 at 11:17, David Hildenbrand (Red Hat)
>><david@kernel.org> wrote:
>>>
>>> So, I briefly tried on x86 with KASAN and the one-liner. I was assuming
>>> that KASAN would complain because we are clearing the page before doing
>>> the kasan_unpoison_pages() (IOW, writing to a KASAN-poisoned page).
>>>
>>> It didn't trigger, and I assume it is because clear_highpage() on x86
>>> will not be instrumented by KASAN (my theory).
>>>
>>> The comment in kernel_init_pages() indicates that s390x uses memset()
>>> for that purpose and I would assume that that one would be instrumented.
>>
>>So I have thought about this some more, and I am not entirely happy
>>about any of this, but I think the way forward is to
>>
>> (a) make tag_clear_highpage() just do multiple pages in one go (and
>>rename it as tag_clear_highpage*s*() in the process)
>>
>> (b) make it have an actual return value to indicate whether it
>>initialized things
>>
>>which means that the post_alloc_hook() code just becomes
>>
>>        if (zero_tags)
>>                init = tag_clear_highpages(page, 1 << order);
>>
>>and then the generic fallback becomes just
>>
>>  static inline bool tag_clear_highpages(struct page *page, int numpages)
>>  {
>>         return false;
>>  }
>>
>>which makes this all a complete no-op for architectures that don't do
>>this memory tagging.
>>
>>And the one architecture that *does* do it - arm64 - actually
>>simplifies too, because now instead of being called in a loop - and
>>having that
>>
>>        if (!system_supports_mte()) {
>>                 clear_highpage(page);
>>                 return;
>>        }
>>
>>in every iteration of the loop, it now just gets called *once*, and it
>>instead just does
>>
>>        if (!system_supports_mte())
>>                return false;
>>
>>and then it does the *clearing* in a loop instead.
>>
>>End result: that all looks much saner to me, and should avoid all the
>>issues with KASAN (well, arm64 currently clearly depends on
>>mte_zero_clear_page_tags() being assembly code that doesn't trigger
>>KASAN anyway).
>>
>>But maybe it looks saner to me just because I've written that code now.
>>
>>Anyway, here's my suggested patch. I still prefer this over having
>>more config variables and #ifdef's. I'd much rather have code that
>>just does the right thing and becomes null and void when it's
>>effectively disabled by not having hardware support.
>>
>>Comments?
>>
>>This is all entirely untested, but I did build it on both x86-64 and
>>arm64. So it must be perfect. Right?
>
>
>I tried this patch; my prometheus service crashes with:
>        fatal error: acquireSudog: found s.elem != nil in cache
>It seems some memory is still not properly zeroed (I guess).
>But this time, my old Go compiler works fine.


Update: with this patch, my Go programs still crash. It was just that
the first time I tested the patch, the old Go compiler happened to work.
When I rebooted, my Go programs started to crash again. The crashes seem
random, but on my system Go programs crash with *very* high probability.

(And I applied the patch based on 6.18-rc6.)

  
>
>
>FYI
>David W
>
>
>
>
>
>
>>
>>Side note: I really *detest* that stupid "__HAVE_ARCH_XYZ" pattern. I
>>hate it. Why do people insist on that stupid pattern? We *have* a name
>>already: the name of the thing that the architecture implements. Don't
>>make up a new one with all caps and a __HAVE_ARCH_ prefix. If an
>>architecture implements the feature "xyz", it should just do "define
>>xyz xyz" and be done with it, and then code can test whether it is
>>implemented by just doing "#ifdef xyz".
>>
>>But I did *not* change that stupid existing pattern. I left it alone,
>>and just added the 'S' since now it's multiple pages.  But I really do
>>want to bring this up again, because it's so silly to make up new
>>names to say "I defined that other name". Just *use* the name.
>>
>>If you implement "xyz" as a macro, you're done. And if it's
>>implemented as an inline function, just add the "#define xyz xyz" to
>>show that you did it.
>>
>>Don't make up new names that only make it harder to grep for things,
>>and make things pointlessly have two different names.
>>
>>Please.
>>
>>              Linus

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-18 13:55                           ` David Wang
@ 2025-11-18 14:12                             ` David Hildenbrand (Red Hat)
  2025-11-18 14:33                               ` David Wang
  2025-11-18 14:44                               ` Carlos Llamas
  0 siblings, 2 replies; 39+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-18 14:12 UTC (permalink / raw)
  To: David Wang, Linus Torvalds
  Cc: catalin.marinas, lance.yang, b-padhi, akpm, linux-kernel,
	Jan Polensky

On 18.11.25 14:55, David Wang wrote:
> 
> 
> At 2025-11-18 12:13:15, "David Wang" <00107082@163.com> wrote:
>>
>>
>> At 2025-11-18 09:10:50, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:
>>> On Mon, 17 Nov 2025 at 11:17, David Hildenbrand (Red Hat)
>>> <david@kernel.org> wrote:
>>>>
>>>> So, I briefly tried on x86 with KASAN and the one-liner. I was assuming
>>>> that KASAN would complain because we are clearing the page before doing
>>>> the kasan_unpoison_pages() (IOW, writing to a KASAN-poisoned page).
>>>>
>>>> It didn't trigger, and I assume it is because clear_highpage() on x86
>>>> will not be instrumented by KASAN (my theory).
>>>>
>>>> The comment in kernel_init_pages() indicates that s390x uses memset()
>>>> for that purpose and I would assume that that one would be instrumented.
>>>
>>> So I have thought about this some more, and I am not entirely happy
>>> about any of this, but I think the way forward is to
>>>
>>> (a) make tag_clear_highpage() just do multiple pages in one go (and
>>> rename it as tag_clear_highpage*s*() in the process)
>>>
>>> (b) make it have an actual return value to indicate whether it
>>> initialized things
>>>
>>> which means that the post_alloc_hook() code just becomes
>>>
>>>         if (zero_tags)
>>>                 init = tag_clear_highpages(page, 1 << order);
>>>
>>> and then the generic fallback becomes just
>>>
>>>   static inline bool tag_clear_highpages(struct page *page, int numpages)
>>>   {
>>>          return false;
>>>   }
>>>
>>> which makes this all a complete no-op for architectures that don't do
>>> this memory tagging.
>>>
>>> And the one architecture that *does* do it - arm64 - actually
>>> simplifies too, because now instead of being called in a loop - and
>>> having that
>>>
>>>         if (!system_supports_mte()) {
>>>                  clear_highpage(page);
>>>                  return;
>>>         }
>>>
>>> in every iteration of the loop, it now just gets called *once*, and it
>>> instead just does
>>>
>>>         if (!system_supports_mte())
>>>                 return false;
>>>
>>> and then it does the *clearing* in a loop instead.
>>>
>>> End result: that all looks much saner to me, and should avoid all the
>>> issues with KASAN (well, arm64 currently clearly depends on
>>> mte_zero_clear_page_tags() being assembly code that doesn't trigger
>>> KASAN anyway).
>>>
>>> But maybe it looks saner to me just because I've written that code now.
>>>
>>> Anyway, here's my suggested patch. I still prefer this over having
>>> more config variables and #ifdef's. I'd much rather have code that
>>> just does the right thing and becomes null and void when it's
>>> effectively disabled by not having hardware support.
>>>
>>> Comments?
>>>
>>> This is all entirely untested, but I did build it on both x86-64 and
>>> arm64. So it must be perfect. Right?
>>
>>
>> I tried this patch; my Prometheus service crashes with:
>>         fatal error: acquireSudog: found s.elem != nil in cache
>> It seems some memory is still not properly zeroed (I guess).
>> But this time, my old Go compiler works fine.
> 
> 
> Update: with this patch, my Go programs still crash. It was just that
> the first time I tested the patch, the old Go compiler happened to work. When I rebooted, my
> Go programs started to crash again. The crashes seem random, but on my system,
> Go programs crash with *very* high probability.
> 
> (And I applied the patch based on 6.18-rc6.)

Can you try with

	init = !tag_clear_highpages(page, 1 << order);

instead of

	init = tag_clear_highpages(page, 1 << order);


So when the function returns "false" (we did not clear), we will have to 
initialize.
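
As a rough userspace model of that return-value contract (this is not the
actual mm/page_alloc.c code; every sketch_* name and the page size are
made up purely for illustration), the inverted assignment could look
something like this:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for this model */

/* Hypothetical stand-in for system_supports_mte(); set by the caller. */
static bool sketch_has_mte;

/*
 * Models the contract discussed above: returns true only if the pages
 * were really cleared here, false if nothing was done.
 */
static bool sketch_tag_clear_pages(unsigned char *mem, size_t numpages)
{
	if (!sketch_has_mte)
		return false;
	memset(mem, 0, numpages * SKETCH_PAGE_SIZE);
	return true;
}

/* Models the tail of the allocation hook: zero the pages if still needed. */
static void sketch_post_alloc(unsigned char *mem, size_t numpages,
			      bool init, bool zero_tags)
{
	if (zero_tags)
		/* The '!' matters: "did not clear" means "still must init". */
		init = !sketch_tag_clear_pages(mem, numpages);
	if (init)
		memset(mem, 0, numpages * SKETCH_PAGE_SIZE);
}
```

Without the negation, the "no hardware support" path would report false,
leave init false, and hand out pages that nobody zeroed, which would be
consistent with the crashes reported above.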

-- 
Cheers

David

* Re: Linux 6.18-rc6
  2025-11-18 14:12                             ` David Hildenbrand (Red Hat)
@ 2025-11-18 14:33                               ` David Wang
  2025-11-18 14:44                               ` Carlos Llamas
  1 sibling, 0 replies; 39+ messages in thread
From: David Wang @ 2025-11-18 14:33 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Linus Torvalds, catalin.marinas, lance.yang, b-padhi, akpm,
	linux-kernel, Jan Polensky


At 2025-11-18 22:12:08, "David Hildenbrand (Red Hat)" <david@kernel.org> wrote:
>On 18.11.25 14:55, David Wang wrote:
>> 
>> 
>> At 2025-11-18 12:13:15, "David Wang" <00107082@163.com> wrote:
>>>
>>>
>>> At 2025-11-18 09:10:50, "Linus Torvalds" <torvalds@linux-foundation.org> wrote:
>>>> On Mon, 17 Nov 2025 at 11:17, David Hildenbrand (Red Hat)
>>>> <david@kernel.org> wrote:
>>>>>
>>>>> So, I briefly tried on x86 with KASAN and the one-liner. I was assuming
>>>>> that KASAN would complain because we are clearing the page before doing
>>>>> the kasan_unpoison_pages() (IOW, writing to a KASAN-poisoned page).
>>>>>
>>>>> It didn't trigger, and I assume it is because clear_highpage() on x86
>>>>> will not be instrumented by KASAN (my theory).
>>>>>
>>>>> The comment in kernel_init_pages() indicates that s390x uses memset()
>>>>> for that purpose and I would assume that that one would be instrumented.
>>>>
>>>> So I have thought about this some more, and I am not entirely happy
>>>> about any of this, but I think the way forward is to
>>>>
>>>> (a) make tag_clear_highpage() just do multiple pages in one go (and
>>>> rename it as tag_clear_highpage*s*() in the process)
>>>>
>>>> (b) make it have an actual return value to indicate whether it
>>>> initialized things
>>>>
>>>> which means that the post_alloc_hook() code just becomes
>>>>
>>>>         if (zero_tags)
>>>>                 init = tag_clear_highpages(page, 1 << order);
>>>>
>>>> and then the generic fallback becomes just
>>>>
>>>>   static inline bool tag_clear_highpages(struct page *page, int numpages)
>>>>   {
>>>>          return false;
>>>>   }
>>>>
>>>> which makes this all a complete no-op for architectures that don't do
>>>> this memory tagging.
>>>>
>>>> And the one architecture that *does* do it - arm64 - actually
>>>> simplifies too, because now instead of being called in a loop - and
>>>> having that
>>>>
>>>>         if (!system_supports_mte()) {
>>>>                  clear_highpage(page);
>>>>                  return;
>>>>         }
>>>>
>>>> in every iteration of the loop, it now just gets called *once*, and it
>>>> instead just does
>>>>
>>>>         if (!system_supports_mte())
>>>>                 return false;
>>>>
>>>> and then it does the *clearing* in a loop instead.
>>>>
>>>> End result: that all looks much saner to me, and should avoid all the
>>>> issues with KASAN (well, arm64 currently clearly depends on
>>>> mte_zero_clear_page_tags() being assembly code that doesn't trigger
>>>> KASAN anyway).
>>>>
>>>> But maybe it looks saner to me just because I've written that code now.
>>>>
>>>> Anyway, here's my suggested patch. I still prefer this over having
>>>> more config variables and #ifdef's. I'd much rather have code that
>>>> just does the right thing and becomes null and void when it's
>>>> effectively disabled by not having hardware support.
>>>>
>>>> Comments?
>>>>
>>>> This is all entirely untested, but I did build it on both x86-64 and
>>>> arm64. So it must be perfect. Right?
>>>
>>>
>>> I tried this patch; my Prometheus service crashes with:
>>>         fatal error: acquireSudog: found s.elem != nil in cache
>>> It seems some memory is still not properly zeroed (I guess).
>>> But this time, my old Go compiler works fine.
>> 
>> 
>> Update: with this patch, my Go programs still crash. It was just that
>> the first time I tested the patch, the old Go compiler happened to work. When I rebooted, my
>> Go programs started to crash again. The crashes seem random, but on my system,
>> Go programs crash with *very* high probability.
>> 
>> (And I applied the patch based on 6.18-rc6.)
>
>Can you try with
>
>	init = !tag_clear_highpages(page, 1 << order);
>
>instead of
>
>	init = tag_clear_highpages(page, 1 << order);
>
>
>So when the function returns "false" (we did not clear), we will have to 
>initialize.

Oh, yes, this makes sense.
With this fix, my crashes are gone again. (I tested it across several rounds of reboots.)


Thanks
David Wang

>
>-- 
>Cheers
>
>David

* Re: Linux 6.18-rc6
  2025-11-18 14:12                             ` David Hildenbrand (Red Hat)
  2025-11-18 14:33                               ` David Wang
@ 2025-11-18 14:44                               ` Carlos Llamas
  2025-11-18 14:51                                 ` David Hildenbrand (Red Hat)
  1 sibling, 1 reply; 39+ messages in thread
From: Carlos Llamas @ 2025-11-18 14:44 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: David Wang, Linus Torvalds, catalin.marinas, lance.yang, b-padhi,
	akpm, linux-kernel, Jan Polensky

On Tue, Nov 18, 2025 at 03:12:08PM +0100, David Hildenbrand (Red Hat) wrote:
> Can you try with
> 
> 	init = !tag_clear_highpages(page, 1 << order);

Ha, right! That was probably a typo. I just tried Linus' patch again
with this amend and that also fixes the kselftest issue.

--
Carlos Llamas

* Re: Linux 6.18-rc6
  2025-11-18 14:44                               ` Carlos Llamas
@ 2025-11-18 14:51                                 ` David Hildenbrand (Red Hat)
  2025-11-18 14:53                                   ` Carlos Llamas
  2025-11-18 15:09                                   ` David Wang
  0 siblings, 2 replies; 39+ messages in thread
From: David Hildenbrand (Red Hat) @ 2025-11-18 14:51 UTC (permalink / raw)
  To: Carlos Llamas
  Cc: David Wang, Linus Torvalds, catalin.marinas, lance.yang, b-padhi,
	akpm, linux-kernel, Jan Polensky

On 18.11.25 15:44, Carlos Llamas wrote:
> On Tue, Nov 18, 2025 at 03:12:08PM +0100, David Hildenbrand (Red Hat) wrote:
>> Can you try with
>>
>> 	init = !tag_clear_highpages(page, 1 << order);
> 
> Ha, right! That was probably a typo. I just tried Linus' patch again
> with this amend and that also fixes the kselftest issue.

Thanks Carlos and David! If you want to provide an official Tested-by:, 
we'll be happy to apply it :)

-- 
Cheers

David

* Re: Linux 6.18-rc6
  2025-11-18 14:51                                 ` David Hildenbrand (Red Hat)
@ 2025-11-18 14:53                                   ` Carlos Llamas
  2025-11-18 15:09                                   ` David Wang
  1 sibling, 0 replies; 39+ messages in thread
From: Carlos Llamas @ 2025-11-18 14:53 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: David Wang, Linus Torvalds, catalin.marinas, lance.yang, b-padhi,
	akpm, linux-kernel, Jan Polensky

On Tue, Nov 18, 2025 at 03:51:07PM +0100, David Hildenbrand (Red Hat) wrote:
> On 18.11.25 15:44, Carlos Llamas wrote:
> > On Tue, Nov 18, 2025 at 03:12:08PM +0100, David Hildenbrand (Red Hat) wrote:
> > > Can you try with
> > > 
> > > 	init = !tag_clear_highpages(page, 1 << order);
> > 
> > Ha, right! That was probably a typo. I just tried Linus' patch again
> > with this amend and that also fixes the kselftest issue.
> 
> Thanks Carlos and David! If you want to provide an official Tested-by:,
> we'll be happy to apply it :)

Thanks,

Tested-by: Carlos Llamas <cmllamas@google.com>

* Re: Linux 6.18-rc6
  2025-11-18 14:51                                 ` David Hildenbrand (Red Hat)
  2025-11-18 14:53                                   ` Carlos Llamas
@ 2025-11-18 15:09                                   ` David Wang
  1 sibling, 0 replies; 39+ messages in thread
From: David Wang @ 2025-11-18 15:09 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: Carlos Llamas, Linus Torvalds, catalin.marinas, lance.yang,
	b-padhi, akpm, linux-kernel, Jan Polensky

At 2025-11-18 22:51:07, "David Hildenbrand (Red Hat)" <david@kernel.org> wrote:
>On 18.11.25 15:44, Carlos Llamas wrote:
>> On Tue, Nov 18, 2025 at 03:12:08PM +0100, David Hildenbrand (Red Hat) wrote:
>>> Can you try with
>>>
>>> 	init = !tag_clear_highpages(page, 1 << order);
>> 
>> Ha, right! That was probably a typo. I just tried Linus' patch again
>> with this amend and that also fixes the kselftest issue.
>
>Thanks Carlos and David! If you want to provide an official Tested-by:, 
>we'll be happy to apply it :)

Tested-by: David Wang <00107082@163.com>

Thanks~
>
>-- 
>Cheers
>
>David

* Re: Linux 6.18-rc6
  2025-11-18  7:28                         ` David Hildenbrand (Red Hat)
@ 2025-11-18 16:49                           ` Linus Torvalds
  2025-11-19 15:42                             ` Catalin Marinas
  0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2025-11-18 16:49 UTC (permalink / raw)
  To: David Hildenbrand (Red Hat)
  Cc: David Wang, catalin.marinas, lance.yang, b-padhi, akpm,
	linux-kernel, Jan Polensky

On Mon, 17 Nov 2025 at 23:28, David Hildenbrand (Red Hat)
<david@kernel.org> wrote:
>
> Do you want to quickly send that patch with linux-mm on CC or do you
> just want to commit it? If you're busy I can quickly send it around.

I applied it (with your fix for the silly inverted assignment to
'init') directly, since I wanted to get this fixed quickly as we're
fairly late in the release cycle.

But if there is some thread on linux-mm that should be notified that
you know of - this is the only one I'm personally aware of - please do
give people there a heads-up.

Thanks,
               Linus

* Re: Linux 6.18-rc6
  2025-11-16 22:42 Linux 6.18-rc6 Linus Torvalds
  2025-11-17  8:20 ` David Wang
  2025-11-17 18:13 ` Guenter Roeck
@ 2025-11-18 17:23 ` Stephanie Gawroriski
  2025-11-18 18:01   ` Linus Torvalds
  2 siblings, 1 reply; 39+ messages in thread
From: Stephanie Gawroriski @ 2025-11-18 17:23 UTC (permalink / raw)
  To: Linux Kernel Mailing List, Linus Torvalds

Hi!

Since rc6 I have been getting crashes in JetBrains' Rider backend for CLion (the
backend is a C# application), which reports the following in its logs:

```
Fatal error. System.AccessViolationException: Attempted to read or write 
protected memory. This is often an indication that other memory is corrupt.
```

Unfortunately, I do not see any traces in `dmesg` at all and there is not much
else useful there. Since this is a closed-source application, I cannot do much
to debug it. It manifests as an analysis pass that never completes.
Unlike David Wang, I am on an Intel Core i7-1370P on a ThinkPad X1 Yoga
Gen 8 (with TME) and I have not experienced any crashes with GDB or Go in any
way, nor other program issues at all. I did not experience any issues with
rc3 (which is what I was previously using, as I check for updates every so
often).

Unrelated to this, after suspending and docking my laptop to my dock, I get 
this now:

[50598.774359] ------------[ cut here ]------------
[50598.774371] kernfs: can not remove 'typec', no directory
[50598.774389] WARNING: CPU: 2 PID: 48932 at fs/kernfs/dir.c:1706 kernfs_remove_by_name_ns+0xcf/0xe0
[50598.774404] Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi 
mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag 
af_packet_diag netlink_diag xpad(O) xt_conntrack xt_MASQUERADE xt_set ip_set 
xt_addrtype xfrm_user xfrm_algo snd_seq_dummy snd_hrtimer snd_seq 
snd_seq_device nls_utf8 cifs cifs_md4 dns_resolver netfs iscsi_tcp nft_masq 
libiscsi_tcp libiscsi scsi_transport_iscsi target_core_user uio 
target_core_pscsi target_core_file target_core_iblock iscsi_target_mod 
target_core_mod xt_tcpudp nft_compat x_tables nft_fib_inet nft_fib_ipv4 
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject 
nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfcomm 
nf_tables qrtr at24 ee1004 bnep snd_ctl_led snd_soc_skl_hda_dsp 
snd_soc_intel_sof_board_helpers snd_soc_intel_hda_dsp_common snd_sof_probes 
vhba(O) snd_hda_codec_intelhdmi br_netfilter snd_hda_codec_alc269 bridge 
snd_hda_scodec_component stp snd_hda_codec_realtek_lib snd_soc_dmic llc 
snd_hda_codec_generic
[50598.774515]  snd_hda_intel pkcs8_key_parser snd_sof_pci_intel_tgl 
snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel 
soundwire_generic_allocation snd_sof_intel_hda_sdw_bpt 
snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink 
snd_sof_intel_hda snd_hda_codec_hdmi soundwire_cadence snd_sof_pci 
snd_sof_xtensa_dsp snd_sof snd_soc_avs snd_sof_utils snd_soc_hda_codec 
snd_soc_acpi_intel_match snd_hda_ext_core snd_soc_acpi_intel_sdca_quirks 
snd_hda_codec uvcvideo nls_ascii snd_soc_acpi snd_hda_core videobuf2_vmalloc 
nls_cp437 crc8 snd_intel_dspcfg btusb uvc intel_uncore_frequency soundwire_bus 
snd_intel_sdw_acpi btmtk hid_sensor_accel_3d hid_sensor_custom_intel_hinge 
hid_sensor_gyro_3d videobuf2_memops intel_uncore_frequency_common snd_soc_sdca 
snd_hwdep btrtl hid_sensor_trigger videobuf2_v4l2 coretemp snd_soc_core btbcm 
intel_pmc_core hid_sensor_iio_common videodev rapl mei_hdcp mei_pxp 
snd_compress btintel pmt_telemetry industrialio_triggered_buffer 
videobuf2_common intel_cstate spi_nor iosm
[50598.774603]  snd_pcm_dmaengine mei_me pmt_discovery bluetooth kfifo_buf mc 
intel_uncore pcspkr mtd wwan snd_pcm mei pmt_class button igen6_edac 
mei_vsc_hw soc_button_array ac acpi_pad industrialio intel_pmc_ssram_telemetry 
acpi_tad snd_timer joydev evdev snd soundcore hid_xpadneo(O) ff_memless 
parport_pc ppdev lp parport efi_pstore nvme_fabrics loop nfnetlink zram 
842_decompress 842_compress autofs4 hid_sensor_custom hid_multitouch psmouse 
i2c_i801 hid_sensor_hub i2c_hid_acpi serio_raw i2c_smbus hid_generic i2c_hid
[50598.774677] CPU: 2 UID: 0 PID: 48932 Comm: kworker/2:1 Tainted: G           O        6.18.0-rc6 #5 PREEMPT(full)
[50598.774688] Tainted: [O]=OOT_MODULE
[50598.774691] Hardware name: LENOVO 21HQCTO1WW/21HQCTO1WW, BIOS N3XET62W (1.37 ) 07/28/2025
[50598.774696] Workqueue: events ucsi_handle_connector_change
[50598.774708] RIP: 0010:kernfs_remove_by_name_ns+0xcf/0xe0
[50598.774715] Code: 5b 5d 41 5c 41 5d c3 cc cc cc cc 48 89 ef e8 98 1a c0 ff b8 fe ff ff ff eb e6 0f 0b eb 9f 48 c7 c7 b0 5e dc 9d e8 21 42 b7 ff <0f> 0b eb e5 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
[50598.774721] RSP: 0018:ffffcf1ecce83db8 EFLAGS: 00010282
[50598.774727] RAX: 0000000000000000 RBX: ffff8c645507d400 RCX: 0000000000000027
[50598.774732] RDX: ffff8c709f298a48 RSI: 0000000000000001 RDI: ffff8c709f298a40
[50598.774736] RBP: ffff8c61825ad008 R08: 0000000000000000 R09: ffff8c70df772fe8
[50598.774740] R10: ffff8c70df742fa8 R11: 0000000000000003 R12: ffffffff9dfaadb1
[50598.774743] R13: ffff8c61825ad350 R14: 0000000000000001 R15: ffff8c6182582520
[50598.774746] FS:  0000000000000000(0000) GS:ffff8c70ffd3d000(0000) knlGS:0000000000000000
[50598.774751] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50598.774755] CR2: 00007f9af0c6f000 CR3: 000000099464c003 CR4: 0000000000f72ef0
[50598.774760] PKRU: 55555554
[50598.774763] Call Trace:
[50598.774768]  <TASK>
[50598.774775]  typec_unregister_partner+0x6c/0x110
[50598.774787]  ucsi_unregister_partner+0x103/0x140
[50598.774794]  ucsi_handle_connector_change+0x34d/0x3e0
[50598.774803]  process_one_work+0x18b/0x340
[50598.774811]  worker_thread+0x256/0x3a0
[50598.774818]  ? __pfx_worker_thread+0x10/0x10
[50598.774824]  kthread+0xfc/0x240
[50598.774833]  ? __pfx_kthread+0x10/0x10
[50598.774841]  ? __pfx_kthread+0x10/0x10
[50598.774848]  ret_from_fork+0x1c9/0x200
[50598.774858]  ? __pfx_kthread+0x10/0x10
[50598.774865]  ret_from_fork_asm+0x1a/0x30
[50598.774877]  </TASK>
[50598.774880] ---[ end trace 0000000000000000 ]---
[50598.774885] ------------[ cut here ]------------
[50598.774888] kernfs: can not remove 'typec', no directory
[50598.774899] WARNING: CPU: 2 PID: 48932 at fs/kernfs/dir.c:1706 kernfs_remove_by_name_ns+0xcf/0xe0
[50598.774907] Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi 
mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag 
af_packet_diag netlink_diag xpad(O) xt_conntrack xt_MASQUERADE xt_set ip_set 
xt_addrtype xfrm_user xfrm_algo snd_seq_dummy snd_hrtimer snd_seq 
snd_seq_device nls_utf8 cifs cifs_md4 dns_resolver netfs iscsi_tcp nft_masq 
libiscsi_tcp libiscsi scsi_transport_iscsi target_core_user uio 
target_core_pscsi target_core_file target_core_iblock iscsi_target_mod 
target_core_mod xt_tcpudp nft_compat x_tables nft_fib_inet nft_fib_ipv4 
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject 
nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfcomm 
nf_tables qrtr at24 ee1004 bnep snd_ctl_led snd_soc_skl_hda_dsp 
snd_soc_intel_sof_board_helpers snd_soc_intel_hda_dsp_common snd_sof_probes 
vhba(O) snd_hda_codec_intelhdmi br_netfilter snd_hda_codec_alc269 bridge 
snd_hda_scodec_component stp snd_hda_codec_realtek_lib snd_soc_dmic llc 
snd_hda_codec_generic
[50598.774991]  snd_hda_intel pkcs8_key_parser snd_sof_pci_intel_tgl 
snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel 
soundwire_generic_allocation snd_sof_intel_hda_sdw_bpt 
snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink 
snd_sof_intel_hda snd_hda_codec_hdmi soundwire_cadence snd_sof_pci 
snd_sof_xtensa_dsp snd_sof snd_soc_avs snd_sof_utils snd_soc_hda_codec 
snd_soc_acpi_intel_match snd_hda_ext_core snd_soc_acpi_intel_sdca_quirks 
snd_hda_codec uvcvideo nls_ascii snd_soc_acpi snd_hda_core videobuf2_vmalloc 
nls_cp437 crc8 snd_intel_dspcfg btusb uvc intel_uncore_frequency soundwire_bus 
snd_intel_sdw_acpi btmtk hid_sensor_accel_3d hid_sensor_custom_intel_hinge 
hid_sensor_gyro_3d videobuf2_memops intel_uncore_frequency_common snd_soc_sdca 
snd_hwdep btrtl hid_sensor_trigger videobuf2_v4l2 coretemp snd_soc_core btbcm 
intel_pmc_core hid_sensor_iio_common videodev rapl mei_hdcp mei_pxp 
snd_compress btintel pmt_telemetry industrialio_triggered_buffer 
videobuf2_common intel_cstate spi_nor iosm
[50598.775061]  snd_pcm_dmaengine mei_me pmt_discovery bluetooth kfifo_buf mc 
intel_uncore pcspkr mtd wwan snd_pcm mei pmt_class button igen6_edac 
mei_vsc_hw soc_button_array ac acpi_pad industrialio intel_pmc_ssram_telemetry 
acpi_tad snd_timer joydev evdev snd soundcore hid_xpadneo(O) ff_memless 
parport_pc ppdev lp parport efi_pstore nvme_fabrics loop nfnetlink zram 
842_decompress 842_compress autofs4 hid_sensor_custom hid_multitouch psmouse 
i2c_i801 hid_sensor_hub i2c_hid_acpi serio_raw i2c_smbus hid_generic i2c_hid
[50598.775119] CPU: 2 UID: 0 PID: 48932 Comm: kworker/2:1 Tainted: G        W  O        6.18.0-rc6 #5 PREEMPT(full)
[50598.775127] Tainted: [W]=WARN, [O]=OOT_MODULE
[50598.775129] Hardware name: LENOVO 21HQCTO1WW/21HQCTO1WW, BIOS N3XET62W (1.37 ) 07/28/2025
[50598.775132] Workqueue: events ucsi_handle_connector_change
[50598.775141] RIP: 0010:kernfs_remove_by_name_ns+0xcf/0xe0
[50598.775147] Code: 5b 5d 41 5c 41 5d c3 cc cc cc cc 48 89 ef e8 98 1a c0 ff b8 fe ff ff ff eb e6 0f 0b eb 9f 48 c7 c7 b0 5e dc 9d e8 21 42 b7 ff <0f> 0b eb e5 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
[50598.775151] RSP: 0018:ffffcf1ecce83db8 EFLAGS: 00010282
[50598.775156] RAX: 0000000000000000 RBX: ffff8c645507d400 RCX: 0000000000000027
[50598.775159] RDX: ffff8c709f298a48 RSI: 0000000000000001 RDI: ffff8c709f298a40
[50598.775162] RBP: ffff8c61825ad008 R08: 0000000000000000 R09: ffff8c70df772fe8
[50598.775165] R10: ffff8c70df742fa8 R11: 0000000000000003 R12: ffffffff9dfaadb1
[50598.775168] R13: ffff8c61825ad350 R14: 0000000000000001 R15: ffff8c6182582520
[50598.775171] FS:  0000000000000000(0000) GS:ffff8c70ffd3d000(0000) knlGS:0000000000000000
[50598.775175] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[50598.775179] CR2: 00007f9af0c6f000 CR3: 000000099464c003 CR4: 0000000000f72ef0
[50598.775182] PKRU: 55555554
[50598.775185] Call Trace:
[50598.775187]  <TASK>
[50598.775190]  typec_unregister_partner+0xbf/0x110
[50598.775198]  ucsi_unregister_partner+0x103/0x140
[50598.775205]  ucsi_handle_connector_change+0x34d/0x3e0
[50598.775214]  process_one_work+0x18b/0x340
[50598.775220]  worker_thread+0x256/0x3a0
[50598.775226]  ? __pfx_worker_thread+0x10/0x10
[50598.775232]  kthread+0xfc/0x240
[50598.775239]  ? __pfx_kthread+0x10/0x10
[50598.775247]  ? __pfx_kthread+0x10/0x10
[50598.775254]  ret_from_fork+0x1c9/0x200
[50598.775261]  ? __pfx_kthread+0x10/0x10
[50598.775268]  ret_from_fork_asm+0x1a/0x30
[50598.775280]  </TASK>
[50598.775282] ---[ end trace 0000000000000000 ]---

* Re: Linux 6.18-rc6
  2025-11-18 17:23 ` Stephanie Gawroriski
@ 2025-11-18 18:01   ` Linus Torvalds
  2025-11-18 20:18     ` Stephanie Gawroriski
  0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2025-11-18 18:01 UTC (permalink / raw)
  To: Stephanie Gawroriski
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Heikki Krogerus

On Tue, 18 Nov 2025 at 09:23, Stephanie Gawroriski
<xerthesquirrel@gmail.com> wrote:
>
> Since rc6 I have been getting crashes in JetBrains' Rider backend for CLion (the
> backend is a C# application), which reports the following in its logs:

Yeah, that looks like the same bug that David Wang and some other
people reported.

Can you check current top-of-tree, which includes commit
5bebe8de1926 ("mm/huge_memory: Fix initialization of huge zero folio")
and hopefully fixes this for you?

> Unfortunately, I do not see any traces in `dmesg` at all and there is not much
> else useful there.

Yeah, that bug would only confuse user space due to an uninitialized
memory area, the kernel itself would be unaffected and wouldn't see
anything wrong (apart from then possibly killing misbehaving apps due
to the confusion).

> Unlike David Wang, I am on an Intel Core i7-1370P on a ThinkPad X1 Yoga
> Gen 8 (with TME) and I have not experienced any crashes with GDB or Go in any
> way, nor other program issues at all.

It's going to be a bit random, because that huge zero-page ends up
being used by some loads but not all. I think David Wang also reported
that it only happened for particular versions for him.

> Unrelated to this, after suspending and docking my laptop to my dock, I get
> this now:
>
> [50598.774359] ------------[ cut here ]------------
> [50598.774371] kernfs: can not remove 'typec', no directory
> [50598.774389] WARNING: CPU: 2 PID: 48932 at fs/kernfs/dir.c:1706 kernfs_remove_by_name_ns+0xcf/0xe0

Ok, that is indeed unrelated, and should be mostly harmless apart from
the scary message. Do you see any other effects than the noise in the
logs?

Somebody is trying to remove a sysfs entry that has no parent,
presumably because it was never registered in the first place.

At a guess, it's some error handling cleanup that is done
unconditionally, unregistering an entry even when the original
registration failed. Or unregistering twice.
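
To make that guess concrete, here is a minimal userspace sketch of the
pattern (purely illustrative: the sketch_* names are hypothetical and
this is not the kernfs or typec code). Removal of an entry that was
never registered, or was already removed, reports -ENOENT instead of
succeeding:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical object that may or may not own a sysfs-like entry. */
struct sketch_entry {
	bool registered;
};

/* Registration can fail; on failure there is nothing to undo later. */
static int sketch_register(struct sketch_entry *e, bool simulate_failure)
{
	if (simulate_failure)
		return -ENOMEM;
	e->registered = true;
	return 0;
}

/* Returns 0 on removal, -ENOENT when there is nothing to remove. */
static int sketch_unregister(struct sketch_entry *e)
{
	if (!e->registered)
		return -ENOENT;	/* the situation that produces the warning */
	e->registered = false;
	return 0;
}
```

A cleanup path that calls the unregister step unconditionally would hit
the -ENOENT branch both after a failed registration and on a second
unregister.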

Looks like ucsi / typec handling:

> [50598.774763] Call Trace:
> [50598.774775]  typec_unregister_partner+0x6c/0x110
> [50598.774787]  ucsi_unregister_partner+0x103/0x140
> [50598.774794]  ucsi_handle_connector_change+0x34d/0x3e0
> [50598.774803]  process_one_work+0x18b/0x340
> [50598.774811]  worker_thread+0x256/0x3a0
> [50598.774824]  kthread+0xfc/0x240

but that said, I don't see why this would be new behavior. I don't see
anything that has changed in this area lately in the typec class
handling.

In fact, looking around, I see much older reports that look a bit like
this, so I don't think it's new.

Adding Greg and Heikki who might know what is going on. See

   https://lore.kernel.org/all/2203148.PYKUYFuaPT@arborvitaetree/

for original report.

                Linus

* Re: Linux 6.18-rc6
  2025-11-18 18:01   ` Linus Torvalds
@ 2025-11-18 20:18     ` Stephanie Gawroriski
  2025-11-19  9:08       ` Heikki Krogerus
  0 siblings, 1 reply; 39+ messages in thread
From: Stephanie Gawroriski @ 2025-11-18 20:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Greg Kroah-Hartman, Heikki Krogerus

Hi!

On Tuesday, 18 November 2025 13:01:54 EST Linus Torvalds wrote:
> On Tue, 18 Nov 2025 at 09:23, Stephanie Gawroriski
> <xerthesquirrel@gmail.com> wrote:
> >
> > Since rc6 I have been getting crashes in JetBrains' Rider backend for CLion (the
> > backend is a C# application), which reports the following in its logs:
> 
> Yeah, that looks like the same bug that David Wang and some other
> people reported.
> 
> Can you check current top-of-tree, which includes commit
> 5bebe8de1926 ("mm/huge_memory: Fix initialization of huge zero folio")
> and hopefully fixes this for you?

I have been running 5bebe8de1926 for a bit, and so far no issues! I have tried a
few suspends and re-runs to simulate my normal workflow, and it seems to be stable!

> > Unrelated to this, after suspending and docking my laptop to my dock, I get
> > this now:
> > 
> > [50598.774359] ------------[ cut here ]------------
> > [50598.774371] kernfs: can not remove 'typec', no directory
> > [50598.774389] WARNING: CPU: 2 PID: 48932 at fs/kernfs/dir.c:1706 kernfs_remove_by_name_ns+0xcf/0xe0
> 
> Ok, that is indeed unrelated, and should be mostly harmless apart from
> the scary message. Do you see any other effects than the noise in the
> logs?

Not that I know of.

> Somebody is trying to remove a sysfs entry that has no parent,
> presumably because it was never registered in the first place.
> 
> At a guess, it's some error handling cleanup that is done
> unconditionally, unregistering an entry even when the original
> registration failed. Or unregistering twice.
> 
> Looks like ucsi / typec handling:
> > [50598.774763] Call Trace:
> > [50598.774775]  typec_unregister_partner+0x6c/0x110
> > [50598.774787]  ucsi_unregister_partner+0x103/0x140
> > [50598.774794]  ucsi_handle_connector_change+0x34d/0x3e0
> > [50598.774803]  process_one_work+0x18b/0x340
> > [50598.774811]  worker_thread+0x256/0x3a0
> > [50598.774824]  kthread+0xfc/0x240
> 
> but that said, I don't see why this would be new behavior. I don't see
> anything that has changed in this area lately in the typec class
> handling.
> 
> In fact, looking around, I see much older reports that look a bit like
> this, so I don't think it's new.
> 
> Adding Greg and Heikki who might know what is going on. See
> 
>    https://lore.kernel.org/all/2203148.PYKUYFuaPT@arborvitaetree/
> 
> for original report.
> 
>                 Linus

I really only noticed it when I was looking in dmesg very recently to see if
there were any application crashes due to the huge_memory issue. I do look in
dmesg often enough, but mostly to see if devices are being recognized properly
or for other events, such as when the embedded JVM I am working on triggers split locks.

* Re: Linux 6.18-rc6
  2025-11-18 20:18     ` Stephanie Gawroriski
@ 2025-11-19  9:08       ` Heikki Krogerus
  2025-11-19 14:18         ` Stephanie Gawroriski
  2025-11-19 15:04         ` Stephanie Gawroriski
  0 siblings, 2 replies; 39+ messages in thread
From: Heikki Krogerus @ 2025-11-19  9:08 UTC (permalink / raw)
  To: Stephanie Gawroriski
  Cc: Linus Torvalds, Linux Kernel Mailing List, Greg Kroah-Hartman

Hi

> > > Unrelated to this, after suspending and docking my laptop to my dock, I get
> > > this now:
> > > 
> > > [50598.774359] ------------[ cut here ]------------
> > > [50598.774371] kernfs: can not remove 'typec', no directory
> > > [50598.774389] WARNING: CPU: 2 PID: 48932 at fs/kernfs/dir.c:1706 kernfs_remove_by_name_ns+0xcf/0xe0
> > 
> > Ok, that is indeed unrelated, and should be mostly harmless apart from
> > the scary message. Do you see any other effects than the noise in the
> > logs?
> 
> Not that I know of.
> 
> > Somebody is trying to remove a sysfs entry that has no parent,
> > presumably because it was never registered in the first place.
> > 
> > At a guess, it's some error handling cleanup that is done
> > unconditionally, unregistering an entry even when the original
> > registration failed. Or unregistering twice.
> > 
> > Looks like ucsi / typec handling:
> > > [50598.774763] Call Trace:
> > > [50598.774775]  typec_unregister_partner+0x6c/0x110
> > > [50598.774787]  ucsi_unregister_partner+0x103/0x140
> > > [50598.774794]  ucsi_handle_connector_change+0x34d/0x3e0
> > > [50598.774803]  process_one_work+0x18b/0x340
> > > [50598.774811]  worker_thread+0x256/0x3a0
> > > [50598.774824]  kthread+0xfc/0x240
> > 
> > but that said, I don't see why this would be new behavior. I don't see
> > anything that has changed in this area lately in the typec class
> > handling.
> > 
> > In fact, looking around, I see much older reports that look a bit like
> > this, so I don't think it's new.
> > 
> > Adding Greg and Heikki who might know what is going on. See
> > 
> >    https://lore.kernel.org/all/2203148.PYKUYFuaPT@arborvitaetree/
> > 
> > for original report.
> > 
> >                 Linus
> 
> I really only noticed it when I was looking in dmesg very recently to see if
> there were any application crashes due to the huge_memory issue. I do look in
> dmesg often enough, but mostly to see if devices are being recognized properly
> or for other events, such as when the embedded JVM I am working on triggers split locks.

Thanks for the report. It looks like the code does not increment the
reference count of the USB device that is linked to the typec partner.

Is it possible for you to test this?

diff --git a/drivers/usb/typec/class.c b/drivers/usb/typec/class.c
index 9b2647cb199b..4ace92af9856 100644
--- a/drivers/usb/typec/class.c
+++ b/drivers/usb/typec/class.c
@@ -805,6 +805,8 @@ static void typec_partner_link_device(struct typec_partner *partner, struct devi
                return;
        }
 
+       get_device(dev);
+
        if (partner->attach)
                partner->attach(partner, dev);
 }
@@ -816,6 +818,8 @@ static void typec_partner_unlink_device(struct typec_partner *partner, struct de
 
        if (partner->deattach)
                partner->deattach(partner, dev);
+
+       put_device(dev);
 }
 
 /**

-- 
heikki

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-19  9:08       ` Heikki Krogerus
@ 2025-11-19 14:18         ` Stephanie Gawroriski
  2025-11-19 15:04         ` Stephanie Gawroriski
  1 sibling, 0 replies; 39+ messages in thread
From: Stephanie Gawroriski @ 2025-11-19 14:18 UTC (permalink / raw)
  To: Heikki Krogerus
  Cc: Linus Torvalds, Linux Kernel Mailing List, Greg Kroah-Hartman

On Wednesday, 19 November 2025 04:08:35 EST Heikki Krogerus wrote:
> Hi
> 
> > > > Unrelated to this, after suspending and docking my laptop to my dock,
> > > > I
> > > > get
> > > > this now:
> > > > 
> > > > [50598.774359] ------------[ cut here ]------------
> > > > [50598.774371] kernfs: can not remove 'typec', no directory
> > > > [50598.774389] WARNING: CPU: 2 PID: 48932 at fs/kernfs/dir.c:1706
> > > > kernfs_remove_by_name_ns+0xcf/0xe0
> > > 
> > > Ok, that is indeed unrelated, and should be mostly harmless apart from
> > > the scary message. Do you see any other effects than the noise in the
> > > logs?
> > 
> > Not that I know of.
> > 
> > > Somebody is trying to remove a sysfs entry that has no parent,
> > > presumably because it was never registered in the first place.
> > > 
> > > At a guess, it's some error handling cleanup that is done
> > > unconditionally, unregistering an entry even when the original
> > > registration failed. Or unregistering twice.
> > > 
> > > Looks like ucsi / typec handling:
> > > > [50598.774763] Call Trace:
> > > > [50598.774775]  typec_unregister_partner+0x6c/0x110
> > > > [50598.774787]  ucsi_unregister_partner+0x103/0x140
> > > > [50598.774794]  ucsi_handle_connector_change+0x34d/0x3e0
> > > > [50598.774803]  process_one_work+0x18b/0x340
> > > > [50598.774811]  worker_thread+0x256/0x3a0
> > > > [50598.774824]  kthread+0xfc/0x240
> > > 
> > > but that said, I don't see why this would be new behavior. I don't see
> > > anything that has changed in this area lately in the typec class
> > > handling.
> > > 
> > > In fact, looking around, I see much older reports that look a bit like
> > > this, so I don't think it's new.
> > > 
> > > Adding Greg and Heikki who might know what is going on. See
> > > 
> > >    https://lore.kernel.org/all/2203148.PYKUYFuaPT@arborvitaetree/
> > > 
> > > for original report.
> > > 
> > >                 Linus
> > 
> > I really only noticed it when I was looking in dmesg very recently to see
> > if there were any application crashes due to huge_memory issue. I do look
> > in dmesg often enough, but mostly to see if devices are being recognized
> > properly or other events such as when the embedded JVM I am working on
> > split locks.
> Thanks for the report. It looks like the code does not increment the
> reference count of the USB device that is linked to the typec partner.
> 
> Is it possible for you to test this?
> 
> diff --git a/drivers/usb/typec/class.c b/drivers/usb/typec/class.c
> index 9b2647cb199b..4ace92af9856 100644
> --- a/drivers/usb/typec/class.c
> +++ b/drivers/usb/typec/class.c
> @@ -805,6 +805,8 @@ static void typec_partner_link_device(struct typec_partner *partner, struct devi
>                 return;
>         }
> 
> +       get_device(dev);
> +
>         if (partner->attach)
>                 partner->attach(partner, dev);
>  }
> @@ -816,6 +818,8 @@ static void typec_partner_unlink_device(struct typec_partner *partner, struct de
> 
>         if (partner->deattach)
>                 partner->deattach(partner, dev);
> +
> +       put_device(dev);
>  }
> 
>  /**

I can give it a try! I do notice the trace in dmesg, but it does not always 
appear when docking/undocking, so I am not sure under which conditions it 
occurs.




^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-19  9:08       ` Heikki Krogerus
  2025-11-19 14:18         ` Stephanie Gawroriski
@ 2025-11-19 15:04         ` Stephanie Gawroriski
  2025-11-24  9:50           ` Heikki Krogerus
  1 sibling, 1 reply; 39+ messages in thread
From: Stephanie Gawroriski @ 2025-11-19 15:04 UTC (permalink / raw)
  To: Heikki Krogerus
  Cc: Linus Torvalds, Linux Kernel Mailing List, Greg Kroah-Hartman

On Wednesday, 19 November 2025 04:08:35 EST Heikki Krogerus wrote:
> Hi
> 
> > > > Unrelated to this, after suspending and docking my laptop to my dock,
> > > > I
> > > > get
> > > > this now:
> > > > 
> > > > [50598.774359] ------------[ cut here ]------------
> > > > [50598.774371] kernfs: can not remove 'typec', no directory
> > > > [50598.774389] WARNING: CPU: 2 PID: 48932 at fs/kernfs/dir.c:1706
> > > > kernfs_remove_by_name_ns+0xcf/0xe0
> > > 
> > > Ok, that is indeed unrelated, and should be mostly harmless apart from
> > > the scary message. Do you see any other effects than the noise in the
> > > logs?
> > 
> > Not that I know of.
> > 
> > > Somebody is trying to remove a sysfs entry that has no parent,
> > > presumably because it was never registered in the first place.
> > > 
> > > At a guess, it's some error handling cleanup that is done
> > > unconditionally, unregistering an entry even when the original
> > > registration failed. Or unregistering twice.
> > > 
> > > Looks like ucsi / typec handling:
> > > > [50598.774763] Call Trace:
> > > > [50598.774775]  typec_unregister_partner+0x6c/0x110
> > > > [50598.774787]  ucsi_unregister_partner+0x103/0x140
> > > > [50598.774794]  ucsi_handle_connector_change+0x34d/0x3e0
> > > > [50598.774803]  process_one_work+0x18b/0x340
> > > > [50598.774811]  worker_thread+0x256/0x3a0
> > > > [50598.774824]  kthread+0xfc/0x240
> > > 
> > > but that said, I don't see why this would be new behavior. I don't see
> > > anything that has changed in this area lately in the typec class
> > > handling.
> > > 
> > > In fact, looking around, I see much older reports that look a bit like
> > > this, so I don't think it's new.
> > > 
> > > Adding Greg and Heikki who might know what is going on. See
> > > 
> > >    https://lore.kernel.org/all/2203148.PYKUYFuaPT@arborvitaetree/
> > > 
> > > for original report.
> > > 
> > >                 Linus
> > 
> > I really only noticed it when I was looking in dmesg very recently to see
> > if there were any application crashes due to huge_memory issue. I do look
> > in dmesg often enough, but mostly to see if devices are being recognized
> > properly or other events such as when the embedded JVM I am working on
> > split locks.
> Thanks for the report. It looks like the code does not increment the
> reference count of the USB device that is linked to the typec partner.
> 
> Is it possible for you to test this?
> 
> diff --git a/drivers/usb/typec/class.c b/drivers/usb/typec/class.c
> index 9b2647cb199b..4ace92af9856 100644
> --- a/drivers/usb/typec/class.c
> +++ b/drivers/usb/typec/class.c
> @@ -805,6 +805,8 @@ static void typec_partner_link_device(struct typec_partner *partner, struct devi
>                 return;
>         }
> 
> +       get_device(dev);
> +
>         if (partner->attach)
>                 partner->attach(partner, dev);
>  }
> @@ -816,6 +818,8 @@ static void typec_partner_unlink_device(struct typec_partner *partner, struct de
> 
>         if (partner->deattach)
>                 partner->deattach(partner, dev);
> +
> +       put_device(dev);
>  }
> 
>  /**

Okay, so I figured out that it does appear if I boot already connected to my 
dock and then suspend the system. Unfortunately, the patch does not cause the 
trace to go away. I have put extra context around the trace in case it is 
useful. This is a guess, but could the `typec` directory already have been 
removed by all the disconnects that happen before the suspend?

[  197.556838] usb 3-3: USB disconnect, device number 5
[  197.556858] usb 3-3.1: USB disconnect, device number 6
[  197.557394] usb 3-3.4: USB disconnect, device number 7
[  197.557402] usb 3-3.4.1: USB disconnect, device number 9
[  197.557407] usb 3-3.4.1.1: USB disconnect, device number 10
[  197.557988] thunderbolt 1-0:1.1: retimer disconnected
[  197.563900] thunderbolt 1-1: device disconnected
[  197.591868] usb 2-3: USB disconnect, device number 2
[  197.591876] usb 2-3.4: USB disconnect, device number 3
[  197.591878] usb 2-3.4.2: USB disconnect, device number 4
[  197.592041] cdc_ncm 2-3.4.2:2.0 enx[REDACTED]: unregister 'cdc_ncm' 
usb-0000:00:0d.0-3.4.2, CDC NCM (NO ZLP)
[  197.645038] usb 3-3.4.1.2: USB disconnect, device number 11
[  197.835260] ------------[ cut here ]------------
[  197.835269] kernfs: can not remove 'typec', no directory
[  197.835284] WARNING: CPU: 6 PID: 3566 at fs/kernfs/dir.c:1706 
kernfs_remove_by_name_ns+0xcf/0xe0
[  197.835294] Modules linked in: xt_conntrack xt_MASQUERADE xt_set ip_set 
xt_addrtype xfrm_user xfrm_algo snd_seq_dummy snd_hrtimer snd_seq nls_utf8 
cifs cifs_md4 dns_resolver netfs iscsi_tcp nft_masq libiscsi_tcp libiscsi 
scsi_transport_iscsi target_core_user uio target_core_pscsi target_core_file 
target_core_iblock iscsi_target_mod target_core_mod xt_tcpudp nft_compat 
x_tables nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet 
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr at24 ee1004 rfcomm 
bnep snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_sof_board_helpers 
snd_soc_intel_hda_dsp_common snd_sof_probes vhba(O) snd_hda_codec_intelhdmi 
br_netfilter snd_hda_codec_alc269 bridge snd_hda_scodec_component stp 
snd_hda_codec_realtek_lib snd_soc_dmic llc snd_hda_codec_generic snd_hda_intel 
pkcs8_key_parser snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl 
snd_sof_intel_hda_generic soundwire_intel soundwire_generic_allocation
[  197.835371]  snd_sof_intel_hda_sdw_bpt snd_sof_intel_hda_common 
snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_hda_codec_hdmi 
soundwire_cadence snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_soc_avs 
snd_sof_utils snd_soc_hda_codec snd_soc_acpi_intel_match snd_hda_ext_core 
snd_soc_acpi_intel_sdca_quirks snd_hda_codec snd_soc_acpi uvcvideo 
snd_hda_core crc8 videobuf2_vmalloc snd_intel_dspcfg snd_usb_audio nls_ascii 
soundwire_bus hid_sensor_gyro_3d hid_sensor_custom_intel_hinge 
hid_sensor_accel_3d btusb uvc intel_uncore_frequency snd_intel_sdw_acpi 
snd_usbmidi_lib nls_cp437 snd_soc_sdca hid_sensor_trigger btmtk 
videobuf2_memops intel_uncore_frequency_common snd_hwdep snd_soc_core 
hid_sensor_iio_common btrtl videobuf2_v4l2 coretemp snd_rawmidi snd_compress 
intel_pmc_core btbcm industrialio_triggered_buffer videodev mei_hdcp mei_pxp 
rapl snd_seq_device snd_pcm_dmaengine pmt_telemetry btintel kfifo_buf 
videobuf2_common spi_nor iosm intel_cstate mei_me snd_pcm pmt_discovery 
industrialio bluetooth mc mtd wwan
[  197.835438]  intel_uncore mei snd_timer pcspkr pmt_class igen6_edac button 
soc_button_array ac intel_pmc_ssram_telemetry acpi_tad acpi_pad snd mei_vsc_hw 
joydev evdev soundcore vboxnetadp(O) vboxnetflt(O) vboxdrv(O) hid_xpadneo(O) 
ff_memless parport_pc ppdev lp parport loop nvme_fabrics efi_pstore nfnetlink 
zram 842_decompress 842_compress autofs4 hid_sensor_custom hid_multitouch 
psmouse i2c_i801 hid_sensor_hub i2c_hid_acpi serio_raw i2c_smbus hid_generic 
i2c_hid
[  197.835483] CPU: 6 UID: 0 PID: 3566 Comm: kworker/6:3 Tainted: G     U     
O        6.18.0-rc6 #7 PREEMPT(full) 
[  197.835490] Tainted: [U]=USER, [O]=OOT_MODULE
[  197.835492] Hardware name: LENOVO 21HQCTO1WW/21HQCTO1WW, BIOS N3XET62W 
(1.37 ) 07/28/2025
[  197.835495] Workqueue: events ucsi_handle_connector_change
[  197.835505] RIP: 0010:kernfs_remove_by_name_ns+0xcf/0xe0
[  197.835509] Code: 5b 5d 41 5c 41 5d c3 cc cc cc cc 48 89 ef e8 58 1a c0 ff 
b8 fe ff ff ff eb e6 0f 0b eb 9f 48 c7 c7 d0 5b bc a9 e8 e1 41 b7 ff <0f> 0b eb e5 
66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
[  197.835513] RSP: 0018:ffffd2768b5dfda8 EFLAGS: 00010286
[  197.835517] RAX: 0000000000000000 RBX: ffff89d541a4c8b0 RCX: 0000000000000027
[  197.835520] RDX: ffff89e45f398a48 RSI: 0000000000000001 RDI: ffff89e45f398a40
[  197.835523] RBP: ffff89d55f722800 R08: 0000000000000000 R09: ffff89e49f761168
[  197.835525] R10: ffff89e49f731128 R11: 0000000000000003 R12: ffffffffa9daab01
[  197.835527] R13: ffff89d590c294e0 R14: 0000000000000000 R15: 0000000000000000
[  197.835529] FS:  0000000000000000(0000) GS:ffff89e4b403d000(0000) knlGS:
0000000000000000
[  197.835532] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  197.835535] CR2: 000056339a2b30c0 CR3: 000000019214d004 CR4: 
0000000000f70ef0
[  197.835538] PKRU: 55555554
[  197.835540] Call Trace:
[  197.835544]  <TASK>
[  197.835550]  typec_partner_unlink_device+0x30/0x60
[  197.835557]  typec_unregister_partner+0x64/0xa0
[  197.835562]  ucsi_unregister_partner+0x103/0x140
[  197.835567]  ucsi_handle_connector_change+0x34d/0x3e0
[  197.835580]  process_one_work+0x18b/0x340
[  197.835585]  worker_thread+0x256/0x3a0
[  197.835589]  ? __pfx_worker_thread+0x10/0x10
[  197.835593]  kthread+0xfc/0x240
[  197.835599]  ? __pfx_kthread+0x10/0x10
[  197.835604]  ? __pfx_kthread+0x10/0x10
[  197.835608]  ret_from_fork+0x1c9/0x200
[  197.835614]  ? __pfx_kthread+0x10/0x10
[  197.835619]  ret_from_fork_asm+0x1a/0x30
[  197.835627]  </TASK>
[  197.835628] ---[ end trace 0000000000000000 ]---
[  198.154306] usb 3-3.4.1.4: USB disconnect, device number 12
[  198.161398] usb 3-3.5: USB disconnect, device number 8
[  199.059821] pcieport 0000:50:00.0: not ready 1023ms after resume; giving up
[  199.059906] pcieport 0000:00:07.2: pciehp: Slot(5): Card not present
[  199.067233] pci_bus 0000:52: busn_res: [bus 52] is released
[  199.067486] pci_bus 0000:53: busn_res: [bus 53-5f] is released
[  199.067820] pci_bus 0000:60: busn_res: [bus 60-6c] is released
[  199.068013] pci_bus 0000:6d: busn_res: [bus 6d-78] is released
[  199.068295] pci_bus 0000:79: busn_res: [bus 79] is released
[  199.068463] pci_bus 0000:51: busn_res: [bus 51-79] is released
[  200.802421] wlan0: deauthenticating from [REDACTED] by local choice 
(Reason: 3=DEAUTH_LEAVING)
[  201.679172] PM: suspend entry (s2idle)
[  206.409358] Filesystems sync: 4.730 seconds
[  206.613445] Freezing user space processes
[  206.616008] Freezing user space processes completed (elapsed 0.002 seconds)
[  206.616016] OOM killer disabled.
[  206.616018] Freezing remaining freezable tasks
[  211.905879] Freezing remaining freezable tasks completed (elapsed 5.289 
seconds)
[  211.905912] printk: Suspending console(s) (use no_console_suspend to debug)
[  211.976568] sd 1:0:0:0: [sda] Synchronizing SCSI cache
[  212.442684] ACPI: EC: interrupt blocked
[  218.598684] typec port1-partner: PM: parent port1 should not be sleeping
[  218.798599] ACPI: EC: interrupt unblocked
[  219.056659] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin 
version 70.49.4
[  219.056672] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin 
version 7.9.3
[  219.073475] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all 
workloads
[  219.074666] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
[  219.074676] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
[  219.075617] i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
[  219.105707] nvme nvme0: 20/0/0 default/read/poll queues
[  219.470735] thunderbolt 1-1: new device found, vendor=0x1df device=0x112
[  219.470749] thunderbolt 1-1: Sabrent Rocket docking station
[  219.470931] typec port1: bound usb3-port3 (ops connector_ops)
[  219.470955] typec port1: bound usb2-port3 (ops connector_ops)
[  219.470971] typec port1: bound usb4_port7 (ops connector_ops)
[  219.770124] thunderbolt 1-0:1.1: new retimer found, vendor=0x8087 
device=0x15ee
[  220.256910] mei_hdcp 0000:00:16.0-[REDACTED]: bound 0000:00:02.0 (ops 
i915_hdcp_ops)
[  220.258650] mei_pxp 0000:00:16.0-[REDACTED]: bound 0000:00:02.0 (ops 
i915_pxp_tee_component_ops)
[  220.258713] OOM killer enabled.
[  220.258719] Restarting tasks: Starting
[  220.261104] Restarting tasks: Done





^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-18 16:49                           ` Linus Torvalds
@ 2025-11-19 15:42                             ` Catalin Marinas
  0 siblings, 0 replies; 39+ messages in thread
From: Catalin Marinas @ 2025-11-19 15:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Hildenbrand (Red Hat), David Wang, lance.yang, b-padhi,
	akpm, linux-kernel, Jan Polensky

On Tue, Nov 18, 2025 at 08:49:42AM -0800, Linus Torvalds wrote:
> On Mon, 17 Nov 2025 at 23:28, David Hildenbrand (Red Hat)
> <david@kernel.org> wrote:
> >
> > Do you want to quickly send that patch with linux-mm on CC or do you
> > just want to commit it? If you're busy I can quickly send it around.
> 
> I applied it (with your fix for the silly inverted assignment to
> 'init') directly, since I wanted to get this fixed quickly as we're
> fairly late in the release cycle.
> 
> But if there is some thread on linux-mm that should be notified that
> you know of - this is the only one I'm personally aware of - please do
> give people there a heads-up.

Thanks Linus, David for the quick fix and upstreaming. I tested it as
well on arm64 and it works fine both with MTE enabled and disabled. I
had a somewhat similar proposal here:

https://lore.kernel.org/r/aRIEkLw7BofLjOWs@arm.com

though dealing with all pages in tag_clear_highpages() is better: it
gives us some room for (small) optimisation later on (e.g. only flag the
head of a compound page like we do for hugetlb).

-- 
Catalin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-19 15:04         ` Stephanie Gawroriski
@ 2025-11-24  9:50           ` Heikki Krogerus
  2025-11-26 16:01             ` Stephanie Gawroriski
  0 siblings, 1 reply; 39+ messages in thread
From: Heikki Krogerus @ 2025-11-24  9:50 UTC (permalink / raw)
  To: Stephanie Gawroriski
  Cc: Linus Torvalds, Linux Kernel Mailing List, Greg Kroah-Hartman

Hi Stephanie,

Wed, Nov 19, 2025 at 10:04:35AM -0500, Stephanie Gawroriski wrote:
> On Wednesday, 19 November 2025 04:08:35 EST Heikki Krogerus wrote:
> > Hi
> > 
> > > > > Unrelated to this, after suspending and docking my laptop to my dock,
> > > > > I
> > > > > get
> > > > > this now:
> > > > > 
> > > > > [50598.774359] ------------[ cut here ]------------
> > > > > [50598.774371] kernfs: can not remove 'typec', no directory
> > > > > [50598.774389] WARNING: CPU: 2 PID: 48932 at fs/kernfs/dir.c:1706
> > > > > kernfs_remove_by_name_ns+0xcf/0xe0
> > > > 
> > > > Ok, that is indeed unrelated, and should be mostly harmless apart from
> > > > the scary message. Do you see any other effects than the noise in the
> > > > logs?
> > > 
> > > Not that I know of.
> > > 
> > > > Somebody is trying to remove a sysfs entry that has no parent,
> > > > presumably because it was never registered in the first place.
> > > > 
> > > > At a guess, it's some error handling cleanup that is done
> > > > unconditionally, unregistering an entry even when the original
> > > > registration failed. Or unregistering twice.
> > > > 
> > > > Looks like ucsi / typec handling:
> > > > > [50598.774763] Call Trace:
> > > > > [50598.774775]  typec_unregister_partner+0x6c/0x110
> > > > > [50598.774787]  ucsi_unregister_partner+0x103/0x140
> > > > > [50598.774794]  ucsi_handle_connector_change+0x34d/0x3e0
> > > > > [50598.774803]  process_one_work+0x18b/0x340
> > > > > [50598.774811]  worker_thread+0x256/0x3a0
> > > > > [50598.774824]  kthread+0xfc/0x240
> > > > 
> > > > but that said, I don't see why this would be new behavior. I don't see
> > > > anything that has changed in this area lately in the typec class
> > > > handling.
> > > > 
> > > > In fact, looking around, I see much older reports that look a bit like
> > > > this, so I don't think it's new.
> > > > 
> > > > Adding Greg and Heikki who might know what is going on. See
> > > > 
> > > >    https://lore.kernel.org/all/2203148.PYKUYFuaPT@arborvitaetree/
> > > > 
> > > > for original report.
> > > > 
> > > >                 Linus
> > > 
> > > I really only noticed it when I was looking in dmesg very recently to see
> > > if there were any application crashes due to huge_memory issue. I do look
> > > in dmesg often enough, but mostly to see if devices are being recognized
> > > properly or other events such as when the embedded JVM I am working on
> > > split locks.
> > Thanks for the report. It looks like the code does not increment the
> > reference count of the USB device that is linked to the typec partner.
> > 
> > Is it possible for you to test this?
> > 
> > diff --git a/drivers/usb/typec/class.c b/drivers/usb/typec/class.c
> > index 9b2647cb199b..4ace92af9856 100644
> > --- a/drivers/usb/typec/class.c
> > +++ b/drivers/usb/typec/class.c
> > @@ -805,6 +805,8 @@ static void typec_partner_link_device(struct typec_partner *partner, struct devi
> >                 return;
> >         }
> > 
> > +       get_device(dev);
> > +
> >         if (partner->attach)
> >                 partner->attach(partner, dev);
> >  }
> > @@ -816,6 +818,8 @@ static void typec_partner_unlink_device(struct typec_partner *partner, struct de
> > 
> >         if (partner->deattach)
> >                 partner->deattach(partner, dev);
> > +
> > +       put_device(dev);
> >  }
> > 
> >  /**
> 
> Okay, so I figured out that it does appear if I boot already connected to my 
> dock and then suspend the system. Unfortunately, the patch does not cause the 
> trace to go away. I have put extra context around the trace in case it is 
> useful. This is a guess, but could the `typec` directory already have been 
> removed by all the disconnects that happen before the suspend?

I'm still trying to figure this out, but let's rule out one of my
concerns..

Can you check whether the symlinks under /sys/class/typec are pointing
to the correct USB devices?

It's probably easiest with just one USB device connected to the
system. With one USB device connected to the system there should be
only one partner under /sys/class/typec (for example port0-partner),
so it should be easy to check if the symlink under that partner
device is pointing to the correct USB device.

        % ls -la /sys/class/typec/port<x>-partner/

You should see a symlink to a USB device under it. Can you check if
it's the correct USB device?

If it's not the correct USB device, then the ACPI table may have wrong
_PLD (Physical Location of Device) objects for these USB port or USB
Type-C connector device nodes. The code uses the _PLD to link the
correct USB port to the USB Type-C connector.
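
For checking several partners at once, the same information can be
collected with a few lines of Python. This is only a minimal sketch,
assuming the sysfs layout described above (portN-partner directories
under /sys/class/typec that contain symlinks to the linked USB
devices). The helper name partner_links is made up for illustration
and is not part of any kernel tooling. It lists every symlink it
finds, so entries that are not USB devices will show up as well.

```python
import os


def partner_links(typec_root="/sys/class/typec"):
    """Map each *-partner directory to its symlinks and their resolved targets."""
    result = {}
    if not os.path.isdir(typec_root):
        # No typec class on this system (or a wrong path was given).
        return result
    for name in sorted(os.listdir(typec_root)):
        if not name.endswith("-partner"):
            continue
        partner_dir = os.path.join(typec_root, name)
        if not os.path.isdir(partner_dir):
            continue
        links = {}
        for entry in sorted(os.listdir(partner_dir)):
            path = os.path.join(partner_dir, entry)
            # Only symlinks are interesting here; plain attribute files are skipped.
            if os.path.islink(path):
                links[entry] = os.path.realpath(path)
        result[name] = links
    return result


if __name__ == "__main__":
    for partner, links in partner_links().items():
        print(partner)
        for link, target in links.items():
            print(f"  {link} -> {target}")
```

The resolved target of each link can then be compared against the USB
device that is physically plugged into that connector.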

thanks,

> [  197.556838] usb 3-3: USB disconnect, device number 5
> [  197.556858] usb 3-3.1: USB disconnect, device number 6
> [  197.557394] usb 3-3.4: USB disconnect, device number 7
> [  197.557402] usb 3-3.4.1: USB disconnect, device number 9
> [  197.557407] usb 3-3.4.1.1: USB disconnect, device number 10
> [  197.557988] thunderbolt 1-0:1.1: retimer disconnected
> [  197.563900] thunderbolt 1-1: device disconnected
> [  197.591868] usb 2-3: USB disconnect, device number 2
> [  197.591876] usb 2-3.4: USB disconnect, device number 3
> [  197.591878] usb 2-3.4.2: USB disconnect, device number 4
> [  197.592041] cdc_ncm 2-3.4.2:2.0 enx[REDACTED]: unregister 'cdc_ncm' 
> usb-0000:00:0d.0-3.4.2, CDC NCM (NO ZLP)
> [  197.645038] usb 3-3.4.1.2: USB disconnect, device number 11
> [  197.835260] ------------[ cut here ]------------
> [  197.835269] kernfs: can not remove 'typec', no directory
> [  197.835284] WARNING: CPU: 6 PID: 3566 at fs/kernfs/dir.c:1706 
> kernfs_remove_by_name_ns+0xcf/0xe0
> [  197.835294] Modules linked in: xt_conntrack xt_MASQUERADE xt_set ip_set 
> xt_addrtype xfrm_user xfrm_algo snd_seq_dummy snd_hrtimer snd_seq nls_utf8 
> cifs cifs_md4 dns_resolver netfs iscsi_tcp nft_masq libiscsi_tcp libiscsi 
> scsi_transport_iscsi target_core_user uio target_core_pscsi target_core_file 
> target_core_iblock iscsi_target_mod target_core_mod xt_tcpudp nft_compat 
> x_tables nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet 
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat 
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables qrtr at24 ee1004 rfcomm 
> bnep snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_sof_board_helpers 
> snd_soc_intel_hda_dsp_common snd_sof_probes vhba(O) snd_hda_codec_intelhdmi 
> br_netfilter snd_hda_codec_alc269 bridge snd_hda_scodec_component stp 
> snd_hda_codec_realtek_lib snd_soc_dmic llc snd_hda_codec_generic snd_hda_intel 
> pkcs8_key_parser snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl 
> snd_sof_intel_hda_generic soundwire_intel soundwire_generic_allocation
> [  197.835371]  snd_sof_intel_hda_sdw_bpt snd_sof_intel_hda_common 
> snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_hda_codec_hdmi 
> soundwire_cadence snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_soc_avs 
> snd_sof_utils snd_soc_hda_codec snd_soc_acpi_intel_match snd_hda_ext_core 
> snd_soc_acpi_intel_sdca_quirks snd_hda_codec snd_soc_acpi uvcvideo 
> snd_hda_core crc8 videobuf2_vmalloc snd_intel_dspcfg snd_usb_audio nls_ascii 
> soundwire_bus hid_sensor_gyro_3d hid_sensor_custom_intel_hinge 
> hid_sensor_accel_3d btusb uvc intel_uncore_frequency snd_intel_sdw_acpi 
> snd_usbmidi_lib nls_cp437 snd_soc_sdca hid_sensor_trigger btmtk 
> videobuf2_memops intel_uncore_frequency_common snd_hwdep snd_soc_core 
> hid_sensor_iio_common btrtl videobuf2_v4l2 coretemp snd_rawmidi snd_compress 
> intel_pmc_core btbcm industrialio_triggered_buffer videodev mei_hdcp mei_pxp 
> rapl snd_seq_device snd_pcm_dmaengine pmt_telemetry btintel kfifo_buf 
> videobuf2_common spi_nor iosm intel_cstate mei_me snd_pcm pmt_discovery 
> industrialio bluetooth mc mtd wwan
> [  197.835438]  intel_uncore mei snd_timer pcspkr pmt_class igen6_edac button 
> soc_button_array ac intel_pmc_ssram_telemetry acpi_tad acpi_pad snd mei_vsc_hw 
> joydev evdev soundcore vboxnetadp(O) vboxnetflt(O) vboxdrv(O) hid_xpadneo(O) 
> ff_memless parport_pc ppdev lp parport loop nvme_fabrics efi_pstore nfnetlink 
> zram 842_decompress 842_compress autofs4 hid_sensor_custom hid_multitouch 
> psmouse i2c_i801 hid_sensor_hub i2c_hid_acpi serio_raw i2c_smbus hid_generic 
> i2c_hid
> [  197.835483] CPU: 6 UID: 0 PID: 3566 Comm: kworker/6:3 Tainted: G     U     
> O        6.18.0-rc6 #7 PREEMPT(full) 
> [  197.835490] Tainted: [U]=USER, [O]=OOT_MODULE
> [  197.835492] Hardware name: LENOVO 21HQCTO1WW/21HQCTO1WW, BIOS N3XET62W 
> (1.37 ) 07/28/2025
> [  197.835495] Workqueue: events ucsi_handle_connector_change
> [  197.835505] RIP: 0010:kernfs_remove_by_name_ns+0xcf/0xe0
> [  197.835509] Code: 5b 5d 41 5c 41 5d c3 cc cc cc cc 48 89 ef e8 58 1a c0 ff 
> b8 fe ff ff ff eb e6 0f 0b eb 9f 48 c7 c7 d0 5b bc a9 e8 e1 41 b7 ff <0f> 0b eb e5 
> 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 90 90 90 90 90
> [  197.835513] RSP: 0018:ffffd2768b5dfda8 EFLAGS: 00010286
> [  197.835517] RAX: 0000000000000000 RBX: ffff89d541a4c8b0 RCX: 0000000000000027
> [  197.835520] RDX: ffff89e45f398a48 RSI: 0000000000000001 RDI: ffff89e45f398a40
> [  197.835523] RBP: ffff89d55f722800 R08: 0000000000000000 R09: ffff89e49f761168
> [  197.835525] R10: ffff89e49f731128 R11: 0000000000000003 R12: ffffffffa9daab01
> [  197.835527] R13: ffff89d590c294e0 R14: 0000000000000000 R15: 0000000000000000
> [  197.835529] FS:  0000000000000000(0000) GS:ffff89e4b403d000(0000) knlGS:
> 0000000000000000
> [  197.835532] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  197.835535] CR2: 000056339a2b30c0 CR3: 000000019214d004 CR4: 
> 0000000000f70ef0
> [  197.835538] PKRU: 55555554
> [  197.835540] Call Trace:
> [  197.835544]  <TASK>
> [  197.835550]  typec_partner_unlink_device+0x30/0x60
> [  197.835557]  typec_unregister_partner+0x64/0xa0
> [  197.835562]  ucsi_unregister_partner+0x103/0x140
> [  197.835567]  ucsi_handle_connector_change+0x34d/0x3e0
> [  197.835580]  process_one_work+0x18b/0x340
> [  197.835585]  worker_thread+0x256/0x3a0
> [  197.835589]  ? __pfx_worker_thread+0x10/0x10
> [  197.835593]  kthread+0xfc/0x240
> [  197.835599]  ? __pfx_kthread+0x10/0x10
> [  197.835604]  ? __pfx_kthread+0x10/0x10
> [  197.835608]  ret_from_fork+0x1c9/0x200
> [  197.835614]  ? __pfx_kthread+0x10/0x10
> [  197.835619]  ret_from_fork_asm+0x1a/0x30
> [  197.835627]  </TASK>
> [  197.835628] ---[ end trace 0000000000000000 ]---
> [  198.154306] usb 3-3.4.1.4: USB disconnect, device number 12
> [  198.161398] usb 3-3.5: USB disconnect, device number 8
> [  199.059821] pcieport 0000:50:00.0: not ready 1023ms after resume; giving up
> [  199.059906] pcieport 0000:00:07.2: pciehp: Slot(5): Card not present
> [  199.067233] pci_bus 0000:52: busn_res: [bus 52] is released
> [  199.067486] pci_bus 0000:53: busn_res: [bus 53-5f] is released
> [  199.067820] pci_bus 0000:60: busn_res: [bus 60-6c] is released
> [  199.068013] pci_bus 0000:6d: busn_res: [bus 6d-78] is released
> [  199.068295] pci_bus 0000:79: busn_res: [bus 79] is released
> [  199.068463] pci_bus 0000:51: busn_res: [bus 51-79] is released
> [  200.802421] wlan0: deauthenticating from [REDACTED] by local choice 
> (Reason: 3=DEAUTH_LEAVING)
> [  201.679172] PM: suspend entry (s2idle)
> [  206.409358] Filesystems sync: 4.730 seconds
> [  206.613445] Freezing user space processes
> [  206.616008] Freezing user space processes completed (elapsed 0.002 seconds)
> [  206.616016] OOM killer disabled.
> [  206.616018] Freezing remaining freezable tasks
> [  211.905879] Freezing remaining freezable tasks completed (elapsed 5.289 
> seconds)
> [  211.905912] printk: Suspending console(s) (use no_console_suspend to debug)
> [  211.976568] sd 1:0:0:0: [sda] Synchronizing SCSI cache
> [  212.442684] ACPI: EC: interrupt blocked
> [  218.598684] typec port1-partner: PM: parent port1 should not be sleeping
> [  218.798599] ACPI: EC: interrupt unblocked
> [  219.056659] i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin 
> version 70.49.4
> [  219.056672] i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin 
> version 7.9.3
> [  219.073475] i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all 
> workloads
> [  219.074666] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
> [  219.074676] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
> [  219.075617] i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
> [  219.105707] nvme nvme0: 20/0/0 default/read/poll queues
> [  219.470735] thunderbolt 1-1: new device found, vendor=0x1df device=0x112
> [  219.470749] thunderbolt 1-1: Sabrent Rocket docking station
> [  219.470931] typec port1: bound usb3-port3 (ops connector_ops)
> [  219.470955] typec port1: bound usb2-port3 (ops connector_ops)
> [  219.470971] typec port1: bound usb4_port7 (ops connector_ops)
> [  219.770124] thunderbolt 1-0:1.1: new retimer found, vendor=0x8087 
> device=0x15ee
> [  220.256910] mei_hdcp 0000:00:16.0-[REDACTED]: bound 0000:00:02.0 (ops 
> i915_hdcp_ops)
> [  220.258650] mei_pxp 0000:00:16.0-[REDACTED]: bound 0000:00:02.0 (ops 
> i915_pxp_tee_component_ops)
> [  220.258713] OOM killer enabled.
> [  220.258719] Restarting tasks: Starting
> [  220.261104] Restarting tasks: Done
> 
> 
> 

-- 
heikki

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Linux 6.18-rc6
  2025-11-24  9:50           ` Heikki Krogerus
@ 2025-11-26 16:01             ` Stephanie Gawroriski
  2025-11-27  9:53               ` Heikki Krogerus
  0 siblings, 1 reply; 39+ messages in thread
From: Stephanie Gawroriski @ 2025-11-26 16:01 UTC (permalink / raw)
  To: Heikki Krogerus
  Cc: Linus Torvalds, Linux Kernel Mailing List, Greg Kroah-Hartman

Hi!

On Monday, 24 November 2025 04:50:03 EST Heikki Krogerus wrote:
> I'm still trying to figure this out, but let's rule out one of my
> concerns..
> 
> Can you check whether the symlinks are pointing to the correct USB devices
> under /sys/class/typec?
> 
> It's probably easiest with just one USB device connected to the
> system. With one USB device connected to the system there should be
> only one partner under /sys/class/typec (for example port0-partner),
> so it should be easy to check if the symlink under that partner
> device is pointing to the correct USB device.
> 
>         % ls -la /sys/class/typec/port<x>-partner/
> 
> You should see a symlink to a USB device under it. Can you check if
> it's the correct USB device?
> 
> If it's not the correct USB device, then the ACPI table may have wrong
> _PLD (Physical Location of Device) objects for these USB ports or USB
> Type-C connector device nodes. The code uses the _PLD to link the
> correct USB port to the USB Type-C connector.
> 
> thanks,

The USB-C cable is plugged into the connector that is furthest to the back of
the device. Bus 2 and Bus 3 are connected to the same port. The ethernet
controller is connected to the dock, so I am guessing that it is using the
USB-only interface of the USB-C port, though I am not sure why it is not
under Bus 3.

These are the outputs along with the USB and PCI tree:

stephanie@arborvitaetree:~$ ls -la /sys/class/typec/port1-partner/
total 0
drwxr-xr-x 4 root root    0 Nov 26 10:30 .
drwxr-xr-x 8 root root    0 Nov 24 15:14 ..
lrwxrwxrwx 1 root root    0 Nov 26 10:30 2-3 -> ../../../../../pci0000:00/0000:00:0d.0/usb2/2-3
lrwxrwxrwx 1 root root    0 Nov 26 10:30 3-3 -> ../../../../../pci0000:00/0000:00:14.0/usb3/3-3
-r--r--r-- 1 root root 4096 Nov 26 10:30 accessory_mode
lrwxrwxrwx 1 root root    0 Nov 25 23:42 device -> ../../port1
-r--r--r-- 1 root root 4096 Nov 26 10:30 number_of_alternate_modes
drwxr-xr-x 5 root root    0 Nov 25 23:42 pd2
drwxr-xr-x 2 root root    0 Nov 26 10:29 power
lrwxrwxrwx 1 root root    0 Nov 25 23:42 subsystem -> ../../../../../../class/typec
-r--r--r-- 1 root root 4096 Nov 26 10:30 supports_usb_power_delivery
-rw-r--r-- 1 root root 4096 Nov 25 23:42 uevent
-r--r--r-- 1 root root 4096 Nov 26 10:30 usb_mode
lrwxrwxrwx 1 root root    0 Nov 26 10:30 usb_power_delivery -> pd2
-r--r--r-- 1 root root 4096 Nov 26 10:30 usb_power_delivery_revision

stephanie@arborvitaetree:/sys/class/typec$ lsusb -tv
/:  Bus 001.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/1p, 480M
    ID 1d6b:0002 Linux Foundation 2.0 root hub
/:  Bus 002.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/4p, 20000M/x2
    ID 1d6b:0003 Linux Foundation 3.0 root hub
    |__ Port 003: Dev 005, If 0, Class=Hub, Driver=hub/4p, 10000M
        ID 8087:0b40 Intel Corp. 
        |__ Port 004: Dev 006, If 0, Class=Hub, Driver=hub/4p, 10000M
            ID 2109:0822 VIA Labs, Inc. 
            |__ Port 002: Dev 007, If 0, Class=Communications, Driver=cdc_ncm, 5000M
                ID 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet
            |__ Port 002: Dev 007, If 1, Class=CDC Data, Driver=cdc_ncm, 5000M
                ID 0b95:1790 ASIX Electronics Corp. AX88179 Gigabit Ethernet
/:  Bus 003.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/12p, 480M
    ID 1d6b:0002 Linux Foundation 2.0 root hub
    |__ Port 003: Dev 023, If 0, Class=Hub, Driver=hub/6p, 480M
        ID 1d5c:5801 Fresco Logic
        |__ Port 001: Dev 024, If 0, Class=Billboard, Driver=[none], 1.5M
            ID 291a:87e0
        |__ Port 004: Dev 025, If 0, Class=Hub, Driver=hub/4p, 480M
            ID 2109:2822 VIA Labs, Inc.
            |__ Port 001: Dev 027, If 0, Class=Hub, Driver=hub/4p, 480M
                ID 1a40:0101 Terminus Technology Inc. Hub
                |__ Port 001: Dev 028, If 0, Class=Human Interface Device, Driver=usbhid, 1.5M
                    ID 17ef:608d Lenovo Optical Mouse
                |__ Port 002: Dev 029, If 0, Class=Human Interface Device, Driver=usbhid, 12M
                    ID 0c45:7692 Microdia
                |__ Port 002: Dev 029, If 1, Class=Human Interface Device, Driver=usbhid, 12M
                    ID 0c45:7692 Microdia
                |__ Port 004: Dev 030, If 0, Class=Video, Driver=uvcvideo, 480M
                    ID 17ef:4836 Lenovo
                |__ Port 004: Dev 030, If 1, Class=Video, Driver=uvcvideo, 480M
                    ID 17ef:4836 Lenovo
                |__ Port 004: Dev 030, If 2, Class=Audio, Driver=snd-usb-audio, 480M
                    ID 17ef:4836 Lenovo
                |__ Port 004: Dev 030, If 3, Class=Audio, Driver=snd-usb-audio, 480M
                    ID 17ef:4836 Lenovo
        |__ Port 005: Dev 026, If 0, Class=Billboard, Driver=[none], 12M
            ID 2eb9:0123
        |__ Port 005: Dev 026, If 1, Class=Vendor Specific Class, Driver=[none], 12M
            ID 2eb9:0123
    |__ Port 006: Dev 022, If 0, Class=Vendor Specific Class, Driver=[none], 12M
        ID 06cb:00fc Synaptics, Inc. Prometheus Fingerprint Reader
    |__ Port 008: Dev 003, If 0, Class=Video, Driver=uvcvideo, 480M
        ID 30c9:0051 Luxvisions Innotech Limited
    |__ Port 008: Dev 003, If 1, Class=Video, Driver=uvcvideo, 480M
        ID 30c9:0051 Luxvisions Innotech Limited
    |__ Port 008: Dev 003, If 2, Class=Application Specific Interface, Driver=[none], 480M
        ID 30c9:0051 Luxvisions Innotech Limited 
    |__ Port 010: Dev 004, If 0, Class=Wireless, Driver=btusb, 12M
        ID 8087:0033 Intel Corp. AX211 Bluetooth
    |__ Port 010: Dev 004, If 1, Class=Wireless, Driver=btusb, 12M
        ID 8087:0033 Intel Corp. AX211 Bluetooth
/:  Bus 004.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/4p, 10000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub

stephanie@arborvitaetree:/sys/class/typec$ lspci -tv -nn
-[0000:00]-+-00.0  Intel Corporation Raptor Lake-P 6p+8e cores Host Bridge/DRAM Controller [8086:a706]
           +-02.0  Intel Corporation Raptor Lake-P [Iris Xe Graphics] [8086:a7a0]
           +-04.0  Intel Corporation Raptor Lake Dynamic Platform and Thermal Framework Processor Participant [8086:a71d]
           +-06.0-[04]----00.0  Sandisk Corp WD Blue SN5000 NVMe SSD (DRAM-less) [15b7:504a]
           +-07.0-[20-49]--
           +-07.2-[50-79]----00.0-[51-79]--+-00.0-[52]--
           |                               +-01.0-[53-5f]--
           |                               +-02.0-[60-6c]--
           |                               +-03.0-[6d-78]--
           |                               \-04.0-[79]--
           +-08.0  Intel Corporation GNA Scoring Accelerator module [8086:a74f]
           +-0a.0  Intel Corporation Raptor Lake Crashlog and Telemetry [8086:a77d]
           +-0d.0  Intel Corporation Raptor Lake-P Thunderbolt 4 USB Controller [8086:a71e]
           +-0d.2  Intel Corporation Raptor Lake-P Thunderbolt 4 NHI #0 [8086:a73e]
           +-0d.3  Intel Corporation Raptor Lake-P Thunderbolt 4 NHI #1 [8086:a76d]
           +-12.0  Intel Corporation Alder Lake-P Integrated Sensor Hub [8086:51fc]
           +-14.0  Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller [8086:51ed]
           +-14.2  Intel Corporation Alder Lake PCH Shared SRAM [8086:51ef]
           +-14.3  Intel Corporation Raptor Lake PCH CNVi WiFi [8086:51f1]
           +-15.0  Intel Corporation Alder Lake PCH Serial IO I2C Controller #0 [8086:51e8]
           +-15.1  Intel Corporation Alder Lake PCH Serial IO I2C Controller #1 [8086:51e9]
           +-16.0  Intel Corporation Alder Lake PCH HECI Controller [8086:51e0]
           +-1c.0-[08]----00.0  Intel Corporation XMM7560 LTE Advanced Pro Modem [8086:7560]
           +-1f.0  Intel Corporation Raptor Lake LPC/eSPI Controller [8086:519d]
           +-1f.3  Intel Corporation Raptor Lake-P/U/H cAVS [8086:51ca]
           +-1f.4  Intel Corporation Alder Lake PCH-P SMBus Host Controller [8086:51a3]
           \-1f.5  Intel Corporation Alder Lake-P PCH SPI Controller [8086:51a4]

stephanie@arborvitaetree:/sys/class/typec$ ls -w 1 port0/
data_role
device
firmware_node
physical_location
port0.0
port0.1
port0.2
power
power_operation_mode
power_role
preferred_role
subsystem
supported_accessory_modes
uevent
usb2-port1
usb3-port1
usb4_port1
usb_capability
usb_power_delivery
usb_power_delivery_revision
usb_typec_revision
vconn_source
waiting_for_supplier

stephanie@arborvitaetree:/sys/class/typec$ ls -w 1 port1/
data_role
device
firmware_node
physical_location
port1.0
port1.1
port1.2
port1-partner
power
power_operation_mode
power_role
preferred_role
subsystem
supported_accessory_modes
uevent
usb2-port3
usb3-port3
usb4_port7
usb_capability
usb_power_delivery
usb_power_delivery_revision
usb_typec_revision
vconn_source
waiting_for_supplier







* Re: Linux 6.18-rc6
  2025-11-26 16:01             ` Stephanie Gawroriski
@ 2025-11-27  9:53               ` Heikki Krogerus
  0 siblings, 0 replies; 39+ messages in thread
From: Heikki Krogerus @ 2025-11-27  9:53 UTC (permalink / raw)
  To: Stephanie Gawroriski
  Cc: Linus Torvalds, Linux Kernel Mailing List, Greg Kroah-Hartman

Hi,

On Wed, Nov 26, 2025 at 11:01:24AM -0500, Stephanie Gawroriski wrote:
> Hi!
> 
> On Monday, 24 November 2025 04:50:03 EST Heikki Krogerus wrote:
> > I'm still trying to figure this out, but let's rule out one of my
> > concerns..
> > 
> > Can you check whether the symlinks are pointing to the correct USB devices
> > under /sys/class/typec?
> > 
> > It's probably easiest with just one USB device connected to the
> > system. With one USB device connected to the system there should be
> > only one partner under /sys/class/typec (for example port0-partner),
> > so it should be easy to check if the symlink under that partner
> > device is pointing to the correct USB device.
> > 
> >         % ls -la /sys/class/typec/port<x>-partner/
> > 
> > You should see a symlink to a USB device under it. Can you check if
> > it's the correct USB device?
> > 
> > If it's not the correct USB device, then the ACPI table may have wrong
> > _PLD (Physical Location of Device) objects for these USB ports or USB
> > Type-C connector device nodes. The code uses the _PLD to link the
> > correct USB port to the USB Type-C connector.
> > 
> > thanks,
> 
> The USB-C cable is plugged into the connector that is furthest to the back of
> the device. Bus 2 and Bus 3 are connected to the same port. The ethernet
> controller is connected to the dock, so I am guessing that it is using the
> USB-only interface of the USB-C port, though I am not sure why it is not
> under Bus 3.
> 
> These are the outputs along with the USB and PCI tree:
> 
> stephanie@arborvitaetree:~$ ls -la /sys/class/typec/port1-partner/
> total 0
> drwxr-xr-x 4 root root    0 Nov 26 10:30 .
> drwxr-xr-x 8 root root    0 Nov 24 15:14 ..
> lrwxrwxrwx 1 root root    0 Nov 26 10:30 2-3 -> ../../../../../pci0000:00/0000:00:0d.0/usb2/2-3
> lrwxrwxrwx 1 root root    0 Nov 26 10:30 3-3 -> ../../../../../pci0000:00/0000:00:14.0/usb3/3-3

These look correct to me, which is a relief.
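
For anyone who wants to repeat this check on another machine, here is a
minimal sketch in Python. It is not part of any kernel tooling. It assumes
that the USB device links under a partner directory are named in the
<bus>-<port> style seen in the listing above (for example 2-3 and 3-3); the
function name partner_usb_links is made up for illustration.

```python
import os
from pathlib import Path


def partner_usb_links(typec_root="/sys/class/typec"):
    """Map each port<N>-partner directory to the USB device symlinks in it.

    Assumption: USB device links are named "<bus>-<port>[.<port>...]",
    e.g. "2-3", so only symlinks starting with a digit and containing
    a "-" are reported. Other links (device, subsystem, ...) are skipped.
    """
    result = {}
    root = Path(typec_root)
    if not root.is_dir():
        return result
    for partner in sorted(root.glob("port*-partner")):
        links = {}
        for entry in sorted(partner.iterdir()):
            name = entry.name
            if entry.is_symlink() and name[0].isdigit() and "-" in name:
                # Resolve the link so the owning PCI controller is visible.
                links[name] = os.path.realpath(entry)
        result[partner.name] = links
    return result


if __name__ == "__main__":
    for partner, links in partner_usb_links().items():
        print(partner)
        for name, target in links.items():
            print(f"  {name} -> {target}")
```

The resolved paths can then be compared with the bus and port numbers that
lsusb -t reports for the device that is actually plugged in.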

At this point, can I ask you to report this at bugzilla.kernel.org?

thanks,

-- 
heikki

Thread overview: 39+ messages
2025-11-16 22:42 Linux 6.18-rc6 Linus Torvalds
2025-11-17  8:20 ` David Wang
2025-11-17 10:33   ` Linus Torvalds
2025-11-17 12:56     ` David Wang
2025-11-17 13:30       ` David Hildenbrand (Red Hat)
2025-11-17 13:45         ` David Wang
2025-11-17 14:08           ` David Hildenbrand (Red Hat)
2025-11-17 15:28             ` David Wang
2025-11-17 16:59             ` Xi Ruoyao
2025-11-17 21:19               ` Joan Bruguera Micó
2025-11-17 17:28             ` Linus Torvalds
2025-11-17 17:53               ` David Hildenbrand (Red Hat)
2025-11-17 17:59                 ` Linus Torvalds
2025-11-17 18:24                   ` David Hildenbrand (Red Hat)
2025-11-17 19:17                     ` David Hildenbrand (Red Hat)
2025-11-18  1:10                       ` Linus Torvalds
2025-11-18  4:13                         ` David Wang
2025-11-18 13:55                           ` David Wang
2025-11-18 14:12                             ` David Hildenbrand (Red Hat)
2025-11-18 14:33                               ` David Wang
2025-11-18 14:44                               ` Carlos Llamas
2025-11-18 14:51                                 ` David Hildenbrand (Red Hat)
2025-11-18 14:53                                   ` Carlos Llamas
2025-11-18 15:09                                   ` David Wang
2025-11-18  7:28                         ` David Hildenbrand (Red Hat)
2025-11-18 16:49                           ` Linus Torvalds
2025-11-19 15:42                             ` Catalin Marinas
2025-11-18  3:59             ` Carlos Llamas
2025-11-17 16:42       ` Linus Torvalds
2025-11-17 18:13 ` Guenter Roeck
2025-11-18 17:23 ` Stephanie Gawroriski
2025-11-18 18:01   ` Linus Torvalds
2025-11-18 20:18     ` Stephanie Gawroriski
2025-11-19  9:08       ` Heikki Krogerus
2025-11-19 14:18         ` Stephanie Gawroriski
2025-11-19 15:04         ` Stephanie Gawroriski
2025-11-24  9:50           ` Heikki Krogerus
2025-11-26 16:01             ` Stephanie Gawroriski
2025-11-27  9:53               ` Heikki Krogerus
