Linux 6.11-rc1

public inbox for linux-kernel@vger.kernel.org

* Linux 6.11-rc1
@ 2024-07-28 21:40 Linus Torvalds
  2024-07-29  9:28 ` Build regressions/improvements in v6.11-rc1 Geert Uytterhoeven
  2024-07-29 15:29 ` Linux 6.11-rc1 Guenter Roeck
  0 siblings, 2 replies; 59+ messages in thread
From: Linus Torvalds @ 2024-07-28 21:40 UTC
  To: Linux Kernel Mailing List

The merge window felt pretty normal, and the stats all look pretty
normal too. I was expecting things to be quieter because of summer
vacations, but that (still) doesn't actually seem to have been the
case.

There's 12k+ regular commits (and another 850 merge commits), so as
always the summary of this all is just my merge log. The diffstats are
also (once again) dominated by some big hardware descriptions (another
AMD GPU register dump accounts for ~45% of the lines in the diff, and
some more perf event JSON descriptor files account for another 5%).

But if you ignore those HW dumps, the diff too looks perfectly
regular: drivers account for a bit over half (even when not counting
the AMD register description noise). The rest is roughly one third
architecture updates (lots of it is dts files, so I guess I could have
lumped that in with "more hw descriptor tables"), one third tooling
and documentation, and one third "core kernel" (filesystems,
networking, VM and kernel). Very roughly.

If you want more details, you should get the git tree, and then narrow
things down based on interests.

              Linus

---

Al Viro (1):
    struct file leak fixes

Alex Williamson (1):
    VFIO updates

Alexandre Belloni (2):
    RTC updates
    i3c updates

Andreas Gruenbacher (1):
    gfs2 updates

Andreas Larsson (1):
    sparc updates

Andrew Morton (3):
    MM updates
    non-MM updates
    misc hotfixes

Anna Schumaker (1):
    NFS client updates

Ard Biesheuvel (2):
    EFI updates
    EFI fixes

Arnd Bergmann (5):
    SoC driver updates
    SoC dt updates
    SoC defconfig updates
    arm SoC platform updates
    asm-generic updates

Bartosz Golaszewski (4):
    power sequencing updates
    gpio updates
    power sequencing fixes
    gpio fix

Benjamin Tissoires (1):
    HID updates

Bjorn Andersson (3):
    hwspinlock updates
    remoteproc updates
    rpmsg updates

Bjorn Helgaas (1):
    pci updates

Borislav Petkov (14):
    EDAC updates
    RAS updates
    x86 alternatives updates
    x86 boot updates
    x86 cleanups
    x86 confidential computing updates
    x86 uaccess update
    x86 build update
    misc x86 updates
    x86 vmware updates
    x86 cpu mitigation updates
    x86 cpu model updates
    x86 resource control updates
    x86 SEV updates

Casey Schaufler (1):
    smack updates

Catalin Marinas (1):
    arm64 updates

Chandan Babu (1):
    xfs updates

Christian Brauner (13):
    misc vfs updates
    PG_error removal updates
    vfs module description updates
    vfs casefolding updates
    vfs mount API updates
    vfs inode / dentry updates
    vfs mount query updates
    namespace-fs updates
    pidfs updates
    iomap updates
    vfs fixes x 3

Christoph Hellwig (2):
    dma-mapping updates
    dma-mapping fix

Chuck Lever (1):
    nfsd updates

Corey Minyard (1):
    IPMI updates

Damien Le Moal (1):
    zonefs update

Daniel Thompson (1):
    kgdb updates

Dave Airlie (3):
    drm fixes
    drm updates
    drm fixes

Dave Jiang (1):
    CXL updates

David Kleikamp (1):
    jfs updates

David Sterba (3):
    affs updates
    btrfs updates
    btrfs fix

David Teigland (1):
    dlm updates

Dipen Patel (1):
    hardware timestamp update

Dmitry Torokhov (1):
    input updates

Dominik Brodowski (1):
    PCMCIA updates

Gabriel Krisman Bertazi (1):
    unicode update

Gao Xiang (2):
    erofs updates
    more erofs updates

Geert Uytterhoeven (2):
    m68k updates
    auxdisplay updates

Greg KH (5):
    tty / serial updates
    USB / Thunderbolt updates
    staging driver updates
    char / misc and other driver updates
    driver core updates

Guenter Roeck (1):
    hwmon updates

Helge Deller (2):
    fbdev updates
    parisc updates

Herbert Xu (1):
    crypto update

Huacai Chen (1):
    LoongArch updates

Ilpo Järvinen (1):
    x86 platform driver updates

Ilya Dryomov (1):
    ceph updates

Ingo Molnar (5):
    locking updates
    objtool updates
    scheduler updates
    performance events updates
    x86 percpu updates

Ira Weiny (1):
    libnvdimm updates

Jaegeuk Kim (1):
    f2fs updates

Jakub Kicinski (2):
    networking updates
    networking fixes

James Bottomley (1):
    SCSI updates

Jan Kara (2):
    fsnotify fix
    udf, ext2, isofs fixes and cleanups

Jarkko Sakkinen (3):
    tpm updates
    keys updates
    tpm fix

Jason Donenfeld (1):
    random number generator updates

Jason Gunthorpe (2):
    iommufd updates
    rdma updates

Jassi Brar (1):
    mailbox updates

Jens Axboe (7):
    io_uring updates
    block updates
    block integrity mapping updates
    more block updates
    io_uring fixes
    io_uring fixes
    block fixes

Joel Granados (2):
    sysctl updates
    sysctl constification

John Johansen (1):
    apparmor updates

John Paul Adrian Glaubitz (1):
    sh updates

Jonathan Corbet (1):
    documentation updates

Juergen Gross (2):
    xen updates
    xen fixes

Kees Cook (5):
    execve updates
    seccomp updates
    pstore updates
    hardening updates
    execve fix

Kent Overstreet (2):
    bcachefs updates
    bcachefs fixes

Konstantin Komarov (1):
    ntfs3 updates

Lee Jones (3):
    MFD updates
    backlight updates
    LED updates

Len Brown (1):
    turbostat updates

Linus Walleij (1):
    pin control updates

Luis Chamberlain (1):
    module update

Mark Brown (6):
    regmap updates
    regulator updates
    spi updates
    regmap fix
    regulator fixes
    spi fixes

Masahiro Yamada (2):
    Kbuild updates
    Kbuild fixes

Masami Hiramatsu (3):
    probes updates
    bootconfig update
    uprobe fix

Mauro Carvalho Chehab (1):
    media updates

Michael Ellerman (1):
    powerpc updates

Michael Tsirkin (1):
    virtio updates

Mickaël Salaün (2):
    landlock updates
    landlock fix

Miguel Ojeda (1):
    Rust updates

Mike Rapoport (1):
    memblock updates

Mikulas Patocka (1):
    device mapper updates

Miquel Raynal (1):
    MTD updates

Namhyung Kim (2):
    perf tools updates
    perf tools fixes

Namjae Jeon (1):
    exfat updates

Niklas Cassel (1):
    ata updates

Palmer Dabbelt (2):
    RISC-V updates
    more RISC-V updates

Paolo Abeni (1):
    networking fixes

Paolo Bonzini (1):
    kvm updates

Paul McKenney (6):
    arm byte cmpxchg
    memory model updates
    RCU updates
    torture-test updates
    KCSAN updates
    nolibc updates

Paul Moore (2):
    selinux update
    lsm updates

Petr Mladek (2):
    livepatching update
    printk updates

Rafael Wysocki (5):
    thermal control updates
    power management updates
    ACPI updates
    thermal control fix
    thermal control fix

Richard Weinberger (2):
    UML updates
    UBI and UBIFS updates

Rob Herring (2):
    devicetree updates
    more devicetree updates

Sebastian Reichel (2):
    HSI update
    power supply and reset updates

Shuah Khan (2):
    KUnit updates
    kselftest updates

Stephen Boyd (2):
    clk updates
    clk fixes

Steve French (3):
    smb client fixes
    smb server fixes
    more smb client updates

Steven Rostedt (4):
    tracing updates
    ftrace updates
    tracing tools updates
    tracing CREDITS file update

Takashi Iwai (2):
    sound updates
    sound fixes

Takashi Sakamoto (2):
    firewire updates
    firewire fixes

Ted Ts'o (1):
    ext4 updates

Tejun Heo (3):
    cgroup updates
    workqueue updates
    workqueue fix

Thomas Bogendoerfer (2):
    MIPS updates
    MIPS updates

Thomas Gleixner (6):
    debugobjects update
    CPU hotplug updates
    timer updates
    interrupt subsystem updates
    MSI interrupt updates
    timer migration updates

Tzung-Bi Shih (2):
    chrome platform updates
    chrome platform firmware update

Ulf Hansson (2):
    pmdomain updates
    MMC updates

Uwe Kleine-König (1):
    pwm updates

Vasily Gorbik (2):
    s390 updates
    more s390 updates

Vinod Koul (3):
    dmaengine updates
    soundwire updates
    phy updates

Vlastimil Babka (1):
    slab updates

Will Deacon (3):
    iommu updates
    arm64 fixes
    iommu fixes

Wim Van Sebroeck (1):
    watchdog updates

Wolfram Sang (2):
    i2c fixes
    more i2c updates

Yury Norov (1):
    bitmap updates


* Build regressions/improvements in v6.11-rc1
  2024-07-28 21:40 Linux 6.11-rc1 Linus Torvalds
@ 2024-07-29  9:28 ` Geert Uytterhoeven
  2024-07-29  9:35   ` Geert Uytterhoeven
  2024-07-29 15:29 ` Linux 6.11-rc1 Guenter Roeck
  1 sibling, 1 reply; 59+ messages in thread
From: Geert Uytterhoeven @ 2024-07-29  9:28 UTC
  To: linux-kernel

Below is the list of build error/warning regressions/improvements in
v6.11-rc1[1] compared to v6.10[2].

Summarized:
  - build errors: +7/-22
  - build warnings: +4/-19

Happy fixing! ;-)

Thanks to the linux-next team for providing the build service.

[1] http://kisskb.ellerman.id.au/kisskb/branch/linus/head/8400291e289ee6b2bf9779ff1c83a291501f017b/ (all 132 configs)
[2] http://kisskb.ellerman.id.au/kisskb/branch/linus/head/0c3836482481200ead7b416ca80c68a29cfdaabd/ (all 132 configs)


*** ERRORS ***

7 error regressions:
  + /kisskb/src/arch/mips/sgi-ip22/ip22-gio.c: error: initialization of 'int (*)(struct device *, const struct device_driver *)' from incompatible pointer type 'int (*)(struct device *, struct device_driver *)' [-Werror=incompatible-pointer-types]:  => 384:14
  + /kisskb/src/drivers/md/dm-integrity.c: error: logical not is only applied to the left hand side of comparison [-Werror=logical-not-parentheses]:  => 4718:45
  + /kisskb/src/fs/btrfs/inode.c: error: 'location.objectid' may be used uninitialized in this function [-Werror=maybe-uninitialized]:  => 5603:9
  + /kisskb/src/fs/btrfs/inode.c: error: 'location.type' may be used uninitialized in this function [-Werror=maybe-uninitialized]:  => 5674:5
  + /kisskb/src/include/linux/compiler_types.h: error: call to '__compiletime_assert_933' declared with attribute error: FIELD_GET: mask is not constant:  => 510:38
  + /kisskb/src/include/linux/compiler_types.h: error: call to '__compiletime_assert_934' declared with attribute error: FIELD_GET: mask is not constant:  => 510:38
  + /kisskb/src/kernel/fork.c: error: #warning clone3() entry point is missing, please fix [-Werror=cpp]:  => 3072:2

22 error improvements:
  - /kisskb/src/arch/sparc/include/asm/floppy_64.h: error: no previous prototype for 'sparc_floppy_irq' [-Werror=missing-prototypes]: 200:13 => 
  - /kisskb/src/arch/sparc/include/asm/floppy_64.h: error: no previous prototype for 'sun_pci_fd_dma_callback' [-Werror=missing-prototypes]: 437:6 => 
  - /kisskb/src/arch/sparc/power/hibernate.c: error: no previous prototype for 'pfn_is_nosave' [-Werror=missing-prototypes]: 22:5 => 
  - /kisskb/src/arch/sparc/power/hibernate.c: error: no previous prototype for 'restore_processor_state' [-Werror=missing-prototypes]: 35:6 => 
  - /kisskb/src/arch/sparc/power/hibernate.c: error: no previous prototype for 'save_processor_state' [-Werror=missing-prototypes]: 30:6 => 
  - /kisskb/src/arch/sparc/prom/misc_64.c: error: no previous prototype for 'prom_get_mmu_ihandle' [-Werror=missing-prototypes]: 165:5 => 
  - /kisskb/src/arch/sparc/prom/p1275.c: error: no previous prototype for 'prom_cif_init' [-Werror=missing-prototypes]: 52:6 => 
  - /kisskb/src/drivers/gpu/drm/msm/adreno/a6xx_gpu_state.c: error: unknown option after '#pragma GCC diagnostic' kind [-Werror=pragmas]: 16:9 => 
  - /kisskb/src/drivers/gpu/drm/msm/adreno/adreno_gen7_0_0_snapshot.h: error: 'gen7_0_0_external_core_regs' defined but not used [-Werror=unused-variable]: 924:19 => 
  - /kisskb/src/drivers/gpu/drm/msm/adreno/adreno_gen7_2_0_snapshot.h: error: 'gen7_2_0_external_core_regs' defined but not used [-Werror=unused-variable]: 748:19 => 
  - /kisskb/src/drivers/gpu/drm/msm/adreno/adreno_gen7_9_0_snapshot.h: error: 'gen7_9_0_external_core_regs' defined but not used [-Werror=unused-variable]: 1438:19 => 
  - /kisskb/src/drivers/gpu/drm/msm/adreno/adreno_gen7_9_0_snapshot.h: error: 'gen7_9_0_sptp_clusters' defined but not used [-Werror=unused-variable]: 1188:43 => 
  - /kisskb/src/fs/bcachefs/data_update.c: error: the frame size of 1028 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]: 338:1 => 
  - error: arch/sparc/kernel/process_32.o: relocation truncated to fit: R_SPARC_WDISP22 against `.text': (.fixup+0xc), (.fixup+0x4) => 
  - error: arch/sparc/kernel/signal_32.o: relocation truncated to fit: R_SPARC_WDISP22 against `.text': (.fixup+0x8), (.fixup+0x10), (.fixup+0x0), (.fixup+0x20), (.fixup+0x18) => 
  - error: relocation truncated to fit: R_SPARC_WDISP22 against `.init.text': (.head.text+0x5040), (.head.text+0x5100) => 
  - error: relocation truncated to fit: R_SPARC_WDISP22 against symbol `leon_smp_cpu_startup' defined in .text section in arch/sparc/kernel/trampoline_32.o: (.init.text+0xa4) => 
  - {standard input}: Error: displacement to undefined symbol .L137 overflows 8-bit field : 1105, 1031 => 
  - {standard input}: Error: displacement to undefined symbol .L158 overflows 8-bit field : 1110 => 
  - {standard input}: Error: pcrel too far: 1096, 1126, 1254, 1022, 1074, 1095, 1255, 1020, 1021 => 1397
  - {standard input}: Error: unknown pseudo-op: `.al': 1270 => 
  - {standard input}: Error: unknown pseudo-op: `.siz': 1273 => 


*** WARNINGS ***

4 warning regressions:
  + /kisskb/src/fs/btrfs/fiemap.c: warning: 'last_extent_end' may be used uninitialized in this function [-Wmaybe-uninitialized]:  => 822:19
  + /kisskb/src/fs/btrfs/inode.c: warning: 'location.objectid' may be used uninitialized in this function [-Wmaybe-uninitialized]:  => 5603:9
  + /kisskb/src/fs/btrfs/inode.c: warning: 'location.type' may be used uninitialized in this function [-Wmaybe-uninitialized]:  => 5674:5
  + /kisskb/src/kernel/fork.c: warning: #warning clone3() entry point is missing, please fix [-Wcpp]:  => 3072:2

19 warning improvements:
  - ./.config.32r1_defconfig: warning: override: CPU_BIG_ENDIAN changes choice state: 93 => 
  - ./.config.32r2_defconfig: warning: override: CPU_BIG_ENDIAN changes choice state: 93 => 
  - ./.config.32r6_defconfig: warning: override: CPU_BIG_ENDIAN changes choice state: 95 => 
  - ./.config.64r1_defconfig: warning: override: CPU_BIG_ENDIAN changes choice state: 96 => 
  - ./.config.64r2_defconfig: warning: override: CPU_BIG_ENDIAN changes choice state: 96 => 
  - ./.config.64r6_defconfig: warning: override: CPU_BIG_ENDIAN changes choice state: 98 => 
  - ./.config.micro32r2_defconfig: warning: override: CPU_BIG_ENDIAN changes choice state: 94 => 
  - .config: warning: override: ARCH_RV32I changes choice state: 6414 => 
  - .config: warning: override: CPU_BIG_ENDIAN changes choice state: 92, 95, 97, 93, 94 => 
  - /kisskb/src/arch/mips/sgi-ip22/ip22-berr.c: warning: no previous prototype for 'ip22_be_init' [-Wmissing-prototypes]: 113:13 => 
  - /kisskb/src/arch/mips/sgi-ip22/ip22-berr.c: warning: no previous prototype for 'ip22_be_interrupt' [-Wmissing-prototypes]: 89:6 => 
  - /kisskb/src/arch/mips/sgi-ip22/ip22-gio.c: warning: no previous prototype for 'ip22_gio_init' [-Wmissing-prototypes]: 398:12 => 
  - /kisskb/src/arch/mips/sgi-ip22/ip22-gio.c: warning: no previous prototype for 'ip22_gio_set_64bit' [-Wmissing-prototypes]: 249:6 => 
  - /kisskb/src/arch/mips/sgi-ip22/ip22-time.c: warning: no previous prototype for 'indy_8254timer_irq' [-Wmissing-prototypes]: 119:18 => 
  - /kisskb/src/arch/sparc/prom/misc_64.c: warning: no previous prototype for 'prom_get_mmu_ihandle' [-Wmissing-prototypes]: 165:5 => 
  - /kisskb/src/arch/sparc/prom/p1275.c: warning: no previous prototype for 'prom_cif_init' [-Wmissing-prototypes]: 52:6 => 
  - /kisskb/src/drivers/base/regmap/regcache-maple.c: warning: 'lower_index' is used uninitialized [-Wuninitialized]: 113:23 => 
  - /kisskb/src/drivers/base/regmap/regcache-maple.c: warning: 'lower_last' is used uninitialized [-Wuninitialized]: 113:36 => 
  - /kisskb/src/fs/btrfs/extent_io.c: warning: 'last_extent_end' may be used uninitialized in this function [-Wmaybe-uninitialized]: 3285:19 => 

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


* Re: Build regressions/improvements in v6.11-rc1
  2024-07-29  9:28 ` Build regressions/improvements in v6.11-rc1 Geert Uytterhoeven
@ 2024-07-29  9:35   ` Geert Uytterhoeven
  2024-07-29  9:54     ` Arnd Bergmann
  0 siblings, 1 reply; 59+ messages in thread
From: Geert Uytterhoeven @ 2024-07-29  9:35 UTC
  To: linux-kernel
  Cc: Greg Kroah-Hartman, linux-mips, dm-devel, linuxppc-dev,
	linux-btrfs, intel-xe, Arnd Bergmann, linux-sh, sparclinux

On Mon, 29 Jul 2024, Geert Uytterhoeven wrote:
> Below is the list of build error/warning regressions/improvements in
> v6.11-rc1[1] compared to v6.10[2].
>
> Summarized:
>  - build errors: +7/-22

> [1] http://kisskb.ellerman.id.au/kisskb/branch/linus/head/8400291e289ee6b2bf9779ff1c83a291501f017b/ (all 132 configs)

> 7 error regressions:
>  + /kisskb/src/arch/mips/sgi-ip22/ip22-gio.c: error: initialization of 'int (*)(struct device *, const struct device_driver *)' from incompatible pointer type 'int (*)(struct device *, struct device_driver *)' [-Werror=incompatible-pointer-types]:  => 384:14

mips-gcc8/ip22_defconfig
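
For reference, a minimal sketch of this class of mismatch, under
assumptions: the bus structure and function names below are hypothetical
stand-ins, not the ip22-gio.c code. The error text says the initialized
field expects a callback taking 'const struct device_driver *', so a
callback declared with that qualifier matches the expected pointer type:

```c
#include <assert.h>

/* Userspace stand-ins for the kernel types (assumption, for illustration). */
struct device { int id; };
struct device_driver { int id; };

/* Hypothetical bus descriptor whose match hook takes a const driver. */
struct sketch_bus_type {
	int (*match)(struct device *dev, const struct device_driver *drv);
};

/* The callback signature must carry the same const qualifier as the hook. */
static int sketch_match(struct device *dev, const struct device_driver *drv)
{
	return dev->id == drv->id;
}

static const struct sketch_bus_type sketch_bus = {
	.match = sketch_match,
};

/* Small helper so the hook can be exercised through the descriptor. */
static int sketch_probe(int dev_id, int drv_id)
{
	struct device dev = { .id = dev_id };
	struct device_driver drv = { .id = drv_id };

	return sketch_bus.match(&dev, &drv);
}
```

Dropping the const from the second parameter of sketch_match() should
reproduce the incompatible-pointer-types diagnostic on compilers that
report it.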

>  + /kisskb/src/drivers/md/dm-integrity.c: error: logical not is only applied to the left hand side of comparison [-Werror=logical-not-parentheses]:  => 4718:45

powerpc-gcc5/powerpc-all{mod,yes}config
powerpc-gcc5/ppc64le_allmodconfig
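
As an illustration of what -Wlogical-not-parentheses reports (the
dm-integrity.c expression itself is not shown in this thread, so the
functions below are hypothetical): in C, `!a == b` parses as
`(!a) == b`, which is often not what was meant.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Same meaning as the unparenthesized `!a == b`: the negation binds to
 * `a` alone. Writing the parentheses explicitly silences the warning.
 */
static bool as_parsed(int a, int b)
{
	return (!a) == b;
}

/* What is frequently intended instead: negate the whole comparison. */
static bool as_intended(int a, int b)
{
	return !(a == b);
}
```

For a = 2, b = 1 the two forms disagree: as_parsed() yields false while
as_intended() yields true.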

>  + /kisskb/src/fs/btrfs/inode.c: error: 'location.objectid' may be used uninitialized in this function [-Werror=maybe-uninitialized]:  => 5603:9
>  + /kisskb/src/fs/btrfs/inode.c: error: 'location.type' may be used uninitialized in this function [-Werror=maybe-uninitialized]:  => 5674:5

m68k-gcc8/m68k-allmodconfig
mips-gcc8/mips-allmodconfig
powerpc-gcc5/powerpc-all{mod,yes}config
powerpc-gcc5/ppc64_defconfig

>  + /kisskb/src/include/linux/compiler_types.h: error: call to '__compiletime_assert_933' declared with attribute error: FIELD_GET: mask is not constant:  => 510:38
>  + /kisskb/src/include/linux/compiler_types.h: error: call to '__compiletime_assert_934' declared with attribute error: FIELD_GET: mask is not constant:  => 510:38

     inlined from 'xe_oa_set_prop_oa_format' at /kisskb/src/drivers/gpu/drm/xe/xe_oa.c:1664:6:

powerpc-gcc5/powerpc-all{yes,mod}config
powerpc-gcc5/powerpc-allmodconfig
powerpc-gcc5/ppc64le_allmodconfig

(fix sent)
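
A rough userspace sketch of why the mask matters, under assumptions: the
macro and mask below are hypothetical and are not taken from xe_oa.c or
from the kernel's bitfield.h. The diagnostic text indicates that
FIELD_GET() wants a mask the compiler can evaluate at build time, since
the shift amount is derived from the mask. Whether an expression counts
as constant can depend on the compiler version and its optimizer, which
may be why only the gcc5 builds are listed here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a field extractor: mask the register, then
 * shift down by the position of the mask's lowest set bit.
 */
#define SKETCH_FIELD_GET(mask, reg) \
	(((reg) & (mask)) >> __builtin_ctzll(mask))

/* Illustrative mask (assumption): bits 15..8 of a 32-bit register. */
#define SKETCH_FMT_MASK 0x0000ff00u

static uint32_t sketch_get_fmt(uint32_t reg)
{
	/* The mask is a literal here, so the shift is known at build time. */
	return SKETCH_FIELD_GET(SKETCH_FMT_MASK, reg);
}
```

With the literal mask above, sketch_get_fmt(0x1234) extracts 0x12.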

>  + /kisskb/src/kernel/fork.c: error: #warning clone3() entry point is missing, please fix [-Werror=cpp]:  => 3072:2

sh4-gcc13/se{7619,7750}_defconfig
sh4-gcc13/sh-all{mod,no,yes}config
sh4-gcc13/sh-defconfig
sparc64-gcc5/sparc-allnoconfig
sparc64-gcc{5,13}/sparc32_defconfig
sparc64-gcc{5,13}/sparc64-{allno,def}config
sparc64-gcc13/sparc-all{mod,no}config
sparc64-gcc13/sparc64-allmodconfig

Gr{oetje,eeting}s,

 						Geert



* Re: Build regressions/improvements in v6.11-rc1
  2024-07-29  9:35   ` Geert Uytterhoeven
@ 2024-07-29  9:54     ` Arnd Bergmann
  2024-07-29 10:07       ` Geert Uytterhoeven
  0 siblings, 1 reply; 59+ messages in thread
From: Arnd Bergmann @ 2024-07-29  9:54 UTC
  To: Geert Uytterhoeven, linux-kernel
  Cc: Greg Kroah-Hartman, linux-mips, dm-devel, linuxppc-dev,
	linux-btrfs, intel-xe, linux-sh, sparclinux, linux-hexagon

On Mon, Jul 29, 2024, at 11:35, Geert Uytterhoeven wrote:
>
>>  + /kisskb/src/kernel/fork.c: error: #warning clone3() entry point is missing, please fix [-Werror=cpp]:  => 3072:2
>
> sh4-gcc13/se{7619,7750}_defconfig
> sh4-gcc13/sh-all{mod,no,yes}config
> sh4-gcc13/sh-defconfig
> sparc64-gcc5/sparc-allnoconfig
> sparc64-gcc{5,13}/sparc32_defconfig
> sparc64-gcc{5,13}/sparc64-{allno,def}config
> sparc64-gcc13/sparc-all{mod,no}config
> sparc64-gcc13/sparc64-allmodconfig

Hexagon and NIOS2 as well, but this is expected. I really just
moved the warning into the actual implementation, the warning
is the same as before. hexagon and sh look like they should be
trivial, it's just that nobody seems to care. I'm sure the
patches were posted before and never applied.

sparc and nios2 do need some real work to write and test
the wrappers.

It does look like CONFIG_WERROR did not fail the build before
505d66d1abfb ("clone3: drop __ARCH_WANT_SYS_CLONE3 macro")
as it probably was intended.

      Arnd


* Re: Build regressions/improvements in v6.11-rc1
  2024-07-29  9:54     ` Arnd Bergmann
@ 2024-07-29 10:07       ` Geert Uytterhoeven
  0 siblings, 0 replies; 59+ messages in thread
From: Geert Uytterhoeven @ 2024-07-29 10:07 UTC
  To: Arnd Bergmann
  Cc: linux-kernel, Greg Kroah-Hartman, linux-mips, dm-devel,
	linuxppc-dev, linux-btrfs, intel-xe, linux-sh, sparclinux,
	linux-hexagon

Hi Arnd,

On Mon, Jul 29, 2024 at 11:55 AM Arnd Bergmann <arnd@arndb.de> wrote:
> On Mon, Jul 29, 2024, at 11:35, Geert Uytterhoeven wrote:
> >>  + /kisskb/src/kernel/fork.c: error: #warning clone3() entry point is missing, please fix [-Werror=cpp]:  => 3072:2
> >
> > sh4-gcc13/se{7619,7750}_defconfig
> > sh4-gcc13/sh-all{mod,no,yes}config
> > sh4-gcc13/sh-defconfig
> > sparc64-gcc5/sparc-allnoconfig
> > sparc64-gcc{5,13}/sparc32_defconfig
> > sparc64-gcc{5,13}/sparc64-{allno,def}config
> > sparc64-gcc13/sparc-all{mod,no}config
> > sparc64-gcc13/sparc64-allmodconfig
>
> Hexagon and NIOS2 as well, but this is expected. I really just
> moved the warning into the actual implementation, the warning
> is the same as before. hexagon and sh look like they should be
> trivial, it's just that nobody seems to care. I'm sure the
> patches were posted before and never applied.
>
> sparc and nios2 do need some real work to write and test
> the wrappers.
>
> It does look like CONFIG_WERROR did not fail the build before
> 505d66d1abfb ("clone3: drop __ARCH_WANT_SYS_CLONE3 macro")
> as it probably was intended.

Indeed. The actual regression is that this turned into a fatal error
with -Werror.
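
A minimal sketch of the mechanism, assuming a hypothetical opt-out macro
(ARCH_BROKEN_CLONE3 is a made-up name for illustration, and the function
is a stand-in, not the kernel/fork.c code): a #warning inside the
implementation is only a diagnostic in a normal build, but becomes a
build failure once warnings are promoted to errors, as the
[-Werror=cpp] tag in the report shows.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical per-architecture opt-out. Uncomment to see the
 * diagnostic; with warnings promoted to errors it stops the build.
 */
/* #define ARCH_BROKEN_CLONE3 1 */

static int sketch_clone3_entry(void)
{
#ifdef ARCH_BROKEN_CLONE3
#warning clone3() entry point is missing, please fix
	return -ENOSYS;
#else
	return 0; /* stand-in for the real implementation */
#endif
}
```

As written, with the macro left undefined, the sketch compiles without
the diagnostic and sketch_clone3_entry() returns 0.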

Gr{oetje,eeting}s,

                        Geert



* Re: Linux 6.11-rc1
  2024-07-28 21:40 Linux 6.11-rc1 Linus Torvalds
  2024-07-29  9:28 ` Build regressions/improvements in v6.11-rc1 Geert Uytterhoeven
@ 2024-07-29 15:29 ` Guenter Roeck
  2024-07-29 19:23   ` Linus Torvalds
                     ` (2 more replies)
  1 sibling, 3 replies; 59+ messages in thread
From: Guenter Roeck @ 2024-07-29 15:29 UTC
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

On Sun, Jul 28, 2024 at 02:40:01PM -0700, Linus Torvalds wrote:
> The merge window felt pretty normal, and the stats all look pretty
> normal too. I was expecting things to be quieter because of summer
> vacations, but that (still) doesn't actually seem to have been the
> case.
> 
> There's 12k+ regular commits (and another 850 merge commits), so as
> always the summary of this all is just my merge log. The diffstats are
> also (once again) dominated by some big hardware descriptions (another
> AMD GPU register dump accounts for ~45% of the lines in the diff, and
> some more perf event JSON descriptor files account for another 5%).
> 
> But if you ignore those HW dumps, the diff too looks perfectly
> regular: drivers account for a bit over half (even when not counting
> the AMD register description noise). The rest is roughly one third
> architecture updates (lots of it is dts files, so I guess I could have
> lumped that in with "more hw descriptor tables"), one third tooling
> and documentation, and one third "core kernel" (filesystems,
> networking, VM and kernel). Very roughly.
> 
> If you want more details, you should get the git tree, and then narrow
> things down based on interests.
> 

Build results:
	total: 158 pass: 139 fail: 19
Failed builds:
	alpha:allmodconfig
	alpha:tinyconfig
	arcv2:tinyconfig
	arm:tinyconfig
	csky:tinyconfig
	hexagon:tinyconfig
	loongarch:tinyconfig
	m68k:tinyconfig
	microblaze:tinyconfig
	mips:tinyconfig
	nios2:tinyconfig
	openrisc:tinyconfig
	parisc:tinyconfig
	powerpc:tinyconfig
	riscv32:tinyconfig
	riscv64:tinyconfig
	sparc32:tinyconfig
	sparc64:tinyconfig
	xtensa:tinyconfig
Qemu test results:
	total: 533 pass: 493 fail: 40
Failed tests:
	arm:versatilepb:versatile_defconfig:aeabi:pci:scsi:mem128:net=default:versatile-pb:ext2
	arm:versatilepb:versatile_defconfig:aeabi:pci:flash64:mem128:net=default:versatile-pb:ext2
	arm:versatilepb:versatile_defconfig:aeabi:pci:mem128:net=default:versatile-pb:initrd
	arm:versatileab:versatile_defconfig:mem128:net=default:versatile-ab:initrd
	microblaze:petalogix-s3adsp1800:initrd
	microblaze:petalogix-s3adsp1800:rootfs
	microblaze:petalogix-ml605:initrd
	microblaze:petalogix-ml605:rootfs
	microblazeel:petalogix-s3adsp1800:initrd
	microblazeel:petalogix-s3adsp1800:rootfs
	microblazeel:petalogix-ml605:initrd
	microblazeel:petalogix-ml605:rootfs
	ppc:mpc8544ds:mpc85xx_defconfig:net=e1000:initrd
	ppc:mpc8544ds:mpc85xx_defconfig:scsi[53C895A]:net=ne2k_pci:btrfs
	ppc:mpc8544ds:mpc85xx_defconfig:sata-sii3112:net=rtl8139:ext2
	ppc:mpc8544ds:mpc85xx_defconfig:sdhci-mmc:net=usb-ohci:ext2
	ppc:mpc8544ds:mpc85xx_smp_defconfig:net=e1000:initrd
	ppc:mpc8544ds:mpc85xx_smp_defconfig:scsi[DC395]:net=i82550:ext2
	ppc:mpc8544ds:mpc85xx_smp_defconfig:scsi[53C895A]:net=usb-ohci:btrfs
	ppc:mpc8544ds:mpc85xx_smp_defconfig:sata-sii3112:net=ne2k_pci:ext2
	ppc:ppce500:corenet32_smp_defconfig:e500:net=rtl8139:initrd
	ppc:ppce500:corenet32_smp_defconfig:e500:net=virtio-net:nvme:btrfs
	ppc:ppce500:corenet32_smp_defconfig:e500:net=eTSEC:sdhci-mmc:ext2
	ppc:ppce500:corenet32_smp_defconfig:e500:net=e1000:mmc:cramfs
	ppc:ppce500:corenet32_smp_defconfig:e500:net=tulip:scsi[53C895A]:ext2
	ppc:ppce500:corenet32_smp_defconfig:e500:net=i82562:sata-sii3112:ext2
	riscv32:virt:rv32_defconfig:nofs:noscsi:net=e1000:initrd
	riscv32:virt:rv32,zbb=no:rv32_defconfig:nofs:noscsi:net=e1000e:virtio-blk:ext2
	riscv32:virt:rv32_defconfig:nofs:noscsi:net=i82801:virtio:ext2
	riscv32:virt:rv32,zbb=no:rv32_defconfig:nofs:noscsi:net=i82550:virtio-pci:ext2
	riscv32:virt:rv32_defconfig:nofs:noscsi:tpm-tis-device:net=e1000-82544gc:sdhci-mmc:ext2
	riscv32:virt:rv32,zbb=no:rv32_defconfig:nofs:noscsi:net=usb-ohci:nvme:ext2
	riscv32:virt:rv32_defconfig:nofs:noscsi:net=virtio-net-device:usb-ohci:ext2
	riscv32:virt:rv32,zbb=no:rv32_defconfig:nofs:noscsi:net=pcnet:usb-ehci:ext2
	riscv32:virt:rv32_defconfig:nofs:noscsi:net=virtio-net-pci:usb-xhci:ext2
	riscv32:virt:rv32,zbb=no:rv32_defconfig:nofs:noscsi:net=i82557a:usb-uas-ehci:ext2
	riscv32:virt:rv32_defconfig:nofs:noscsi:net=i82558a:usb-uas-xhci:ext2
	riscv32:virt:rv32_defconfig:nofs:noscsi:net=i82557b:scsi[virtio]:ext2
	riscv32:virt:rv32,zbb=no:rv32_defconfig:nofs:noscsi:net=i82557c:scsi[virtio-pci]:ext2
	i386:q35:pentium3:defconfig:pae:nosmp:net=ne2k_pci:initrd
Unit test results:
	pass: 316946 fail: 0

In summary, quite impressive in a negative sense. At least some of the
problems (such as the tinyconfig build failures, and some of the test
failures) have already been reported. I simply don't have the time for a
detailed analysis. Logs are available at https://kerneltests.org/builders,
in the "master" column, for those with time to track things down.

Guenter


* Re: Linux 6.11-rc1
  2024-07-29 15:29 ` Linux 6.11-rc1 Guenter Roeck
@ 2024-07-29 19:23   ` Linus Torvalds
  2024-07-29 19:50     ` Linus Torvalds
                       ` (2 more replies)
  2024-07-30 17:04   ` Guenter Roeck
  2024-08-02 17:35   ` Linus Walleij
  2 siblings, 3 replies; 59+ messages in thread
From: Linus Torvalds @ 2024-07-29 19:23 UTC
  To: Guenter Roeck, Peter Zijlstra, Sebastian Andrzej Siewior,
	Ingo Molnar
  Cc: Linux Kernel Mailing List

On Mon, 29 Jul 2024 at 08:29, Guenter Roeck <linux@roeck-us.net> wrote:
>
> In summary, quite impressive in a negative sense.

Grr. I think a lot of the build failures end up being due to commit
466e4d801cd4 ("task_work: Add TWA_NMI_CURRENT as an additional notify
mode") depending on IRQ_WORK, and that not existing everywhere.

I pushed out a tentative fix as commit cec6937dd1aa ("task_work: make
TWA_NMI_CURRENT handling conditional on IRQ_WORK"). I haven't set up a
build environment for those tiny targets, but it looked fairly
straightforward.
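
A userspace sketch of the general shape of such a fix, under
assumptions: the enum, helper and return values below are hypothetical
stand-ins and are not copied from the commit. The idea is that a notify
mode which needs IRQ_WORK is only wired up when that option is
configured, so configurations without it (such as tinyconfig) still
build.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for the config symbol; left undefined to model tinyconfig. */
/* #define CONFIG_IRQ_WORK 1 */

/* Reduced, hypothetical set of notify modes for illustration only. */
enum sketch_notify {
	SKETCH_TWA_NONE,
	SKETCH_TWA_NMI_CURRENT,
};

#ifdef CONFIG_IRQ_WORK
static int sketch_queued;

/* Hypothetical helper standing in for queueing an irq_work item. */
static void sketch_queue_irq_work(void)
{
	sketch_queued++;
}
#endif

static int sketch_task_work_add(enum sketch_notify notify)
{
	switch (notify) {
	case SKETCH_TWA_NONE:
		return 0;
	case SKETCH_TWA_NMI_CURRENT:
#ifdef CONFIG_IRQ_WORK
		sketch_queue_irq_work();
		return 0;
#else
		return -EINVAL; /* mode unavailable without IRQ_WORK */
#endif
	}
	return -EINVAL;
}
```

With CONFIG_IRQ_WORK undefined, nothing in the sketch refers to the
irq_work helper, and the NMI mode is rejected at run time rather than
breaking the build.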

I think that explains at least most of the 'tinyconfig' build failures.

Not super-happy about how people apparently were discussing the build
failures for a long time, and didn't even bother mentioning them in
the pull requests. That broken commit came in through the perf-core
pull from Ingo.

And that fix (if it fixes it - I think it will) still leaves the alpha
allmodconfig build and all the failed tests.

I'll take a look.

              Linus


* Re: Linux 6.11-rc1
  2024-07-29 19:23   ` Linus Torvalds
@ 2024-07-29 19:50     ` Linus Torvalds
  2024-07-29 21:34       ` Arnd Bergmann
  2024-07-30  7:54     ` Peter Zijlstra
  2024-07-31 15:45     ` Guenter Roeck
  2 siblings, 1 reply; 59+ messages in thread
From: Linus Torvalds @ 2024-07-29 19:50 UTC
  To: Guenter Roeck, Peter Zijlstra, Sebastian Andrzej Siewior,
	Ingo Molnar, Arnd Bergmann
  Cc: Linux Kernel Mailing List

On Mon, 29 Jul 2024 at 12:23, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> And that fix (if it fixes it - I think it will) still leaves the alpha
> allmodconfig build and all the failed tests.
>
> I'll take a look.

Well, the alpha allmodconfig case is apparently

  ERROR: modpost: "iowrite64be" [drivers/crypto/caam/caam_jr.ko] undefined!

which I suspect is just a result of commit beba3771d9e0 ("crypto:
caam: Make CRYPTO_DEV_FSL_CAAM dependent of COMPILE_TEST").

IOW, that is almost certainly simply due to better build test
coverage, not a new bug.

But I didn't look into *why* it would fail. We have a comment about
iowrite64be saying

 * These get provided from <asm-generic/iomap.h> since alpha does not
 * select GENERIC_IOMAP.

and I'm not sure why that isn't correct.

I get a feeling that lib/iomap.c is missing a couple of functions, but
didn't look into it a lot.

I suspect Arnd may be the right person to ask. Arnd?

           Linus


* Re: Linux 6.11-rc1
  2024-07-29 19:50     ` Linus Torvalds
@ 2024-07-29 21:34       ` Arnd Bergmann
  2024-07-29 23:47         ` Linus Torvalds
  0 siblings, 1 reply; 59+ messages in thread
From: Arnd Bergmann @ 2024-07-29 21:34 UTC
  To: Linus Torvalds, Guenter Roeck, Peter Zijlstra,
	Sebastian Andrzej Siewior, Ingo Molnar, Johannes Berg
  Cc: Linux Kernel Mailing List

On Mon, Jul 29, 2024, at 21:50, Linus Torvalds wrote:
> On Mon, 29 Jul 2024 at 12:23, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> And that fix (if it fixes it - I think it will) still leaves the alpha
>> allmodconfig build and all the failed tests.
>>
>> I'll take a look.
>
> Well, the alpha allmodconfig case is apparently
>
>   ERROR: modpost: "iowrite64be" [drivers/crypto/caam/caam_jr.ko] undefined!
>
> which I suspect is just a result of commit beba3771d9e0 ("crypto:
> caam: Make CRYPTO_DEV_FSL_CAAM dependent of COMPILE_TEST").
>
> IOW, that is almost certainly simply due to better build test
> coverage, not a new bug.
>
> But I didn't look into *why* it would fail. We have a comment about
> iowrite64be saying
>
>  * These get provided from <asm-generic/iomap.h> since alpha does not
>  * select GENERIC_IOMAP.
>
> and I'm not sure why that isn't correct.
>
> I get a feeling that lib/iomap.c is missing a couple of functions, but
> didn't look into it a lot.
>
> I suspect Arnd may be the right person to ask. Arnd?

Yes, I've noticed this problem a few weeks ago with another
driver as we tried to fix the usage of iowrite64() on 32-bit
architectures. We actually have two old bugs here and still
need to make a decision about how to fix that properly:

- ioread64()/iowrite64() and their variants are defined
  differently on architectures depending on whether they use
  CONFIG_GENERIC_IOMAP (x86, um, and a few rare configs
  elsewhere) or not. On GENERIC_IOMAP architectures, there
  is no 64-bit PIO, so lib/iomap.c only provides the
  iowrite64_hi_lo()/iowrite64_lo_hi() etc wrappers that do
  a pair of 32-bit accessors for PIO but native 64-bit
  MMIO. On other 64-bit architectures, iowrite64() is the
  same as writeq() and it can operate on PCI I/O space as
  well. Drivers with big-endian registers tend to use
  iowriteXXbe() in order to get the correct byteswap in the
  absence of writeX_be().

- Alpha (and I think parisc) uses the asm-generic/iomap.h
  header that is meant for GENERIC_IOMAP but then provides
  its own functions. It never had iowrite64be() and we
  didn't notice this in the absence of users. The caam driver
  includes include/linux/io-64-nonatomic-lo-hi.h, which
  then redirects iowrite64be() to iowrite64be_lo_hi()
  on x86 (since it does not define iowrite64be()) and
  on 32-bit architectures, but uses iowrite64be() from
  include/asm-generic/io.h on most other 64-bit
  architectures. On alpha it uses the incorrect
  prototype.
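
To make the two accessor shapes described above concrete, here is a
minimal user-space sketch, not kernel code. It assumes a little-endian
host (as alpha is), and every sketch_* name plus the fake_mmio buffer
is hypothetical, standing in for swab64()/writeq() and a real
void __iomem * mapping.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a device register window. */
static uint8_t fake_mmio[8];

/* Hand-written byte swaps; the kernel provides swab32()/swab64(). */
static uint32_t sketch_swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

static uint64_t sketch_swab64(uint64_t v)
{
	return ((uint64_t)sketch_swab32((uint32_t)v) << 32) |
	       sketch_swab32((uint32_t)(v >> 32));
}

/* Native-endian stores standing in for writel()/writeq(). */
static void sketch_write32(uint32_t v, uint8_t *p)
{
	memcpy(p, &v, sizeof(v));
}

static void sketch_write64(uint64_t v, uint8_t *p)
{
	memcpy(p, &v, sizeof(v));
}

/* Same shape as the proposed alpha macro: swap, then one 64-bit store. */
static void sketch_iowrite64be(uint64_t v, uint8_t *p)
{
	sketch_write64(sketch_swab64(v), p);
}

/* Shape of a lo_hi fallback: two 32-bit big-endian stores, low word first. */
static void sketch_iowrite64be_lo_hi(uint64_t v, uint8_t *p)
{
	sketch_write32(sketch_swab32((uint32_t)v), p + 4);
	sketch_write32(sketch_swab32((uint32_t)(v >> 32)), p);
}
```

Both helpers leave the same bytes in the buffer. On real hardware the
difference is that the lo_hi variant is two separate bus accesses, so
it is not atomic with respect to the device.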

I suspect we can fix the alpha issue with the trivial
change below (haven't tested yet), but the way we are
inconsistent about these will likely keep biting us
unless we come up with a better way to handle them
across architectures.

      Arnd

diff --git a/arch/alpha/include/asm/io.h b/arch/alpha/include/asm/io.h
index 2bb8cbeedf91..52212e47e917 100644
--- a/arch/alpha/include/asm/io.h
+++ b/arch/alpha/include/asm/io.h
@@ -534,8 +534,10 @@ extern inline void writeq(u64 b, volatile void __iomem *addr)
 
 #define ioread16be(p) swab16(ioread16(p))
 #define ioread32be(p) swab32(ioread32(p))
+#define ioread64be(p) swab64(ioread64(p))
 #define iowrite16be(v,p) iowrite16(swab16(v), (p))
 #define iowrite32be(v,p) iowrite32(swab32(v), (p))
+#define iowrite64be(v,p) iowrite64(swab64(v), (p))
 
 #define inb_p          inb
 #define inw_p          inw
@@ -634,8 +637,6 @@ extern void outsl (unsigned long port, const void *src, unsigned long count);
  */
 #define ioread64 ioread64
 #define iowrite64 iowrite64
-#define ioread64be ioread64be
-#define iowrite64be iowrite64be
 #define ioread8_rep ioread8_rep
 #define ioread16_rep ioread16_rep
 #define ioread32_rep ioread32_rep

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-29 21:34       ` Arnd Bergmann
@ 2024-07-29 23:47         ` Linus Torvalds
  2024-07-30 15:47           ` Arnd Bergmann
  0 siblings, 1 reply; 59+ messages in thread
From: Linus Torvalds @ 2024-07-29 23:47 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Guenter Roeck, Peter Zijlstra, Sebastian Andrzej Siewior,
	Ingo Molnar, Johannes Berg, Linux Kernel Mailing List

On Mon, 29 Jul 2024 at 14:35, Arnd Bergmann <arnd@arndb.de> wrote:
>
> I suspect we can fix the alpha issue with the trivial
> change below (haven't tested yet), but the way we are
> inconsistent about these will likely keep biting us
> unless we come up with a better way to handle them
> across architectures.

Well, looking around, the other functions (ie things like
iowrite64be_lo_hi() etc) do end up being handled by lib/iomap.c, and
parisc does seem to implement its own versions.

So this may in fact be the only such case.

Knock wood.

Your suggested patch looks ObviouslyCorrect(tm) to me. I assume I'll
get it through the normal channels after testing?

          Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-29 19:23   ` Linus Torvalds
  2024-07-29 19:50     ` Linus Torvalds
@ 2024-07-30  7:54     ` Peter Zijlstra
  2024-07-31 15:45     ` Guenter Roeck
  2 siblings, 0 replies; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-30  7:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Sebastian Andrzej Siewior, Ingo Molnar,
	Linux Kernel Mailing List

On Mon, Jul 29, 2024 at 12:23:01PM -0700, Linus Torvalds wrote:

> Not super-happy about how people apparently were discussing the build
> failures for a long time, and didn't even bother mentioning them in
> the pull requests. That broken commit came in through the perf-core
> pull from Ingo.

My bad, sorry. That issue seems to have completely slipped my mind :-(


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-29 23:47         ` Linus Torvalds
@ 2024-07-30 15:47           ` Arnd Bergmann
  0 siblings, 0 replies; 59+ messages in thread
From: Arnd Bergmann @ 2024-07-30 15:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Peter Zijlstra, Sebastian Andrzej Siewior,
	Ingo Molnar, Johannes Berg, Linux Kernel Mailing List

On Tue, Jul 30, 2024, at 01:47, Linus Torvalds wrote:
> On Mon, 29 Jul 2024 at 14:35, Arnd Bergmann <arnd@arndb.de> wrote:
>>
>> I suspect we can fix the alpha issue with the trivial
>> change below (haven't tested yet), but the way we are
>> inconsistent about these will likely keep biting us
>> unless we come up with a better way to handle them
>> across architectures.
>
> Well, looking around, the other functions (ie things like
> iowrite64be_lo_hi() etc) do end up being handled by lib/iomap.c, and
> parisc does seem to implement its own versions.
>
> So this may in fact be the only such case.
>
> Knock wood.
>
> Your suggested patch looks ObviouslyCorrect(tm) to me. I assume I'll
> get it through the normal channels after testing?

Yes, I've sent it with a proper description to the alpha
maintainers for feedback now and queued it up in the
asm-generic tree:

https://lore.kernel.org/lkml/20240730152744.2813600-1-arnd@kernel.org/T/#u

I also sent a fix for the uretprobe syscall number mess, will
send both once we have agreed on how to do that.

      Arnd

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-29 15:29 ` Linux 6.11-rc1 Guenter Roeck
  2024-07-29 19:23   ` Linus Torvalds
@ 2024-07-30 17:04   ` Guenter Roeck
  2024-07-30 17:20     ` Jens Axboe
  2024-07-30 18:53     ` Linus Torvalds
  2024-08-02 17:35   ` Linus Walleij
  2 siblings, 2 replies; 59+ messages in thread
From: Guenter Roeck @ 2024-07-30 17:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List, Jens Axboe

On Mon, Jul 29, 2024 at 08:29:20AM -0700, Guenter Roeck wrote:
> On Sun, Jul 28, 2024 at 02:40:01PM -0700, Linus Torvalds wrote:
> > The merge window felt pretty normal, and the stats all look pretty
> > normal too. I was expecting things to be quieter because of summer
> > vacations, but that (still) doesn't actually seem to have been the
> > case.
> > 
> > There's 12k+ regular commits (and another 850 merge commits), so as
> > always the summary of this all is just my merge log. The diffstats are
> > also (once again) dominated by some big hardware descriptions (another
> > AMD GPU register dump accounts for ~45% of the lines in the diff, and
> > some more perf event JSON descriptor files account for another 5%).
> > 
> > But if you ignore those HW dumps, the diff too looks perfectly
> > regular: drivers account for a bit over half (even when not counting
> > the AMD register description noise). The rest is roughly one third
> > architecture updates (lots of it is dts files, so I guess I could have
> > lumped that in with "more hw descriptor tables"), one third tooling
> > and documentation, and one third "core kernel" (filesystems,
> > networking, VM and kernel). Very roughly.
> > 
> > If you want more details, you should get the git tree, and then narrow
> > things down based on interests.
> > 
> 
> Build results:
> 	total: 158 pass: 139 fail: 19
> Failed builds:
...
> 	i386:q35:pentium3:defconfig:pae:nosmp:net=ne2k_pci:initrd

This failure bisects to commit 0256994887d7 ("Merge tag
'for-6.11/block-post-20240722' of git://git.kernel.dk/linux"). I have no
idea why that would be the case, but it is easy to reproduce. Maybe it is
coincidental. Either case, copying Jens in case he has an idea.

From the crash log:

[    3.605247] sr 2:0:0:0: Attached scsi generic sg0 type 5
[    3.764508] sched_clock: Marking stable (3740032902, 23766486)->(3766853760, -3054372)
[    3.768164] registered taskstats version 1
[    3.768271] Loading compiled-in X.509 certificates
[    3.990683] Btrfs loaded, zoned=no, fsverity=no
[    4.005012] cryptomgr_test (68) used greatest stack depth: 6136 bytes left
[    4.029889] traps: PANIC: double fault, error_code: 0x0
[    4.030257] Oops: double fault: 0000 [#1] PREEMPT PTI
[    4.030456] CPU: 0 UID: 0 PID: 70 Comm: modprobe Not tainted 6.11.0-rc1-00043-g94ede2a3e913 #1
[    4.030523] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    4.030613] EIP: asm_exc_page_fault+0x0/0x10
[    4.030886] Code: bf 3e c8 e9 23 06 00 00 66 90 8d 76 00 fc 6a 00 68 f0 bd 3e c8 e9 11 06 00 00 8d 76 00 fc 6a 00 68 54 c5 3e c8 e9 01 06 00 00 <8d> 76 00 fc 68 b0 e9 3e c8 e9 f3 05 00 00 66 90 8d 76 00 fc 6a 00
[    4.030949] EAX: 028af000 EBX: ffa03fbc ECX: 00000000 EDX: 00000000
[    4.030963] ESI: c2b51ff8 EDI: ffa04000 EBP: 42b51fb4 ESP: ffa0300c
[    4.030980] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00000006
[    4.031007] CR0: 80050033 CR2: ffa02ffc CR3: 08dc6000 CR4: 000006f0
[    4.031064] Call Trace:
[    4.031187]  <#DF>
[    4.031249]  ? show_regs+0x50/0x58
[    4.031296]  ? die+0x2f/0x90
[    4.031302]  ? vprintk+0x25/0x38
[    4.031315]  ? exc_double_fault+0x6d/0x7c
[    4.031327]  ? doublefault_shim+0x10a/0x118
[    4.031342]  ? asm_exc_int3+0x10/0x10
[    4.031353]  ? asm_exc_double_fault+0xa/0x10
[    4.031370]  </#DF>
[    4.031389]  <ENTRY_TRAMPOLINE>
[    4.031392]  ? asm_exc_int3+0x10/0x10
...
[    4.033360]  ? asm_exc_int3+0x10/0x10
[    4.033368]  ? restore_all_switch_stack+0x65/0xe6
[    4.033386]  </ENTRY_TRAMPOLINE>
[    4.033415] Modules linked in:
[    4.033685] ---[ end trace 0000000000000000 ]---
[    4.033741] EIP: asm_exc_page_fault+0x0/0x10
[    4.033750] Code: bf 3e c8 e9 23 06 00 00 66 90 8d 76 00 fc 6a 00 68 f0 bd 3e c8 e9 11 06 00 00 8d 76 00 fc 6a 00 68 54 c5 3e c8 e9 01 06 00 00 <8d> 76 00 fc 68 b0 e9 3e c8 e9 f3 05 00 00 66 90 8d 76 00 fc 6a 00
[    4.033757] EAX: 028af000 EBX: ffa03fbc ECX: 00000000 EDX: 00000000
[    4.033762] ESI: c2b51ff8 EDI: ffa04000 EBP: 42b51fb4 ESP: ffa0300c
[    4.033767] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00000006
[    4.033772] CR0: 80050033 CR2: ffa02ffc CR3: 08dc6000 CR4: 000006f0
[    4.033838] Kernel panic - not syncing: Fatal exception in interrupt
[    4.033980] Kernel Offset: disabled

Guenter

---
Bisect log:

# bad: [8400291e289ee6b2bf9779ff1c83a291501f017b] Linux 6.11-rc1
# good: [2c9b3512402ed192d1f43f4531fb5da947e72bd0] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect start 'v6.11-rc1' '2c9b3512402e'
# bad: [6dc2e98d5f1de162d1777aee97e59d75d70d07c5] s390: Remove protvirt and kvm config guards for uv code
git bisect bad 6dc2e98d5f1de162d1777aee97e59d75d70d07c5
# good: [30d77b7eef019fa4422980806e8b7cdc8674493e] mm/mglru: fix ineffective protection calculation
git bisect good 30d77b7eef019fa4422980806e8b7cdc8674493e
# good: [527eff227d4321c6ea453db1083bc4fdd4d3a3e8] Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect good 527eff227d4321c6ea453db1083bc4fdd4d3a3e8
# bad: [a362ade892e3e4de69296cddb1a23a1efe701428] Merge tag 'loongarch-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
git bisect bad a362ade892e3e4de69296cddb1a23a1efe701428
# good: [dd018c238b8489b6dd8c06f6b962ea75d79115ff] Merge tag 'bcachefs-2024-07-22' of https://evilpiepirate.org/git/bcachefs
git bisect good dd018c238b8489b6dd8c06f6b962ea75d79115ff
# good: [89ed6c9ac69ec398ccb648f5f675b43e8ca679ca] blk-cgroup: move congestion_count to struct blkcg
git bisect good 89ed6c9ac69ec398ccb648f5f675b43e8ca679ca
# good: [3892b11eac5aaaeefbf717f1953288b77759d9e2] LoongArch: Check TIF_LOAD_WATCH to enable user space watchpoint
git bisect good 3892b11eac5aaaeefbf717f1953288b77759d9e2
# bad: [0256994887d7c89c2a41d872aac67605bda8f115] Merge tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux
git bisect bad 0256994887d7c89c2a41d872aac67605bda8f115
# good: [bf4c89fc8797f5c0964a0c3d561fbe7e8483b62f] block: don't call bio_uninit from bio_endio
git bisect good bf4c89fc8797f5c0964a0c3d561fbe7e8483b62f
# good: [85253bac4d02b1f95d6109c221aeccd7a262ec4d] block: don't free submitter owned integrity payload on I/O completion
git bisect good 85253bac4d02b1f95d6109c221aeccd7a262ec4d
# good: [74cc150282e41c6c0704cd305c9a4392dc64ef4d] block: don't free the integrity payload in bio_integrity_unmap_free_user
git bisect good 74cc150282e41c6c0704cd305c9a4392dc64ef4d
# first bad commit: [0256994887d7c89c2a41d872aac67605bda8f115] Merge tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 17:04   ` Guenter Roeck
@ 2024-07-30 17:20     ` Jens Axboe
  2024-07-30 18:22       ` Guenter Roeck
  2024-07-30 18:53     ` Linus Torvalds
  1 sibling, 1 reply; 59+ messages in thread
From: Jens Axboe @ 2024-07-30 17:20 UTC (permalink / raw)
  To: Guenter Roeck, Linus Torvalds; +Cc: Linux Kernel Mailing List

On 7/30/24 11:04 AM, Guenter Roeck wrote:
> On Mon, Jul 29, 2024 at 08:29:20AM -0700, Guenter Roeck wrote:
>> On Sun, Jul 28, 2024 at 02:40:01PM -0700, Linus Torvalds wrote:
>>> The merge window felt pretty normal, and the stats all look pretty
>>> normal too. I was expecting things to be quieter because of summer
>>> vacations, but that (still) doesn't actually seem to have been the
>>> case.
>>>
>>> There's 12k+ regular commits (and another 850 merge commits), so as
>>> always the summary of this all is just my merge log. The diffstats are
>>> also (once again) dominated by some big hardware descriptions (another
>>> AMD GPU register dump accounts for ~45% of the lines in the diff, and
>>> some more perf event JSON descriptor files account for another 5%).
>>>
>>> But if you ignore those HW dumps, the diff too looks perfectly
>>> regular: drivers account for a bit over half (even when not counting
>>> the AMD register description noise). The rest is roughly one third
>>> architecture updates (lots of it is dts files, so I guess I could have
>>> lumped that in with "more hw descriptor tables"), one third tooling
>>> and documentation, and one third "core kernel" (filesystems,
>>> networking, VM and kernel). Very roughly.
>>>
>>> If you want more details, you should get the git tree, and then narrow
>>> things down based on interests.
>>>
>>
>> Build results:
>> 	total: 158 pass: 139 fail: 19
>> Failed builds:
> ...
>> 	i386:q35:pentium3:defconfig:pae:nosmp:net=ne2k_pci:initrd
> 
> This failure bisects to commit 0256994887d7 ("Merge tag
> 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux"). I have no
> idea why that would be the case, but it is easy to reproduce. Maybe it is
> coincidental. Either case, copying Jens in case he has an idea.

I can take a look, but please post some details on what is actually
being run here so I can attempt to reproduce it. I looked at your
initial email too, and there's a link in there to:

https://kerneltests.org/builders

but I'm still not sure what's being run.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 17:20     ` Jens Axboe
@ 2024-07-30 18:22       ` Guenter Roeck
  2024-07-30 18:35         ` Jens Axboe
  0 siblings, 1 reply; 59+ messages in thread
From: Guenter Roeck @ 2024-07-30 18:22 UTC (permalink / raw)
  To: Jens Axboe, Linus Torvalds; +Cc: Linux Kernel Mailing List

On 7/30/24 10:20, Jens Axboe wrote:
> On 7/30/24 11:04 AM, Guenter Roeck wrote:
>> On Mon, Jul 29, 2024 at 08:29:20AM -0700, Guenter Roeck wrote:
>>> On Sun, Jul 28, 2024 at 02:40:01PM -0700, Linus Torvalds wrote:
>>>> The merge window felt pretty normal, and the stats all look pretty
>>>> normal too. I was expecting things to be quieter because of summer
>>>> vacations, but that (still) doesn't actually seem to have been the
>>>> case.
>>>>
>>>> There's 12k+ regular commits (and another 850 merge commits), so as
>>>> always the summary of this all is just my merge log. The diffstats are
>>>> also (once again) dominated by some big hardware descriptions (another
>>>> AMD GPU register dump accounts for ~45% of the lines in the diff, and
>>>> some more perf event JSON descriptor files account for another 5%).
>>>>
>>>> But if you ignore those HW dumps, the diff too looks perfectly
>>>> regular: drivers account for a bit over half (even when not counting
>>>> the AMD register description noise). The rest is roughly one third
>>>> architecture updates (lots of it is dts files, so I guess I could have
>>>> lumped that in with "more hw descriptor tables"), one third tooling
>>>> and documentation, and one third "core kernel" (filesystems,
>>>> networking, VM and kernel). Very roughly.
>>>>
>>>> If you want more details, you should get the git tree, and then narrow
>>>> things down based on interests.
>>>>
>>>
>>> Build results:
>>> 	total: 158 pass: 139 fail: 19
>>> Failed builds:
>> ...
>>> 	i386:q35:pentium3:defconfig:pae:nosmp:net=ne2k_pci:initrd
>>
>> This failure bisects to commit 0256994887d7 ("Merge tag
>> 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux"). I have no
>> idea why that would be the case, but it is easy to reproduce. Maybe it is
>> coincidental. Either case, copying Jens in case he has an idea.
> 
> I can take a look, but please post some details on what is actually
> being run here so I can attempt to reproduce it. I looked at your
> initial email too, and there's a link in there to:
> 
> https://kerneltests.org/builders
> 
> but I'm still not sure what's being run.
> 

Please see http://server.roeck-us.net/qemu/x86-nosmp/

Thanks,
Guenter


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 18:22       ` Guenter Roeck
@ 2024-07-30 18:35         ` Jens Axboe
  2024-07-30 18:54           ` Jens Axboe
  0 siblings, 1 reply; 59+ messages in thread
From: Jens Axboe @ 2024-07-30 18:35 UTC (permalink / raw)
  To: Guenter Roeck, Linus Torvalds; +Cc: Linux Kernel Mailing List

On 7/30/24 12:22 PM, Guenter Roeck wrote:
> On 7/30/24 10:20, Jens Axboe wrote:
>> On 7/30/24 11:04 AM, Guenter Roeck wrote:
>>> On Mon, Jul 29, 2024 at 08:29:20AM -0700, Guenter Roeck wrote:
>>>> On Sun, Jul 28, 2024 at 02:40:01PM -0700, Linus Torvalds wrote:
>>>>> The merge window felt pretty normal, and the stats all look pretty
>>>>> normal too. I was expecting things to be quieter because of summer
>>>>> vacations, but that (still) doesn't actually seem to have been the
>>>>> case.
>>>>>
>>>>> There's 12k+ regular commits (and another 850 merge commits), so as
>>>>> always the summary of this all is just my merge log. The diffstats are
>>>>> also (once again) dominated by some big hardware descriptions (another
>>>>> AMD GPU register dump accounts for ~45% of the lines in the diff, and
>>>>> some more perf event JSON descriptor files account for another 5%).
>>>>>
>>>>> But if you ignore those HW dumps, the diff too looks perfectly
>>>>> regular: drivers account for a bit over half (even when not counting
>>>>> the AMD register description noise). The rest is roughly one third
>>>>> architecture updates (lots of it is dts files, so I guess I could have
>>>>> lumped that in with "more hw descriptor tables"), one third tooling
>>>>> and documentation, and one third "core kernel" (filesystems,
>>>>> networking, VM and kernel). Very roughly.
>>>>>
>>>>> If you want more details, you should get the git tree, and then narrow
>>>>> things down based on interests.
>>>>>
>>>>
>>>> Build results:
>>>>     total: 158 pass: 139 fail: 19
>>>> Failed builds:
>>> ...
>>>>     i386:q35:pentium3:defconfig:pae:nosmp:net=ne2k_pci:initrd
>>>
>>> This failure bisects to commit 0256994887d7 ("Merge tag
>>> 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux"). I have no
>>> idea why that would be the case, but it is easy to reproduce. Maybe it is
>>> coincidental. Either case, copying Jens in case he has an idea.
>>
>> I can take a look, but please post some details on what is actually
>> being run here so I can attempt to reproduce it. I looked at your
>> initial email too, and there's a link in there to:
>>
>> https://kerneltests.org/builders
>>
>> but I'm still not sure what's being run.
>>
> 
> Please see http://server.roeck-us.net/qemu/x86-nosmp/

Works fine for me on current master, boots and runs self tests and
then shuts down. Tried it 5 times now.

axboe@r7625 ~/g/linux-vm (master)> qemu-system-i386 --version
QEMU emulator version 8.2.4 (Debian 1:8.2.4+ds-1)
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers

Then tried 6.11-rc1 10 times in a loop, and also didn't see any failures.

I then switched to using gcc-11 as that seems to be what you are using,
and then it does indeed bomb during boot. Funky. I'll check the post
branch and see if it's anything from there.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 17:04   ` Guenter Roeck
  2024-07-30 17:20     ` Jens Axboe
@ 2024-07-30 18:53     ` Linus Torvalds
  2024-07-30 19:22       ` Peter Zijlstra
  2024-07-31 10:33       ` Peter Zijlstra
  1 sibling, 2 replies; 59+ messages in thread
From: Linus Torvalds @ 2024-07-30 18:53 UTC (permalink / raw)
  To: Guenter Roeck, Andy Lutomirski, Ingo Molnar, Peter Anvin
  Cc: Linux Kernel Mailing List, Jens Axboe, the arch/x86 maintainers

[ Adding x86-32 entry code people, more context at the thread in:

  https://lore.kernel.org/all/3f65bfad-bd04-4651-bbe3-e2b1925f1a13@kernel.dk/

  for people who were dragged in late ]

On Tue, 30 Jul 2024 at 10:04, Guenter Roeck <linux@roeck-us.net> wrote:
>
> From the crash log:

The full log is more informative, at

  http://server.roeck-us.net/qemu/x86-nosmp/

which has that config too.

> [    3.605247] sr 2:0:0:0: Attached scsi generic sg0 type 5
> [    3.764508] sched_clock: Marking stable (3740032902, 23766486)->(3766853760, -3054372)
> [    3.768164] registered taskstats version 1
> [    3.768271] Loading compiled-in X.509 certificates
> [    3.990683] Btrfs loaded, zoned=no, fsverity=no
> [    4.005012] cryptomgr_test (68) used greatest stack depth: 6136 bytes left
> [    4.029889] traps: PANIC: double fault, error_code: 0x0

Double faults are bad bad juju.  Nasty to debug, because it means
something went wrong at a horribly bad time.

> [    4.030613] EIP: asm_exc_page_fault+0x0/0x10

Sadly, this mainly says that taking a page fault was part of the
horribly bad time.

> [    4.031389]  <ENTRY_TRAMPOLINE>
> [    4.031392]  ? asm_exc_int3+0x10/0x10
> ...
> [    4.033360]  ? asm_exc_int3+0x10/0x10
> [    4.033368]  ? restore_all_switch_stack+0x65/0xe6
> [    4.033386]  </ENTRY_TRAMPOLINE>

Yeah "restore_all_switch_stack" is also part of "horribly bad time".

And from the full log, I see that the "..." is a *lot* of asm_exc_int3+0x10.

Which makes me think it's asm_exc_int3 just recursively failing.

Which will cause a stack overflow, and then - after a time - a double fault.

[ Time passes, I build the i386 kernel image with your config just to
get an image that looks like yours ]

Hmm. I think the stack dump output confused me. Because
"asm_exc_int3+0x10/0x10" doesn't end up making much sense, but it
turns out that "asm_exc_int3+0x10" is actually the same as
'asm_exc_page_fault'.

So it smells like we're taking a page fault, but somehow the page
fault text address has been unmapped, so taking a page fault causes a
page fault and then we end up finally in that same "no more stack,
double fault" situation.
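
As a rough cross-check of the "no more stack" reading, the ESP and CR2
values in the quoted log can be compared with plain arithmetic. This is
only a sketch under stated assumptions, not a diagnosis: it assumes
4 KiB pages and that a same-privilege fault with an error code pushes
four dwords (EFLAGS, CS, EIP, error code). All sketch_* names are
hypothetical.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE   4096u
/* Assumption: EFLAGS, CS, EIP and error code, 4 bytes each. */
#define SKETCH_FRAME_BYTES 16u

/* Register values copied from the crash log. */
static const uint32_t log_esp = 0xffa0300cu;
static const uint32_t log_cr2 = 0xffa02ffcu;

/* Base address of the page containing addr. */
static uint32_t sketch_page_of(uint32_t addr)
{
	return addr & ~(SKETCH_PAGE_SIZE - 1u);
}

/* Lowest address written when one exception frame is pushed at esp. */
static uint32_t sketch_lowest_push(uint32_t esp)
{
	return esp - SKETCH_FRAME_BYTES;
}

/* Nonzero if pushing one frame at esp spills into the page below. */
static int sketch_frame_crosses_page(uint32_t esp)
{
	return sketch_page_of(sketch_lowest_push(esp)) != sketch_page_of(esp);
}
```

With the logged values, the last dword of such a frame lands exactly on
the logged CR2, one page below the page ESP points into. That is
consistent with the stack having run out, but it does not say why the
faults started.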

Either page table corruption, or some issue with the page table mitigation.

The fact that it started happening with the block merge may be because
the block code causes some major corruption, or may just be random bad
luck and it just changed some alignment somewhere, and exposed a
hidden but pre-existing issue.

Jens separately said that he can see it with gcc-11, but not his
regular compiler, so regardless it seems to be compiler-dependent.

Let's see if x86 people have some idea, but that

   restore_all_switch_stack+0x65/0xe6

and doing an objdump to see the code generation, it is literally here:

        0f 20 d8                mov    %cr3,%eax
        0d 00 10 00 00          or     $0x1000,%eax
        0f 22 d8                mov    %eax,%cr3
        eb 16                   jmp    <restore_all_switch_stack+0x7d>

with that "jmp" instruction being the restore_all_switch_stack+0x65 address.

So the infinite page faults seem to literally happen right after the
"mov %eax,%cr3".

Definitely something wrong with the page tables. But where that
wrongness comes from, I have no idea.
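
For readers following along, the three instructions in the objdump
above just set bit 12 of CR3 and write it back. A minimal sketch of
that arithmetic follows, assuming 4 KiB pages and assuming the kernel
and user page-table roots sit in adjacent pages so that one bit selects
between them. The sketch_* names and the mask macro are hypothetical,
not the kernel's own.

```c
#include <stdint.h>

/* Assumption: bit 12, matching the 0x1000 immediate in the objdump. */
#define SKETCH_PTI_SWITCH_MASK 0x1000u

/* What the quoted "or $0x1000,%eax" computes before the CR3 write. */
static uint32_t sketch_user_cr3(uint32_t cr3)
{
	return cr3 | SKETCH_PTI_SWITCH_MASK;
}

/* The inverse operation: clear the bit to get back the kernel root. */
static uint32_t sketch_kernel_cr3(uint32_t cr3)
{
	return cr3 & ~SKETCH_PTI_SWITCH_MASK;
}

/* Nonzero if the given CR3 value has the switch bit set. */
static int sketch_is_user_cr3(uint32_t cr3)
{
	return (cr3 & SKETCH_PTI_SWITCH_MASK) != 0;
}
```

Applied to the CR3 value in the crash log (08dc6000), the bit is clear,
and setting it would give 08dc7000. If the tables behind that second
root do not map the entry text, every fault taken after the switch
would itself fault, which is the loop described above.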

            Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 18:35         ` Jens Axboe
@ 2024-07-30 18:54           ` Jens Axboe
  0 siblings, 0 replies; 59+ messages in thread
From: Jens Axboe @ 2024-07-30 18:54 UTC (permalink / raw)
  To: Guenter Roeck, Linus Torvalds; +Cc: Linux Kernel Mailing List

On 7/30/24 12:35 PM, Jens Axboe wrote:
> On 7/30/24 12:22 PM, Guenter Roeck wrote:
>> On 7/30/24 10:20, Jens Axboe wrote:
>>> On 7/30/24 11:04 AM, Guenter Roeck wrote:
>>>> On Mon, Jul 29, 2024 at 08:29:20AM -0700, Guenter Roeck wrote:
>>>>> On Sun, Jul 28, 2024 at 02:40:01PM -0700, Linus Torvalds wrote:
>>>>>> The merge window felt pretty normal, and the stats all look pretty
>>>>>> normal too. I was expecting things to be quieter because of summer
>>>>>> vacations, but that (still) doesn't actually seem to have been the
>>>>>> case.
>>>>>>
>>>>>> There's 12k+ regular commits (and another 850 merge commits), so as
>>>>>> always the summary of this all is just my merge log. The diffstats are
>>>>>> also (once again) dominated by some big hardware descriptions (another
>>>>>> AMD GPU register dump accounts for ~45% of the lines in the diff, and
>>>>>> some more perf event JSON descriptor files account for another 5%).
>>>>>>
>>>>>> But if you ignore those HW dumps, the diff too looks perfectly
>>>>>> regular: drivers account for a bit over half (even when not counting
>>>>>> the AMD register description noise). The rest is roughly one third
>>>>>> architecture updates (lots of it is dts files, so I guess I could have
>>>>>> lumped that in with "more hw descriptor tables"), one third tooling
>>>>>> and documentation, and one third "core kernel" (filesystems,
>>>>>> networking, VM and kernel). Very roughly.
>>>>>>
>>>>>> If you want more details, you should get the git tree, and then narrow
>>>>>> things down based on interests.
>>>>>>
>>>>>
>>>>> Build results:
>>>>>     total: 158 pass: 139 fail: 19
>>>>> Failed builds:
>>>> ...
>>>>>     i386:q35:pentium3:defconfig:pae:nosmp:net=ne2k_pci:initrd
>>>>
>>>> This failure bisects to commit 0256994887d7 ("Merge tag
>>>> 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux"). I have no
>>>> idea why that would be the case, but it is easy to reproduce. Maybe it is
>>>> coincidental. Either case, copying Jens in case he has an idea.
>>>
>>> I can take a look, but please post some details on what is actually
>>> being run here so I can attempt to reproduce it. I looked at your
>>> initial email too, and there's a link in there to:
>>>
>>> https://kerneltests.org/builders
>>>
>>> but I'm still not sure what's being run.
>>>
>>
>> Please see http://server.roeck-us.net/qemu/x86-nosmp/
> 
> Works fine for me on current master, boots and runs self tests and
> then shuts down. Tried it 5 times now.
> 
> axboe@r7625 ~/g/linux-vm (master)> qemu-system-i386 --version
> QEMU emulator version 8.2.4 (Debian 1:8.2.4+ds-1)
> Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
> 
> Then tried 6.11-rc1 10 times in a loop, and also didn't see any failures.
> 
> I then switched to using gcc-11 as that seems to be what you are using,
> and then it does indeed bomb during boot. Funky. I'll check the post
> branch and see if it's anything from there.

I can fully revert that for-6.11/block-post merge and it still crashes
in the same way for me. So I don't believe that's the culprit. It
consistently crashes with a double fault when starting cryptomgr, so
that may be a clue.

FWIW, if I disable KFENCE, then it boots just fine with gcc-11. Or if I
use gcc 13 or 14 it works just fine regardless of whether KFENCE is set
or not.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 18:53     ` Linus Torvalds
@ 2024-07-30 19:22       ` Peter Zijlstra
  2024-07-30 19:31         ` Jens Axboe
  2024-07-31 10:33       ` Peter Zijlstra
  1 sibling, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-30 19:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Andy Lutomirski, Ingo Molnar, Peter Anvin,
	Linux Kernel Mailing List, Jens Axboe, the arch/x86 maintainers

On Tue, Jul 30, 2024 at 11:53:31AM -0700, Linus Torvalds wrote:

> Which makes me think it's asm_exc_int3 just recursively failing.

Sounds like text_poke() going sideways, there's a jump_label fail out
there:

 https://lkml.kernel.org/r/20240730132626.GV26599@noisy.programming.kicks-ass.net

> Let's see if x86 people have some idea, but that
> 
>    restore_all_switch_stack+0x65/0xe6
> 
> and doing an objdump to see the code generation, it is literally here:
> 
>         0f 20 d8                mov    %cr3,%eax
>         0d 00 10 00 00          or     $0x1000,%eax
>         0f 22 d8                mov    %eax,%cr3

That looks like SWITCH_TO_USER_CR3

>         eb 16                   jmp    <restore_all_switch_stack+0x7d>
> 
> with that "jmp" instruction being the restore_all_switch_stack+0x65 address.

This would make this BUG_IF_WRONG_CR3, which starts with an ALTERNATIVE
jmp. I think we landed a pile of ALTERNATIVE patches this merge window.

That said, Boris did spend an awful lot of time testing them... but this
is 32bit so who knows how much time that got.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 19:22       ` Peter Zijlstra
@ 2024-07-30 19:31         ` Jens Axboe
  2024-07-30 19:34           ` Jens Axboe
  2024-07-30 19:38           ` Peter Zijlstra
  0 siblings, 2 replies; 59+ messages in thread
From: Jens Axboe @ 2024-07-30 19:31 UTC (permalink / raw)
  To: Peter Zijlstra, Linus Torvalds
  Cc: Guenter Roeck, Andy Lutomirski, Ingo Molnar, Peter Anvin,
	Linux Kernel Mailing List, the arch/x86 maintainers

On 7/30/24 1:22 PM, Peter Zijlstra wrote:
> On Tue, Jul 30, 2024 at 11:53:31AM -0700, Linus Torvalds wrote:
> 
>> Which makes me think it's asm_exc_int3 just recursively failing.
> 
> Sounds like text_poke() going sideways, there's a jump_label fail out
> there:
> 
>  https://lkml.kernel.org/r/20240730132626.GV26599@noisy.programming.kicks-ass.net

No change with this applied...

Also not sure if you read my link, but a few things to note:

- It only happens with gcc-11 here. I tried 12/13/14 and those
  are fine, don't have anything older

- It only happens with KFENCE enabled.


>> Let's see if x86 people have some idea, but that
>>
>>    restore_all_switch_stack+0x65/0xe6
>>
>> and doing an objdump to see the code generation, it is literally here:
>>
>>         0f 20 d8                mov    %cr3,%eax
>>         0d 00 10 00 00          or     $0x1000,%eax
>>         0f 22 d8                mov    %eax,%cr3
> 
> That looks like SWITCH_TO_USER_CR3
> 
>>         eb 16                   jmp    <restore_all_switch_stack+0x7d>
>>
>> with that "jmp" instruction being the restore_all_switch_stack+0x65 address.
> 
> This would make this BUG_IF_WRONG_CR3, which starts with an ALTERNATIVE
> jmp. I think we landed a pile of ALTERNATIVE patches this merge window.
> 
> That said, Boris did spend an awful lot of time testing them... but this
> is 32bit so who knows how much time that got.

Since I got this setup with Guenter's setup, it literally takes me seconds
to compile and test anything. So feel free to toss anything at it and we'll
see what sticks.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 19:31         ` Jens Axboe
@ 2024-07-30 19:34           ` Jens Axboe
  2024-07-30 19:38           ` Peter Zijlstra
  1 sibling, 0 replies; 59+ messages in thread
From: Jens Axboe @ 2024-07-30 19:34 UTC (permalink / raw)
  To: Peter Zijlstra, Linus Torvalds
  Cc: Guenter Roeck, Andy Lutomirski, Ingo Molnar, Peter Anvin,
	Linux Kernel Mailing List, the arch/x86 maintainers

On 7/30/24 1:31 PM, Jens Axboe wrote:
>> This would make this BUG_IF_WRONG_CR3, which starts with an ALTERNATIVE
>> jmp. I think we landed a pile of ALTERNATIVE patches this merge window.
>>
>> That said, Boris did spend an awful lot of time testing them... but this
>> is 32bit so who knows how much time that got.
> 
> Since I got this setup with Guenter's setup, it literally takes me seconds
> to compile and test anything. So feel free to toss anything at it and we'll
> see what sticks.

I reverted all the alternative changes, still crashes in the same way.
This is range 1467b49869df..208c6772d38392 fwiw.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 19:31         ` Jens Axboe
  2024-07-30 19:34           ` Jens Axboe
@ 2024-07-30 19:38           ` Peter Zijlstra
  2024-07-30 19:41             ` Linus Torvalds
                               ` (2 more replies)
  1 sibling, 3 replies; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-30 19:38 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Guenter Roeck, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On Tue, Jul 30, 2024 at 01:31:18PM -0600, Jens Axboe wrote:
> On 7/30/24 1:22 PM, Peter Zijlstra wrote:
> > On Tue, Jul 30, 2024 at 11:53:31AM -0700, Linus Torvalds wrote:
> > 
> >> Which makes me think it's asm_exc_int3 just recursively failing.
> > 
> > Sounds like text_poke() going sideways, there's a jump_label fail out
> > there:
> > 
> >  https://lkml.kernel.org/r/20240730132626.GV26599@noisy.programming.kicks-ass.net
> 
> No change with this applied...
> 
> Also not sure if you read my link, but a few things to note:
> 
> - It only happens with gcc-11 here. I tried 12/13/14 and those
>   are fine, don't have anything older

One of my test boxes has 4.4 4.6 4.8 4.9 5 6 8 9 10 11 12 13

(now I gotta go figure out wth 7 went :-) And yeah, we don't support
most of those versions anymore (phew).

So if it's easy to set up, I could try older GCCs.

> - It only happens with KFENCE enabled.

I missed the KFENCE bit. Happen to have the .config handy? I couldn't
make much sense of Guenter's website in a hurry.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 19:38           ` Peter Zijlstra
@ 2024-07-30 19:41             ` Linus Torvalds
  2024-07-30 20:04             ` Guenter Roeck
  2024-07-30 20:24             ` Guenter Roeck
  2 siblings, 0 replies; 59+ messages in thread
From: Linus Torvalds @ 2024-07-30 19:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jens Axboe, Guenter Roeck, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On Tue, 30 Jul 2024 at 12:38, Peter Zijlstra <peterz@infradead.org> wrote:
>
>
> I missed the KFENCE bit. Happen to have the .config handy? I couldn't
> make much sense of Guenter's website in a hurry.

This is what you want to use:

  http://server.roeck-us.net/qemu/x86-nosmp/

It has that kernel config in there, along with the oops etc.

               Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 19:38           ` Peter Zijlstra
  2024-07-30 19:41             ` Linus Torvalds
@ 2024-07-30 20:04             ` Guenter Roeck
  2024-07-30 20:09               ` Peter Zijlstra
  2024-07-30 20:13               ` Linus Torvalds
  2024-07-30 20:24             ` Guenter Roeck
  2 siblings, 2 replies; 59+ messages in thread
From: Guenter Roeck @ 2024-07-30 20:04 UTC (permalink / raw)
  To: Peter Zijlstra, Jens Axboe
  Cc: Linus Torvalds, Andy Lutomirski, Ingo Molnar, Peter Anvin,
	Linux Kernel Mailing List, the arch/x86 maintainers

On 7/30/24 12:38, Peter Zijlstra wrote:
> On Tue, Jul 30, 2024 at 01:31:18PM -0600, Jens Axboe wrote:
>> On 7/30/24 1:22 PM, Peter Zijlstra wrote:
>>> On Tue, Jul 30, 2024 at 11:53:31AM -0700, Linus Torvalds wrote:
>>>
>>>> Which makes me think it's asm_exc_int3 just recursively failing.
>>>
>>> Sounds like text_poke() going sideways, there's a jump_label fail out
>>> there:
>>>
>>>   https://lkml.kernel.org/r/20240730132626.GV26599@noisy.programming.kicks-ass.net
>>
>> No change with this applied...
>>
>> Also not sure if you read my link, but a few things to note:
>>
>> - It only happens with gcc-11 here. I tried 12/13/14 and those
>>    are fine, don't have anything older
> 
> One of my test boxes has 4.4 4.6 4.8 4.9 5 6 8 9 10 11 12 13
> 
> (now I gotta go figure out wth 7 went :-) And yeah, we don't support
> most of those versions anymore (phew).
> 
> So if it's easy to set up, I could try older GCCs.
> 

WFM with gcc 9.4, 10.3, 12.4, and 13.3. gcc 11.4 and 11.5 both fail.

Maybe I should just switch to a more recent version of gcc and call it a day,
in the hope that it is a compiler (or qemu) problem and doesn't just hide
the problem.

Thoughts ?

Guenter

>> - It only happens with KFENCE enabled.
> 
> I missed the KFENCE bit. Happen to have the .config handy? I couldn't
> make much sense of Guenter's website in a hurry.
> 


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 20:04             ` Guenter Roeck
@ 2024-07-30 20:09               ` Peter Zijlstra
  2024-07-30 21:12                 ` Peter Zijlstra
  2024-07-30 23:29                 ` Guenter Roeck
  2024-07-30 20:13               ` Linus Torvalds
  1 sibling, 2 replies; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-30 20:09 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Jens Axboe, Linus Torvalds, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On Tue, Jul 30, 2024 at 01:04:49PM -0700, Guenter Roeck wrote:
> On 7/30/24 12:38, Peter Zijlstra wrote:
> > On Tue, Jul 30, 2024 at 01:31:18PM -0600, Jens Axboe wrote:
> > > On 7/30/24 1:22 PM, Peter Zijlstra wrote:
> > > > On Tue, Jul 30, 2024 at 11:53:31AM -0700, Linus Torvalds wrote:
> > > > 
> > > > > Which makes me think it's asm_exc_int3 just recursively failing.
> > > > 
> > > > Sounds like text_poke() going sideways, there's a jump_label fail out
> > > > there:
> > > > 
> > > >   https://lkml.kernel.org/r/20240730132626.GV26599@noisy.programming.kicks-ass.net
> > > 
> > > No change with this applied...
> > > 
> > > Also not sure if you read my link, but a few things to note:
> > > 
> > > - It only happens with gcc-11 here. I tried 12/13/14 and those
> > >    are fine, don't have anything older
> > 
> > One of my test boxes has 4.4 4.6 4.8 4.9 5 6 8 9 10 11 12 13
> > 
> > (now I gotta go figure out wth 7 went :-) And yeah, we don't support
> > most of those versions anymore (phew).
> > 
> > So if it's easy to set up, I could try older GCCs.
> > 
> 
> WFM with gcc 9.4, 10.3, 12.4, and 13.3. gcc 11.4 and 11.5 both fail.

10.5 and 13.2 worked for me, and I can confirm 11.4 makes it go boom.

> Maybe I should just switch to a more recent version of gcc and call it a day,
> in the hope that it is a compiler (or qemu) problem and doesn't just hide
> the problem.
> 
> Thoughts ?

Tempting, but I think it would be good to figure out what in GCC-11
makes it sad, gcc-11 is still well within the supported range of GCCs
afaik.

Let's see if it's something that wants to be bisected.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 20:04             ` Guenter Roeck
  2024-07-30 20:09               ` Peter Zijlstra
@ 2024-07-30 20:13               ` Linus Torvalds
  1 sibling, 0 replies; 59+ messages in thread
From: Linus Torvalds @ 2024-07-30 20:13 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Peter Zijlstra, Jens Axboe, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On Tue, 30 Jul 2024 at 13:04, Guenter Roeck <linux@roeck-us.net> wrote:
>
> Maybe I should just switch to a more recent version of gcc and call it a day,
> in the hope that it is a compiler (or qemu) problem and doesn't just hide
> the problem.

Well, if it's a gcc-11 problem, I think we still really want to know
what is going on. We are *not* all that close to dropping support for
gcc-11 yet.

And honestly, while it's often very convenient to blame the compiler,
compiler bugs are still very rare.

It's *much* more common that bad code just happens to work with a good
compiler than that good code happens to break with a bad compiler.

Yes, we obviously do hit real compiler bugs, but still ... We'd need
to actually see what goes wrong in the code generation before blaming
a compiler bug.

                  Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 19:38           ` Peter Zijlstra
  2024-07-30 19:41             ` Linus Torvalds
  2024-07-30 20:04             ` Guenter Roeck
@ 2024-07-30 20:24             ` Guenter Roeck
  2024-07-31 12:20               ` Peter Zijlstra
  2 siblings, 1 reply; 59+ messages in thread
From: Guenter Roeck @ 2024-07-30 20:24 UTC (permalink / raw)
  To: Peter Zijlstra, Jens Axboe
  Cc: Linus Torvalds, Andy Lutomirski, Ingo Molnar, Peter Anvin,
	Linux Kernel Mailing List, the arch/x86 maintainers

On 7/30/24 12:38, Peter Zijlstra wrote:
> On Tue, Jul 30, 2024 at 01:31:18PM -0600, Jens Axboe wrote:
>> On 7/30/24 1:22 PM, Peter Zijlstra wrote:
>>> On Tue, Jul 30, 2024 at 11:53:31AM -0700, Linus Torvalds wrote:
>>>
>>>> Which makes me think it's asm_exc_int3 just recursively failing.
>>>
>>> Sounds like text_poke() going sideways, there's a jump_label fail out
>>> there:
>>>
>>>   https://lkml.kernel.org/r/20240730132626.GV26599@noisy.programming.kicks-ass.net
>>
>> No change with this applied...
>>
>> Also not sure if you read my link, but a few things to note:
>>
>> - It only happens with gcc-11 here. I tried 12/13/14 and those
>>    are fine, don't have anything older
> 
> One of my test boxes has 4.4 4.6 4.8 4.9 5 6 8 9 10 11 12 13
> 
> (now I gotta go figure out wth 7 went :-) And yeah, we don't support
> most of those versions anymore (phew).
> 
> So if it's easy to set up, I could try older GCCs.
> 
>> - It only happens with KFENCE enabled.
> 
> I missed the KFENCE bit. Happen to have the .config handy? I couldn't
> make much sense of Guenter's website in a hurry.
> 

An interesting bit of information: The problem is seen with many,
but not all CPUs. For example, I don't see it with athlon, n270, Dhyana,
or EPYC. qemu32 is affected, but qemu64 is fine. But on the other side
both kvm32 and kvm64 are affected.

Guenter


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 20:09               ` Peter Zijlstra
@ 2024-07-30 21:12                 ` Peter Zijlstra
  2024-07-30 23:29                 ` Guenter Roeck
  1 sibling, 0 replies; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-30 21:12 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Jens Axboe, Linus Torvalds, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On Tue, Jul 30, 2024 at 10:09:47PM +0200, Peter Zijlstra wrote:

> Let's see if it's something that wants to be bisected.

Complete failure.. something along the way must've changed a critical
CONFIG symbol.  The .config I ended up with at v6.11-rc1 no longer
reproduced.

I'll try again tomorrow if nobody beats me to it.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 20:09               ` Peter Zijlstra
  2024-07-30 21:12                 ` Peter Zijlstra
@ 2024-07-30 23:29                 ` Guenter Roeck
  2024-07-30 23:54                   ` Linus Torvalds
  1 sibling, 1 reply; 59+ messages in thread
From: Guenter Roeck @ 2024-07-30 23:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jens Axboe, Linus Torvalds, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On 7/30/24 13:09, Peter Zijlstra wrote:
> On Tue, Jul 30, 2024 at 01:04:49PM -0700, Guenter Roeck wrote:
>> On 7/30/24 12:38, Peter Zijlstra wrote:
>>> On Tue, Jul 30, 2024 at 01:31:18PM -0600, Jens Axboe wrote:
>>>> On 7/30/24 1:22 PM, Peter Zijlstra wrote:
>>>>> On Tue, Jul 30, 2024 at 11:53:31AM -0700, Linus Torvalds wrote:
>>>>>
>>>>>> Which makes me think it's asm_exc_int3 just recursively failing.
>>>>>
>>>>> Sounds like text_poke() going sideways, there's a jump_label fail out
>>>>> there:
>>>>>
>>>>>    https://lkml.kernel.org/r/20240730132626.GV26599@noisy.programming.kicks-ass.net
>>>>
>>>> No change with this applied...
>>>>
>>>> Also not sure if you read my link, but a few things to note:
>>>>
>>>> - It only happens with gcc-11 here. I tried 12/13/14 and those
>>>>     are fine, don't have anything older
>>>
>>> One of my test boxes has 4.4 4.6 4.8 4.9 5 6 8 9 10 11 12 13
>>>
>>> (now I gotta go figure out wth 7 went :-) And yeah, we don't support
>>> most of those versions anymore (phew).
>>>
>>> So if it's easy to set up, I could try older GCCs.
>>>
>>
>> WFM with gcc 9.4, 10.3, 12.4, and 13.3. gcc 11.4 and 11.5 both fail.
> 
> 10.5 and 13.2 worked for me, and I can confirm 11.4 makes it go boom.
> 
>> Maybe I should just switch to a more recent version of gcc and call it a day,
>> in the hope that it is a compiler (or qemu) problem and doesn't just hide
>> the problem.
>>
>> Thoughts ?
> 
> Tempting, but I think it would be good to figure out what in GCC-11
> makes it sad, gcc-11 is still well within the supported range of GCCs
> afaik.
> 
> Let's see if it's something that wants to be bisected.

I tried bisecting several ways, but it always ends up at commit 0256994887d7
("Merge tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux").
Manual build confirmed that 0256994887d7 fails but 0256994887d7~1,
which is commit dd018c238b84 ("Merge tag 'bcachefs-2024-07-22' of
https://evilpiepirate.org/git/bcachefs") is fine, at least for me.

I then rebased 'for-6.11/block-post-20240722' on top of
dd018c238b84 and tried again. Result is below.

However, reverting this patch as well as the subsequent patches does not
fix the problem, and reverting the entire merge from the mainline kernel
doesn't fix it either.

The next step was to bisect starting from 0256994887d7, reverting the block merges
at each step. That points to the io_uring merge (second set of bisect results).
However, reverting that merge doesn't help, and neither does reverting both
the block and the io_uring merges.

On the other side, reverting nothing but enabling CONFIG_CRYPTO_MANAGER_DISABLE_TESTS=y
makes the problem disappear. But that doesn't really help, especially since reverting
the patches touching CONFIG_CRYPTO_MANAGER_DISABLE_TESTS does _not_ help.

Baffled. Is it possible that the crashing code catches some page boundary ?

Guenter

---
# bad: [a9dd34ab77277f0fb7fa41a3edb8f0a71f7d791f] block: don't free the integrity payload in bio_integrity_unmap_free_user
# good: [dd018c238b8489b6dd8c06f6b962ea75d79115ff] Merge tag 'bcachefs-2024-07-22' of https://evilpiepirate.org/git/bcachefs
git bisect start 'HEAD' 'dd018c238b84'
# bad: [113799f9042573ba197de7a78a1e450cb40573ac] block: don't call bio_uninit from bio_endio
git bisect bad 113799f9042573ba197de7a78a1e450cb40573ac
# good: [473252aab8bf1a86e4266cb65f7baac1c10a70d9] block: also return bio_integrity_payload * from stubs
git bisect good 473252aab8bf1a86e4266cb65f7baac1c10a70d9
# first bad commit: [113799f9042573ba197de7a78a1e450cb40573ac] block: don't call bio_uninit from bio_endio

---
# bad: [8400291e289ee6b2bf9779ff1c83a291501f017b] Linux 6.11-rc1
# good: [0256994887d7c89c2a41d872aac67605bda8f115] Merge tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux
git bisect start 'v6.11-rc1' '0256994887d7'
# good: [b2eed73360dffea91ea64e8f19330c950dd42ebb] Merge tag 'linux-watchdog-6.11-rc1' of git://www.linux-watchdog.org/g
git bisect good b2eed73360dffea91ea64e8f19330c950dd42ebb
# good: [0ba9b1551185a8b42003b708b6a9c25a9808701e] Merge tag 'drm-next-2024-07-26' of https://gitlab.freedesktop.org/drl
git bisect good 0ba9b1551185a8b42003b708b6a9c25a9808701e
# good: [8e333791d4605dbce611c22f71a86721c9afc336] Merge tag 'gpio-fixes-for-v6.11-rc1' of git://git.kernel.org/pub/scmx
git bisect good 8e333791d4605dbce611c22f71a86721c9afc336
# bad: [5437f30d3458ad36e83ab96088d490ebfee844d8] Merge tag '6.11-rc-smb-client-fixes-part2' of git://git.samba.org/sfr6
git bisect bad 5437f30d3458ad36e83ab96088d490ebfee844d8
# good: [910bfc26d16d07df5a2bfcbc63f0aa9d1397e2ef] Merge tag 'rust-6.11' of https://github.com/Rust-for-Linux/linux
git bisect good 910bfc26d16d07df5a2bfcbc63f0aa9d1397e2ef
# bad: [8c9307474333d8d100870b45af00bfeb1872c836] Merge tag 'io_uring-6.11-20240726' of git://git.kernel.dk/linux
git bisect bad 8c9307474333d8d100870b45af00bfeb1872c836
# good: [29d63b94036e561a016ec8878b44aad6650d23e2] io_uring: align iowq and task request error handling
git bisect good 29d63b94036e561a016ec8878b44aad6650d23e2
# good: [358169617602f6f71b31e5c9532a09b95a34b043] io_uring/napi: pass ktime to io_napi_adjust_timeout
git bisect good 358169617602f6f71b31e5c9532a09b95a34b043
# good: [ef9ca17ca458ac7253ae71b552e601e49311fc48] hostfs: fix the host directory parse when mounting.
git bisect good ef9ca17ca458ac7253ae71b552e601e49311fc48
# good: [bc4eee85ca6ce5335efe314215841712b5531449] Merge tag 'vfs-6.11-rc1.fixes.3' of git://git.kernel.org/pub/scm/lins
git bisect good bc4eee85ca6ce5335efe314215841712b5531449
# first bad commit: [8c9307474333d8d100870b45af00bfeb1872c836] Merge tag 'io_uring-6.11-20240726' of git://git.kernel.dx
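
For anyone comparing several bisect runs like the two above, the verdict can be pulled out of a log mechanically. The following is only a sketch: the function name and the regular expression are illustrative, and the sample text is the first (rebased block branch) log from this message.

```python
import re

# Matches the comment lines that "git bisect log" emits, e.g.
#   # bad: [<sha>] <subject>
#   # first bad commit: [<sha>] <subject>
LINE = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]+)\] (.*)$")

def summarize(log: str):
    """Return (good_shas, bad_shas, first_bad) parsed from a bisect log."""
    good, bad, first_bad = [], [], None
    for raw in log.splitlines():
        m = LINE.match(raw.strip())
        if not m:
            continue  # skip the "git bisect ..." command lines
        kind, sha, subject = m.groups()
        if kind == "good":
            good.append(sha)
        elif kind == "bad":
            bad.append(sha)
        else:
            first_bad = (sha, subject)
    return good, bad, first_bad

SAMPLE = """\
# bad: [a9dd34ab77277f0fb7fa41a3edb8f0a71f7d791f] block: don't free the integrity payload in bio_integrity_unmap_free_user
# good: [dd018c238b8489b6dd8c06f6b962ea75d79115ff] Merge tag 'bcachefs-2024-07-22' of https://evilpiepirate.org/git/bcachefs
git bisect start 'HEAD' 'dd018c238b84'
# bad: [113799f9042573ba197de7a78a1e450cb40573ac] block: don't call bio_uninit from bio_endio
git bisect bad 113799f9042573ba197de7a78a1e450cb40573ac
# good: [473252aab8bf1a86e4266cb65f7baac1c10a70d9] block: also return bio_integrity_payload * from stubs
git bisect good 473252aab8bf1a86e4266cb65f7baac1c10a70d9
# first bad commit: [113799f9042573ba197de7a78a1e450cb40573ac] block: don't call bio_uninit from bio_endio
"""

good, bad, first_bad = summarize(SAMPLE)
print(first_bad[0][:12], first_bad[1])
```

As the message itself notes, a "first bad commit" found this way is only as trustworthy as the assumption that the failure depends on a single change; here reverting the reported commits did not fix the crash.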



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 23:29                 ` Guenter Roeck
@ 2024-07-30 23:54                   ` Linus Torvalds
  2024-07-31  8:21                     ` Borislav Petkov
  2024-07-31 13:24                     ` Jens Axboe
  0 siblings, 2 replies; 59+ messages in thread
From: Linus Torvalds @ 2024-07-30 23:54 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Peter Zijlstra, Jens Axboe, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On Tue, 30 Jul 2024 at 16:29, Guenter Roeck <linux@roeck-us.net> wrote:
>
> Baffled. Is it possible that the crashing code catches some page boundary ?

We've definitely seen things like that before. Some alignment change
makes something cross a cacheline or page boundary, and it magically
causes a huge regression.

Usually it's about performance, though, not this kind of thing.

But I could imagine that some odd instruction rewriting thing goes
wrong only when the instruction crosses a page boundary, and that
we've never happened to hit that case, and then some kernel config
just moves the affected code around just enough.

That would then indirectly also explain why only some compiler
versions hit it - because it all depends on hitting that exact page
crosser.
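
To make the "page crosser" idea concrete, here is a small sketch of the check one could run over instruction addresses taken from a disassembly. It assumes 4 KiB pages; the helper name and the first example address are hypothetical and are not taken from the failing kernel.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on 32-bit x86

def crosses_page(addr: int, length: int, page_size: int = PAGE_SIZE) -> bool:
    """Return True if the bytes [addr, addr + length) span two pages."""
    first_page = addr // page_size
    last_page = (addr + length - 1) // page_size
    return first_page != last_page

# Hypothetical 5-byte instruction starting 2 bytes before a page boundary.
print(crosses_page(0xC2200FFE, 5))
# A 3-byte instruction well inside a page.
print(crosses_page(0xC2200350, 3))
```

A small shift in code layout (a different config option or compiler version) is enough to move an instruction from the second case into the first, which is why such a bug could appear and disappear with unrelated changes.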

You also seemed to say that it only happened with some CPU selections.
Maybe there's something wrong with the ALTERNATIVE() cleanups - I'm
looking at that new "nested alternatives macros" thing, and the odd
games we play with the origin and replacement lengths etc.

That all looks entirely crazy. That file was hard to read before, now
it's just incomprehensible to me.

                  Linus

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 23:54                   ` Linus Torvalds
@ 2024-07-31  8:21                     ` Borislav Petkov
  2024-07-31  9:11                       ` Peter Zijlstra
  2024-07-31 14:37                       ` Guenter Roeck
  2024-07-31 13:24                     ` Jens Axboe
  1 sibling, 2 replies; 59+ messages in thread
From: Borislav Petkov @ 2024-07-31  8:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Peter Zijlstra, Jens Axboe, Andy Lutomirski,
	Ingo Molnar, Peter Anvin, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Tue, Jul 30, 2024 at 04:54:43PM -0700, Linus Torvalds wrote:
> You also seemed to say that it only happened with some CPU selections.
> Maybe there's something wrong with the ALTERNATIVE() cleanups - I'm
> looking at that new "nested alternatives macros" thing, and the odd
> games we play with the origin and replacement lengths etc.
> 
> That all looks entirely crazy. That file was hard to read before, now
> it's just incomprehensible to me.

I'm sorry to hear that. The reason we did it is because it was starting to
become really unwieldy to add a yet another alternative choice N in an
ALTERNATIVE_N call...

Anyway, I'll try to reproduce here. In the meantime, can anyone who can
reproduce - Guenter, Jens - boot that failing kernel with

  debug-alternative=-1

and copy dmesg and vmlinux somewhere for me?

It is a lot of output so make sure to catch it all.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-31  8:21                     ` Borislav Petkov
@ 2024-07-31  9:11                       ` Peter Zijlstra
  2024-07-31 10:02                         ` Borislav Petkov
  2024-07-31 14:37                       ` Guenter Roeck
  1 sibling, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-31  9:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linus Torvalds, Guenter Roeck, Jens Axboe, Andy Lutomirski,
	Ingo Molnar, Peter Anvin, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Wed, Jul 31, 2024 at 10:21:11AM +0200, Borislav Petkov wrote:
> On Tue, Jul 30, 2024 at 04:54:43PM -0700, Linus Torvalds wrote:
> > You also seemed to say that it only happened with some CPU selections.
> > Maybe there's something wrong with the ALTERNATIVE() cleanups - I'm
> > looking at that new "nested alternatives macros" thing, and the odd
> > games we play with the origin and replacement lengths etc.
> > 
> > That all looks entirely crazy. That file was hard to read before, now
> > it's just incomprehensible to me.
> 
> I'm sorry to hear that. The reason we did it is because it was starting to
> become really unwieldy to add a yet another alternative choice N in an
> ALTERNATIVE_N call...
> 
> Anyway, I'll try to reproduce here. In the meantime, can anyone who can
> reproduce - Guenter, Jens - boot that failing kernel with
> 
>   debug-alternative=-1
> 
> and copy dmesg and vmlinux somewhere for me?
> 
> It is a lot of output so make sure to catch it all.

So what I did instead is add nokaslr to CMDLINE and -S -s to qemu, and
am staring at the failing kernel in gdb.

So far all the alternatives in the affected paths look just fine.

Not that any of it is making sense, notably:

Code: bf 1e c2 e9 23 06 00 00 66 90 8d 76 00 fc 6a 00 68 f0 bd 1e c2 e9 11 06 00 00 8d 76 00 fc 6a 00 68 54 c5 1e c2 e9 01 06 00 00 <8d> 76 00 fc 68 b0 e9 1e c2 e9 f3 05 00 00 66 90 8d 76 00 fc 6a 00

decodes to:

   0:   bf 1e c2 e9 23          mov    $0x23e9c21e,%edi
   5:   06                      (bad)
   6:   00 00                   add    %al,(%rax)
   8:   66 90                   xchg   %ax,%ax
asm_exc_invalid_op:
   a:   8d 76 00                lea    0x0(%rsi),%esi
   d:   fc                      cld
   e:   6a 00                   push   $0x0
  10:   68 f0 bd 1e c2          push   $0xffffffffc21ebdf0
  15:   e9 11 06 00 00          jmp    0x62b
asm_exc_int3:
  1a:   8d 76 00                lea    0x0(%rsi),%esi
  1d:   fc                      cld
  1e:   6a 00                   push   $0x0
  20:   68 54 c5 1e c2          push   $0xffffffffc21ec554
  25:   e9 01 06 00 00          jmp    0x62b
asm_exc_page_fault:
  2a:*  8d 76 00                lea    0x0(%rsi),%esi           <-- trapping instruction
  2d:   fc                      cld
  2e:   68 b0 e9 1e c2          push   $0xffffffffc21ee9b0
  33:   e9 f3 05 00 00          jmp    0x62b
  38:   66 90                   xchg   %ax,%ax
asm_exc_machine_check:
  3a:   8d 76 00                lea    0x0(%rsi),%esi
  3d:   fc                      cld
  3e:   6a 00                   push   $0x0

And that trapping instruction is the CLAC nop (still a nop in the
faulting kernel image):

(gdb) disassemble asm_exc_page_fault
Dump of assembler code for function asm_exc_page_fault:
   0xc2200350 <+0>:     lea    0x0(%esi),%esi
   0xc2200353 <+3>:     cld
   0xc2200354 <+4>:     push   $0xc21ee9b0
   0xc2200359 <+9>:     jmp    0xc2200951 <handle_exception>
End of assembler dump.
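
As a cross-check between the Code: bytes and the gdb listing, the target of the near jmp can be recomputed by hand. This is a sketch that only handles the 5-byte "e9 rel32" encoding; the helper name is made up, while the address and bytes are the ones shown above.

```python
import struct

def jmp_rel32_target(insn_addr: int, insn_bytes: bytes) -> int:
    """Target of a 5-byte 'e9 rel32' jmp located at insn_addr (32-bit wrap)."""
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE9:
        raise ValueError("not an e9 rel32 jmp")
    (rel,) = struct.unpack("<i", insn_bytes[1:5])  # signed little-endian displacement
    # The displacement is relative to the address of the next instruction.
    return (insn_addr + 5 + rel) & 0xFFFFFFFF

# jmp at asm_exc_page_fault+9, bytes "e9 f3 05 00 00" from the Code: line
target = jmp_rel32_target(0xC2200359, bytes.fromhex("e9f3050000"))
print(hex(target))  # 0xc2200951
```

The result agrees with the handle_exception address that gdb prints, so the bytes in the oops and the live image are consistent for this instruction.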

And then we have the endless stream of:

  asm_exc_int3+0x10/0x10

which really is: asm_exc_page_fault+0x0/0x10, but that cannot be,
because then we'd have #DF much sooner.
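
The offset arithmetic behind that statement can be reproduced from the Code: line alone. A minimal sketch: the helper name is illustrative, and the two label offsets are read off the decode above rather than computed.

```python
CODE = (
    "bf 1e c2 e9 23 06 00 00 66 90 8d 76 00 fc 6a 00 68 f0 bd 1e c2 "
    "e9 11 06 00 00 8d 76 00 fc 6a 00 68 54 c5 1e c2 e9 01 06 00 00 "
    "<8d> 76 00 fc 68 b0 e9 1e c2 e9 f3 05 00 00 66 90 8d 76 00 fc 6a 00"
)

def trap_offset(code_line: str) -> int:
    """Byte offset of the <xx> marker, i.e. the trapping instruction."""
    for index, token in enumerate(code_line.split()):
        if token.startswith("<") and token.endswith(">"):
            return index
    raise ValueError("no <..> marker in Code: line")

# Offsets of the labels within the dump, taken from the decode above.
ASM_EXC_INT3 = 0x1A
ASM_EXC_PAGE_FAULT = 0x2A

off = trap_offset(CODE)
print(hex(off))                   # 0x2a
print(hex(off - ASM_EXC_INT3))    # 0x10
print(off == ASM_EXC_PAGE_FAULT)  # True
```

So the address the backtrace prints as asm_exc_int3+0x10 is exactly the first byte of asm_exc_page_fault.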


The restore_all_switch_stack+0x65/0xe6 thing looks like so in the live
kernel image:

(gdb) disassemble restore_all_switch_stack
Dump of assembler code for function entry_INT80_32:
...
   0xc22008c5 <+353>:   mov    %cr3,%eax
   0xc22008c8 <+356>:   or     $0x1000,%eax
   0xc22008cd <+361>:   mov    %eax,%cr3
   0xc22008d0 <+364>:   mov    %esi,%esi		<--- here
   0xc22008d2 <+366>:   testl  $0x2,0x34(%esp)
   0xc22008da <+374>:   je     0xc22008e8 <entry_INT80_32+388>
   0xc22008dc <+376>:   mov    %cr3,%eax
   0xc22008df <+379>:   test   $0x1000,%eax
   0xc22008e4 <+384>:   jne    0xc22008e8 <entry_INT80_32+388>
   0xc22008e6 <+386>:   ud2
   0xc22008e8 <+388>:   pop    %ebx
...

So that is indeed BUG_IF_WRONG_CR3 and the JMP got patched to a NOP2.
Nothing strange there.


So yeah, no clue still.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-31  9:11                       ` Peter Zijlstra
@ 2024-07-31 10:02                         ` Borislav Petkov
  0 siblings, 0 replies; 59+ messages in thread
From: Borislav Petkov @ 2024-07-31 10:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Guenter Roeck, Jens Axboe, Andy Lutomirski,
	Ingo Molnar, Peter Anvin, Linux Kernel Mailing List,
	the arch/x86 maintainers

Just a data point:

gcc-11 (Debian 11.2.0-19) 11.2.0 - does NOT repro.

Upgrading to

gcc-11 (Debian 11.5.0-1) 11.5.0

*does* repro.

Fun.
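
Putting the compiler reports from this thread side by side narrows the window. The table below is only a sketch collecting the results posted so far (Guenter, Peter and this message); the builds come from different setups, so it is indicative rather than a controlled comparison.

```python
# (major, minor) -> True if the kernel booted fine, False if it crashed,
# as reported in this thread.
results = {
    (9, 4): True,
    (10, 3): True,
    (10, 5): True,
    (11, 2): True,
    (11, 4): False,
    (11, 5): False,
    (12, 4): True,
    (13, 2): True,
    (13, 3): True,
}

versions = sorted(results)
bad = [v for v in versions if not results[v]]

# Nearest working versions on either side of the failing range.
last_good_before = max(v for v in versions if results[v] and v < bad[0])
first_good_after = min(v for v in versions if results[v] and v > bad[-1])

print("failing:", bad)
print("last good before:", last_good_before)
print("first good after:", first_good_after)
```

On these reports, whatever matters changed somewhere between 11.2 and 11.4 and is gone again by 12.4.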

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 18:53     ` Linus Torvalds
  2024-07-30 19:22       ` Peter Zijlstra
@ 2024-07-31 10:33       ` Peter Zijlstra
  2024-07-31 14:15         ` Peter Zijlstra
  1 sibling, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-31 10:33 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Andy Lutomirski, Ingo Molnar, Peter Anvin,
	Linux Kernel Mailing List, Jens Axboe, the arch/x86 maintainers

On Tue, Jul 30, 2024 at 11:53:31AM -0700, Linus Torvalds wrote:

> Definitely something wrong with the page tables. But where that
> wrongness comes from, I have no idea.

[   10.231081] CR0: 80050033 CR2: ffa02ffc CR3: 02bc6000 CR4: 000006f0

See CR3 being a user address.... but yeah, million dollar question is
how the fuck did that happen?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 20:24             ` Guenter Roeck
@ 2024-07-31 12:20               ` Peter Zijlstra
  2024-07-31 13:03                 ` Thomas Gleixner
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-31 12:20 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Jens Axboe, Linus Torvalds, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On Tue, Jul 30, 2024 at 01:24:34PM -0700, Guenter Roeck wrote:
> An interesting bit of information: The problem is seen with many,
> but not all CPUs. For example, I don't see it with athlon, n270, Dhyana,
> or EPYC. qemu32 is affected, but qemu64 is fine. But on the other side
> both kvm32 and kvm64 are affected.

pti=off makes it go away, could be those CPU models don't have meltdown
and as such don't enable PTI.
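
Whether a given (virtual) CPU ends up with PTI can be checked from inside the guest via the Meltdown entry in sysfs. A small sketch, assuming the usual status strings ("Not affected", "Vulnerable", "Mitigation: PTI"); the helper name is illustrative.

```python
from pathlib import Path

MELTDOWN = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")

def pti_in_use(status: str) -> bool:
    """True when the kernel reports PTI as the active Meltdown mitigation."""
    return status.strip().startswith("Mitigation: PTI")

try:
    status = MELTDOWN.read_text()
    print(status.strip(), "->", "PTI on" if pti_in_use(status) else "PTI off")
except OSError:
    # Not Linux, or the entry is not exposed on this system.
    print("meltdown status not available")
```

If the unaffected CPU models report "Not affected" there, that would line up with pti=off making the crash go away.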

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-31 12:20               ` Peter Zijlstra
@ 2024-07-31 13:03                 ` Thomas Gleixner
  2024-07-31 15:55                   ` Peter Zijlstra
  0 siblings, 1 reply; 59+ messages in thread
From: Thomas Gleixner @ 2024-07-31 13:03 UTC (permalink / raw)
  To: Peter Zijlstra, Guenter Roeck
  Cc: Jens Axboe, Linus Torvalds, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On Wed, Jul 31 2024 at 14:20, Peter Zijlstra wrote:
> On Tue, Jul 30, 2024 at 01:24:34PM -0700, Guenter Roeck wrote:
>> An interesting bit of information: The problem is seen with many,
>> but not all CPUs. For example, I don't see it with athlon, n270, Dhyana,
>> or EPYC. qemu32 is affected, but qemu64 is fine. But on the other side
>> both kvm32 and kvm64 are affected.
>
> pti=off makes it go away, could be those CPU models don't have meltdown
> and as such don't enable PTI.

The AMD ones don't have meltdown and neither does n270 which is an
in-order atom.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: Linux 6.11-rc1
  2024-07-30 23:54                   ` Linus Torvalds
  2024-07-31  8:21                     ` Borislav Petkov
@ 2024-07-31 13:24                     ` Jens Axboe
  1 sibling, 0 replies; 59+ messages in thread
From: Jens Axboe @ 2024-07-31 13:24 UTC (permalink / raw)
  To: Linus Torvalds, Guenter Roeck
  Cc: Peter Zijlstra, Andy Lutomirski, Ingo Molnar, Peter Anvin,
	Linux Kernel Mailing List, the arch/x86 maintainers

On 7/30/24 5:54 PM, Linus Torvalds wrote:
> You also seemed to say that it only happened with some CPU selections.
> Maybe there's something wrong with the ALTERNATIVE() cleanups - I'm
> looking at that new "nested alternatives macros" thing, and the odd
> games we play with the origin and replacement lengths etc.
> 
> That all looks entirely crazy. That file was hard to read before, now
> it's just incomprehensible to me.

As I reported earlier, I already tried with the alternative cleanups
reverted, and it made zero difference - it still goes boom in very much
the same way.

-- 
Jens Axboe



* Re: Linux 6.11-rc1
  2024-07-31 10:33       ` Peter Zijlstra
@ 2024-07-31 14:15         ` Peter Zijlstra
  0 siblings, 0 replies; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-31 14:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Andy Lutomirski, Ingo Molnar, Peter Anvin,
	Linux Kernel Mailing List, Jens Axboe, the arch/x86 maintainers

On Wed, Jul 31, 2024 at 12:33:32PM +0200, Peter Zijlstra wrote:
> On Tue, Jul 30, 2024 at 11:53:31AM -0700, Linus Torvalds wrote:
> 
> > Definitely something wrong with the page tables. But where that
> > wrongness comes from, I have no idea.
> 
> [   10.231081] CR0: 80050033 CR2: ffa02ffc CR3: 02bc6000 CR4: 000006f0
> 
> See CR3 being a user address.... but yeah, million dollar question is
> how the fuck did that happen?

Thomas just reminded me that CR3 is physical... duh.


* Re: Linux 6.11-rc1
  2024-07-31  8:21                     ` Borislav Petkov
  2024-07-31  9:11                       ` Peter Zijlstra
@ 2024-07-31 14:37                       ` Guenter Roeck
  1 sibling, 0 replies; 59+ messages in thread
From: Guenter Roeck @ 2024-07-31 14:37 UTC (permalink / raw)
  To: Borislav Petkov, Linus Torvalds
  Cc: Peter Zijlstra, Jens Axboe, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On 7/31/24 01:21, Borislav Petkov wrote:
> On Tue, Jul 30, 2024 at 04:54:43PM -0700, Linus Torvalds wrote:
>> You also seemed to say that it only happened with some CPU selections.
>> Maybe there's something wrong with the ALTERNATIVE() cleanups - I'm
>> looking at that new "nested alternatives macros" thing, and the odd
>> games we play with the origin and replacement lengths etc.
>>
>> That all looks entirely crazy. That file was hard to read before, now
>> it's just incomprehensible to me.
> 
> I'm sorry to hear that. The reason we did it is because it was starting to
> become really unwieldy to add yet another alternative choice N in an
> ALTERNATIVE_N call...
> 
> Anyway, I'll try to reproduce here. In the meantime, can anyone who can
> reproduce - Guenter, Jens - boot that failing kernel with
> 
>    debug-alternative=-1
> 
> and copy dmesg and vmlinux somewhere for me?
> 
> It is a lot of output so make sure to catch it all.
> 
> Thx.
> 

See http://server.roeck-us.net/qemu/x86-nosmp/ for images; I copied
vmlinux there as well. Various logs are in
http://server.roeck-us.net/qemu/x86-nosmp/logs/; the relevant files are:

log-n270-good		boots
log-pentium2-bad	crashes
cpu-list		List of tested CPUs, with results
			Note that Opteron_G4 and Opteron_G5 are
			broken in upstream qemu since qemu v6.1.

Hope this helps,

Guenter



* Re: Linux 6.11-rc1
  2024-07-29 19:23   ` Linus Torvalds
  2024-07-29 19:50     ` Linus Torvalds
  2024-07-30  7:54     ` Peter Zijlstra
@ 2024-07-31 15:45     ` Guenter Roeck
  2 siblings, 0 replies; 59+ messages in thread
From: Guenter Roeck @ 2024-07-31 15:45 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra, Sebastian Andrzej Siewior,
	Ingo Molnar
  Cc: Linux Kernel Mailing List

On 7/29/24 12:23, Linus Torvalds wrote:
> On Mon, 29 Jul 2024 at 08:29, Guenter Roeck <linux@roeck-us.net> wrote:
>>
>> In summary, quite impressive in a negative sense.
> 
> Grr. I think a lot of the build failures end up being due to commit
> 466e4d801cd4 ("task_work: Add TWA_NMI_CURRENT as an additional notify
> mode") depending on IRQ_WORK, and that not existing everywhere.
> 
> I pushed out a tentative fix as commit cec6937dd1aa ("task_work: make
> TWA_NMI_CURRENT handling conditional on IRQ_WORK"). I haven't set up a
> build environment for those tiny targets, but it looked fairly
> straightforward.
> 
> I think that explains at least most of the 'tinyconfig' build failures.
> 

All "tinyconfig" build tests pass with v6.11-rc1-43-g94ede2a3e913,
so that problem has been fixed.

Thanks,
Guenter



* Re: Linux 6.11-rc1
  2024-07-31 13:03                 ` Thomas Gleixner
@ 2024-07-31 15:55                   ` Peter Zijlstra
  2024-07-31 16:17                     ` Linus Torvalds
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-31 15:55 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Guenter Roeck, Jens Axboe, Linus Torvalds, Andy Lutomirski,
	Ingo Molnar, Peter Anvin, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Wed, Jul 31, 2024 at 03:03:33PM +0200, Thomas Gleixner wrote:
> On Wed, Jul 31 2024 at 14:20, Peter Zijlstra wrote:
> > On Tue, Jul 30, 2024 at 01:24:34PM -0700, Guenter Roeck wrote:
> >> An interesting bit of information: The problem is seen with many,
> >> but not all CPUs. For example, I don't see it with athlon, n270, Dhyana,
> >> or EPYC. qemu32 is affected, but qemu64 is fine. On the other hand,
> >> both kvm32 and kvm64 are affected.
> >
> > pti=off makes it go away, could be those CPU models don't have meltdown
> > and as such don't enable PTI.
> 
> The AMD ones don't have meltdown and neither does n270 which is an
> in-order atom.

Right, so Thomas found that i386-pti fails to map the entire entry text.
Specifically pti_clone_pgtable() hard relies -- and does not verify --
that the start address is aligned to the given granularity.

Now, i386 does not align __entry_text_start, and so the termination
condition goes sideways and pti_clone_entry_text() does not always work
right and it becomes a game of code layout roulette.
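
To make the failure mode concrete, here is a minimal userspace sketch (not
kernel code). It assumes a 2 MiB PMD size, as with PAE; TOY_PMD_SIZE,
toy_walk_naive() and toy_slots_needed() are made-up names for illustration
only. It steps through a range by adding a full PMD size to an unaligned
start address and records which PMD slots get visited:

```c
#include <assert.h>

/* Illustrative value only (2 MiB, as with PAE); not the kernel's definition. */
#define TOY_PMD_SIZE 0x200000UL

/*
 * Hypothetical model of the loop: visit the PMD slot containing addr,
 * then advance by a full PMD size from wherever addr happens to be.
 * Returns a bitmask of the visited slots.
 */
static unsigned long toy_walk_naive(unsigned long start, unsigned long end)
{
	unsigned long visited = 0;
	unsigned long addr;

	for (addr = start; addr < end; addr += TOY_PMD_SIZE)
		visited |= 1UL << (addr / TOY_PMD_SIZE);

	return visited;
}

/* Bitmask of every PMD slot that overlaps [start, end). */
static unsigned long toy_slots_needed(unsigned long start, unsigned long end)
{
	unsigned long needed = 0;
	unsigned long slot;

	assert(start < end);
	for (slot = start / TOY_PMD_SIZE; slot <= (end - 1) / TOY_PMD_SIZE; slot++)
		needed |= 1UL << slot;

	return needed;
}
```

With start = 0x1ff000 and end = 0x401000 the naive walk visits slots 0 and
1, then overshoots end, so slot 2 (which holds the last page of the range)
is never visited. With a PMD-aligned start the two functions agree.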

Still trying to figure out what the right fix is. I've tried page
aligning the section and using PTE cloning, and that works -- mostly. If
you hit a source PMD the clone logic still does a PMD level clone and
that might not be what we want, see the alignment thing again.

Also, should we just kill PTI on 32bit perhaps?


* Re: Linux 6.11-rc1
  2024-07-31 15:55                   ` Peter Zijlstra
@ 2024-07-31 16:17                     ` Linus Torvalds
  2024-07-31 16:31                       ` Peter Zijlstra
  2024-07-31 16:49                       ` Linux 6.11-rc1 Guenter Roeck
  0 siblings, 2 replies; 59+ messages in thread
From: Linus Torvalds @ 2024-07-31 16:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Guenter Roeck, Jens Axboe, Andy Lutomirski,
	Ingo Molnar, Peter Anvin, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Wed, 31 Jul 2024 at 08:55, Peter Zijlstra <peterz@infradead.org> wrote:
>
> Right, so Thomas found that i386-pti fails to map the entire entry text.
> Specifically pti_clone_pgtable() hard relies -- and does not verify --
> that the start address is aligned to the given granularity.
>
> Now, i386 does not align __entry_text_start, and so the termination
> condition goes sideways and pti_clone_entry_text() does not always work
> right and it becomes a game of code layout roulette.

Lovely.

> Also, should we just kill PTI on 32bit perhaps?

I don't think there's much technical reason to keep it - I can't
imagine any security-conscious people actually use 32-bit x86 any more
- but apart from fixing this bug I wonder how much of a maintenance
burden it is? I think most of the code is shared with 64-bit, isn't
it? The 32-bit case in many ways is simpler, even if it happened to
hit this odd alignment issue because it's obviously also a lot less
tested.

I'd rather kill highmem and X86_PAE, but I also suspect that horror
has a much larger chance of still being used.

The day we finally get rid of HIGHMEM I will dance on its grave. I
have hated that thing for a long long time.

              Linus


* Re: Linux 6.11-rc1
  2024-07-31 16:17                     ` Linus Torvalds
@ 2024-07-31 16:31                       ` Peter Zijlstra
  2024-07-31 16:50                         ` Guenter Roeck
                                           ` (3 more replies)
  2024-07-31 16:49                       ` Linux 6.11-rc1 Guenter Roeck
  1 sibling, 4 replies; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-31 16:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Guenter Roeck, Jens Axboe, Andy Lutomirski,
	Ingo Molnar, Peter Anvin, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Wed, Jul 31, 2024 at 09:17:44AM -0700, Linus Torvalds wrote:
> On Wed, 31 Jul 2024 at 08:55, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Right, so Thomas found that i386-pti fails to map the entire entry text.
> > Specifically pti_clone_pgtable() hard relies -- and does not verify --
> > that the start address is aligned to the given granularity.
> >
> > Now, i386 does not align __entry_text_start, and so the termination
> > condition goes sideways and pti_clone_entry_text() does not always work
> > right and it becomes a game of code layout roulette.
> 
> Lovely.

:-)

This fixes the alignment assumptions and makes it all go again.

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 2e69abf4f852..bfdf5f45b137 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -374,14 +374,14 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
 			 */
 			*target_pmd = *pmd;
 
-			addr += PMD_SIZE;
+			addr = round_up(addr + 1, PMD_SIZE);
 
 		} else if (level == PTI_CLONE_PTE) {
 
 			/* Walk the page-table down to the pte level */
 			pte = pte_offset_kernel(pmd, addr);
 			if (pte_none(*pte)) {
-				addr += PAGE_SIZE;
+				addr = round_up(addr + 1, PAGE_SIZE);
 				continue;
 			}
 
@@ -401,7 +401,7 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
 			/* Clone the PTE */
 			*target_pte = *pte;
 
-			addr += PAGE_SIZE;
+			addr = round_up(addr + 1, PAGE_SIZE);
 
 		} else {
 			BUG();
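
For reference, a small userspace sketch of the stepping rule used above
(not kernel code). toy_round_up() is a stand-in written for this example,
assuming a power-of-two alignment; it is not the kernel's round_up()
macro. TOY_PMD_SIZE (2 MiB, as with PAE) and toy_walk_fixed() are likewise
made up for illustration:

```c
#include <assert.h>

/* Illustrative value only (2 MiB, as with PAE); not the kernel's definition. */
#define TOY_PMD_SIZE 0x200000UL

/* Round x up to the next multiple of align; align must be a power of two. */
static unsigned long toy_round_up(unsigned long x, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((x - 1) | (align - 1)) + 1;
}

/*
 * Hypothetical model of the patched loop: visit the PMD slot containing
 * addr, then jump to the start of the next slot regardless of where
 * inside the current slot addr was. Returns a bitmask of visited slots.
 */
static unsigned long toy_walk_fixed(unsigned long start, unsigned long end)
{
	unsigned long visited = 0;
	unsigned long addr = start;

	while (addr < end) {
		visited |= 1UL << (addr / TOY_PMD_SIZE);
		addr = toy_round_up(addr + 1, TOY_PMD_SIZE);
	}

	return visited;
}
```

The "+ 1" is what makes an already aligned addr advance to the next
boundary instead of staying put. For start = 0x1ff000 and end = 0x401000
this visits slots 0, 1 and 2; for an aligned start it behaves the same as
adding TOY_PMD_SIZE each time.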


* Re: Linux 6.11-rc1
  2024-07-31 16:17                     ` Linus Torvalds
  2024-07-31 16:31                       ` Peter Zijlstra
@ 2024-07-31 16:49                       ` Guenter Roeck
  2024-07-31 17:19                         ` Thomas Gleixner
  1 sibling, 1 reply; 59+ messages in thread
From: Guenter Roeck @ 2024-07-31 16:49 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra
  Cc: Thomas Gleixner, Jens Axboe, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On 7/31/24 09:17, Linus Torvalds wrote:
> On Wed, 31 Jul 2024 at 08:55, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> Right, so Thomas found that i386-pti fails to map the entire entry text.
>> Specifically pti_clone_pgtable() hard relies -- and does not verify --
>> that the start address is aligned to the given granularity.
>>
>> Now, i386 does not align __entry_text_start, and so the termination
>> condition goes sideways and pti_clone_entry_text() does not always work
>> right and it becomes a game of code layout roulette.
> 
> Lovely.
> 
>> Also, should we just kill PTI on 32bit perhaps?
> 
> I don't think there's much technical reason to keep it - I can't
> imagine any security-conscious people actually use 32-bit x86 any more
> - but apart from fixing this bug I wonder how much of a maintenance
> burden it is? I think most of the code is shared with 64-bit, isn't
> it? The 32-bit case in many ways is simpler, even if it happened to
> hit this odd alignment issue because it's obviously also a lot less
> tested.
> 
> I'd rather kill highmem and X86_PAE, but I also suspect that horror
> has a much larger chance of still being used.
> 

I guess there is at least one user - me with my annoying boot tests ;-).

But seriously the question is: How likely is it for that code to find
potential problems in the 64-bit code ? pti_clone_pgtable() doesn't
seem to be 32-bit specific.

Guenter



* Re: Linux 6.11-rc1
  2024-07-31 16:31                       ` Peter Zijlstra
@ 2024-07-31 16:50                         ` Guenter Roeck
  2024-07-31 16:51                         ` Peter Zijlstra
                                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 59+ messages in thread
From: Guenter Roeck @ 2024-07-31 16:50 UTC (permalink / raw)
  To: Peter Zijlstra, Linus Torvalds
  Cc: Thomas Gleixner, Jens Axboe, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On 7/31/24 09:31, Peter Zijlstra wrote:
> On Wed, Jul 31, 2024 at 09:17:44AM -0700, Linus Torvalds wrote:
>> On Wed, 31 Jul 2024 at 08:55, Peter Zijlstra <peterz@infradead.org> wrote:
>>>
>>> Right, so Thomas found that i386-pti fails to map the entire entry text.
>>> Specifically pti_clone_pgtable() hard relies -- and does not verify --
>>> that the start address is aligned to the given granularity.
>>>
>>> Now, i386 does not align __entry_text_start, and so the termination
>>> condition goes sideways and pti_clone_entry_text() does not always work
>>> right and it becomes a game of code layout roulette.
>>
>> Lovely.
> 
> :-)
> 
> This fixes the alignment assumptions and makes it all go again.
> 

Confirmed.

Tested-by: Guenter Roeck <linux@roeck-us.net>

Thanks,
Guenter

> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
> index 2e69abf4f852..bfdf5f45b137 100644
> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -374,14 +374,14 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
>   			 */
>   			*target_pmd = *pmd;
>   
> -			addr += PMD_SIZE;
> +			addr = round_up(addr + 1, PMD_SIZE);
>   
>   		} else if (level == PTI_CLONE_PTE) {
>   
>   			/* Walk the page-table down to the pte level */
>   			pte = pte_offset_kernel(pmd, addr);
>   			if (pte_none(*pte)) {
> -				addr += PAGE_SIZE;
> +				addr = round_up(addr + 1, PAGE_SIZE);
>   				continue;
>   			}
>   
> @@ -401,7 +401,7 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
>   			/* Clone the PTE */
>   			*target_pte = *pte;
>   
> -			addr += PAGE_SIZE;
> +			addr = round_up(addr + 1, PAGE_SIZE);
>   
>   		} else {
>   			BUG();



* Re: Linux 6.11-rc1
  2024-07-31 16:31                       ` Peter Zijlstra
  2024-07-31 16:50                         ` Guenter Roeck
@ 2024-07-31 16:51                         ` Peter Zijlstra
  2024-07-31 17:26                           ` Thomas Gleixner
  2024-08-01 10:55                         ` [tip: x86/urgent] x86/mm: Fix pti_clone_pgtable() alignment assumption tip-bot2 for Peter Zijlstra
  2024-08-01 13:03                         ` tip-bot2 for Peter Zijlstra
  3 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-31 16:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Guenter Roeck, Jens Axboe, Andy Lutomirski,
	Ingo Molnar, Peter Anvin, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Wed, Jul 31, 2024 at 06:31:05PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 31, 2024 at 09:17:44AM -0700, Linus Torvalds wrote:
> > On Wed, 31 Jul 2024 at 08:55, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > Right, so Thomas found that i386-pti fails to map the entire entry text.
> > > Specifically pti_clone_pgtable() hard relies -- and does not verify --
> > > that the start address is aligned to the given granularity.
> > >
> > > Now, i386 does not align __entry_text_start, and so the termination
> > > condition goes sideways and pti_clone_entry_text() does not always work
> > > right and it becomes a game of code layout roulette.
> > 
> > Lovely.
> 
> :-)
> 
> This fixes the alignment assumptions and makes it all go again.

Thomas, this all still relies on the full text section being PMD mapped,
and since we don't have ALIGN_ENTRY_TEXT_END and _etext has PAGE_SIZE
alignment, can't we have a PAGE mapped tail which then doesn't get cloned?

Do we want to make pti_clone_entry_text() use PTI_LEVEL_KERNEL_IMAGE
such that it will clone whatever it has?

> diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
> index 2e69abf4f852..bfdf5f45b137 100644
> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -374,14 +374,14 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
>  			 */
>  			*target_pmd = *pmd;
>  
> -			addr += PMD_SIZE;
> +			addr = round_up(addr + 1, PMD_SIZE);
>  
>  		} else if (level == PTI_CLONE_PTE) {
>  
>  			/* Walk the page-table down to the pte level */
>  			pte = pte_offset_kernel(pmd, addr);
>  			if (pte_none(*pte)) {
> -				addr += PAGE_SIZE;
> +				addr = round_up(addr + 1, PAGE_SIZE);
>  				continue;
>  			}
>  
> @@ -401,7 +401,7 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
>  			/* Clone the PTE */
>  			*target_pte = *pte;
>  
> -			addr += PAGE_SIZE;
> +			addr = round_up(addr + 1, PAGE_SIZE);
>  
>  		} else {
>  			BUG();


* Re: Linux 6.11-rc1
  2024-07-31 16:49                       ` Linux 6.11-rc1 Guenter Roeck
@ 2024-07-31 17:19                         ` Thomas Gleixner
  0 siblings, 0 replies; 59+ messages in thread
From: Thomas Gleixner @ 2024-07-31 17:19 UTC (permalink / raw)
  To: Guenter Roeck, Linus Torvalds, Peter Zijlstra
  Cc: Jens Axboe, Andy Lutomirski, Ingo Molnar, Peter Anvin,
	Linux Kernel Mailing List, the arch/x86 maintainers

On Wed, Jul 31 2024 at 09:49, Guenter Roeck wrote:
> On 7/31/24 09:17, Linus Torvalds wrote:
> I guess there is at least one user - me with my annoying boot tests ;-).
>
> But seriously the question is: How likely is it for that code to find
> potential problems in the 64-bit code ? pti_clone_pgtable() doesn't
> seem to be 32-bit specific.

64-bit does not have the problem because everything is PMD aligned.

Thanks,

        tglx


* Re: Linux 6.11-rc1
  2024-07-31 16:51                         ` Peter Zijlstra
@ 2024-07-31 17:26                           ` Thomas Gleixner
  2024-07-31 21:20                             ` Peter Zijlstra
  0 siblings, 1 reply; 59+ messages in thread
From: Thomas Gleixner @ 2024-07-31 17:26 UTC (permalink / raw)
  To: Peter Zijlstra, Linus Torvalds
  Cc: Guenter Roeck, Jens Axboe, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On Wed, Jul 31 2024 at 18:51, Peter Zijlstra wrote:
> On Wed, Jul 31, 2024 at 06:31:05PM +0200, Peter Zijlstra wrote:
> Thomas, this all still relies on the full text section being PMD mapped,
> and since we don't have ALIGN_ENTRY_TEXT_END and _etext has PAGE_SIZE
> alignment, can't we have a PAGE mapped tail which then doesn't get cloned?
>
> Do we want to make pti_clone_entry_text() use PTI_LEVEL_KERNEL_IMAGE
> such that it will clone whatever it has?

Yes, I think so.


* Re: Linux 6.11-rc1
  2024-07-31 17:26                           ` Thomas Gleixner
@ 2024-07-31 21:20                             ` Peter Zijlstra
  2024-07-31 21:23                               ` Linus Torvalds
  2024-07-31 22:22                               ` Guenter Roeck
  0 siblings, 2 replies; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-31 21:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Guenter Roeck, Jens Axboe, Andy Lutomirski,
	Ingo Molnar, Peter Anvin, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Wed, Jul 31, 2024 at 07:26:04PM +0200, Thomas Gleixner wrote:
> On Wed, Jul 31 2024 at 18:51, Peter Zijlstra wrote:
> > On Wed, Jul 31, 2024 at 06:31:05PM +0200, Peter Zijlstra wrote:
> > Thomas, this all still relies on the full text section being PMD mapped,
> > and since we don't have ALIGN_ENTRY_TEXT_END and _etext has PAGE_SIZE
> > alignment, can't we have a PAGE mapped tail which then doesn't get cloned?
> >
> > Do we want to make pti_clone_entry_text() use PTI_LEVEL_KERNEL_IMAGE
> > such that it will clone whatever it has?
> 
> Yes, I think so.

The alternative is ripping that level thing out entirely, and simply
duplicate anything we find in the page-tables.

We could add something like:

	WARN_ON_ONCE(IS_ENABLED(CONFIG_X86_64));

in the PTE path, but do we really care?

---
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -47,16 +47,6 @@
 #define __GFP_NOTRACK	0
 #endif
 
-/*
- * Define the page-table levels we clone for user-space on 32
- * and 64 bit.
- */
-#ifdef CONFIG_X86_64
-#define	PTI_LEVEL_KERNEL_IMAGE	PTI_CLONE_PMD
-#else
-#define	PTI_LEVEL_KERNEL_IMAGE	PTI_CLONE_PTE
-#endif
-
 static void __init pti_print_if_insecure(const char *reason)
 {
 	if (boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
@@ -294,14 +284,7 @@ static void __init pti_setup_vsyscall(vo
 static void __init pti_setup_vsyscall(void) { }
 #endif
 
-enum pti_clone_level {
-	PTI_CLONE_PMD,
-	PTI_CLONE_PTE,
-};
-
-static void
-pti_clone_pgtable(unsigned long start, unsigned long end,
-		  enum pti_clone_level level)
+static void pti_clone_pgtable(unsigned long start, unsigned long end)
 {
 	unsigned long addr;
 
@@ -341,7 +324,7 @@ pti_clone_pgtable(unsigned long start, u
 			continue;
 		}
 
-		if (pmd_leaf(*pmd) || level == PTI_CLONE_PMD) {
+		if (pmd_leaf(*pmd)) {
 			target_pmd = pti_user_pagetable_walk_pmd(addr);
 			if (WARN_ON(!target_pmd))
 				return;
@@ -375,37 +358,33 @@ pti_clone_pgtable(unsigned long start, u
 			*target_pmd = *pmd;
 
 			addr = round_up(addr + 1, PMD_SIZE);
+			continue;
+		}
 
-		} else if (level == PTI_CLONE_PTE) {
-
-			/* Walk the page-table down to the pte level */
-			pte = pte_offset_kernel(pmd, addr);
-			if (pte_none(*pte)) {
-				addr = round_up(addr + 1, PAGE_SIZE);
-				continue;
-			}
-
-			/* Only clone present PTEs */
-			if (WARN_ON(!(pte_flags(*pte) & _PAGE_PRESENT)))
-				return;
+		/* Walk the page-table down to the pte level */
+		pte = pte_offset_kernel(pmd, addr);
+		if (pte_none(*pte)) {
+			addr = round_up(addr + 1, PAGE_SIZE);
+			continue;
+		}
 
-			/* Allocate PTE in the user page-table */
-			target_pte = pti_user_pagetable_walk_pte(addr);
-			if (WARN_ON(!target_pte))
-				return;
+		/* Only clone present PTEs */
+		if (WARN_ON(!(pte_flags(*pte) & _PAGE_PRESENT)))
+			return;
 
-			/* Set GLOBAL bit in both PTEs */
-			if (boot_cpu_has(X86_FEATURE_PGE))
-				*pte = pte_set_flags(*pte, _PAGE_GLOBAL);
+		/* Allocate PTE in the user page-table */
+		target_pte = pti_user_pagetable_walk_pte(addr);
+		if (WARN_ON(!target_pte))
+			return;
 
-			/* Clone the PTE */
-			*target_pte = *pte;
+		/* Set GLOBAL bit in both PTEs */
+		if (boot_cpu_has(X86_FEATURE_PGE))
+			*pte = pte_set_flags(*pte, _PAGE_GLOBAL);
 
-			addr = round_up(addr + 1, PAGE_SIZE);
+		/* Clone the PTE */
+		*target_pte = *pte;
 
-		} else {
-			BUG();
-		}
+		addr = round_up(addr + 1, PAGE_SIZE);
 	}
 }
 
@@ -475,7 +454,7 @@ static void __init pti_clone_user_shared
 	start = CPU_ENTRY_AREA_BASE;
 	end   = start + (PAGE_SIZE * CPU_ENTRY_AREA_PAGES);
 
-	pti_clone_pgtable(start, end, PTI_CLONE_PMD);
+	pti_clone_pgtable(start, end);
 }
 #endif /* CONFIG_X86_64 */
 
@@ -495,8 +474,7 @@ static void __init pti_setup_espfix64(vo
 static void pti_clone_entry_text(void)
 {
 	pti_clone_pgtable((unsigned long) __entry_text_start,
-			  (unsigned long) __entry_text_end,
-			  PTI_CLONE_PMD);
+			  (unsigned long) __entry_text_end);
 }
 
 /*
@@ -571,7 +549,7 @@ static void pti_clone_kernel_text(void)
 	 * pti_set_kernel_image_nonglobal() did to clear the
 	 * global bit.
 	 */
-	pti_clone_pgtable(start, end_clone, PTI_LEVEL_KERNEL_IMAGE);
+	pti_clone_pgtable(start, end_clone);
 
 	/*
 	 * pti_clone_pgtable() will set the global bit in any PMDs


* Re: Linux 6.11-rc1
  2024-07-31 21:20                             ` Peter Zijlstra
@ 2024-07-31 21:23                               ` Linus Torvalds
  2024-07-31 21:26                                 ` Peter Zijlstra
  2024-07-31 22:22                               ` Guenter Roeck
  1 sibling, 1 reply; 59+ messages in thread
From: Linus Torvalds @ 2024-07-31 21:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Guenter Roeck, Jens Axboe, Andy Lutomirski,
	Ingo Molnar, Peter Anvin, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Wed, 31 Jul 2024 at 14:20, Peter Zijlstra <peterz@infradead.org> wrote:
>
> The alternative is ripping that level thing out entirely, and simply
> duplicate anything we find in the page-tables.

That looks clean to me, and don't we want to clone the minimal range
anyway - even on x86-64?

           Linus


* Re: Linux 6.11-rc1
  2024-07-31 21:23                               ` Linus Torvalds
@ 2024-07-31 21:26                                 ` Peter Zijlstra
  2024-07-31 21:41                                   ` Linus Torvalds
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2024-07-31 21:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Guenter Roeck, Jens Axboe, Andy Lutomirski,
	Ingo Molnar, Peter Anvin, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Wed, Jul 31, 2024 at 02:23:02PM -0700, Linus Torvalds wrote:
> On Wed, 31 Jul 2024 at 14:20, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > The alternative is ripping that level thing out entirely, and simply
> > duplicate anything we find in the page-tables.
> 
> That looks clean to me, and don't we want to clone the minimal range
> anyway - even on x86-64?

x86_64 has everything PMD aligned. It *should* never encounter a PTE.

Also, this thing blindly clones the format the kernel page-tables have,
it will not split a PMD into multiple PTE entries just to clone a
smaller range. It is super simple.
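
As a rough illustration of "clones whatever format it finds", here is a
userspace toy model (not kernel code). It assumes 4 KiB pages and 2 MiB
PMDs, and it represents the kernel page tables as nothing more than an
array saying whether each PMD slot is a large-page leaf; toy_clone(),
struct toy_clone_stats and the TOY_* constants are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; not the kernel's definitions. */
#define TOY_PAGE_SIZE 0x1000UL
#define TOY_PMD_SIZE  0x200000UL

/* Power-of-two round-up, written for this example. */
#define TOY_ROUND_UP(x, y) ((((x) - 1) | ((y) - 1)) + 1)

struct toy_clone_stats {
	unsigned int pmd_clones;	/* whole large-page entries copied */
	unsigned int pte_clones;	/* individual page entries copied */
};

/*
 * Walk [start, end). pmd_is_leaf[i] says whether toy PMD slot i is a
 * large-page mapping. A leaf is copied whole, even if the range covers
 * only part of it; otherwise one page-sized entry is copied per step.
 */
static struct toy_clone_stats
toy_clone(const bool *pmd_is_leaf, size_t nr_pmds,
	  unsigned long start, unsigned long end)
{
	struct toy_clone_stats st = { 0, 0 };
	unsigned long addr = start;

	while (addr < end) {
		size_t slot = addr / TOY_PMD_SIZE;

		assert(slot < nr_pmds);

		if (pmd_is_leaf[slot]) {
			st.pmd_clones++;
			addr = TOY_ROUND_UP(addr + 1, TOY_PMD_SIZE);
			continue;
		}

		st.pte_clones++;
		addr = TOY_ROUND_UP(addr + 1, TOY_PAGE_SIZE);
	}

	return st;
}
```

With slot 0 a leaf and slot 1 page-mapped, cloning 0x1ff000..0x203000
copies the whole of slot 0 as one entry (although only its last page was
asked for) and then three individual pages from slot 1.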



* Re: Linux 6.11-rc1
  2024-07-31 21:26                                 ` Peter Zijlstra
@ 2024-07-31 21:41                                   ` Linus Torvalds
  2024-07-31 21:47                                     ` Thomas Gleixner
  0 siblings, 1 reply; 59+ messages in thread
From: Linus Torvalds @ 2024-07-31 21:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Guenter Roeck, Jens Axboe, Andy Lutomirski,
	Ingo Molnar, Peter Anvin, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Wed, 31 Jul 2024 at 14:26, Peter Zijlstra <peterz@infradead.org> wrote:
>
> x86_64 has everything PMD aligned. It *should* never encounter a PTE.

Ahh. I thought it only aligned the beginning, but yeah, I see that
ALIGN_ENTRY_TEXT_END is also PMD_SIZE aligned.

That smells of wasted memory, but I guess the TLB advantage is worth it.

         Linus


* Re: Linux 6.11-rc1
  2024-07-31 21:41                                   ` Linus Torvalds
@ 2024-07-31 21:47                                     ` Thomas Gleixner
  0 siblings, 0 replies; 59+ messages in thread
From: Thomas Gleixner @ 2024-07-31 21:47 UTC (permalink / raw)
  To: Linus Torvalds, Peter Zijlstra
  Cc: Guenter Roeck, Jens Axboe, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On Wed, Jul 31 2024 at 14:41, Linus Torvalds wrote:
> On Wed, 31 Jul 2024 at 14:26, Peter Zijlstra <peterz@infradead.org> wrote:
>>
>> x86_64 has everything PMD aligned. It *should* never encounter a PTE.
>
> Ahh. I thought it only aligned the beginning, but yeah, I see that
> ALIGN_ENTRY_TEXT_END is also PMD_SIZE aligned.
>
> That smells of wasted memory, but I guess the TLB advantage is worth it.

It definitely is.

Thanks,

        tglx


* Re: Linux 6.11-rc1
  2024-07-31 21:20                             ` Peter Zijlstra
  2024-07-31 21:23                               ` Linus Torvalds
@ 2024-07-31 22:22                               ` Guenter Roeck
  2024-08-01  8:54                                 ` Peter Zijlstra
  1 sibling, 1 reply; 59+ messages in thread
From: Guenter Roeck @ 2024-07-31 22:22 UTC (permalink / raw)
  To: Peter Zijlstra, Thomas Gleixner
  Cc: Linus Torvalds, Jens Axboe, Andy Lutomirski, Ingo Molnar,
	Peter Anvin, Linux Kernel Mailing List, the arch/x86 maintainers

On 7/31/24 14:20, Peter Zijlstra wrote:
> On Wed, Jul 31, 2024 at 07:26:04PM +0200, Thomas Gleixner wrote:
>> On Wed, Jul 31 2024 at 18:51, Peter Zijlstra wrote:
>>> On Wed, Jul 31, 2024 at 06:31:05PM +0200, Peter Zijlstra wrote:
>>> Thomas, this all still relies on the full text section being PMD mapped,
>>> and since we don't have ALIGN_ENTRY_TEXT_END and _etext has PAGE_SIZE
>>> alignment, can't we have a PAGE mapped tail which then doesn't get cloned?
>>>
>>> Do we want to make pti_clone_entry_text() use PTI_LEVEL_KERNEL_IMAGE
>>> such that it will clone whatever it has?
>>
>> Yes, I think so.
> 
> The alternative is ripping that level thing out entirely, and simply
> duplicate anything we find in the page-tables.
> 

The patch below (on top of the previous one, because otherwise it doesn't
apply) causes qemu to bail out hard, with

...
[    3.658327] sr 2:0:0:0: Attached scsi generic sg0 type 5
[    3.858040] sched_clock: Marking stable (3834034034, 23728553)->(3865222956, -7460369)
[    3.861469] registered taskstats version 1
[    3.861584] Loading compiled-in X.509 certificates
[    4.082031] Btrfs loaded, zoned=no, fsverity=no
[    4.096034] cryptomgr_test (69) used greatest stack depth: 6136 bytes left

No backtrace or other message, it just exits immediately.

Guenter

> We could add something like:
> 
> 	WARN_ON_ONCE(IS_ENABLED(CONFIG_X86_64));
> 
> in the PTE path, but do we really care?
> 
> ---
> --- a/arch/x86/mm/pti.c
> +++ b/arch/x86/mm/pti.c
> @@ -47,16 +47,6 @@
>   #define __GFP_NOTRACK	0
>   #endif
>   
> -/*
> - * Define the page-table levels we clone for user-space on 32
> - * and 64 bit.
> - */
> -#ifdef CONFIG_X86_64
> -#define	PTI_LEVEL_KERNEL_IMAGE	PTI_CLONE_PMD
> -#else
> -#define	PTI_LEVEL_KERNEL_IMAGE	PTI_CLONE_PTE
> -#endif
> -
>   static void __init pti_print_if_insecure(const char *reason)
>   {
>   	if (boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
> @@ -294,14 +284,7 @@ static void __init pti_setup_vsyscall(vo
>   static void __init pti_setup_vsyscall(void) { }
>   #endif
>   
> -enum pti_clone_level {
> -	PTI_CLONE_PMD,
> -	PTI_CLONE_PTE,
> -};
> -
> -static void
> -pti_clone_pgtable(unsigned long start, unsigned long end,
> -		  enum pti_clone_level level)
> +static void pti_clone_pgtable(unsigned long start, unsigned long end)
>   {
>   	unsigned long addr;
>   
> @@ -341,7 +324,7 @@ pti_clone_pgtable(unsigned long start, u
>   			continue;
>   		}
>   
> -		if (pmd_leaf(*pmd) || level == PTI_CLONE_PMD) {
> +		if (pmd_leaf(*pmd)) {
>   			target_pmd = pti_user_pagetable_walk_pmd(addr);
>   			if (WARN_ON(!target_pmd))
>   				return;
> @@ -375,37 +358,33 @@ pti_clone_pgtable(unsigned long start, u
>   			*target_pmd = *pmd;
>   
>   			addr = round_up(addr + 1, PMD_SIZE);
> +			continue;
> +		}
>   
> -		} else if (level == PTI_CLONE_PTE) {
> -
> -			/* Walk the page-table down to the pte level */
> -			pte = pte_offset_kernel(pmd, addr);
> -			if (pte_none(*pte)) {
> -				addr = round_up(addr + 1, PAGE_SIZE);
> -				continue;
> -			}
> -
> -			/* Only clone present PTEs */
> -			if (WARN_ON(!(pte_flags(*pte) & _PAGE_PRESENT)))
> -				return;
> +		/* Walk the page-table down to the pte level */
> +		pte = pte_offset_kernel(pmd, addr);
> +		if (pte_none(*pte)) {
> +			addr = round_up(addr + 1, PAGE_SIZE);
> +			continue;
> +		}
>   
> -			/* Allocate PTE in the user page-table */
> -			target_pte = pti_user_pagetable_walk_pte(addr);
> -			if (WARN_ON(!target_pte))
> -				return;
> +		/* Only clone present PTEs */
> +		if (WARN_ON(!(pte_flags(*pte) & _PAGE_PRESENT)))
> +			return;
>   
> -			/* Set GLOBAL bit in both PTEs */
> -			if (boot_cpu_has(X86_FEATURE_PGE))
> -				*pte = pte_set_flags(*pte, _PAGE_GLOBAL);
> +		/* Allocate PTE in the user page-table */
> +		target_pte = pti_user_pagetable_walk_pte(addr);
> +		if (WARN_ON(!target_pte))
> +			return;
>   
> -			/* Clone the PTE */
> -			*target_pte = *pte;
> +		/* Set GLOBAL bit in both PTEs */
> +		if (boot_cpu_has(X86_FEATURE_PGE))
> +			*pte = pte_set_flags(*pte, _PAGE_GLOBAL);
>   
> -			addr = round_up(addr + 1, PAGE_SIZE);
> +		/* Clone the PTE */
> +		*target_pte = *pte;
>   
> -		} else {
> -			BUG();
> -		}
> +		addr = round_up(addr + 1, PAGE_SIZE);
>   	}
>   }
>   
> @@ -475,7 +454,7 @@ static void __init pti_clone_user_shared
>   	start = CPU_ENTRY_AREA_BASE;
>   	end   = start + (PAGE_SIZE * CPU_ENTRY_AREA_PAGES);
>   
> -	pti_clone_pgtable(start, end, PTI_CLONE_PMD);
> +	pti_clone_pgtable(start, end);
>   }
>   #endif /* CONFIG_X86_64 */
>   
> @@ -495,8 +474,7 @@ static void __init pti_setup_espfix64(vo
>   static void pti_clone_entry_text(void)
>   {
>   	pti_clone_pgtable((unsigned long) __entry_text_start,
> -			  (unsigned long) __entry_text_end,
> -			  PTI_CLONE_PMD);
> +			  (unsigned long) __entry_text_end);
>   }
>   
>   /*
> @@ -571,7 +549,7 @@ static void pti_clone_kernel_text(void)
>   	 * pti_set_kernel_image_nonglobal() did to clear the
>   	 * global bit.
>   	 */
> -	pti_clone_pgtable(start, end_clone, PTI_LEVEL_KERNEL_IMAGE);
> +	pti_clone_pgtable(start, end_clone);
>   
>   	/*
>   	 * pti_clone_pgtable() will set the global bit in any PMDs



* Re: Linux 6.11-rc1
  2024-07-31 22:22                               ` Guenter Roeck
@ 2024-08-01  8:54                                 ` Peter Zijlstra
  0 siblings, 0 replies; 59+ messages in thread
From: Peter Zijlstra @ 2024-08-01  8:54 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Thomas Gleixner, Linus Torvalds, Jens Axboe, Andy Lutomirski,
	Ingo Molnar, Peter Anvin, Linux Kernel Mailing List,
	the arch/x86 maintainers

On Wed, Jul 31, 2024 at 03:22:53PM -0700, Guenter Roeck wrote:
> On 7/31/24 14:20, Peter Zijlstra wrote:
> > On Wed, Jul 31, 2024 at 07:26:04PM +0200, Thomas Gleixner wrote:
> > > On Wed, Jul 31 2024 at 18:51, Peter Zijlstra wrote:
> > > > On Wed, Jul 31, 2024 at 06:31:05PM +0200, Peter Zijlstra wrote:
> > > > Thomas, this all still relies on the full text section being PMD mapped,
> > > > and since we don't have ALIGN_ENTRY_TEXT_END and _etext has PAGE_SIZE
> > > > alignment, can't have a PAGE mapped tail which then doesn't get cloned?
> > > > 
> > > > Do we want to make pti_clone_entry_text() use PTI_LEVEL_KERNEL_IMAGE
> > > > such that it will clone whatever it has?
> > > 
> > > Yes, I think so.
> > 
> > The alternative is ripping that level thing out entirely, and simply
> > duplicate anything we find in the page-tables.
> > 
> 
> The patch below (on top of the previous one, because otherwise it doesn't
> apply) causes qemu to bail out hard, with
> 
> ...
> [    3.658327] sr 2:0:0:0: Attached scsi generic sg0 type 5
> [    3.858040] sched_clock: Marking stable (3834034034, 23728553)->(3865222956, -7460369)
> [    3.861469] registered taskstats version 1
> [    3.861584] Loading compiled-in X.509 certificates
> [    4.082031] Btrfs loaded, zoned=no, fsverity=no
> [    4.096034] cryptomgr_test (69) used greatest stack depth: 6136 bytes left
> 
> No backtrace or other message, it just exits immediately.

Ha, I hadn't even compiled the thing :-) I was just wondering aloud and
in patch form if the whole level thing was worth having in the first
place.

If it lives, I'll make sure to test it.

Thanks!


* [tip: x86/urgent] x86/mm: Fix pti_clone_pgtable() alignment assumption
  2024-07-31 16:31                       ` Peter Zijlstra
  2024-07-31 16:50                         ` Guenter Roeck
  2024-07-31 16:51                         ` Peter Zijlstra
@ 2024-08-01 10:55                         ` tip-bot2 for Peter Zijlstra
  2024-08-01 13:03                         ` tip-bot2 for Peter Zijlstra
  3 siblings, 0 replies; 59+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2024-08-01 10:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Guenter Roeck, Thomas Gleixner, Peter Zijlstra (Intel), x86,
	linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     36e2dcf9840019de70e517a9df890fff316dd522
Gitweb:        https://git.kernel.org/tip/36e2dcf9840019de70e517a9df890fff316dd522
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 31 Jul 2024 18:31:05 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 01 Aug 2024 12:48:22 +02:00

x86/mm: Fix pti_clone_pgtable() alignment assumption

Guenter reported dodgy crashes on an i386-nosmp build using GCC-11
that had the form of endless traps until entry stack exhaust and then
#DF from the stack guard.

It turned out that pti_clone_pgtable() had alignment assumptions on
the start address, notably it hard assumes start is PMD aligned. This
is true on x86_64, but very much not true on i386.

These assumptions can cause the end condition to malfunction, leading
to a 'short' clone. Guess what happens when the user mapping has a
short copy of the entry text?

Use the correct increment form for addr to avoid alignment
assumptions.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240731163105.GG33588@noisy.programming.kicks-ass.net
---
 arch/x86/mm/pti.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 2e69abf..48c5032 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -374,14 +374,14 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
 			 */
 			*target_pmd = *pmd;
 
-			addr += PMD_SIZE;
+			addr = round_up(addr + 1, PMD_SIZE);
 
 		} else if (level == PTI_CLONE_PTE) {
 
 			/* Walk the page-table down to the pte level */
 			pte = pte_offset_kernel(pmd, addr);
 			if (pte_none(*pte)) {
-				addr += PAGE_SIZE;
+				addr = round_up(addr + 1, PAGE_SIZE);
 				continue;
 			}
 
@@ -401,7 +401,7 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
 			/* Clone the PTE */
 			*target_pte = *pte;
 
-			addr += PAGE_SIZE;
+			addr = round_up(addr + 1, PAGE_SIZE);
 
 		} else {
 			BUG();
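
To illustrate the 'short' clone described in the commit message, here is
a minimal userspace sketch. It is not kernel code: DEMO_PMD_SIZE,
demo_round_up() and the two demo_visits_*() helpers are hypothetical
names, and the 2 MiB span is an assumption chosen only for illustration.
Each helper counts how many PMD-sized slots a loop of the form
"for (addr = start; addr < end; ...)" would visit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PMD span for this demo (2 MiB); not taken from the kernel. */
#define DEMO_PMD_SIZE 0x200000UL

/* Round x up to the next multiple of align; align must be a power of two. */
static unsigned long demo_round_up(unsigned long x, unsigned long align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Old increment: addr keeps the start offset, so it can step past end. */
static size_t demo_visits_old(unsigned long start, unsigned long end)
{
	size_t visits = 0;
	unsigned long addr;

	for (addr = start; addr < end; addr += DEMO_PMD_SIZE)
		visits++;
	return visits;
}

/* Fixed increment: always advance to the next PMD boundary. */
static size_t demo_visits_new(unsigned long start, unsigned long end)
{
	size_t visits = 0;
	unsigned long addr;

	for (addr = start; addr < end;
	     addr = demo_round_up(addr + 1, DEMO_PMD_SIZE))
		visits++;
	return visits;
}
```

With start = 0x1ff000 and end = 0x201000, the old form visits only the
first slot, because 0x1ff000 + 0x200000 already lies past end. The
round_up() form lands on 0x200000 and visits the second slot as well.
For a PMD-aligned start both forms visit the same number of slots.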


* [tip: x86/urgent] x86/mm: Fix pti_clone_pgtable() alignment assumption
  2024-07-31 16:31                       ` Peter Zijlstra
                                           ` (2 preceding siblings ...)
  2024-08-01 10:55                         ` [tip: x86/urgent] x86/mm: Fix pti_clone_pgtable() alignment assumption tip-bot2 for Peter Zijlstra
@ 2024-08-01 13:03                         ` tip-bot2 for Peter Zijlstra
  3 siblings, 0 replies; 59+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2024-08-01 13:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Guenter Roeck, Thomas Gleixner, Peter Zijlstra (Intel), x86,
	linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     41e71dbb0e0a0fe214545fe64af031303a08524c
Gitweb:        https://git.kernel.org/tip/41e71dbb0e0a0fe214545fe64af031303a08524c
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 31 Jul 2024 18:31:05 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 01 Aug 2024 14:52:56 +02:00

x86/mm: Fix pti_clone_pgtable() alignment assumption

Guenter reported dodgy crashes on an i386-nosmp build using GCC-11
that had the form of endless traps until entry stack exhaust and then
#DF from the stack guard.

It turned out that pti_clone_pgtable() had alignment assumptions on
the start address, notably it hard assumes start is PMD aligned. This
is true on x86_64, but very much not true on i386.

These assumptions can cause the end condition to malfunction, leading
to a 'short' clone. Guess what happens when the user mapping has a
short copy of the entry text?

Use the correct increment form for addr to avoid alignment
assumptions.

Fixes: 16a3fe634f6a ("x86/mm/pti: Clone kernel-image on PTE level for 32 bit")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240731163105.GG33588@noisy.programming.kicks-ass.net
---
 arch/x86/mm/pti.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 2e69abf..48c5032 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -374,14 +374,14 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
 			 */
 			*target_pmd = *pmd;
 
-			addr += PMD_SIZE;
+			addr = round_up(addr + 1, PMD_SIZE);
 
 		} else if (level == PTI_CLONE_PTE) {
 
 			/* Walk the page-table down to the pte level */
 			pte = pte_offset_kernel(pmd, addr);
 			if (pte_none(*pte)) {
-				addr += PAGE_SIZE;
+				addr = round_up(addr + 1, PAGE_SIZE);
 				continue;
 			}
 
@@ -401,7 +401,7 @@ pti_clone_pgtable(unsigned long start, unsigned long end,
 			/* Clone the PTE */
 			*target_pte = *pte;
 
-			addr += PAGE_SIZE;
+			addr = round_up(addr + 1, PAGE_SIZE);
 
 		} else {
 			BUG();


* Re: Linux 6.11-rc1
  2024-07-29 15:29 ` Linux 6.11-rc1 Guenter Roeck
  2024-07-29 19:23   ` Linus Torvalds
  2024-07-30 17:04   ` Guenter Roeck
@ 2024-08-02 17:35   ` Linus Walleij
  2024-08-02 19:40     ` Guenter Roeck
  2 siblings, 1 reply; 59+ messages in thread
From: Linus Walleij @ 2024-08-02 17:35 UTC (permalink / raw)
  To: Guenter Roeck, Rob Herring; +Cc: Linus Torvalds, Linux Kernel Mailing List

On Mon, Jul 29, 2024 at 5:29 PM Guenter Roeck <linux@roeck-us.net> wrote:

> Failed tests:
>         arm:versatilepb:versatile_defconfig:aeabi:pci:scsi:mem128:net=default:versatile-pb:ext2
>         arm:versatilepb:versatile_defconfig:aeabi:pci:flash64:mem128:net=default:versatile-pb:ext2
>         arm:versatilepb:versatile_defconfig:aeabi:pci:mem128:net=default:versatile-pb:initrd
>         arm:versatileab:versatile_defconfig:mem128:net=default:versatile-ab:initrd

I traced these fails down to:
commit 04f08ef291d4b8d76f8d198bf2929ad43b96eecf
"arm/arm64: dts: arm: Use generic clock and regulator nodenames"

The following oneliner fixes it:

diff --git a/arch/arm/boot/dts/arm/versatile-ab.dts
b/arch/arm/boot/dts/arm/versatile-ab.dts
index 6fe6b49f5d8e..289c3d093579 100644
--- a/arch/arm/boot/dts/arm/versatile-ab.dts
+++ b/arch/arm/boot/dts/arm/versatile-ab.dts
@@ -157,7 +157,7 @@ timclk: clock-1000000 {
                        clocks = <&xtal24mhz>;
                };

-               pclk: clock-24000000 {
+               pclk: pclk@24M {
                        #clock-cells = <0>;
                        compatible = "fixed-factor-clock";
                        clock-div = <1>;

(versatile-ab is included by versatile-pb hence it regresses)

The problem is: I don't know why.

Rob: any ideas? (Perhaps some uglyhack of mine, I don't know.)

If nothing comes up I'll send an "unknown cause" oneliner revert.

Yours,
Linus Walleij


* Re: Linux 6.11-rc1
  2024-08-02 17:35   ` Linus Walleij
@ 2024-08-02 19:40     ` Guenter Roeck
  0 siblings, 0 replies; 59+ messages in thread
From: Guenter Roeck @ 2024-08-02 19:40 UTC (permalink / raw)
  To: Linus Walleij, Rob Herring; +Cc: Linus Torvalds, Linux Kernel Mailing List

On 8/2/24 10:35, Linus Walleij wrote:
> On Mon, Jul 29, 2024 at 5:29 PM Guenter Roeck <linux@roeck-us.net> wrote:
> 
>> Failed tests:
>>          arm:versatilepb:versatile_defconfig:aeabi:pci:scsi:mem128:net=default:versatile-pb:ext2
>>          arm:versatilepb:versatile_defconfig:aeabi:pci:flash64:mem128:net=default:versatile-pb:ext2
>>          arm:versatilepb:versatile_defconfig:aeabi:pci:mem128:net=default:versatile-pb:initrd
>>          arm:versatileab:versatile_defconfig:mem128:net=default:versatile-ab:initrd
> 
> I traced these fails down to:
> commit 04f08ef291d4b8d76f8d198bf2929ad43b96eecf
> "arm/arm64: dts: arm: Use generic clock and regulator nodenames"
> 
> The following oneliner fixes it:
> 
> diff --git a/arch/arm/boot/dts/arm/versatile-ab.dts
> b/arch/arm/boot/dts/arm/versatile-ab.dts
> index 6fe6b49f5d8e..289c3d093579 100644
> --- a/arch/arm/boot/dts/arm/versatile-ab.dts
> +++ b/arch/arm/boot/dts/arm/versatile-ab.dts
> @@ -157,7 +157,7 @@ timclk: clock-1000000 {
>                          clocks = <&xtal24mhz>;
>                  };
> 
> -               pclk: clock-24000000 {
> +               pclk: pclk@24M {
>                          #clock-cells = <0>;
>                          compatible = "fixed-factor-clock";
>                          clock-div = <1>;
> 
> (versatile-ab is included by versatile-pb hence it regresses)
> 
> The problem is: I don't know why.
> 
> Rob: any ideas? (Perhaps some uglyhack of mine, I don't know.)
> 

Rob already sent a patch fixing the problem.

https://lore.kernel.org/r/20240730210030.2150467-2-robh@kernel.org

Guenter



Thread overview: 59+ messages
2024-07-28 21:40 Linux 6.11-rc1 Linus Torvalds
2024-07-29  9:28 ` Build regressions/improvements in v6.11-rc1 Geert Uytterhoeven
2024-07-29  9:35   ` Geert Uytterhoeven
2024-07-29  9:54     ` Arnd Bergmann
2024-07-29 10:07       ` Geert Uytterhoeven
2024-07-29 15:29 ` Linux 6.11-rc1 Guenter Roeck
2024-07-29 19:23   ` Linus Torvalds
2024-07-29 19:50     ` Linus Torvalds
2024-07-29 21:34       ` Arnd Bergmann
2024-07-29 23:47         ` Linus Torvalds
2024-07-30 15:47           ` Arnd Bergmann
2024-07-30  7:54     ` Peter Zijlstra
2024-07-31 15:45     ` Guenter Roeck
2024-07-30 17:04   ` Guenter Roeck
2024-07-30 17:20     ` Jens Axboe
2024-07-30 18:22       ` Guenter Roeck
2024-07-30 18:35         ` Jens Axboe
2024-07-30 18:54           ` Jens Axboe
2024-07-30 18:53     ` Linus Torvalds
2024-07-30 19:22       ` Peter Zijlstra
2024-07-30 19:31         ` Jens Axboe
2024-07-30 19:34           ` Jens Axboe
2024-07-30 19:38           ` Peter Zijlstra
2024-07-30 19:41             ` Linus Torvalds
2024-07-30 20:04             ` Guenter Roeck
2024-07-30 20:09               ` Peter Zijlstra
2024-07-30 21:12                 ` Peter Zijlstra
2024-07-30 23:29                 ` Guenter Roeck
2024-07-30 23:54                   ` Linus Torvalds
2024-07-31  8:21                     ` Borislav Petkov
2024-07-31  9:11                       ` Peter Zijlstra
2024-07-31 10:02                         ` Borislav Petkov
2024-07-31 14:37                       ` Guenter Roeck
2024-07-31 13:24                     ` Jens Axboe
2024-07-30 20:13               ` Linus Torvalds
2024-07-30 20:24             ` Guenter Roeck
2024-07-31 12:20               ` Peter Zijlstra
2024-07-31 13:03                 ` Thomas Gleixner
2024-07-31 15:55                   ` Peter Zijlstra
2024-07-31 16:17                     ` Linus Torvalds
2024-07-31 16:31                       ` Peter Zijlstra
2024-07-31 16:50                         ` Guenter Roeck
2024-07-31 16:51                         ` Peter Zijlstra
2024-07-31 17:26                           ` Thomas Gleixner
2024-07-31 21:20                             ` Peter Zijlstra
2024-07-31 21:23                               ` Linus Torvalds
2024-07-31 21:26                                 ` Peter Zijlstra
2024-07-31 21:41                                   ` Linus Torvalds
2024-07-31 21:47                                     ` Thomas Gleixner
2024-07-31 22:22                               ` Guenter Roeck
2024-08-01  8:54                                 ` Peter Zijlstra
2024-08-01 10:55                         ` [tip: x86/urgent] x86/mm: Fix pti_clone_pgtable() alignment assumption tip-bot2 for Peter Zijlstra
2024-08-01 13:03                         ` tip-bot2 for Peter Zijlstra
2024-07-31 16:49                       ` Linux 6.11-rc1 Guenter Roeck
2024-07-31 17:19                         ` Thomas Gleixner
2024-07-31 10:33       ` Peter Zijlstra
2024-07-31 14:15         ` Peter Zijlstra
2024-08-02 17:35   ` Linus Walleij
2024-08-02 19:40     ` Guenter Roeck
