From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
	linux-ext4@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	dmaengine@vger.kernel.org
Cc: Marek Szyprowski <m.szyprowski@samsung.com>,
	Robin Murphy <robin.murphy@arm.com>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Vinod Koul <vkoul@kernel.org>, Frank Li <Frank.Li@kernel.org>
Subject: Re: BUG: net-next (7.0-rc6 based and later) fails to boot on Jetson Xavier NX
Date: Wed, 8 Apr 2026 17:08:34 +0100
Message-ID: <adZ9grUg71f518Fg@shell.armlinux.org.uk>
In-Reply-To: <adZfTi3R6jtsjXx-@shell.armlinux.org.uk>

On Wed, Apr 08, 2026 at 02:59:42PM +0100, Russell King (Oracle) wrote:
> On Wed, Apr 08, 2026 at 02:07:36PM +0100, Russell King (Oracle) wrote:
> > Hi,
> > 
> > Just a heads-up that current net-next (v7.0-rc6 based) fails to boot on
> > my nVidia Jetson Xavier platform. v7.0-rc5 and v6.14 based net-next both
> > boot fine. This is an arm64 platform.
> > 
> > The problem appears to be completely random in terms of its symptoms,
> > and looks like severe memory corruption - every boot seems to produce
> > a different problem. The common theme is, although the kernel gets to
> > userspace, it never gets anywhere close to a login prompt before
> > failing in some way.
> > 
> > The last net-next+ boot (which is currently v7.0-rc6 based) resulted
> > in:
> > 
> > tegra-mc 2c00000.memory-controller: xusb_hostw: secure write @0x00000003ffffff00: VPR violation ((null))
> > ...
> > irq 91: nobody cared (try booting with the "irqpoll" option)
> > ...
> > depmod: ERROR: could not open directory /lib/modules/7.0.0-rc6-net-next+: No such file or directory
> > ...
> > Unable to handle kernel paging request at virtual address 0003201fd50320cf
> > 
> > 
> > A previous boot of the exact same kernel didn't oops, but was unable
> > to find the block device to mount for /mnt via block UUID.
> > 
> > The boot before that resulted in an oops.
> > 
> > 
> > The interesting thing is - the depmod error above is incorrect:
> > 
> > root@tegra-ubuntu:~# ls -ld /lib/modules/7.0.0-rc6-net-next+
> > drwxrwxr-x 3 root root 4096 Apr  8 10:23 /lib/modules/7.0.0-rc6-net-next+
> > 
> > The directory is definitely there, and is readable - checked after
> > booting back into net-next based on 7.0-rc5. In some of these boots,
> > stmmac hasn't probed yet, which rules out my changes.
> > 
> > Rootfs is ext4, and it seems there were a lot of ext4 commits merged
> > between rc5 and rc6, but nothing for rc7.
> > 
> > My current net-next head is dfecb0c5af3b. Merging rc7 on top also
> > fails, I suspect just as randomly. With that merge I just got:
> > 
> > EXT4-fs (mmcblk0p1): VFS: Can't find ext4 filesystem
> > mount: /mnt: wrong fs type, bad option, bad superblock on /dev/mmcblk0p1, missing codepage or helper program, or other error.
> > mount: /mnt/: can't find PARTUUID=741c0777-391a-4bce-a222-455e180ece2a.
> > Unable to handle kernel paging request at virtual address f9bf0011ac0fb893
> > Mem abort info:
> >   ESR = 0x0000000096000004
> >   EC = 0x25: DABT (current EL), IL = 32 bits
> >   SET = 0, FnV = 0
> >   EA = 0, S1PTW = 0
> >   FSC = 0x04: level 0 translation fault
> > Data abort info:
> >   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> >   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> >   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > [f9bf0011ac0fb893] address between user and kernel address ranges
> > Internal error: Oops: 0000000096000004 [#1]  SMP
> > Modules linked in:
> > CPU: 1 UID: 0 PID: 936 Comm: mount Not tainted 7.0.0-rc7-net-next+ #649 PREEMPT
> > Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
> > pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : refill_objects+0x298/0x5ec
> > lr : refill_objects+0x1f0/0x5ec
> > 
> > ...
> > 
> > Call trace:
> >  refill_objects+0x298/0x5ec (P)
> >  __pcs_replace_empty_main+0x13c/0x3a8
> >  kmem_cache_alloc_noprof+0x324/0x3a0
> >  alloc_iova+0x3c/0x290
> >  alloc_iova_fast+0x168/0x2d4
> >  iommu_dma_alloc_iova+0x84/0x154
> >  iommu_dma_map_sg+0x2c4/0x538
> >  __dma_map_sg_attrs+0x124/0x2c0
> >  dma_map_sg_attrs+0x10/0x20
> >  sdhci_pre_dma_transfer+0xb8/0x164
> >  sdhci_pre_req+0x38/0x44
> >  mmc_blk_mq_issue_rq+0x3dc/0x920
> >  mmc_mq_queue_rq+0x104/0x2b0
> >  __blk_mq_issue_directly+0x38/0xb0
> >  blk_mq_request_issue_directly+0x54/0xb4
> >  blk_mq_issue_direct+0x84/0x180
> >  blk_mq_dispatch_queue_requests+0x1a8/0x2e0
> >  blk_mq_flush_plug_list+0x60/0x140
> >  __blk_flush_plug+0xe0/0x11c
> >  blk_finish_plug+0x38/0x4c
> >  read_pages+0x158/0x260
> >  page_cache_ra_unbounded+0x158/0x3e0
> >  force_page_cache_ra+0xb0/0xe4
> >  page_cache_sync_ra+0x88/0x480
> >  filemap_get_pages+0xd8/0x850
> >  filemap_read+0xdc/0x3d8
> >  blkdev_read_iter+0x84/0x198
> >  vfs_read+0x208/0x2d8
> >  ksys_read+0x58/0xf4
> >  __arm64_sys_read+0x1c/0x28
> >  invoke_syscall.constprop.0+0x50/0xe0
> >  do_el0_svc+0x40/0xc0
> >  el0_svc+0x48/0x2a0
> >  el0t_64_sync_handler+0xa0/0xe4
> >  el0t_64_sync+0x19c/0x1a0
> > Code: 54000189 f9000022 aa0203e4 b9402ae3 (f8634840)
> > ---[ end trace 0000000000000000 ]---
> > Kernel panic - not syncing: Oops: Fatal exception
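
For anyone reading along, the "Mem abort info" lines above are just the
ESR value split into its architectural fields. A minimal sketch of that
split, assuming the standard arm64 data-abort layout (EC in bits 31:26,
IL in bit 25, ISS in bits 24:0, with the fault status code in the low six
bits of the ISS). The function name and dictionary keys are mine, not the
kernel's:

```python
def decode_esr(esr: int) -> dict:
    """Split an arm64 ESR value into the fields shown in the oops header.

    Bit positions assume a data abort (EC 0x24/0x25).
    """
    iss = esr & 0x1FFFFFF              # ISS: bits 24:0
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class, bits 31:26
        "IL": (esr >> 25) & 0x1,       # 1 = 32-bit instruction
        "ISS": iss,
        "ISV": (iss >> 24) & 0x1,      # instruction syndrome valid
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 0x1,      # FAR not valid
        "EA": (iss >> 9) & 0x1,
        "CM": (iss >> 8) & 0x1,
        "S1PTW": (iss >> 7) & 0x1,
        "WnR": (iss >> 6) & 0x1,       # 0 = read, 1 = write
        "FSC": iss & 0x3F,             # fault status code
    }


fields = decode_esr(0x96000004)
print(hex(fields["EC"]), fields["IL"], hex(fields["FSC"]), fields["WnR"])
# prints: 0x25 1 0x4 0
```

That agrees with the log: EC = 0x25 (data abort at the current EL),
a 32-bit instruction, FSC = 0x04 (level 0 translation fault), and
WnR = 0, i.e. the fault was on a read.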
> > 
> > Looking at the changes between rc5 and rc6, there's one drivers/block
> > change for zram (which is used on this platform), one change in
> > drivers/base for regmap, nothing for drivers/mmc, but plenty for
> > fs/ext4. There are five DMA API changes.
> > 
> > Now building straight -rc7. If that also fails, my plan is to start
> > bisecting rc5..rc6, which will likely take most of the rest of the
> > day. So, in the meantime I'm sending this as a heads-up that rc6
> > and onwards has a problem.
> 
> Plain -rc7 fails (another random oops):
> 
> Root device found: PARTUUID=741c0777-391a-4bce-a222-455e180ece2a
> depmod: ERROR: could not open directory /lib/modules/7.0.0-rc7-net-next+: No such file or directory
> depmod: FATAL: could not search modules: No such file or directory
> usb 2-3: new SuperSpeed Plus Gen 2x1 USB device number 2 using tegra-xusb
> hub 2-3:1.0: USB hub found
> hub 2-3:1.0: 4 ports detected
> usb 1-3: new full-speed USB device number 3 using tegra-xusb
> Unable to handle kernel paging request at virtual address 0003201fd50320cf
> Mem abort info:
>   ESR = 0x0000000096000004
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
>   FSC = 0x04: level 0 translation fault
> Data abort info:
>   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [0003201fd50320cf] address between user and kernel address ranges
> Internal error: Oops: 0000000096000004 [#1]  SMP
> Modules linked in:
> CPU: 1 UID: 0 PID: 917 Comm: mount Not tainted 7.0.0-rc7-net-next+ #649 PREEMPT
> Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
> pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : refill_objects+0x298/0x5ec
> lr : refill_objects+0x1f0/0x5ec
> sp : ffff80008606b500
> x29: ffff80008606b500 x28: 0000000000000001 x27: fffffdffc20e6200
> x26: 0000000000000006 x25: 0000000000000000 x24: 000000000000003c
> x23: ffff0000809e4840 x22: ffff0000809dba00 x21: ffff80008606b5a0
> x20: ffff000081133820 x19: fffffdffc20e6220 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000100 x15: 0000000000000000
> x14: 0000000000000000 x13: 0000000000000000 x12: ffff800081e5faa8
> x11: ffff800082192c70 x10: ffff8000814074dc x9 : 0000000000000050
> x8 : ffff80008606b490 x7 : ffff000083988b40 x6 : ffff80008606b4a0
> x5 : 000000080015000f x4 : d503201fd503201f x3 : 00000000000000b0
> x2 : d503201fd503201f x1 : ffff000081133828 x0 : d503201fd503201f
> Call trace:
>  refill_objects+0x298/0x5ec (P)
>  __pcs_replace_empty_main+0x13c/0x3a8
>  kmem_cache_alloc_noprof+0x324/0x3a0
>  mempool_alloc_slab+0x1c/0x28
>  mempool_alloc_noprof+0x98/0xe0
>  bio_alloc_bioset+0x160/0x3e0
>  do_mpage_readpage+0x3d0/0x618
>  mpage_readahead+0xb8/0x144
>  blkdev_readahead+0x18/0x24
>  read_pages+0x58/0x260
>  page_cache_ra_unbounded+0x158/0x3e0
>  force_page_cache_ra+0xb0/0xe4
>  page_cache_sync_ra+0x88/0x480
>  filemap_get_pages+0xd8/0x850
>  filemap_read+0xdc/0x3d8
>  blkdev_read_iter+0x84/0x198
>  vfs_read+0x208/0x2d8
>  ksys_read+0x58/0xf4
>  __arm64_sys_read+0x1c/0x28
>  invoke_syscall.constprop.0+0x50/0xe0
>  do_el0_svc+0x40/0xc0
>  el0_svc+0x48/0x2a0
>  el0t_64_sync_handler+0xa0/0xe4
>  el0t_64_sync+0x19c/0x1a0
> Code: 54000189 f9000022 aa0203e4 b9402ae3 (f8634840)
> ---[ end trace 0000000000000000 ]---
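
A rough sanity check on the register dump above, as a sketch under
assumptions rather than anything established in this thread. It assumes
the faulting load uses x2 as the base and x3 as the offset (my reading of
the bracketed word in the Code: line), and that the address the kernel
prints is the raw fault address ANDed with its sign-extension from bit 55.
The helper names are hypothetical:

```python
MASK64 = (1 << 64) - 1
A64_NOP = 0xD503201F  # architectural encoding of the A64 NOP instruction


def sign_extend64(value: int, bit: int) -> int:
    """Sign-extend `value` from bit index `bit` to a 64-bit quantity."""
    low = value & ((1 << (bit + 1)) - 1)
    if low & (1 << bit):
        low |= MASK64 & ~((1 << (bit + 1)) - 1)
    return low


def reported_fault_address(addr: int) -> int:
    """Assumed model of how the printed fault address is derived."""
    return addr & sign_extend64(addr, 55)


x2 = 0xD503201FD503201F   # from the register dump
x3 = 0xB0                 # from the register dump

effective = (x2 + x3) & MASK64
print(f"{effective:016x}")                          # d503201fd50320cf
print(f"{reported_fault_address(effective):016x}")  # 0003201fd50320cf

# Both 32-bit halves of x0/x2/x4 match the A64 NOP encoding.
print([hex(h) for h in (x2 >> 32, x2 & 0xFFFFFFFF)])
# ['0xd503201f', '0xd503201f']
```

Under those assumptions the printed address 0003201fd50320cf is just
x2 + x3, and the value in x0, x2 and x4 is a pair of NOP instruction
words rather than a pointer, which would fit with the allocator picking
up overwritten memory.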
> 
> Now starting the bisect between 7.0-rc5 and 7.0-rc6.

The bisect is still progressing, but it's landed on:

c7d812e33f3e dmaengine: xilinx: xilinx_dma: Fix unmasked residue subtraction

and while this boots to a login prompt, it spat out a BUG splat:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:591
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 56, name: kworker/u24:3
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by kworker/u24:3/56:
 #0: ffff000080042148 ((wq_completion)events_unbound#2){+.+.}-{0:0}, at: process_one_work+0x184/0x780
 #1: ffff80008299bdf8 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1ac/0x780
 #2: ffff0000808b48f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x2c/0x188
irq event stamp: 10872
hardirqs last  enabled at (10871): [<ffff80008013a410>] ktime_get+0x130/0x180
hardirqs last disabled at (10872): [<ffff800080d61ac8>] _raw_spin_lock_irqsave+0x84/0x88
softirqs last  enabled at (9216): [<ffff80008002807c>] fpsimd_save_and_flush_current_state+0x3c/0x80
softirqs last disabled at (9214): [<ffff800080028098>] fpsimd_save_and_flush_current_state+0x58/0x80
CPU: 5 UID: 0 PID: 56 Comm: kworker/u24:3 Not tainted 7.0.0-rc1-bisect+ #654 PREEMPT
Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
Workqueue: events_unbound deferred_probe_work_func
Call trace:
 show_stack+0x18/0x30 (C)
 dump_stack_lvl+0x6c/0x94
 dump_stack+0x18/0x24
 __might_resched+0x154/0x220
 __might_sleep+0x48/0x80
 __mutex_lock+0x48/0x800
 mutex_lock_nested+0x24/0x30
 pinmux_disable_setting+0x9c/0x180
 pinctrl_commit_state+0x5c/0x260
 pinctrl_pm_select_idle_state+0x4c/0xa0
 tegra_i2c_runtime_suspend+0x2c/0x3c
 pm_generic_runtime_suspend+0x2c/0x44
 __rpm_callback+0x48/0x1ec
 rpm_callback+0x74/0x80
 rpm_suspend+0xec/0x630
 rpm_idle+0x2c0/0x420
 __pm_runtime_idle+0x44/0x160
 tegra_i2c_probe+0x2e4/0x640
 platform_probe+0x5c/0xa4
 really_probe+0xbc/0x2c0
 __driver_probe_device+0x78/0x120
 driver_probe_device+0x3c/0x160
 __device_attach_driver+0xbc/0x160
 bus_for_each_drv+0x70/0xb8
 __device_attach+0xa4/0x188
 device_initial_probe+0x50/0x54
 bus_probe_device+0x38/0xa4
 deferred_probe_work_func+0x90/0xcc
 process_one_work+0x204/0x780
 worker_thread+0x1c8/0x36c
 kthread+0x138/0x144
 ret_from_fork+0x10/0x20

This is reproducible.
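
To make it obvious which condition tripped the check, here is a small
sketch that parses the three header lines of the splat and lists whatever
deviates from a sleepable context. It is a simplified model of the
might_sleep() test based on my reading of the output format, not the
kernel's actual logic, and the function name is made up:

```python
import re


def atomic_context_reasons(splat: str) -> list:
    """Return the conditions in a might_sleep() splat that forbid sleeping."""
    reasons = []

    if re.search(r"irqs_disabled\(\): *[1-9]", splat):
        reasons.append("irqs_disabled")
    if re.search(r"non_block: *[1-9]", splat):
        reasons.append("non_block")

    # preempt_count is printed in hex, the RCU nest depth in decimal.
    m = re.search(r"preempt_count: *(\w+), expected: *(\w+)", splat)
    if m and int(m.group(1), 16) != int(m.group(2), 16):
        reasons.append("preempt_count")

    m = re.search(r"RCU nest depth: *(\d+), expected: *(\d+)", splat)
    if m and int(m.group(1)) != int(m.group(2)):
        reasons.append("rcu_depth")

    return reasons


splat = """\
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 56, name: kworker/u24:3
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
"""
print(atomic_context_reasons(splat))   # ['irqs_disabled']
```

So the only thing wrong here is that interrupts are disabled when
mutex_lock() is reached, which matches the "hardirqs last disabled at
_raw_spin_lock_irqsave" line in the splat.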

Adding Vinod and Frank, and dmaengine mailing list.

Bisect continuing, assuming this is a "good" commit as it isn't
producing the boot failure with random memory corruption.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
