From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
linux-ext4@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
dmaengine@vger.kernel.org
Cc: Marek Szyprowski <m.szyprowski@samsung.com>,
Robin Murphy <robin.murphy@arm.com>,
Theodore Ts'o <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Vinod Koul <vkoul@kernel.org>, Frank Li <Frank.Li@kernel.org>
Subject: Re: BUG: net-next (7.0-rc6 based and later) fails to boot on Jetson Xavier NX
Date: Wed, 8 Apr 2026 17:08:34 +0100
Message-ID: <adZ9grUg71f518Fg@shell.armlinux.org.uk>
In-Reply-To: <adZfTi3R6jtsjXx-@shell.armlinux.org.uk>
On Wed, Apr 08, 2026 at 02:59:42PM +0100, Russell King (Oracle) wrote:
> On Wed, Apr 08, 2026 at 02:07:36PM +0100, Russell King (Oracle) wrote:
> > Hi,
> >
> > Just a heads-up that current net-next (v7.0-rc6 based) fails to boot on
> > my nVidia Jetson Xavier platform. v7.0-rc5 and v6.14 based net-next both
> > boot fine. This is an arm64 platform.
> >
> > The problem appears to be completely random in terms of its symptoms,
> > and looks like severe memory corruption - every boot seems to produce
> > a different problem. The common theme is, although the kernel gets to
> > userspace, it never gets anywhere close to a login prompt before
> > failing in some way.
> >
> > The last net-next+ boot (which is currently v7.0-rc6 based) resulted
> > in:
> >
> > tegra-mc 2c00000.memory-controller: xusb_hostw: secure write @0x00000003ffffff00: VPR violation ((null))
> > ...
> > irq 91: nobody cared (try booting with the "irqpoll" option)
> > ...
> > depmod: ERROR: could not open directory /lib/modules/7.0.0-rc6-net-next+: No such file or directory
> > ...
> > Unable to handle kernel paging request at virtual address 0003201fd50320cf
> >
> >
> > A previous boot of the exact same kernel didn't oops, but was unable
> > to find the block device to mount for /mnt via block UUID.
> >
> > A previous boot to that resulted in an oops.
> >
> >
> > The interesting thing is - the depmod error above is incorrect:
> >
> > root@tegra-ubuntu:~# ls -ld /lib/modules/7.0.0-rc6-net-next+
> > drwxrwxr-x 3 root root 4096 Apr 8 10:23 /lib/modules/7.0.0-rc6-net-next+
> >
> > The directory is definitely there, and is readable - checked after
> > booting back into net-next based on 7.0-rc5. In some of these boots,
> > stmmac hasn't probed yet, which rules out my changes.
> >
> > Rootfs is ext4, and it seems there were a lot of ext4 commits merged
> > between rc5 and rc6, but nothing for rc7.
> >
> > My current net-next head is dfecb0c5af3b. Merging rc7 on top also
> > fails, I suspect also randomly, with that I just got:
> >
> > EXT4-fs (mmcblk0p1): VFS: Can't find ext4 filesystem
> > mount: /mnt: wrong fs type, bad option, bad superblock on /dev/mmcblk0p1, missing codepage or helper program, or other error.
> > mount: /mnt/: can't find PARTUUID=741c0777-391a-4bce-a222-455e180ece2a.
> > Unable to handle kernel paging request at virtual address f9bf0011ac0fb893
> > Mem abort info:
> > ESR = 0x0000000096000004
> > EC = 0x25: DABT (current EL), IL = 32 bits
> > SET = 0, FnV = 0
> > EA = 0, S1PTW = 0
> > FSC = 0x04: level 0 translation fault
> > Data abort info:
> > ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> > CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > [f9bf0011ac0fb893] address between user and kernel address ranges
> > Internal error: Oops: 0000000096000004 [#1] SMP
> > Modules linked in:
> > CPU: 1 UID: 0 PID: 936 Comm: mount Not tainted 7.0.0-rc7-net-next+ #649 PREEMPT
> > Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
> > pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : refill_objects+0x298/0x5ec
> > lr : refill_objects+0x1f0/0x5ec
> >
> > ...
> >
> > Call trace:
> > refill_objects+0x298/0x5ec (P)
> > __pcs_replace_empty_main+0x13c/0x3a8
> > kmem_cache_alloc_noprof+0x324/0x3a0
> > alloc_iova+0x3c/0x290
> > alloc_iova_fast+0x168/0x2d4
> > iommu_dma_alloc_iova+0x84/0x154
> > iommu_dma_map_sg+0x2c4/0x538
> > __dma_map_sg_attrs+0x124/0x2c0
> > dma_map_sg_attrs+0x10/0x20
> > sdhci_pre_dma_transfer+0xb8/0x164
> > sdhci_pre_req+0x38/0x44
> > mmc_blk_mq_issue_rq+0x3dc/0x920
> > mmc_mq_queue_rq+0x104/0x2b0
> > __blk_mq_issue_directly+0x38/0xb0
> > blk_mq_request_issue_directly+0x54/0xb4
> > blk_mq_issue_direct+0x84/0x180
> > blk_mq_dispatch_queue_requests+0x1a8/0x2e0
> > blk_mq_flush_plug_list+0x60/0x140
> > __blk_flush_plug+0xe0/0x11c
> > blk_finish_plug+0x38/0x4c
> > read_pages+0x158/0x260
> > page_cache_ra_unbounded+0x158/0x3e0
> > force_page_cache_ra+0xb0/0xe4
> > page_cache_sync_ra+0x88/0x480
> > filemap_get_pages+0xd8/0x850
> > filemap_read+0xdc/0x3d8
> > blkdev_read_iter+0x84/0x198
> > vfs_read+0x208/0x2d8
> > ksys_read+0x58/0xf4
> > __arm64_sys_read+0x1c/0x28
> > invoke_syscall.constprop.0+0x50/0xe0
> > do_el0_svc+0x40/0xc0
> > el0_svc+0x48/0x2a0
> > el0t_64_sync_handler+0xa0/0xe4
> > el0t_64_sync+0x19c/0x1a0
> > Code: 54000189 f9000022 aa0203e4 b9402ae3 (f8634840)
> > ---[ end trace 0000000000000000 ]---
> > Kernel panic - not syncing: Oops: Fatal exception
> >
> > Looking at the changes between rc5 and rc6, there's one drivers/block
> > change for zram (which is used on this platform), one change in
> > drivers/base for regmap, nothing for drivers/mmc, but plenty for
> > fs/ext4. There are five DMA API changes.
> >
> > Now building straight -rc7. If that also fails, my plan is to start
> > bisecting rc5..rc6, which will likely take most of the rest of the
> > day. So, in the meantime I'm sending this as a heads-up that rc6
> > and onwards has a problem.
>
> Plain -rc7 fails (another random oops):
>
> Root device found: PARTUUID=741c0777-391a-4bce-a222-455e180ece2a
> depmod: ERROR: could not open directory /lib/modules/7.0.0-rc7-net-next+: No such file or directory
> depmod: FATAL: could not search modules: No such file or directory
> usb 2-3: new SuperSpeed Plus Gen 2x1 USB device number 2 using tegra-xusb
> hub 2-3:1.0: USB hub found
> hub 2-3:1.0: 4 ports detected
> usb 1-3: new full-speed USB device number 3 using tegra-xusb
> Unable to handle kernel paging request at virtual address 0003201fd50320cf
> Mem abort info:
> ESR = 0x0000000096000004
> EC = 0x25: DABT (current EL), IL = 32 bits
> SET = 0, FnV = 0
> EA = 0, S1PTW = 0
> FSC = 0x04: level 0 translation fault
> Data abort info:
> ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [0003201fd50320cf] address between user and kernel address ranges
> Internal error: Oops: 0000000096000004 [#1] SMP
> Modules linked in:
> CPU: 1 UID: 0 PID: 917 Comm: mount Not tainted 7.0.0-rc7-net-next+ #649 PREEMPT
> Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
> pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : refill_objects+0x298/0x5ec
> lr : refill_objects+0x1f0/0x5ec
> sp : ffff80008606b500
> x29: ffff80008606b500 x28: 0000000000000001 x27: fffffdffc20e6200
> x26: 0000000000000006 x25: 0000000000000000 x24: 000000000000003c
> x23: ffff0000809e4840 x22: ffff0000809dba00 x21: ffff80008606b5a0
> x20: ffff000081133820 x19: fffffdffc20e6220 x18: 0000000000000000
> x17: 0000000000000000 x16: 0000000000000100 x15: 0000000000000000
> x14: 0000000000000000 x13: 0000000000000000 x12: ffff800081e5faa8
> x11: ffff800082192c70 x10: ffff8000814074dc x9 : 0000000000000050
> x8 : ffff80008606b490 x7 : ffff000083988b40 x6 : ffff80008606b4a0
> x5 : 000000080015000f x4 : d503201fd503201f x3 : 00000000000000b0
> x2 : d503201fd503201f x1 : ffff000081133828 x0 : d503201fd503201f
> Call trace:
> refill_objects+0x298/0x5ec (P)
> __pcs_replace_empty_main+0x13c/0x3a8
> kmem_cache_alloc_noprof+0x324/0x3a0
> mempool_alloc_slab+0x1c/0x28
> mempool_alloc_noprof+0x98/0xe0
> bio_alloc_bioset+0x160/0x3e0
> do_mpage_readpage+0x3d0/0x618
> mpage_readahead+0xb8/0x144
> blkdev_readahead+0x18/0x24
> read_pages+0x58/0x260
> page_cache_ra_unbounded+0x158/0x3e0
> force_page_cache_ra+0xb0/0xe4
> page_cache_sync_ra+0x88/0x480
> filemap_get_pages+0xd8/0x850
> filemap_read+0xdc/0x3d8
> blkdev_read_iter+0x84/0x198
> vfs_read+0x208/0x2d8
> ksys_read+0x58/0xf4
> __arm64_sys_read+0x1c/0x28
> invoke_syscall.constprop.0+0x50/0xe0
> do_el0_svc+0x40/0xc0
> el0_svc+0x48/0x2a0
> el0t_64_sync_handler+0xa0/0xe4
> el0t_64_sync+0x19c/0x1a0
> Code: 54000189 f9000022 aa0203e4 b9402ae3 (f8634840)
> ---[ end trace 0000000000000000 ]---
>
> Now starting the bisect between 7.0-rc5 and 7.0-rc6.
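
As a side note on the oops quoted above, the faulting load can be
reconstructed from the register dump. The Python sketch below is an
illustration only. It assumes that the bracketed word on the "Code:"
line is the faulting instruction, and that the reported address has had
its top (tag) byte cleared because bit 55 is zero. The helper names
(decode_ldr_reg, effective_address, strip_tag, is_repeated_nop) are made
up for this example. Under those assumptions, the base pointer in x2
holds two copies of 0xd503201f, the AArch64 NOP encoding, rather than a
kernel address. That fits the memory corruption theory, but says nothing
about where the corruption comes from.

```python
# Values copied from the second oops (plain -rc7) quoted above.
X2 = 0xD503201FD503201F          # x2 in the register dump
W3 = 0x000000B0                  # low 32 bits of x3
FAULT_ADDR = 0x0003201FD50320CF  # "Unable to handle kernel paging request at ..."
LDR_WORD = 0xF8634840            # bracketed word on the "Code:" line

A64_NOP = 0xD503201F             # architectural encoding of the AArch64 NOP
MASK64 = (1 << 64) - 1


def decode_ldr_reg(insn):
    """Decode the fields of a 64-bit LDR (register offset) instruction."""
    if insn & 0xFFE00C00 != 0xF8600800:
        raise ValueError("not a 64-bit LDR (register offset)")
    return {
        "rt": insn & 0x1F,              # destination register
        "rn": (insn >> 5) & 0x1F,       # base register
        "rm": (insn >> 16) & 0x1F,      # index register
        "option": (insn >> 13) & 0x7,   # 0b010 means UXTW (zero-extend Wm)
        "shift": (insn >> 12) & 0x1,    # 0 means the index is not scaled
    }


def effective_address(base, index32):
    """Base plus zero-extended 32-bit index, wrapped to 64 bits."""
    return (base + (index32 & 0xFFFFFFFF)) & MASK64


def strip_tag(addr):
    """Assumption: the top byte is cleared only when bit 55 is zero."""
    if addr & (1 << 55):
        return addr
    return addr & ((1 << 56) - 1)


def is_repeated_nop(value):
    """True if both 32-bit halves of the value are the A64 NOP encoding."""
    return value == ((A64_NOP << 32) | A64_NOP)


fields = decode_ldr_reg(LDR_WORD)
ea = effective_address(X2, W3)
print("decoded fields:", fields)
print("effective address:", hex(ea))
print("matches reported fault address:", strip_tag(ea) == FAULT_ADDR)
print("x2 is two NOP encodings:", is_repeated_nop(X2))
```

The decode treats the instruction as a load through x2 indexed by w3,
which is why only those two registers from the dump are used.
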
The bisect is still progressing, but it's landed on:
c7d812e33f3e dmaengine: xilinx: xilinx_dma: Fix unmasked residue subtraction
and while this boots to a login prompt, it spat out a BUG():
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:591
in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 56, name: kworker/u24:3
preempt_count: 0, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by kworker/u24:3/56:
#0: ffff000080042148 ((wq_completion)events_unbound#2){+.+.}-{0:0}, at: process_one_work+0x184/0x780
#1: ffff80008299bdf8 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1ac/0x780
#2: ffff0000808b48f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x2c/0x188
irq event stamp: 10872
hardirqs last enabled at (10871): [<ffff80008013a410>] ktime_get+0x130/0x180
hardirqs last disabled at (10872): [<ffff800080d61ac8>] _raw_spin_lock_irqsave+0x84/0x88
softirqs last enabled at (9216): [<ffff80008002807c>] fpsimd_save_and_flush_current_state+0x3c/0x80
softirqs last disabled at (9214): [<ffff800080028098>] fpsimd_save_and_flush_current_state+0x58/0x80
CPU: 5 UID: 0 PID: 56 Comm: kworker/u24:3 Not tainted 7.0.0-rc1-bisect+ #654 PREEMPT
Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
Workqueue: events_unbound deferred_probe_work_func
Call trace:
show_stack+0x18/0x30 (C)
dump_stack_lvl+0x6c/0x94
dump_stack+0x18/0x24
__might_resched+0x154/0x220
__might_sleep+0x48/0x80
__mutex_lock+0x48/0x800
mutex_lock_nested+0x24/0x30
pinmux_disable_setting+0x9c/0x180
pinctrl_commit_state+0x5c/0x260
pinctrl_pm_select_idle_state+0x4c/0xa0
tegra_i2c_runtime_suspend+0x2c/0x3c
pm_generic_runtime_suspend+0x2c/0x44
__rpm_callback+0x48/0x1ec
rpm_callback+0x74/0x80
rpm_suspend+0xec/0x630
rpm_idle+0x2c0/0x420
__pm_runtime_idle+0x44/0x160
tegra_i2c_probe+0x2e4/0x640
platform_probe+0x5c/0xa4
really_probe+0xbc/0x2c0
__driver_probe_device+0x78/0x120
driver_probe_device+0x3c/0x160
__device_attach_driver+0xbc/0x160
bus_for_each_drv+0x70/0xb8
__device_attach+0xa4/0x188
device_initial_probe+0x50/0x54
bus_probe_device+0x38/0xa4
deferred_probe_work_func+0x90/0xcc
process_one_work+0x204/0x780
worker_thread+0x1c8/0x36c
kthread+0x138/0x144
ret_from_fork+0x10/0x20
This is reproducible.
Adding Vinod and Frank, and dmaengine mailing list.
Bisect continuing, assuming this is a "good" commit as it isn't
producing the boot failure with random memory corruption.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!