public inbox for linux-kernel@vger.kernel.org
From: "Russell King (Oracle)" <linux@armlinux.org.uk>
To: Robin Murphy <robin.murphy@arm.com>
Cc: netdev@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
	linux-ext4@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	dmaengine@vger.kernel.org,
	Marek Szyprowski <m.szyprowski@samsung.com>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Vinod Koul <vkoul@kernel.org>, Frank Li <Frank.Li@kernel.org>
Subject: Re: BUG: net-next (7.0-rc6 based and later) fails to boot on Jetson Xavier NX
Date: Wed, 8 Apr 2026 20:52:32 +0100
Message-ID: <adayAMR_dEA6W5vW@shell.armlinux.org.uk>
In-Reply-To: <3a1d0520-3402-47b2-9d7b-4e14a3cd07a4@arm.com>

On Wed, Apr 08, 2026 at 05:40:48PM +0100, Robin Murphy wrote:
> On 2026-04-08 5:16 pm, Russell King (Oracle) wrote:
> > On Wed, Apr 08, 2026 at 05:08:34PM +0100, Russell King (Oracle) wrote:
> > > The rebase is still progressing, but it's landed on:
> > > 
> > > c7d812e33f3e dmaengine: xilinx: xilinx_dma: Fix unmasked residue subtraction
> 
> FWIW I don't see a Tegra having the Xilinx IP in it anyway - judging by the
> DT it has their own tegra-gpcdma engine...
> 
> There's a fair chance this could be 90c5def10bea ("iommu: Do not call
> drivers for empty gathers"), which JonH also reported causing boot issues on
> Tegras - in short, SMMU TLB maintenance may not be completed properly which
> could lead to recycled DMA addresses causing exactly this kind of random
> memory corruption. I CC'd you on a patch:
> 
> https://lore.kernel.org/linux-iommu/20260408162846.GE3357077@nvidia.com/T/#t

Okay, bisect complete, and... no idea. It seems to suggest that 7.0-rc6
is actually fine - it ended up blaming Linus' tagging of 7.0-rc6, which
only changed the Makefile. So my assumption seems to be false: net-next
(which had rc6 merged into it last Thursday) fails, net-next+rc7 fails
and rc7 fails, but that does not mean rc6 itself fails.

Right, rc7 built with the same .config that rc6 was built with
definitely fails, this time with:

Root device found: PARTUUID=741c0777-391a-4bce-a222-455e180ece2a
depmod: ERROR: could not open directory /lib/modules/7.0.0-rc7-bisect: No such file or directory
depmod: FATAL: could not search modules: No such file or directory
usb 2-3: new SuperSpeed Plus Gen 2x1 USB device number 2 using tegra-xusb
hub 2-3:1.0: USB hub found
hub 2-3:1.0: 4 ports detected
usb 1-3: new full-speed USB device number 3 using tegra-xusb
EXT4-fs (mmcblk0p1): VFS: Can't find ext4 filesystem
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/mmcblk0p1, missing codepage or helper program, or other error.
mount: /mnt/: can't find PARTUUID=741c0777-391a-4bce-a222-455e180ece2a.
get_swap_device: Bad swap file entry 1800c00008
get_swap_device: Bad swap file entry 1800c00008
get_swap_device: Bad swap file entry 1800c00008

So, it seems the failure appears somewhere in each of these steps:
  rc6 -> rc7
  net-next with rc5 -> net-next with rc6

However, before I test anything else, I've just built the same rc7
which failed above with your patch applied - and that boots fine.

Now, each Thursday, net-next gets updated as that's the day that the
net tree gets sent for merging into mainline. This causes net-next's
version to increase. So something that is in both current net-next and
rc7 is causing this problem.

The commit you claim needs fixing is:

$ git describe --contains 90c5def10bea
v7.0-rc7~29^2~2

which I had assumed wouldn't be in net-next.
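
For anyone wanting to repeat that check against a branch rather than a
tag, a minimal sketch follows. It relies on "git merge-base
--is-ancestor", which exits 0 when the first commit is reachable from
the second and 1 when it is not. The helper name commit_in_branch and
the branch name in the usage comment are assumptions for illustration,
not anything from this thread.

```shell
# Hypothetical helper: is commit $1 reachable from branch tip $2?
# Prints "contained", "not contained", or "error" (e.g. unknown ref).
commit_in_branch() {
    if git merge-base --is-ancestor "$1" "$2" 2>/dev/null; then
        echo "contained"
    else
        rc=$?
        # Exit status 1 means "not an ancestor"; anything else is a failure.
        if [ "$rc" -eq 1 ]; then
            echo "not contained"
        else
            echo "error"
        fi
    fi
}

# Example (the remote/branch name is an assumption):
#   commit_in_branch 90c5def10bea net-next/main
```

Run before starting a bisect, this kind of check would show whether
the suspect commit is already in the tree being tested, whatever
version that tree reports.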

Now, mainline had this on Thursday:

commit f8f5627a8aeab15183eef8930bf75ba88a51622f
Merge: 4c2c526b5adf ec7067e66119
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Apr 2 09:57:06 2026 -0700

    Merge tag 'net-7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

commit 4c2c526b5adfb580bd95316bf179327d5ee26da8
Merge: 2ec9074b28a0 8b72aa5704c7
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Thu Apr 2 09:53:16 2026 -0700

    Merge tag 'iommu-fixes-v7.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

and merging iommu-fixes-v7.0-rc6 introduced the buggy 90c5def10bea
commit into -rc7.

However, as soon as Linus merged net-7.0-rc7, netdev maintainers merged
that exact commit back into net-next:

commit 8ffb33d7709b59ff60560f48960a73bd8a55be95
Merge: 269389ba5398 f8f5627a8aea
Author: Jakub Kicinski <kuba@kernel.org>
Date:   Thu Apr 2 10:57:09 2026 -0700

    Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Thereby bringing that buggy commit into net-next, but with net-next
identifying itself as 7.0-rc6.

That's... confusing, but explains why current net-next which reports
itself as 7.0-rc6 _and_ rc7 both fail, but rc6 itself does not. It
also means I've wasted an entire afternoon running a useless bisect
between rc5 and rc6 due to the version numbers in net-next being
meaningless.
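
As a rough illustration of why the reported version says nothing about
which commits have been merged in: the version string is assembled from
four fields at the top of the kernel's top-level Makefile, and those
only change when that file itself is edited (as when a release is
tagged). A minimal sketch, assuming the fields use the usual
"NAME = value" layout; kernel_version is a hypothetical helper name,
and "make kernelversion" is the supported way to get the same answer.

```shell
# Hypothetical helper: print VERSION.PATCHLEVEL.SUBLEVEL plus EXTRAVERSION
# as read from the Makefile given in $1.
kernel_version() {
    awk -F' *= *' '
        $1 == "VERSION"      { v = $2 }
        $1 == "PATCHLEVEL"   { p = $2 }
        $1 == "SUBLEVEL"     { s = $2 }
        $1 == "EXTRAVERSION" { e = $2 }
        END { printf "%s.%s.%s%s\n", v, p, s, e }
    ' "$1"
}

# Example, run from the top of a kernel tree:
#   kernel_version Makefile
```

A merge that pulls in newer mainline commits without touching those
four lines leaves this output unchanged, which is how a tree can report
7.0-rc6 while already containing commits that only appear in rc7.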

What's the status on the iommu fix? Is it merged into mainline yet?
If it isn't already, that means net-next remains unbootable going
into the merge window without manually carrying the fix locally.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
