From: Veronika Kabatova <vkabatov@redhat.com>
To: Will Deacon <will@kernel.org>
Cc: catalin marinas <catalin.marinas@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	CKI Project <cki-project@redhat.com>
Subject: Re: ❌ FAIL: Test report for kernel 5.11.0-rc7 (arm-next)
Date: Wed, 10 Feb 2021 12:07:23 -0500 (EST)
Message-ID: <864217240.27291416.1612976843835.JavaMail.zimbra@redhat.com>
In-Reply-To: <20210210160936.GA28813@willie-the-truck>



----- Original Message -----
> From: "Will Deacon" <will@kernel.org>
> To: "Veronika Kabatova" <vkabatov@redhat.com>
> Cc: "catalin marinas" <catalin.marinas@arm.com>, "CKI Project" <cki-project@redhat.com>,
> linux-arm-kernel@lists.infradead.org
> Sent: Wednesday, February 10, 2021 5:09:37 PM
> Subject: Re: ❌ FAIL: Test report for kernel 5.11.0-rc7 (arm-next)
> 
> Hi Veronika,
> 
> Thanks for the help with this.
> 
> On Wed, Feb 10, 2021 at 10:24:31AM -0500, Veronika Kabatova wrote:
> > > > On Tue, Feb 09, 2021 at 09:07:50PM -0000, CKI Project wrote:
> > > > >     Host 2:
> > > > >        ❌ Boot test
> > > > >        ⚡⚡⚡ selinux-policy: serge-testsuite
> > > > >        ⚡⚡⚡ storage: software RAID testing
> > > > >        🚧 ⚡⚡⚡ xfstests - ext4
> > > > >        🚧 ⚡⚡⚡ xfstests - xfs
> > > > >        🚧 ⚡⚡⚡ xfstests - btrfs
> > > > >        🚧 ⚡⚡⚡ IPMI driver test
> > > > >        🚧 ⚡⚡⚡ IPMItool loop stress test
> > > > >        🚧 ⚡⚡⚡ Storage blktests
> > > > >        🚧 ⚡⚡⚡ Storage block - filesystem fio test
> > > > >        🚧 ⚡⚡⚡ Storage block - queue scheduler test
> > > > >        🚧 ⚡⚡⚡ Storage nvme - tcp
> > > > >        🚧 ⚡⚡⚡ Storage: swraid mdadm raid_module test
> > > > >        🚧 ⚡⚡⚡ stress: stress-ng
> > > > 
> > > > Which system (e.g. SoC) is host 2, and are there any known infra
> > > > issues at the moment? I did push some changes which affect the early
> > > > boot path, so we may well be running into a kernel bug, but I'd just
> > > > like to make sure before we dive in trying to debug that, especially
> > > > as we haven't seen failures on other systems (and host 1 seems ok).
> > > > 
> > > 
> > > Hi, the machine in question is a Cavium ThunderX2 Sabre. It booted a
> > > stable kernel just a few days back okay. The last messages I can see in
> > > the raw console log from this run are:
> > > 
> > > EFI stub: Booting Linux Kernel...
> > > EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled
> > > EFI stub: Using DTB from configuration table
> > > EFI stub: Exiting boot services and installing virtual address map...
> > > 
> > > and then it times out after an hour and a half. I'm not aware of any
> > > ongoing issues; however, the link between the lab controller and the
> > > machines can sometimes go wrong after a reboot and lead to a
> > > similar-looking problem.
> > > 
> > > I'll resubmit the test job on that same machine to check if that was
> > > the case and let you know right after it boots.
> > > 
> > 
> > Hi, I have a few results back:
> > 
> > - resubmitted the same kernel: gets stuck in the same spot
> > - tried the new version pushed today: gets stuck in the same spot
> 
> That's odd, as I just received a pass report for that branch!
> 
> https://lore.kernel.org/r/cki.598435E2D5.M3C5MKJ1NV@redhat.com
> 
> Is it just flakey, perhaps? Obviously, that's not great either, but it will
> make bisection more challenging.
> 

We have a large number of machines (both physical and virtual) and it's
impossible to run all tests on all of them, so they are randomly picked as
long as they fit the distro and test requirements. The distribution for the
ARM tree is 1 physical and 1 "any" machine (which usually ends up being
virtual). The jobs from the report you linked ran on different machines
and didn't pick the one that failed to boot previously, so I manually
forced my testing to pick that machine to eliminate some variables.
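
A minimal sketch of the selection policy described above, for illustration
only. It is not the actual CKI scheduler: the Machine type, its fields, the
host names and pick_machines() are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    # Hypothetical record; real lab inventories carry far more metadata.
    name: str
    physical: bool
    meets_requirements: bool = True


def pick_machines(pool, rng, forced=None):
    """Pick one physical machine and one "any" machine from the pool.

    If `forced` names a machine, it takes the first slot instead of a
    random physical machine (this mirrors pinning a job to one host).
    """
    eligible = [m for m in pool if m.meets_requirements]
    if forced is not None:
        matches = [m for m in eligible if m.name == forced]
        if not matches:
            raise ValueError(f"forced machine {forced!r} is not eligible")
        first = matches[0]
    else:
        first = rng.choice([m for m in eligible if m.physical])
    # The "any" slot may be physical or virtual, but not the same host.
    second = rng.choice([m for m in eligible if m.name != first.name])
    return first, second


pool = [
    Machine("thunderx2-host", physical=True),   # assumed name
    Machine("vm-a", physical=False),            # assumed name
    Machine("vm-b", physical=False),            # assumed name
]
first, second = pick_machines(pool, random.Random(), forced="thunderx2-host")
```

Pinning the first slot removes machine choice as a variable, so a repeated
failure then points at the kernel or at that specific host.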

The machine in question can of course be somewhat flaky (hard to eliminate
that possibility completely), but I checked our historical data and it
didn't fail to boot a single time other than with these two new kernels.

> > - tried the version from last week: boots ok
> >
> > There is an extra message from the run that managed to boot, which is not
> > present with any of the runs that failed:
> > 
> > EFI stub: ERROR: FIRMWARE BUG: efi_loaded_image_t::image_base has bogus value
> > 
> > But this message is not present with the stable run that I mentioned
> > previously.
> 
> Interesting. Are those messages in the logs anywhere? It would be handy to
> include them, if possible.
> 

The messages are from before the kernel boot banner, which is the marker
we use for log inclusion (to reduce console log spam from the distro
installation, which uses a different kernel and thus makes debugging less
straightforward). The same EFI messages are present before the kernel
banner in the new report you linked, and with the passing job from the
previous runs as well:

EFI stub: Booting Linux Kernel... 
EFI stub: Using DTB from configuration table 
EFI stub: Exiting boot services and installing virtual address map... 
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0af1] 
[    0.000000] Linux version 5.11.0-rc7 (cki@runner-3uc3rmvr-project-2-concurrent-2lpn99) (aarch64-linux-gnu-gcc (GCC) 10.2.1 20200826 (Red Hat Cross 10.2.1-3), GNU ld version 2.35.1-1.fc33) #1 SMP Wed Feb 10 09:47:23 UTC 2021 
[    0.000000] efi: EFI v2.70 by EDK II 
[    0.000000] efi: SMBIOS 3.0=0x1bf760000 MEMATTR=0x1be656018 ACPI 2.0=0x1bc030000 RNG=0x1bf86cf98 MEMRESERVE=0x1bc3d3e18  
[    0.000000] efi: seeding entropy pool 
....

and

EFI stub: Booting Linux Kernel... 
EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled 
EFI stub: Using DTB from configuration table 
EFI stub: Exiting boot services and installing virtual address map... 
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0af1] 
[    0.000000] Linux version 5.11.0-rc7 (cki@runner-3uc3rmvr-project-2-concurrent-2lpn99) (aarch64-linux-gnu-gcc (GCC) 10.2.1 20200826 (Red Hat Cross 10.2.1-3), GNU ld version 2.35.1-1.fc33) #1 SMP Wed Feb 10 09:47:23 UTC 2021 
[    0.000000] efi: EFI v2.70 by American Megatrends 
....

The failing machine/kernel combos get stuck right after that last EFI
line before the kernel messages come in.
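
A rough sketch of how a raw console log could be split at the kernel banner
and classified as "booted" or "stuck after the EFI stub". This is not the CKI
tooling. It assumes the banner is the timestamped "Linux version" line and
that the last EFI stub line begins with "EFI stub: Exiting boot services";
both markers are taken from the excerpts above, and the function names are
made up for illustration.

```python
import re

# Assumed markers, copied from the console excerpts in this thread.
EFI_EXIT_PREFIX = "EFI stub: Exiting boot services"
KERNEL_BANNER = re.compile(r"^\[\s*\d+\.\d+\] Linux version ")


def split_at_banner(lines):
    """Return (pre_banner, from_banner); from_banner is empty if no banner."""
    for i, line in enumerate(lines):
        if KERNEL_BANNER.match(line):
            return lines[:i], lines[i:]
    return lines, []


def classify(lines):
    """Classify a console log by how far the boot got."""
    pre, from_banner = split_at_banner(lines)
    if from_banner:
        return "booted"
    if any(line.startswith(EFI_EXIT_PREFIX) for line in pre):
        # The stub handed over, but no kernel message ever arrived.
        return "hung-after-efi-exit"
    return "no-efi-stub-output"


failing = [
    "EFI stub: Booting Linux Kernel...",
    "EFI stub: EFI_RNG_PROTOCOL unavailable, KASLR will be disabled",
    "EFI stub: Using DTB from configuration table",
    "EFI stub: Exiting boot services and installing virtual address map...",
]
print(classify(failing))
```

Keeping the pre-banner lines instead of discarding them would make the EFI
stub messages available in reports for cases like this one.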


Let me know if I should test some other versions or if you need some
other information!

Veronika

> Cheers,
> 
> Will
> 
> 
> 

