From: Will Deacon <will@kernel.org>
To: Veronika Kabatova <vkabatov@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Marc Zyngier <maz@kernel.org>,
Memory Management <mm-qe@redhat.com>,
skt-results-master@redhat.com, Jeff Bastian <jbastian@redhat.com>,
CKI Project <cki-project@redhat.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Jan Stancek <jstancek@redhat.com>,
Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: ❌ FAIL: Test report for kernel 5.13.0-rc4 (arm-next, 8124c8a6)
Date: Fri, 11 Jun 2021 11:34:57 +0100
Message-ID: <20210611103457.GC15274@willie-the-truck>
In-Reply-To: <CA+tGwn=BG6nUQcs_Q2a1h=UDPwR6ern2wQPwQ84kz=c2xB5-Wg@mail.gmail.com>
On Thu, Jun 10, 2021 at 01:59:12PM +0200, Veronika Kabatova wrote:
> On Thu, Jun 3, 2021 at 12:44 PM Veronika Kabatova <vkabatov@redhat.com> wrote:
> >
> > On Wed, Jun 2, 2021 at 7:10 PM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Wed, Jun 02, 2021 at 01:00:47PM +0200, Veronika Kabatova wrote:
> > > > On Wed, Jun 2, 2021 at 12:51 PM Will Deacon <will@kernel.org> wrote:
> > > > > On Wed, Jun 02, 2021 at 12:40:07PM +0200, Ard Biesheuvel wrote:
> > > > > > On Wed, 2 Jun 2021 at 12:12, Will Deacon <will@kernel.org> wrote:
> > > > > > > On Wed, Jun 02, 2021 at 01:35:01AM -0000, CKI Project wrote:
> > > > > > > > stress: stress-ng
> > > > > > >
> > > > > > > This explodes pretty badly. Some CPUs detect RCU stalls when trying to use
> > > > > > > the EFI "efi_read_time" service, which eventually fails but soon after we
> > > > > > > explode trying to access memory which I think is mapped by
> > > > > > > acpi_os_ioremap(), so it looks like the f/w might be the culprit here. Is
> > > > > > > the "HPE Apollo 70" machine known to have bad EFI firmware?
> > > > > > >
> > > > > > > https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/datawarehouse-public/2021/06/01/313156257/build_aarch64_redhat%3A1310052388/tests/stress_stress_ng/10079827_aarch64_2_dmesg.log
> > > > > > >
> > > > > > > (scroll to the end for the fireworks)
> > > > > > >
> > > > > >
> > > > > > Wow that looks pretty horrible. I take it this tree has your MAIR changes?
> > > > >
> > > > > Nope, this is just vanilla -rc4! I'm trying to get a "known good" base
> > > > > before I throw all the new things at it :)
> > > > >
> > > > > > Would be useful to have a log with efi=debug, to see what the EFI
> > > > > > memory map looks like.
> > > > >
> > > > > Veronika -- please could you help us with that?
> > > >
> > > > Sure, I'll get a rerun with that option and report back when I have any
> > > > results. I am also planning just a plain rerun on the machine to see if it
> > > > reproduces somewhat reliably, however the machine is taken up by
> > > > other automation now so it will take a while.
> > >
> > > Thanks. In the meantime, I've pushed a bunch of new stuff into for-kernelci,
> > > so I can at least see if it regresses when compared to the three failures
> > > we're seeing here.
> > >
> >
> > Hi,
> >
> > I don't have very good news so far. We did 4 targeted runs with the machine
> > and weren't able to reproduce the panic. However, there was a panic hit in
> > the new test run you should have in the inbox and it also reproduced in a
> > completely unrelated test run with *this* kernel (not the new one). In all 3
> > cases the HW model is the same, but they were all different machines.
> >
> > I'm currently doing a full run which includes all tests from the run instead
> > of just stress-ng to see if it reproduces that way - there was a panic case
> > last year (not ARM specific :) that we weren't able to pinpoint to a nice
> > reproducer and had to run multiple tests to trigger it so it's possible this
> > one is similar. I'll try to pare down the tests if this strategy works and
> > keep you updated.
> >
>
> I just wanted to follow up here. Outside of the single run I mentioned
> previously, we are still unable to reproduce the panic. We tried a lot of
> runs on the various machines of the model that hit it, with both full test
> runs and stress-ng test only.
>
> We'll still reach out if we manage to hit it in the future, but it looks like
> a race condition that's not easy to reproduce. Of course if anyone has
> an idea we should try (whether it's about reproducing or debugging what
> the problem is) we can try that.

Thanks for the follow-up, Veronika. I also noticed that it seems to have
disappeared from subsequent runs :/

Will