public inbox for linux-arm-kernel@lists.infradead.org
From: Will Deacon <will@kernel.org>
To: Veronika Kabatova <vkabatov@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Marc Zyngier <maz@kernel.org>,
	Memory Management <mm-qe@redhat.com>,
	skt-results-master@redhat.com, Jeff Bastian <jbastian@redhat.com>,
	CKI Project <cki-project@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Jan Stancek <jstancek@redhat.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: ❌ FAIL: Test report for kernel 5.13.0-rc4 (arm-next, 8124c8a6)
Date: Fri, 11 Jun 2021 11:34:57 +0100
Message-ID: <20210611103457.GC15274@willie-the-truck>
In-Reply-To: <CA+tGwn=BG6nUQcs_Q2a1h=UDPwR6ern2wQPwQ84kz=c2xB5-Wg@mail.gmail.com>

On Thu, Jun 10, 2021 at 01:59:12PM +0200, Veronika Kabatova wrote:
> On Thu, Jun 3, 2021 at 12:44 PM Veronika Kabatova <vkabatov@redhat.com> wrote:
> >
> > On Wed, Jun 2, 2021 at 7:10 PM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Wed, Jun 02, 2021 at 01:00:47PM +0200, Veronika Kabatova wrote:
> > > > On Wed, Jun 2, 2021 at 12:51 PM Will Deacon <will@kernel.org> wrote:
> > > > > On Wed, Jun 02, 2021 at 12:40:07PM +0200, Ard Biesheuvel wrote:
> > > > > > On Wed, 2 Jun 2021 at 12:12, Will Deacon <will@kernel.org> wrote:
> > > > > > > On Wed, Jun 02, 2021 at 01:35:01AM -0000, CKI Project wrote:
> > > > > > > >      stress: stress-ng
> > > > > > >
> > > > > > > This explodes pretty badly. Some CPUs detect RCU stalls when trying to use
> > > > > > > the EFI "efi_read_time" service, which eventually fails but soon after we
> > > > > > > explode trying to access memory which I think is mapped by
> > > > > > > acpi_os_ioremap(), so it looks like the f/w might be the culprit here. Is
> > > > > > > the "HPE Apollo 70" machine known to have bad EFI firmware?
> > > > > > >
> > > > > > > https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/datawarehouse-public/2021/06/01/313156257/build_aarch64_redhat%3A1310052388/tests/stress_stress_ng/10079827_aarch64_2_dmesg.log
> > > > > > >
> > > > > > > (scroll to the end for the fireworks)
> > > > > > >
> > > > > >
> > > > > > Wow that looks pretty horrible. I take it this tree has your MAIR changes?
> > > > >
> > > > > Nope, this is just vanilla -rc4! I'm trying to get a "known good" base
> > > > > before I throw all the new things at it :)
> > > > >
> > > > > > Would be useful to have a log with efi=debug, to see what the EFI
> > > > > > memory map looks like.
> > > > >
> > > > > Veronika -- please could you help us with that?
> > > >
> > > > Sure, I'll get a rerun with that option and report back when I have any
> > > > results. I am also planning just a plain rerun on the machine to see if it
> > > > reproduces somewhat reliably; however, the machine is taken up by
> > > > other automation now so it will take a while.
> > >
> > > Thanks. In the meantime, I've pushed a bunch of new stuff into for-kernelci,
> > > so I can at least see if it regresses when compared to the three failures
> > > we're seeing here.
> > >
> >
> > Hi,
> >
> > I don't have very good news so far. We did 4 targeted runs with the machine
> > and weren't able to reproduce the panic. However, there was a panic hit in
> > the new test run you should have in the inbox and it also reproduced in a
> > completely unrelated test run with *this* kernel (not the new one). In all 3
> > cases the HW model is the same, but they were all different machines.
> >
> > I'm currently doing a full run which includes all tests from the run instead
> > of just stress-ng to see if it reproduces that way - there was a panic case
> > last year (not ARM specific :) that we weren't able to pinpoint to a nice
> > reproducer and had to run multiple tests to trigger it so it's possible this
> > one is similar. I'll try to pare down the tests if this strategy works and
> > keep you updated.
> >
> 
> I just wanted to follow up here. Outside of the single run I mentioned
> previously, we are still unable to reproduce the panic. We tried a lot of
> runs on the various machines of the model that hit it, with both full test
> runs and stress-ng test only.
> 
> We'll still reach out if we manage to hit it in the future, but it looks like
> a race condition that's not easy to reproduce. Of course if anyone has
> an idea we should try (whether it's about reproducing or debugging what
> the problem is) we can try that.

Thanks for the follow-up, Veronika. I also noticed that it seems to have
disappeared from subsequent runs :/

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Thread overview: 6+ messages
2021-06-02  1:35 ❌ FAIL: Test report for kernel 5.13.0-rc4 (arm-next, 8124c8a6) CKI Project
2021-06-02 10:12 ` Will Deacon
2021-06-02 10:34   ` ? " Mark Rutland
     [not found]   ` <CAMj1kXENsUScpTg294HDiQUUXuUBz438hysEM9M4N-Jcq+q2fA@mail.gmail.com>
2021-06-02 10:51     `  " Will Deacon
     [not found]       ` <CA+tGwnn+9XCDB69LxY1AEoNih_qCovwYsuNHzbwyUN8LmZTTAg@mail.gmail.com>
2021-06-02 17:10         ` Will Deacon
     [not found]           ` <CA+tGwn=Y63hMdHpP16i4YD1qx-hwnSxfWsNgC+Kh-SDxXZpqGA@mail.gmail.com>
     [not found]             ` <CA+tGwn=BG6nUQcs_Q2a1h=UDPwR6ern2wQPwQ84kz=c2xB5-Wg@mail.gmail.com>
2021-06-11 10:34               ` Will Deacon [this message]
