[ FYI, this series was not Cc'd to LKML. ]

On Thu, Nov 26, 2020 at 12:35PM +0000, Mark Rutland wrote:
> Hi,
>
> Dmitry and Marco both reported some weirdness with lockdep on arm64 erroneously
> reporting the hardware IRQ state, and inexplicable RCU stalls:
>
> https://lore.kernel.org/r/CACT4Y+aAzoJ48Mh1wNYD17pJqyEcDnrxGfApir=-j171TnQXhw@mail.gmail.com
> https://lore.kernel.org/r/20201119193819.GA2601289@elver.google.com
>
> Having investigated, I believe that this is largely down to the arm64 entry
> code not correctly managing RCU, lockdep, irq flag tracing, and context
> tracking. This series attempts to fix those cases, and I've Cc'd folk from the
> previous threads as a heads-up.
>
> Today, the arm64 entry code:
>
> * Doesn't correctly save/restore the lockdep/tracing view of the HW IRQ
>   state, leaving this inconsistent.
>
> * Doesn't correctly wake/sleep RCU around its use (e.g. by the IRQ tracing
>   functions).
>
> * Calls the context tracking functions (which wake and sleep RCU) at the wrong
>   point w.r.t. lockdep and tracing.
>
> Fixing all this requires reworking the entry/exit sequences along the lines of
> the generic/x86 entry code. Moving arm64 over to the generic entry code
> requires significant changes to both arm64 and the generic code, so for now I've
> added arm64-specific helpers to achieve the same thing. There's a lot of
> cleanup we could do here as a follow-up, but for now I've tried to do the bare
> minimum to make things work as expected without making it unmaintainable.
>
> The patches are based on v5.10-rc3, and I've pushed them out to my
> arm64/entry-fixes branch on kernel.org:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/entry-fixes
>
> Marco was able to test a WIP version of this, which seemed to address the
> issues he was seeing. Since then I've had to alter the debug exception
> handling, but I'm not expecting problems there.
> In future we'll want to make more changes to the debug cases to align with
> x86, handling single-step, watchpoints, and breakpoints as NMIs, but this will
> require significant refactoring of the way we handle BRKs. For now I don't
> believe that there's a major problem in practice with the approach taken in
> this series.
>
> This version has seen an overnight soak under Syzkaller, where all the reports
> I have so far look sound. I have been testing with additional debug patches:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/entry-fixes
>
> ... which I do not think we should merge now, but intend to respin in future
> with all the other cleanup.

So, I was hoping that this would fix all the problems I was seeing when
running the ftrace tests ... unfortunately, it didn't. :-(

Perhaps the WIP version you had only worked because it ended up disabling
lockdep early?

I've attached the log and the symbolized report.

Thanks,
-- Marco